SWIG's -builtin Option
Summary - Benchmarks of PyGAMMA Using SWIG's -builtin Option
SWIG 2.0.4 introduced a new -builtin option that replaces pure Python class wrappers with Python objects implemented in C++. According to SWIG's documentation, it "is especially suitable for performance-critical libraries and applications that call wrapped methods repeatedly" (1). It "provides a significant performance improvement" (2). I performed benchmarks to see how SWIG's new built-in types affected some typical PyGAMMA scripts.
The option does indeed provide benefits, although they're modest. On my laptop, the performance gain was roughly 1 second per 100,000 PyGAMMA calls. Since there's no downside to using this flag, we'll enable it in our builds and recommend it, but keep expectations low.
Methods
I benchmarked two scripts on my laptop, a MacBook Pro with two 2.53 GHz Core i5 processors and 8 GB of 1067 MHz DDR3 RAM, running OS X 10.6.8. During the benchmarks I disconnected from the Internet and ran only the apps essential for benchmarking (terminal, text editor, Python).
PyGAMMA was compiled with gcc 4.2.1 using the current version ([browser:/trunk/platforms/OSX/Makefile?rev=377 r377]) of the Makefile with two changes. First, I lowered the optimization level from O3 to O2 because linking with O3 takes about 10 minutes on my machine and I'm impatient. Second, I added -builtin to the SWIG options for the builds that benchmarked that option.
My Python is 32-bit Python 2.6.6 from Python.org.
I ran two scripts ([attachment:fid.py fid.py, attached to this page], from our GAMMA test suite, and [contrib:wiki:602e7450-8227-48eb-8252-38f8597d9342 basing3dj.py from the basing3dj sample]) three times for each scenario in the tables below and averaged the results. Averaging wasn't strictly necessary since the runs barely differed from one another, but I did so anyway.
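The page doesn't say how the runtimes were measured, so the following is only a minimal sketch of one way to run a command three times and average the wall-clock time. The function name `time_script` is hypothetical, the example command is a placeholder rather than the real script invocation, and the code assumes a modern Python 3 rather than the Python 2.6.6 used for the benchmarks.

```python
import subprocess
import sys
import time


def time_script(argv, runs=3):
    """Run a command `runs` times; return (mean, list of timings) in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True makes a failing run raise instead of being timed silently.
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings), timings


if __name__ == "__main__":
    # Placeholder command; substitute the script and arguments being measured.
    mean, timings = time_script([sys.executable, "-c", "pass"])
    print("runs: %s" % ", ".join("%.2f" % t for t in timings))
    print("mean: %.2f s" % mean)
```

Timing the whole process includes interpreter startup and import time, which is the same in every scenario and so doesn't change the differences between them.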
basing3dj.py was chosen because it makes a lot of PyGAMMA calls, and the -builtin option is supposed to make the Python/C++ transition more efficient. fid.py was chosen for the same reason but the opposite quality: it makes very few PyGAMMA calls, and so shouldn't benefit from builtin types.
In order to count the number of PyGAMMA calls made by each script, I created a module (not included here) that wrapped all the PyGAMMA calls made by basing3dj.py and fid.py. This module merely counted the call and then called the appropriate PyGAMMA function. I didn't use that wrapper during benchmarking since it would have altered the runtimes, but it was useful for counting calls into PyGAMMA.
Although the purpose of this test was benchmarking, I did compare the output of the scripts using PyGAMMA with and without builtin types. The results were the same.
Results
basing3dj.py
This script made 379,484 calls into PyGAMMA.
-builtin? | Script | Metabolite | Runtime (s)
NO | basing3dj.py | gsh_test.sys | 7.90
YES | basing3dj.py | gsh_test.sys | 4.80
NO | basing3dj.py | creatine.sys | 126.67
YES | basing3dj.py | creatine.sys | 121.12
Using -builtin was 7.90 - 4.80 = 3.10 seconds faster for gsh_test.sys and 126.67 - 121.12 = 5.55 seconds faster for creatine.sys.
fid.py
This script made 8 calls into PyGAMMA.
-builtin? | Script | Metabolite | Runtime (s)
NO | fid.py | phenylalanine.sys | 3.94
YES | fid.py | phenylalanine.sys | 3.95
NO | fid.py | phosphorylcholine2.sys | 52.78
YES | fid.py | phosphorylcholine2.sys | 52.78
Using -builtin was 0.01 seconds slower for phenylalanine.sys and no different for phosphorylcholine2.sys.
Discussion of Results
The script which made a lot of calls into PyGAMMA (basing3dj.py) ran faster when PyGAMMA was compiled with the -builtin option, while the script which was computationally intense but made few calls into PyGAMMA (fid.py) didn't benefit at all. This bears out the assertions that SWIG's -builtin option (a) improves traffic flow across the Python/C++ border and (b) has no discernible effect otherwise.
If we divide the time that basing3dj.py gained by using -builtin (3.10 and 5.55 seconds) by the number of calls into PyGAMMA (379,484), we find that the improvement for each function call is roughly 8 to 15 microseconds (3.10 s / 379,484 ≈ 8.2 microseconds; 5.55 s / 379,484 ≈ 14.6 microseconds). This implies that builtin types won't make much of a difference unless a script makes a lot of PyGAMMA calls.
To look at it another way, it's interesting to see how many PyGAMMA calls it will take to see 1 second of improvement. At a gain of about 8.2 microseconds per call, it will take ~122,000 PyGAMMA calls. At a gain of about 14.6 microseconds per call, it will take ~68,000 PyGAMMA calls to gain 1 second.
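The arithmetic above can be reproduced from the table values with a few lines of Python. This is just a convenience sketch. The function names are made up for illustration, and it assumes, as the discussion does, that the script makes the same 379,484 calls for both metabolites.

```python
CALLS = 379484  # PyGAMMA calls made by basing3dj.py

# Runtimes in seconds from the basing3dj.py table: (without, with -builtin)
runtimes = {
    "gsh_test.sys": (7.90, 4.80),
    "creatine.sys": (126.67, 121.12),
}


def per_call_gain_us(without, with_builtin, calls=CALLS):
    """Microseconds saved per PyGAMMA call."""
    return (without - with_builtin) / calls * 1e6


def calls_per_second_saved(gain_us):
    """Number of calls needed to save one second at the given per-call gain."""
    return 1e6 / gain_us


if __name__ == "__main__":
    for name, (without, with_builtin) in sorted(runtimes.items()):
        gain = per_call_gain_us(without, with_builtin)
        print("%s: %.1f us/call, %.0f calls per second saved"
              % (name, gain, calls_per_second_saved(gain)))
```

Both metabolites land near the "1 second per 100,000 calls" figure quoted in the summary.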