Cython seriously rocks, at least for much of what I need. It's still the killer feature of Python/Sage, IMHO. Meetings like EuroSciPy last summer really confirmed that: almost every other talk used Cython.
Here's how interpreted code typically runs:

1. Check a bunch of conditions, then do one single thing.
2. Check a bunch of conditions, then do one single thing.
...
10^6. Check a bunch of conditions, then do one single thing.

Here's how compiled code (C, Cython, etc.) can be written:
1. Check some conditions (optional, but a good idea).
2. Do very unsafe stuff with no checks at all (which in theory should be safe, given step 1).
...
10^6. Do very unsafe stuff with no checks at all (which in theory should be safe, given step 1).

The problem is that all the checks (in either case) can easily take over 100 times as long as "do very unsafe stuff".

TYPICAL EXAMPLE:
{{{id=1|
def sum_sage(n):
    s = 1
    for i in range(n):
        s += i
    return s
///
}}}

{{{id=5|
timeit('sum_sage(100000r)')
///
5 loops, best of 3: 84.6 ms per loop
}}}

{{{id=2|
%python
def sum_python(n):
    s = 1
    for i in range(n):
        s += i
    return s
///
}}}

{{{id=6|
timeit('sum_python(100000r)')
///
125 loops, best of 3: 5.88 ms per loop
}}}

{{{id=9|
84.6/5.88
///
14.3877551020408
}}}

{{{id=3|
%cython
def sum_cython(int n):
    cdef int i, s = 1
    for i in range(n):
        s += i
    return s
///
}}}

{{{id=4|
timeit('sum_cython(100000r)')
///
625 loops, best of 3: 61.6 µs per loop
}}}

{{{id=8|
5.88 / 0.061
///
96.3934426229508
}}}

{{{id=12|
84.6 / 0.061
///
1386.88524590164
}}}

{{{id=11|
sum_python(10^8)
///
4999999950000001
}}}

{{{id=10|
sum_cython(10^8)
///
887459713
}}}
Explain what's going on in the above. For example, in the first one (sum_sage), the program is carrying on a sort of monologue: "I have to add a Python int to a Sage int. I don't have any code to do that directly (that would get too complicated; they are such big, complicated, different objects, and they might change, oh my). So I'll convert the Python int to a Sage int, because that's the only conversion I know. OK, I do that via some scarily complicated code Gonzalo Tornaria wrote (it used to be base-10 string parsing!)... and once that is done, I've got my new MPIR-based Sage integer, which I then add to s. The addition takes pointers to the two MPIR integers, and since Python numbers are supposed to be immutable, I make yet another MPIR number (wrapped in a Python object), which is the result of asking MPIR to add them. MPIR numbers are also very complicated objects, involving stuff like limbs and C structs, which hardly anybody fully understands. Even though these particular integers happen to be small, there is still quite some overhead in the addition, but it happens (taking a small fraction of the total runtime). Then we move on to the next step in the loop!"
With sum_python, the loop is similar, but MPIR isn't involved and there are no conversions. This buys a 14-fold speedup. But it is still not super fast: many new Python objects get created, and since the code must handle potentially huge integers, a potentially complicated data structure has to be checked for, etc.
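You can see some of this Python-object overhead directly from plain Python. The sizes below are from a typical 64-bit CPython 3 build and may differ on yours:

```python
import sys

s = 1
# Even a tiny Python int is a full heap object, not a bare machine word:
# it carries a type pointer, a reference count, and a digit array.
print(sys.getsizeof(s))        # around 28 bytes on a 64-bit CPython 3

# The representation grows with the value -- "potentially huge integers".
print(sys.getsizeof(10**100))  # several times larger

# And every addition allocates a brand-new object, since ints are immutable.
t = s + 1
```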
With sum_cython, the integers are only C ints, each just a 32- or 64-bit location in memory. Doing "s += i" simply modifies that position in memory in place. No conversions or type checks are done at all at run time. It's really fast... 1386 times faster than the first version!!!
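The interpreted-vs-compiled contrast sketched at the start of this section can be mocked up in plain Python. This is a toy illustration, not real Sage or CPython internals; checked_add and both function names are made up for the sketch:

```python
def checked_add(a, b):
    # Interpreted style: these checks run again on EVERY single operation.
    if not isinstance(a, int) or not isinstance(b, int):
        raise TypeError("unsupported operand types")
    return a + b

def interpreted_style(n):
    s = 1
    for i in range(n):
        s = checked_add(s, i)  # n iterations => n rounds of checks
    return s

def compiled_style(n):
    # Step 1: check some conditions once, up front.
    if not isinstance(n, int) or n < 0:
        raise TypeError("n must be a nonnegative int")
    # Steps 2..n: do the work with no per-iteration checks at all.
    s = 1
    for i in range(n):
        s += i
    return s
```

Both compute the same value; the only difference is where the checking happens, and that difference is exactly where the big constant-factor gap comes from.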
Key point: If you truly understand what is going on, you'll see that this isn't Sage being fundamentally broken. Instead, you'll hopefully be able to look at a block of Sage code and have a clue about how to figure out what it is really doing in order to see whether writing a new implementation of the same algorithm using Cython (which will likely mean directly working with C level data structures) is likely to give you a big speedup. If you look at the innermost statement in a loop, and there's a big monologue about what is really going on, then you might get a 1000-fold speedup by using Cython.
In mathematics, general theorems -- once we have them -- are almost always much better than proofs of special cases. In math, proving a special case can often seem more awkward and unnatural than proving the general case (e.g., how would you prove that every integer of the form a^2 + 7*a + 5 factors uniquely as a product of primes!?). With general theorems, the statements are often simple and clear, so applying them is easier than applying theorems about some very special case, which often have more elaborate hypotheses. In mathematics, a general theorem is usually simply all-around better than a theorem about some very special cases (especially if both are available).
In contrast, when writing computer programs, algorithms that solve very general cases of problems often have significant drawbacks in speed (and sometimes complexity) compared to algorithms for special cases. Since you are mathematicians, you should constantly guard against your instincts from math research, which can point you in exactly the wrong direction for writing very fast code. Often implementations of very general algorithms _are_ easier to understand, and are much less buggy than a bunch of implementations of special cases. However, there are also usually very severe performance penalties in implementing only the general case. Watch out.
A huge part of understanding the point of Cython for writing fast math code is accepting that you're going to write a lot of "ugly" (from a mathematician's perspective) code that only deals with special cases. But it's beautiful from the perspective of somebody who absolutely needs fast code for a specific research application; your fast code can lead to whole new frontiers of research.
Continuing the example from above:
sage: sum_python(10^8)
4999999950000001
sage: sum_cython(10^8)
887459713
Yep, we just implemented only a special case in Cython!
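The "wrong" answer isn't random garbage, either. Assuming the C int here is 32 bits wide (true on the machine used above), sum_cython returns the true sum reduced modulo 2^32, which plain Python can confirm using the closed form for the sum:

```python
n = 10**8
# 1 + (0 + 1 + ... + (n-1)), computed in closed form to avoid the slow loop
true_sum = 1 + n * (n - 1) // 2
print(true_sum)           # 4999999950000001, matching sum_python
print(true_sum % 2**32)   # 887459713, matching sum_cython
```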
Useful things to do if you want Cython enlightenment (all at once, no order):