[neurokernel-dev] memoization using content-based hashing

Lev Givon lev at columbia.edu
Mon Aug 4 00:08:36 EDT 2014

To speed up the new LPU interface machinery in the refactored Neurokernel
code, I've memoized several methods of the Interface and Pattern classes
using an LFU cache scheme based upon that in cachetools [1]. Although many of
these methods can be memoized using Python's built-in data structures because
their arguments are hashable, those with mutable/complex argument types
(e.g., numpy arrays) cannot because such arguments are not hashable. To
facilitate caching for such methods, I implemented

- a Cython-based wrapper for an extremely fast hash algorithm called xxHash;
- a recursive content-based hash mechanism that employs the above xxHash wrapper;
- a memoization decorator for instance methods that uses the above mechanism to 
  cache the results of methods with mutable/complex arguments.
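To make the idea concrete, here is a minimal sketch of recursive content-based
hashing combined with a memoizing decorator for instance methods. It is not
the API of the packages in [2, 3]: hashlib.sha1 from the standard library
stands in for xxHash, a plain unbounded dict stands in for the LFU cache, and
the names content_hash, memoize_method, and Demo are hypothetical. It also
assumes that a method's result depends only on its arguments.

```python
import functools
import hashlib


def _feed(h, obj, payload):
    # Tag each value with its type name and payload length so that, e.g.,
    # [1, 2] and (1, 2), or "ab" + "c" and "a" + "bc", hash differently.
    tag = type(obj).__name__.encode()
    h.update(tag + b":" + str(len(payload)).encode() + b":" + payload)


def _update(h, obj):
    if isinstance(obj, (bytes, bytearray)):
        _feed(h, obj, bytes(obj))
    elif isinstance(obj, str):
        _feed(h, obj, obj.encode("utf-8"))
    elif isinstance(obj, (list, tuple)):
        _feed(h, obj, str(len(obj)).encode())
        for item in obj:
            _update(h, item)  # recurse into the container's contents
    elif isinstance(obj, dict):
        # Sort the per-item digests so insertion order does not matter.
        digests = sorted(content_hash((k, v)) for k, v in obj.items())
        _feed(h, obj, "".join(digests).encode())
    elif isinstance(obj, (set, frozenset)):
        digests = sorted(content_hash(x) for x in obj)
        _feed(h, obj, "".join(digests).encode())
    else:
        # Fallback for numbers, None, etc.; assumes repr() reflects content.
        _feed(h, obj, repr(obj).encode())


def content_hash(obj):
    """Return a hex digest computed recursively from the contents of obj."""
    h = hashlib.sha1()  # stand-in for xxHash (assumption)
    _update(h, obj)
    return h.hexdigest()


def memoize_method(method):
    """Cache an instance method's results, keyed on the content of its args."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        # One unbounded cache per instance (an LFU cache could be used here).
        cache = self.__dict__.setdefault("_memo_cache", {})
        key = (method.__name__, content_hash((args, kwargs)))
        if key not in cache:
            cache[key] = method(self, *args, **kwargs)
        return cache[key]
    return wrapper


class Demo:
    def __init__(self):
        self.calls = 0

    @memoize_method
    def total(self, values):
        self.calls += 1  # counts how often the real computation runs
        return sum(values)


d = Demo()
d.total([1, 2, 3])  # computed
d.total([1, 2, 3])  # equal content, so served from the cache
```

Because the key is derived from the argument contents rather than object
identity, two distinct lists with the same elements map to the same cache
entry even though lists themselves are unhashable.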

The above have been uploaded to GitHub as two separate packages [2, 3]; I have
also made those packages dependencies of the refactored Neurokernel branch
(they are not on PyPI yet, however).

Preliminary profiling of the demo at the end of core.py for 1000 execution steps
on my workstation resulted in the following timings:

- no memoization:                                        117 s
- LFU memoization of Interface.interface_ports() method:  62 s
- LFU memoization of multiple methods with xxhash/chash:  12 s
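For readers unfamiliar with the LFU (least frequently used) policy named in
the timings above, the following is a minimal sketch of the eviction rule. It
is not the cachetools implementation; the class name SimpleLFUCache and its
get/put methods are hypothetical, and maxsize is assumed to be at least 1.

```python
from collections import Counter


class SimpleLFUCache:
    """Bounded cache that evicts the least frequently used entry when full."""

    def __init__(self, maxsize):
        self.maxsize = maxsize  # assumed to be >= 1
        self._data = {}
        self._uses = Counter()  # number of accesses per key

    def get(self, key, default=None):
        if key in self._data:
            self._uses[key] += 1
            return self._data[key]
        return default

    def put(self, key, value):
        if key not in self._data and len(self._data) >= self.maxsize:
            # Evict the key with the fewest recorded uses.
            victim = min(self._data, key=lambda k: self._uses[k])
            del self._data[victim]
            del self._uses[victim]
        self._data[key] = value
        self._uses[key] += 1


cache = SimpleLFUCache(maxsize=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")
cache.get("a")     # "a" is now used more often than "b"
cache.put("c", 3)  # cache is full, so "b" is evicted
```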

Please try out the code and let me know if you encounter any issues. Note, of
course, that because this memoization can cache mutable objects, it introduces
the possibility of returning incorrect answers if a cached object is modified
after being added to the cache.
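The hazard can be reproduced with a few lines of plain Python. This is only an
illustrative sketch: the helpers cached and cached_copy are hypothetical, and
returning deep copies is one possible mitigation (at the cost of extra
copying), not something the packages above are claimed to do.

```python
import copy

_cache = {}


def cached(key, compute):
    """Return the cached value for key, computing it on the first request."""
    if key not in _cache:
        _cache[key] = compute()
    return _cache[key]


def cached_copy(key, compute):
    """Like cached(), but return a deep copy so callers cannot alter the cache."""
    return copy.deepcopy(cached(key, compute))


first = cached("ports", lambda: [0, 1, 2])
first.append(99)  # modifies the cached list in place
second = cached("ports", lambda: [0, 1, 2])  # same object, now holds 99 too

safe = cached_copy("other", lambda: [0, 1, 2])
safe.append(99)  # modifies only the caller's copy
fresh = cached_copy("other", lambda: [0, 1, 2])  # cached value is unchanged
```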

[1] https://github.com/tkem/cachetools
[2] https://github.com/lebedov/xxhash
[3] https://github.com/lebedov/chash

Lev Givon
Bionet Group | Neurokernel Project
