For the last few months, we’ve been doing occasional work on an approximate nearest neighbours (ANN) vector search tool, written in Python. It’s still not finished and there are many rough edges, but it comes with a working DynamoDB adaptor and hence operates out-of-memory, one our main requirements. On the down side, it isn’t as fast as it needs to be, and is untested in production.
Today, we are open sourcing it, in the hope that others can have some benefit from it, and perhaps even contribute to it. The vision we have for this tool is the following:
- It should operate out-of-memory: The size of the set of vectors from which you can recommend isn’t limited by how much memory your machine has.
- It should be stable and tested in a production environment.
- You should be able to control the number of recommendations it returns.
- It should be extensible in real time: You don’t have to shut down the server it runs on and re-index every time you add a new vector to the corpus.
- It should be scalable: it works quickly no matter how many vectors you have to search through.
- It should be as fast and accurate as possible (of course, it will never be able to match an in-memory NN search tool for speed).
How are we doing so far? At the moment, it isn’t as fast as we’d like, we’ve not benchmarked its accuracy and it’s not production-tested. However, it does satisfy the remaining requirements: users can determine the number of recommendations on a per-call basis, vectors can be added with no downtime, it’s scalable and it operates out-of-memory, using a DynamoDB adaptor.
We would’ve much preferred to use a pre-existing tool, but we couldn’t find one that ticked all of the above boxes, though there are some excellent open source projects that go some of the way: FLANN (see also the python binding pyflann), nearpy and annoy, for example. However, at the time of writing none of these operates out-of-memory, although nearpy does feature a Redis adaptor. We wanted it to operate out-of-memory both so that the size of the corpus we can recommend from isn’t dependent on how much RAM we have, and also so that we don’t have to wait for this to load into memory every time we reboot the server.
How does it work? I'll address this in more detail in a forthcoming post, but the short version is: using a collection of hyperplane hashers (AKA "random projections"), like annoy. Both the number of hashers and the number of hyperplanes per hasher can be specified by the user. This accounts for the scalability: if the number of vectors in the search space doubles, we need only add one extra hyperplane to each hasher to ensure the number of vectors per chamber remains constant. At search time, each hasher is queried, and the resulting candidate nearest neighbours are all placed in a big pool. Distances are then computed explicitly between the input vector and each vector in the pool, and the ids of the nearest "N" neighbours ("N" being specified by the user at query time) returned. So having more hashers gives more accurate results, but increases the query time.
Here are some things I'd like to see improved, but haven't had a chance to address myself:
- The hashers are queried on multiple threads using the python "threading" module. With 10 hashers on about 100,000 vectors of dimension 400, this results in a factor-of-two speedup (from around 1 second to 500ms) when compared to sequential querying with no threads. However, if it were properly asynchronous, I'd expect the speedup to be closer to the number of hashers -- around 10x. Perhaps this can be achieved with a properly asynchronous framework like Tornado?
- At present, the hyperplanes are randomly created when the HHEnsembleNNLookup object is created, and are forgotten when the machine is turned off. This means that all vectors must be re-hashed every time a HHEnsembleNNLookup object is initialised from a given set of vectors in DynamoDB. It would be better if the normal vectors could also be stored, preventing the need to re-hash in this way.