Hi HN, I built altor-vec, a client-side HNSW vector search engine written in Rust that compiles to 54KB of WebAssembly. It lets you run semantic search entirely in the browser with sub-millisecond latency.
I started working on this because of the typical SaaS search dependency. Tools like Algolia are great, but at $0.50 per 1,000 searches, costs scale linearly with usage. More importantly, sending every user keystroke to a third-party API isn't ideal for privacy. I wanted a way to run high-quality semantic search directly on the user's device, where queries cost nothing and data never leaves the browser.
Under the hood, altor-vec implements HNSW (Hierarchical Navigable Small World)—the same algorithm used by Pinecone, Qdrant, and pgvector. HNSW builds a multi-layer graph where upper layers act as express lanes for coarse navigation, and the bottom layer contains all vectors for fine-grained search. This allows queries to greedily descend and find nearest neighbors in O(log n) time.
To make this viable for the browser, keeping the binary size small was critical. By writing it in Rust and carefully managing dependencies, the compiled WASM is just 54KB gzipped (117KB raw). For performance, all vectors are L2-normalized at insert time. This means dot product distance equals cosine similarity, saving computation during search. The index can be serialized to a binary format, allowing you to build it once at compile time and load it instantly on the client via a Uint8Array.
I benchmarked it with 10,000 vectors at 384 dimensions using the all-MiniLM-L6-v2 model. The serialized index is 17MB, which translates directly to the memory footprint when loaded into the browser. In Chrome, the p95 search latency is 0.60ms. In Node.js, it's 0.50ms, and native Rust hits 0.26ms. Because it's so fast, you can easily run it in a Web Worker to keep the main UI thread completely unblocked.
To make this practically useful, I also just published a Docusaurus plugin (docusaurus-theme-search-altor) that replaces Algolia DocSearch. It extracts your site content at build time and generates both an HNSW index and a BM25 index. In the browser, keyword search runs fully client-side with zero external calls. For semantic search, it pings a lightweight embedding API to convert the query into a vector, then searches the HNSW index locally via the WASM engine. If the embedding API is unreachable, it gracefully degrades to BM25 keyword results.
I'm currently looking into ways to further compress the index size, perhaps through product quantization, as 17MB for 10K vectors is still a bit heavy for initial page loads on slower connections.
I'd love to hear your thoughts on the implementation, any ideas for reducing the memory footprint, or feedback on the Docusaurus plugin. Happy to answer any questions.
You can try the live demo here: https://altor-lab.github.io/altor-vec/
The source code is on GitHub: https://github.com/Altor-lab/altor-vec