Please tell us about the chunking algorithm itself. The flowchart says you have three paths: structured, fixed, and hybrid. What happens in each one? Can you access the library from Rust?
That's great if it works, but I've always been deeply skeptical of relying on chunks as the unit of retrieval. A chunk in isolation loses the surrounding context and the nuanced conditions that qualify how it applies. I think it's better to put the entire document into a large context instead, then have the LLM summarize its relevance into an accurate title and blurb. At retrieval time, filter and match over the titles and blurbs, then again give the LLM the entire text.
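A minimal sketch of the flow this comment proposes, with all names illustrative and the LLM summarization step stubbed out (the caller supplies the title and blurb; crude token overlap stands in for whatever matcher you'd actually use):

```python
# Sketch of summary-based retrieval: index whole documents under an
# LLM-written title + blurb, match queries against those summaries only,
# and return the *entire* document text rather than chunks.
from dataclasses import dataclass

@dataclass
class DocEntry:
    title: str      # LLM-written title (stubbed: supplied by the caller)
    blurb: str      # LLM-written relevance blurb (stubbed likewise)
    full_text: str  # the complete document, kept intact

def index_document(full_text: str, title: str, blurb: str) -> DocEntry:
    # In the real flow, title and blurb would come from an LLM pass
    # over the whole document; here they are passed in directly.
    return DocEntry(title=title, blurb=blurb, full_text=full_text)

def retrieve(query: str, index: list[DocEntry], k: int = 3) -> list[str]:
    # Filter and match over titles and blurbs only; simple token overlap
    # is a placeholder for BM25, embeddings, or any other matcher.
    q_tokens = set(query.lower().split())

    def score(entry: DocEntry) -> int:
        summary = (entry.title + " " + entry.blurb).lower().split()
        return len(q_tokens & set(summary))

    ranked = sorted(index, key=score, reverse=True)
    # Hand back full documents, so no surrounding context is lost.
    return [e.full_text for e in ranked[:k] if score(e) > 0]
```

The point of the design is that the lossy step (summarization) happens once per document with full context available, while retrieval never has to guess which fragment of a document was relevant.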
In no universe would this mean production ready. That's just the bot traffic any package would get.
Brutal feedback from someone building RAG systems: nobody wants to use slop. The commit history is six initial commits with thousands of LOC, followed mostly by README updates after that.
Also, I get junk output from a very simple PDF. How did you verify the claims about this project's capabilities?