
You could do a lot by pre-calculating things for your embeddings. Why cache when you can pre-calculate? That brings into play a whole lot of things people commonly do as part of ETL.
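For concreteness, a minimal sketch of what I mean (contextualize() and embed() are placeholders for whatever LLM/embedding model you actually use; none of this is from the article):

    from dataclasses import dataclass

    def contextualize(doc_id: str, chunk: str) -> str:
        # placeholder: an LLM (or cheaper heuristic) that situates the chunk in its doc
        return f"From document {doc_id}."

    def embed(text: str) -> list[float]:
        # placeholder for a real embedding model call
        return [float(len(text))]

    @dataclass
    class IndexedChunk:
        doc_id: str
        text: str
        context: str          # pre-computed once at index time, never at query time
        embedding: list[float]

    def build_index(docs: dict[str, list[str]]) -> list[IndexedChunk]:
        index = []
        for doc_id, chunks in docs.items():
            for chunk in chunks:
                ctx = contextualize(doc_id, chunk)
                index.append(IndexedChunk(doc_id, chunk, ctx, embed(ctx + "\n" + chunk)))
        return index

The point being that all of the enrichment happens in the indexing pipeline, so there is nothing to cache at query time.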

I come from a traditional search background. It's quite obvious to me that RAG is a bit of a naive strategy if you limit it to just using vector search with some off-the-shelf embedding model. Vector search simply isn't that good. You need additional information retrieval strategies if you want to improve the context you provide to the LLM. That is effectively what they are doing here.

Microsoft published an interesting paper on graph RAG some time ago where they combine RAG with vector search based on a conceptual graph that they construct from the indexed data using entity extraction. This allows them to pull in contextually relevant information for matching chunks.
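Roughly, the idea looks something like this (a sketch of the concept, not Microsoft's actual implementation; extract_entities() and vector_search() are placeholders):

    import networkx as nx

    def extract_entities(chunk: str) -> list[str]:
        # placeholder for a real entity-extraction / NER step
        return [w for w in chunk.split() if w.istitle()]

    def vector_search(query: str, chunks: list[str], k: int) -> list[int]:
        # placeholder for embedding-based retrieval; returns chunk indices
        return list(range(min(k, len(chunks))))

    def build_entity_graph(chunks: list[str]) -> nx.Graph:
        g = nx.Graph()
        for i, chunk in enumerate(chunks):
            g.add_node(i, text=chunk)
            for entity in extract_entities(chunk):
                g.add_edge(i, entity)      # entity nodes link related chunks
        return g

    def retrieve(query: str, chunks: list[str], g: nx.Graph, k: int = 3) -> list[str]:
        hits = set(vector_search(query, chunks, k))
        for chunk_id in list(hits):
            for entity in g.neighbors(chunk_id):
                hits.update(n for n in g.neighbors(entity) if isinstance(n, int))
        return [chunks[i] for i in hits]

The graph expansion is what pulls in chunks that share entities with the vector matches, even if they wouldn't score well on their own.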

I have a hunch that you could probably get quite far without doing any vector search at all. It would be a lot cheaper too. Simply use a traditional search engine and some tuned query. The trick is of course query tuning. Which may not work that well for general purpose use cases but it could work for more specialized use cases.
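Something as simple as BM25 plus hand-tuned query expansion can get you surprisingly far. A small sketch using the rank_bm25 library (the corpus and expansion table are made up for illustration):

    from rank_bm25 import BM25Okapi

    corpus = [
        "contextual retrieval improves RAG by prepending chunk context",
        "BM25 is a classic lexical ranking function",
        "reciprocal rank fusion combines multiple result lists",
    ]
    bm25 = BM25Okapi([doc.lower().split() for doc in corpus])

    # crude, domain-specific query tuning: expand with terms you care about
    EXPANSIONS = {"rag": ["retrieval", "augmented", "generation"]}

    def tuned_search(query: str, n: int = 2) -> list[str]:
        tokens = query.lower().split()
        for t in list(tokens):
            tokens += EXPANSIONS.get(t, [])
        return bm25.get_top_n(tokens, corpus, n=n)

    print(tuned_search("rag context"))

No embeddings, no GPU, and the tuning knobs are all things traditional search people already know how to turn.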



I have experience in traditional search as well, and I think it's limiting my imagination when it comes to vector search. In the post, I did like the introduction of Contextual BM25 compared to other hybrid approaches that just do RRF.
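For anyone unfamiliar, RRF here is reciprocal rank fusion: each result list contributes 1/(k + rank) per document and you sort by the summed score. A minimal sketch:

    def rrf(result_lists: list[list[str]], k: int = 60) -> list[str]:
        scores: dict[str, float] = {}
        for results in result_lists:
            for rank, doc_id in enumerate(results):
                scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
        return sorted(scores, key=scores.get, reverse=True)

    # e.g. fusing a BM25 result list with a vector-search result list
    fused = rrf([["d3", "d1", "d7"], ["d1", "d2", "d3"]])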

For question answering, vector/semantic search is clearly a better fit in my mind, and I can see how the contextual models can enable and bolster that. However, because I’ve implemented and used so many keyword-based systems, that just doesn’t seem to be how my brain works.

An example I’m thinking of is finding a sushi restaurant near me with availability this weekend around dinner time. I’d love to be able to search for this exactly as I’ve written it. In practice, I would search for "sushi restaurant", sort by distance, and hope the application does a proper job of surfacing time filtering.

Conversely, this is mostly how I would build this system. Perhaps with a layer to determine user intention to pull out restaurant type, location sorting, and time filtering.

I could see using semantic search for filtering the restaurants down to those related to sushi, but do we then drop back into traditional search for filtering and sorting? Utilize function calling to have the LLM parameterize our search query?
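Something like the following is what I have in mind for that last option (a sketch only; ask_llm_for_json() and all field names are made up for illustration):

    import json

    def ask_llm_for_json(query: str) -> str:
        # placeholder for a real function-calling / structured-output LLM request
        return json.dumps({"cuisine": "sushi", "max_distance_km": 5,
                           "day": "saturday", "time": "19:00"})

    def search_restaurants(query: str, restaurants: list[dict]) -> list[dict]:
        params = json.loads(ask_llm_for_json(query))
        hits = [r for r in restaurants
                if params["cuisine"] in r["cuisine"].lower()
                and r["distance_km"] <= params["max_distance_km"]
                and params["day"] in r["availability"]]
        return sorted(hits, key=lambda r: r["distance_km"])

    # e.g. search_restaurants("sushi near me with a table this Saturday evening", db)

The LLM only extracts parameters; the actual filtering and sorting stays in the traditional search layer.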

As stated, perhaps I’m not thinking about these the right way because of my experience with existing systems, which seem to give me better results when well built.


Another approach I saw is to build a conceptual graph using entity extraction and have the LLM suggest search paths through that graph to enhance the retrieval step. The LLM is fine-tuned on the conceptual graph for this specific task. It could work in your case, but you need an ontology that suits your use case; in other words, it must already contain restaurant location, type of dishes served, and opening hours.


GraphRAG requires you to define the schema of entity and relation types up front. This works when you are in a known domain, but in general, when you just want to answer questions from a large reference, you don't know what you need to put in the graph.


Graph RAG is very cool and outstanding at filling some niches. IIRC, Perplexity's actual search is just BM25 (based on a Lex Fridman interview with the founder).


Makes sense; Perplexity is usually really responsive and fast.

I need to check out that interview with Lex Fridman.


That is a funny way of explaining that they scrape Google.


Do you have the link and the time in the video where he mentions it?



This was my exact question. Why do an LLM rewrite when you can add a context vector to a chunk vector and, for plaintext indexing, add a context string (e.g., TF-IDF)?

The article claimed other context augmentation fails, and that you are better off paying Anthropic to run an LLM on all your data, but it seems quite handwavy. What vector+text search nuance does a full-document-cache LLM rewrite catch that cheapo methods miss? Reminds me of "It is difficult to get a man to understand something when his salary depends on his not understanding it". (We process enough data that we try to limit LLMs to the retrieval step, and only embeddings & light LLMs to the indexing step, so it's a $$$ distinction for our customers.)

The context caching is neat in general, so I have to wonder if this use case is more about paying for ease than quality, and its value for quality is elsewhere.
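For concreteness, the kind of cheap context augmentation I have in mind looks roughly like this (embed() is a placeholder, and the alpha blend is a made-up knob, not something from the article):

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer

    def embed(text: str) -> np.ndarray:
        # placeholder for a real embedding model call
        rng = np.random.default_rng(abs(hash(text)) % (2**32))
        return rng.standard_normal(8)

    def contextualize(title: str, doc: str, chunks: list[str], alpha: float = 0.3):
        tfidf = TfidfVectorizer(stop_words="english", max_features=5).fit([doc])
        context_str = f"{title}: {' '.join(tfidf.get_feature_names_out())}"
        doc_vec = embed(doc)
        out = []
        for chunk in chunks:
            blended = (1 - alpha) * embed(chunk) + alpha * doc_vec
            out.append({
                "text_for_bm25": f"{context_str}\n{chunk}",   # context string for keyword index
                "vector": blended / np.linalg.norm(blended),  # context-aware chunk vector
            })
        return out

Whether that closes the quality gap with a per-chunk LLM rewrite is exactly the question the article doesn't really answer.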



