
You could do a lot by pre-calculating things for your embeddings. Why cache when you can pre-calculate? That brings into play a whole lot of things people commonly do as part of ETL.
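For concreteness, a minimal sketch of what I mean (contextualize() and embed() are placeholders for whatever LLM/embedding model you actually use; none of this is from the article):

    from dataclasses import dataclass

    def contextualize(doc_id: str, chunk: str) -> str:
        # placeholder: an LLM (or cheaper heuristic) that situates the chunk in its doc
        return f"From document {doc_id}."

    def embed(text: str) -> list[float]:
        # placeholder for a real embedding model call
        return [float(len(text))]

    @dataclass
    class IndexedChunk:
        doc_id: str
        text: str
        context: str          # pre-computed once at index time, never at query time
        embedding: list[float]

    def build_index(docs: dict[str, list[str]]) -> list[IndexedChunk]:
        index = []
        for doc_id, chunks in docs.items():
            for chunk in chunks:
                ctx = contextualize(doc_id, chunk)
                index.append(IndexedChunk(doc_id, chunk, ctx, embed(ctx + "\n" + chunk)))
        return index

The point being that all of the enrichment happens in the indexing pipeline, so there is nothing to cache at query time.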

I come from a traditional search background. It's quite obvious to me that RAG is a bit of a naive strategy if you limit it to just using vector search with some off-the-shelf embedding model. Vector search simply isn't that good. You need additional information retrieval strategies if you want to improve the context you provide to the LLM. That is effectively what they are doing here.

Microsoft published an interesting paper on graph RAG some time ago where they combine RAG with vector search based on a conceptual graph that they construct from the indexed data using entity extraction. This allows them to pull in contextually relevant information for matching chunks.
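Roughly, the idea looks something like this (a sketch of the concept, not Microsoft's actual implementation; extract_entities() and vector_search() are placeholders):

    import networkx as nx

    def extract_entities(chunk: str) -> list[str]:
        # placeholder for a real entity-extraction / NER step
        return [w for w in chunk.split() if w.istitle()]

    def vector_search(query: str, chunks: list[str], k: int) -> list[int]:
        # placeholder for embedding-based retrieval; returns chunk indices
        return list(range(min(k, len(chunks))))

    def build_entity_graph(chunks: list[str]) -> nx.Graph:
        g = nx.Graph()
        for i, chunk in enumerate(chunks):
            g.add_node(i, text=chunk)
            for entity in extract_entities(chunk):
                g.add_edge(i, entity)      # entity nodes link related chunks
        return g

    def retrieve(query: str, chunks: list[str], g: nx.Graph, k: int = 3) -> list[str]:
        hits = set(vector_search(query, chunks, k))
        for chunk_id in list(hits):
            for entity in g.neighbors(chunk_id):
                hits.update(n for n in g.neighbors(entity) if isinstance(n, int))
        return [chunks[i] for i in hits]

The graph expansion is what pulls in chunks that share entities with the vector matches, even if they wouldn't score well on their own.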

I have a hunch that you could probably get quite far without doing any vector search at all. It would be a lot cheaper too. Simply use a traditional search engine and some tuned query. The trick is of course query tuning. Which may not work that well for general purpose use cases but it could work for more specialized use cases.
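Something as simple as BM25 plus hand-tuned query expansion can get you surprisingly far. A small sketch using the rank_bm25 library (the corpus and expansion table are made up for illustration):

    from rank_bm25 import BM25Okapi

    corpus = [
        "contextual retrieval improves RAG by prepending chunk context",
        "BM25 is a classic lexical ranking function",
        "reciprocal rank fusion combines multiple result lists",
    ]
    bm25 = BM25Okapi([doc.lower().split() for doc in corpus])

    # crude, domain-specific query tuning: expand with terms you care about
    EXPANSIONS = {"rag": ["retrieval", "augmented", "generation"]}

    def tuned_search(query: str, n: int = 2) -> list[str]:
        tokens = query.lower().split()
        for t in list(tokens):
            tokens += EXPANSIONS.get(t, [])
        return bm25.get_top_n(tokens, corpus, n=n)

    print(tuned_search("rag context"))

No embeddings, no GPU, and the tuning knobs are all things traditional search people already know how to turn.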



I have experience in traditional search as well, and I think it's limiting my imagination when it comes to vector search. In the post, I did like the introduction of Contextual BM25 compared to other hybrid approaches that just do RRF.
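For anyone unfamiliar, RRF here is reciprocal rank fusion: each result list contributes 1/(k + rank) per document and you sort by the summed score. A minimal sketch:

    def rrf(result_lists: list[list[str]], k: int = 60) -> list[str]:
        scores: dict[str, float] = {}
        for results in result_lists:
            for rank, doc_id in enumerate(results):
                scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
        return sorted(scores, key=scores.get, reverse=True)

    # e.g. fusing a BM25 result list with a vector-search result list
    fused = rrf([["d3", "d1", "d7"], ["d1", "d2", "d3"]])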

For question answering, vector/semantic search is clearly a better fit in my mind, and I can see how the contextual models can enable and bolster that. However, because I’ve implemented and used so many keyword-based systems, that just doesn’t seem to be how my brain works.

An example I’m thinking of is finding a sushi restaurant near me with availability this weekend around dinner time. I’d love to be able to search for this exactly as I’ve written it. In practice, I would search for "sushi restaurant", sort by distance, and hope the application does a proper job of surfacing time filtering.

Conversely, this is mostly how I would build this system. Perhaps with a layer to determine user intention to pull out restaurant type, location sorting, and time filtering.

I could see using semantic search for filtering the restaurants down to those related to sushi, but do we then drop back into traditional search for filtering and sorting? Utilize function calling to have the LLM parameterize our search query?
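Something like the following is what I have in mind for that last option (a sketch only; ask_llm_for_json() and all field names are made up for illustration):

    import json

    def ask_llm_for_json(query: str) -> str:
        # placeholder for a real function-calling / structured-output LLM request
        return json.dumps({"cuisine": "sushi", "max_distance_km": 5,
                           "day": "saturday", "time": "19:00"})

    def search_restaurants(query: str, restaurants: list[dict]) -> list[dict]:
        params = json.loads(ask_llm_for_json(query))
        hits = [r for r in restaurants
                if params["cuisine"] in r["cuisine"].lower()
                and r["distance_km"] <= params["max_distance_km"]
                and params["day"] in r["availability"]]
        return sorted(hits, key=lambda r: r["distance_km"])

    # e.g. search_restaurants("sushi near me with a table this Saturday evening", db)

The LLM only extracts parameters; the actual filtering and sorting stays in the traditional search layer.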

As stated, perhaps I’m not thinking about these the right way because of my experience with existing systems, which seem to give me better results when well built.


Another approach I saw is to build a conceptual graph using entity extraction and have the LLM suggest search paths through that graph to enhance the retrieval step. The LLM is fine-tuned on the conceptual graph for this specific task. It could work in your case, but you need an ontology that suits your use case; in other words, it must already contain restaurant location, type of dishes served, and opening hours.


GraphRAG requires you to define the schema of entity and relation types up front. This works when you are in a known domain, but in general, when you just want to answer questions from a large reference, you don't know what you need to put in the graph.


Graph RAG is very cool and outstanding at filling some niches. IIRC, Perplexity's actual search is just BM25 (based on a Lex Fridman interview with the founder).


Makes sense; Perplexity is usually really responsive and fast.

I need to check out that interview with Lex Fridman.


That is a funny way of explaining that they scrape Google.


Do you have the link and the time in the video where he mentions it?



This was my exact question. Why do an LLM rewrite when you can add a context vector to a chunk vector and, for plaintext indexing, add a context string (e.g., TF-IDF)?

The article claimed other context augmentation fails, and that you are better off paying Anthropic to run an LLM on all your data, but it seems quite handwavy. What vector+text search nuance does a full-document-cache LLM rewrite catch that cheapo methods miss? Reminds me of "It is difficult to get a man to understand something when his salary depends on his not understanding it". (We process enough data that we try to limit LLMs to the retrieval step, and only embeddings & light LLMs to the indexing step, so it's a $$$ distinction for our customers.)

The context caching is neat in general, so I have to wonder if this use case is more about paying for ease than quality, and its value for quality is elsewhere.
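For concreteness, the kind of cheap context augmentation I have in mind looks roughly like this (embed() is a placeholder, and the alpha blend is a made-up knob, not something from the article):

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer

    def embed(text: str) -> np.ndarray:
        # placeholder for a real embedding model call
        rng = np.random.default_rng(abs(hash(text)) % (2**32))
        return rng.standard_normal(8)

    def contextualize(title: str, doc: str, chunks: list[str], alpha: float = 0.3):
        tfidf = TfidfVectorizer(stop_words="english", max_features=5).fit([doc])
        context_str = f"{title}: {' '.join(tfidf.get_feature_names_out())}"
        doc_vec = embed(doc)
        out = []
        for chunk in chunks:
            blended = (1 - alpha) * embed(chunk) + alpha * doc_vec
            out.append({
                "text_for_bm25": f"{context_str}\n{chunk}",   # context string for keyword index
                "vector": blended / np.linalg.norm(blended),  # context-aware chunk vector
            })
        return out

Whether that closes the quality gap with a per-chunk LLM rewrite is exactly the question the article doesn't really answer.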



