1. The model was small enough to embed the corpus fast-ish on the limited resources I have. It also supports MRL (Matryoshka Representation Learning) and binary embeddings, which would be helpful in case I need to downsize the VM.
2. Close to 500ms. See [^1].
3. This [^2] was the reason I went with Milvus. I also assumed that more stars would mean a bigger community, and hence faster bug discovery and fixes, as well as better feature support.
4. Yes, I automated the weekly pull here [^3]. Since my resources are limited, I used Hugging Face Spaces to do the automation for me :)
The Space keeps going to sleep, though, so to avoid that I am planning to keep calling it via the API/gradio_client. Let's see how that goes.
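To make the downsizing point in item 1 concrete, here is a minimal sketch of what MRL truncation and binary quantization do to a vector. It is plain Python for illustration only, not the code behind the search engine. The helper names are hypothetical, the sign-based binarization is one common scheme (not necessarily what any given model or Milvus uses), and truncation only works well for models actually trained with MRL.

```python
import math

def mrl_truncate(vec, dim):
    """Keep the first `dim` components and re-normalize to unit length.
    Assumes an MRL-trained model, where the leading dimensions carry most of the signal."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head

def binarize(vec):
    """Sign-based binary quantization: one bit per dimension, packed into a Python int."""
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits

def hamming(a, b):
    """Number of differing bits between two packed vectors; lower means more similar."""
    return bin(a ^ b).count("1")

def bytes_per_vector(dim, binary=False):
    """Storage cost per vector: float32 uses 4 bytes per dimension, binary uses 1 bit."""
    return dim // 8 if binary else dim * 4

# Hypothetical 8-dimensional vectors, just to show the mechanics.
doc = [0.5, -0.2, 0.1, 0.7, -0.3, 0.2, 0.0, -0.1]
query = [0.4, -0.1, 0.2, 0.6, -0.2, 0.1, 0.1, -0.2]
distance = hamming(binarize(doc), binarize(query))  # the two sign patterns differ in one bit
```

The memory math is the reason this matters on a small VM: with a hypothetical 1024-dimensional model, a float32 vector takes 4096 bytes, truncating to 256 dimensions brings that to 1024 bytes, and binarizing the full 1024 dimensions brings it to 128 bytes.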
| which is more recent, more people might want
Absolutely agree. I am planning to add a 'Recency' sorting option for that. It should balance similarity against the publication date.
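One way a 'Recency' option could balance the two is a weighted blend of the similarity score and an exponential time decay. This is only a sketch of that idea, not a design that has been built: the half-life, the weight, and the assumption that similarity is a cosine score in roughly [0, 1] are all hypothetical knobs to tune.

```python
from datetime import date

def recency_score(similarity, published, today, half_life_days=180.0, weight=0.3):
    """Blend similarity with an exponential time decay.
    half_life_days and weight are hypothetical tuning parameters."""
    age_days = max((today - published).days, 0)
    decay = 0.5 ** (age_days / half_life_days)  # 1.0 for today, 0.5 after one half-life
    return (1 - weight) * similarity + weight * decay

def rerank(results, today, **kwargs):
    """Re-sort search hits given as (paper_id, similarity, published_date) tuples."""
    return sorted(
        results,
        key=lambda r: recency_score(r[1], r[2], today, **kwargs),
        reverse=True,
    )

# Hypothetical hits: a slightly better match from 2020 vs. a recent paper.
today = date(2025, 1, 1)
hits = [("old", 0.82, date(2020, 1, 1)), ("new", 0.78, date(2024, 12, 1))]
ranked = [paper_id for paper_id, _, _ in rerank(hits, today)]  # recent paper moves to the top
```

Setting `weight=0` falls back to pure similarity sorting, so the same function could back both the default and the 'Recency' options.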
| also you might want more result density - so perhaps a UI option to collapse the abstracts and display more in the first glance.
Oh, I will surely look into it. Thank you so much for the detailed response. :D
[^1]: https://news.ycombinator.com/item?id=42507116#42509636
[^2]: https://benchmark.vectorview.ai/vectordbs.html
[^3]: https://huggingface.co/spaces/bluuebunny/update_arxiv_embedd...