Yeah, this is the great part about working with search, you really do get to sort of go to the gym when it comes to software engineering breadth. From hardware, to algorithms, to networking, to architecture, to UX, there is really an interesting problem everywhere you turn to look. Even just writing a file to disk is a challenge when the file is several dozen gigabytes and needs to be written byte-by-byte in a largely random order.
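To make the file-writing point concrete, here is a minimal sketch of that random-order write pattern using Python's `mmap`. The file name, size, and offsets are invented for illustration (a tiny 1 MiB stand-in for a multi-gigabyte index), and a real engine would batch and order these writes far more carefully:

```python
import mmap
import os
import random

# A 1 MiB stand-in for a multi-gigabyte index file. At real sizes,
# seek-heavy byte-by-byte writes without mmap or batching get painful.
SIZE = 1 << 20

with open("index.bin", "wb") as f:
    f.truncate(SIZE)  # sparse preallocation

with open("index.bin", "r+b") as f:
    with mmap.mmap(f.fileno(), SIZE) as mm:
        offsets = random.sample(range(SIZE), 10_000)
        for off in offsets:
            mm[off] = off & 0xFF  # byte-by-byte, in random order
        mm.flush()  # let the OS write dirty pages back in bulk
        written_ok = all(mm[off] == (off & 0xFF) for off in offsets)

os.remove("index.bin")
```

The mmap lets the kernel absorb the random access pattern and flush dirty pages in larger sequential runs, which is the usual trick when you can't control write order.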
This does go some way toward explaining why Google interviews looked the way they've looked. It's just a shame everywhere else has copied their homework, without actually needing the same skills.
Yes, yes, yes :D There are so many topics in this space that are so interesting it's like a dream. I would add to your list:
- sentiment analysis
- roaring bitmaps
- compression
- applied linear algebra
- ai
In a Venn diagram intersecting all of these topics sits search. Coding a search engine from scratch is a beautiful way to spend one's days, if you're into programming.
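On the roaring bitmaps item: the payoff is that set operations over document IDs become bitwise ops. Real roaring bitmaps split the 32-bit ID space into 16-bit chunks and pick array/bitmap/run containers per chunk; the sketch below cheats and uses a plain Python int as an uncompressed bitset (doc IDs are invented for the example):

```python
# Bitset posting lists: AND/OR queries become single bitwise operations.
# (Simplified; real roaring bitmaps add per-chunk container compression.)

def to_bitset(doc_ids):
    bits = 0
    for d in doc_ids:
        bits |= 1 << d
    return bits

def from_bitset(bits):
    out, i = [], 0
    while bits:
        if bits & 1:
            out.append(i)
        bits >>= 1
        i += 1
    return out

# Posting lists for two hypothetical terms.
search = to_bitset([1, 4, 7, 9, 12])
engine = to_bitset([0, 4, 9, 12, 30])

both = search & engine    # AND query: docs containing both terms
either = search | engine  # OR query

print(from_bitset(both))  # [4, 9, 12]
```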
True story: I once had a discussion with a developer about search in general (for your website, for the internet) and the difficulties involved. Precision vs. recall, relevancy vs. popularity, ranking, etc.
He was dumbfounded that I would want to spend two weeks 'tuning Solr queries' for a project. He asked (nay, stated)
- Tries (patricia, radix, etc...)
- Trees (b-trees, b+trees, merkle trees, log-structured merge-tree, etc..)
- Consensus (raft, paxos, etc..)
- Block storage (disk block size optimizations, mmap files, delta storage, etc..)
- Probabilistic filters (hyperloglog, bloom filters, etc...)
- Binary Search (sstables, sorted inverted indexes)
- Ranking (pagerank, tf/idf, bm25, etc...)
- NLP (stemming, POS tagging, subject identification, etc...)
- HTML (document parsing/lexing)
- Images (exif extraction, removal, resizing / proxying, etc...)
- Queues (SQS, NATS, Apollo, etc...)
- Clustering (k-means, density, hierarchical, gaussian distributions, etc...)
- Rate limiting (leaky bucket, windowed, etc...)
- text processing (unicode-normalization, slugify, sanitation, lossless and lossy hashing like metaphone and document fingerprinting)
- etc...
I'm sure there is plenty more I've missed. There are lots of generic structures involved like hashes, linked-lists, skip-lists, heaps and priority queues and this is just to get 2000's level basic tech.
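Picking one item from the list, ranking: a tiny BM25 scorer over an in-memory inverted index. The corpus is made up, k1=1.5 and b=0.75 are the usual defaults, and this is a sketch of the formula, not a production implementation:

```python
import math
from collections import Counter

# Toy corpus; a real engine scores against an on-disk inverted index.
docs = [
    "the quick brown fox",
    "the lazy brown dog",
    "search engines rank documents",
    "ranking with bm25 beats plain tf idf for search",
]
tokenized = [d.split() for d in docs]
N = len(tokenized)
avgdl = sum(len(t) for t in tokenized) / N
tf = [Counter(t) for t in tokenized]                      # term freq per doc
df = Counter(term for t in tokenized for term in set(t))  # doc freq per term

def bm25(query, k1=1.5, b=0.75):
    scores = []
    for i, terms in enumerate(tokenized):
        s = 0.0
        for q in query.split():
            if df[q] == 0:
                continue
            idf = math.log(1 + (N - df[q] + 0.5) / (df[q] + 0.5))
            f = tf[i][q]
            s += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(terms) / avgdl))
        scores.append((s, i))
    return sorted(scores, reverse=True)

best_score, best_doc = bm25("bm25 search")[0]
```

The length normalization term (the `b * len(terms) / avgdl` part) is what keeps long documents from winning on raw term frequency alone, which is the main practical improvement over plain tf/idf.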
- https://github.com/quickwit-oss/tantivy
- https://github.com/valeriansaliou/sonic
- https://github.com/mosuka/phalanx
- https://github.com/meilisearch/MeiliSearch
- https://github.com/blevesearch/bleve
- https://github.com/thomasjungblut/go-sstables
A lot of people new to this space mistakenly think you can just throw Elasticsearch or Postgres full-text search in front of terabytes of records and have something decent. That might work for something small like a curated collection of a few hundred sites.
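The structure those engines build under the hood is an inverted index, and a toy version (hypothetical corpus, no compression, no sharding) hints at why terabytes of it get hard:

```python
from collections import defaultdict

# Minimal inverted index: term -> list of doc IDs (a posting list).
# At terabyte scale these lists must be delta/varint compressed, sharded,
# and merged from immutable on-disk segments; none of that is here.
index = defaultdict(list)
docs = {
    0: "postgres full text search",
    1: "elasticsearch scales search with shards",
    2: "a curated collection of a few hundred sites",
}
for doc_id, text in docs.items():
    for term in set(text.split()):
        index[term].append(doc_id)

def query_and(*terms):
    # Intersect posting lists; real engines do this over compressed blocks.
    postings = [set(index[t]) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []

print(query_and("search"))  # [0, 1]
```

Every posting list here lives comfortably in RAM; the whole discipline above (compression, block storage, probabilistic filters, consensus for the shards) exists because at scale they don't.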