A search index is usually made up of smaller, independent pieces, often called segments. So you can download and process the data progressively on a local machine, upload the resulting segments to object storage, and then run queries against them. That's what we did for this project: https://quickwit.io/blog/commoncrawl
There's also an interesting blog post on this here: https://fulmicoton.com/posts/commoncrawl/
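
To make the segment idea concrete, here is a toy sketch in Python. It is not how Quickwit or any real engine stores its data: `build_segment`, `upload_segment`, and `search` are hypothetical names made up for illustration, each "segment" is just a small inverted index serialized as JSON, and a plain dict stands in for the object storage bucket. The point is only that each batch becomes a self-contained segment, and a query can visit the segments one by one and merge the hits.

```python
import json
from collections import defaultdict


def build_segment(docs):
    """Build one self-contained segment (a tiny inverted index) from a batch.

    `docs` maps doc id -> text. Hypothetical helper, for illustration only.
    """
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    # Sort the ids so the serialized segment is deterministic.
    return {term: sorted(ids) for term, ids in postings.items()}


def upload_segment(store, name, segment):
    """Serialize a segment into `store`, a dict standing in for object storage."""
    store[name] = json.dumps(segment).encode("utf-8")


def search(store, term):
    """Query every segment independently and merge the matching doc ids."""
    hits = set()
    for blob in store.values():
        segment = json.loads(blob.decode("utf-8"))
        hits.update(segment.get(term.lower(), []))
    return sorted(hits)


# Process the data progressively: one batch in, one segment out.
batches = [
    {"a1": "common crawl web archive", "a2": "search index segments"},
    {"b1": "object storage holds segments", "b2": "web search engine"},
]
store = {}
for i, batch in enumerate(batches):
    upload_segment(store, f"segment-{i}", build_segment(batch))

print(search(store, "segments"))  # ['a2', 'b1']
print(search(store, "web"))       # ['a1', 'b2']
```

Since no segment depends on another, the batches could be indexed on different machines or at different times, and new segments can be added later without touching the existing ones.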