
BQ is considerably more performant.


This is uninformed and unsubstantiated.

The answer depends massively on the intended use case.


What are the use cases where one would shine and the other wouldn't? I'm interested in real life usage of any of those on >100 GB datasets.


With BigQuery the cost is per query. If you do a "SELECT * FROM AAA" you'll pay for the full size of the table AAA, the price being $5 per TB scanned.

If you do a "SELECT field1, field2 FROM AAA" you'll pay only for the total size of the field1 and field2 columns (BigQuery is columnar, so only the columns you reference get scanned).

So you usually want to use BigQuery in situations where you don't need to query data all the time, but rather a fixed number of times a day.
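A minimal sketch of that pricing model, using the $5/TB figure above (the table and column sizes are made up for illustration; BigQuery actually meters scanned bytes in binary terabytes, i.e. 2^40 bytes):

```python
# Toy model of BigQuery's on-demand pricing: you pay for the bytes in the
# columns a query reads, not for compute time or instance count.
# Assumes the $5 per TB figure from the comment, with 1 TB = 2**40 bytes.

USD_PER_TB = 5.0
TB = 2**40  # bytes

def on_demand_cost_usd(bytes_scanned: float) -> float:
    """Cost of a query that scans `bytes_scanned` bytes."""
    return bytes_scanned / TB * USD_PER_TB

# Hypothetical table AAA: 2 TB total, of which field1 + field2 are 300 GB.
full_table_bytes = 2 * TB         # what "SELECT * FROM AAA" scans
two_columns_bytes = 300 * 2**30   # what "SELECT field1, field2 FROM AAA" scans

print(on_demand_cost_usd(full_table_bytes))   # 10.0
print(on_demand_cost_usd(two_columns_bytes))  # ~1.46
```

Same table, roughly a 7x cost difference, purely from naming the columns you need instead of `*`.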

Now about performance: BigQuery queues queries, so you don't have a guaranteed start time. It can take a couple of seconds before your query starts running, sometimes longer. If you need something that responds in < xxx ms, BigQuery is not it.

But the queries themselves are fast. If you need to query across petabytes of data, a simple BigQuery query will gladly run on however many dozens or hundreds of instances it needs, at no additional cost to you (since you only pay by the size of the data scanned).

It's really a great example of serverless. You can run your query across 100 instances, but you only use those instances for a few seconds.


  > SELECT * FROM AAA
Can't think of a realistic use case where such a query would be appropriate, even from just the performance standpoint. In fact, for a very long time the internal counterpart of BigQuery didn't even support "SELECT star", and nobody complained too badly. If you'd like to give Google a gift, however, sure "SELECT star" all you want. :-)

  > "SELECT field1, field2 FROM AAA" you'll pay only for the total size of field1 and field2 rows.
Moreover, if you also use a WHERE clause, you'll pay even less.


This can happen in data cleaning/loading, where you load raw data into a table and transform it into one that is ready for analysis. I have loaded data through staging tables regularly. There may be multiple stages.

Another example is materialized view creation. It's common for these to scan large quantities of data to compute aggregates.


That's not the recommended way of loading data into BigQuery, though.

https://cloud.google.com/bigquery/docs/loading-data


That is not true.

A where clause still searches the entire column, unless it is conditioned on the partition column.


Not if you filter on clustered columns, like you should if you care about performance.


True, but now we have moved pretty far from just saying "use a where clause".

"Partition your table and create clustered columns and filter on those"
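A toy model of why that distinction matters, with a hypothetical `events` table partitioned by `day` (the column names and sizes are invented): a WHERE on an ordinary column still bills every partition's bytes for the columns the query touches, while a filter on the partition column prunes whole partitions before anything is billed.

```python
# Toy model of partition pruning in a columnar, pay-per-scan engine.
# Hypothetical schema: an `events` table partitioned by `day`, with
# per-partition column sizes tracked in bytes.

GB = 2**30

# 30 daily partitions; each holds 1 GB of `user_id` and 10 GB of `payload`.
partitions = {f"2024-01-{d:02d}": {"user_id": 1 * GB, "payload": 10 * GB}
              for d in range(1, 31)}

def bytes_billed(columns, day_filter=None):
    """Bytes scanned: only the referenced columns, only surviving partitions."""
    days = [day_filter] if day_filter else list(partitions)  # pruning step
    return sum(partitions[d][c] for d in days for c in columns)

# WHERE user_id = 42 (not the partition column): all 30 partitions scanned.
print(bytes_billed(["user_id"]) // GB)                 # 30
# WHERE day = '2024-01-05': pruned to a single partition.
print(bytes_billed(["user_id"], "2024-01-05") // GB)   # 1
```

Clustering is the further step not modeled here: within each partition the data is sorted on the clustered columns, so filters on those columns can skip blocks even inside a surviving partition.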



