villashoe.blogg.se

Redshift Spectrum vs Athena

For the purpose of this test, it was easier to load a large dataset from S3 / Google Cloud Storage, but in most cases data will be streamed directly from an application. The above information may or may not be relevant in your use case.

To calculate load time, I first sent the files to both Amazon S3 and Google Cloud Storage, then loaded them into each datastore. The times below reflect the load time from S3 / Google Cloud Storage into the datastore, not the time it took to transfer the files from a server to the two cloud storage services. During my last round of tests with Redshift, I split my CSV into 50 2 GB files instead of using one 1 TB file as shown above.
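Splitting one large CSV into many smaller files, as in the last Redshift round, can be sketched in Python. This is a minimal sketch under stated assumptions, not the script used for this benchmark. The `split_csv` function, its `part_NNN.csv` naming scheme and the round-robin row assignment are hypothetical choices. It also assumes every physical line is one record (no quoted fields with embedded newlines), which holds for an all-numeric dataset like this one.

```python
import os
from typing import List


def split_csv(src_path: str, out_dir: str, num_parts: int,
              has_header: bool = False, prefix: str = "part") -> List[str]:
    """Split a CSV file into `num_parts` smaller files, round-robin by line.

    Hypothetical helper. Assumes each physical line is one record, so no
    field may contain an embedded newline (true for all-numeric columns).
    """
    if num_parts < 1:
        raise ValueError("num_parts must be >= 1")
    os.makedirs(out_dir, exist_ok=True)
    # Zero-padded names keep the parts sorted and share a common prefix.
    paths = [os.path.join(out_dir, f"{prefix}_{i:03d}.csv") for i in range(num_parts)]
    # newline="" disables newline translation, so line endings are copied as-is.
    outs = [open(p, "w", encoding="utf-8", newline="") for p in paths]
    try:
        with open(src_path, "r", encoding="utf-8", newline="") as src:
            if has_header:
                # Repeat the header in every part so each file is a valid CSV.
                header = src.readline()
                for out in outs:
                    out.write(header)
            # Line i goes to part i mod num_parts, giving near-equal row counts.
            for i, line in enumerate(src):
                outs[i % num_parts].write(line)
    finally:
        for out in outs:
            out.close()
    return paths
```

The part files can then be uploaded under a common S3 key prefix: Redshift's `COPY` loads every object whose key begins with the given prefix and can read multiple files in parallel, and AWS suggests making the file count a multiple of the number of slices in the cluster.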

This article is a basic comparison of data loading and simple queries between Google BigQuery, Amazon Redshift, and its cousin Athena. For this test, we will be loading a CSV/Parquet file that is basically an enlarged version of the STAR Experiment star2002 dataset.

  • All columns are either integers, double precision or floats.

6/22/17 update

Almost 3,000 people read the article and I have received a lot of feedback. This is the first update of the article, and I will try to update it further later. The benchmark below has been updated with these new tests:

  • Following Tino Tereshko's advice (he is Big Data Lead at Google Cloud Office of CTO), I added the metrics for BigQuery standard SQL and recalculated the data loading time (from Google Cloud Storage to BigQuery) following their recent optimizations.
  • With Joe Harris' help (he is a Redshift Database Engineer at AWS), I measured the performance of an optimized schema on both dc1.large and ds2.xlarge Redshift instances.
  • I converted the CSV format to Parquet and re-tested Athena, which gave much better results, as expected (thanks to Rahul Pathak, Alex Casalboni, openasocket, Robert Synnott, the Amazon Redshift team with Joe Harris, Jenny Chen and Maor Kleider, and the Amazon Athena/EMR team with Abhishek Sinha).
  • Using Redshift admin tables, I was able to add the data scanned per query for Redshift (thanks to rockostrich).
  • I added a links section with useful articles.

Finally, a few people reached out asking for the dataset so they could load it and benchmark the performance on other databases. I will definitely share a link to their article if they publish one!