JSON vs Structs vs Columns in BigQuery - Observations

Shout out to Josh Peng and Claus Herther who inspired me to write this post! A common way of logging and delivering data from production systems is via the JSON format. JSON allows for a flexible schema that supports nested value pairs and arrays. In BigQuery we have a few options to consider when choosing how to store this data for use in BigQuery. JSON string column with BigQuery JSON functions Pros:  Easiest to use directly from the source system Flexible schema as source data changes Cons: We lose BigQuery’s columnar data storage benefits. Every query will pay the cost of hitting the entire JSON object. JSON functions are somewhat limited Transform the JSON into a Structs Pros: Depending on your use case this may better fit your data needs, especially if you have nested arrays in your JSON Efficient use of columnar data storage Cons: Lack of schema flexibility May be new territory for SQL users who are not used to using nested fields Transform the JSON into a standard table with

Google BigQuery and Colab - Getting Started

Getting Started This is a brief introduction to BigQuery and Colab (Free Hosted Jupyter Notebooks) for anyone who is new to data or the Google data ecosystem. The universe of data technologies available today is vast. From open source technologies like Apache Spark , Hive , Beam , and Flink , to partially managed services like Amazon’s EMR , Athena , Kinesis , and Redshift , to fully managed services like Snowflake , Google BigQuery , and Google Dataflow . There are pros and cons to every option. We will initially focus on fully managed services like BigQuery as they are the easiest and perhaps best place to start for most. Additionally, the advances in functionality made by managed services like Snowflake and BigQuery make them a more viable choice for the majority of data needs today. Additionally, BigQuery offers 1TB of free processing and 10GB of free storage per month , so what do we have to lose starting there?