JSON vs Structs vs Columns in BigQuery - Observations

Shout out to Josh Peng and Claus Herther who inspired me to write this post! A common way of logging and delivering data from production systems is via the JSON format. JSON allows for a flexible schema that supports nested value pairs and arrays. In BigQuery we have a few options to consider when choosing how to store this data for use in BigQuery. JSON string column with BigQuery JSON functions Pros:  Easiest to use directly from the source system Flexible schema as source data changes Cons: We lose BigQuery’s columnar data storage benefits. Every query will pay the cost of hitting the entire JSON object. JSON functions are somewhat limited Transform the JSON into a Structs Pros: Depending on your use case this may better fit your data needs, especially if you have nested arrays in your JSON Efficient use of columnar data storage Cons: Lack of schema flexibility May be new territory for SQL users who are not used to using nested fields Transform the JSON into a standard table with

Google BigQuery and Colab - Getting Started

Getting Started

This is a brief introduction to BigQuery and Colab (Free Hosted Jupyter Notebooks) for anyone who is new to data or the Google data ecosystem. The universe of data technologies available today is vast. From open source technologies like Apache Spark, Hive, Beam, and Flink, to partially managed services like Amazon’s EMR, Athena, Kinesis, and Redshift, to fully managed services like Snowflake, Google BigQuery, and Google Dataflow. There are pros and cons to every option. We will initially focus on fully managed services like BigQuery as they are the easiest and perhaps best place to start for most. Additionally, the advances in functionality made by managed services like Snowflake and BigQuery make them a more viable choice for the majority of data needs today. Additionally, BigQuery offers 1TB of free processing and 10GB of free storage per month, so what do we have to lose starting there?


Google BigQuery

BigQuery is a fully-managed, serverless data warehouse that enables scalable, cost-effective and fast analysis over petabytes of data. It is a serverless Software as a Service that supports querying using ANSI SQL. It also has built-in machine learning capabilities.

Google Colaboratory (Colab)

Colaboratory, or "Colab" for short, allows you to write and execute Python in your browser.

  • Zero configuration required

  • Free access to GPUs

  • Easy Google Drive style sharing

  • Access to BigQuery, machine learning, and other Python based data science tools

  • Introduction to Colab


Quick Start Guide for Colab and BigQuery

Follow along in the Colab Getting started with BigQuery Notebook

  1. Use the Cloud Resource Manager to Create a Cloud Platform project if you do not already have one.

  2. Enable billing for the project

    1. If we keep our usage under 1TB and storage under 10GB everything will be free!

  3. Enable BigQuery APIs for the project.

*** WARNING *** 

Be careful with the queries you run in BigQuery. If you run a query over a large table or public table you will have to pay for $5 for each TB processed over 1TB. You can always check the size of your query before running by putting it directly into the BigQuery Web Console.

*** END WARNING *** 


The example queries in the getting started notebook are quite small, so no need to worry about going over the free 1TB limit.


Go through the getting started notebook and try out the various options for running queries on BigQuery and working with results in Python / Pandas dataframes. This will enable you to make use of data in many different ways moving forward. Future posts will assume you have familiarity with BigQuery and Colab / Jupyter Notebooks.


If you would really like to ramp up your BigQuery skills I highly recommend this book:


Google BigQuery: The Definitive Guide: Data Warehousing, Analytics, and Machine Learning at Scale

Google BigQuery: The Definitive Guide: Data Warehousing, Analytics, and Machine Learning at Scale


Comments


  1. On the off Low Rate Call Girls in Udaipur probability that you want glamorous young girls to hit each of the spots and make every single second you spend together number,Nepali Call Girls in Udaipur then you are going to love being in the personal organization of the Affordable Busty Escorts Service in Gurgaon. Aditi the Most Popular Russian Call Girls in Faridabad Call Girls Agency, this awesome-looking woman has all the features that look like to the versions,Russian Call Girls in Faridabad and the moment a guy approaches her VIP Call Girls in Jaipur the one meeting is enough to drag him back to this queen of beauty.

    ReplyDelete

Post a Comment

Popular posts from this blog

JSON vs Structs vs Columns in BigQuery - Observations

Origins

Translate