Cloud SQL to BigQuery Dataflow Pipeline in GCP


Moving data between Cloud SQL and BigQuery is fairly straightforward with federated queries. However, federated queries are not available for Cloud SQL instances created with a private IP address, which might be the only option in many organisations due to security constraints. As an alternative, a Dataflow pipeline can be built to do the job. There is even a ready-made template (JDBC to BigQuery), which in an ideal world would have made this approach easy as well. However, some of the details are not obvious. At least, they weren’t for me: I spent a few days building a working pipeline and in the end had to ask a GCP expert for help. In this blog article I try to address these issues to make life easier for other people facing the same challenge. My example uses Postgres Cloud SQL, although I would expect the MySQL case to be very similar, if not identical.

