BigQuery to ClickHouse migration guide setup
There are two recommended ways to migrate your project from BigQuery to ClickHouse:
- Using ClickPipes, a managed CDC solution available in ClickHouse Cloud
- By exporting your BigQuery tables to a GCS bucket and importing the data in ClickHouse
Both of the migration guides in this section covering the two approaches above use the Stack Overflow dataset, as an example dataset to show a typical migration from BigQuery to ClickHouse Cloud.
The dataset contains every post, vote, user, comment, and badge that has occurred on Stack Overflow from 2008 to Apr 2024.
The BigQuery schema for this data is shown below:
Pre-requisites
Before proceeding, make sure you have the following:
- A Google Cloud account
Create a new project
- Go to the Google Cloud Console.
- Click on the project picker at the top of the page next to the Google Cloud logo.
- Click on New project.
- Fill in the project name, for example "BigQueryClickHouse."
- Optionally, select a parent resource, or leave it as No organisation.
- Click on Create.
Create table DDL and insert data
- From the console landing page, select the project you just created.
- You should see "You are working in" along with the name of the project you just created.
- Make note of the Project-ID. You will need this in a later step.
- Select the Run a query in BigQuery button to open the studio.
- Now select the SQL query button.
- Enter the following DDL commands in the query editor, replacing project-ID below with the project ID that you took note of in step 3 above:
Data for these tables is made available in Parquet format in a GCS bucket for convenience.
- Click the Run button to execute the statements. It will take about six minutes to complete.
When the queries have finished executing you should see a BigQuery resource in the tab to the left of the query editor. If you click on the resource name to expand it, you should now see the stackoverflow schema and 7 tables.
You have successfully created the tables that we'll be using in these migration guides.
Create a GCS bucket for staging
This step is required regardless of which migration method you will be using.
The initial load process requires a user-provided Google Cloud Storage (GCS) bucket for staging. Refer to the Google Cloud documentation on how to do so.
You will need the bucket's gsutil URI if you intend to use ClickPipes (e.g. gs://bigquery-clickhouse), or the bucket's Cloud Console URL (e.g. https://console.cloud.google.com/storage/browser/bigquery-clickhouse) if you intend to import the data directly.
You can find these by clicking the name of your bucket, and then selecting the Configuration tab. They're listed in a table under the header Overview.
Create a service account file and key
ClickPipes authenticates to your Google Cloud project using a service account key. We recommend creating a dedicated service account with the minimum required set of permissions to allow ClickPipes to export data from BigQuery, load it into the staging GCS bucket, and read it into ClickHouse.
To create a service account:
- Select IAM and admin from the navigation menu in the Google Cloud console
- Select Service accounts
- Click Create service account
Service account permissions
The following service account permissions are required:
BigQuery
The service account must have the following BigQuery roles:
To further scope access, we recommend using IAM conditions to restrict the resources the role has access to. For example, you can restrict the dataViewer role to the specific dataset containing the tables you want to sync:
Cloud Storage
The service account must have the following Cloud Storage roles:
To further scope access, we recommend using IAM conditions to restrict the resources the role has access to. For example, you can restrict the objectAdmin and bucketViewer roles to the dedicated bucket created for ClickPipes syncs.
Create access keys
Depending on the migration approach you plan to use, you will need to create one of the following credentials:
| Migration approach | Credential type | Documentation | Notes |
|---|---|---|---|
| ClickPipes for CDC | Service key | Create a service key | Make sure to select JSON rather than P12 as the Key type. |
| GCS bulk-load | HMAC key | Create an HMAC key | - |
Next steps
After completing the steps in this setup guide, you can now proceed with one of the following migration guides depending on which approach you would like to take: