Google BigQuery
JVM AGENT
Overview
The Google BigQuery connector enables Seeq to access data from Google BigQuery via the BigQuery API.
Prerequisites
There are no prerequisites for connecting to Google BigQuery.
Configuration
This is an example configuration template that is displayed in the Additional Configuration box that appears when you click Configure for an existing datasource (or if a new datasource is being created, in the Create new datasource connection modal that appears after clicking Add Datasource) on the Datasources administration page.
{
"AccessKeyFile" : "C:/Temp/bigqueryconnector-291902-b025642e7bb7.json",
"SignalTables" : [ {
"Id" : "cf9ca9cb-a131-45e8-9297-c38187d4df5a",
"Name" : null,
"NameQuery" : "SELECT table_name FROM `%projectid%.%dataset%.INFORMATION_SCHEMA.TABLES` WHERE table_name like 'tlc_y%_trips%' OR table_name like 'tlc_g%_trips%'",
"GroupBy" : null,
"DataColumns" : [ "trip_distance" ],
"DataColumnQuery" : null,
"TimeColumn" : "pickup_datetime",
"LastGroupAsSignalName" : false,
"GroupByLimit" : 0,
"ProjectId" : "bigquery-public-data",
"DataSet" : "new_york"
}, {
"Id" : "df9ca9cb-a131-45e8-9297-c38187d4df5b",
"Name" : "covid19_open_data",
"NameQuery" : null,
"GroupBy" : [ "country_name", "subregion1_name", "subregion2_name" ],
"DataColumns" : [ "average_temperature_celsius" ],
"DataColumnQuery" : "SELECT column_name FROM `%projectid%.%dataset%.INFORMATION_SCHEMA.COLUMNS` WHERE table_name = 'covid19_open_data' AND (data_type='INT64' OR data_type='FLOAT64') AND column_name not in ('latitude', 'longitude')",
"TimeColumn" : "date",
"LastGroupAsSignalName" : false,
"GroupByLimit" : 100,
"ProjectId" : "bigquery-public-data",
"DataSet" : "covid19_open_data"
}, {
"Id" : "009ca9cb-a131-45e8-9297-c38187d4df5a",
"Name" : "citibike_trips",
"NameQuery" : null,
"GroupBy" : null,
"DataColumns" : [ "birth_year" ],
"DataColumnQuery" : null,
"TimeColumn" : "starttime",
"LastGroupAsSignalName" : false,
"GroupByLimit" : 0,
"ProjectId" : "bigquery-public-data",
"DataSet" : "new_york_citibike"
}, {
"Id" : "gh5ca9cb-a131-45e8-9297-c38187d4df5a",
"Name" : "data_by_province",
"NameQuery" : null,
"GroupBy" : [ "country", "region_name", "province_name" ],
"DataColumns" : [ "confirmed_cases" ],
"DataColumnQuery" : null,
"TimeColumn" : "date",
"LastGroupAsSignalName" : true,
"GroupByLimit" : 100,
"ProjectId" : "bigquery-public-data",
"DataSet" : "covid19_italy"
} ],
"ConditionTables" : [{
"Id" : "d110e9da-5bd6-47f8-8231-583bba8dc96a",
"Name" : "processes",
"NameQuery" : null,
"GroupBy" : [ "Building", "Area", "Unit", "Operation", "Phase" ],
"GroupByLimit" : 0,
"StartTimeColumn" : "StartTime",
"EndTimeColumn" : "EndTime",
"MaximumDuration" : "1h",
"CapsuleProperties" : [ {
"Name" : "Batch ID",
"Value" : "${columnResult}",
"Column" : "BatchId",
"Uom" : "string"
}, {
"Name" : "Serial Number",
"Value" : "${columnResult}",
"Column" : "UniqueId",
"Uom" : "string"
}, {
"Name" : "Temperature",
"Value" : "${columnResult}",
"Column" : "Temperature",
"Uom" : "DegF"
}, {
"Name" : "Constant Property",
"Value" : "ABCDE",
"Column" : null,
"Uom" : "string"
} ],
"ProjectId" : "bigqueryconnector-testDB",
"DataSet" : "test"
}]
}],
"AccessKeyFile" : "C:\\BigQuery\\bigqueryconnector-12345.json",
"TimeZone" : "UTC"
}
Google BigQuery Additional Configuration
Property Name | Default Value | Data Type | Description |
---|---|---|---|
| null | String | This is the path to the access key file downloaded from Google cloud platform. The access key file contains properties required for BigQuery connectivity such as project_id, private_key, private_key_id, etc. |
| An empty list | A List of SignalTableDefinitions | The collection of table configurations which provide the necessary information to create signals in Seeq. |
| An empty list | A List of ConditionTableDefinitions. | The collection of table configurations which provide the necessary information to create conditions in Seeq. |
| null | String | Date and DateTime data types in BigQuery do not contain timezone information (use the TimeStamp data type instead, which always stores data in UTC). If you have Date and DateTime data types, specify the timezone using this field. Named zones (e.g. If you have a mix of time zones or other time zone complexity that cannot be worked around using this |
SignalTableDefinition Configuration
Property Name | Default Value | Data Type | Description |
---|---|---|---|
|
| String | This is the unique ID for the signal table configuration itself. Note that changing the |
| null | String | This is the name of the BigQuery project. For example, |
| null | String | This is the name of the BigQuery dataset. For example, |
| null | String | This is the name of the BigQuery table. At least |
| null | String | This is a query that defines how to obtain a list of tables with identical schemas. The NameQuery can contain the placeholders At least |
| Empty list | A list of Strings | This is a list of columns that can be used to group signals into an asset hierarchy based on distinct column value combinations. The column name order is important - it is used to generate the asset hierarchy. Example: |
| 0 | Integer | A limit on the number of results (asset paths) that can be returned from the |
| [] | A list of Strings | The list of columns that contain data that should be synced to Seeq as signal values . Each listed column name creates a unique Seeq signal. By default, signal name is equal to data column name. For example: At least |
| null | String | User provided query which allows to obtain the list of columns within the same table which data fields used as Seeq signal’s values. At least |
|
| String | This is the column used as the timestamp source. For example: |
|
| Boolean | Set to true to enable the last element in the |
ConditionTableDefinition Configuration
Property Name | Default Value | Data Type | Description |
---|---|---|---|
|
| String | This is the unique ID for the condition table configuration itself. Note that changing the |
| null | String | This is the name of the BigQuery project. For example, |
| null | String | This is the name of the BigQuery dataset. For example, |
| null | String | This is the name of the BigQuery table. At least |
| null | String | This is a query that defines how to obtain a list of tables with identical schemas. The NameQuery can contain the placeholders At least |
| Empty list | A list of Strings | This is a list of columns that can be used to group conditions into an asset hierarchy based on distinct column value combinations. The column name order is important - it is used to generate the asset hierarchy. Example: |
| 0 | Integer | A limit on the number of results (asset paths) that can be returned from the |
|
| String | This is the column used to specify the source of timestamps for the start of each capsule. |
|
| String | This is the column used to specify the source of timestamps for the end of each capsule. |
|
| String | This is the maximum length of time for capsules in the Condition(s) defined by this table. Longer capsules will be filtered out. This field cannot be null! |
| An empty CapsulePropertyDescription | CapsulePropertyDescription | This field contains additional metadata that is attached to each capsule. See the JSON configuration above for examples. |
CapsulePropertyDefinition Configuration
Property Name | Default Value | Data Type | Description |
---|---|---|---|
| null | String | This is the name of the capsule property. |
| null | String | This is the value of the capsule property. It can be a static value such as |
| null | String | This field provides a column for reference in |
| null | String | This is the unit of measure for the capsule property values. For columns of BigQuery type |
Known Issues
The BigQuery connector supports the following data types as numeric: Int64, Float64 and Bool. All other data types treated as Strings.
Please report any other issues you find to our support portal.
Troubleshooting
Connect timed out
When the BigQuery connector is unable to connect to Google, the logs will contain entries that look like:
CODE
|
There may be a problem with Google’s service, or your network link between the Remote Agent where the connector is running and the BigQuery service (for example, a firewall that doesn't allow the Remote Agent to connect to BigQuery).
To troubleshoot, visit the links listed in your BigQuery configuration using a browser running on the Remote Agent and debug your network link.
BigQueryService is not initialized
If you see this error message then it is most likely a permissions issue, or the service account has been disabled or deleted.
View the logs for this connector for additional information:
ERROR2022-03-11T03:31:19.182Z[Connection: JVM Agent: ip-172-16-2-125: BigQuery: BQ.Samples: e84563ac-e3f3-e616-a5d3-275ce92edd5e] com.seeq.link.connectors.bigquery.data.BigQueryClient - isConnected: Exception error: Access Denied: Project bigquery-sandbox-342919: User does not have bigquery.jobs.create permission in project bigquery-sandbox-342919.
Contact a BigQuery admin and verify that the access key is correct, that the access key has not been disabled, and that the associated service account has Viewer access to the dataset.
If you are running into other issues with connecting to or access data from Google BigQuery, view our guide for troubleshooting datasource issues.
Performance considerations
The Google BigQuery connector does not have any special performance considerations. View our guide on optimizing datasource performance for general guidance.