Skip to main content
Skip table of contents

Google BigQuery

JVM AGENT

Overview

The Google BigQuery connector enables Seeq to access data from Google BigQuery via the BigQuery API.

Prerequisites

There are no prerequisites for connecting to Google BigQuery.

Configuration

This is an example configuration template that is displayed in the Additional Configuration box that appears when you click Configure for an existing datasource (or if a new datasource is being created, in the Create new datasource connection modal that appears after clicking Add Datasource) on the Datasources administration page.

JSON
{
"AccessKeyFile" : "C:/Temp/bigqueryconnector-291902-b025642e7bb7.json",
"SignalTables" : [ {
  "Id" : "cf9ca9cb-a131-45e8-9297-c38187d4df5a",
  "Name" : null,
  "NameQuery" : "SELECT table_name FROM `%projectid%.%dataset%.INFORMATION_SCHEMA.TABLES` WHERE table_name like 'tlc_y%_trips%' OR table_name like 'tlc_g%_trips%'",
  "GroupBy" : null,
  "DataColumns" : [ "trip_distance" ],
  "DataColumnQuery" : null,
  "TimeColumn" : "pickup_datetime",
  "LastGroupAsSignalName" : false,
  "GroupByLimit" : 0,
  "ProjectId" : "bigquery-public-data",
  "DataSet" : "new_york"
}, {
  "Id" : "df9ca9cb-a131-45e8-9297-c38187d4df5b",
  "Name" : "covid19_open_data",
  "NameQuery" : null,
  "GroupBy" : [ "country_name", "subregion1_name", "subregion2_name" ],
  "DataColumns" : [ "average_temperature_celsius" ],
  "DataColumnQuery" : "SELECT column_name FROM `%projectid%.%dataset%.INFORMATION_SCHEMA.COLUMNS` WHERE table_name = 'covid19_open_data' AND (data_type='INT64' OR data_type='FLOAT64') AND column_name not in ('latitude', 'longitude')",
  "TimeColumn" : "date",
  "LastGroupAsSignalName" : false,
  "GroupByLimit" : 100,
  "ProjectId" : "bigquery-public-data",
  "DataSet" : "covid19_open_data"
}, {
  "Id" : "009ca9cb-a131-45e8-9297-c38187d4df5a",
  "Name" : "citibike_trips",
  "NameQuery" : null,
  "GroupBy" : null,
  "DataColumns" : [ "birth_year" ],
  "DataColumnQuery" : null,
  "TimeColumn" : "starttime",
  "LastGroupAsSignalName" : false,
  "GroupByLimit" : 0,
  "ProjectId" : "bigquery-public-data",
  "DataSet" : "new_york_citibike"
}, {
    "Id" : "gh5ca9cb-a131-45e8-9297-c38187d4df5a",
    "Name" : "data_by_province",
    "NameQuery" : null,
    "GroupBy" : [ "country", "region_name", "province_name" ],
    "DataColumns" : [ "confirmed_cases" ],
    "DataColumnQuery" : null,
    "TimeColumn" : "date",
    "LastGroupAsSignalName" : true,
    "GroupByLimit" : 100,
    "ProjectId" : "bigquery-public-data",
    "DataSet" : "covid19_italy"
  } ],
  "ConditionTables" : [{
    "Id" : "d110e9da-5bd6-47f8-8231-583bba8dc96a",
    "Name" : "processes",
    "NameQuery" : null,
    "GroupBy" : [ "Building", "Area", "Unit", "Operation", "Phase" ],
    "GroupByLimit" : 0,
    "StartTimeColumn" : "StartTime",
    "EndTimeColumn" : "EndTime",
    "MaximumDuration" : "1h",
    "CapsuleProperties" : [ {
      "Name" : "Batch ID",
      "Value" : "${columnResult}",
      "Column" : "BatchId",
      "Uom" : "string"
    }, {
      "Name" : "Serial Number",
      "Value" : "${columnResult}",
      "Column" : "UniqueId",
      "Uom" : "string"
    }, {
      "Name" : "Temperature",
      "Value" : "${columnResult}",
      "Column" : "Temperature",
      "Uom" : "DegF"
    }, {
      "Name" : "Constant Property",
      "Value" : "ABCDE",
      "Column" : null,
      "Uom" : "string"
    } ],
    "ProjectId" : "bigqueryconnector-testDB",
    "DataSet" : "test"
    }]
  }],
  "AccessKeyFile" : "C:\\BigQuery\\bigqueryconnector-12345.json",
  "TimeZone" : "UTC"
}
Google BigQuery Additional Configuration

Property Name

Default Value

Data Type

Description

AccessKeyFile

null

String

This is the path to the access key file downloaded from Google cloud platform. The access key file contains properties required for BigQuery connectivity such as project_id, private_key, private_key_id, etc.

SignalTables

An empty list

A List of SignalTableDefinitions

The collection of table configurations which provide the necessary information to create signals in Seeq.

ConditionTables

An empty list

A List of ConditionTableDefinitions.

The collection of table configurations which provide the necessary information to create conditions in Seeq.

TimeZone

null

String

Date and DateTime data types in BigQuery do not contain timezone information (use the TimeStamp data type instead, which always stores data in UTC).

If you have Date and DateTime data types, specify the timezone using this field. Named zones (e.g. America/Los_Angeles) are recommended over offsets (e.g. -03:00 or UTC-3), especially if your timezone observes daylight savings. Note that time zones are case-sensitive, and connections with invalid TimeZone values will fail to index. If no value is provided, UTC will be assumed.

If you have a mix of time zones or other time zone complexity that cannot be worked around using this TimeZone field, then you may need to create a View within BigQuery with the times transformed to TimeStamps.

SignalTableDefinition Configuration

Property Name

Default Value

Data Type

Description

Id

<Enter ID>

String

This is the unique ID for the signal table configuration itself. Note that changing the Id field will cause the relevant signals to be indexed under a different dataId in Seeq. You can use a GUID generator like https://guidgenerator.com/ to create a new one for each definition.

ProjectId

null

String

This is the name of the BigQuery project. For example, bigquery-public-data.

DataSet

null

String

This is the name of the BigQuery dataset. For example, covid19_open_data.

Name

null

String

This is the name of the BigQuery table.

At least Name or NameQuery must be defined within each SignalTableDefinition. It is possible to provide Name and NameQuery parameters at the same time. Duplicate table names are ignored.

NameQuery

null

String

This is a query that defines how to obtain a list of tables with identical schemas. The NameQuery can contain the placeholders %projectid% and %dataset%, which will be replaced with actual values when querying the source system. For example, "NameQuery" : "SELECT table_name FROM `%projectid%.%dataset%.INFORMATION_SCHEMA.TABLES` WHERE table_name like 'tlc_y%_trips%' OR table_name like 'tlc_g%_trips%'"

At least Name or NameQuery must be defined within each SignalTableDefinition. It is possible to provide Name and NameQuery parameters at the same time. Duplicate table names are ignored.

GroupBy

Empty list

A list of Strings

This is a list of columns that can be used to group signals into an asset hierarchy based on distinct column value combinations. The column name order is important - it is used to generate the asset hierarchy. Example: "GroupBy" : [ "country", "region_name", "province_name" ] parameter definition produces the following hierarchy: <country>\<region_name>\<province_name>. When GroupBy is provided - the associated signal collection is linked to the asset defined by the GroupBy leaf element. When GroupBy parameter is not provided, the signal collection is linked to the asset representing BigQuery table.

GroupByLimit

0

Integer

A limit on the number of results (asset paths) that can be returned from the GroupBy. A GroupByLimit of 0 indicates no limit.

DataColumns

[]

A list of Strings

The list of columns that contain data that should be synced to Seeq as signal values . Each listed column name creates a unique Seeq signal. By default, signal name is equal to data column name. For example: "DataColumns" : [ "average_temperature_celsius" ].

At least DataColumns or DataColumnQuery parameter must be defined within each table configuration. It is possible to provide DataColumns and DataColumnQuery parameters at the same time. Duplicate column names are ignored.

DataColumnQuery

null

String

User provided query which allows to obtain the list of columns within the same table which data fields used as Seeq signal’s values. DataColumnQuery can contain the placeholders %projectid%, %dataset% and %tablename%, which will be replaced with actual values when querying the source system. For example: "DataColumnQuery" : "SELECT column_name FROM `%projectid%.%dataset%.INFORMATION_SCHEMA.COLUMNS` WHERE table_name = '%tablename%' AND (data_type='INT64' OR data_type='FLOAT64') AND column_name NOT IN ('latitude', 'longitude')"

At least DataColumns or DataColumnQuery parameter must be defined within each table configuration. It is possible to provide DataColumns and DataColumnQuery parameters at the same time. Duplicate column names are ignored.

TimeColumn

starttime

String

This is the column used as the timestamp source. For example: "TimeColumn" : "starttime"

LastGroupAsSignalName

false

Boolean

Set to true to enable the last element in the GroupBy collection to be used as the signal instead of an asset. The first column obtained from DataColumns and/or DataColumnQuery serves as the data source while all other data columns are ignored. When the GroupBy parameter is not defined, the table name is used as the signal instead of an asset.

ConditionTableDefinition Configuration

Property Name

Default Value

Data Type

Description

Id

<Enter ID>

String

This is the unique ID for the condition table configuration itself. Note that changing the Id field will cause the relevant conditions to be indexed under a different dataId in Seeq. You can use a GUID generator like https://guidgenerator.com/ to create a new one for each definition.

ProjectId

null

String

This is the name of the BigQuery project. For example, bigquery-public-data.

DataSet

null

String

This is the name of the BigQuery dataset. For example, covid19_open_data.

Name

null

String

This is the name of the BigQuery table.

At least Name or NameQuery must be defined within each ConditionTableDefinition. It is possible to provide Name and NameQuery parameters at the same time. Duplicate table names are ignored.

NameQuery

null

String

This is a query that defines how to obtain a list of tables with identical schemas. The NameQuery can contain the placeholders %projectid% and %dataset%, which will be replaced with actual values when querying the source system. For example, "NameQuery" : "SELECT table_name FROM `%projectid%.%dataset%.INFORMATION_SCHEMA.TABLES` WHERE table_name like 'tlc_y%_trips%' OR table_name like 'tlc_g%_trips%'"

At least Name or NameQuery must be defined within each ConditionTableDefinition. It is possible to provide Name and NameQuery parameters at the same time. Duplicate table names are ignored.

GroupBy

Empty list

A list of Strings

This is a list of columns that can be used to group conditions into an asset hierarchy based on distinct column value combinations. The column name order is important - it is used to generate the asset hierarchy. Example: "GroupBy" : [ "country", "region_name", "province_name" ] parameter definition produces the following hierarchy: <country>\<region_name>\<province_name>. When GroupBy is provided - the associated condition collection is linked to the asset defined by the GroupBy leaf element. When GroupBy parameter is not provided, the condition collection is linked to the asset representing BigQuery table.

GroupByLimit

0

Integer

A limit on the number of results (asset paths) that can be returned from the GroupBy. A GroupByLimit of 0 indicates no limit.

StartTimeColumn

StartTime

String

This is the column used to specify the source of timestamps for the start of each capsule.

EndTimeColumn

EndTime

String

This is the column used to specify the source of timestamps for the end of each capsule.

MaximumDuration

7d

String

This is the maximum length of time for capsules in the Condition(s) defined by this table. Longer capsules will be filtered out. This field cannot be null!

CapsuleProperties

An empty CapsulePropertyDescription

CapsulePropertyDescription

This field contains additional metadata that is attached to each capsule. See the JSON configuration above for examples.

CapsulePropertyDefinition Configuration

Property Name

Default Value

Data Type

Description

Name

null

String

This is the name of the capsule property.

Value

null

String

This is the value of the capsule property. It can be a static value such as "ABCDE" or defined based on table data by referencing the column defined in the Column field as $columnResult.

Column

null

String

This field provides a column for reference in Value when defining values dynamically.

Uom

null

String

This is the unit of measure for the capsule property values.

For columns of BigQuery type TIMESTAMP, setting the Uom to "string" will cause the capsule properties to be formatted as dates instead of numbers (like 2024-10-13T23:15:00.123456Z instead of 1.602630900123456E9).

Known Issues

  • The BigQuery connector supports the following data types as numeric: Int64, Float64 and Bool. All other data types treated as Strings.

Please report any other issues you find to our support portal.

Troubleshooting

Connect timed out

When the BigQuery connector is unable to connect to Google, the logs will contain entries that look like:

CODE
com.seeq.link.connectors.bigquery.data.BigQueryClient - isConnected: Exception error: Error getting access token for service account: Connect timed out

There may be a problem with Google’s service, or your network link between the Remote Agent where the connector is running and the BigQuery service (for example, a firewall that doesn't allow the Remote Agent to connect to BigQuery).

To troubleshoot, visit the links listed in your BigQuery configuration using a browser running on the Remote Agent and debug your network link.

BigQueryService is not initialized

If you see this error message then it is most likely a permissions issue, or the service account has been disabled or deleted.

View the logs for this connector for additional information:

CODE
ERROR2022-03-11T03:31:19.182Z[Connection: JVM Agent: ip-172-16-2-125: BigQuery: BQ.Samples: e84563ac-e3f3-e616-a5d3-275ce92edd5e] com.seeq.link.connectors.bigquery.data.BigQueryClient - isConnected: Exception error: Access Denied: Project bigquery-sandbox-342919: User does not have bigquery.jobs.create permission in project bigquery-sandbox-342919.

Contact a BigQuery admin and verify that the access key is correct, that the access key has not been disabled, and that the associated service account has Viewer access to the dataset.

If you are running into other issues with connecting to or access data from Google BigQuery, view our guide for troubleshooting datasource issues.

Performance considerations

The Google BigQuery connector does not have any special performance considerations. View our guide on optimizing datasource performance for general guidance.

JavaScript errors detected

Please note, these errors can depend on your browser setup.

If this problem persists, please contact our support.