
Amazon Athena

JVM AGENT

Item Type Support

SIGNALS SCALARS CONDITION

Overview

The SQL Connector enables Seeq to access data from Amazon Athena.

Prerequisites

AWS Region

You will need the region in which your Athena service is running. You can find it using the AWS CLI or the AWS console.

Authentication

You can authenticate with username and password or with AWS session token credentials.

The IAM permissions required are as follows:

CODE
{
	"Version": "2012-10-17",
	"Statement": [
		{
			"Sid": "VisualEditor0",
			"Effect": "Allow",
			"Action": [
				"athena:CreatePreparedStatement",
				"athena:StartQueryExecution",
				"athena:GetQueryResultsStream",
				"glue:GetTables",
				"glue:GetPartitions",
				"athena:UpdatePreparedStatement",
				"athena:GetQueryResults",
				"glue:BatchGetPartition",
				"athena:DeletePreparedStatement",
				"glue:GetDatabases",
				"athena:GetPreparedStatement",
				"glue:GetTable",
				"glue:GetDatabase",
				"glue:GetPartition",
				"athena:GetQueryExecution",
				"athena:ListPreparedStatements"
			],
			"Resource": "<put your resource scope here>"
		},
		{
			"Sid": "VisualEditor1",
			"Effect": "Allow",
			"Action": [
				"s3:PutObject",
				"s3:GetObject",
				"s3:ListBucketMultipartUploads",
				"s3:AbortMultipartUpload",
				"s3:CreateBucket",
				"s3:ListBucket",
				"s3:GetBucketLocation",
				"s3:ListMultipartUploadParts"
			],
			"Resource": [
				"arn:aws:s3:::<your bucket name>/*",
				"arn:aws:s3:::<your bucket name>"
			]
		}
	]
}

If you use username and password, the IAM user's access key ID is the username and its secret access key is the password.

If you use session token credentials, the Seeq Remote Agent must be configured with an AWS credentials file stored in the home directory of the Seeq Windows service account user. Place the credentials file in the .aws subfolder of that home directory. For example, if the Seeq Windows service account is Seeq-User, place the file in C:\Users\Seeq-User\.aws. The credentials file has no file extension.

Define the profile in the credentials file as follows:

CODE
[seeqprofile]
aws_access_key_id=ASIAXXXXXXXXX
aws_secret_access_key=XXXXXXXX
aws_session_token=XXXXXXXXXXXXXXXXXX
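
Before restarting the agent, it can help to confirm that the profile contains all three values. The following is a minimal sketch using only the Python standard library. The helper names (missing_profile_keys, default_credentials_path) are hypothetical, and the script assumes it is run as the Seeq Windows service account user so that the home directory resolves to the location described above.

```python
import configparser
from pathlib import Path

# The three keys a session-token profile is expected to contain.
REQUIRED_KEYS = ("aws_access_key_id", "aws_secret_access_key", "aws_session_token")


def default_credentials_path() -> Path:
    """Return <home>/.aws/credentials for the current user (no file extension)."""
    return Path.home() / ".aws" / "credentials"


def missing_profile_keys(credentials_text: str, profile: str) -> list[str]:
    """Return the required keys that are absent or empty in the given profile."""
    # interpolation=None so that a '%' in a secret is not treated as a placeholder.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(credentials_text)
    if not parser.has_section(profile):
        return list(REQUIRED_KEYS)
    section = parser[profile]
    return [key for key in REQUIRED_KEYS if not section.get(key, "").strip()]
```

For example, missing_profile_keys(default_credentials_path().read_text(), "seeqprofile") returns an empty list when the profile is complete.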

To use session token authentication, add two parameters to the JdbcProperties field of the Seeq Athena connection. AwsCredentialsProviderClass instructs the JDBC connection to use com.simba.athena.amazonaws.auth.profile.ProfileCredentialsProvider, and AwsCredentialsProviderArguments names the profile configured in the credentials file. With session token authentication, the Username and Password fields are ignored, but set them to null for clarity.

CODE
"Username" : null,
"Password" : null,
"JdbcProperties":
{
    "AwsCredentialsProviderClass": "com.simba.athena.amazonaws.auth.profile.ProfileCredentialsProvider",
    "AwsCredentialsProviderArguments": "seeqprofile"
}
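
If you keep datasource configurations as JSON files, the change above can be applied programmatically. This is only a sketch: the function name with_session_token_auth is hypothetical, and the input dictionary is a trimmed-down stand-in for a real configuration.

```python
import copy
import json

PROVIDER_CLASS = "com.simba.athena.amazonaws.auth.profile.ProfileCredentialsProvider"


def with_session_token_auth(config: dict, profile: str) -> dict:
    """Return a copy of a datasource config switched to profile-based credentials."""
    updated = copy.deepcopy(config)
    # Username and Password are ignored in this mode; null them for clarity.
    updated["Username"] = None
    updated["Password"] = None
    # Preserve any JDBC properties that are already present.
    jdbc_properties = dict(updated.get("JdbcProperties") or {})
    jdbc_properties["AwsCredentialsProviderClass"] = PROVIDER_CLASS
    jdbc_properties["AwsCredentialsProviderArguments"] = profile
    updated["JdbcProperties"] = jdbc_properties
    return updated


original = {"Name": "Athena", "Type": "ATHENA", "Username": "TESTACCESSKEY", "Password": "placeholder"}
example = with_session_token_auth(original, "seeqprofile")
print(json.dumps(example, indent=4))
```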

S3 Output Location

Athena writes query results to an S3 bucket. You can find this location in the AWS console.

Configuration

The following example configuration template is displayed in the Additional Configuration box on the Datasources administration page. For an existing datasource, the box appears when you click Configure. For a new datasource, it appears in the Create new datasource connection modal after you click Add Datasource.

JSON
{
    "Name" : "Athena",
    "Id" : "dc728d73-8830-4446-ba09-0c6603dd2e47",
    "Enabled" : true,
    "Type" : "ATHENA",
    "Location" : null,
    "Hostname" : null,
    "Port" : null,
    "DatabaseName" : null,
    "Username" : "TESTACCESSKEY",
    "Password" : "123432/2sdf224t545dk",
    "UseWindowsAuth" : false,
    "AwsRegion" : "us-east-1",
    "S3OutputLocation" : "s3://path-to-bucket/queryResults/",
    "InitialSql" : null,
    "TimeZone" : "America/Los_Angeles",
    "PrintRows" : false,
    "UseResultsetStreaming" : 1,
    "RowsToFetchPerBlock" : 500000,
    "JdbcConnectionStringOverride": null
}

Standard SQL Additional Configuration

| Property Name | Default Value | Data Type | Description |
| --- | --- | --- | --- |
| QueryDefinitions | null | Array[QueryDefinition] | The definition for how Seeq should query for data. |
| Hostname | null | String | The hostname of your datasource. If your hostname is of the form "abc\def", you must escape the backslash like so: "abc\\def". |
| QueryDefinitionExpansionLimit | 1,000,000 | Integer | The maximum number of signals that can be indexed from a single query definition. This limit protects against incorrect query definitions producing many millions of invalid signals. |
| Port | 0 | Integer | The port for the JDBC connection. |
| DatabaseName | null | String | Optional: Can be defined here or as part of a fully qualified table name in the QueryDefinition. |
| Username | "" | String | The user name. |
| Password | null | String/SecretFile | The user password. |
| JdbcConnectionStringOverride | null | String | Optional: Can be specified if you have a known, functioning JDBC connection string. If specified, Hostname, Port, and DatabaseName need not be specified. |
| InitialSql | null | String | Optional: A SQL command that is run once upon establishing a connection. |
| TimeZone | null | String | Optional: The time zone to use for timestamp or datetime columns. For example, to set this to US Pacific Time, use America/Los_Angeles. |
| PrintRows | false | Boolean | If true, the rows from the SQL query are printed to the jvm-link log. This is for debugging purposes only and should be set to false in normal operation. |
| UseWindowsAuth | false | Boolean | Note: This is not available for all database types. If you are using a database type that supports Windows Authentication, ensure that the remote agent is running as the correct user. |
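
JdbcConnectionStringOverride takes a complete JDBC connection string. As a rough illustration of what such a string can look like for Athena, the sketch below assembles one from a region and an S3 output location. The semicolon-separated key=value layout after jdbc:awsathena:// is an assumption based on the convention of the Simba Athena JDBC driver referenced earlier on this page, and the helper name athena_jdbc_url is hypothetical. Confirm the exact format against the documentation for your driver version before using it.

```python
from typing import Mapping, Optional


def athena_jdbc_url(
    region: str,
    s3_output_location: str,
    extra: Optional[Mapping[str, str]] = None,
) -> str:
    """Assemble a connection string in the assumed Simba Athena key=value layout."""
    properties = {"AwsRegion": region, "S3OutputLocation": s3_output_location}
    # Extra driver properties (assumed names) are appended in insertion order.
    properties.update(extra or {})
    body = ";".join(f"{key}={value}" for key, value in properties.items())
    return f"jdbc:awsathena://{body};"


url = athena_jdbc_url("us-east-1", "s3://path-to-bucket/queryResults/")
print(url)
# jdbc:awsathena://AwsRegion=us-east-1;S3OutputLocation=s3://path-to-bucket/queryResults/;
```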

Time Zone 

Some SQL date and/or time column types have no zone information. The TimeZone field specifies the time zone that Seeq should use for data coming from column types that have no time zone information of their own. UTC offsets (+01:00, -10:30, etc.) and IANA regions (America/Los_Angeles) are accepted. If no time zone is specified, Seeq defaults to the local region of the Seeq server. If your data was stored in UTC time, set this field to "UTC" or "+00:00". If your data was entered using a "wall clock", set this to the IANA time region of the "wall clock". Note that offsets are constant throughout the year, whereas a region may observe daylight saving time. If you used a wall clock in a location that observes daylight saving time, a region is a better choice than an offset for this field. A list of IANA regions is available in the tz database (the IANA Time Zone Database).
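
The difference between an offset and a region can be seen with Python's standard zoneinfo module (Python 3.9 or later, with time zone data available on the system). This sketch is only an illustration of the concept and does not reflect how Seeq itself resolves time zones. The helper name utc_offset_hours is hypothetical.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from zoneinfo import ZoneInfo

region = ZoneInfo("America/Los_Angeles")      # IANA region, observes daylight saving time
fixed = timezone(timedelta(hours=-8))         # constant offset, equivalent to "-08:00"


def utc_offset_hours(zone: tzinfo, wall_clock: datetime) -> float:
    """Return the UTC offset, in hours, that applies to a naive wall-clock time."""
    return wall_clock.replace(tzinfo=zone).utcoffset().total_seconds() / 3600


winter = datetime(2024, 1, 15, 12, 0)
summer = datetime(2024, 7, 15, 12, 0)

print(utc_offset_hours(region, winter), utc_offset_hours(region, summer))  # -8.0 -7.0
print(utc_offset_hours(fixed, winter), utc_offset_hours(fixed, summer))    # -8.0 -8.0
```

The region shifts by one hour between winter and summer, while the fixed offset does not, which is why a region is the safer choice for wall-clock data.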

Extended Athena Additional Configuration

| Property Name | Default Value | Data Type | Description |
| --- | --- | --- | --- |
| AwsRegion | null | String | The AWS region of your service, in the form us-west-1. |
| S3OutputLocation | null | String | The S3 location where the query results are stored. |
| UseResultsetStreaming | 0 | Integer (either 0 or 1) | If set to 1, the streaming API is used. The streaming API requires an outbound connection on port 444 in addition to port 443. |
| RowsToFetchPerBlock | 1000 | Integer | The number of rows to fetch per block. With "UseResultsetStreaming" : 0, the JDBC driver internally limits this value to 1000. With "UseResultsetStreaming" : 1, this value is uncapped, but should be set low enough that memory does not become a problem on smaller agents. |
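
The constraints described in this table can be checked before a configuration is saved. The following sketch is a hypothetical helper, not part of Seeq. It uses the property names from the JSON template above, and the region pattern is a loose approximation of names such as us-west-1 rather than an authoritative list of AWS regions.

```python
import re

# Loose shape check for region names such as us-west-1 or ap-southeast-2.
REGION_PATTERN = re.compile(r"^[a-z]{2}(-[a-z]+)+-\d+$")


def check_athena_settings(config: dict) -> list[str]:
    """Return a list of human-readable problems found in the Athena-specific fields."""
    problems = []

    region = config.get("AwsRegion")
    if not isinstance(region, str) or not REGION_PATTERN.match(region):
        problems.append("AwsRegion should look like us-west-1")

    location = config.get("S3OutputLocation")
    if not isinstance(location, str) or not location.startswith("s3://"):
        problems.append("S3OutputLocation should be an s3:// location")

    streaming = config.get("UseResultsetStreaming", 0)
    if isinstance(streaming, bool) or streaming not in (0, 1):
        problems.append("UseResultsetStreaming must be the integer 0 or 1")

    rows = config.get("RowsToFetchPerBlock", 1000)
    if streaming == 0 and isinstance(rows, int) and rows > 1000:
        problems.append("RowsToFetchPerBlock above 1000 is capped unless UseResultsetStreaming is 1")

    return problems


template = {
    "AwsRegion": "us-east-1",
    "S3OutputLocation": "s3://path-to-bucket/queryResults/",
    "UseResultsetStreaming": 1,
    "RowsToFetchPerBlock": 500000,
}
print(check_athena_settings(template))  # []
```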

Known Issues

There are no known issues for the SQL Connector. Please report any issues you find to our support portal.

Troubleshooting

A couple of errors that you may encounter are:

String-valued samples are prohibited in numeric-valued signal

If the y-axis value of the signal is a string, then the Value Unit Of Measure property is required and must be set to "string". See Example 2. 

Since the Value Unit Of Measure is different for string and numeric signals, it may be easiest to write one query definition for the numeric signals and write another for the string signals. Alternatively, the Value Unit Of Measure property could be set according to an SQL IF statement similar to the technique used in Example 13.

Samples must be ordered by their keys

If this occurs when trending near a daylight saving time transition, it indicates that the TimeZone is not configured properly. For example, setting TimeZone to "America/Los_Angeles" means that the timestamp data in the SQL table was recorded using "America/Los_Angeles" (Pacific) time, which observes daylight saving time. During the spring transition, time skips from 01:59:59.9 to 03:00:00.0, so the 02:00 hour doesn't exist and there should be no data in the SQL table during that hour. Any data in the 02:00 hour is interpreted as being in the 03:00 hour. If there is also data in the 03:00 hour, the samples will be out of order.

Original data:

01:15, 01:45, 02:15, 02:45, 03:15

After accounting for non-existent 02:00 hour:

01:15, 01:45, 03:15, 03:45, 03:15

If data exists in the 02:00 hour, it was either recorded in error or recorded in a time zone that doesn't observe daylight saving time, such as UTC or a constant offset from UTC.
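
The reordering in the example above can be reproduced with Python's standard zoneinfo module. This is an illustrative sketch only: the date (10 March 2024, a spring transition in the United States) is chosen for the example, and Python's handling of non-existent times is assumed to be comparable to, not identical with, what the connector does.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

pacific = ZoneInfo("America/Los_Angeles")
wall_clock = ["01:15", "01:45", "02:15", "02:45", "03:15"]


def to_utc(hhmm: str) -> datetime:
    """Interpret a wall-clock time on 2024-03-10 as Pacific time and convert to UTC."""
    hour, minute = map(int, hhmm.split(":"))
    local = datetime(2024, 3, 10, hour, minute, tzinfo=pacific)
    return local.astimezone(timezone.utc)


utc_times = [to_utc(t) for t in wall_clock]

# Convert back to Pacific time to see how each timestamp was actually resolved.
resolved = [t.astimezone(pacific).strftime("%H:%M") for t in utc_times]
is_sorted = all(a <= b for a, b in zip(utc_times, utc_times[1:]))

print(resolved)   # ['01:15', '01:45', '03:15', '03:45', '03:15']
print(is_sorted)  # False
```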

For more information, see the TimeZone field in the Configuration section above.

If you are running into issues connecting to or accessing data from the SQL Connector, view our guide for troubleshooting datasource issues.

Performance considerations

  • The performance of this connection is highly dependent on the partitioning scheme used within the S3 bucket.

  • Ensure your partitioning scheme matches your query to limit the amount of data scanned per query.
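
As an illustration of matching a query to the partitioning scheme, the sketch below builds a predicate that restricts a query to the daily partitions overlapping a requested time range. Everything specific here is an assumption: the table is presumed to be partitioned by a string column named dt holding YYYY-MM-DD values, and the helper names are hypothetical. Adapt the idea to the partition columns your tables actually use.

```python
from datetime import datetime, timedelta


def partition_days(start: datetime, end: datetime) -> list[str]:
    """List the daily partition values (YYYY-MM-DD) that overlap [start, end]."""
    day, last = start.date(), end.date()
    days = []
    while day <= last:
        days.append(day.isoformat())
        day += timedelta(days=1)
    return days


def partition_predicate(column: str, start: datetime, end: datetime) -> str:
    """Build an IN (...) predicate that limits a scan to the relevant partitions."""
    values = ", ".join(f"'{day}'" for day in partition_days(start, end))
    return f"{column} IN ({values})"


predicate = partition_predicate("dt", datetime(2024, 3, 9, 22, 0), datetime(2024, 3, 10, 2, 0))
print(predicate)  # dt IN ('2024-03-09', '2024-03-10')
```

Adding a predicate like this alongside the timestamp filter lets Athena skip partitions outside the requested range instead of scanning the whole table.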

View our guide on optimizing datasource performance for general guidance.
