In [1]:
import math
import re
import textwrap
from datetime import datetime, timedelta
from seeq import spy
from seeq.sdk import *
import pandas as pd
from IPython.display import display, HTML

# Monitoring your Data Usage

As a Seeq admin or champion of its use within your organization, you may wish to be alerted when significant usage occurs over a particular time period. This can highlight the value that users are realizing by using Seeq but can also highlight areas where calculations could be optimized to be more efficient (and therefore faster).

This notebook uses the Seeq SDK to look at recent usage (the same data shown in the **Usage** tab of the Administration page) and sends an email notification if a customizable threshold is exceeded.

## Your preferences

Change the values you see in the code block below to customize this script to your preferences.

In [2]:
# How far back do you want to look each time this script runs?
look_back = timedelta(days=7)

# For what threshold do you want to notify the stakeholders?
threshold = '30 GB'

# Who do you want to notify?
emails_and_names = [
    ('jane.doe@mycompany.com', 'Jane Doe')
]

# What's the email subject line?
email_subject = 'Seeq Data Usage Report'

## Utility functions

Here we define some utility functions: two convert between raw byte counts and human-readable sizes such as `30 GB`, and one formats a link for the email.

In [3]:
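def humanized(bytes_processed) -> str:
    """
    Turns a raw bytes number into a human-readable representation with units like MB or GB.
    Uses decimal units (1 KB = 1000 B) and truncates to a whole number.
    """
    if bytes_processed <= 0:
        return '0 B'

    suffixes = ['B', 'KB', 'MB', 'GB', 'TB', 'PB']
    # Integer division avoids relying on floating-point math.log, which can be
    # slightly off near exact powers of 1000, and the bound keeps the index in range
    i = 0
    value = int(bytes_processed)
    while value >= 1000 and i < len(suffixes) - 1:
        value //= 1000
        i += 1
    return '%s %s' % (value, suffixes[i])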

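def to_bytes(s: str) -> int:
    """
    Opposite of humanized(): parses a string like '30 GB' into a number of bytes.
    """
    matcher = re.match(r'\s*([\d.]+)\s*([KMGTPE]?)B\s*$', s, re.IGNORECASE)
    if matcher is None:
        raise ValueError(f'Cannot parse size "{s}"; expected something like "30 GB"')
    num = float(matcher.group(1))
    order = matcher.group(2).upper()
    # Decimal units: each step is a factor of 1000
    unit_powers = {'': 0, 'K': 1, 'M': 2, 'G': 3, 'T': 4, 'P': 5, 'E': 6}
    power = unit_powers[order]
    return int(num * math.pow(1000, power))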

def link(row):
    """
    Creates an HTML link if the Source URL is available in the DataFrame row.
    """
    if row["Source URL"]:
        return f'<a href="{row["Source URL"]}">{row["Source"]}</a>'
    else:
        return row["Source"]
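
The `link()` helper places the source label and URL into HTML unchanged. If a label could contain characters such as `<` or `&`, a variant that escapes them may be safer. The following is a minimal sketch of that idea; `safe_link` is a hypothetical name that does not appear in the original notebook, and the rows are made-up dictionaries standing in for DataFrame rows.

```python
import html

def safe_link(row) -> str:
    """
    Like link(), but HTML-escapes the label and URL before embedding them.
    `row` is anything indexable by column name (a dict or a DataFrame row).
    """
    label = html.escape(str(row["Source"]))
    if row["Source URL"]:
        url = html.escape(str(row["Source URL"]), quote=True)
        return f'<a href="{url}">{label}</a>'
    return label

# Made-up rows for illustration only
with_url = {"Source": "Pumps & Valves <v2>", "Source URL": "https://example.com/view?a=1&b=2"}
without_url = {"Source": "AF Data Reference", "Source URL": ""}

print(safe_link(with_url))
print(safe_link(without_url))
```

Whether escaping is needed depends on how much you trust the names users give their worksheets.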

## Query the usage

Seeq exposes a Usage API, which is the same API that powers the _Usage_ tab in the Seeq administration page.

We'll use it here to query for the usage over the time period that was specified at the top.

In [4]:
usage_api = UsageApi(spy.client)

start = (datetime.utcnow() - look_back).isoformat() + 'Z'
end = datetime.utcnow().isoformat() + 'Z'

usage_output_list = usage_api.get_usage(
    start_time=start,
    end_time=end,
    aggregate_by=['User', 'Source']
)
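
The query takes its start and end times as ISO 8601 strings with a trailing `Z`. Since `datetime.utcnow()` is deprecated as of Python 3.12, a timezone-aware variant may be preferable on newer environments. The sketch below shows one way to build the same kind of strings from a single reference instant; `utc_window` is a hypothetical helper, not part of the Seeq SDK.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

def utc_window(look_back: timedelta, now: Optional[datetime] = None) -> Tuple[str, str]:
    """
    Returns (start, end) ISO 8601 strings ending in 'Z' for the look-back window.
    `now` must be timezone-aware; it defaults to the current UTC time.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    now = now.astimezone(timezone.utc)

    def fmt(moment: datetime) -> str:
        # Drop the '+00:00' offset and append 'Z' instead
        return moment.replace(tzinfo=None).isoformat() + 'Z'

    return fmt(now - look_back), fmt(now)

# Fixed reference time so the example is reproducible
reference = datetime(2023, 10, 2, 6, 0, 0, tzinfo=timezone.utc)
start, end = utc_window(timedelta(days=7), now=reference)
print(start, end)
```

Taking the current time once also means both ends of the window are measured from the same instant.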

## Create a DataFrame

Let's create a DataFrame of the output so it's easy to manipulate the information.

In [5]:
usage_df = pd.DataFrame([{
    'User': c.identity,
    'Source': c.source_label,
    'Source URL': f'{spy.session.public_url}{c.source_url}' if c.source_url is not None else '',
    'Bytes': c.bytes
} for c in usage_output_list.content])

usage_df = usage_df.sort_values(by=['Bytes'], ascending=False)
usage_df['Data Processed'] = usage_df['Bytes'].apply(humanized)
usage_df.head(10)

| | User | Source | Source URL | Bytes | Data Processed |
|---|---|---|---|---|---|
| 0 | Che Tse | Autoscaling - every 2min at various hours | https://develop.seeq.dev/view/worksheet/0EE5D6... | 918086509504 | 918 GB |
| 1 | Che Tse | Autoscaling - every 3min at various hours | https://develop.seeq.dev/view/worksheet/0EE5D6... | 546826414512 | 546 GB |
| 2 | Che Tse | Autoscaling - every 5min at various hours | https://develop.seeq.dev/view/worksheet/0EE5D6... | 419919067232 | 419 GB |
| 3 | Che Tse | Autoscaling - every 10 min starting at :00 | https://develop.seeq.dev/view/worksheet/0EE5D6... | 319633503664 | 319 GB |
| 4 | AF Data Reference | | | 1358956928 | 1 GB |
| 5 | John Cox | CLPM Demo June 2022 - Copy - Site CLPM | https://develop.seeq.dev/view/worksheet/0EDB6F... | 258146432 | 258 MB |
| 6 | Sean Tropsa | Enterprise Demo - Daily Monitoring View | https://develop.seeq.dev/view/worksheet/E71F77... | 176458688 | 176 MB |
| 7 | Sepide Zakeri | Process Health Solution Dashboard - Seeq ML en... | https://develop.seeq.dev/view/worksheet/5DB566... | 125598672 | 125 MB |
| 8 | Mark Suchomel | Hydrogen Production via SMR - GREET Model Emis... | https://develop.seeq.dev/view/worksheet/0EE476... | 116739136 | 116 MB |
| 9 | Sepide Zakeri | Process Health Solution Dashboard - Seeq ML en... | https://develop.seeq.dev/view/worksheet/5DB566... | 60235200 | 60 MB |
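
Because the query aggregates by both User and Source, the same user can appear in several rows. If per-user totals are also of interest, a `groupby` is one way to obtain them. The sketch below uses made-up rows shaped like `usage_df`; the names and numbers are illustrative only.

```python
import pandas as pd

# Made-up rows shaped like usage_df; the numbers are illustrative only
sample_df = pd.DataFrame([
    {'User': 'User A', 'Source': 'Worksheet 1', 'Bytes': 40_000_000_000},
    {'User': 'User A', 'Source': 'Worksheet 2', 'Bytes': 25_000_000_000},
    {'User': 'User B', 'Source': 'Worksheet 3', 'Bytes': 5_000_000_000},
])

# Sum bytes across all sources for each user, largest first
per_user = (
    sample_df.groupby('User', as_index=False)['Bytes']
    .sum()
    .sort_values(by='Bytes', ascending=False)
    .reset_index(drop=True)
)
print(per_user)
```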


## Filter down

We only care about the rows that exceed our threshold, so filter out any other rows.

In [6]:
exceeds_threshold_df = usage_df[usage_df['Bytes'] > to_bytes(threshold)]
exceeds_threshold_df

| | User | Source | Source URL | Bytes | Data Processed |
|---|---|---|---|---|---|
| 0 | Che Tse | Autoscaling - every 2min at various hours | https://develop.seeq.dev/view/worksheet/0EE5D6... | 918086509504 | 918 GB |
| 1 | Che Tse | Autoscaling - every 3min at various hours | https://develop.seeq.dev/view/worksheet/0EE5D6... | 546826414512 | 546 GB |
| 2 | Che Tse | Autoscaling - every 5min at various hours | https://develop.seeq.dev/view/worksheet/0EE5D6... | 419919067232 | 419 GB |
| 3 | Che Tse | Autoscaling - every 10 min starting at :00 | https://develop.seeq.dev/view/worksheet/0EE5D6... | 319633503664 | 319 GB |
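
The helpers in this notebook use decimal units, so a threshold of `30 GB` corresponds to 30 × 1000³ bytes, and the filter uses a strict `>` comparison. The sketch below illustrates both points with made-up byte counts placed just below, exactly at, and just above the threshold.

```python
import pandas as pd

# '30 GB' in decimal units, as used by the helpers in this notebook
threshold_bytes = 30 * 1000 ** 3

# Made-up byte counts: just below, exactly at, and just above the threshold
sample_df = pd.DataFrame({
    'Bytes': [threshold_bytes - 1, threshold_bytes, threshold_bytes + 1]
})

# The comparison is strict, so a row exactly at the threshold is not reported
exceeds = sample_df[sample_df['Bytes'] > threshold_bytes]
print(len(exceeds))
```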


## Prepare an email

We want to populate the email with the relevant information.

In [7]:
look_back_days = float(look_back.total_seconds()) / 86400.0
table_rows = '\n'.join([
    f'<tr><td>{row["User"]}</td><td>{link(row)}</td><td>{row["Data Processed"]}</td></tr>'
    for _, row in exceeds_threshold_df.iterrows()])
email_body = textwrap.dedent(f"""
    <p>Data usage for at least one User and Source combination exceeded the threshold of {threshold}:</p>
    <table bgcolor='#EEEEEE' border=1>
      <tr><td>User</td><td>Source</td><td>Data Processed</td></tr>
      {table_rows}
    </table>
    <p>Log into Seeq and go to the <strong>Usage</strong> tab of the Administration page if desired.</p>
    <p>Time period: Last {look_back_days:g} day(s)</p>
""").strip()
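
As an alternative to assembling the table rows by hand, pandas can render a DataFrame as an HTML table directly. This is a minimal sketch on made-up rows, not the approach the notebook itself uses. With `escape=True`, any HTML in a cell (including `<a>` links) is shown as literal text, so keeping clickable links would require `escape=False` and escaping the labels separately.

```python
import pandas as pd

# Made-up rows for illustration only
sample_df = pd.DataFrame([
    {'User': 'User A', 'Source': 'Worksheet <1>', 'Data Processed': '40 GB'},
    {'User': 'User B', 'Source': 'Worksheet 2', 'Data Processed': '35 GB'},
])

# index=False omits the row numbers; escape=True (the default) HTML-escapes cell text
table_html = sample_df.to_html(index=False, escape=True, border=1)
print(table_html)
```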

## Send the email (if necessary)

If the filtered DataFrame contains one or more rows, send an email.

In [8]:
notifier_api = NotifierApi(spy.client)

email_result = ''
if len(exceeds_threshold_df) > 0:
    email_result = notifier_api.send_email(body=SendEmailInputV1(
        to_emails=[
            SendEmailContactV1(email=pair[0], name=pair[1])
            for pair in emails_and_names
        ],
        subject=email_subject,
        content=email_body
    ))
    
email_result

{'status_message': 'The email was accepted'}

### Sending the email via SendGrid

If you are not using Seeq's SaaS service, the `NotifierApi` is likely not set up. The code below sends the email via SendGrid, a commonly used email delivery service. You will need a SendGrid API key which, depending on current pricing plans, may require a purchase.

You will need to `pip install sendgrid` to bring the SendGrid API into this Data Lab project.

To use this SendGrid option, copy the code below into a cell and execute it. You will likely also want to delete the cell above that attempts to send via Seeq's `NotifierApi`.

**Note: Your Seeq Data Lab Server will need to have internet access to the SendGrid service.**

```
pip install sendgrid
```

```
import base64
import mimetypes
import os

from sendgrid import SendGridAPIClient
from sendgrid.helpers.mail import Attachment, Disposition, FileContent, FileName, FileType, Mail
```

```
def send_mail_via_sendgrid(sendgrid_api_key: str, sender: str, recipients: list, subject: str,
                           html: str, attachment: str = None):
    message = Mail(
        from_email=sender,
        to_emails=recipients,
        subject=subject,
        html_content=html)

    if attachment is not None:
        # Read the file and base64-encode its contents for the attachment
        with open(attachment, 'rb') as f:
            data = f.read()
        encoded_file = base64.b64encode(data).decode()

        # Fall back to a generic type when the extension is not recognized
        mime_type, _ = mimetypes.guess_type(attachment)
        attached_file = Attachment(
            FileContent(encoded_file),
            FileName(os.path.basename(attachment)),
            FileType(mime_type or 'application/octet-stream'),
            Disposition('attachment')
        )

        message.attachment = attached_file

    SendGridAPIClient(sendgrid_api_key).send(message)
```

```
# This is a bogus key, just for demonstration purposes. It will need to be replaced by a real key.
# Log into your SendGrid account and navigate to Settings > API Keys to create a new key.
sendgrid_api_key = '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'

send_mail_via_sendgrid(
    sendgrid_api_key,
    'jane.doe@mycompany.com',
    ['john.doe@mycompany.com'],
    email_subject,
    email_body
)
```
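
Hardcoding a real API key in a notebook makes it visible to anyone who can open the project. One common alternative is to read the key from an environment variable. The sketch below assumes the variable is called `SENDGRID_API_KEY`; both that name and the `load_api_key` helper are assumptions for illustration, so use whatever your deployment provides.

```python
import os

def load_api_key(env_var: str = 'SENDGRID_API_KEY') -> str:
    """
    Reads the API key from an environment variable and fails loudly if it is missing.
    The variable name is an assumption; use whatever name your deployment sets.
    """
    key = os.environ.get(env_var, '').strip()
    if not key:
        raise RuntimeError(f'Environment variable {env_var} is not set')
    return key

# Demonstration with a placeholder value; a real deployment would set this outside the notebook
os.environ['SENDGRID_API_KEY'] = 'SG.placeholder'
print(load_api_key())
```

The returned value could then be passed as the first argument to `send_mail_via_sendgrid`.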

## Run this notebook on a schedule

You will likely want this notebook to run in the background on a schedule. Modify the schedule text below as desired.

In [9]:
spy.jobs.schedule('every day at 6:00am')

| | Schedule | Scheduled | Next Run |
|---|---|---|---|
| 0 | every day at 6:00am | At 06:00 AM | 2023-10-03 06:00:00 PDT |
