How To: Create a Box Plot Add-on with Plotly and Solara

Box Plot Overview:

A Box Plot (or box-and-whisker plot) visualizes the distribution of a dataset using five key statistics: the minimum, first quartile (Q1), median, third quartile (Q3), and maximum.

Box: Represents the middle 50% of data, with the median marked inside.
Whiskers: Extend to the smallest and largest values within 1.5 times the interquartile range (IQR).
Outliers: Data points outside the whiskers, representing unusual values.

Steps to Create a Box Plot Add-on in Data Lab:

Set Up Your Environment: Start a new Data Lab Project. Ensure you have the necessary Python packages installed (e.g., seeq, pandas, plotly, solara).
Import Libraries: Load the required libraries (seeq, pandas, plotly, etc.).
Create Helper Functions:
- parse_jupyter_notebook_url: Parses the notebook URL to retrieve workbook and worksheet IDs.
- create_df: Fetches worksheet data and creates a DataFrame.
- signal_info_to_dict: Converts worksheet item details into a dictionary.
Generate the Box Plot:
- Use the generate_chart function with plotly to create the box plot, customizing it with parameters like jitter, whisker width, and data point visibility.
Build User Interface with Solara:
- Use the BoxPlotComponent function to create an interactive UI in the notebook, allowing users to adjust plot settings.
Run the Component: Display the component in the notebook.
Converting to an Add-on: Finalize the notebook and convert it into an Add-on using the User Tool Creator in Seeq Data Lab.

Let’s Start…

1. Set Up Your Environment

Open up a new Data Lab Project and make sure you have the necessary Python packages installed. You'll need:

seeq
seeq-spy
pandas
plotly
solara

Data Lab will have seeq and spy already loaded in the project. Using the Terminal in your Data Lab Project you can install the other packages using pip if you haven't already:

BASH

pip install pandas plotly solara

The Data Lab Project will need to be restarted in order for Solara to work. Hit the “Quit” button to shut down the project.

2. Import Required Libraries

In your Jupyter notebook, start by importing the necessary libraries:

PY

from seeq import spy
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import solara
import os
import urllib.parse as urlparse
from urllib.parse import parse_qs
from requests.models import PreparedRequest

These libraries will be used to handle data, create visualizations, and build the user interface.

3. Create Helper Functions

Preprocessing the Jupyter Notebook URL

In certain workflows, there may arise a necessity to retrieve data associated with a Seeq worksheet. Typically, these data are transmitted as query parameters to the Add-on and affixed to the Add-on URL. The Add-on can be customized to encompass key parameters like workbookId, worksheetId, and workstepId. These parameters are then transmitted within the jupyter_notebook_url, which is automatically injected into all Data Lab Projects. If the Add-on is not initiated from a workbench, it will lack the specific worksheet parameters. This situation commonly occurs during Add-on development. To streamline development within Data Lab, it is recommended to incorporate a default worksheet URL.

Let’s create a parse_jupyter_notebook_url function that helps with this process. The purpose of this function is to parse the jupyter_notebook_url and return the workbookId and worksheetId.

This function takes two arguments:

jupyter_notebook_url (the URL of the Jupyter Notebook)
backup_workbench (the URL of a backup Seeq Worksheet with signals in the details pane).

If there are no query parameters in the URL, it assumes that the notebook is being not run as an Add-on (i.e. developing from the Project), and it uses the backup_workbench URL to extract the workbook ID and worksheet ID.

Here’s the function:

PY

def parse_jupyter_notebook_url(jupyter_notebook_url, backup_workbench):
    """
    Parses the Jupyter Notebook URL and returns the workbook ID and worksheet ID.

    Args:
        jupyter_notebook_url (str): The URL of the Jupyter Notebook.
        backup_workbench (str): The URL of a backup Seeq Worksheet with signals in the details pane.

    Returns:
        tuple: A tuple containing the workbook ID and worksheet ID.
    """
    parsed_url = urlparse.urlparse(jupyter_notebook_url)
    query_params = parse_qs(parsed_url.query)

    if not query_params:
        # Assuming it's not being run as an Add-on and there are no query parameters
        # Manually including defaults from the backup_workbench
        url = spy.utils.get_data_lab_project_url()
        workbook_id = spy.utils.get_workbook_id_from_url(backup_workbench)
        worksheet_id = spy.utils.get_worksheet_id_from_url(backup_workbench)
        params = {'workbookId': workbook_id, 'worksheetId': worksheet_id}
        req = PreparedRequest()
        req.prepare_url(url, params)
        parsed_url = urlparse.urlparse(req.url)
        query_params = parse_qs(parsed_url.query)

    workbook_id = query_params['workbookId'][0]
    worksheet_id = query_params['worksheetId'][0]

    return workbook_id, worksheet_id

Next, let's develop functions that extract data from the worksheets and retrieve additional information related to the signals, such as their respective colors. This data will be utilized in the subsequent creation of Box Plots.

Here's what the function does step-by-step:

It constructs the URL for the specific worksheet using the provided workbook ID, worksheet ID, and the Seeq server's host URL.
It sends a request to the Seeq server to retrieve a list of items (signals or data streams) associated with the specified worksheet. However, it filters out any items that have a string unit of measure, as those are likely not numerical data.
It then requests the entire workbook data from the Seeq server, but it specifies that it only needs the data for the specific worksheet identified by the worksheet_id

From the workbook data, it extracts the display settings (e.g., colors) for the items in the specified worksheet. It merges the list of worksheet items with their corresponding display settings (colors) into a single DataFrame.

It retrieves the display range (start and end times) for the specified worksheet.

Finally, it sends another request to the Seeq server to fetch the actual data values for the worksheet items within the specified display range.

The function returns this data as a pandas DataFrame of the signals and the worksheet items.

Here is the function:

PY

def create_df(workbook_id, worksheet_id):
    """
    Fetches worksheet data from a Seeq server and returns a DataFrame.

    Args:
        workbook_id (str): The ID of the workbook.
        worksheet_id (str): The ID of the worksheet.

    Returns:
        pandas.DataFrame: The fetched data DataFrame.
    """
    host = spy.client.host[:-3]
    url = f"{host}workbook/{workbook_id}/worksheet/{worksheet_id}"

    # Fetch worksheet items and filter out string units of measure
    worksheet_items = spy.search(url, quiet=True)
    worksheet_items = worksheet_items[worksheet_items['Value Unit Of Measure'] != 'string']

    # Pull the workbook and specific worksheet
    workbook = spy.workbooks.pull(workbook_id,
                                  include_inventory=False,
                                  include_annotations=False,
                                  include_referenced_workbooks=False,
                                  include_images=False,
                                  include_rendered_content=False,
                                  specific_worksheet_ids=[worksheet_id],
                                  quiet=True)[0]

    worksheet = workbook.worksheets[0]

    # Update items to include the axis color
    display_items = worksheet.display_items[['ID', 'Color']]
    worksheet_items = pd.merge(worksheet_items, display_items, on='ID')

    # Get Display Range
    display_range = worksheet.display_range

    # Fetch data for the selected worksheet items
    df = spy.pull(worksheet_items,
                  start=display_range['Start'],
                  end=display_range['End'],
                  header="ID",
                  grid=None,
                  quiet=True)

    return df, worksheet_items

All columns in a Pandas Dataframe must have unique names. Nevertheless, there may be instances where we need to analyze multiple signals with identical names but originating from distinct Assets. In such cases, we retain the signal's ID as the Pandas dataframe header and create a comprehensive map of all signal information based on its ID utilizing the worksheet_items. Below is a function designed to facilitate this process.

Here is the function:

PY

def signal_info_to_dict(worksheet_items):
    """
    Creates a signal map dictionary from the worksheet items.

    Args:
        worksheet_items (pandas.DataFrame): The worksheet items DataFrame.

    Returns:
        dict: The signal map dictionary.
    """
    signal_map = worksheet_items.set_index('ID').apply(lambda x: x.to_dict(), axis=1).to_dict()
    return signal_map

This function processes a pandas DataFrame that holds details about items in a worksheet, generating a dictionary that links each item's distinct identifier (ID) to its attributes such as name and color.

A comprehensive function can be crafted to merge these components. It accepts the jupyter_notebook_url and its backup (backup_workbench) as inputs, and yields the df and signal_info essential for producing the Box Plot.

PY

def get_worksheet_data(jupyter_notebook_url, backup_workbench):
    """
    Fetches worksheet data and signal map from a Seeq server based on the provided Jupyter Notebook URL.

    Args:
        jupyter_notebook_url (str): The URL of the Jupyter Notebook.
        backup_workbench (str): The URL of a backup Seeq Worksheet with signals in the details pane.

    Returns:
        tuple: A tuple containing the fetched data DataFrame and the signal map dictionary.
    """
    workbook_id, worksheet_id = parse_jupyter_notebook_url(jupyter_notebook_url, backup_workbench)
    df, worksheet_items = create_df(workbook_id, worksheet_id)
    signal_info = signal_info_to_dict(worksheet_items)

    return df, signal_info

Be sure to use the backup_workbench to the url of a worksheet with signals on it for development.

4. Create a Function to Generate the Box Plot

Plotly will be used to generate the figure. The function generate_chart is responsible for creating the box plot. Here’s a simplified version of the function:

PY

def generate_chart(df, signal_map, jitter=1, whiskerwidth=1, show_box_points=False, title=None, height=500, width=1200):
    """
    Generate a box plot chart using Plotly.

    Args:
        df (pandas.DataFrame): DataFrame containing the data to be plotted.
        signal_map (dict): Dictionary mapping column names to asset, name, and color information.
        jitter (float): Amount of jitter to apply to the box points. Defaults to 1.
        whiskerwidth (float): Width of the whiskers. Defaults to 1.
        show_box_points (bool): Whether to show individual box points. Defaults to False.
        title (str): Title of the chart. Defaults to None.
        height (int): Height of the chart in pixels. Defaults to 500.
        width (int): Width of the chart in pixels. Defaults to 1200.

    Returns:
        plotly.graph_objects.Figure: A Plotly figure object containing the box plot chart.
    """
    fig = go.Figure()

    # Add box traces for each column in the DataFrame
    for name, series in df.items():
        fig.add_trace(go.Box(
            y=series,
            name=f"{signal_map[name]['Asset']} >> {signal_map[name]['Name']}",
            marker_color=signal_map[name]["Color"],
            boxmean="sd",
            boxpoints='all' if show_box_points else None,
            jitter=jitter,
            whiskerwidth=whiskerwidth
        ))

    # Update layout properties
    fig.update_layout(title=title, height=height, width=width)
    return fig

This function processes the input data (df), along with the signal information (signal_info), and offers multiple optional parameters to tailor the visual presentation of the plot.

8. Create the User Interface Components

To create an interactive user interface, you can use Solara. The BoxPlotComponent function sets up controls and renders the chart:

PY

@solara.component
def BoxPlotComponent():
    # State variables
    whisker_width, set_whisker_width = solara.use_state(1)
    show_data_points, set_show_data_points = solara.use_state(False)
    selected_jitter, set_jitter = solara.use_state(1)  
    error_message, set_error_message = solara.use_state(None)

    def validate_whisker_width(value):
        if value > 1:
            set_error_message("Whisker width cannot be greater than 1")
        else:
            set_error_message(None)
            set_whisker_width(value)

    def render_controls():
        with solara.HBox() as controls:
            solara.Select(
                label="Show Samples",
                values=[True, False],
                value=show_data_points,
                on_value=set_show_data_points,
            )

            if show_data_points:
                solara.Select(
                    label="Jitter",
                    values=[0, 0.1, 0.5, 1],
                    value=selected_jitter,
                    on_value=set_jitter,
                )

            solara.InputFloat(
                label="Whisker Width",
                value=whisker_width,
                on_value=validate_whisker_width,
            )

        return controls

    def render_chart():
        fig = generate_chart(
            df=df,
            jitter=selected_jitter,
            show_box_points=show_data_points,
            signal_map=signal_info,
            whiskerwidth=whisker_width,
            height=800,
            width=1200,
        )
        solara.FigurePlotly(fig, dependencies=[selected_jitter, show_data_points, whisker_width])

    with solara.Div() as main:
        with solara.AppBarTitle():
            solara.Text("Box Plot")

        if error_message:
            solara.Error(
                f"{error_message}",
                text=False,
                dense=True,
                outlined=False,
                icon=True,
            )

        render_controls()
        render_chart()

    return main

This Solara component adds controls to adjust the jitter, whisker width, and visibility of data points in the box plot.

6. Run the Component

Finally, ensure your component is displayed in the notebook:

PY

BoxPlotComponent()

This will render the interactive box plot with controls to tweak the appearance of the chart.

Making it an Add-on

Upon finalizing the notebook, the next step is to transform it into an Add-on. The most effective method to achieve this is by utilizing the User Tool Creator. To proceed, an Administrator must install this tool and grant you the necessary permissions. The Target URL for the Add-on corresponds to the Add-on Mode URL, accessible by selecting the Add-on Mode button located at the notebook's top.

Congratulations you have just created a Box Plot Add-on!

Here is everything all together in a Jupyter Notebook:

box_plot.ipynb