Partial Least Squares (PLS)

What is Partial Least Squares (PLS)?

Imagine you're running a water treatment plant and want to understand how various operational parameters (like pH levels, flow rates, and chemical dosages) influence the quality of the treated water (e.g., clarity and contaminant levels). Partial Least Squares (PLS) helps by identifying the key relationships between these inputs and outputs, simplifying complex interactions into actionable insights.

PLS is a statistical method that models the relationship between two datasets by constructing new features, called components, that summarize the most relevant information in both. Each component is chosen to maximize the covariance between the predictor variables (e.g., operational parameters) and the response variables (e.g., water quality metrics), so it captures variation in the predictors that is also informative about the responses.
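
To make the idea of a component concrete, here is a minimal NumPy sketch of the first PLS component for a single response. It is an illustration only and not the tool's implementation: the function name `first_component` and the data are made up. The weight vector points along the cross-product of the centered predictors and the centered response, which is the unit direction whose scores have the largest covariance with the response.

```python
import numpy as np

def first_component(X, y):
    """Return (weights, scores) of the first PLS component for one response."""
    Xc = X - X.mean(axis=0)          # center each predictor
    yc = y - y.mean()                # center the response
    w = Xc.T @ yc                    # direction of maximum covariance with y
    w = w / np.linalg.norm(w)        # normalize to unit length
    t = Xc @ w                       # scores: one new feature value per sample
    return w, t

# Hypothetical data: three correlated "operational parameters", one response.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
X = np.column_stack([base + 0.1 * rng.normal(size=200) for _ in range(3)])
y = 2.0 * base + 0.1 * rng.normal(size=200)

w, t = first_component(X, y)
cov_pls = np.cov(t, y)[0, 1]         # covariance between the scores and y
```

Any other unit-length combination of the three inputs gives scores whose covariance with `y` is no larger than `cov_pls`.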

How PLS Works for Analyzing Data in Industrial Applications

PLS is especially useful when dealing with datasets where the predictors and responses are numerous and highly correlated. It simplifies the data into components that capture the most meaningful relationships, making it easier to interpret and predict outcomes.

Example: Imagine monitoring a water treatment plant's operations. PLS can identify which parameters (e.g., flow rate, chemical dosage) most strongly affect water quality metrics like turbidity or contaminant removal, helping operators fine-tune the process for better results.

Using the Partial Least Squares Tool

Input Signals: From the dropdown menu, select at least one signal to serve as independent variable(s) for the model.

Training window: Select the fixed window of data on which the model will be trained. This defaults to the display range but can be modified.

Advanced Options:

Limit to condition within training window: You can limit the data supplied for training to data that falls within both the training window and a condition. For example, if you choose a condition that identifies when an operation is running, all data outside that condition (when the unit is not running) is ignored during model training.
Restrict output to be within a condition: You can limit the displayed results to periods within a relevant condition. If you limit the model to periods when the unit is running, you would likely want to restrict the output to the same running condition. You can use a different condition here than the one used for training if appropriate.
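
Conceptually, limiting training to a condition amounts to keeping only the samples that fall inside both the training window and one of the condition's periods. The sketch below illustrates that logic with NumPy. The function `training_mask`, the representation of a condition as a list of (start, end) pairs, and the inclusive boundaries are all assumptions made for illustration; they do not describe how the tool stores or evaluates conditions.

```python
import numpy as np

def training_mask(timestamps, window, periods):
    """True where a sample is inside the window AND inside any period."""
    start, end = window
    in_window = (timestamps >= start) & (timestamps <= end)
    in_condition = np.zeros(timestamps.shape, dtype=bool)
    for period_start, period_end in periods:
        in_condition |= (timestamps >= period_start) & (timestamps <= period_end)
    return in_window & in_condition

timestamps = np.arange(10)            # hypothetical sample times
running = [(0, 3), (6, 9)]            # hypothetical "unit is running" periods
mask = training_mask(timestamps, window=(2, 8), periods=running)
kept = timestamps[mask]               # samples that would be used for training
```

Samples at times 4 and 5 fall inside the window but outside the running periods, so they are dropped.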

Number of components: Defaults to 2. The value cannot exceed the number of input signals, and too many components lead to overfitting. If input signals are removed, the number of components is automatically reduced so that it does not exceed the number of signals. Consult rSquared in Model Properties to see how changing the number of components affects the model. To determine a reasonable number of components, apply the model to data that was not used for training and use the Scatterplot to plot actual versus predicted Target signals. A good fit between actual and predicted values on that unseen data indicates a robust model.
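
The same check can be sketched outside the tool. The example below fits a single-response PLS model with NumPy for each candidate number of components and scores it on held-out data. It is a rough illustration under stated assumptions: `pls1_fit` and `r_squared` are hypothetical helpers, the data is synthetic, scaling is omitted for brevity, and the algorithm shown is a textbook variant that may differ from what the tool runs.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit single-response PLS; return (coef, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean          # residuals, deflated each round
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w = w / np.linalg.norm(w)            # weight vector
        t = Xr @ w                           # scores
        tt = t @ t
        p = Xr.T @ t / tt                    # X loadings
        c = yr @ t / tt                      # y loading
        Xr = Xr - np.outer(t, p)             # remove what this component explains
        yr = yr - c * t
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # coefficients for centered X
    return coef, x_mean, y_mean

def r_squared(actual, predicted):
    """Coefficient of determination between actual and predicted values."""
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic data: four inputs, two of them nearly collinear.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=300)
y = X @ np.array([1.0, 1.0, -2.0, 0.5]) + 0.2 * rng.normal(size=300)

X_train, y_train = X[:200], y[:200]
X_test, y_test = X[200:], y[200:]            # data the model never saw

holdout_r2 = {}
for k in range(1, X.shape[1] + 1):
    coef, x_mean, y_mean = pls1_fit(X_train, y_train, k)
    predicted = (X_test - x_mean) @ coef + y_mean
    holdout_r2[k] = r_squared(y_test, predicted)
    print(k, round(holdout_r2[k], 3))
```

A reasonable choice is the smallest number of components beyond which the held-out score stops improving noticeably.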

Additional training configuration:

Maximum number of iterations: Defaults to 500. PLS is an iterative algorithm in which each component is the result of a number of iterations. A solution is commonly found in fewer than 500 iterations. Increasing the maximum number of iterations increases training time but can yield slightly better results.

Tolerance: Defaults to 1e-06. The tolerance is the convergence criterion for the power method: the algorithm stops when the change between successive results falls below a value proportional to the tolerance.
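
To show how the maximum number of iterations and the tolerance interact, here is a hedged NumPy sketch of a power-method style inner loop that finds the first weight vector when there are several responses. The function name `first_weights_power_method` and the exact stopping rule (squared change in the weights below `tol`) are assumptions chosen for illustration; the tool's own criterion may differ in detail.

```python
import numpy as np

def first_weights_power_method(X, Y, max_iter=500, tol=1e-06):
    """Iterate to the first X weight vector; return (weights, n_iter)."""
    u = Y[:, 0].copy()                       # start from the first response column
    w_old = np.zeros(X.shape[1])
    for n_iter in range(1, max_iter + 1):
        w = X.T @ u
        w = w / np.linalg.norm(w)            # X weights, unit length
        t = X @ w                            # X scores
        c = Y.T @ t / (t @ t)                # Y weights
        u = Y @ c / (c @ c)                  # Y scores
        change = w - w_old
        if change @ change < tol:            # converged: weights stopped moving
            break
        w_old = w
    return w, n_iter

# Synthetic, centered data: four inputs and two responses.
rng = np.random.default_rng(2)
X = rng.normal(size=(150, 4))
Y = np.column_stack([3.0 * X[:, 0], 0.5 * X[:, 1]]) + 0.1 * rng.normal(size=(150, 2))
X = X - X.mean(axis=0)
Y = Y - Y.mean(axis=0)

w, n_iter = first_weights_power_method(X, Y)
print(n_iter)                                # iterations actually used
```

A smaller tolerance or a less clearly dominant relationship in the data makes the loop run longer, which is when the iteration cap starts to matter.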

Scale: Defaults to true. By default, all variables are centered and scaled to unit variance by subtracting the signal average from each data point and dividing by the signal standard deviation. This is an appropriate pre-treatment for process data because it gives every input signal an equal chance to influence the model, regardless of its variance or numerical range.

Unchecking this box can be appropriate in some cases, for example when all input signals have the same units and are already expressed relative to a shared reference other than the signal center.
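
The centering and scaling step can be sketched in a few lines of NumPy. The function name `center_and_scale`, the example values, and the use of the sample standard deviation (`ddof=1`) are assumptions for illustration; the page does not state which standard deviation estimator the tool uses.

```python
import numpy as np

def center_and_scale(signals):
    """Subtract each column's mean and divide by its standard deviation."""
    mean = signals.mean(axis=0)
    std = signals.std(axis=0, ddof=1)        # sample standard deviation (assumed)
    return (signals - mean) / std

# Hypothetical inputs in very different units: pH and flow rate.
ph = np.array([6.8, 7.0, 7.2, 7.1, 6.9])
flow = np.array([1200.0, 1500.0, 1350.0, 1600.0, 1250.0])
raw = np.column_stack([ph, flow])
scaled = center_and_scale(raw)
```

Before scaling, the flow column's variance is many orders of magnitude larger than the pH column's and would dominate the components; after scaling, both columns have zero mean and unit variance.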

Use Cases for Partial Least Squares in Key Industries

Manufacturing and Production Optimization
PLS helps manufacturers understand and improve the relationships between process parameters and product quality:

Process Control: Identifying which process variables (e.g., temperature, pressure) most impact product consistency.
Yield Optimization: Predicting and maximizing production output by adjusting key variables.

Pharmaceutical Development and Quality Control
In pharmaceutical environments, PLS enables the development and control of robust manufacturing processes:

Formulation Development: Linking ingredient properties with drug efficacy to optimize formulations.
Quality Assurance: Detecting process variations that may lead to out-of-specification products.

Chemical and Petrochemical Processes
PLS supports complex systems with many interacting variables by isolating the critical ones:

Process Optimization: Pinpointing which variables have the greatest impact on efficiency or yield in refining or chemical synthesis.
Emissions Reduction: Understanding the relationships between input variables and pollutant levels to minimize environmental impact.

Semiconductor Manufacturing
PLS helps in improving process stability and product reliability:

Defect Analysis: Linking sensor data with defect rates to identify root causes of failures.
Recipe Optimization: Relating process recipes to chip performance metrics for higher yields.

Energy and Utilities
PLS supports predictive analysis and optimization in energy generation and distribution:

Power Plant Efficiency: Identifying how changes in fuel composition or operating conditions impact efficiency and emissions.
Water Treatment Optimization: Relating operational parameters to water quality metrics, enabling fine-tuned processes for better results.

Benefits of Partial Least Squares

Handles Multicollinearity: PLS is robust when predictors are highly correlated, which is common in industrial data.
Dimension Reduction: It reduces the complexity of large datasets by creating fewer, meaningful components.
Predictive Power: PLS provides interpretable models that predict outcomes based on key relationships in the data.

A Few Limitations of PLS

While PLS is a powerful tool, it has some limitations:

Interpretability of Components: The new features (components) are combinations of the original variables, which may require extra effort to interpret.
Sensitivity to Noise: If the data contains noise, PLS components may capture irrelevant patterns, affecting predictions.
Requires Tuning: The number of components to include must be carefully selected to avoid underfitting or overfitting the model.

Notes on training the model

The data from the input signals that goes into training is automatically gridded. This means that the samples from all signals are aligned to the same sampling rate across the training window: each input signal is resampled so that all signals have the same number of samples in the training window.

The signals may get down-sampled if the total number of samples in the training window is greater than 2.5 million.
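
As an illustration of what gridding means, the sketch below resamples two irregularly sampled signals onto one evenly spaced grid. Linear interpolation via `np.interp`, the function name `grid_signals`, the signal names, and the grid period are all assumptions; this page does not specify the interpolation method or grid spacing the tool uses.

```python
import numpy as np

def grid_signals(signals, start, end, period):
    """Interpolate each (times, values) signal onto one evenly spaced grid."""
    grid = np.arange(start, end + period, period)
    grid = grid[grid <= end]                 # keep the grid inside the window
    gridded = {name: np.interp(grid, times, values)
               for name, (times, values) in signals.items()}
    return grid, gridded

# Hypothetical signals sampled at different, irregular times.
signals = {
    "flow_rate": (np.array([0.0, 10.0, 20.0]), np.array([1.0, 2.0, 3.0])),
    "dosage": (np.array([0.0, 5.0, 15.0, 20.0]), np.array([10.0, 20.0, 40.0, 50.0])),
}
grid, gridded = grid_signals(signals, start=0.0, end=20.0, period=5.0)
```

After gridding, every signal has one value per grid point, so the rows of the training matrix line up in time.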
