Author: Michael Kirschneck
Earliest Software Version: 0.3.2
Date: 2024-02-21

Algorithm Logic

The algorithm logic can be summarized as follows:

The alarm computes the mean for each of the sensors in the site. It then runs an ML outlier detection on the data of the last 7 days. Each turbine is a sample and the features are the 10-minute data of the sensor. The outlier algorithm checks the time series of each sensor and compares it to that of the neighboring turbines.

Each outlier is a potential alarm. The algorithm filters out the potential alarms that do not meet several criteria, such as duration and severity. The remaining outliers are returned as alarms.

Function algorithm_run:

The algorithm computes the site average for each of the sensors it analyzes. In this step it also looks for stuck values: a sensor is flagged as stuck if its value does not change for 10 consecutive datapoints.
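The two steps above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the helper names `is_stuck` and `site_mean` are hypothetical, the series are assumed to be equally long and aligned in time, and "does not change for 10 consecutive datapoints" is read here as 10 identical readings in a row.

```python
from statistics import mean

STUCK_RUN_LENGTH = 10  # from the text: 10 consecutive unchanged datapoints


def is_stuck(values, run_length=STUCK_RUN_LENGTH):
    """Return True if the same value occurs `run_length` times in a row.

    Assumption: "unchanged" means exactly equal consecutive readings.
    """
    run = 1
    for prev, curr in zip(values, values[1:]):
        run = run + 1 if curr == prev else 1
        if run >= run_length:
            return True
    return False


def site_mean(series_by_turbine):
    """Average one sensor across all turbines at each timestamp.

    Assumption: every turbine provides an equally long, time-aligned series.
    """
    return [mean(column) for column in zip(*series_by_turbine.values())]
```

A stuck sensor would typically be excluded or flagged before the site average is used as a reference, but the text does not say which of the two the algorithm does.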

The algorithm iterates over the last 7 days of data (the window can be set by a threshold).

For each of these iterations, the algorithm calls the function issue_detection.

In the function issue_detection the algorithm:

  1. Iterates over each sensor that it analyzes
  2. Adds the related columns for each sensor to the data set
  3. Calls the function alarm_generation

The function alarm_generation calls several algorithms that determine if an alarm is present. The names used in the code are confusing and have nothing to do with the actual logic.

There are four types of alarms:

The "blade motor" alarms are created in the function alarm_blade_motor. The second type of alarm checks for values out of static bounds. This is done in the alarm_limit function.

The third alarm check is the "low_alarm", which is created in the function low_temp_alarm. The fourth alarm type is not in a separate function but relies on an ML-based outlier detection. The outlier detection is done in the function abnormal_func, which is also used by the other algorithms.

After the potential alarms are identified, they go through a filter process in the function get_alarm_turbines_sensor. The filtered alarms are returned by the function alarm_generation.

The function low_temp_alarm groups the turbines in the site by turbine technology, then runs an outlier detection by calling abnormal_func. If no alarm is found, the outlier detection is repeated with the so-called "related_columns": additional values from other sensors related to the sensor currently being analyzed.

The function abnormal_func performs the outlier detection. The samples are the data from the various turbines in the site for the sensor being analyzed. The outlier detection thus identifies the turbine that shows abnormal values for that sensor over the last 7 days. The function returns the anomaly scores created by either the isolation forest method or the LocalOutlierFactor method.
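As an illustration of this kind of outlier detection, the sketch below scores turbines with scikit-learn. It is not the body of abnormal_func: the function name `anomaly_scores`, the `method` switch, the fixed `random_state` and the choice of `n_neighbors` are assumptions made for the example. Each row of the input is one turbine and each column is one 10-minute value.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor


def anomaly_scores(X, method="isolation_forest", random_state=0):
    """Return one anomaly score per turbine; lower means more abnormal.

    X: array-like of shape (n_turbines, n_values), one row per turbine.
    """
    X = np.asarray(X, dtype=float)
    if method == "isolation_forest":
        model = IsolationForest(random_state=random_state).fit(X)
        return model.score_samples(X)
    # Assumption: use roughly half of the other turbines as neighbours so
    # that a single abnormal turbine does not dominate every neighbourhood.
    n_neighbors = min(20, max(1, (len(X) - 1) // 2))
    model = LocalOutlierFactor(n_neighbors=n_neighbors).fit(X)
    return model.negative_outlier_factor_
```

With both methods the most negative score marks the turbine that differs most from the rest of the site, so a caller could rank turbines by score and apply the later duration and severity filters to the lowest-ranked ones.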

The function get_alarm_turbines_sensor goes through the outliers detected in the preceding functions and adds the data that make up an alarm (date_occurred, issue_detection_date, etc.). It filters out the alarms for which the deviation from the mean of the sensor value either does not last long enough or is not significant enough. To do that, it calls get_deviation_hc.

The function get_deviation_hc filters out alarms based on the severity of their deviation. That includes the duration as well as the size of the deviation.

Anomaly detection

These include blade motor alarms, low alarms and high alarms

Check the algorithm:

  1. Checks first for alarms of the blade motor by calling get_alarm_turbines_sensor

  2. Checks for low alarms by calling low_temp_alarm

  3. Checks for high and low alarms

  4. Calls the function get_alarm_turbines_sensor

  5. Returns the alarms and the scores

  • The algorithm iterates over every day in the given time period and works with the data of the last 7 days

  • Then it iterates over every sensor

  • For each sensor related columns are added to the data set that is used for alarm identification

  • Then it goes through several identification methods. Not all identification methods are used for all sensors

    • Blade motor
    • Low alarm
    • High alarm
    • Over-limit identification

Low sensor alarm

  • The algorithm groups the data by turbine type

  • For each turbine type an outlier detection is done, using either isolation forest or LocalOutlierFactor. scikit-learn is used for this

  • In case the first outlier detection could not find anything the outlier detection is executed on the related data

  • The outlier detection treats every turbine as a sample and every 10-minute value of the day as a feature. Each turbine is thus compared to the values of the other turbines on the same day

  • The outlier detection is skipped if fewer than 5 turbines are available and less than 30 percent of the data in the 7 days currently used is available

Over-limit identification

  • The algorithm compares the data with static alarm limits
  • The alarm limits are set in the threshold json file
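A static limit check of this kind could look like the sketch below. The JSON layout, the sensor name `gearbox_oil_temp`, the keys `low` and `high` and the function name `over_limit` are all hypothetical, since the structure of the threshold json file is not described here.

```python
import json

# Hypothetical excerpt of a threshold file; the real key names may differ.
THRESHOLDS_JSON = """
{"gearbox_oil_temp": {"low": 10.0, "high": 80.0}}
"""


def over_limit(values, limits):
    """Return the indices of datapoints outside the static [low, high] bounds."""
    return [
        i for i, v in enumerate(values)
        if v < limits["low"] or v > limits["high"]
    ]


limits = json.loads(THRESHOLDS_JSON)["gearbox_oil_temp"]
print(over_limit([55.0, 81.5, 9.0, 70.0], limits))  # [1, 2]
```

Values exactly on a limit are treated as within bounds in this sketch; whether the real check is inclusive or exclusive is not stated.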

High alarm

  • Uses the same outlier detection as the low alarm but does not use the related sensors

Afterwards

  • The identified anomalies are rejected based on various conditions

  • The anomalies have to be active for 60 % of the data of a week (can be changed in thresholds by changing threshold_percent_abnormal)

  • An anomaly is defined by a deviation from the mean of the site by more than 8 deg (can be changed in thresholds by changing threshold_temp)

  • In case the above is satisfied, some more tests are done

  • The alarm has to be active today. That is determined by a rolling average, which needs at least 15 data points in a day. That is 15 hours in which the mean deviation has to be higher or lower than the mean, in the same direction as the 7-day average deviation
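The first two rejection conditions can be sketched as follows. This is only an illustration under assumptions: the function name `passes_weekly_filter` is hypothetical, the defaults mirror the values quoted for threshold_temp and threshold_percent_abnormal, the deviation is taken as an absolute difference, and "active for 60 %" is read as "at least 60 %". The "active today" rolling-average check is not covered.

```python
def passes_weekly_filter(turbine_values, site_mean_values,
                         threshold_temp=8.0,
                         threshold_percent_abnormal=60.0):
    """Return True if enough of the week's datapoints deviate from the site mean.

    A datapoint counts as abnormal when it differs from the site mean by
    more than `threshold_temp` (assumption: in either direction).
    """
    deviations = [t - m for t, m in zip(turbine_values, site_mean_values)]
    if not deviations:
        return False
    abnormal = sum(1 for d in deviations if abs(d) > threshold_temp)
    percent_abnormal = 100.0 * abnormal / len(deviations)
    return percent_abnormal >= threshold_percent_abnormal
```

For example, a turbine that runs 9 degrees above the site mean for 6 out of 10 datapoints passes this filter, while the same deviation for only 5 out of 10 datapoints is rejected.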