29-31 Oct 2019 Nantes (France)
Training data requirements for SCADA based condition monitoring using artificial neural networks
Simon Letzgus 1
1 : Institute of Software Engineering and Theoretical Computer Science, Technical University of Berlin

SCADA data analytics has attracted considerable research interest as a way to monitor wind turbine condition without additional equipment (see [1]). In particular, normal behaviour models based on artificial neural networks (ANNs) have shown promising results [2]. In this approach, an ANN is trained to predict SCADA parameters under healthy turbine conditions. Its predictions are then compared to measured values, and deviations that exceed a certain threshold indicate a component malfunction. One of the crucial requirements of this approach is the availability of sufficient training data. The training data should cover as wide a range of operating conditions as possible while still representing normal turbine behaviour. Insufficient training data leads to frequent false alarms, which strongly impairs the anomaly detection capabilities of the approach. However, the question of how much training data is actually required to train robust and reliable normal behaviour models has not been addressed in the literature so far. In fact, contradictory statements regarding the required training data can be found. While some authors train their models with only a few months of data (for example [3,4,5]), others argue that at least one year of training data is required in order to capture seasonal effects [2]. This contribution aims to derive guidelines for selecting an appropriate amount of data for training ANNs for SCADA based condition monitoring. The findings will then be used to derive robust training strategies for SCADA based condition monitoring using ANNs.
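The normal behaviour approach described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, the signal names and the 3-sigma threshold are hypothetical, and a least-squares fit stands in for the ANN to keep the example short. It is not the model used in this contribution.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_scada(n, fault_offset=0.0):
    """Synthetic stand-in for SCADA data (hypothetical signals, not real turbine data)."""
    power = rng.uniform(0.0, 1.0, n)        # normalised active power
    ambient = rng.uniform(-5.0, 25.0, n)    # ambient temperature in deg C
    # Assumed relationship: component temperature rises with load and ambient temperature.
    temp = 40.0 + 30.0 * power + 0.8 * ambient + rng.normal(0.0, 1.0, n) + fault_offset
    return np.column_stack([power, ambient]), temp

def fit_normal_model(X, y):
    """Least-squares fit, used here in place of an ANN (assumption for brevity)."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef

# 1) Train on data from a period assumed to be healthy.
X_train, y_train = make_scada(2000)
coef = fit_normal_model(X_train, y_train)

# 2) Derive an alarm threshold from the training residuals (assumed: 3 standard deviations).
threshold = 3.0 * np.std(y_train - predict(coef, X_train))

# 3) Compare predictions with measurements on new data and flag large deviations.
X_ok, y_ok = make_scada(500)                       # healthy operation
X_bad, y_bad = make_scada(500, fault_offset=8.0)   # simulated overheating component

alarm_rate_ok = float(np.mean(np.abs(y_ok - predict(coef, X_ok)) > threshold))
alarm_rate_bad = float(np.mean(np.abs(y_bad - predict(coef, X_bad)) > threshold))
print(f"alarm rate healthy: {alarm_rate_ok:.3f}, faulty: {alarm_rate_bad:.3f}")
```

In this toy setting the alarm rate on healthy data corresponds to the false alarm rate, while the alarm rate on the offset data indicates how reliably the simulated malfunction is detected.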

The amount of required training data certainly depends on the model architecture and the selected input parameters. Therefore, a selection of representative model architectures will be trained on training periods of varying length. Model performance will then be evaluated for both a healthy and a faulty turbine. In the healthy turbine case, models can be compared based on common error measures, such as the root mean squared error. Moreover, it can be studied how the length of the training period affects model robustness against false alarms. Applying the models to a turbine with component problems makes it possible to study how the length of the training period affects the model's capability to detect malfunctions. First results based on recurrent neural networks suggest that longer training periods do not always lead to smaller ANN errors. The implications of this counterintuitive insight will be analysed and presented. Moreover, it was observed that the time between the training and application periods might play a more important role than the absolute length of the training data set. Therefore, a strategy based on periodic model re-training might be preferable to a single static training at the beginning of the condition monitoring period. However, these results have to be confirmed with additional simulations that are currently running.
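The following sketch shows how such a comparison of training lengths and of static versus periodic re-training could be set up. It rests on strong assumptions: the data are synthetic, a slow drift in the healthy baseline is built in by hand, a least-squares fit replaces the ANN, and all names (`static_rmse`, `retrained_rmse`, the 30-day re-training step) are hypothetical. The numbers it produces are a consequence of the assumed drift and do not reproduce the results of this contribution.

```python
import numpy as np

rng = np.random.default_rng(1)

n_days = 720
day = np.arange(n_days)
power = rng.uniform(0.0, 1.0, n_days)   # normalised daily mean power (synthetic)
# Assumed slow drift of the healthy baseline over time (hypothetical).
temp = 40.0 + 30.0 * power + 0.01 * day + rng.normal(0.0, 1.0, n_days)

def fit(x, y):
    """Least-squares fit used in place of an ANN (assumption for brevity)."""
    A = np.column_stack([x, np.ones(len(x))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def sq_errors(coef, x, y):
    return (y - (coef[0] * x + coef[1])) ** 2

test_start = 360   # first day of the application period

def static_rmse(train_len):
    """Train once on the train_len days before test_start, apply to all later days."""
    tr = slice(test_start - train_len, test_start)
    te = slice(test_start, n_days)
    coef = fit(power[tr], temp[tr])
    return float(np.sqrt(np.mean(sq_errors(coef, power[te], temp[te]))))

def retrained_rmse(train_len, step=30):
    """Re-train every `step` days on the most recent train_len days."""
    errs = []
    for start in range(test_start, n_days, step):
        tr = slice(start - train_len, start)
        te = slice(start, min(start + step, n_days))
        coef = fit(power[tr], temp[tr])
        errs.append(sq_errors(coef, power[te], temp[te]))
    return float(np.sqrt(np.mean(np.concatenate(errs))))

# RMSE on the application period for several training lengths (in days).
results = {length: static_rmse(length) for length in (30, 90, 180, 360)}
rmse_static = static_rmse(90)
rmse_retrained = retrained_rmse(90)
print(results, rmse_static, rmse_retrained)
```

Because the assumed baseline drifts, older training data are less representative of the application period in this toy example, so the longest training window does not give the smallest error and periodic re-training reduces the error compared with a single static training.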

References:
 [1] Tautz-Weinert and Watson, IET Renewable Power Generation, 11, 382 (2017)
 [2] Bangalore et al., Wind Energy, 20, 1421 (2017)
 [3] Zaher et al., Wind Energy, 12, 574 (2009)
 [4] Kusiak and Verma, Renewable Energy, 48, 110 (2012)
 [5] Brandão and Carvalho, PowerTech Eindhoven (2015)

