[Prior Research Team, Jihyun Song]
The TadGAN algorithm, developed by an MIT research team, is reported to outperform previously known models at detecting anomalies in time series data.
As far as I know, many companies working on anomaly detection are now studying TadGAN in a range of fields, including finance, aerospace, IT, security, and medicine.
Current state-of-the-art unsupervised learning methods for anomaly detection suffer from scalability and portability issues. TadGAN, an unsupervised anomaly detection approach built on a Generative Adversarial Network (GAN), addresses these issues and behaves quite differently from existing time series analysis methods, which is why many follow-up studies are under way.
How TadGAN Works [Learning and Prediction]
During training, the model learns to reconstruct the training data. When new data arrives, it is compared against its reconstruction, an anomaly error score is calculated, and the sections where the score exceeds a threshold are flagged as anomalous.
The whole TadGAN process
1. Data Preprocessing
A sliding window divides the original time series into fixed-length signal segments, which serve as the training samples.
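As a rough illustration of the windowing step, here is a minimal sketch in plain Python. The function name `sliding_windows` and the `window_size` and `step` parameters are my own choices for this example, not names taken from the TadGAN code.

```python
def sliding_windows(series, window_size, step=1):
    """Split a sequence into overlapping fixed-length windows."""
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    # The last valid start index is len(series) - window_size.
    return [
        list(series[start:start + window_size])
        for start in range(0, len(series) - window_size + 1, step)
    ]


signal = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
windows = sliding_windows(signal, window_size=4, step=1)
# A series of 6 points with window_size=4 and step=1 gives 3 windows.
```

A smaller `step` produces more, heavily overlapping training samples; a larger one reduces the sample count and the training cost.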
2. Model training
Critic X is trained to distinguish real data from data that the generator produces from random input. Critic Z is trained to distinguish random data from the encodings that the encoder produces from real data. Once both are trained, the encoder and generator can build the reconstructed data.
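For readers who prefer a formula, the training objective can be written roughly as below. This is my reading of the TadGAN paper rather than a quote from it: E is the encoder, G the generator, C_x and C_z the two critics, P_X the data distribution, and P_Z the random (latent) distribution. Treat the notation as an assumption.

```latex
\min_{E, G} \; \max_{C_x, C_z} \; V_X(C_x, G) + V_Z(C_z, E) + V_{L2}(E, G)

V_X(C_x, G) = \mathbb{E}_{x \sim P_X}\left[ C_x(x) \right] - \mathbb{E}_{z \sim P_Z}\left[ C_x(G(z)) \right]

V_Z(C_z, E) = \mathbb{E}_{z \sim P_Z}\left[ C_z(z) \right] - \mathbb{E}_{x \sim P_X}\left[ C_z(E(x)) \right]

V_{L2}(E, G) = \mathbb{E}_{x \sim P_X}\left[ \lVert x - G(E(x)) \rVert_2 \right]
```

The first two terms are the two critic games, and the last term pushes the encoder and generator to reconstruct the input.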
3. Data reconstruction
The reconstructed data closely follows the original data.
4. Error Calculation
Calculate the difference between the actual data and the reconstructed data. The Critic score, produced by the previously trained Critic, expresses how close the given data is to real data and is also used in the error calculation.
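The error calculation could look something like the sketch below. It is a simplification under stated assumptions: the reconstruction error is a point-wise absolute difference, a higher critic score is assumed to mean "looks more real", and the weight `alpha` and the helper names `zscore` and `anomaly_scores` are hypothetical choices for this example.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize values; return zeros if they are all identical."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def anomaly_scores(actual, reconstructed, critic_scores, alpha=0.5):
    """Blend reconstruction error and critic score into one score per point."""
    if not (len(actual) == len(reconstructed) == len(critic_scores)):
        raise ValueError("all inputs must have the same length")
    rec_error = [abs(a - r) for a, r in zip(actual, reconstructed)]
    z_rec = zscore(rec_error)
    # Assumption: a low critic score means "less real", so negate it
    # to make a higher value mean "more anomalous".
    z_critic = zscore([-c for c in critic_scores])
    return [alpha * r + (1 - alpha) * c for r, c in zip(z_rec, z_critic)]


actual = [1.0, 1.0, 1.0, 5.0, 1.0]
reconstructed = [1.0, 1.1, 0.9, 1.0, 1.0]
critic = [0.9, 0.8, 0.9, 0.1, 0.9]
scores = anomaly_scores(actual, reconstructed, critic)
# The point at index 3 is both badly reconstructed and rated least real,
# so it gets the highest score.
```

Standardizing both signals first keeps either one from dominating only because of its scale.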
5. Threshold setting
Set a threshold on the error score. How far the error score exceeds the threshold indicates the degree of the anomaly.
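A minimal sketch of this step follows. It assumes a fixed threshold of the mean plus k standard deviations, which is my choice for illustration and not necessarily how the linked code sets it. The names `fixed_threshold` and `anomalous_intervals` are hypothetical.

```python
from statistics import mean, pstdev


def fixed_threshold(scores, k=3.0):
    """Assumed rule for illustration: mean + k * standard deviation."""
    return mean(scores) + k * pstdev(scores)


def anomalous_intervals(scores, threshold):
    """Return (start, end, severity) for each run of scores above threshold.

    severity is how far the peak score in the run exceeds the threshold.
    """
    intervals = []
    start = None
    for i, score in enumerate(scores):
        if score > threshold:
            if start is None:
                start = i
        elif start is not None:
            intervals.append((start, i - 1, max(scores[start:i]) - threshold))
            start = None
    if start is not None:
        # The series ended while still inside an anomalous run.
        intervals.append(
            (start, len(scores) - 1, max(scores[start:]) - threshold)
        )
    return intervals


error_scores = [0.1, 0.2, 0.1, 2.5, 2.8, 0.2, 0.1, 0.1, 3.0, 0.1]
threshold = fixed_threshold(error_scores, k=1.0)
intervals = anomalous_intervals(error_scores, threshold)
# Two runs exceed the threshold: indices 3 to 4, and index 8 alone.
```

Sorting the intervals by severity gives a simple ranking of which sections to inspect first.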
6. Anomaly detection
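Sections whose error score exceeds the threshold are reported as anomalies.
TadGAN (Time series Anomaly Detection GAN), a GAN model optimized for anomaly detection in time series data, is reported to outperform other anomaly detection models and is recognized in various fields.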
Limitations of TadGAN and Ways to Improve Performance
Performance varies greatly with the configuration, and the optimal settings differ from one dataset to another, which is a limitation of the current model.
It is also known that there is no way to evaluate model performance on unlabeled datasets whose anomalies are unknown.
One way to overcome this is to define evaluation criteria based on the distribution of the anomaly scores themselves, such as their mid-range.
Another is to artificially inject labeled anomalies into the raw data and tune the model against them, which makes it possible to optimize models for anomalies that are rare and hard to obtain.
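The idea of injecting labeled anomalies and tuning against them can be sketched as follows. Everything here is an assumption made for illustration: the anomalies are simple additive spikes, the "detector" is a plain threshold on the raw value standing in for a real anomaly score, and the names `inject_spikes`, `detect`, and `f1_score` are hypothetical.

```python
import random


def inject_spikes(series, n_anomalies, magnitude, seed=0):
    """Add spikes at random positions and return (corrupted, labels)."""
    rng = random.Random(seed)
    positions = set(rng.sample(range(len(series)), n_anomalies))
    corrupted = [
        v + magnitude if i in positions else v for i, v in enumerate(series)
    ]
    labels = [1 if i in positions else 0 for i in range(len(series))]
    return corrupted, labels


def detect(values, threshold):
    """Stand-in detector: flag every point above the threshold."""
    return [1 if v > threshold else 0 for v in values]


def f1_score(labels, predictions):
    """Point-wise F1 between known labels and predicted flags."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


clean = [0.0] * 50
corrupted, labels = inject_spikes(clean, n_anomalies=3, magnitude=5.0)

# Pick the candidate setting that best recovers the injected anomalies.
candidates = [1.0, 10.0]
best_threshold = max(
    candidates, key=lambda t: f1_score(labels, detect(corrupted, t))
)
```

Because the injected positions are known, any setting can be scored this way even when the original data has no labels.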
I used to find time series data analysis difficult, and I hope that TadGAN leads to further developments in the field.
- Code link: https://github.com/signals-dev/Orion
- Reference: markr 2021 online seminar