Time Series Anomaly Detection with Reinforcement Learning
Yonsei University Artificial Intelligence, 9th, 이상민
Summer, 2022
I thought time series anomaly detection and the sparse reward problem in reinforcement learning were analogous. Most cases (time stamps) are not anomalies, so if the agent receives rewards only at anomaly points, time series anomaly detection becomes a sparse reward problem. I tried the Intrinsic Curiosity Module (ICM), which uses intrinsic rewards to address the sparse reward problem. In the beginning, to turn the time series anomaly detection task into a sparse reward problem, I approximated the TN reward and FP reward to zero, which just sent positive (+
So I had to re-approach the problem. I moved the encoder out of the ICM and shared it between the DQN and the ICM. The encoder (I use an LSTM) is trained with the inverse model in the ICM by supervised learning and learns to extract proper features. With these features, the Q-function in the DQN approximates the Q-value better.
I use two buffers: an Anomalous buffer and a Normal buffer. The agent's anomalous experiences are memorized in the Anomalous buffer and its normal experiences in the Normal buffer. When training, the agent samples a batch(
- Easy to control: Reward
  - If you need to never miss anomalies, just give a larger negative reward to False Negatives
- Sampling anomaly data as much as you want: Replay Buffer
  - One of the reasons the anomaly detection problem is so hard is the excessively unbalanced training data. By memorizing anomalous and normal experiences separately, we can control the ratio of anomaly data in a batch when training.
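The controllable-reward idea above can be sketched as a small extrinsic reward function. The specific reward values and the function name here are illustrative assumptions, not the project's actual constants; the point is that a False Negative can simply be weighted more negatively than a False Positive.

```python
# Illustrative reward table (values are assumptions, not the project's
# actual constants). Action/label convention: 1 = anomaly, 0 = normal.
REWARDS = {
    "TP":  1.0,   # correctly flagged anomaly
    "TN":  0.0,   # correctly ignored normal point (approximated to zero)
    "FP": -1.0,   # false alarm
    "FN": -5.0,   # missed anomaly: punished hardest, per the bullet above
}

def extrinsic_reward(action: int, label: int) -> float:
    """Extrinsic reward for one time stamp."""
    if action == 1 and label == 1:
        return REWARDS["TP"]
    if action == 0 and label == 0:
        return REWARDS["TN"]
    if action == 1 and label == 0:
        return REWARDS["FP"]
    return REWARDS["FN"]
```

Making the FN entry more negative is all it takes to bias the agent toward never missing anomalies, at the cost of more false alarms.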
- LSTM used as encoder
- Actions taken by an eps-greedy policy over Q-values
- Total Reward = Intrinsic Reward (agent's prediction error) + Extrinsic Reward (environment)
- Encoder trained with the inverse model in the ICM
- Experiences stored in two buffers: anomalous experiences in one, normal experiences in the other
- Anomaly data and normal data sampled from each buffer
- batch = $\alpha \times$ batch size (anomalous) $+ (1-\alpha) \times$ batch size (normal), where $\alpha$ is the anomaly sample ratio that you want
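The two-buffer sampling above can be sketched as follows. The class and method names are assumptions for illustration; only the batch composition rule ($\alpha \times$ batch size anomalous, $(1-\alpha) \times$ batch size normal) comes from the text.

```python
import random
from collections import deque

class DualReplayBuffer:
    """Sketch of the two-buffer scheme (names are hypothetical): anomalous
    and normal transitions are stored separately so the anomaly ratio of
    each training batch can be fixed regardless of the data imbalance."""

    def __init__(self, capacity: int = 10_000):
        self.anomalous = deque(maxlen=capacity)
        self.normal = deque(maxlen=capacity)

    def store(self, transition, is_anomaly: bool) -> None:
        # Route each experience to its buffer by its label.
        (self.anomalous if is_anomaly else self.normal).append(transition)

    def sample(self, batch_size: int, alpha: float) -> list:
        """batch = alpha * batch_size anomalous + (1 - alpha) * batch_size normal."""
        n_anom = int(alpha * batch_size)
        n_norm = batch_size - n_anom
        batch = (random.sample(list(self.anomalous), min(n_anom, len(self.anomalous)))
                 + random.sample(list(self.normal), min(n_norm, len(self.normal))))
        random.shuffle(batch)  # avoid anomalous/normal ordering in the batch
        return batch
```

With `alpha = 0.5`, half of every batch is anomalous even if anomalies are under 1% of the raw data.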
F1-Score: Harmonic mean of Precision and Recall
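The metric can be computed directly from the confusion counts; a minimal sketch (the function name is hypothetical, the formula is the standard one):

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of precision and recall, from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```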
Super-state
- Yahoo A1 Benchmark
  Real (traffic to Yahoo services): time series representing the metrics of various Yahoo services.
- Yahoo A2 Benchmark
  Synthetic (simulated)
- AIOps KPI
  Labeled time series anomaly detection datasets from the AIOps Challenge
```
python test.py
```
```
dataset/
├── A1Benchmark/
│   └── real_#.csv
├── A2Benchmark/
│   └── synthetic_#.csv
└── AIOps/
    └── KPI.csv
datasets/
├── KPI.py
├── Yahoo.py
└── build_data.py
util/
├── ExperienceReplay.py
├── metric.py
└── sliding_window.py
models/
├── agent.py
├── env.py
└── model.py
pretrained/
└── Super-state
main.py
test.py
config.py
```