
FSNet may not beat the naive model #4

@HappyWalkers

Description


I have done some experiments with FSNet and it works well. However, after plotting the ground truth against FSNet's predictions, I realized that the naive model, which simply shifts the ground truth one step forward, is also a strong baseline. By shifting the truth one step, I mean using only the latest data point as the prediction for the next one. Astonishingly, the naive model beats FSNet on the ETTh2 dataset! I don't have the code and results at hand now, but roughly speaking, the MSE of the naive model on ETTh2 is 0.40 while the MSE of FSNet is 0.466. I believe the results are easy to reproduce.
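For clarity, here is a minimal sketch of the persistence baseline I mean. The function and the synthetic series are mine, not from either repo, and the random walk is just an illustration (not ETTh2):

```python
import numpy as np

def naive_forecast_mse(series: np.ndarray) -> float:
    """MSE of the one-step persistence baseline: predict y[t+1] = y[t]."""
    preds = series[:-1]    # each point is used as the forecast for the next
    targets = series[1:]
    return float(np.mean((preds - targets) ** 2))

# Synthetic example: a noisy random walk, where persistence is hard to beat.
rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=1000))
print(naive_forecast_mse(series))
```

On a random walk the baseline's MSE approaches the variance of the increments, which is the best any one-step forecaster can do.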

Another interesting observation is that the accuracy achieved by the backbone, TS2Vec, is far better than the one achieved by FSNet. For univariate forecasting on ETTh2 with a forecasting horizon of 24, TS2Vec reports an MSE of 0.090, while FSNet reports 0.687 on the same dataset and horizon. The following tables are from the FSNet and TS2Vec papers, respectively.
[Screenshot: results table from the FSNet paper]
[Screenshot: results table from the TS2Vec paper]

The gap between TS2Vec and FSNet is not surprising, because FSNet assumes streaming data and gives up batch training. However, FSNet is also beaten by the naive model, which is a little embarrassing. The naive model is strong in the online learning setting because it adapts almost instantly to abrupt change points in the time series, which keeps its prediction error low. The reason is that the naive model lags the target series by only one step, while a neural network still needs multiple gradient steps to catch up with an abrupt change. (I had a good figure to illustrate this, but it is not available now.) The network is sensitive to abrupt changes in online learning because the loss spikes when a change happens and the network is forced to adapt to reduce the error. However, the fixed step size limits the network's adaptability, leading to a dilemma between fast learning and overreaction. (I also had a good figure for this, but ...)
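The adaptation lag can be shown with a toy simulation. This is a hypothetical illustration, not FSNet: the "network" is a single scalar predictor updated by SGD with a fixed step size, compared against the persistence baseline on a series with one abrupt level shift:

```python
import numpy as np

# A series that jumps from 0 to 10 at t = 50.
series = np.concatenate([np.zeros(50), np.full(50, 10.0)])

lr = 0.1            # fixed step size: the knob behind the dilemma
w = 0.0             # scalar online model: predict y_hat = w
sgd_err, naive_err = [], []
prev = series[0]
for t in range(1, len(series)):
    y = series[t]
    sgd_err.append((w - y) ** 2)
    naive_err.append((prev - y) ** 2)
    w -= lr * 2 * (w - y)   # one SGD step on the squared error
    prev = y

# One step after the shift the naive model is already exact, while the
# SGD model is still climbing toward the new level.
print(naive_err[50], sgd_err[50])
```

Raising `lr` closes the gap faster but makes the model overreact to noise in stationary stretches, which is exactly the fast-learning vs. overreaction trade-off above.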

Good evidence for the strong naive model and this dilemma is the sequence length actually used for prediction. Although FSNet and its backbone TS2Vec claim to use multiple past time steps to predict future values, the regressor in both networks takes only the last intermediate representation as input. The statement in the TS2Vec paper is as follows:
[Screenshot: the corresponding statement from the TS2Vec paper]

The corresponding code in TS2Vec and FSNet is as follows:

https://github.com/yuezhihan/ts2vec/blob/main/tasks/forecasting.py
[Screenshot: the relevant code in tasks/forecasting.py]

https://github.com/salesforce/fsnet/blob/main/exp/exp_fsnet.py
[Screenshot: the relevant code in exp/exp_fsnet.py]
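The pattern in both snippets is roughly the following. This is my own sketch with made-up shapes and names, not the repos' code; the point is only that the regressor sees `reprs[:, -1, :]` and nothing else:

```python
import numpy as np

# batch, sequence length, representation dim, forecast horizon (all made up)
B, T, D, H = 64, 24, 8, 4
rng = np.random.default_rng(0)

reprs = rng.normal(size=(B, T, D))   # stand-in for the encoder's per-step output
targets = rng.normal(size=(B, H))    # future values to predict

last = reprs[:, -1, :]               # <-- only the last time step is kept
W, *_ = np.linalg.lstsq(last, targets, rcond=None)  # linear regressor on top
preds = last @ W                     # one H-step forecast per series
print(preds.shape)
```

So even though the encoder consumes a long window, the mapping from representation to forecast is conditioned on a single time step.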

I also tried using all the intermediate representations to predict the future values, but it turns out the accuracy is worse than predicting with only the last representation. This result breaks my assumption that a longer sequence input will always perform better. My guess is that the longer input makes the fast-adaptation behavior harder to achieve and makes the mapping from past to future harder to learn. That is probably why TS2Vec uses only the last hidden representation.
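For completeness, the variant I tried looks roughly like this (again a hypothetical sketch with invented shapes): flatten every time step's representation into one feature vector before the regressor:

```python
import numpy as np

B, T, D, H = 64, 24, 8, 4            # same made-up shapes as before
rng = np.random.default_rng(0)
reprs = rng.normal(size=(B, T, D))
targets = rng.normal(size=(B, H))

flat = reprs.reshape(B, T * D)       # (64, 192): all steps' representations
W, *_ = np.linalg.lstsq(flat, targets, rcond=None)
preds = flat @ W
print(preds.shape)
```

The regressor now has T times as many parameters to update online, which may be part of why it adapts more slowly after abrupt changes.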

The FSNet paper precisely describes the difficulties of online learning and the dilemma between fast learning and persistent memory. Batch-of-one online training is computationally efficient, but the performance could be improved further.
