
LSTM Timeseries prediction example #1532

Draft · wants to merge 16 commits into main

Conversation

@NicoZweifel (Contributor) commented Mar 26, 2024

Checklist

  • Confirmed that run-checks all script has been executed.
  • Made sure the book is up to date with changes in this PR.

Changes

  • Adds a timeseries forecasting example using the LSTM that was added in Feat/lstm #370, based on a partial dataset from Huggingface.
  • The dataset is limited to 10000 entries at the moment; training on the full dataset still seems buggy. I am not sure whether the normalization is off or whether there is a memory limitation or a bug in the SqliteDataset.

I have not narrowed it down yet, since the custom datasets in my other burn project work fine (in-memory, with data from alphavantage). I may need to spend some more time on it, but since it doesn't block my other goals and the example works with 10000 entries, I thought I could publish this as a draft for now.
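
For reference, the basic idea of framing a univariate series for LSTM forecasting, independent of the actual dataset types used in this PR, looks roughly like this (an illustrative plain-Rust sketch only, with hypothetical names):

// Illustrative sketch (not the code in this PR): frame a univariate series
// into (input window, next-value target) pairs for forecasting, limiting the
// series to its first 10000 entries as the example currently does.
fn make_windows(series: &[f64], window: usize, limit: usize) -> Vec<(Vec<f64>, f64)> {
    let series = &series[..series.len().min(limit)];
    series
        .windows(window + 1)
        .map(|w| (w[..window].to_vec(), w[window]))
        .collect()
}

fn main() {
    let series: Vec<f64> = (0..20).map(|i| i as f64).collect();
    let samples = make_windows(&series, 4, 10000);
    // Each sample is ([x_{t-3}, x_{t-2}, x_{t-1}, x_t], x_{t+1}).
    println!("{:?}", samples.first());
}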

Testing

cargo run --example lstm --features tch-cpu


codecov bot commented Mar 26, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 86.31%. Comparing base (7705fd9) to head (9dedb3f).
Report is 59 commits behind head on main.

Current head 9dedb3f differs from pull request most recent head 8a8b57f

Please upload reports for the commit 8a8b57f to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1532      +/-   ##
==========================================
- Coverage   86.38%   86.31%   -0.07%     
==========================================
  Files         693      683      -10     
  Lines       80473    78091    -2382     
==========================================
- Hits        69519    67408    -2111     
+ Misses      10954    10683     -271     


@nathanielsimard (Member) left a comment

Nice example, ping me when you think it's going to be ready for a review.

@wcshds (Contributor) commented Mar 28, 2024

@nathanielsimard Hello, I have always been interested in the LSTM implementation in burn. I still think the LSTM is buggy right now: if a linear layer is added after the LSTM, the parameters of the LSTM and of all layers before it are not updated during training. I've been stuck on this problem for a long time.

The example in this PR further confirms that the LSTM has problems. I added some code in training.rs to check the model's parameters before and after training:

// Record the parameters before training.
let pjr = PrettyJsonFileRecorder::<FullPrecisionSettings>::new();
model.input_layer.clone().save_file("./input-before.json", &pjr).unwrap();
model.lstm.clone().save_file("./lstm-before.json", &pjr).unwrap();
model.output_layer.clone().save_file("./output-before.json", &pjr).unwrap();

// ...... training runs in between ......

// Record the parameters after training.
model_trained.input_layer.clone().save_file("./input-after.json", &pjr).unwrap();
model_trained.lstm.clone().save_file("./lstm-after.json", &pjr).unwrap();
model_trained.output_layer.clone().save_file("./output-after.json", &pjr).unwrap();

After training, only the parameters of the output_layer had changed. That said, for the dataset in this example, a single linear layer might be enough to overfit.
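
A quicker check, if diffing the JSON files is inconvenient, is to compare a weight tensor directly (a sketch only; that input_layer is a Linear exposing weight as a Param tensor is an assumption about this example's model):

// Sketch: compare the input layer's weights before and after training in code.
// Assumes `input_layer` is a burn Linear with `weight: Param<Tensor<B, 2>>`.
let weight_before = model.input_layer.weight.val();

// ... learner.fit(...) produces `model_trained` ...

let weight_after = model_trained.input_layer.weight.val();
let delta = weight_after.sub(weight_before).abs().sum().into_scalar();
// If the LSTM and the layers before it never receive gradients, this stays at zero.
println!("input_layer weight delta: {:?}", delta);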

@NicoZweifel (Contributor, Author) commented Mar 28, 2024

> @nathanielsimard Hello, I have always been interested in the LSTM implementation in burn.

I was hoping I could spark the development of the LSTM implementation a bit with an example. I would love to use Burn for this purpose as well.

> After training, only the parameters of the output_layer had changed. That said, for the dataset in this example, a single linear layer might be enough to overfit.

Happy to incorporate your suggestions! Feel free to create a PR that makes changes to this branch.

@nathanielsimard (Member) commented

@wcshds @NicoZweifel I have identified the issue and we already have a planned fix; we will prioritize it now, since it directly affects a real-world use case. The problem lies in the autodiff graph, which is always attached to a tensor. When two tensors with different graphs interact, we merge the graphs. However, this process assumes that all nodes in the graph will eventually interact, which is not the case for the LSTM: you may only use the hidden_state, while the graph is actually held by the gate_state, which explains the problem with the current LSTM implementation.

We already plan to implement a client/server architecture in burn-autodiff to avoid graph merging and locking, which will also fix this problem.
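
Concretely, the pattern that triggers it looks roughly like this (a sketch only; the exact names and the order of the returned tuple are assumptions):

// Sketch of the failure mode described above, not of the fix.
// Assumed API shape: Lstm::forward returns both states as a tuple.
let (gate_state, hidden_state) = self.lstm.forward(x, None);

// Only hidden_state is passed on. The autodiff graph tracking the LSTM's
// internal computations stays attached to gate_state, which is dropped here,
// so the LSTM and every layer before it never receive gradients.
let _ = gate_state;
let output = self.output_layer.forward(hidden_state);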

@antimora (Collaborator) commented
> I have identified the issue and we already have a planned fix; we will prioritize it now, since it directly affects a real-world use case. [...]

@nathanielsimard do we have a separate ticket for the planned fix? It would be good to track it and link it here.

@NicoZweifel (Contributor, Author) commented
> I have identified the issue and we already have a planned fix; we will prioritize it now, since it directly affects a real-world use case. [...]

@nathanielsimard Kind of off topic, but it would be cool to have a generic TimeSeriesDataset that supports windowing, similar to what other libraries offer. If this is something that is desired, I could look into it in a separate issue/PR; a rough sketch of what I have in mind follows below.
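
Something along these lines (names are hypothetical, not an existing burn API; only the Dataset trait is assumed):

use burn::data::dataset::Dataset;

// Hypothetical sketch of a generic windowed dataset; WindowDataset is not an
// existing burn type. It yields fixed-size, overlapping windows over any Dataset.
pub struct WindowDataset<D, I> {
    dataset: D,
    window_size: usize,
    _marker: std::marker::PhantomData<I>,
}

impl<D: Dataset<I>, I> WindowDataset<D, I> {
    // Assumes window_size >= 1.
    pub fn new(dataset: D, window_size: usize) -> Self {
        Self { dataset, window_size, _marker: std::marker::PhantomData }
    }
}

impl<D: Dataset<I>, I: Send + Sync> Dataset<Vec<I>> for WindowDataset<D, I> {
    // A window starting at `index` covers the items index..index + window_size.
    fn get(&self, index: usize) -> Option<Vec<I>> {
        (index..index + self.window_size)
            .map(|i| self.dataset.get(i))
            .collect()
    }

    fn len(&self) -> usize {
        self.dataset.len().saturating_sub(self.window_size - 1)
    }
}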

@antimora (Collaborator) commented
> Kind of off topic, but it would be cool to have a generic TimeSeriesDataset that supports windowing, similar to what other libraries offer. [...]

@NicoZweifel, that would be a great addition. You can file an issue for this and we can assign it to you.

NicoZweifel mentioned this pull request Mar 28, 2024
@NicoZweifel (Contributor, Author) commented
> @NicoZweifel, that would be a great addition. You can file an issue for this and we can assign it to you.

Thanks, I created a separate issue to discuss the details 👍

NicoZweifel mentioned this pull request Apr 10, 2024
@github-actions bot commented

This PR has been marked as stale because it has not been updated for over a month.

github-actions bot added the stale (The issue or pr has been open for too long) label May 19, 2024
Labels: example (Related to examples), stale (The issue or pr has been open for too long)