[ENH] Add partial dataset of NAB, Bayes Online for Anomaly Detection, and testing example notebook #6335

duydl · 2024-04-26T04:25:49Z

Reference Issues/PRs

#6167, #3214

What does this implement/fix? Explain your changes.

Introduces a partial dataset from the Numenta Anomaly Benchmark (NAB) into sktime, implements an online Bayesian anomaly detection algorithm, and adds an example notebook that tests the algorithm on the dataset.

Does your contribution introduce a new dependency? If yes, which one?

No

What should a reviewer concentrate their feedback on?

Did you add any tests for the change?

PR checklist

For all contributions

I've added myself to the list of contributors with any new badges I've earned :-)
How to: add yourself to the all-contributors file in the sktime root directory (not the CONTRIBUTORS.md). Common badges: code - fixing a bug, or adding code logic. doc - writing or improving documentation or docstrings. bug - reporting or diagnosing a bug (get this plus code if you also fixed the bug in the PR). maintenance - CI, test framework, release.
See here for full badge reference
Optionally, for added estimators: I've added myself and possibly to the maintainers tag - do this if you want to become the owner or maintainer of an estimator you added.
See here for further details on the algorithm maintainer role.
The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.

For new estimators

I've added the estimator to the API reference - in docs/source/api_reference/taskname.rst, follow the pattern.
I've added one or more illustrative usage examples to the docstring, in a pydocstyle compliant Examples section.
If the estimator relies on a soft dependency, I've set the python_dependencies tag and ensured dependency isolation, see the estimator dependencies guide.

review-notebook-app · 2024-04-26T04:25:54Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

duydl · 2024-04-26T04:45:54Z

Bayes CPD for Anomaly:

Initialization

Initialize self.max_length_probs[0, 0] to 1:
$$P(r_0 = 0) = 1$$
This probability matrix stores the probabilities for each potential run length $r$.

Sequential Update for Each Data Point $x_t$

The algorithm processes and updates probabilities for all potential run lengths for each new data point.

2.1. Predictive Probability for Each Run Length:
$$P(x_t \mid r_{t-1}, x_{1:t-1})$$
Computed with observation_likelihood: the likelihood of observing $x_t$ under the data model parameters associated with a specific run length $r_{t-1}$.
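As a rough illustration of step 2.1, the sketch below evaluates a Student-t predictive density for one new observation under several run lengths. The helper `student_t_pdf` and the per-run-length parameter values are hypothetical and chosen only for illustration; they are not taken from the PR's `observation_likelihood`.

```python
import math


def student_t_pdf(x, df, loc, scale):
    """Density of a location-scale Student-t distribution at x."""
    z = (x - loc) / scale
    log_norm = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
        - math.log(scale)
    )
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(z * z / df))


# Hypothetical model parameters, one set per run length (list index = run length).
params = [
    {"df": 1.0, "loc": 0.0, "scale": 1.0},  # r = 0: prior only
    {"df": 3.0, "loc": 0.2, "scale": 0.8},  # r = 1
    {"df": 5.0, "loc": 0.1, "scale": 0.6},  # r = 2
]

x_t = 0.15
# P(x_t | r_{t-1}, x_{1:t-1}) for each candidate run length
pred_probs = [student_t_pdf(x_t, **p) for p in params]
```

Each entry of `pred_probs` is then multiplied by the corresponding run length probability in steps 2.2 and 2.3.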

2.2. Update Run Length Probabilities:
$$P(r_t = r_{t-1} + 1 \mid x_{1:t}) \propto (1 - H(r_{t-1})) \times P(x_t \mid r_{t-1}, x_{1:t-1}) \times P(r_{t-1} \mid x_{1:t-1})$$
$H(r)$ is the hazard function, i.e., the probability of a change point at run length $r$. The expression is the (unnormalized) probability that no change point occurs, so the run length grows by one.

2.3. Probability of a Change Point:
$$P(r_t = 0 \mid x_{1:t}) \propto \sum_{r=0}^{\text{max\_run\_length}} H(r) \times P(x_t \mid r_{t-1} = r, x_{1:t-1}) \times P(r_{t-1} = r \mid x_{1:t-1})$$
This step sums the (unnormalized) probabilities across all previous run lengths, weighted by the hazard function, to compute the probability that a change point occurs at $x_t$.

2.4. Normalization and Transfer:

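$$P(r_t = i \mid x_{1:t}) = \frac{\tilde{P}(r_t = i \mid x_{1:t})}{\sum_{j=0}^{\text{max\_run\_length} + 1} \tilde{P}(r_t = j \mid x_{1:t})}$$
Here $\tilde{P}$ denotes the unnormalized values computed in steps 2.2 and 2.3. After updating, the probabilities are normalized and transferred from column [:, 1] back to column [:, 0] of the probability matrix.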
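Steps 2.2 to 2.4 can be sketched as a single function. This is a minimal illustration and not the PR's implementation: the function name `update_run_length_probs` is hypothetical, and it uses a growing Python list instead of the two-column `max_length_probs` matrix capped at a maximum run length.

```python
def update_run_length_probs(prev_probs, pred_probs, hazard):
    """One run-length update: growth, change point, normalization.

    prev_probs[r] = P(r_{t-1} = r | x_{1:t-1})
    pred_probs[r] = P(x_t | r_{t-1} = r, x_{1:t-1})
    hazard(r)     = H(r), probability of a change point at run length r
    """
    new_probs = [0.0] * (len(prev_probs) + 1)
    for r, (p_prev, p_pred) in enumerate(zip(prev_probs, pred_probs)):
        joint = p_pred * p_prev
        new_probs[r + 1] = (1.0 - hazard(r)) * joint  # step 2.2: run grows by one
        new_probs[0] += hazard(r) * joint  # step 2.3: change point
    total = sum(new_probs)
    return [p / total for p in new_probs]  # step 2.4: normalization


# First update, starting from P(r_0 = 0) = 1 with a constant hazard of 0.01.
probs = update_run_length_probs([1.0], [0.3], lambda r: 0.01)
```

With a constant hazard, the normalized probability of run length 0 always equals the hazard rate, so a change point shows up as mass concentrating on short run lengths in the following steps rather than as a spike at $r_t = 0$.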

Iterating Over All Data Points

The algorithm repeats these steps for each new data point, continuously updating the probability distribution over potential run lengths and adapting to new evidence as it comes in.
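The full loop can be sketched end to end as follows. This is an illustrative version under stated assumptions, not the code in this PR: it assumes a Gaussian data model with a Normal-Gamma prior (which gives a Student-t predictive), a constant hazard rate, and no cap on the run length. The names `run_length_map`, `mu0`, `kappa0`, `alpha0`, and `beta0` are hypothetical.

```python
import math


def student_t_pdf(x, df, loc, scale):
    """Density of a location-scale Student-t distribution at x."""
    z = (x - loc) / scale
    log_norm = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
        - math.log(scale)
    )
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(z * z / df))


def run_length_map(data, hazard=0.05, mu0=0.0, kappa0=1.0, alpha0=1.0, beta0=1.0):
    """Return the most probable run length after each observation."""
    probs = [1.0]  # initialization: P(r_0 = 0) = 1
    stats = [(mu0, kappa0, alpha0, beta0)]  # Normal-Gamma parameters per run length
    most_probable = []
    for x in data:
        # Step 2.1: predictive probability of x under each run length
        pred = [
            student_t_pdf(x, 2 * a, m, math.sqrt(b * (k + 1) / (a * k)))
            for m, k, a, b in stats
        ]
        # Steps 2.2 and 2.3: growth and change point probabilities
        new_probs = [0.0] * (len(probs) + 1)
        for r, (p, q) in enumerate(zip(probs, pred)):
            new_probs[r + 1] = (1.0 - hazard) * p * q
            new_probs[0] += hazard * p * q
        # Step 2.4: normalization
        total = sum(new_probs)
        probs = [p / total for p in new_probs]
        # Conjugate update of the model parameters; run length 0 restarts from the prior
        updated = [
            ((k * m + x) / (k + 1), k + 1, a + 0.5, b + k * (x - m) ** 2 / (2 * (k + 1)))
            for m, k, a, b in stats
        ]
        stats = [(mu0, kappa0, alpha0, beta0)] + updated
        most_probable.append(max(range(len(probs)), key=probs.__getitem__))
    return most_probable


# Toy series with a level shift after 30 points.
data = [0.0] * 30 + [10.0] * 30
map_run_lengths = run_length_map(data)
```

On this toy series the most probable run length grows steadily, drops to 1 at the first shifted point, and then grows again, which is the signal an anomaly or change point detector would threshold on.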

fkiraly · 2024-04-26T13:27:09Z

sktime/annotation/online_anomaly/online_bayes_cpd.py

+from sktime.annotation.base import BaseSeriesAnnotator
+
+
+class StudentTDistribution:


the distribution exists already in sktime.proba.t, no?

At first glance, I would have modelled this as a distribution fitter, inheriting from the "parameter estimator" template in param_est. However, it does not entirely fit the interface, so we can leave it as is for now.

Add NAB partial dataset, bayes online cpd, testing nb

3aa47c9

Add __init__ file in online anomaly

b9c8c68
