Centering rather than normalizing subsequences during motif discovery #940
Replies: 2 comments 1 reply
-
Thank you for your question and welcome to the STUMPY community! Also, thank you for the kind words! Computing the distance between a multi-dimensional motif and its closest match may need its own discussion, as it is more complicated than computing the distances between subsequences in one-dimensional time series data. For now, let's assume you have a single time series (i.e., just one dimension) and try to answer your question for this simple case first.
The short answer is "No". In this similar post, @seanlaw mentioned that:
It is also suggested that one might be better off to just compute the pairwise distances between all subsequences IF the volume of your data is small (see this comment)
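For small data, the brute-force route mentioned above could look roughly like the sketch below. It is plain Python, not STUMPY code: the function name `centered_profile` is hypothetical, "centering" is assumed to mean subtracting each subsequence's first value, and the exclusion zone of `ceil(m / 4)` is an assumption chosen to skip trivial matches.

```python
import math

def centered_profile(T, m):
    """Brute-force nearest neighbors between centered subsequences of T.

    Each subsequence is shifted to start at 0 (an assumed definition of
    "centering") but is NOT divided by its standard deviation, so scale
    differences still contribute to the distance.
    """
    n_subs = len(T) - m + 1
    excl_zone = math.ceil(m / 4)  # assumed exclusion zone for trivial matches
    subs = [[v - T[i] for v in T[i:i + m]] for i in range(n_subs)]

    profile = []
    for i in range(n_subs):
        best_dist, best_idx = math.inf, -1
        for j in range(n_subs):
            if abs(i - j) <= excl_zone:
                continue  # skip the subsequence itself and its close neighbors
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(subs[i], subs[j])))
            if dist < best_dist:
                best_dist, best_idx = dist, j
        profile.append((best_dist, best_idx))
    return profile

# A small bump, a bump twice as tall, and the small bump again at an offset of 10.
T = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0, 2.0, 4.0, 2.0, 0.0,
     10.0, 11.0, 12.0, 11.0, 10.0]
profile = centered_profile(T, m=5)
print(profile[0])  # nearest neighbor of the first small bump
```

In this toy series the first bump pairs with the offset copy of the same height rather than with the taller bump, which is the behavior centering is meant to give. The double loop is quadratic in the number of subsequences, so it only makes sense when the data is small.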
New contributions are indeed welcome if they sound like a good fit for the library. @seanlaw @EitanHemed
-
I was thinking more about your problem and I think there might be a way to do it. But, there are a few notes that you should consider:
(I) The transformation of interest is just
(II) Let's say you have two subsequences. And, let's say you want to apply some offset to them,
where
Let Lines 121 to 140 in 3559b38 to this:
We can use this to address #900 as well.
-
Hi all
This is a good opportunity to say thanks to the developers of STUMPY. So useful!
And now to my question. Currently, I am working on a dataset comprising locations (X, Y), bounded between 0 and 1.
Using multidimensional motif discovery, I am able to find motifs and matches which represent common patterns of paths (e.g., left-downward movement). I've tweaked the parameters of `stumpy.mstump` and `stumpy.mmotifs` quite a lot but can't really get what I need. Here I give simplified examples, but usually I look for motifs using `m=60` (30Hz data).
As I set `normalize=True`, the sub-sequences of the location data included in each motif could contain data of different scale and be located in a different area of space.
E.g., the path XY1 = (0, 0) >> XY2 = (1, 1), describing an upward-rightward path, would be clustered with XY1 = (0.3, 0.4) >> XY2 = (0.4, 0.5), regardless of the distance covered or the origin and end point.
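To make the effect concrete, here is a minimal sketch in plain Python (not STUMPY code) using the two paths from the example. It assumes each dimension is z-normalized independently, that the paths are straight lines sampled at five points, and that "centering" means shifting each path to start at the origin; all function names are hypothetical.

```python
import math

def linear_path(start, end, n=5):
    """Sample n evenly spaced (x, y) points on the segment from start to end."""
    return [
        (start[0] + (end[0] - start[0]) * i / (n - 1),
         start[1] + (end[1] - start[1]) * i / (n - 1))
        for i in range(n)
    ]

def z_normalize(values):
    """Subtract the mean and divide by the population standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]

def center_on_start(values):
    """Shift the values so the first one is 0; the scale is left untouched."""
    return [v - values[0] for v in values]

def path_distance(a, b, transform):
    """Euclidean distance between two 2-D paths after transforming each dimension."""
    total = 0.0
    for dim in range(2):
        ta = transform([p[dim] for p in a])
        tb = transform([p[dim] for p in b])
        total += sum((u - v) ** 2 for u, v in zip(ta, tb))
    return math.sqrt(total)

long_path = linear_path((0.0, 0.0), (1.0, 1.0))
short_path = linear_path((0.3, 0.4), (0.4, 0.5))

d_znorm = path_distance(long_path, short_path, z_normalize)
d_centered = path_distance(long_path, short_path, center_on_start)
print(d_znorm, d_centered)
```

Under z-normalization the two paths are indistinguishable (distance of essentially zero, up to floating-point error), while after centering only, the difference in distance covered remains visible in the distance.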
The following scatter displays a set of matches for one of the motifs (left downward movement, here).
The purple scatter is the trivial match (distance = 0).
The orange squares show the most similar match.
The yellow triangle scatter is the least similar match.
The red `S` indicates the start of the sub-sequence; `E` stands for the end of the sub-sequence.
I would like to use a different normalization method* for each sub-sequence, rather than Z-normalization, as this will allow me to find sets of paths which have similar shape but different scale.
The matches plotted above will look like this when centered (i.e., originating from XY = [0, 0]).
Is there a way to do that currently in STUMPY? Unless I am missing something, when computing a multidimensional matrix profile, `normalize` only takes a boolean value at the moment.
I imagine that this could be a useful feature for many, so if this is an issue of just implementing it, I would be willing to take a stab at it.
One approach I've tried with limited success is, following motif discovery, to take the sum of the distance covered in the path of each sub-sequence and cluster the matches into short vs. long paths. However, I was hoping for a more elegant solution.
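For reference, that post-processing step can be sketched roughly as follows. This is a simplified stand-in, not the exact code used: a fixed `threshold` replaces the clustering step, and the function names and sample paths are hypothetical.

```python
import math

def path_length(points):
    """Sum of Euclidean step lengths along a sequence of (x, y) points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def split_by_length(paths, threshold):
    """Partition paths into (short, long) by total distance covered."""
    short, long_ = [], []
    for path in paths:
        (short if path_length(path) < threshold else long_).append(path)
    return short, long_

# Two matches with the same direction but very different distance covered.
matches = [
    [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)],
    [(0.3, 0.4), (0.35, 0.45), (0.4, 0.5)],
]
short_paths, long_paths = split_by_length(matches, threshold=0.5)
print(len(short_paths), len(long_paths))
```

The drawback is that the split happens only after motif discovery, so paths of different scale have already been grouped into the same motif by the time they are separated.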
Thanks!