-
@pindapuj Thank you for your interest in STUMPY and please correct me if I'm not understanding correctly. It sounds like you are essentially looking for a distance measure that allows you to compare multiple time series (of course, you can use that distance measure as input for clustering). Is this correct? If so, then you might be interested in this metric called
-
@pindapuj I'm always curious as to what applications there are. Can you give me a brief description of the specific use case you are using STUMPY for?
-
@seanlaw Yes, I believe that is correct. If I understood correctly, I would get pairwise similarity and not necessarily full-set similarity. (This is why I was planning to compute the motifs across the entire set of observations and use the corresponding subsequences as "features", so that I'd capture "global" structure.) MPdist does implicitly capture pairwise motif-like information because it looks for the nearest-neighbor subsequences. If I understand correctly, I'd be doing the following.
Say I have n samples. Then I would need to do n^2 matrix profile computations, and pull out k for each of them.
Each row of the matrix can be seen as that observation's feature vector. The maximum of the row gives the 1NN, but it need not be symmetric (NN(A) = B does not imply NN(B) = A). I can use k-means or some other clustering algorithm to group the edges.
I'm using STUMPY to do some exploratory analysis on some graph edge streams. I'd like to understand the different types of edges that occur in my graph setting, so that I can use the edge clusters as "prototypes" for predicting graph behavior. Thank you for your help!
-
Yes, I believe that what you showed above is the right approach. I strongly suggest that you go over the MPdist paper if you haven't already. Please report back and let us know if it worked out!
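For reference, the pairwise workflow described above could be sketched roughly as follows. This is a brute-force, pure-Python illustration of the MPdist idea under my own assumptions, not STUMPY's implementation: the helper names (`znorm`, `ab_join`, `mpdist`, `mpdist_matrix`), the toy data, and the 5% default for picking k are all chosen for the example. In practice `stumpy.mpdist(T_A, T_B, m)` would replace the hand-written `mpdist`. Two points worth noting: because the measure combines both join directions it is symmetric, so only n(n-1)/2 pairs need to be computed rather than n^2, and because it is a distance, the nearest neighbor in the sketch is taken as the smallest off-diagonal entry of each row.

```python
import math
import random


def znorm(x):
    """Z-normalize a subsequence (a constant subsequence maps to zeros)."""
    mu = sum(x) / len(x)
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return [0.0 if sd == 0 else (v - mu) / sd for v in x]


def ab_join(a, b, m):
    """Distance from each length-m subsequence of a to its nearest neighbor in b."""
    subs_a = [znorm(a[i:i + m]) for i in range(len(a) - m + 1)]
    subs_b = [znorm(b[j:j + m]) for j in range(len(b) - m + 1)]
    return [min(math.dist(s, t) for t in subs_b) for s in subs_a]


def mpdist(a, b, m, percentage=0.05):
    """k-th smallest value among the AB-join and BA-join profiles combined."""
    combined = sorted(ab_join(a, b, m) + ab_join(b, a, m))
    # Assumption: k is 5% of the combined series length, capped at the last index.
    k = min(math.ceil(percentage * (len(a) + len(b))), len(combined) - 1)
    return combined[k]


def mpdist_matrix(series, m):
    """Symmetric pairwise distance matrix; only the upper triangle is computed."""
    n = len(series)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            D[i][j] = D[j][i] = mpdist(series[i], series[j], m)
    return D


def sine(t):
    return math.sin(2 * math.pi * t / 20)


def square(t):
    return 1.0 if (t // 10) % 2 == 0 else -1.0


def noisy(f, n=80):
    """Sample f at t = 0..n-1 and add a little Gaussian noise."""
    return [f(t) + random.gauss(0, 0.05) for t in range(n)]


# Toy data, made up for illustration: two noisy sines and two noisy square waves.
random.seed(0)
series = [noisy(sine), noisy(sine), noisy(square), noisy(square)]
D = mpdist_matrix(series, m=20)

# Nearest neighbor of each series: smallest off-diagonal entry in its row.
n = len(series)
nn = [min((j for j in range(n) if j != i), key=lambda j: D[i][j]) for i in range(n)]
print(nn)
```

Each row of `D` can then be used as that sample's feature vector, or `D` itself can be handed to any clustering method that accepts a precomputed distance matrix.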
-
@pindapuj Closing this for now but feel free to re-open if you have any further questions or comments!
-
Hello, I am currently working on a similar issue. I have 3 time series and I need to extract representative days. The approach I am considering is to extract motifs for each time series and then cluster based on these motifs. I understand that this conversation took place around 4 years ago. Are there newer techniques in STUMPY that could be used for this?
-
Thank you for this great package!
I'm trying to cluster a set of time series based on a "shared" set of motifs and their corresponding subsequences. I thought of extracting the motifs from the matrix profile of each time series separately, and then running k-means clustering on the subsequences of each time series over all motifs. (The subsequences for each found motif in a given time series would represent the "features.")
But I think that this approach amplifies the differences between individual time series, so the motifs aren't generalizable. Instead, I was wondering if it made more sense to find motifs shared across the set of time series. (Not sure if this is the correct way to think about the time series set. It's not really a multi-dimensional dataset because each sample is a representation of the same phenomenon.)
Is there a natural or better way of finding these motifs, or of using matrix profiles/motifs for clustering? For example, I assume it would be incorrect to simply concatenate all my time series together, separated by NaNs, compute the matrix profile and motifs for the entire concatenated series, and then treat those motifs as "shared" for clustering.
Any help would be appreciated! I'm very new to the world of time series mining. Thank you!
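To make the per-series step concrete, here is a minimal brute-force sketch of what "extracting a motif from the matrix profile" of a single time series amounts to. It is pure Python for illustration only and is not how STUMPY computes it (in practice `stumpy.stump(T, m)` returns the matrix profile); the helper names `self_join` and `top_motif`, the planted toy pattern, and the exclusion zone of ceil(m / 4) are assumptions made for the example.

```python
import math
import random


def znorm(x):
    """Z-normalize a subsequence (a constant subsequence maps to zeros)."""
    mu = sum(x) / len(x)
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return [0.0 if sd == 0 else (v - mu) / sd for v in x]


def self_join(ts, m):
    """Brute-force matrix profile: nearest-neighbor distance and index per subsequence."""
    subs = [znorm(ts[i:i + m]) for i in range(len(ts) - m + 1)]
    excl = math.ceil(m / 4)  # assumed exclusion zone, skips trivial self-matches
    profile, indices = [], []
    for i, s in enumerate(subs):
        best, best_j = math.inf, -1
        for j, t in enumerate(subs):
            if abs(i - j) <= excl:
                continue
            d = math.dist(s, t)
            if d < best:
                best, best_j = d, j
        profile.append(best)
        indices.append(best_j)
    return profile, indices


def top_motif(ts, m):
    """Return (start, nearest-neighbor start, distance) for the lowest profile value."""
    profile, indices = self_join(ts, m)
    i = min(range(len(profile)), key=profile.__getitem__)
    return i, indices[i], profile[i]


# Toy data, made up for illustration: noise with the same pattern planted twice.
random.seed(1)
m = 16
pattern = [3 * math.sin(2 * math.pi * k / m) for k in range(m)]
ts = [random.gauss(0, 1) for _ in range(100)]
ts[20:20 + m] = pattern
ts[70:70 + m] = pattern

i, j, d = top_motif(ts, m)
print(i, j, d)
```

The subsequences starting at `i` and `j` form the top motif pair for that series. Repeating this per series yields the per-series "features" described above, which is also where the concern about motifs not generalizing across series comes from.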