Not sure why I'm getting a data mismatch error #5902
Replies: 5 comments 2 replies
-
Hm, that is very odd. Might be a bug.
series_groups = X_train.groupby(level=list(range(X_train.index.nlevels - 1)), sort=False)
n_series = series_groups.ngroups
Can you check which number this produces? If not, we may have to look at the specific
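For reference, a minimal sketch of that check on a toy panel. It assumes X_train is a pandas DataFrame with a two-level MultiIndex (instance, then time) and that y_train holds one label per instance. The toy values and the index names "agent" and "time" are made up for illustration and are not the poster's data.

```python
import numpy as np
import pandas as pd

# Toy panel: 3 instances ("agents") of unequal length, 2 features.
# Index names and values are illustrative only.
index = pd.MultiIndex.from_tuples(
    [("a", 0), ("a", 1), ("a", 2), ("b", 0), ("b", 1), ("c", 0)],
    names=["agent", "time"],
)
X_train = pd.DataFrame(
    {"f1": np.arange(6.0), "f2": np.arange(6.0) * 2}, index=index
)
y_train = np.array([0, 1, 0])  # one label per instance

# Group by every index level except the last (time) level,
# so each group is one instance.
series_groups = X_train.groupby(
    level=list(range(X_train.index.nlevels - 1)), sort=False
)
n_series = series_groups.ngroups

print("instances in X:", n_series)
print("labels in y:", len(y_train))
```

If the two printed numbers differ on the real data, X and y do not describe the same set of instances.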
-
Here is my data preprocessor function: @DataClass
-
So I'm not sure why it's creating NaN in the first place, but when I remove the dropna line I don't get that error. Instead I get this error:
ValueError Traceback (most recent call last)
File ~/miniconda3/envs/py310/lib/python3.10/site-packages/sktime/classification/base.py:238, in BaseClassifier.fit(self, X, y)
File ~/miniconda3/envs/py310/lib/python3.10/site-packages/sktime/classification/kernel_based/_svc.py:219, in TimeSeriesSVC._fit(self, X, y)
File ~/miniconda3/envs/py310/lib/python3.10/site-packages/sktime/classification/kernel_based/_svc.py:204, in TimeSeriesSVC._kernel(self, X, X2)
File ~/miniconda3/envs/py310/lib/python3.10/site-packages/sktime/dists_kernels/base/_base.py:235, in BasePairwiseTransformerPanel.__call__(self, X, X2)
File ~/miniconda3/envs/py310/lib/python3.10/site-packages/sktime/dists_kernels/base/_base.py:417, in BasePairwiseTransformerPanel.transform(self, X, X2)
ValueError: Unsupported dtype object
-
Hm, I see - I suppose then the only way to diagnose this is to share your data. Then, post code that creates the data from scratch, and we can use this for debugging.
-
Hm, I don't think I'm allowed to post my data. I might be able to post a toy one. I got it to run by doing
X_train = X_train.apply(pd.to_numeric, errors='coerce')
since the model was throwing errors due to the data being of object dtype. But when I ran TimeSeriesSVC, it just kept running forever:
model = TimeSeriesSVC(class_weight='balanced', verbose=True, max_iter=1)
Even with max_iter=1 and the dataset reduced to ~4000 rows, it never finishes running. When I tried the only other model for unequal-length datasets, KNeighbors, I got an error saying it doesn't support unequal length instances.
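A side note on the conversion: errors='coerce' turns every value that cannot be parsed as a number into NaN without raising, so it may be worth counting how many values were lost. The sketch below shows one way to check this on made-up data. The column names "speed" and "angle" are hypothetical.

```python
import pandas as pd

# Toy frame with object-dtype columns; names and values are illustrative.
X = pd.DataFrame(
    {"speed": ["1.5", "2.0", "n/a"], "angle": ["10", "20", "30"]},
    dtype=object,
)
print(X.dtypes)  # both columns are object before conversion

# Convert column by column; unparseable entries become NaN.
X_num = X.apply(pd.to_numeric, errors="coerce")

# Count values that were not NaN before but are NaN after conversion.
introduced_nans = X_num.isna().sum() - X.isna().sum()
print(X_num.dtypes)
print(introduced_nans)
```

A nonzero count for a column points at entries that are not numeric in the source data, which could also explain where unexpected NaN values come from.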
-
I'm still getting a data mismatch error after converting y from stream to time series classification (one result per instance (in this case called agents)). Can I have the same agents in the test set as train set?
Here are all the relevant prints I can think of:
##################################################
(5718336, 8)
(1429584, 8)
(26395,)
(28608,)
##################################################
5718336
1383029
5718336
1429584
##################################################
26395
28608
1680
1314
##################################################
5718336
1429584
5718336
1429584
##################################################
(7147920, 17)
32588
y_pred = model.predict(X_test)
ValueError Traceback (most recent call last)
Cell In[10], line 5
1 from sktime.classification.kernel_based import TimeSeriesSVC
3 model = TimeSeriesSVC()
----> 5 model.fit(X_train, y_train)
7 y_pred = model.predict(X_test)
9 # balanced_accuracy_score(y_test, y_pred)
File ~/miniconda3/envs/py310/lib/python3.10/site-packages/sktime/classification/base.py:203, in BaseClassifier.fit(self, X, y)
198 # no vectorization needed, proceed with normal fit
199
200 # convenience conversions to allow user flexibility:
201 # if X is 2D array, convert to 3D, if y is Series, convert to numpy
202 X, y = self._internal_convert(X, y)
--> 203 X_metadata = self._check_input(
204 X, y, return_metadata=self.METADATA_REQ_IN_CHECKS
205 )
206 X_mtype = X_metadata["mtype"]
207 self._X_metadata = X_metadata
File ~/miniconda3/envs/py310/lib/python3.10/site-packages/sktime/base/_base_panel.py:440, in BasePanelMixin._check_input(self, X, y, enforce_min_instances, return_metadata)
438 n_labels = y.shape[0]
439 if n_cases != n_labels:
--> 440 raise ValueError(
441 f"Mismatch in number of cases. Number in X = {n_cases} nos in y = "
442 f"{n_labels}"
443 )
444 if isinstance(y, np.ndarray):
445 if y.ndim > 2:
ValueError: Mismatch in number of cases. Number in X = 32588 nos in y = 26395
Not sure why it's saying the number in X is 32588. That's the number of unique agents in df, but X_train has fewer.
Also there aren't many models for unbalanced data. Any recommendations? Should I fill it to use the models that work on balanced data?
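If "unbalanced" here means instances of unequal length, one option is to pad every instance to the length of the longest one. The sketch below does this with plain pandas on a toy panel. It assumes a (agent, time) MultiIndex where time runs 0, 1, 2, ... within each agent, and it pads with 0.0, which is an arbitrary choice that may not suit the real data.

```python
import pandas as pd

# Toy panel with unequal lengths: agent "a" has 3 rows, agent "b" has 1.
# Names and values are illustrative only.
index = pd.MultiIndex.from_tuples(
    [("a", 0), ("a", 1), ("a", 2), ("b", 0)], names=["agent", "time"]
)
X = pd.DataFrame({"f1": [1.0, 2.0, 3.0, 4.0]}, index=index)

# Length of the longest instance.
max_len = int(X.groupby(level="agent").size().max())

# Build the full (agent, time) grid and reindex onto it;
# rows that did not exist are filled with the pad value.
agents = X.index.get_level_values("agent").unique()
full_index = pd.MultiIndex.from_product(
    [agents, range(max_len)], names=["agent", "time"]
)
X_padded = X.reindex(full_index, fill_value=0.0)

print(X_padded)
```

After padding, every agent has max_len rows, so estimators that require equal-length instances can accept the data. Whether padding is appropriate depends on the model and on how much the lengths differ.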
(True,
None,
{'is_univariate': False,
'is_empty': False,
'has_nans': False,
'n_features': 8,
'feature_names': ....
'n_instances': 32588,
'is_one_series': False,
'is_equal_length': False,
'is_equally_spaced': False,
'n_panels': 1,
'is_one_panel': True,
'mtype': 'pd-multiindex',
'scitype': 'Panel'})
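One possible explanation for 'n_instances': 32588, offered as a guess rather than a confirmed diagnosis: when a DataFrame with a MultiIndex is subset, pandas keeps the original level values in index.levels even if no remaining row uses them. Any count based on the stored levels would then report the agents of the full df. The toy example below shows the effect and how remove_unused_levels() clears it. The names and values are made up.

```python
import pandas as pd

# Full toy frame with 4 agents; names and values are illustrative.
index = pd.MultiIndex.from_product(
    [["a", "b", "c", "d"], [0, 1]], names=["agent", "time"]
)
df = pd.DataFrame({"f1": range(8)}, index=index)

# Subset to 2 agents, as a train/test split would.
keep = df.index.get_level_values("agent").isin(["a", "b"])
X_train = df[keep]

# The stored levels still list all 4 agents ...
n_levels = len(X_train.index.levels[0])
# ... while only 2 agents are actually present in the rows.
n_present = X_train.index.get_level_values("agent").nunique()
print(n_levels, n_present)

# Dropping unused levels makes the two counts agree.
X_clean = X_train.copy()
X_clean.index = X_clean.index.remove_unused_levels()
n_levels_clean = len(X_clean.index.levels[0])
print(n_levels_clean)
```

If the same two counts differ on the real X_train, resetting the index levels before calling fit would be a cheap thing to try.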