
Incompatibility between MultivariateGrouper and Numpy version 1.24 #2612

Closed
marcotet opened this issue Jan 31, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@marcotet

marcotet commented Jan 31, 2023

Description

When using MultivariateGrouper to create a list of backtesting data entries, I get a VisibleDeprecationWarning if I have NumPy version <= 1.23 installed, and an outright error with version >= 1.24.
This is because MultivariateGrouper combines the data of every test case into a single NumPy array, even when their dimensions differ.
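The behaviour described above can be reproduced with NumPy alone. The following is a minimal sketch, not GluonTS code: `try_stack` is a hypothetical helper, and the six arrays assume that the grouper ends up with three series of length 5 and three of length 6, which matches the `(6,)` shape reported in the error.

```python
import warnings

import numpy as np

# Assumed stand-in for the grouper's intermediate data:
# three series of length 5 and three of length 6.
ragged = [np.ones(5)] * 3 + [np.ones(6)] * 3


def try_stack(arrays):
    """Report what np.array does with `arrays`: "ok", "warning" or "error"."""
    with warnings.catch_warnings():
        # Promote warnings to exceptions so the pre-1.24 deprecation
        # warning can be detected as well.
        warnings.simplefilter("error")
        try:
            np.array(arrays)
        except ValueError:
            return "error"
        except Warning:
            return "warning"
    return "ok"


print(np.__version__, try_stack(ragged))
print(try_stack([np.ones(5)] * 6))  # equal lengths stack without complaint
```

With NumPy <= 1.23 the first call should report "warning", and with NumPy >= 1.24 it should report "error".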

To Reproduce

import numpy as np
import pandas as pd
from gluonts.dataset.multivariate_grouper import MultivariateGrouper

test_data = []
for t in range(2):
    for i in range(3):
        test_data.append({
            "start": pd.Period("2000-01-01") + t,
            "item_id": len(test_data),
            "target": np.ones(5),
        })
        
test_grouper = MultivariateGrouper(num_test_dates=2)
test_grouper(test_data)

Error message or code output

On Numpy == 1.23.5:

/home/toolkit/.conda/envs/tactis/lib/python3.10/site-packages/gluonts/dataset/multivariate_grouper.py:205: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  return {FieldName.TARGET: np.array([funcs(data) for data in dataset])}

On Numpy == 1.24.1:

File ~/.conda/envs/tactis/lib/python3.10/site-packages/gluonts/dataset/multivariate_grouper.py:205, in MultivariateGrouper._transform_target(funcs, dataset)
    203 @staticmethod
    204 def _transform_target(funcs, dataset: Dataset) -> DataEntry:
--> 205     return {FieldName.TARGET: np.array([funcs(data) for data in dataset])}

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (6,) + inhomogeneous part.

Environment

  • Operating system: Docker container (Linux)
  • Python version: 3.10.9
  • GluonTS version: 0.11.9
  • MXNet version: 1.9.1
@marcotet marcotet added the bug Something isn't working label Jan 31, 2023
@marcotet
Author

Here is a suggested fix for the issue, although I have only minimally tested it, so I don't know if there are incompatibilities with other parts of the code.

import logging

import numpy as np
from gluonts.dataset.common import Dataset, ListDataset
from gluonts.dataset.field_names import FieldName
from gluonts.dataset.multivariate_grouper import MultivariateGrouper


class FixedMultivariateGrouper(MultivariateGrouper):
    def _prepare_test_data(self, dataset: Dataset) -> Dataset:
        assert self.num_test_dates is not None
        assert len(dataset) % self.num_test_dates == 0

        logging.info("group test time series to datasets")

        # Number of series belonging to each test date.
        test_length = len(dataset) // self.num_test_dates

        all_entries = list()
        for test_start in range(0, len(dataset), test_length):
            # Transform one test date at a time, so that only series of
            # equal length end up in the same array.
            dataset_at_test_date = dataset[test_start:test_start + test_length]
            transformed_target = self._transform_target(
                self._left_pad_data, dataset_at_test_date
            )[FieldName.TARGET]

            grouped_data = dict()
            grouped_data[FieldName.TARGET] = np.array(
                list(transformed_target), dtype=np.float32
            )
            # Look at the first entry only to find out which fields exist.
            for data in dataset:
                fields = data.keys()
                break
            if FieldName.FEAT_DYNAMIC_REAL in fields:
                grouped_data[FieldName.FEAT_DYNAMIC_REAL] = np.vstack(
                    [data[FieldName.FEAT_DYNAMIC_REAL] for data in dataset],
                )
            grouped_data = self._restrict_max_dimensionality(grouped_data)
            grouped_data[FieldName.START] = self.first_timestamp
            grouped_data[FieldName.FEAT_STATIC_CAT] = [0]
            all_entries.append(grouped_data)

        return ListDataset(
            all_entries, freq=self.frequency, one_dim_target=False
        )
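The core idea of the suggested fix, stacking each test date separately instead of all series at once, can be sketched without GluonTS. This is only an illustration under assumptions: `group_by_test_date` is a hypothetical helper, and it assumes that all series belonging to one test date have the same length.

```python
import numpy as np


def group_by_test_date(series, num_test_dates):
    """Split `series` into consecutive equal-sized groups and stack each one."""
    assert len(series) % num_test_dates == 0
    group_size = len(series) // num_test_dates
    groups = []
    for start in range(0, len(series), group_size):
        # Series within one group are assumed to share a length, so each
        # group forms a regular 2-D float array.
        group = series[start:start + group_size]
        groups.append(np.array(group, dtype=np.float32))
    return groups


# Three series of length 5 followed by three of length 6.
series = [np.ones(5)] * 3 + [np.ones(6)] * 3
groups = group_by_test_date(series, num_test_dates=2)
print([g.shape for g in groups])  # [(3, 5), (3, 6)]
```

Because no single array ever has to hold series of different lengths, the ragged-array warning or error does not occur.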

@lostella
Contributor

lostella commented Feb 8, 2023

Caused by numpy/numpy#22004

@jaheba
Contributor

jaheba commented Feb 14, 2023

The fix suggested by NumPy would be to replace the offending line with:

return {FieldName.TARGET: np.array([funcs(data) for data in dataset], dtype=object)}

Then again, I don't know anything about the MultivariateGrouper or what the behaviour should be.
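For reference, here is a small NumPy-only sketch of what `dtype=object` produces for ragged input. The two arrays are illustrative and not taken from GluonTS. The result is a one-dimensional array of array objects rather than a two-dimensional numeric array, so whether it is suitable depends on what downstream code expects.

```python
import numpy as np

# Two series of different lengths.
ragged = [np.ones(5), np.ones(6)]

as_object = np.array(ragged, dtype=object)
print(as_object.shape, as_object.dtype)  # (2,) object
print(as_object[0].shape, as_object[1].shape)  # (5,) (6,)

# The object array cannot be converted to a regular float array afterwards.
try:
    as_object.astype(np.float32)
    converted = True
except (ValueError, TypeError):
    converted = False
print(converted)  # False
```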

@lostella
Contributor

Fixed in #2671
