Incompatibility between MultivariateGrouper and Numpy version 1.24 #2612

marcotet · 2023-01-31T19:35:17Z

Description

When using MultivariateGrouper to create a list of backtesting data entries, I get a VisibleDeprecationWarning with NumPy <= 1.23 installed, and an outright error with NumPy >= 1.24.
This is because MultivariateGrouper combines the targets of all test dates into a single NumPy array, even when their lengths differ.
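A minimal NumPy-only sketch of the behaviour change (no GluonTS involved; the helper name classify_array_build is hypothetical). Rows of equal length stack into a regular 2-D array, while rows of different lengths are "ragged": older NumPy only warns, newer NumPy raises ValueError.

```python
import warnings

import numpy as np


def classify_array_build(rows):
    """Report how np.array reacts to `rows` when no dtype is given."""
    with warnings.catch_warnings():
        # Promote warnings to exceptions so the NumPy <= 1.23 deprecation
        # warning can be detected the same way as the NumPy >= 1.24 error.
        warnings.simplefilter("error")
        try:
            np.array(rows)
        except ValueError:
            return "error"    # NumPy >= 1.24: inhomogeneous shape
        except Warning:
            return "warning"  # NumPy <= 1.23: VisibleDeprecationWarning
    return "ok"


# Equal-length rows stack into a (2, 5) array.
print(classify_array_build([np.ones(5), np.ones(5)]))  # ok
# Different lengths: the result depends on the installed NumPy version.
print(classify_array_build([np.ones(5), np.ones(6)]))
```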

To Reproduce

import numpy as np
import pandas as pd
from gluonts.dataset.multivariate_grouper import MultivariateGrouper

test_data = []
for t in range(2):
    for i in range(3):
        test_data.append({
            "start": pd.Period("2000-01-01") + t,
            "item_id": len(test_data),
            "target": np.ones(5),
        })
        
test_grouper = MultivariateGrouper(num_test_dates=2)
test_grouper(test_data)
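Why the reproduction ends up ragged even though every target has length 5: the second test date starts one period later, so if each series is left-padded back to the earliest start date, its series become one element longer. The sketch below assumes that padding behaviour and a fill value of 0.0; the left_pad helper is hypothetical and only approximates what MultivariateGrouper does internally.

```python
import numpy as np
import pandas as pd


def left_pad(target, start, first_start, fill_value=0.0):
    # Hypothetical helper: prepend one fill value per period between the
    # earliest start date and this series' start date.
    pad = (start - first_start).n
    return np.concatenate([np.full(pad, fill_value), target])


# Same entries as the reproduction above.
test_data = []
for t in range(2):
    for i in range(3):
        test_data.append({
            "start": pd.Period("2000-01-01") + t,
            "item_id": len(test_data),
            "target": np.ones(5),
        })

first_start = min(entry["start"] for entry in test_data)
padded = [left_pad(e["target"], e["start"], first_start) for e in test_data]
print([len(p) for p in padded])  # [5, 5, 5, 6, 6, 6]
```

Six rows of two different lengths match the "(6,) + inhomogeneous part" shape reported in the error below.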

Error message or code output

On Numpy == 1.23.5:

/home/toolkit/.conda/envs/tactis/lib/python3.10/site-packages/gluonts/dataset/multivariate_grouper.py:205: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  return {FieldName.TARGET: np.array([funcs(data) for data in dataset])}

On Numpy == 1.24.1:

File ~/.conda/envs/tactis/lib/python3.10/site-packages/gluonts/dataset/multivariate_grouper.py:205, in MultivariateGrouper._transform_target(funcs, dataset)
    203 @staticmethod
    204 def _transform_target(funcs, dataset: Dataset) -> DataEntry:
--> 205     return {FieldName.TARGET: np.array([funcs(data) for data in dataset])}

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (6,) + inhomogeneous part.

Environment

Operating system: Docker container (Linux)
Python version: 3.10.9
GluonTS version: 0.11.9
MXNet version: 1.9.1


marcotet · 2023-01-31T19:47:08Z

Here is a suggested fix, although I have only minimally tested it and don't know whether it conflicts with other parts of the code. It pads and stacks the series of each test date separately, so each np.array call only sees series of equal length.

import logging

import numpy as np

from gluonts.dataset.common import Dataset, ListDataset
from gluonts.dataset.field_names import FieldName
from gluonts.dataset.multivariate_grouper import MultivariateGrouper


class FixedMultivariateGrouper(MultivariateGrouper):
    def _prepare_test_data(self, dataset: Dataset) -> Dataset:
        assert self.num_test_dates is not None
        assert len(dataset) % self.num_test_dates == 0

        logging.info("group test time series to datasets")

        # Number of series belonging to each test date.
        test_length = len(dataset) // self.num_test_dates

        all_entries = list()
        for test_start in range(0, len(dataset), test_length):
            # Pad and stack one test date at a time, so np.array only ever
            # sees series of equal length.
            dataset_at_test_date = dataset[test_start : test_start + test_length]
            transformed_target = self._transform_target(
                self._left_pad_data, dataset_at_test_date
            )[FieldName.TARGET]

            grouped_data = dict()
            grouped_data[FieldName.TARGET] = np.array(
                list(transformed_target), dtype=np.float32
            )
            # Field names are taken from the first entry of this test date.
            fields = dataset_at_test_date[0].keys()
            if FieldName.FEAT_DYNAMIC_REAL in fields:
                # Stack only this test date's features, not the whole dataset.
                grouped_data[FieldName.FEAT_DYNAMIC_REAL] = np.vstack(
                    [
                        data[FieldName.FEAT_DYNAMIC_REAL]
                        for data in dataset_at_test_date
                    ],
                )
            grouped_data = self._restrict_max_dimensionality(grouped_data)
            grouped_data[FieldName.START] = self.first_timestamp
            grouped_data[FieldName.FEAT_STATIC_CAT] = [0]
            all_entries.append(grouped_data)

        return ListDataset(
            all_entries, freq=self.frequency, one_dim_target=False
        )
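The core idea of this fix, reduced to plain NumPy: split the flat list of padded targets into num_test_dates consecutive chunks and stack each chunk on its own. This is a sketch, not GluonTS code; group_by_test_date is a hypothetical name, and the row lengths mirror the reproduction (three series of length 5, three of length 6).

```python
import numpy as np


def group_by_test_date(rows, num_test_dates):
    # Split the flat list of per-series targets into num_test_dates
    # consecutive chunks and stack each chunk separately, so every np.array
    # call only sees rows of one length.
    assert len(rows) % num_test_dates == 0
    test_length = len(rows) // num_test_dates
    return [
        np.array(rows[start : start + test_length], dtype=np.float32)
        for start in range(0, len(rows), test_length)
    ]


rows = [np.ones(5)] * 3 + [np.ones(6)] * 3
for block in group_by_test_date(rows, num_test_dates=2):
    print(block.shape, block.dtype)
# (3, 5) float32
# (3, 6) float32
```

Like the fix above, this relies on the series being ordered by test date and on every test date contributing the same number of series.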

lostella · 2023-02-08T12:48:54Z

Caused by numpy/numpy#22004

jaheba · 2023-02-14T08:50:41Z

The fix suggested by NumPy would be to replace the offending line with:

return {FieldName.TARGET: np.array([funcs(data) for data in dataset], dtype=object)}

Then again, I don't know anything about the MultivariateGrouper or what its behaviour should be.
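A NumPy-only sketch of what the dtype=object suggestion changes. It removes the error, but the resulting shape depends on whether the series happen to have equal lengths, so any code consuming the result still has to rebuild float arrays per test date. The split into two test dates of three series each is an assumption taken from the reproduction.

```python
import numpy as np

# Padded targets as in the reproduction: three of length 5, three of length 6.
ragged = [np.ones(5)] * 3 + [np.ones(6)] * 3
uniform = [np.ones(5)] * 6

# dtype=object stops NumPy from trying to build a rectangular float array.
ragged_obj = np.array(ragged, dtype=object)
uniform_obj = np.array(uniform, dtype=object)
print(ragged_obj.shape)   # (6,)   one object slot per series
print(uniform_obj.shape)  # (6, 5) still 2-D, just with object dtype

# A consumer that needs float32 matrices has to convert each test date's
# chunk back explicitly (assuming 2 test dates of 3 series each).
chunks = [np.array(list(c), dtype=np.float32) for c in np.split(ragged_obj, 2)]
print([c.shape for c in chunks])  # [(3, 5), (3, 6)]
```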

lostella · 2023-02-20T14:36:05Z

Fixed in #2671

marcotet added the "bug" label (Something isn't working) on Jan 31, 2023

abdulfatir mentioned this issue on Feb 18, 2023 in "Remove creation of ragged sequences in MultivariateGrouper" #2671 (merged)

lostella closed this as completed Feb 20, 2023
