nanosecond precision lost when reading time data #7817

Closed
4 tasks done
kmuehlbauer opened this issue May 4, 2023 · 3 comments · Fixed by #7827
Labels
bug, topic-CF conventions, topic-metadata (relating to the handling of metadata, i.e. attrs and encoding)

Comments

@kmuehlbauer
Contributor

What happened?

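When reading nanosecond-precision time data from netCDF, the precision is lost. This happens because CFMaskCoder converts the variable to floating point and inserts NaN for masked values. CFDatetimeCoder then casts the floating-point values back to int64 to transform them into datetime64. float64 can only represent integers exactly up to 2**53, so larger int64 nanosecond values are rounded, and the cast back to int64 is undefined for some values such as NaN, hence #7098.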
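
To make the rounding concrete, here is a small NumPy-only sketch. It does not call xarray; it just emulates the int64 to float64 to int64 round trip on one present-day timestamp with a 1 ns component (the timestamp is an arbitrary illustrative choice).

```python
import numpy as np

# A timestamp with a 1 ns component, as int64 nanoseconds since 1970-01-01.
original = np.datetime64("2023-05-04T12:00:00", "ns").astype(np.int64) + 1

# Emulate the float64 promotion used for masking and the cast back to int64.
round_trip = np.array([original]).astype(np.float64).astype(np.int64)[0]

# float64 has a 53-bit significand; at ~1.7e18 the spacing between
# representable values is 256 ns, so the trailing nanosecond is rounded away.
print(np.spacing(np.float64(original)))  # → 256.0
print(original - round_trip)  # → 1
```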

What did you expect to happen?

Precision should be preserved. The transformation to floating point should be omitted.

Minimal Complete Verifiable Example

import xarray as xr
import numpy as np
import netCDF4 as nc
import matplotlib.pyplot as plt

# create time array and fill value; min_ns (the smallest int64) is NaT in datetime64[ns]
min_ns = -9223372036854775808
max_ns = 9223372036854775807
cnt = 2000
time_arr = np.arange(min_ns, min_ns + cnt, dtype=np.int64).astype("M8[ns]")
fill_value = np.datetime64("1900-01-01", "ns")

# create ncfile with an int64 time variable with attached _FillValue
with nc.Dataset("test.nc", mode="w") as ds:
    ds.createDimension("x", cnt)
    time = ds.createVariable("time", "<i8", ("x",), fill_value=fill_value.astype(np.int64))
    time[:] = time_arr.astype(np.int64)
    time.units = "nanoseconds since 1970-01-01"

# normal decoding (adding max_ns shifts the int64 values to small offsets for plotting)
with xr.open_dataset("test.nc").load() as xr_ds:
    print("--- normal decoding ----------------------")
    print(xr_ds["time"])
    plt.plot(xr_ds["time"].values.astype(np.int64) + max_ns, color="g", label="normal")

# no decoding
with xr.open_dataset("test.nc", decode_cf=False).load() as xr_ds:
    print("--- no decoding ----------------------")
    print(xr_ds["time"])
    plt.plot(xr_ds["time"].values + max_ns, lw=5, color="b", label="raw")

# do not decode times; this shows how CFMaskCoder converts
# the array to floating point before CFDatetimeCoder would run
with xr.open_dataset("test.nc", decode_times=False).load() as xr_ds:
    print("--- no time decoding ----------------------")
    print(xr_ds["time"])

# do not run CFMaskCoder to show that times are converted correctly
# by CFDatetimeCoder
with xr.open_dataset("test.nc", mask_and_scale=False).load() as xr_ds:
    print("--- no masking ------------------------------")
    print(xr_ds["time"])
    plt.plot(xr_ds["time"].values.astype(np.int64) + max_ns, lw=2, color="r", label="nomask")

plt.legend()
plt.show()

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

--- normal decoding ----------------------
<xarray.DataArray 'time' (x: 2000)>
array([                          'NaT',                           'NaT',
                                 'NaT', ...,
       '1677-09-21T00:12:43.145226240', '1677-09-21T00:12:43.145226240',
       '1677-09-21T00:12:43.145226240'], dtype='datetime64[ns]')
Dimensions without coordinates: x
--- no decoding ----------------------
<xarray.DataArray 'time' (x: 2000)>
array([-9223372036854775808, -9223372036854775807, -9223372036854775806,
       ..., -9223372036854773811, -9223372036854773810,
       -9223372036854773809])
Dimensions without coordinates: x
Attributes:
    _FillValue:  -2208988800000000000
    units:       nanoseconds since 1970-01-01
--- no time decoding ----------------------
<xarray.DataArray 'time' (x: 2000)>
array([-9.22337204e+18, -9.22337204e+18, -9.22337204e+18, ...,
       -9.22337204e+18, -9.22337204e+18, -9.22337204e+18])
Dimensions without coordinates: x
Attributes:
    units:    nanoseconds since 1970-01-01
--- no masking ------------------------------
<xarray.DataArray 'time' (x: 2000)>
array([                          'NaT', '1677-09-21T00:12:43.145224193',
       '1677-09-21T00:12:43.145224194', ...,
       '1677-09-21T00:12:43.145226189', '1677-09-21T00:12:43.145226190',
       '1677-09-21T00:12:43.145226191'], dtype='datetime64[ns]')
Dimensions without coordinates: x
Attributes:
    _FillValue:  -2208988800000000000
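
The numbers in the log can be reproduced with a short NumPy-only sketch. It only emulates the int64 to float64 to int64 round trip described above; it is not xarray's actual decoding code path, and it skips the _FillValue comparison because none of the raw values equal the fill value.

```python
import numpy as np

# The _FillValue in the log is 1900-01-01 as int64 nanoseconds since 1970-01-01
# (25567 days: 70 years including 17 leap days, 1904 to 1968).
fill = np.datetime64("1900-01-01", "ns").astype(np.int64)
print(fill)  # → -2208988800000000000

# Raw values written by the example: -2**63 (read as NaT) counting up by 1.
raw = np.arange(-2**63, -2**63 + 2000, dtype=np.int64)

# Emulate the int64 -> float64 -> int64 round trip. Near 2**63, float64 values
# are 1024 apart, so 2000 distinct integers collapse onto just 3 values.
round_trip = raw.astype(np.float64).astype(np.int64)
print(np.unique(round_trip) - raw[0])  # → [   0 1024 2048]

# Viewed as datetime64[ns]: leading values become NaT and trailing values all
# become -2**63 + 2048, the same pattern as the "normal decoding" output.
decoded = round_trip.view("M8[ns]")
print(decoded[0], decoded[-1])
```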

Anything else we need to know?

Plot from above code: [image: time-fillval]

Xref: #7098, #7790 (comment)

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.0 | packaged by conda-forge | (main, Jan 14 2023, 12:27:40) [GCC 11.3.0]
python-bits: 64
OS: Linux
OS-release: 5.14.21-150400.24.60-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: ('de_DE', 'UTF-8')
libhdf5: 1.14.0
libnetcdf: 4.9.2

xarray: 2023.4.2
pandas: 2.0.1
numpy: 1.24.2
scipy: 1.10.1
netCDF4: 1.6.3
pydap: None
h5netcdf: 1.1.0
h5py: 3.8.0
Nio: None
zarr: 2.14.2
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: 1.3.7
dask: 2023.3.1
distributed: 2023.3.1
matplotlib: 3.7.1
cartopy: 0.21.1
seaborn: None
numbagg: None
fsspec: 2023.3.0
cupy: 11.6.0
pint: 0.20.1
sparse: None
flox: None
numpy_groupies: None
setuptools: 67.6.0
pip: 23.0.1
conda: None
pytest: 7.2.2
mypy: None
IPython: 8.11.0
sphinx: None

kmuehlbauer added the bug, needs triage, and topic-metadata labels on May 4, 2023
@kmuehlbauer
Contributor Author

cc @spencerkclark @DocOtak I've tried to find at least one example that manifests as a bug. Nevertheless, the conversion from int to float in CFMaskCoder should be avoided.

We could special-case time data in CFMaskCoder, or handle masking of time data in CFDatetimeCoder/CFTimedeltaCoder.
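
The first option (special-casing time data in the masking step) might look roughly like the sketch below. `mask_fill_value` and `NAT_INT` are hypothetical names, not xarray's API, and detecting time data by looking for " since " in the units string is an assumption made only for illustration. Integer time data keeps its int64 dtype and uses the NaT sentinel instead of being promoted to float for NaN.

```python
import numpy as np

NAT_INT = np.int64(np.iinfo(np.int64).min)  # int64 value datetime64 reads as NaT


def mask_fill_value(data, attrs):
    """Hypothetical masking step that special-cases integer time data."""
    data = np.asarray(data)
    fill = attrs.get("_FillValue")
    if fill is None:
        return data
    # assumption: CF time units look like "<unit> since <reference date>"
    is_time = " since " in attrs.get("units", "")
    if is_time and data.dtype.kind == "i":
        # keep int64 and mark missing values with the NaT sentinel
        return np.where(data == fill, NAT_INT, data.astype(np.int64))
    # generic path: promote to float so NaN can mark missing values
    return np.where(data == fill, np.nan, data.astype(np.float64))


raw = np.array([-2208988800000000000, -9223372036854775807, 1683201600000000001], dtype=np.int64)
attrs = {"_FillValue": -2208988800000000000, "units": "nanoseconds since 1970-01-01"}
masked = mask_fill_value(raw, attrs)
print(masked.dtype)  # → int64
```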

dcherian added the topic-CF conventions label and removed the needs triage label on May 6, 2023
@dcherian
Contributor

dcherian commented May 6, 2023

Quoting the issue: "because CFMaskCoder will convert the variable to floating point and insert 'NaN'. In CFDatetimeCoder the floating point is cast back to int64 to transform into datetime64."

Can we reverse the order, so that CFDatetimeCoder handles _FillValue for datetime arrays and CFMaskCoder then skips it?
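
A rough sketch of that reordering, using two hypothetical functions (`decode_datetime`, `mask`) in place of xarray's real coders. The datetime step runs first, masks with the int64 NaT sentinel and removes `_FillValue` from the attributes, so the generic float-promoting mask step becomes a no-op. It assumes units of exactly "nanoseconds since 1970-01-01", so no scaling or epoch offset is applied.

```python
import numpy as np


def decode_datetime(data, attrs):
    """Hypothetical datetime step run first: it consumes _FillValue itself."""
    attrs = dict(attrs)
    fill = attrs.pop("_FillValue", None)
    ints = np.asarray(data, dtype=np.int64)
    if fill is not None:
        ints = np.where(ints == fill, np.iinfo(np.int64).min, ints)
    # assumption: units are "nanoseconds since 1970-01-01", so a view suffices
    return ints.view("M8[ns]"), attrs


def mask(data, attrs):
    """Hypothetical generic mask step: a no-op once _FillValue is gone."""
    if "_FillValue" not in attrs:
        return data, attrs
    attrs = dict(attrs)
    fill = attrs.pop("_FillValue")
    data = np.asarray(data, dtype=np.float64)
    return np.where(data == fill, np.nan, data), attrs


raw = np.array([-2208988800000000000, 1683201600000000001], dtype=np.int64)
attrs = {"_FillValue": -2208988800000000000, "units": "nanoseconds since 1970-01-01"}
data, attrs = mask(*decode_datetime(raw, attrs))
print(data.dtype)  # → datetime64[ns]
```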

@kmuehlbauer
Contributor Author

kmuehlbauer commented May 8, 2023

@dcherian Yes, I've set up a prototype in #7827, but the overall solution doesn't look that nice: the handling of fill_value still has to be done in CFMaskCoder.

Also #7098 is needed for this.
