can't write missing values in obsm and varm to h5ad #1146

Closed
2 of 3 tasks
rcannood opened this issue Sep 22, 2023 · 2 comments

@rcannood
Contributor

rcannood commented Sep 22, 2023

Please make sure these conditions are met

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of anndata.
  • (optional) I have confirmed this bug exists on the master branch of anndata.

Report

I was porting a unit test from anndataR to Python to replicate an issue I was having in R, which is why the code looks a bit funky.

When a data frame in obsm or varm contains a column of strings with missing values, writing to h5ad fails with the following error:

Code:

import anndata as ad
import pandas as pd

obs = pd.DataFrame(
  index=[f"cell{i}" for i in range(1, 11)]
)
var = pd.DataFrame(
  index=[f"gene{i}" for i in range(1, 21)]
)
obsm = dict(
  characters_with_nas=pd.DataFrame(
    index=obs.index,
    data=dict(
      characters_with_nas=[f"value{i}" if i in [1, 2, 5, 6, 9] else None for i in range(1, 11)]
    )
  )
)
varm = dict(
  characters_with_nas=pd.DataFrame(
    index=var.index,
    data=dict(
      characters_with_nas=[f"value{i}" if i in [1, 3, 4, 6, 7, 16, 17, 18, 19, 20] else None for i in range(1, 21)]
    )
  )
)
adata = ad.AnnData(
  obs = obs,
  var = var,
  obsm = obsm,
  varm = varm
)
adata.write_h5ad("anndata_to_hdf5_obsmvarm_character_with_nas.h5ad")

Traceback:

>>> adata.write_h5ad("anndata_to_hdf5_obsmvarm_character_with_nas.h5ad")
Traceback (most recent call last):
  File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/utils.py", line 246, in func_wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/specs/registry.py", line 311, in write_elem
    return write_func(store, k, elem, dataset_kwargs=dataset_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/specs/registry.py", line 52, in wrapper
    result = func(g, k, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/specs/methods.py", line 359, in write_vlen_string_array
    f.create_dataset(k, data=elem.astype(str_dtype), dtype=str_dtype, **dataset_kwargs)
  File "/home/rcannood/.local/lib/python3.11/site-packages/h5py/_hl/group.py", line 183, in create_dataset
    dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rcannood/.local/lib/python3.11/site-packages/h5py/_hl/dataset.py", line 166, in make_new_dset
    dset_id.write(h5s.ALL, h5s.ALL, data)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5d.pyx", line 283, in h5py.h5d.DatasetID.write
  File "h5py/_proxy.pyx", line 145, in h5py._proxy.dset_rw
  File "h5py/_conv.pyx", line 444, in h5py._conv.str2vlen
  File "h5py/_conv.pyx", line 95, in h5py._conv.generic_converter
  File "h5py/_conv.pyx", line 249, in h5py._conv.conv_str2vlen
TypeError: Can't implicitly convert non-string objects to strings

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_core/anndata.py", line 1951, in write_h5ad
    _write_h5ad(
  File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/h5ad.py", line 94, in write_h5ad
    write_elem(f, "obsm", dict(adata.obsm), dataset_kwargs=dataset_kwargs)
  File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/specs/registry.py", line 353, in write_elem
    Writer(_REGISTRY).write_elem(store, k, elem, dataset_kwargs=dataset_kwargs)
  File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/utils.py", line 248, in func_wrapper
    re_raise_error(e, elem, key)
  File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/utils.py", line 246, in func_wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/specs/registry.py", line 311, in write_elem
    return write_func(store, k, elem, dataset_kwargs=dataset_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/specs/registry.py", line 52, in wrapper
    result = func(g, k, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/specs/methods.py", line 281, in write_mapping
    _writer.write_elem(g, sub_k, sub_v, dataset_kwargs=dataset_kwargs)
  File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/utils.py", line 248, in func_wrapper
    re_raise_error(e, elem, key)
  File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/utils.py", line 246, in func_wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/specs/registry.py", line 311, in write_elem
    return write_func(store, k, elem, dataset_kwargs=dataset_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/specs/registry.py", line 52, in wrapper
    result = func(g, k, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/specs/methods.py", line 579, in write_dataframe
    _writer.write_elem(
  File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/utils.py", line 248, in func_wrapper
    re_raise_error(e, elem, key)
  File "/home/rcannood/.local/lib/python3.11/site-packages/anndata/_io/utils.py", line 229, in re_raise_error
    raise type(e)(
TypeError: Can't implicitly convert non-string objects to strings

Above error raised while writing key 'characters_with_nas' of <class 'h5py._hl.group.Group'> to /
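The final `TypeError` is raised by h5py when it meets a value that is not a `str` in data being written as variable-length strings; in the reproducer, those values are the `None` entries. As a minimal sketch for locating such columns before writing, the helper below (a hypothetical name, not part of anndata or pandas) lists the object-dtype columns of a data frame that hold at least one non-string value. The example frame is built with `dtype=object` explicitly so that the result does not depend on how a given pandas version infers string columns.

```python
import pandas as pd


def object_columns_with_non_strings(df: pd.DataFrame) -> list[str]:
    """Return names of object-dtype columns holding at least one non-str value.

    Hypothetical helper for illustration; not an anndata or pandas function.
    """
    flagged = []
    for name in df.columns:
        column = df[name]
        if column.dtype != object:
            continue
        # None and float("nan") are not str instances, so they are flagged here.
        if not column.map(lambda value: isinstance(value, str)).all():
            flagged.append(name)
    return flagged


frame = pd.DataFrame(
    {"characters_with_nas": ["value1", None, "value3"], "plain": ["a", "b", "c"]},
    index=["cell1", "cell2", "cell3"],
    dtype=object,
)
flagged = object_columns_with_non_strings(frame)
```

Running the helper over each data frame in `adata.obsm` and `adata.varm` should point at the `characters_with_nas` column from the reproducer above.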

Interestingly, when I store the same data frames in obs and var, the issue does not occur:

adata2 = ad.AnnData(
  obs = obsm["characters_with_nas"],
  var = varm["characters_with_nas"]
)
adata2.write_h5ad("anndata_to_hdf5_obsvar_character_with_nas.h5ad")
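A plausible explanation, stated here as an assumption rather than something confirmed in this thread, is that `write_h5ad` converts string columns of obs and var to categoricals before writing, and that this conversion is not applied to data frames inside obsm and varm. Under that assumption, a possible workaround is to cast the affected columns to categorical by hand. The helper name below is hypothetical, and the sketch only shows the pandas side of the conversion.

```python
import pandas as pd


def object_columns_to_categorical(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with every object-dtype column cast to categorical.

    Hypothetical helper for illustration; not an anndata or pandas function.
    """
    converted = df.copy()
    for name in converted.columns:
        if converted[name].dtype == object:
            # Missing values get code -1 in a categorical; they are not a category.
            converted[name] = converted[name].astype("category")
    return converted


frame = pd.DataFrame(
    {"characters_with_nas": ["value1", None, "value3"]},
    index=["cell1", "cell2", "cell3"],
    dtype=object,
)
fixed = object_columns_to_categorical(frame)
```

Applied to the reproducer, this would mean passing something like `{k: object_columns_to_categorical(v) for k, v in obsm.items()}` as `obsm` (and likewise for `varm`). Whether that avoids the error on anndata 0.9.2 has not been verified here.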

Versions

-----
anndata             0.9.2
pandas              2.0.3
session_info        1.0.0
-----
abrt_exception_handler3     NA
cython_runtime              NA
dateutil                    2.8.2
google                      NA
h5py                        3.9.0
natsort                     8.4.0
numpy                       1.24.3
packaging                   21.3
paste                       NA
pytz                        2023.3
scipy                       1.11.1
six                         1.16.0
systemd                     NA
zope                        NA
-----
Python 3.11.4 (main, Jun  7 2023, 00:00:00) [GCC 12.3.1 20230508 (Red Hat 12.3.1-1)]
Linux-6.4.12-100.fc37.x86_64-x86_64-with-glibc2.36
-----
Session information updated at 2023-09-22 07:10

@c-westhoven

c-westhoven commented Oct 4, 2023

A similar issue is described in #1141, #1143, and #1068.

@flying-sheep
Member

flying-sheep commented Dec 4, 2023

See also scverse/scanpy#1651. Let’s track this in #1068, which contains both a reproducer and a discussion of the solution.
