Option to ignore "nan" with sc.pl.rank_genes_groups() and error while writing data to .h5ad #1651

Closed
3 tasks done
chris-rands opened this issue Feb 15, 2021 · 8 comments

@chris-rands
Contributor

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of scanpy.
  • (optional) I have confirmed this bug exists on the master branch of scanpy.

sc.tl.filter_rank_genes_groups() replaces the names of filtered-out genes with "nan" values. It would be nice to be able to ignore these in sc.pl.rank_genes_groups() and instead show the top n genes that passed the filter.

Minimal code sample

import scanpy as sc

adata = sc.datasets.pbmc68k_reduced()
sc.tl.rank_genes_groups(adata, 'bulk_labels', method='wilcoxon')
# Genes that fail the filter have their names replaced with NaN
sc.tl.filter_rank_genes_groups(adata, min_fold_change=3)
sc.pl.rank_genes_groups(adata, key="rank_genes_groups_filtered")

image
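
As a possible interim approach, the top n unfiltered genes per group can be picked out by hand before plotting. The sketch below is a minimal illustration, not scanpy API: `top_unfiltered` is a hypothetical helper, and the small structured array is an assumed stand-in for `adata.uns["rank_genes_groups_filtered"]["names"]` (one field per group, with filtered-out entries set to NaN).

```python
import numpy as np

# Assumed stand-in for adata.uns["rank_genes_groups_filtered"]["names"]:
# one field per group, ranked gene names, filtered-out genes replaced by NaN.
names = np.empty(5, dtype=[("group1", object), ("group2", object)])
names["group1"] = np.array(["GeneA", np.nan, "GeneC", np.nan, "GeneE"], dtype=object)
names["group2"] = np.array([np.nan, np.nan, "GeneX", "GeneY", "GeneZ"], dtype=object)


def top_unfiltered(names, n_genes):
    """Hypothetical helper: first n_genes real gene names per group, in rank order."""
    top = {}
    for group in names.dtype.names:
        # Keep only actual strings; float NaN and a literal "nan" are both skipped.
        kept = [g for g in names[group] if isinstance(g, str) and g != "nan"]
        top[group] = kept[:n_genes]
    return top


top = top_unfiltered(names, n_genes=2)
print(top)
```

The ranking order is preserved, so each list holds the highest-ranked genes that survived filtering rather than the first n slots of the original ranking.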

Versions

scanpy==1.7.0 anndata==0.7.5 umap==0.5.1 numpy==1.19.5 scipy==1.5.4 pandas==1.1.5 scikit-learn==0.24.1 statsmodels==0.12.2 python-igraph==0.8.3 leidenalg==0.8.3
@chris-rands
Contributor Author

A somewhat separate problem: writing such data with adata.write("results.h5ad") raises an error.

Traceback:

TypeError: Can't implicitly convert non-string objects to strings

Above error raised while writing key 'uns/rank_genes_groups_filtered/names' of <class 'h5py._hl.files.File'> from /.

After del adata.uns["rank_genes_groups_filtered"], the .write() call succeeds.

@chris-rands changed the title from "Option to ignore "nan" with sc.pl.rank_genes_groups()" to "Option to ignore "nan" with sc.pl.rank_genes_groups() and error while writing data to .h5ad" on Feb 15, 2021
@ivirshup
Member

This should be solved (albeit a bit differently) by #1529. There, you would pass the filters to the plotting functions.

@chris-rands
Contributor Author

chris-rands commented Feb 24, 2021

Thanks @ivirshup, that makes sense, I think. The 2nd issue (being unable to write to h5ad) is actually the more important one, for me at least. I have frequently found that I cannot add dataframes to adata.uns and then write to an .h5ad file. This looks like a known issue, judging from the code at https://github.com/theislab/scanpy/blob/0d25f457e100080e578809d4db625ea147eed121/scanpy/tools/_marker_gene_overlap.py#L160-L163. Is there a GitHub issue/PR that I missed? The only workaround I found is to convert everything within the dataframe to a string, but this is not always desirable.

@ivirshup
Member

That's definitely out of date. This should work at the moment:

import scanpy as sc

pbmc = sc.datasets.pbmc3k_processed()
pbmc.uns["obs"] = pbmc.obs.copy()
pbmc.write("tmp.h5ad")

@chris-rands
Contributor Author

Right, but I still get the TypeError above with

sc.tl.rank_genes_groups(pbmc, "louvain")
sc.tl.filter_rank_genes_groups(pbmc)
pbmc.write("tmp.h5ad")

pbmc.uns["rank_genes_groups_filtered"] is actually a dict with str keys and recarray values (so not a dataframe as I first thought)
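
Given that structure, one way to see which fields are likely to trip the writer is to look for non-string entries, and the string-conversion workaround mentioned earlier can be applied per field. This is only a sketch under assumptions: `non_string_fields` and `stringify` are hypothetical helpers, the small array stands in for the real "names" recarray, and whether the converted result then writes cleanly has not been verified here.

```python
import numpy as np

# Assumed stand-in for the "names" recarray: object fields mixing str and float NaN.
rec = np.empty(3, dtype=[("group1", object), ("group2", object)])
rec["group1"] = np.array(["GeneA", np.nan, "GeneC"], dtype=object)
rec["group2"] = np.array(["GeneX", "GeneY", "GeneZ"], dtype=object)


def non_string_fields(rec):
    """Hypothetical helper: names of fields holding at least one non-str value."""
    return [
        field
        for field in rec.dtype.names
        if any(not isinstance(value, str) for value in rec[field])
    ]


def stringify(rec):
    """Hypothetical helper: copy of rec with every entry passed through str()."""
    out = np.empty(rec.shape, dtype=[(field, object) for field in rec.dtype.names])
    for field in rec.dtype.names:
        # NaN becomes the literal string "nan"; real gene names are unchanged.
        out[field] = np.array([str(value) for value in rec[field]], dtype=object)
    return out


print(non_string_fields(rec))
converted = stringify(rec)
print(non_string_fields(converted))
```

Note that this turns missing entries into the literal string "nan", so downstream code has to treat that string as a placeholder rather than a gene name.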

@ivirshup
Member

Yes, that was a known issue when filter_rank_genes_groups was implemented. We are currently trying to figure out what the appropriate fix for this is. It might be that we remove the filter_rank_genes_groups functions and add their functionality to the rank_genes_groups plotting functions.

@phys-bio

phys-bio commented Apr 29, 2022

Yes, that was a known issue when filter_rank_genes_groups was implemented. We are currently trying to figure out what the appropriate fix for this is. It might be that we remove the filter_rank_genes_groups functions and add their functionality to the rank_genes_groups plotting functions.

Hi @ivirshup, has this problem been fixed?

@flying-sheep
Member

Tracked in scverse/anndata#1068

@flying-sheep closed this as not planned on Dec 4, 2023