Validate (.ome).zarr against ngff schema #1409

Open
yarikoptic opened this issue Feb 21, 2024 · 4 comments

@yarikoptic
Member

A part inspired from

For any .zarr we encounter we should

For any OME .zarr (either detected through the above or having an .ome.zarr extension), validate that zarr against the schema of the version it specifies, as provided at https://github.com/ome/ngff under the {schema_version}/schemas/ folder in .schema JSON files, and issue corresponding validation errors to users trying to upload non-compliant OME .ngffs. If no version is specified, that is itself a validation error. The version can be read from .zattrs:

❯ jq '.omero.version' .zattrs
"0.4"

@yarikoptic
Member Author

Following the advice in ome/ngff#228 (comment), let's:

  • consider versions from .multiscales[].version and .omero.version.
  • if more than 1 distinct version is found -- issue a validation error about inconsistent OME/NGFF versions and stop validation
  • if 1 version found -- load that schema and validate against
  • if no version found -- not OME/NGFF, no OME specific validation
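The rule in the list above could be sketched roughly as follows. This is only an illustration: `find_ngff_version` and `InconsistentVersionError` are made-up names, not existing dandi-cli API, and "more than 1" is read here as "more than one distinct value".

```python
from typing import Any, Optional


class InconsistentVersionError(ValueError):
    """Hypothetical error: .zattrs declares more than one distinct version."""


def find_ngff_version(zattrs: dict[str, Any]) -> Optional[Any]:
    """Return the single declared OME/NGFF version, or None if none is declared.

    Looks at .omero.version and .multiscales[].version only.
    """
    found = []
    omero = zattrs.get("omero")
    if isinstance(omero, dict) and omero.get("version") is not None:
        found.append(omero["version"])
    multiscales = zattrs.get("multiscales")
    if isinstance(multiscales, list):
        for entry in multiscales:
            if isinstance(entry, dict) and entry.get("version") is not None:
                found.append(entry["version"])

    # De-duplicate by equality (not via set(), so odd unhashable values don't crash).
    distinct = []
    for version in found:
        if version not in distinct:
            distinct.append(version)

    if len(distinct) > 1:
        raise InconsistentVersionError(
            f"inconsistent OME/NGFF versions in .zattrs: {distinct!r}"
        )
    # No version at all: treat as "not OME/NGFF", so no OME-specific validation.
    return distinct[0] if distinct else None
```

A caller would load the schema only when a version comes back, and turn `InconsistentVersionError` into a validation error for the asset.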

@yarikoptic
Member Author

Here are stats across zarrs on S3; in each pair the first value is .omero.version and the second is .multiscales[].version:

dandi@drogon:~$ sort /tmp/ome-versions.out | uniq -c
     20 [0.2,0.2]
   4303 ["0.4","0.4"]
    572 [null,"0.4"]

where that file was created using

for d in *-*-*; do git -C $d annex whereis .zattrs | awk '/versionId=/{print $2;}' | xargs curl --silent | jq -c '[.omero.version, .multiscales[].version]'; done | tee /tmp/ome-versions.out

Note that some had it (incorrectly) as floats, I think (see the unquoted [0.2,0.2] entries above). It might be worth making the code robust there: explicitly test that the version is a string, and otherwise issue a validation error.
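A minimal sketch of such a type check, assuming validation errors are collected as plain message strings (the function name and message wording are hypothetical):

```python
def check_version_type(version: object) -> list[str]:
    """Return validation error messages; an empty list means the version is OK."""
    if isinstance(version, str):
        return []
    return [
        "OME/NGFF version must be a string, "
        f"got {type(version).__name__}: {version!r}"
    ]
```

With this, a float such as 0.2 read from .zattrs produces one error message, while "0.4" passes.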

@jwodder
Member

jwodder commented Feb 27, 2024

@yarikoptic

  • So should dandi-cli just download the OME schemata on the fly when needed?
  • Could you link me to an example valid (or nearly-valid) OME Zarr?

@yarikoptic
Member Author

Well, ideally cache them locally. I thought we already do something similar for the dandi schema, or used to do for BIDS at some point.
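Something along these lines might do for download-once-then-cache. It is a sketch under assumptions: the raw URL pattern (the `main` branch and the `image.schema` file name) is a guess at how {schema_version}/schemas/ is laid out, `load_schema` is a hypothetical helper, and the cache directory is left to the caller.

```python
import json
import re
import urllib.request
from pathlib import Path
from typing import Callable

# Assumption: raw-file URL for the ome/ngff repo; branch and file name are guesses.
SCHEMA_URL = "https://raw.githubusercontent.com/ome/ngff/main/{version}/schemas/{name}.schema"


def _fetch(url: str) -> bytes:
    """Default downloader; can be replaced by a fake in tests."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def load_schema(
    version: str,
    cache_dir: Path,
    name: str = "image",
    fetch: Callable[[str], bytes] = _fetch,
) -> dict:
    """Return the schema for `version`, downloading it only if not cached yet."""
    # The version comes from user data and ends up in a file path, so be strict.
    if not re.fullmatch(r"\d+(\.\d+)*", version):
        raise ValueError(f"unexpected OME/NGFF version: {version!r}")
    cached = cache_dir / version / f"{name}.schema"
    if not cached.exists():
        data = fetch(SCHEMA_URL.format(version=version, name=name))
        json.loads(data)  # fail before caching if the download is not valid JSON
        cached.parent.mkdir(parents=True, exist_ok=True)
        cached.write_bytes(data)
    return json.loads(cached.read_text(encoding="utf-8"))
```

The `fetch` parameter is there so the caching logic can be exercised without network access.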

As for "nearly-valid": I think most of the zarrs in 000108 are, according to e.g. https://ome.github.io/ome-ngff-validator/?source=https://dandiarchive.s3.amazonaws.com/zarr/e41844a2-dad0-4b1c-9c53-d55883e0553f which errors with

{
  "instancePath": "/omero/channels/0/window",
  "schemaPath": "#/properties/omero/properties/channels/items/properties/window/required",
  "keyword": "required",
  "params": {
    "missingProperty": "start"
  },
  "message": "must have required property 'start'"
}

but then otherwise is happy to report 7 Datasets checked ✓.

I also found one in https://dandiarchive.org/dandiset/000243/draft/files?location=sub-S01%2Fanat&page=1 which is fully valid: https://ome.github.io/ome-ngff-validator/?source=https://dandiarchive.s3.amazonaws.com/zarr/7723d02f-1f71-4553-a7b0-47bda1ae8b42

Also, those in 000026 seem to be good: https://dandiarchive.org/dandiset/000026/draft/files?location=sub-I45%2Fses-SPIM%2Fmicr&page=1
