Non-sanitized session strings in DANDI:000473 #10

Closed
TheChymera opened this issue May 13, 2024 · 5 comments


@TheChymera
Collaborator

Relevant snippet:

├── sub-156131
│   ├── ses-156131_20191112-probe0
│   │   └── ephys
│   │       ├── sub-156131_channels.tsv
│   │       ├── sub-156131_contacts.tsv
│   │       ├── sub-156131_probes.tsv
│   │       └── sub-156131_ses-156131_20191112-probe0_ephys.nwb
│   ├── sessions.json
│   └── sessions.tsv
[deco]~ ❱ tree /mnt/data/.scratch/
/mnt/data/.scratch/
├── participants.json
├── participants.tsv
├── sub-128514
│   ├── sessions.json
│   └── sessions.tsv
├── sub-128515
│   ├── sessions.json
│   └── sessions.tsv
├── sub-128516
│   ├── sessions.json
│   └── sessions.tsv
├── sub-147463
│   ├── sessions.json
│   └── sessions.tsv
├── sub-147465
│   ├── sessions.json
│   └── sessions.tsv
├── sub-152414
│   ├── sessions.json
│   └── sessions.tsv
├── sub-152417
│   ├── sessions.json
│   └── sessions.tsv
├── sub-152419
│   ├── sessions.json
│   └── sessions.tsv
├── sub-156130
│   ├── sessions.json
│   └── sessions.tsv
├── sub-156131
│   ├── ses-156131_20191112-probe0
│   │   └── ephys
│   │       ├── sub-156131_channels.tsv
│   │       ├── sub-156131_contacts.tsv
│   │       ├── sub-156131_probes.tsv
│   │       └── sub-156131_ses-156131_20191112-probe0_ephys.nwb
│   ├── sessions.json
│   └── sessions.tsv
├── sub-216300
│   ├── sessions.json
│   └── sessions.tsv
├── sub-216301
│   ├── ses-216301_20200521-probe0
│   │   └── ephys
│   │       ├── sub-216301_channels.tsv
│   │       ├── sub-216301_contacts.tsv
│   │       ├── sub-216301_probes.tsv
│   │       └── sub-216301_ses-216301_20200521-probe0_ephys.nwb
│   ├── sessions.json
│   └── sessions.tsv
├── sub-225757
│   ├── sessions.json
│   └── sessions.tsv
├── sub-225758
│   ├── sessions.json
│   └── sessions.tsv
├── sub-225759
│   ├── sessions.json
│   └── sessions.tsv
├── sub-258412
│   ├── sessions.json
│   └── sessions.tsv
├── sub-258414
│   ├── sessions.json
│   └── sessions.tsv
├── sub-258416
│   ├── sessions.json
│   └── sessions.tsv
├── sub-258419
│   ├── sessions.json
│   └── sessions.tsv
├── sub-259112
│   ├── sessions.json
│   └── sessions.tsv
├── sub-268947
│   ├── sessions.json
│   └── sessions.tsv
├── sub-268951
│   ├── sessions.json
│   └── sessions.tsv
├── sub-273853
│   ├── sessions.json
│   └── sessions.tsv
├── sub-273855
│   ├── sessions.json
│   └── sessions.tsv
└── sub-273858
    ├── sessions.json
    └── sessions.tsv

I'm pretty sure this is what the metadata looks like in the DANDI archive, and it's read here →

"session_keyvalue": "ses-" + nwbfile.session_id if nwbfile.session_id else "",

@yarikoptic I can sanitize as you suggested and replace all manner of special characters with X. But you said in the meeting today that DANDI already sanitizes them, and I don't think it did here.

@yarikoptic
Member

yes, ATM you would need to sanitize any non-alphanumeric to some alphanumeric to become BIDS-compliant.
In DANDI's organize we also sanitize but allow for - and +, which isn't BIDS compliant ATM.
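
A minimal sketch of a compliance check for the rule stated above. It assumes that "alphanumeric" means ASCII letters and digits only; the function name `is_alphanumeric_label` is hypothetical and not part of DANDI or any BIDS tool.

```python
import re

# Assumption: a compliant label consists only of ASCII letters and digits.
LABEL_RE = re.compile(r"[A-Za-z0-9]+")


def is_alphanumeric_label(label: str) -> bool:
    """Return True if the label contains only ASCII letters and digits."""
    return LABEL_RE.fullmatch(label) is not None


# The session label from the tree above contains "_" and "-", so it fails.
print(is_alphanumeric_label("156131_20191112-probe0"))  # False
print(is_alphanumeric_label("15613120191112probe0"))    # True
```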

back refs on related efforts

@yarikoptic
Member

right away -- if they are identical, there should be no sessions.json in each sub- folder; just place one at the top level. Moreover:

  • what information do you place into sessions.tsv? AFAIK it is optional, so if it is just a list of session ids, do not bother generating it
  • needs sub- prefix.
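
The suggestion to keep a single top-level sessions.json when all per-subject copies are identical could be sketched roughly as follows. The helper name `shared_sidecar` and the example sidecar content are hypothetical, chosen only for illustration; the sketch works on already-loaded dictionaries and does not touch the filesystem.

```python
from typing import Dict, Optional


def shared_sidecar(sidecars: Dict[str, dict]) -> Optional[dict]:
    """Return the common content if every subject's sidecar is identical, else None."""
    contents = list(sidecars.values())
    if contents and all(c == contents[0] for c in contents[1:]):
        return contents[0]
    return None


# Hypothetical sidecar content, identical for both subjects.
per_subject = {
    "sub-156131": {"session_id": {"Description": "Session identifier"}},
    "sub-216301": {"session_id": {"Description": "Session identifier"}},
}

common = shared_sidecar(per_subject)
if common is not None:
    # One sessions.json at the dataset root would be enough in this case.
    print("write a single top-level sessions.json")
else:
    print("keep per-subject sessions.json files")
```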

@TheChymera
Collaborator Author

TheChymera commented May 13, 2024

@yarikoptic

In DANDI's organize we also sanitize but allow for - and +, which isn't BIDS compliant ATM.

but the string I got contains an underscore. I know how to fix it, but it seems to contradict the statement about what's already sanitized in DANDI.

If I only need to replace - and +, that would best be done with replace; if I can expect literally anything from _ to ¯, it would be done via a whitelist of which characters to keep as they are. It might be safest to do that anyway, since the NWB files don't need to come from DANDI 🤔
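
The two options can be contrasted in a short sketch. Both function names are hypothetical, the fill character X follows the earlier comment, and the whitelist is assumed to be ASCII letters and digits.

```python
import re


def sanitize_by_replace(label: str) -> str:
    """Blacklist approach: remove only the characters already known to occur."""
    return label.replace("-", "").replace("+", "")


def sanitize_by_whitelist(label: str, fill: str = "X") -> str:
    """Whitelist approach: keep ASCII letters and digits, replace everything else."""
    return re.sub(r"[^A-Za-z0-9]", fill, label)


raw = "156131_20191112-probe0"
print(sanitize_by_replace(raw))    # 156131_20191112probe0 (underscore survives)
print(sanitize_by_whitelist(raw))  # 156131X20191112Xprobe0
```

The blacklist version only handles characters someone thought of in advance, whereas the whitelist version also covers input such as "¯" without further changes.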

@yarikoptic
Member

Let me repeat

ATM you would need to sanitize any non-alphanumeric to some alphanumeric to become BIDS-compliant.

so it means that you need to replace underscore as well... and whitelist is just "alphanumeric" characters.
I don't see what the contradiction is here... there was no statement that we replace only - and +.
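
As a rough illustration of how this could be wired into the `session_keyvalue` expression quoted at the top of the thread: the sketch below is an assumption about one possible fix, not the content of the actual commit. `SimpleNamespace` stands in for a real NWB file object, and `build_session_keyvalue` is a hypothetical helper.

```python
import re
from types import SimpleNamespace


def build_session_keyvalue(nwbfile) -> str:
    """Build the "ses-<label>" entity, replacing non-alphanumerics with X."""
    if not nwbfile.session_id:
        return ""
    # Assumption: only ASCII letters and digits are allowed in the label.
    return "ses-" + re.sub(r"[^A-Za-z0-9]", "X", nwbfile.session_id)


# Stand-in objects; a real NWB file would be read with an NWB library instead.
with_id = SimpleNamespace(session_id="156131_20191112-probe0")
without_id = SimpleNamespace(session_id=None)

print(build_session_keyvalue(with_id))           # ses-156131X20191112Xprobe0
print(repr(build_session_keyvalue(without_id)))  # ''
```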

@TheChymera
Collaborator Author

Fixed as of 0034ce4


2 participants