tsv_join wdl task fails on gzipped tsv input #202

Open
dpark01 opened this issue Jan 27, 2021 · 0 comments

dpark01 commented Jan 27, 2021

The tsv_join WDL task is intended to accept gzipped TSV inputs, to simplify handling of nextmeta files in their native state. But it currently fails with a UnicodeDecodeError on such input:

  File "/opt/viral-ngs/source/file_utils.py", line 208, in tsv_join
    header.extend(reader.fieldnames)
  File "/opt/miniconda/envs/viral-ngs-env/lib/python3.7/csv.py", line 98, in fieldnames
    self._fieldnames = next(self.reader)
  File "/opt/miniconda/envs/viral-ngs-env/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
2021/01/27 03:06:54 Starting delocalization.

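Probably because util.file.open_or_gzopen in viral-core is not opening the file in text mode properly (or is not decompressing it at all). The byte 0x8b at position 1 is the second byte of the gzip magic number (0x1f 0x8b), so the csv reader appears to be decoding the raw compressed stream as UTF-8.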
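For illustration, here is a minimal sketch of the kind of text-mode opening the csv module needs. It is not the actual open_or_gzopen code from viral-core, which I have not reproduced here; the helper names open_text_maybe_gzip and read_fieldnames are hypothetical, and detecting gzip by its magic number rather than by file extension is an assumption.

```python
import csv
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def open_text_maybe_gzip(path, encoding="utf-8"):
    """Hypothetical helper: return a text-mode handle for a plain or gzipped file."""
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == GZIP_MAGIC
    if is_gzip:
        # "rt" makes gzip both decompress the stream and decode it to str;
        # plain "r" or "rb" would hand bytes to the caller instead.
        return gzip.open(path, "rt", encoding=encoding, newline="")
    return open(path, "rt", encoding=encoding, newline="")


def read_fieldnames(path):
    """Read the TSV header the same way for compressed and uncompressed input."""
    with open_text_maybe_gzip(path) as handle:
        return csv.DictReader(handle, delimiter="\t").fieldnames
```

With a handle opened this way, reader.fieldnames should return the same header for meta.tsv and meta.tsv.gz, instead of tripping over the 0x8b byte.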

Current workaround is to just pass decompressed tsvs in.

Tangentially, while fixing this, we might want to consider moving the relatively small Python bits into the WDL task itself and using a smaller Docker container (like python:slim instead of viral-core) to reduce spin-up time on an otherwise simple task.
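As a rough sketch of what the inlined Python could look like using only the standard library (so it would run in a python:slim image): the function name join_tsvs and its parameters are hypothetical, and the join semantics here are assumptions rather than a copy of the existing tsv_join. Specifically, it assumes an outer join on one shared ID column, a header that is the union of columns in first-seen order, and non-empty values from later files overriding earlier ones.

```python
import csv
import gzip


def _open_text(path):
    # Treat a file as gzipped if it starts with the gzip magic number.
    with open(path, "rb") as probe:
        gzipped = probe.read(2) == b"\x1f\x8b"
    opener = gzip.open if gzipped else open
    return opener(path, "rt", encoding="utf-8", newline="")


def join_tsvs(in_paths, join_id, out_path):
    """Hypothetical stdlib-only join of several (optionally gzipped) TSVs."""
    header = []
    rows = {}  # join_id value -> merged row; dicts keep insertion order
    for path in in_paths:
        with _open_text(path) as handle:
            reader = csv.DictReader(handle, delimiter="\t")
            for column in reader.fieldnames or []:
                if column not in header:
                    header.append(column)
            for row in reader:
                merged = rows.setdefault(row[join_id], {})
                # Assumption: non-empty values from later files win.
                merged.update(
                    {k: v for k, v in row.items() if k is not None and v}
                )
    with open(out_path, "wt", encoding="utf-8", newline="") as out:
        writer = csv.DictWriter(
            out, fieldnames=header, delimiter="\t",
            restval="", lineterminator="\n",
        )
        writer.writeheader()
        writer.writerows(rows.values())
```

In a WDL task this could sit inside the command block as a heredoc'd script, called with something like join_tsvs(["a.tsv", "b.tsv.gz"], "strain", "joined.tsv").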
