More helpful saving and format options #2286

Open
reagle opened this issue Feb 1, 2024 · 5 comments

@reagle

reagle commented Feb 1, 2024

Two stories about how I could use more guidance or guard rails when saving work. Presently, I have to look up and refer to the supported formats, and then my choices often don't work.

Usenet

VisiData helped me find Elizabeth Edwards' (famous) participation on Usenet's alt.support.grief; vd can read the Internet Archive's mbox files and make quick work of searching them.

Saving the derivative sheet is tricky, though. vd defaults to tsv (even if I give the mbox extension), as there is no mbox save support, so I no longer know what format the resulting file is in, and I don't think vd does either when I return to the file. (I can save to csv, which is okay, but the result has some odd character conversions.)

Reddit

I'm analyzing posts on a subreddit, which are in a "zstandard compressed ndjson" file. vd opens it well, but after some manipulations I want to save it so I can return to the data as is, so .vds seems like a natural format. And it works! However, I thought, why not save it compressed? The resulting file, BestofRedditorUpdates_submissions.vds.zst, is smaller, but cannot be reopened: "Unsupported operation: Underlying stream is not seakable."

-rw-r--r-- 1 reagle staff  77M Feb  1 16:09 BestofRedditorUpdates_submissions.vds
-rw-r--r-- 1 reagle staff  17M Feb  1 16:09 BestofRedditorUpdates_submissions.vds.zst
-rw-r--r-- 1 reagle staff  14M Feb  1 15:45 BestofRedditorUpdates_submissions.zst

Consequently, relying on the file extension is problematic outside of the simplest cases because:

  1. When saving, the file extension might not map to any save format (e.g., a format supported for reading but not for writing).
  2. It's not clear if multiple extensions work (i.e., format+compression).
  3. In the moment, I'm not sure what formats are available.
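To make point 2 concrete, here is a minimal sketch of how a filename with stacked extensions could be split into a data format and a compression layer. The helper name and the set of compression suffixes are assumptions for illustration; this is not VisiData's code or API.

```python
from pathlib import PurePath
from typing import Optional, Tuple

# Assumed set of compression suffixes; VisiData's actual list may differ.
COMPRESSION_SUFFIXES = {"gz", "bz2", "xz", "zst", "zstd"}


def split_format_and_compression(filename: str) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical helper: infer (format, compression) from a filename's suffixes."""
    suffixes = [s.lstrip(".").lower() for s in PurePath(filename).suffixes]
    compression = None
    # Peel off a trailing compression suffix first, if there is one.
    if suffixes and suffixes[-1] in COMPRESSION_SUFFIXES:
        compression = suffixes.pop()
    # Whatever suffix remains last is taken as the data format.
    fmt = suffixes[-1] if suffixes else None
    return fmt, compression
```

With this sketch, "BestofRedditorUpdates_submissions.vds.zst" yields ("vds", "zst"), while "BestofRedditorUpdates_submissions.zst" yields (None, "zst"): the compression is known, but the format (ndjson here) would still have to be guessed or supplied by the user.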
@reagle reagle added the wishlist label Feb 1, 2024
@midichef
Contributor

midichef commented Feb 2, 2024

To answer your first question, I went through the source looking for def open_*() and def save_*().
Here's a list of every extension/filetype that VisiData can both read and write:

arrow         gsheets       npy           tsv           xd
arrows        html          org           txt           xls
csv           jrnl          parquet       usv           xlsx
dta           jsonl         png           vdj           xml
fixed         jsonla        rec           vds           zip
geojson       lsv           sqlite        vdx

And here's what it can read, but not write:

airtable      forg          mh            pdf           toml
babyl         frictionless  mmdf          puz           ttf
bytes         gdrive        mnu           pyprof        vcf
conll         git           npz           reddit        vd
conllu        h5            ods           sas7bdat      xlsb
eml           jsonobj       orgdir        scrape        xpt
f5log         maildir       pandas        shp           yml
fdir          mbox          pbf           spss          zulip
fec           mbtiles       pcap          tar

And there seem to be a few it can write but not read: dot and svg. There are also several extensions that are not exactly full-fledged file types; they are table formats from the tabulate library, used for text-table files (see loaders/texttable.py). These include jira, md, and table (and more), which it can write but not read.

Where would you like to see this information? It could go in a table like https://visidata.org/docs/formats/, or perhaps in one of the guides, accessible by a command like open-format-guide?
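The manual search described above can be approximated with a short script. This is a sketch under assumptions, not the method used to build the lists: it only finds functions literally named open_* or save_*, so formats registered in other ways (such as the tabulate-based table formats) are missed, and a function name does not always equal a file extension.

```python
import re
from pathlib import Path

# Match definitions such as "def open_csv(" or "def save_vds(" at any indentation.
OPEN_RE = re.compile(r"^\s*def open_(\w+)\(", re.MULTILINE)
SAVE_RE = re.compile(r"^\s*def save_(\w+)\(", re.MULTILINE)


def scan_formats(source_dir):
    """Collect format names from open_*/save_* definitions in .py files under source_dir."""
    readers, writers = set(), set()
    for path in Path(source_dir).rglob("*.py"):
        text = path.read_text(encoding="utf-8", errors="replace")
        readers.update(OPEN_RE.findall(text))
        writers.update(SAVE_RE.findall(text))
    return readers, writers


def classify(readers, writers):
    """Split format names into read+write, read-only, and write-only groups."""
    return {
        "read+write": sorted(readers & writers),
        "read only": sorted(readers - writers),
        "write only": sorted(writers - readers),
    }
```

Pointed at a source checkout, e.g. classify(*scan_formats("visidata/")), this produces three groups analogous to the lists above; re-running it after each release would keep a formats table from drifting out of date.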

@saulpw
Owner

saulpw commented Feb 2, 2024

It's not clear if multiple extensions work (i.e., format+compression).

This does not work currently, but it's on my wishlist too. I'd be interested in a PR that addressed this.

@saulpw
Owner

saulpw commented Feb 2, 2024

the resulting file BestofRedditorUpdates_submissions.vds.zst is smaller, but cannot be reopened: "Unsupported operation: Underlying stream is not seakable."

Can the file be decompressed manually and then opened as .vds? If so, then it's likely a bug in the vds loader (otherwise it's a bug in the vds saver). This is a bug either way though.

Also I would support vdz as an alias for vds+zstd when that's possible.
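The "not seekable" error suggests that something called seek() on a stream that does not support it, which is typical of decompression streams. Whether that is what happens in the vds loader is not established in this thread. As a minimal sketch of one common workaround, assuming the reader needs random access and the decompressed data fits in memory, a non-seekable stream can be buffered first:

```python
import io


def ensure_seekable(stream):
    """Return a seekable binary stream with the same remaining content as `stream`.

    Seekable streams are returned unchanged; otherwise the remaining bytes are
    read into an in-memory buffer, trading memory for random access.
    """
    if stream.seekable():
        return stream
    return io.BytesIO(stream.read())
```

For a 77M .vds file this means holding the whole decompressed text in memory, which is a real cost; a loader that reads strictly sequentially would avoid the problem altogether.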

@reagle
Author

reagle commented Feb 2, 2024

I'm not sure what the best way to do this is, but some thoughts:

  • Give the user a choice of formats to save in. I'm not sure how to prompt for this, since tab completion currently works on the filename, but perhaps via some other shortcut?
  • If the user specifies a filename with an extension for which there is not a save format, warn them. ("are you sure?" with explanation in sidebar?)
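The second suggestion could look roughly like the sketch below: check the requested format against the known savers and, if there is none, return a warning that lists the alternatives and the closest matches. The function name and SAVE_FORMATS are hypothetical; the set is just a subset of the read+write list above, and a real check would need to derive it from the savers VisiData actually has.

```python
import difflib

# Hypothetical subset of save formats, copied from the read+write list above.
SAVE_FORMATS = {"csv", "tsv", "jsonl", "vds", "xlsx", "sqlite", "html", "parquet"}


def check_save_format(fmt, save_formats=SAVE_FORMATS):
    """Return a warning message if fmt has no saver, or None if it is supported."""
    if fmt in save_formats:
        return None
    choices = sorted(save_formats)
    # Suggest near-miss spellings, e.g. "tsb" -> "tsv".
    suggestions = difflib.get_close_matches(fmt, choices, n=3)
    hint = f" Did you mean: {', '.join(suggestions)}?" if suggestions else ""
    return f"No saver for '{fmt}'. Available formats: {', '.join(choices)}.{hint}"
```

For example, check_save_format("mbox") returns a message starting "No saver for 'mbox'" followed by the available formats, which could serve as the "are you sure?" explanation before anything is written.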

@reagle
Copy link
Author

reagle commented Feb 2, 2024

@saulpw I loaded the vds file and saved it as BestofRedditorUpdates_submissions.vds.zstd, which produced a smaller file, but I am unable to decompress it manually.

❯ zstd --decompress BestofRedditorUpdates_submissions.vds.zstd
zstd: BestofRedditorUpdates_submissions.vds already exists; overwrite (y/n) ? y
zstd: BestofRedditorUpdates_submissions.vds.zstd: unsupported format
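zstd's "unsupported format" most likely means the file does not start with a zstd frame header. Checking the file's first bytes shows which compressor, if any, actually wrote it, which also helps with not knowing "what the resulting file format is anymore". The sketch below recognizes a few well-known magic numbers. A None result would suggest the data may not be compressed at all, though that is an inference, not something confirmed in this thread.

```python
# Leading "magic" bytes of common compression formats.
MAGIC_NUMBERS = {
    b"\x28\xb5\x2f\xfd": "zstd",   # zstd frame magic 0xFD2FB528, little-endian
    b"\x1f\x8b": "gzip",
    b"BZh": "bzip2",
    b"\xfd7zXZ\x00": "xz",
}


def sniff_compression(path):
    """Return the compression format suggested by the file's first bytes, or None."""
    with open(path, "rb") as f:
        head = f.read(6)  # the longest magic number above is 6 bytes
    for magic, name in MAGIC_NUMBERS.items():
        if head.startswith(magic):
            return name
    return None
```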
