Add the fuse option to the data entered linearly #3011
Comments
@SoftTools59654: Sorry, but I'm having difficulty understanding your question. I wonder if maybe you have some misunderstanding about what the
In any case, I'll try to read back my best guesses of what you might be getting at. It's easiest to have these discussions if we're both working off the same sample data, so I'll point to some public ones I can find. If I don't manage to reproduce the effect you're seeing with your data, it would be ideal if you could attach a sample here or point to a URL of another publicly available one that you use to show the problem.
I did some web searches for "movies" CSV data since that seemed to be what you were showing in your screenshots. I didn't find any 10 GB in size, but I did find a smaller one at https://gist.github.com/jheer/4dee9b65d5f4cab64235e28d0e4010dc. In my demo videos below I've downloaded that to my desktop and am using it as my starting point. I'm using Zui Insiders 1.6.1-13 to reproduce these.
For well-formed CSV like this, you can drag it in and the auto-detect will read it into the pool. The multiple "shapes" highlighted in this case are a side effect of frequent "null" values, which result in many different Zed record types. If this poses a problem you can apply
Demo1.mp4
Question: Is there something you're looking to accomplish with well-formed CSV and
Since you spoke of "problems to import", I suspect where you're looking for help is something more like this next example. Here I've created separate small test data gist_bad.csv that has just the first handful of lines from the original CSV with a "bad" line inserted:
I try to import this data in the next video. As it shows, the auto-detect finds the "bad line" and therefore recognizes that it can't import it as structured CSV, so I revert to the Line input mode, which sounds like it might be similar to what you're doing.
Demo2.mp4
This is where I wonder if maybe you're misunderstanding
Assuming I'm on the right track with that, next let's look at the example of JSON data that has problems importing, because you've got more options here. Here I've created small test data gist_bad.json that's similar to the CSV example:
In the final video, I take similar steps to read it as Line because the auto-detect once again notices the bad line. But here we have some more options. Specifically, the
Demo3.mp4
That Zed I applied:
And like we did in the first video, you could perhaps choose to use that as your "shaper" at load time if you wanted to avoid having the string representation first loaded into your pool.
Let me know if any of those help you with what you're trying to achieve and if there's still something missing in there that's needed for your specific use case. Being able to show it with your own sample data or the ones I've been using here would help. Thanks.
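To make the JSON case above concrete outside of Zui, here is a minimal Python sketch of the general idea: read the file line by line, parse each line as JSON where possible, and set aside the lines that fail. This is not the Zed that was applied in the video and not how Zui works internally; the function name `parse_lines` and the sample lines are hypothetical.

```python
import json


def parse_lines(lines):
    """Try to parse each raw line as JSON; collect failures separately.

    Hypothetical helper for illustration only, not part of Zui or Zed.
    """
    parsed, rejected = [], []
    for number, line in enumerate(lines, start=1):
        try:
            parsed.append(json.loads(line))
        except json.JSONDecodeError:
            # Keep the line number so the bad line can be found in the source file.
            rejected.append((number, line))
    return parsed, rejected


# Made-up sample: two valid JSON lines with one "bad" line between them.
sample = [
    '{"title": "A", "year": 1999}',
    "this is a bad line",
    '{"title": "B", "year": null}',
]
parsed, rejected = parse_lines(sample)
```

With this approach the good records become structured values right away, while the rejected lines stay available as plain strings for later inspection.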
A short review of the size and speed differences between ZNG and CSV files
I imported a 10 GB CSV file into Zui as Line data because the file had structural problems.
The result was interesting to me: the stored data shrank to about 30% of the original CSV file size, and in some cases to as little as 25%. (Of course, a standard CSV or JSON file imported normally ends up even smaller than this, but for files that have no proper structure at all, the Line option is the best solution.)
The search speed is also acceptable, thanks to the reduction in volume of up to 75%.
This makes it a suitable option for very large files, because CSV files take up a lot of space.
I have a request
The fuse option in the query display section can be used with standard CSV and JSON data. Is it possible to enable this option for Line data as well? If the lines contain separator characters, as in the image below, or if a problematic JSON file that still meets minimum standards was imported as Line, the fuse option should be available for the search results too.
This would be a suitable option for data that meets minimum standards, such as having separators, and even for problematic JSON. I did not see any limitation when importing data as Line, whereas CSV and JSON files that do not follow the standards cause many problems on import.
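To illustrate what I mean, here is a rough Python sketch of the behavior I am asking for: split each imported line on a separator, then give all the resulting records one common set of fields. This is only an illustration of the idea, not Zed syntax and not how Zui implements fuse; the names `split_line` and `fuse`, the `colN` field names, and the sample lines are all made up.

```python
def split_line(line, sep=","):
    """Split one raw line into a record with positional field names (col0, col1, ...)."""
    values = line.rstrip("\n").split(sep)
    return {f"col{i}": value for i, value in enumerate(values)}


def fuse(records):
    """Give every record the same fields, filling missing ones with None.

    The field list is the union of all field names in first-seen order.
    """
    fields = []
    for record in records:
        for name in record:
            if name not in fields:
                fields.append(name)
    return [{name: record.get(name) for name in fields} for record in records]


# Made-up lines with a different number of fields each, as in a badly structured CSV.
lines = ["a,b,c", "d,e", "f,g,h,i"]
fused = fuse([split_line(line) for line in lines])
```

After this step every record has the same four fields, so the lines can be shown as a single table even though the original file could not be imported as structured CSV.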