datasets : harmonize Netflix parsers with the rest #26

Open
ocramz opened this issue Dec 30, 2018 · 0 comments
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed R&D: library Research and (re-)design a library component

ocramz commented Dec 30, 2018

The Netflix Prize dataset currently uses a custom parser because its files are not row-oriented like CSV: a single data example spans several lines, in a custom "stanza-based" format. For example, these are two stanzas of the "qualifying.txt" data file:

1:
1046323,2005-12-19
1080030,2005-12-23
2127527,2005-12-04
1944918,2005-10-05
1057066,2005-11-07
954049,2005-12-20
10:
12868,2004-10-19
627923,2005-12-16
690763,2005-12-13
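
To make the format concrete, here is a minimal attoparsec sketch that parses stanzas like the ones above. It is an illustration only, not the parser currently in the library: the names Date, Stanza, stanza and parseStanzas are hypothetical, and it assumes each stanza is a header line "N:" followed by one or more "id,YYYY-MM-DD" lines.

```haskell
import Control.Applicative ((<|>))
import Data.Attoparsec.ByteString.Char8
  (Parser, char, decimal, endOfInput, endOfLine, many', many1, parseOnly)
import qualified Data.ByteString.Char8 as B8

-- Hypothetical record types, named here only for illustration.
data Date = Date { year :: Int, month :: Int, day :: Int }
  deriving (Eq, Show)

data Stanza = Stanza
  { stanzaId :: Int            -- number before the colon on the header line
  , entries  :: [(Int, Date)]  -- (id, date) pairs on the lines that follow
  } deriving (Eq, Show)

-- A date in YYYY-MM-DD form, e.g. 2005-12-19.
date :: Parser Date
date = Date <$> decimal <* char '-' <*> decimal <* char '-' <*> decimal

-- One "id,date" line; the last line of the file may lack a newline.
entry :: Parser (Int, Date)
entry = (,) <$> decimal <* char ',' <*> date <* (endOfLine <|> endOfInput)

-- A header line "N:" followed by one or more entry lines.
stanza :: Parser Stanza
stanza = Stanza <$> (decimal <* char ':' <* endOfLine) <*> many1 entry

-- Parse a whole file as a sequence of stanzas.
parseStanzas :: B8.ByteString -> Either String [Stanza]
parseStanzas = parseOnly (many' stanza <* endOfInput)
```

Because attoparsec backtracks on failure, `many1 entry` stops cleanly when it reaches the next header line (the `char ','` fails on the colon), so no lookahead is needed to find stanza boundaries.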

It would be nice to extend the library so that it can handle formats like this one in the same way as the other datasets.

Solution sketch:

  • Add a constructor to ReadAs that takes an attoparsec parser as a parameter
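
A rough sketch of what such a constructor could look like follows. The ReadAs type below is a simplified stand-in defined only for this example, not the library's real definition, and the names Lines, Parsable and readDataset are hypothetical.

```haskell
{-# LANGUAGE GADTs #-}

import qualified Data.Attoparsec.ByteString.Char8 as A
import qualified Data.ByteString.Char8 as B8

-- Simplified stand-in for the library's ReadAs type (NOT its real definition).
data ReadAs a where
  -- Stand-in for the existing row-oriented formats: one record per line.
  Lines    :: (B8.ByteString -> Either String a) -> ReadAs a
  -- Proposed addition: an attoparsec parser for ONE record, which may
  -- span several lines (e.g. a whole stanza).
  Parsable :: A.Parser a -> ReadAs a

-- Decode a whole file into a list of records, dispatching on the format.
-- The record parser must consume input when it succeeds, otherwise
-- many' would loop forever.
readDataset :: ReadAs a -> B8.ByteString -> Either String [a]
readDataset (Lines decodeLine) bs = traverse decodeLine (B8.lines bs)
readDataset (Parsable p)       bs = A.parseOnly (A.many' p <* A.endOfInput) bs

-- Tiny example record parser: one "int,int" pair per line.
pair :: A.Parser (Int, Int)
pair = (,) <$> A.decimal <* A.char ',' <*> A.decimal <* A.endOfLine
```

With this shape, a stanza-based dataset would supply its own record parser to the new constructor and otherwise go through the same loading path as the row-oriented datasets, e.g. `readDataset (Parsable pair) contents`.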