Results history option #28

Open
TahsinTariq opened this issue Jun 9, 2021 · 2 comments · May be fixed by #96

Comments

@TahsinTariq

This is an enhancement request.

The results could be saved to a file and there could be an option to show all of them by date. This could be helpful in tracking overall progress.

And maybe later, the results could be visualized in a graph too?

@max-niederman added the "enhancement" (New feature or request) label Jun 10, 2021
@JulianWgs

JulianWgs commented Aug 2, 2021

I would like to propose a JSON data structure for this that stores all the data needed to replay a trial and analyze it thoroughly.

{
    "word": [
        {"start_datetime": "2021-08-02 17:36:22.2", "keystrokes": "word", "timings": [0.2, 0.1, 0.1, 0.2], "trial_hash": "a4f8e2"},
        {"start_datetime": "2021-08-02 17:37:25.7", "keystrokes": "wort\bd", "timings": [0.3, 0.1, 0.1, 0.3, 0.1, 0.1], "trial_hash": "a4f8e2"}
    ]
}
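
As a rough illustration of how such a record could be consumed, the Python sketch below loads the structure above, replays the keystrokes (treating `\b` as a backspace) and sums the timings. The helper names `replay` and `summarize` are hypothetical, and the sketch assumes each timing is the delay in seconds attributed to the keystroke at the same position.

```python
import json

# The example structure from the proposal, kept as raw text so that the
# JSON escape \b is decoded by the JSON parser, not by Python.
RAW = r'''
{
    "word": [
        {"start_datetime": "2021-08-02 17:36:22.2", "keystrokes": "word", "timings": [0.2, 0.1, 0.1, 0.2], "trial_hash": "a4f8e2"},
        {"start_datetime": "2021-08-02 17:37:25.7", "keystrokes": "wort\bd", "timings": [0.3, 0.1, 0.1, 0.3, 0.1, 0.1], "trial_hash": "a4f8e2"}
    ]
}
'''


def replay(keystrokes):
    # Rebuild the text on screen: a backspace removes the previous character.
    typed = []
    for key in keystrokes:
        if key == "\b":
            if typed:
                typed.pop()
        else:
            typed.append(key)
    return "".join(typed)


def summarize(results):
    # One summary row per attempt, keyed by the target word.
    rows = []
    for word, attempts in results.items():
        for attempt in attempts:
            rows.append({
                "word": word,
                "typed": replay(attempt["keystrokes"]),
                "total_seconds": round(sum(attempt["timings"]), 3),
                "corrections": attempt["keystrokes"].count("\b"),
            })
    return rows


summary = summarize(json.loads(RAW))
for row in summary:
    print(row)
```

Both attempts replay to "word"; the second one contains a single correction and takes longer in total.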

Alternatively, the word could be stored inside each record rather than used as the key. The datetime string should use a format that common parsers read easily. Alternatively, one could store the int64 representation of the timestamp.

Zipping this would lead to very small files. Also they could be saved separately for each trial and then concatenated if needed.

Having the fully detailed data available would be very convenient for in-depth analysis:

  • Which words are the hardest for me? Where do I spend time?
  • Which sequences (like the "-ion" ending) are the hardest for me?
  • How did I improve over time?
  • At which time of day do I perform best?
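
Two of these questions can be sketched in a few lines of Python, assuming the JSON layout proposed above. The function names are hypothetical, and the "nation" record is made up purely to give the example a second word and a second time of day.

```python
from collections import defaultdict
from datetime import datetime

# Records in the proposed layout. The "word" entries come from the example
# above; the "nation" entry is invented for illustration only.
RESULTS = {
    "word": [
        {"start_datetime": "2021-08-02 17:36:22.2", "keystrokes": "word",
         "timings": [0.2, 0.1, 0.1, 0.2], "trial_hash": "a4f8e2"},
        {"start_datetime": "2021-08-02 17:37:25.7", "keystrokes": "wort\bd",
         "timings": [0.3, 0.1, 0.1, 0.3, 0.1, 0.1], "trial_hash": "a4f8e2"},
    ],
    "nation": [
        {"start_datetime": "2021-08-02 09:12:03.5", "keystrokes": "nation",
         "timings": [0.2, 0.1, 0.2, 0.4, 0.3, 0.3], "trial_hash": "a4f8e2"},
    ],
}


def mean_time_per_word(results):
    """Average total typing time (seconds) per attempt, slowest word first."""
    means = {
        word: sum(sum(a["timings"]) for a in attempts) / len(attempts)
        for word, attempts in results.items()
    }
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


def mean_time_by_hour(results):
    """Average seconds per keystroke, grouped by the hour the attempt started."""
    totals = defaultdict(lambda: [0.0, 0])  # hour -> [seconds, keystrokes]
    for attempts in results.values():
        for a in attempts:
            started = datetime.strptime(a["start_datetime"], "%Y-%m-%d %H:%M:%S.%f")
            totals[started.hour][0] += sum(a["timings"])
            totals[started.hour][1] += len(a["timings"])
    return {hour: secs / keys for hour, (secs, keys) in totals.items()}


ranking = mean_time_per_word(RESULTS)
by_hour = mean_time_by_hour(RESULTS)
print(ranking)
print(by_hour)
```

Because every keystroke keeps its own timing, the same data could also be grouped by character sequence to answer the question about endings such as "-ion".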

Providing an easy-to-parse data structure makes it easy to analyze the data in other frameworks or languages.

Best regards
Julian

EDIT: Thinking this through once more, I would actually prefer a purely CSV-based data structure. It is even easier to parse than JSON (for example with Excel), and concatenating files is also much easier. When zipped, the file size should be about the same (zipping uses a dictionary). Reading from one or multiple CSV files is also embarrassingly parallelizable.

start_datetime,keystrokes,timings,trial_hash
2021-08-02 17:36:22.2,word,0.2 0.1 0.1 0.2,a4f8e2

I'm not sure how best to represent the timings, but joining them with spaces should work.
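
A minimal Python sketch of reading and writing this CSV layout, assuming the timings column is a space-separated list of seconds as in the example row. The function names `read_trials` and `write_trials` are hypothetical.

```python
import csv
import io

FIELDS = ["start_datetime", "keystrokes", "timings", "trial_hash"]

# The example from the proposal: header line plus one trial.
CSV_TEXT = """start_datetime,keystrokes,timings,trial_hash
2021-08-02 17:36:22.2,word,0.2 0.1 0.1 0.2,a4f8e2
"""


def read_trials(handle):
    # Parse each row and split the space-joined timings into floats.
    trials = []
    for row in csv.DictReader(handle):
        row["timings"] = [float(t) for t in row["timings"].split()]
        trials.append(row)
    return trials


def write_trials(handle, trials):
    # Join the timings with spaces so they fit into a single CSV field.
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    for trial in trials:
        joined = " ".join(str(t) for t in trial["timings"])
        writer.writerow({**trial, "timings": joined})


trials = read_trials(io.StringIO(CSV_TEXT))
print(trials[0]["keystrokes"], sum(trials[0]["timings"]))

# Round trip: writing and re-reading yields the same records.
buffer = io.StringIO()
write_trials(buffer, trials)
round_tripped = read_trials(io.StringIO(buffer.getvalue()))
```

The timings need one extra split step after the CSV parser has run, which is the part a spreadsheet would not do on its own.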

@max-niederman mentioned this issue Dec 12, 2021
@max-niederman
Owner

I've been thinking about this issue for a little bit, and I think it would be best to store the results internally using a binary file format or JSON, and then provide multiple exporters for different file formats.

Using serde, this would be quite easy to implement for most JSON-like formats. It also has the advantage of minimizing uncompressed file sizes, and if we store the results internally, we could easily implement automatic compression if file size is still an issue.

As for @JulianWgs' suggestion of using CSV, I definitely think providing a CSV exporter could be useful, but I also think a lot of thought needs to be put into the encoding since our data is essentially a two-dimensional list of characters (keystrokes, really) annotated with a duration, plus some metadata about the test; there's simply no good way to encode a two-dimensional list of tuples in a CSV file. @JulianWgs' proposed encoding would certainly work, but I'm not sure if it would be much easier to parse than JSON because of the space-delimited timings, which won't work well with e.g. Excel.
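
To make the exporter idea concrete, here is a small sketch, written in Python for brevity rather than with serde. The `Keystroke` and `TypingResult` types and both exporter functions are hypothetical and not part of the project. The CSV exporter uses a "long" layout with one row per keystroke, which is only one possible way to flatten the two-dimensional data and was not proposed in this thread.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass


@dataclass
class Keystroke:
    key: str        # the key that was pressed
    seconds: float  # duration attributed to this keystroke


@dataclass
class TypingResult:
    started: str  # start time of the test
    words: list   # one list of Keystroke objects per word


def to_json(result):
    # Nested export: mirrors the in-memory structure directly.
    return json.dumps(asdict(result))


def to_long_csv(result):
    # Flat export: one row per keystroke, indexed by word and position,
    # so no column needs a second delimiter.
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["started", "word_index", "key_index", "key", "seconds"])
    for w, word in enumerate(result.words):
        for k, stroke in enumerate(word):
            writer.writerow([result.started, w, k, stroke.key, stroke.seconds])
    return out.getvalue()


result = TypingResult(
    started="2021-08-02 17:36:22.2",
    words=[[Keystroke("w", 0.2), Keystroke("o", 0.1),
            Keystroke("r", 0.1), Keystroke("d", 0.2)]],
)
print(to_json(result))
print(to_long_csv(result))
```

The trade-off is that the long layout repeats the test metadata on every row, so it suits an export format better than internal storage.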

As for performance concerns, I don't think there really are any. It's true that CSVs are stupidly easy to append to, but if we use a file per test we won't ever need to and each results object should almost never be larger than a few dozen KBs.

@max-niederman self-assigned this Dec 12, 2021
@JulianWgs linked a pull request Oct 15, 2023 that will close this issue