analyze: add usage example(s) #17

Open
ocramz opened this issue Oct 30, 2018 · 8 comments
Assignees
Labels
documentation, enhancement, good first issue, help wanted, R&D: applications

Comments

@ocramz
Member

ocramz commented Oct 30, 2018

Possibly a binary in the app/ folder with an end-to-end workflow. Then we can split back anything good that comes out of this into the main library

@ocramz ocramz self-assigned this Oct 30, 2018
@ocramz
Member Author

ocramz commented Dec 12, 2018

One possible use case (from https://www.reddit.com/r/haskell/comments/a50xpr/datahaskell_solve_this_small_problem_to_fill_some/ )

The problem

Averaged across persons, excluding legal fees, how much money had each person spent by time 6?

item                , price
----------------------------
computer            , 1000
car                 , 5000
legal fees (1 hour) , 400

date , person , item-bought         , units-bought
--------------------------------------------------
7    , bob    , car                 , 1
5    , alice  , car                 , 1
4    , bob    , legal fees (1 hour) , 20
3    , alice  , computer            , 2
1    , bob    , computer            , 1

It would be extra cool if you provided both an in-memory and a streaming solution.

Principles|operations it illustrates

Predicate-based indexing|filtering. Merging (called "joining" in SQL). Within- and across-group operations. Sorting. Accumulation (what Data.List calls "scanning"). Projection (both the "last row" and the "mean" operations). Statistics (the "mean" operation).

Solution and proposed algorithm (it's possible you don't want to read this)

The answer is $4000. That's because by time 6, Bob had bought 1 computer ($1000) and 20 hours of legal work (excluded), while Alice had bought a car ($5000) and two computers ($2000). In total they had spent $8000, so the across-persons average is $4000.
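
The arithmetic above only needs one running total per person, which suggests a single-pass (streaming-style) formulation. The following is a minimal sketch, not the API of this library: the names `Row`, `step` and `meanSpending` are hypothetical, and a plain list stands in for a real stream. It assumes that a person with no qualifying purchase by the cutoff is left out of the average.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as M

-- Hypothetical row type: (date, person, item-bought, units-bought).
type Row = (Int, String, String, Int)

-- Price table from the problem statement.
prices :: M.Map String Int
prices = M.fromList
  [ ("computer", 1000)
  , ("car", 5000)
  , ("legal fees (1 hour)", 400)
  ]

-- Purchase rows from the problem statement, in their original order.
rows :: [Row]
rows =
  [ (7, "bob",   "car",                 1)
  , (5, "alice", "car",                 1)
  , (4, "bob",   "legal fees (1 hour)", 20)
  , (3, "alice", "computer",            2)
  , (1, "bob",   "computer",            1)
  ]

-- Fold step: add one row's cost to its buyer's total.
-- Rows after the cutoff, rows for the excluded item, and rows whose
-- item has no known price leave the accumulator unchanged.
step :: Int -> String -> M.Map String Int -> M.Map String Int -> Row -> M.Map String Int
step cutoff excluded priceTable acc (date, person, item, units)
  | date > cutoff || item == excluded = acc
  | otherwise =
      case M.lookup item priceTable of
        Nothing    -> acc
        Just price -> M.insertWith (+) person (units * price) acc

-- Mean of the per-person totals; Nothing if nobody qualifies.
meanSpending :: Int -> String -> M.Map String Int -> [Row] -> Maybe Double
meanSpending cutoff excluded priceTable rs
  | M.null totals = Nothing
  | otherwise     = Just (fromIntegral (sum totals) / fromIntegral (M.size totals))
  where
    totals = foldl' (step cutoff excluded priceTable) M.empty rs
```

Evaluating `meanSpending 6 "legal fees (1 hour)" prices rows` gives `Just 4000.0`. Because addition is commutative, the rows do not need to be sorted by date for this variant; the same `step` function could in principle be handed to the fold of a streaming library.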

One way to compute that would be to:

  • Delete any purchase of legal fees.
  • Merge price and purchase data.
  • Compute a new column, "money-spent" = units-bought × price.
  • Group by person.
  • Within each group: Sort by date in increasing order.
  • Compute a new column, "accumulated-spending" = running total of money spent.
  • Keep the last row with a date no greater than 6; drop all others.
  • Across groups, compute the mean of accumulated spending.
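
The steps above can be sketched in memory with `Data.Map` and `Data.List` from the standard `containers` and `base` packages. This is an illustration under assumptions, not the implementation that ended up in the library: the `Purchase` record and the function names are hypothetical, and the merge is treated as an inner join (rows whose item has no price are dropped).

```haskell
import Data.List (sortOn)
import qualified Data.Map.Strict as M

-- Hypothetical record for one row of the purchases table.
data Purchase = Purchase
  { pDate   :: Int
  , pPerson :: String
  , pItem   :: String
  , pUnits  :: Int
  } deriving (Show, Eq)

-- Price table from the problem statement.
prices :: M.Map String Int
prices = M.fromList
  [ ("computer", 1000), ("car", 5000), ("legal fees (1 hour)", 400) ]

-- Purchase rows from the problem statement.
purchases :: [Purchase]
purchases =
  [ Purchase 7 "bob"   "car"                 1
  , Purchase 5 "alice" "car"                 1
  , Purchase 4 "bob"   "legal fees (1 hour)" 20
  , Purchase 3 "alice" "computer"            2
  , Purchase 1 "bob"   "computer"            1
  ]

-- Accumulated spending per person at the cutoff date.
spendingBy :: Int -> String -> M.Map String Int -> [Purchase] -> M.Map String Int
spendingBy cutoff excluded priceTable rows = M.mapMaybe lastUpTo grouped
  where
    -- Delete purchases of the excluded item.
    kept = filter ((/= excluded) . pItem) rows
    -- Merge with prices and compute money-spent = units-bought * price.
    spent =
      [ (pPerson p, (pDate p, pUnits p * price))
      | p <- kept
      , Just price <- [M.lookup (pItem p) priceTable]
      ]
    -- Group by person.
    grouped = M.fromListWith (++) [ (who, [x]) | (who, x) <- spent ]
    -- Within a group: sort by date, take the running total, and keep
    -- the last row dated no later than the cutoff (Nothing if none).
    lastUpTo xs =
      let sorted  = sortOn fst xs
          running = scanl1 (+) (map snd sorted)
          upTo    = [ t | ((d, _), t) <- zip sorted running, d <= cutoff ]
      in case upTo of
           [] -> Nothing
           _  -> Just (last upTo)

-- Across groups: mean of accumulated spending; Nothing if no group remains.
averageSpendingBy :: Int -> String -> M.Map String Int -> [Purchase] -> Maybe Double
averageSpendingBy cutoff excluded priceTable rows
  | M.null totals = Nothing
  | otherwise     = Just (fromIntegral (sum totals) / fromIntegral (M.size totals))
  where
    totals = spendingBy cutoff excluded priceTable rows
```

With the sample data, `averageSpendingBy 6 "legal fees (1 hour)" prices purchases` evaluates to `Just 4000.0`, matching the answer stated earlier. Note that, as in the listed steps, a person with no remaining row by the cutoff drops out of the average rather than counting as zero.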

@ocramz ocramz added the "help wanted" and "good first issue" labels Dec 12, 2018
@ocramz
Member Author

ocramz commented Jan 13, 2019

Started addressing this with some generic conversion machinery in #34

@UnkDevE
Contributor

UnkDevE commented Feb 12, 2019

Currently writing an example, will commit soon

@UnkDevE
Contributor

UnkDevE commented Feb 13, 2019

Written the code! I don't know how to open a pull request, however.

@ocramz
Member Author

ocramz commented Feb 13, 2019

@UnkDevE you can open a PR by starting from the page of your fork, then clicking "Compare" to see your changes in context:

master...UnkDevE:master

then you can press "Create pull request"

@UnkDevE
Contributor

UnkDevE commented Feb 13, 2019

Thanks! Made the pull request.

ocramz pushed a commit that referenced this issue Mar 2, 2019
* add example

* add fixtures, handle read errors

* get rid of errors using hlint
ocramz added a commit that referenced this issue Mar 2, 2019
This reverts commit b681743.
ocramz added a commit that referenced this issue Mar 2, 2019
This reverts commit b681743.
@ocramz
Member Author

ocramz commented Mar 2, 2019

@UnkDevE I was too quick in merging your previous PR; a number of things still needed to be fixed. For the future, could you add your tests to the main test group, so that Travis runs them together and we see if anything is broken? Thanks!

@UnkDevE
Contributor

UnkDevE commented Mar 2, 2019

no problem! will get started on that tomorrow
