switch logistic classifier to random forest as default classifier #1031

Open

fgregg wants to merge 4 commits into main from rf
Conversation

fgregg (Contributor) commented Jun 2, 2022

would close #990
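
To make the intent concrete, this is roughly the kind of override users can already apply by hand today. The snippet below is an illustrative sketch rather than this PR's diff: it assumes the matcher keeps its model in a replaceable `classifier` attribute with a scikit-learn-style `fit`/`predict_proba` interface, and the field definitions are made up.

```python
# Illustrative sketch only (not this PR's diff): swapping the classifier on a
# dedupe matcher by hand. Assumes the matcher exposes a `classifier`
# attribute that follows the scikit-learn fit/predict_proba convention.
from sklearn.ensemble import RandomForestClassifier

import dedupe

# Hypothetical field definitions for the example.
fields = [
    {"field": "name", "type": "String"},
    {"field": "address", "type": "String"},
]

deduper = dedupe.Dedupe(fields)

# Replace the default classifier (regularized logistic regression) with a
# random forest, which is what this PR proposes to make the default.
deduper.classifier = RandomForestClassifier(n_estimators=100, n_jobs=-1)
```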

coveralls commented Jun 2, 2022

Coverage remained the same at 64.947% when pulling 574aa84 on rf into 6fc8724 on main.

dedupeio deleted a comment from github-actions bot Jun 2, 2022

fgregg commented Jun 2, 2022

@benchmark

github-actions bot commented Jun 2, 2022

All benchmarks (diff):

| | before | after | ratio | benchmark |
|---|---|---|---|---|
| | 528M | 532M | 1.01 | canonical.Canonical.peakmem_run |
| ! | 17.4±0.3s | failed | n/a | canonical.Canonical.time_run |
| - | 0.919 | 0.767 | 0.83 | canonical.Canonical.track_precision |
| + | 0.902 | 1.0 | 1.11 | canonical.Canonical.track_recall |
| | 228M | 232M | 1.02 | canonical_gazetteer.Gazetteer.peakmem_run(None) |
| ! | 15.7±0.2s | failed | n/a | canonical_gazetteer.Gazetteer.time_run(None) |
| | 0.982 | 0.991 | 1.01 | canonical_gazetteer.Gazetteer.track_precision(None) |
| | 0.982 | 0.982 | 1.00 | canonical_gazetteer.Gazetteer.track_recall(None) |
| | 228M | 232M | 1.02 | canonical_matching.Matching.peakmem_run({'threshold': 0.5, 'constraint': 'many-to-one'}) |
| | 228M | 232M | 1.02 | canonical_matching.Matching.peakmem_run({'threshold': 0.5}) |
| ! | 14.3±0.07s | failed | n/a | canonical_matching.Matching.time_run({'threshold': 0.5, 'constraint': 'many-to-one'}) |
| ! | 14.5±0.05s | failed | n/a | canonical_matching.Matching.time_run({'threshold': 0.5}) |
| | 0.981 | 0.982 | 1.00 | canonical_matching.Matching.track_precision({'threshold': 0.5, 'constraint': 'many-to-one'}) |
| | 0.99 | 1.0 | 1.01 | canonical_matching.Matching.track_precision({'threshold': 0.5}) |
| | 0.911 | 0.991 | 1.09 | canonical_matching.Matching.track_recall({'threshold': 0.5, 'constraint': 'many-to-one'}) |
| | 0.911 | 1.0 | 1.10 | canonical_matching.Matching.track_recall({'threshold': 0.5}) |

(logs)

fgregg commented Jun 2, 2022

@benchmark

github-actions bot commented Jun 2, 2022

All benchmarks (diff):

| | before | after | ratio | benchmark |
|---|---|---|---|---|
| | 527M | 532M | 1.01 | canonical.Canonical.peakmem_run |
| ! | 16.9±0.2s | failed | n/a | canonical.Canonical.time_run |
| - | 0.919 | 0.824 | 0.90 | canonical.Canonical.track_precision |
| + | 0.902 | 1.0 | 1.11 | canonical.Canonical.track_recall |
| | 228M | 232M | 1.02 | canonical_gazetteer.Gazetteer.peakmem_run(None) |
| ! | 16.0±0.03s | failed | n/a | canonical_gazetteer.Gazetteer.time_run(None) |
| | 0.982 | 0.991 | 1.01 | canonical_gazetteer.Gazetteer.track_precision(None) |
| | 0.982 | 1.0 | 1.02 | canonical_gazetteer.Gazetteer.track_recall(None) |
| | 227M | 235M | 1.03 | canonical_matching.Matching.peakmem_run({'threshold': 0.5, 'constraint': 'many-to-one'}) |
| | 227M | 235M | 1.03 | canonical_matching.Matching.peakmem_run({'threshold': 0.5}) |
| | 14.5±0.05s | 29.5±0.3s | ~2.03 | canonical_matching.Matching.time_run({'threshold': 0.5, 'constraint': 'many-to-one'}) |
| | 14.1±0.1s | 28.9±0.2s | ~2.05 | canonical_matching.Matching.time_run({'threshold': 0.5}) |
| | 0.99 | 0.991 | 1.00 | canonical_matching.Matching.track_precision({'threshold': 0.5, 'constraint': 'many-to-one'}) |
| | 0.99 | 1.0 | 1.01 | canonical_matching.Matching.track_precision({'threshold': 0.5}) |
| | 0.911 | 0.991 | 1.09 | canonical_matching.Matching.track_recall({'threshold': 0.5, 'constraint': 'many-to-one'}) |
| | 0.911 | 1.0 | 1.10 | canonical_matching.Matching.track_recall({'threshold': 0.5}) |

(logs)

fgregg commented Jun 2, 2022

@benchmark

github-actions bot commented Jun 2, 2022

All benchmarks (diff):

| | before | after | ratio | benchmark |
|---|---|---|---|---|
| | 528M | 532M | 1.01 | canonical.Canonical.peakmem_run |
| + | 15.6±0.03s | 24.4±0s | 1.56 | canonical.Canonical.time_run |
| | 0.87 | 0.83 | 0.95 | canonical.Canonical.track_precision |
| | 0.955 | 1.0 | 1.05 | canonical.Canonical.track_recall |
| | 227M | 232M | 1.02 | canonical_gazetteer.Gazetteer.peakmem_run(None) |
| + | 13.2±0.01s | 23.7±0s | 1.79 | canonical_gazetteer.Gazetteer.time_run(None) |
| | 0.982 | 1.0 | 1.02 | canonical_gazetteer.Gazetteer.track_precision(None) |
| | 0.982 | 1.0 | 1.02 | canonical_gazetteer.Gazetteer.track_recall(None) |
| | 227M | 234M | 1.03 | canonical_matching.Matching.peakmem_run({'threshold': 0.5, 'constraint': 'many-to-one'}) |
| | 227M | 234M | 1.03 | canonical_matching.Matching.peakmem_run({'threshold': 0.5}) |
| | 11.7±0.08s | 22.8±0.2s | ~1.94 | canonical_matching.Matching.time_run({'threshold': 0.5, 'constraint': 'many-to-one'}) |
| | 11.3±0.3s | 23.0±0.2s | ~2.03 | canonical_matching.Matching.time_run({'threshold': 0.5}) |
| | 0.99 | 1.0 | 1.01 | canonical_matching.Matching.track_precision({'threshold': 0.5, 'constraint': 'many-to-one'}) |
| | 0.99 | 1.0 | 1.01 | canonical_matching.Matching.track_precision({'threshold': 0.5}) |
| | 0.911 | 1.0 | 1.10 | canonical_matching.Matching.track_recall({'threshold': 0.5, 'constraint': 'many-to-one'}) |
| | 0.911 | 1.0 | 1.10 | canonical_matching.Matching.track_recall({'threshold': 0.5}) |

(logs)

fgregg commented Jun 2, 2022

@NickCrews, there's a lot of noise in the recall/precision differences. Should we increase the repetitions?

NickCrews (Contributor) commented:

You know you can run these benchmarks locally, right? That might help with any debugging you need to do. Take a look at the workflow to see what to do; I can help if needed.

It seems easy enough to try increasing the reps and seeing whether the metrics become more stable. Play around locally and see how many repetitions you need for stability? I think this is controlled by the benchmark attributes documented at https://asv.readthedocs.io/en/stable/benchmarks.html#benchmark-attributes (see the sketch below).
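
For concreteness, here is the kind of thing I have in mind. The class and helper names below are made up rather than taken from our benchmarks suite, and note that `rounds`/`repeat` only affect the timing benchmarks, so the tracked precision/recall would have to be averaged inside the `track_*` methods themselves.

```python
import statistics

# Made-up sketch of an asv benchmark class; the helpers (load_canonical_data,
# run_dedupe, precision_of) are hypothetical stand-ins for whatever the real
# suite does.
class Canonical:
    # asv attributes: sample each timing benchmark over more rounds/repeats
    # than the defaults.
    rounds = 4
    repeat = 10

    n_trials = 5  # our own knob for the tracked metrics

    def setup(self):
        self.data = load_canonical_data()

    def time_run(self):
        run_dedupe(self.data)

    def track_precision(self):
        # Average several independent trials to damp the noise from
        # non-deterministic training.
        return statistics.mean(
            precision_of(run_dedupe(self.data)) for _ in range(self.n_trials)
        )
```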

The variation seems inherent to the fact that we are using non-deterministic algorithms, so I don't see it as a problem with our testing methodology. It could be considered a problem with the actual implementation, though: if we get this much variation between runs, should dedupe actually do a bunch of trials and then choose the settings from the best run? That smells of overfitting to me, though.

Somewhat related, a nice-to-have would be the ability to pass random_state into all the classes and functions to make them deterministic, like sklearn and friends do. That would partially address this problem: it would make two different benchmark runs more comparable. BUT, if we happen to choose a random seed that isn't representative, then these benchmarks won't be very helpful for predicting real-life behavior (maybe I'm thinking about this wrong).
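
Something like this is what I mean; a random_state argument on dedupe's own classes is a proposal here, not an existing parameter, so the snippet just shows the sklearn-style convention being referenced.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

SEED = 42

# Seed any sampling we control ourselves...
rng = np.random.default_rng(SEED)

# ...and seed the classifier, the way sklearn estimators already allow.
clf = RandomForestClassifier(random_state=SEED)
```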

Anyway, I think increasing the reps for the benchmarks would be the best bet.

Development

Successfully merging this pull request may close these issues:

Consider making the default classifier a random forest (#990)
3 participants