Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace mode pipeline #892

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

aGuttman
Copy link
Contributor

No description provided.

@aGuttman
Copy link
Contributor Author

Addressing e-mission/e-mission-docs#841

Start building out infrastructure to allow replace mode model to run in pipeline. Functions made to use gbdt model through trip_model interface, create storage methods for model.

@aGuttman
Copy link
Contributor Author

todo:
continue building/testing infrustructure components
write save/update functions for model
create unit tests

Copy link
Contributor

@shankari shankari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a high level, I also don't see this new algorithm called from anywhere in the pipeline.

Comment on lines +160 to +162
def predict_gradient_boosted_decision_tree(trip, max_confidence=None, first_confidence=None, confidence_multiplier=None):
# load application config
model_type = eamtc.get_model_type()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems like it is just a copy/paste of the previous predict_cluster_confidence_discounting
Why does this have to be in the labels directory anyway?
labels is for predicting labels based on other labels
replaced_mode is for predicting the replaced mode based on other characteristics (e.g. demographics).

So while it is appropriate to have this be inspired by the label assist algorithm, it is its own algorithm/model, and for clarity, it should be in its own directory. Its scaffolding can be similar to the label assist, but it is not a label assist.

Comment on lines +172 to +174
labels = copy.deepcopy(labels)
for l in labels: l["p"] *= confidence_coeff
return labels
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

concretely, this is also wrong because there will not be a label array or probabilities.
Note that this code as written does not work because confidence_coeff is not defined.


# Does all the work necessary for a given user
def infer_labels(user_id):
time_query = epq.get_time_range_for_label_inference(user_id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, this is not the time range to query for because that will return the time range for the label inference algorithm. You are your own algorithm and you need your own time range

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will break the pipeline unless changed.


# Code structure based on emission.analysis.classification.inference.mode.pipeline
# and emission.analysis.classification.inference.mode.rule_engine
class LabelInferencePipeline:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, this needs to change for clarity

Comment on lines +60 to +62
cleaned_trip_dict = copy.copy(cleaned_trip)["data"]
inferred_trip = ecwe.Entry.create_entry(user_id, "analysis/inferred_trip", cleaned_trip_dict)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you have basically copy-pasted the other pipeline.py, you need to understand how it works and adapt it to be a separate step.

@@ -118,6 +118,27 @@ def predict_labels_with_n(
predictions, n = model.predict(trip)
return predictions, n

def predict_labels_with_gbdt(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is this called from?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the list of algorithms (but pipeline_replace_mode.py will do it later).

@shankari shankari changed the base branch from random-forest-mode-detection to master September 23, 2023 04:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants