
Changing imputation for nulls in DateToUnitCircleTransformer #556

Open
michaelweilsalesforce opened this issue Jun 7, 2021 · 0 comments


Problem
When using DateToUnitCircleTransformer, null dates are replaced with (0, 0), which is not on the unit circle.
For example, with TimePeriod HourOfDay, dates in MM-DD-YYYY format are converted to MM-DD-YYYY 00h00m00s, so their circular representation is (1, 0).
We would expect null values to map to (1, 0) as well.
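
To illustrate the problem, here is a minimal Python sketch of an hour-of-day unit circle encoding. It is not the library's code: the helper `hour_of_day_to_unit_circle`, the `null_value` parameter, the (cos, sin) ordering, and the hour-level granularity are all assumptions made for the example.

```python
import math
from datetime import datetime
from typing import Optional, Tuple

Point = Tuple[float, float]


def hour_of_day_to_unit_circle(ts: Optional[datetime],
                               null_value: Point = (0.0, 0.0)) -> Point:
    """Hypothetical helper: map the hour of day to (cos, sin) on the unit circle.

    Nulls fall back to `null_value`; the default (0, 0) mirrors the behavior
    described in this issue.
    """
    if ts is None:
        return null_value
    radians = 2 * math.pi * ts.hour / 24
    return (math.cos(radians), math.sin(radians))


# A date-only value has no time component, so it is treated as midnight.
midnight = hour_of_day_to_unit_circle(datetime(2021, 6, 7))  # (1.0, 0.0)
missing = hour_of_day_to_unit_circle(None)                   # (0.0, 0.0)

# Distance from the origin: 1.0 for any real hour, 0.0 for the null default.
midnight_norm = math.hypot(*midnight)
missing_norm = math.hypot(*missing)
```

In this sketch the null default is the only output with norm 0, so it lies off the circle that every real timestamp maps onto.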

Solution
Use (1, 0) instead of (0, 0) as the default value for nulls.

Alternatives
These alternatives concern not only this transformer but also the other vectorizers that can use the mode as an imputation technique.
Instead of using the mode, randomly select an existing non-null value so that the distribution of the feature is not changed.
However, this remains difficult:

  • DateToUnitCircleTransformer is not an estimator
  • As an estimator, it would have to store all the distinct non-null values of the dataset as a fitted param.
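
The random-imputation idea above could look roughly like the following Python sketch. `RandomValueImputer` and its `fit`/`transform` methods are hypothetical names, not part of the library. One deliberate difference from the bullet above: the sketch keeps duplicates rather than only distinct values, so that sampling follows the observed frequencies; storing distinct values would also need their counts.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


class RandomValueImputer:
    """Hypothetical estimator: fit() stores the observed non-null values,
    transform() replaces each null with a random draw from them."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)   # seeded for reproducible draws
        self.observed_: List[Point] = []  # the fitted param

    def fit(self, values: Sequence[Optional[Point]]) -> "RandomValueImputer":
        # Duplicates are kept so draws follow the empirical distribution.
        self.observed_ = [v for v in values if v is not None]
        return self

    def transform(self, values: Sequence[Optional[Point]]) -> List[Point]:
        out: List[Point] = []
        for v in values:
            if v is None:
                if not self.observed_:
                    raise ValueError("no non-null values were seen during fit")
                v = self._rng.choice(self.observed_)
            out.append(v)
        return out


data = [(1.0, 0.0), None, (0.0, 1.0), (1.0, 0.0), None]
imputed = RandomValueImputer(seed=42).fit(data).transform(data)
```

The cost shows up in `observed_`: the fitted state grows with the dataset, which is the difficulty noted in the second bullet.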

Additional context
This came up in a case where the HourOfDay circular representation of MM-DD-YYYY 00h00m00s dates is not dropped by SanityChecker because its variance is not 0: the non-null values are all (1, 0), while nulls are imputed as (0, 0).
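
A small numeric sketch of this effect, in Python. The column values and the helper `x_column` are made up for illustration; the sketch only looks at the first coordinate of the encoding and assumes a zero-variance check like the one described above.

```python
from statistics import pvariance
from typing import List

# First coordinate of the HourOfDay encoding for a date-only column:
# every non-null row is midnight, i.e. (1, 0).
non_null_x = [1.0, 1.0, 1.0]


def x_column(null_x: float, n_nulls: int = 2) -> List[float]:
    """Hypothetical column: non-null rows plus `n_nulls` imputed rows."""
    return non_null_x + [null_x] * n_nulls


var_with_origin = pvariance(x_column(0.0))    # nulls imputed as (0, 0)
var_with_midnight = pvariance(x_column(1.0))  # nulls imputed as (1, 0)

# With (0, 0) the column looks informative (variance > 0) even though it only
# encodes "was the date null"; with (1, 0) it is constant and would be caught
# by a zero-variance check.
```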
