-
Notifications
You must be signed in to change notification settings - Fork 692
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes made in sampling methods to accomodate state_name #943
base: dev
Are you sure you want to change the base?
Conversation
created a different method to handle state name to support extensibility needed for continuous models.
Codecov Report
@@ Coverage Diff @@
## dev #943 +/- ##
==========================================
- Coverage 94.71% 94.61% -0.11%
==========================================
Files 114 114
Lines 11185 11208 +23
==========================================
+ Hits 10594 10604 +10
- Misses 591 604 +13
Continue to review full report at Codecov.
|
""" | ||
A utility function to map samples to corresponding state name | ||
""" | ||
# Assuming all cpd in model either have state name or not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@khalibartan In my opinion this is not a good assumption. I think it would be better to check and throw an error message accordingly. It would be much easier for the user to debug the issue as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ankurankan Are you suggesting that I should check if all have a state_name or not ? If certain cpd have then should I convert them to state_name and let other be numerics or something else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@khalibartan Yes, and if all of them doesn't name state names defined, maybe show a warning and fall back to just returning numerical values. Having a mixture of state names and numeric states is a good idea but only if it's not computationally too expensive. We wouldn't want to reduce the performance of the general case for dealing with a specific case.
|
||
for node in samples.dtype.names[:index]: | ||
cpd = getattr(model, method)(node) | ||
named_samples[node] = np.array(list(map(lambda x: cpd.state_names[node][x], samples[node]))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@khalibartan Can use np.fromiter
. That should be faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or a faster approach would be to create an array with state names and then just index over it. For example:
states = np.array(['state_1', 'state_2'])
samples = np.array([0, 0, 1, 1, 0, 1, 0])
state_samples = states[samples]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ankurankan I tried using np.fromiter but it gave the error regarding variable size while assigning to the array. Lemme check for the second suggestion though.
|
||
def _map_to_state_name(model, samples): | ||
""" | ||
A utility function to map samples to corresponding state name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add an example here as well ? It's difficult to understand what exactly it's supposed to do by looking at the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ankurankan It is an internal method and is not supposed to be used by the user. That's why I didn't add it in the first place
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it would be good to have an example for our own reference later.
@@ -87,6 +88,7 @@ def forward_sample(self, size=1, return_type='dataframe'): | |||
weights = cpd.values | |||
sampled[node] = sample_discrete(states, weights, size) | |||
|
|||
sampled = _map_to_state_name(self.model, sampled) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do think we should mention this in the docstring that it returns state names if defined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ankurankan Okay. But I think users will assume that it will return state name because he would want that. But no harm in writing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@khalibartan Yeah, true. But it's always better to be explicit.
@@ -158,6 +160,7 @@ def rejection_sample(self, evidence=None, size=1, return_type="dataframe"): | |||
|
|||
i += len(_sampled) | |||
|
|||
sampled = _map_to_state_name(self.model, sampled) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer _map_to_state_name
to be from return_samples
. In my opinion it belongs there, since _return_samples
is supposed to postprocess the samples. Also, with that we will not need to call this method everywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ankurankan No. return_samples is also being used by hmc and nuts. So it will make code messy IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@khalibartan Yeah, but return_samples
is supposed to be like a post processor, so any cases regarding that should be handled inside it. And if implementing that messes up the code too much, we should have a different solution to that, maybe implementing 2 separate post processors for different cases. But calling 2 post processing methods doesn't make much sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ankurankan I'm in favour of keeping these two separately. I don't see any problem in this and it keeps the code clean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ankurankan What should be done here? Both functions have different work.Two different post processors for the different case will lead to redundancy. Merging them into one can be done if we want to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@khalibartan I do think that we should merge these two.
created a different method to handle state name to support extensibility needed for continuous models.
Your checklist for this pull request
馃毃Please review the guidelines for contributing to this repository.
Fixes #942
Changes
Please list the proposed changes in this pull request.
state_name
inSampling.py
methods馃挃Thank you!