Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes made in sampling methods to accomodate state_name #943

Open
wants to merge 1 commit into
base: dev
Choose a base branch
from

Conversation

khalibartan
Copy link
Member

created a different method to handle state name to support extensibility needed for continuous models.

Your checklist for this pull request

馃毃Please review the guidelines for contributing to this repository.

  • Make sure you are requesting to pull a topic/feature/bugfix branch (right side). Don't request your master!
  • Make sure you are making a pull request against the dev branch (left side). Also you should start your branch off our dev.
  • Check the commit's or even all commits' message styles matches our requested structure.
  • Check your code additions will fail neither code linting checks nor unit test.

Fixes #942

Changes

Please list the proposed changes in this pull request.

  • Added changes to accommodate state_name in Sampling.py methods

馃挃Thank you!

created a different method to handle state name to support extensibility
needed for continuous models.
@codecov
Copy link

codecov bot commented Nov 9, 2017

Codecov Report

Merging #943 into dev will decrease coverage by 0.1%.
The diff coverage is 45.83%.

Impacted file tree graph

@@            Coverage Diff             @@
##              dev     #943      +/-   ##
==========================================
- Coverage   94.71%   94.61%   -0.11%     
==========================================
  Files         114      114              
  Lines       11185    11208      +23     
==========================================
+ Hits        10594    10604      +10     
- Misses        591      604      +13
Impacted Files Coverage 螖
pgmpy/sampling/__init__.py 100% <酶> (酶) 猬嗭笍
pgmpy/sampling/Sampling.py 100% <100%> (酶) 猬嗭笍
pgmpy/sampling/base.py 82.02% <27.77%> (-13.76%) 猬囷笍

Continue to review full report at Codecov.

Legend - Click here to learn more
螖 = absolute <relative> (impact), 酶 = not affected, ? = missing data
Powered by Codecov. Last update 5953dcd...2c1dd56. Read the comment docs.

"""
A utility function to map samples to corresponding state name
"""
# Assuming all cpd in model either have state name or not
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@khalibartan In my opinion this is not a good assumption. I think it would be better to check and throw an error message accordingly. It would be much easier for the user to debug the issue as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ankurankan Are you suggesting that I should check if all have a state_name or not ? If certain cpd have then should I convert them to state_name and let other be numerics or something else.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@khalibartan Yes, and if all of them doesn't name state names defined, maybe show a warning and fall back to just returning numerical values. Having a mixture of state names and numeric states is a good idea but only if it's not computationally too expensive. We wouldn't want to reduce the performance of the general case for dealing with a specific case.


for node in samples.dtype.names[:index]:
cpd = getattr(model, method)(node)
named_samples[node] = np.array(list(map(lambda x: cpd.state_names[node][x], samples[node])))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@khalibartan Can use np.fromiter. That should be faster.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or a faster approach would be to create an array with state names and then just index over it. For example:

states = np.array(['state_1', 'state_2'])
samples = np.array([0, 0, 1, 1, 0, 1, 0])
state_samples = states[samples]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ankurankan I tried using np.fromiter but it gave the error regarding variable size while assigning to the array. Lemme check for the second suggestion though.


def _map_to_state_name(model, samples):
"""
A utility function to map samples to corresponding state name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add an example here as well ? It's difficult to understand what exactly it's supposed to do by looking at the code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ankurankan It is an internal method and is not supposed to be used by the user. That's why I didn't add it in the first place

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it would be good to have an example for our own reference later.

@@ -87,6 +88,7 @@ def forward_sample(self, size=1, return_type='dataframe'):
weights = cpd.values
sampled[node] = sample_discrete(states, weights, size)

sampled = _map_to_state_name(self.model, sampled)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think we should mention this in the docstring that it returns state names if defined.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ankurankan Okay. But I think users will assume that it will return state name because he would want that. But no harm in writing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@khalibartan Yeah, true. But it's always better to be explicit.

@@ -158,6 +160,7 @@ def rejection_sample(self, evidence=None, size=1, return_type="dataframe"):

i += len(_sampled)

sampled = _map_to_state_name(self.model, sampled)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer _map_to_state_name to be from return_samples. In my opinion it belongs there, since _return_samples is supposed to postprocess the samples. Also, with that we will not need to call this method everywhere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ankurankan No. return_samples is also being used by hmc and nuts. So it will make code messy IMO.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@khalibartan Yeah, but return_samples is supposed to be like a post processor, so any cases regarding that should be handled inside it. And if implementing that messes up the code too much, we should have a different solution to that, maybe implementing 2 separate post processors for different cases. But calling 2 post processing methods doesn't make much sense.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ankurankan I'm in favour of keeping these two separately. I don't see any problem in this and it keeps the code clean.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ankurankan What should be done here? Both functions have different work.Two different post processors for the different case will lead to redundancy. Merging them into one can be done if we want to.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@khalibartan I do think that we should merge these two.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Sampling methods in discrete case don't use state_names in provided in CPD
2 participants