Changes made in sampling methods to accommodate state_name #943

khalibartan · 2017-11-09T08:39:24Z

Created a separate method to handle state names, to support the extensibility needed for continuous models.

Your checklist for this pull request

🚨Please review the guidelines for contributing to this repository.

Make sure you are requesting to pull a topic/feature/bugfix branch (right side). Don't request your master!
Make sure you are making a pull request against the dev branch (left side). You should also start your branch off our dev branch.
Check that the message style of your commit (or of all your commits) matches our requested structure.
Check that your code additions will fail neither code linting checks nor unit tests.

Fixes #942

Changes

Please list the proposed changes in this pull request.

Added changes to accommodate state_name in Sampling.py methods

💔Thank you!

codecov · 2017-11-09T08:51:45Z

Codecov Report

Merging #943 into dev will decrease coverage by 0.1%.
The diff coverage is 45.83%.

@@            Coverage Diff             @@
##              dev     #943      +/-   ##
==========================================
- Coverage   94.71%   94.61%   -0.11%     
==========================================
  Files         114      114              
  Lines       11185    11208      +23     
==========================================
+ Hits        10594    10604      +10     
- Misses        591      604      +13

Impacted Files	Coverage Δ
pgmpy/sampling/__init__.py	`100% <ø> (ø)`	⬆️
pgmpy/sampling/Sampling.py	`100% <100%> (ø)`	⬆️
pgmpy/sampling/base.py	`82.02% <27.77%> (-13.76%)`	⬇️

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5953dcd...2c1dd56.

ankurankan · 2017-11-09T12:59:53Z

pgmpy/sampling/base.py

+    """
+        A utility function to map samples to corresponding state name
+    """
+    # Assuming all cpd in model either have state name or not


@khalibartan In my opinion this is not a good assumption. I think it would be better to check and throw an error message accordingly. It would be much easier for the user to debug the issue as well.

@ankurankan Are you suggesting that I should check whether all CPDs have state names? If only some CPDs have them, should I convert those to state names and leave the others numeric, or do something else?

@khalibartan Yes, and if not all of them have state names defined, maybe show a warning and fall back to returning numerical values. Supporting a mixture of state names and numeric states is a good idea, but only if it's not too computationally expensive. We wouldn't want to hurt the performance of the general case to handle a special case.
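A minimal sketch of the check-and-warn behaviour suggested in this thread, assuming the samples arrive as a dict of integer index lists and that each variable's state names (or None) have already been collected into a plain dict. `map_to_state_names` is a hypothetical helper, not pgmpy's API, and it implements the all-or-nothing fallback rather than the mixed mode.

```python
import warnings


def map_to_state_names(samples, state_names):
    """Hypothetical helper: map integer-coded samples to state names.

    samples:     dict mapping variable -> list of integer state indices.
    state_names: dict mapping variable -> list of state names, or None when
                 that variable's CPD defines none (assumed stand-in for the
                 per-CPD state_names discussed in the review).
    """
    missing = [var for var in samples if not state_names.get(var)]
    if missing:
        # Not every variable has state names: warn once and return the
        # numeric samples unchanged instead of silently mixing the two.
        warnings.warn(
            f"No state names defined for {sorted(missing)}; returning numeric states."
        )
        return samples
    # Every variable has names, so translate each index into its label.
    return {var: [state_names[var][i] for i in idx] for var, idx in samples.items()}


samples = {"A": [0, 1, 1], "B": [1, 0, 0]}
print(map_to_state_names(samples, {"A": ["no", "yes"], "B": ["low", "high"]}))
```

Raising an error instead of warning, as first suggested, would only change the `if missing:` branch.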

ankurankan · 2017-11-09T13:47:30Z

pgmpy/sampling/base.py

+
+        for node in samples.dtype.names[:index]:
+            cpd = getattr(model, method)(node)
+            named_samples[node] = np.array(list(map(lambda x: cpd.state_names[node][x], samples[node])))


@khalibartan Can use np.fromiter. That should be faster.

An even faster approach would be to create an array of state names once and then index into it. For example:

states = np.array(['state_1', 'state_2'])
samples = np.array([0, 0, 1, 1, 0, 1, 0])
state_samples = states[samples]

@ankurankan I tried using np.fromiter, but it raised an error about variable size when assigning to the array. Let me check the second suggestion, though.
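The three options in this thread can be compared on toy data. This is a sketch with made-up state names; the guess that the variable-size error came from passing a variable-length string dtype (plain `str`) to `np.fromiter` is an assumption, not something the thread confirms.

```python
import numpy as np

# Made-up data: one variable with two named states, sampled as indices.
state_names = ["state_1", "state_2"]
samples = np.array([0, 0, 1, 1, 0, 1, 0])

# 1) Pattern from the diff: a Python-level lambda call per sample.
via_map = np.array(list(map(lambda x: state_names[x], samples)))

# 2) np.fromiter needs a fixed-size dtype; plain `str` is variable-size,
#    so give it an explicit width taken from the longest state name.
width = max(len(name) for name in state_names)
via_fromiter = np.fromiter((state_names[x] for x in samples),
                           dtype=f"U{width}", count=len(samples))

# 3) Suggested approach: build the lookup array once, then fancy-index it.
states = np.array(state_names)
via_indexing = states[samples]

assert (via_map == via_fromiter).all() and (via_map == via_indexing).all()
print(via_indexing)
```

Option 3 pushes the whole lookup into a single vectorised NumPy operation, which is why it is expected to be the fastest of the three.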

ankurankan · 2017-11-09T13:52:31Z

pgmpy/sampling/base.py

+
+def _map_to_state_name(model, samples):
+    """
+        A utility function to map samples to corresponding state name


Could you add an example here as well? It's difficult to understand what exactly it's supposed to do just by looking at the code.

@ankurankan It is an internal method and is not meant to be called by users. That's why I didn't add one in the first place.

But it would be good to have an example for our own reference later.
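One way to meet this request is a numpydoc-style Examples section. The sketch below is hypothetical: it replaces the `model` argument with a plain dict of state names so the example runs without building a model, and the name `map_to_state_name_sketch` is illustrative only.

```python
import numpy as np


def map_to_state_name_sketch(state_names, samples):
    """Map integer-coded samples to the corresponding state names.

    Parameters
    ----------
    state_names: dict
        Maps each variable to the list of its state names (hypothetical
        stand-in for reading ``cpd.state_names`` off the model).
    samples: numpy structured array
        One integer field per variable, as produced by the samplers.

    Returns
    -------
    dict: maps each variable to a NumPy array of state names.

    Examples
    --------
    >>> import numpy as np
    >>> samples = np.array([(0, 1), (1, 1)], dtype=[('rain', int), ('grass', int)])
    >>> names = {'rain': ['no', 'yes'], 'grass': ['dry', 'wet']}
    >>> mapped = map_to_state_name_sketch(names, samples)
    >>> mapped['rain'].tolist(), mapped['grass'].tolist()
    (['no', 'yes'], ['wet', 'wet'])
    """
    # Index each variable's array of names with that variable's sample column.
    return {node: np.array(state_names[node])[samples[node]]
            for node in samples.dtype.names}
```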

ankurankan · 2017-11-09T13:53:58Z

pgmpy/sampling/Sampling.py

@@ -87,6 +88,7 @@ def forward_sample(self, size=1, return_type='dataframe'):
                weights = cpd.values
            sampled[node] = sample_discrete(states, weights, size)

+        sampled = _map_to_state_name(self.model, sampled)


I think we should mention in the docstring that it returns state names when they are defined.

@ankurankan Okay. Users will probably assume it returns state names, since that is what they would want, but there's no harm in writing it down.

@khalibartan Yeah, true. But it's always better to be explicit.

ankurankan · 2017-11-09T13:55:36Z

pgmpy/sampling/Sampling.py

@@ -158,6 +160,7 @@ def rejection_sample(self, evidence=None, size=1, return_type="dataframe"):

            i += len(_sampled)

+        sampled = _map_to_state_name(self.model, sampled)


I would prefer _map_to_state_name to be called from return_samples. In my opinion it belongs there, since _return_samples is supposed to post-process the samples. Also, that way we won't need to call this method everywhere.

@ankurankan No. return_samples is also used by HMC and NUTS, so it would make the code messy, IMO.

@khalibartan Yeah, but return_samples is supposed to act as a post-processor, so any such cases should be handled inside it. If that makes the code too messy, we should find a different solution, perhaps two separate post-processors for the different cases. But calling two post-processing methods in sequence doesn't make much sense.

@ankurankan I'm in favour of keeping these two separate. I don't see any problem with this, and it keeps the code clean.

@ankurankan What should be done here? The two functions do different jobs. Two different post-processors for the different cases would lead to redundancy. We can merge them into one if we want to.

@khalibartan I do think that we should merge these two.

Changes made in sampling methods to accommodate state_name

2c1dd56

Created a separate method to handle state names, to support the extensibility needed for continuous models.

khalibartan requested a review from ankurankan November 9, 2017 08:41

ankurankan requested changes Nov 9, 2017

khalibartan Dec 6, 2017

ankurankan Dec 7, 2017
