Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: stats: Fix zipf.pmf and zipfian.pmf for int32 k #20702

Merged
merged 3 commits into from
May 20, 2024

Conversation

dschmitz89
Copy link
Contributor

@dschmitz89 dschmitz89 commented May 12, 2024

Reference issue

Closes gh-20692

In the long run, these issues should hopefully be taken care of by the infrastructure (#15928 ).

@github-actions github-actions bot added scipy.stats defect A clear bug or issue that prevents SciPy from being installed or used as expected labels May 12, 2024
@dschmitz89 dschmitz89 changed the title BUG: Fix zipf.pmf for integer k BUG: Fix zipf.pmf for int32 k May 12, 2024
@dschmitz89
Copy link
Contributor Author

zipfian likely suffers from the same issue:

return 1.0 / _gen_harmonic(n, a) / k**a

I am not familiar with these distributions though and could not cook up a failing example in a quick session, so let me know if this would be important to fix as well.

@tylerjereddy tylerjereddy added the backport-candidate This fix should be ported by a maintainer to previous SciPy versions. label May 13, 2024
@tylerjereddy tylerjereddy added this to the 1.14.0 milestone May 13, 2024
dist = zipf(9)
pmf = dist.pmf(k)
pmf_k_int32 = dist.pmf(k_int32)
assert_equal(pmf, pmf_k_int32)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Daniel, I just messed around for a bit and found this case that failed before/passed after a similar patch per your comment on zipfian:

diff --git a/scipy/stats/_discrete_distns.py b/scipy/stats/_discrete_distns.py
index cda7c4e11..2ee610827 100644
--- a/scipy/stats/_discrete_distns.py
+++ b/scipy/stats/_discrete_distns.py
@@ -1412,7 +1412,8 @@ class zipfian_gen(rv_discrete):
         return 1, n
 
     def _pmf(self, k, a, n):
-        return 1.0 / _gen_harmonic(n, a) / k**a
+        k = k.astype(np.float64)
+        return 1.0 / _gen_harmonic(n, a) * k**-a
 
     def _cdf(self, k, a, n):
         return _gen_harmonic(k, a) / _gen_harmonic(n, a)
diff --git a/scipy/stats/tests/test_discrete_distns.py b/scipy/stats/tests/test_discrete_distns.py
index c5993ebde..bbe9085ce 100644
--- a/scipy/stats/tests/test_discrete_distns.py
+++ b/scipy/stats/tests/test_discrete_distns.py
@@ -627,3 +627,12 @@ class TestBetaNBinom:
         #      return float(fourth_moment/var**2 - 3.)
         assert_allclose(betanbinom.stats(n, a, b, moments="k"),
                         ref, rtol=3e-15)
+
+
+def test_zipfian_gh_20692():
+    k = np.arange(0, 1000)
+    k_int32 = k.astype(np.int32)
+    dist = zipfian(111, 22)
+    pmf = dist.pmf(k)
+    pmf_k_int32 = dist.pmf(k_int32)
+    assert_equal(pmf, pmf_k_int32)

You shouldn't trust my judgement on stats stuff, but it does look quite similar in nature as you note. The obtained pmf floats are ridiculously tiny, maybe 111 is a bit much, but anyway I just pushed it to error out for integers. It does run lightning fast at least.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in the last commit.

Co-authored-by: Tyler Reddy <tyler.je.reddy@gmail.com>
@dschmitz89 dschmitz89 changed the title BUG: Fix zipf.pmf for int32 k BUG: Fix zipf.pmf and zipfian.pmf for int32 k May 14, 2024
@lucascolley lucascolley changed the title BUG: Fix zipf.pmf and zipfian.pmf for int32 k BUG: stats: Fix zipf.pmf and zipfian.pmf for int32 k May 17, 2024
@dschmitz89 dschmitz89 merged commit d248b50 into scipy:main May 20, 2024
28 of 31 checks passed
@dschmitz89
Copy link
Contributor Author

I went ahead and merged this since it is a quite simple bug fix.

@dschmitz89 dschmitz89 deleted the gh20692 branch May 20, 2024 10:09
@tylerjereddy tylerjereddy modified the milestones: 1.14.0, 1.13.1 May 22, 2024
tylerjereddy pushed a commit to tylerjereddy/scipy that referenced this pull request May 22, 2024
…) [wheel build]

* BUG: fix zipf.pmf for integer k

* ENH: increase range for zipfian.pmf for integer k

---------

Co-authored-by: Tyler Reddy <tyler.je.reddy@gmail.com>

[wheel build]
@tylerjereddy tylerjereddy removed the backport-candidate This fix should be ported by a maintainer to previous SciPy versions. label May 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
defect A clear bug or issue that prevents SciPy from being installed or used as expected scipy.stats
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: stats.zipf: incorrect pmf values
2 participants