Possible issue with fsort when clearing sort variable #13

Open
mcaceresb opened this issue Nov 20, 2017 · 3 comments

Comments

@mcaceresb

In testing hashsort, I found that fsort sometimes did not give me an identical data set compared to sort, stable. I cannot replicate this from a blank session very easily, so here is the data that gives the issue:

local addr https://raw.githubusercontent.com/mcaceresb/stata-gtools
local path develop/src/github-issues/

use `addr'/`path'/fsort_share.dta
sort int1, stable
tempfile cmp
save `cmp'

use `addr'/`path'/fsort_share.dta
fsort int1
cf * using `cmp'

The result is

. cf * using `cmp'
           rsort:  1 mismatch
r(9);

I believe the issue is with Andrew Maurer's trick to clear : sortedby. I got around this by setting obs to =_N + 1, manipulating the last observation, and then dropping it. This way the original data is never altered.

@sergiocorreia
Owner

It's indeed related to Maurer's trick. When I save the first value as a local and then overwrite it, I end up with a loss of precision of ~8e-16 (i.e. roughly the precision that double provides).

Expanding the dataset is indeed one alternative, which also has the advantage of not depending on whether sortvar is a string or not.

One thing that was weird though is that I tried with a lot of random numbers and could not reproduce this issue in other datasets, so it must be pretty specific to some conditions (and of course you can't just assign the value in a do file because that suffers from the same loss of precision).
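To illustrate the kind of precision loss described above, here is a minimal sketch in Python rather than Stata. It assumes the value passes through a decimal text representation with fewer than 17 significant digits on its way into and out of the local. That is an assumption about the mechanism, not something confirmed in this thread. The helper name `roundtrip` and the choice of 16 digits are hypothetical, picked only for illustration.

```python
# Sketch (assumption: the stored value goes through a decimal string with
# fewer than 17 significant digits, which is what a double needs in order
# to survive a text round trip bit-for-bit).

def roundtrip(value: float, digits: int) -> float:
    """Format value with `digits` significant digits, then parse it back."""
    return float(format(value, f".{digits}g"))

# A double whose shortest exact decimal form needs 17 significant digits.
original = 0.1 + 0.2

restored_16 = roundtrip(original, 16)  # one digit short
restored_17 = roundtrip(original, 17)  # enough digits for any double

print("16 digits identical:", restored_16 == original)
print("17 digits identical:", restored_17 == original)
print("absolute error with 16 digits:", abs(restored_16 - original))
```

Most values survive a 16-digit round trip unchanged and only some do not, which would be consistent with the problem not showing up on random numbers. A comparison that checks for exact equality, as cf does, would flag even a difference this small.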

@mcaceresb
Author

You still have to alter the N + 1th value from missing to non-missing to clear the sort variable, I think (else it thinks it's still sorted, presumably since missing is larger than anything).

@mcaceresb
Author

I deleted the file in my latest commit. I think if you use 1c5fc9c21045216575313b1449544d49ee4dd283 instead of develop in the path, it should still come up, in case you still want to test it.
