join: assert that by() vars have same general type (str vs num) #2

Open
sergiocorreia opened this issue Oct 17, 2016 · 1 comment
@sergiocorreia
Owner

sergiocorreia commented Oct 17, 2016

For example, an error message along these lines:

key variable id1 is str5 in master but byte in using data
key variable id3 is str12 in master but int in using data
    Each key variable -- the variables on which observations are matched -- must be of the same generic type
    in the master and using datasets.  Same generic type means both numeric or both string.
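
A minimal sketch of what such an assertion could look like, assuming the master data is in memory and the using data is a .dta file on disk. The program name `assert_key_types` and its `from()` option are hypothetical and not part of ftools; the return code 109 (type mismatch) is likewise just a choice made for this illustration.

```stata
* Hypothetical helper for illustration only; not part of ftools
capture program drop assert_key_types
program define assert_key_types
    syntax varlist, from(string)

    * Note the generic type (1 = string, 0 = numeric) of each key in the master
    local i = 0
    foreach v of local varlist {
        local ++i
        capture confirm string variable `v'
        local master`i' = (_rc == 0)
    }

    * Load only the key variables, with zero observations, from the using file
    preserve
    use `varlist' if 0 using "`from'", clear

    * Compare the generic type of each key against what was found in the master
    local i = 0
    foreach v of local varlist {
        local ++i
        capture confirm string variable `v'
        if ((_rc == 0) != `master`i'') {
            display as error "key variable `v' is not of the same generic type in master and using data"
            exit 109
        }
    }
    restore
end
```

It would be called before the join, e.g. `assert_key_types id1 id3, from(using_file)`, where `using_file` is a placeholder. This only compares each key across the two datasets; it does not check whether the keys listed in by() share a type among themselves.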
@luispfonseca

join seems to require that key variables are of the same generic type. Is this related to the hashing algorithm? It seems to be fine with both being strings or both being numeric but not one of each.

Minimal example:

* generate data
clear

* master dataset
set seed 09092019
set obs 10
gen string_id = "A"
replace string_id = "B" if _n > 5
gen number_id = int(runiform() * 2)
save temp_master_dataset, replace

* using dataset
duplicates drop
gen usingvar = "BLA" * (number_id + 1)
tostring number_id, gen(number_id_tostring)
save temp_using_dataset, replace

use temp_master_dataset, clear
* error when joining
join, from(temp_using_dataset) by(string_id number_id)

* case with no error when using only strings
tostring number_id, gen(number_id_tostring)
join usingvar, from(temp_using_dataset) by(string_id number_id_tostring)

Thank you for the package!

@sergiocorreia sergiocorreia self-assigned this Sep 9, 2019