fmerge: 1:1 merge, error 3498, <id1 id2> do not uniquely identify obs. in the master data #26

adamfms · 2019-10-01T12:29:03Z

Hello,

A few times I have accidentally performed a 1:1 merge using fmerge when I should have performed a m:1 merge. I then get error 3498. However, rather than failing to merge (as would happen, I think, with the standard merge command) fmerge seems to perform a correct m:1 merge anyway. Am I correct that this happens? I wonder if this is a feature or a bug?

Thanks for your help!

sergiocorreia · 2019-10-01T13:58:30Z

Hi Adam,

Would you be able to give me an example so I can replicate it exactly on my side?

Perhaps something generated by the auto dataset, together with tempfile or dataex.

Thanks,
S

adamfms · 2019-10-01T14:57:28Z

Hi Sergio,

Below is a toy example using auto.dta where the 1:1 fmerge switches to a m:1 fmerge and the 1:1 merge fails:

use auto.dta, clear

* Duplicate each observation
expand 2
sort make
foreach var in price mpg rep78 headroom trunk weight length turn displacement gear_ratio foreign {
	rename `var' `var'dup
}

* fmerge
fmerge 1:1 make using auto.dta, keep(master match) keepusing(make price) nogen

rename price pricemerge

* merge
merge 1:1 make using auto.dta, keep(master match) keepusing(make price) nogen

Having read that fmerge is just a wrapper for join, I think this makes sense: join handles both m:1 and 1:1 merges, given the keys you supply.

adamfms · 2019-10-01T14:58:39Z

For what it's worth: to my mind this is very useful - it saves me having to run my code again when I accidentally specify 1:1 - but perhaps it should be documented?

sergiocorreia · 2019-10-02T04:37:34Z

Within the join command, 1:1 and m:1 are essentially identical except for one extra check at the end (equivalent to isid make). So after everything is done and the merge is finalized, the program just runs something like isid to verify that the IDs are in fact unique.

Now, because join does not use preserve+restore, you end up with a "dirty" dataset after the error. I chose that because running preserve on very large datasets is potentially very slow, and because I almost always run my analysis through do-files, so I would re-run everything anyway.

This leads to what you found: the results are almost like those of an m:1 join. However, there are a few lines that would normally run after the check: https://github.com/sergiocorreia/ftools/blob/master/src/join.ado#L101 (lines 110-130). Those lines enforce the checks of the assert() option and keep the sample required by the keep() option, so they are not completely trivial.

All in all, I think it would be better to run the isid check at the beginning of the command, in order to fail earlier and minimize waiting time. So I would suggest that you not depend on this behavior.
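
In the meantime, a minimal sketch of failing early on the user side, based on the toy example earlier in the thread. It assumes make is the intended key and that the whole block is run as a single do-file; the tempfile name usingdata is hypothetical. It only illustrates checking uniqueness with isid before calling fmerge, and is not how join itself is implemented.

```stata
* Build a using file from the auto dataset (usingdata is a hypothetical name)
sysuse auto, clear
tempfile usingdata
save `usingdata'

* Duplicate each observation so that make is no longer unique in the master data
expand 2
rename price pricedup

* Check the key up front: isid stops the do-file with an error here,
* before any merging is done, so the data in memory are left untouched
isid make

* Only reached if make uniquely identifies observations in the master data
fmerge 1:1 make using `usingdata', keep(master match) keepusing(make price) nogen
```

If the duplicates are intended, replacing 1:1 with m:1 in the fmerge call states that explicitly instead of relying on the state of the data after the error.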

Finally, on a related note, I have a still in-progress update to ftools that should make merges quite a bit faster, as well as allow string+numeric keys. It probably won't be out for a couple of weeks though.

adamfms · 2019-10-02T07:39:09Z

Thanks, Sergio!
I can't quite get my head around the implications of the assert and keep. In the cases I have worked on, they don't seem to change anything. I'll try to read more on this to see what is going on.

Great to hear there will be an even quicker merge! ftools (and gtools) have dramatically increased my pace.
