Hi Snowplowers,

We've got heavy problems here due to some bots (especially Googlebot, googleweblight and others) apparently using a broken version of Math.random(), leading to duplicated event_ids. This gets especially annoying because the duplicated IDs produce huge cartesian joins when contexts are used heavily.
I tracked the problem down to the uuid library that the JavaScript Tracker uses: https://github.com/kelektiv/node-uuid/blob/master/lib/rng-browser.js#L17 - it eventually falls back to Math.random() if no better randomness source is available.
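Unfortunately, the implementation of Math.random() is left to the JS interpreter, and it is broken on several systems - sources:
http://stackoverflow.com/a/24224089/1281376
https://medium.com/@betable/tifu-by-using-math-random-f1c308c4fd9d#.keeelkt8v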
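To make the failure mode concrete, here is a minimal sketch of the fallback pattern described above. It is not the uuid library's actual code: the helper names (`randomBytes16`, `toUuidV4`, `makeBrokenRandom`) are made up for illustration, and the "broken" generator is an assumed worst case in which Math.random() restarts from the same internal state on every page load.

```javascript
// Fill 16 random bytes, preferring a cryptographic source when one exists.
// `mathRandom` is injectable only so the failure mode can be simulated.
function randomBytes16(cryptoObj, mathRandom) {
  const bytes = new Uint8Array(16);
  if (cryptoObj && typeof cryptoObj.getRandomValues === "function") {
    cryptoObj.getRandomValues(bytes);
    return bytes;
  }
  // Fallback path: quality depends entirely on the interpreter's Math.random().
  for (let i = 0; i < 16; i++) {
    bytes[i] = Math.floor(mathRandom() * 256);
  }
  return bytes;
}

// Format 16 bytes as a UUIDv4 string (version and variant bits forced).
function toUuidV4(bytes) {
  const b = Uint8Array.from(bytes);
  b[6] = (b[6] & 0x0f) | 0x40; // version 4
  b[8] = (b[8] & 0x3f) | 0x80; // RFC 4122 variant
  const hex = Array.from(b, (x) => x.toString(16).padStart(2, "0")).join("");
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join("-");
}

// ASSUMPTION: a client whose Math.random() starts from the same state on
// every page load, modelled here with a small deterministic LCG.
function makeBrokenRandom() {
  let state = 1;
  return () => {
    state = (state * 48271) % 2147483647;
    return state / 2147483647;
  };
}

// Two independent "page loads" with no crypto source available.
const idFromLoadA = toUuidV4(randomBytes16(null, makeBrokenRandom()));
const idFromLoadB = toUuidV4(randomBytes16(null, makeBrokenRandom()));
console.log(idFromLoadA === idFromLoadB); // true: both loads emit the same event ID
```

With a working `crypto.getRandomValues` the first branch is taken and the problem does not arise; the duplicates only show up on clients that end up on the fallback path with a poorly seeded Math.random().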
I'd like to propose the following solution for this problem:
1. Replace Math.random() with a seedable Mersenne twister: https://github.com/pigulla/mersennetwister (this probably needs an upstream patch to the npm uuid package). While a Mersenne twister is not suitable for cryptographic use, all that's needed here is enough entropy to prevent event ID duplication. Also, its period of 2^19937 - 1 far exceeds the 2^122 possible values of a UUIDv4 (122 random bits out of 128).
1a) Make it possible to seed the RNG by invoking sp.js with an additional parameter &rnd_seed=a4b120f48... (if the page isn't cached or served via CDN, this should be fairly easy with server-side templating in most languages), which is then fed into the twister as a seed vector.
1b) Alternatively - or as a fallback to 1a) - generate the seed vector from the following entropy sources:
Difference between the Snowplow cookie date and the current time
Math.random() itself (injecting it at least can't hurt)
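As a rough illustration of option 1b), the seed vector could be assembled along these lines. This is only a sketch: `buildSeedVector` and its parameters are hypothetical names, `cookieTimestampMs` stands in for whatever date the Snowplow cookie stores, and passing the result to the twister (for example via an array-seeding routine) is left out because that depends on the chosen library's API.

```javascript
// Hypothetical helper: collect 32-bit words to seed a PRNG from the two
// entropy sources listed above.
function buildSeedVector(cookieTimestampMs, nowMs, mathRandom) {
  const words = [];

  // Source 1: difference between the cookie date and the current time,
  // split into a low and a high 32-bit word.
  const delta = Math.abs(nowMs - cookieTimestampMs);
  words.push(delta >>> 0);
  words.push(Math.floor(delta / 4294967296) >>> 0);

  // Source 2: Math.random() output. Even a weak generator cannot reduce
  // the entropy already contributed by the other source.
  for (let i = 0; i < 4; i++) {
    words.push(Math.floor(mathRandom() * 4294967296) >>> 0);
  }

  return Uint32Array.from(words);
}

// Example call; the cookie date here is an arbitrary placeholder.
const seed = buildSeedVector(Date.parse("2016-11-01T00:00:00Z"), Date.now(), Math.random);
console.log(seed.length); // 6
```

Two clients with an identically broken Math.random() would still end up with different seed vectors as long as their cookie dates or request times differ by at least a millisecond.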
Thoughts?
Right now I've got a little too much on my plate, so feel free to grab it - but if we end up doing it ourselves, we'd love to upstream it in case you're interested.
Hey @falschparker82 - many thanks for the super-detailed and thoughtful ticket.
I like 1b) - I have nothing against 1a) but I don't know of many (any?) users who don't serve sp.js via CDN...
Thoughts from the community?
chuwy changed the title from "Javascript Collector: Duplicated event IDs from broken Math.random, especially bots" to "Javascript Tracker: Duplicated event IDs from broken Math.random, especially bots" on Nov 30, 2016