Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RequestIDs Repeating or Recurring over Short Periods #2822

Closed
GLStephen opened this issue Jul 5, 2018 · 12 comments
Closed

RequestIDs Repeating or Recurring over Short Periods #2822

GLStephen opened this issue Jul 5, 2018 · 12 comments
Assignees
Labels

Comments

@GLStephen
Copy link
Collaborator

We are seeing instances of requestId (auctionId) recurring over short periods of time. Different IPs, pageviews and UAs. Is the requestId/auctionId generator sufficient to create true GUIDs? Just to be clear, we're 100% sure this is not some form of duplicate data in our ingestion.

@ScottKevill
Copy link

You're right. They rely on Math.random() which apparently has a long history of varying brokenness:

2011: Google Chrome random number generator issue
2011: Collisions when generating UUIDs in JavaScript?
2015: TIFU by using Math.random()
2015: There's Math.random(), and then there's Math.random()
2016: Replace Math.random to prevent duplicate event IDs
2018: Googlebot’s Javascript random() function is deterministic

Here are the relevant Prebid functions:

Prebid.js/src/utils.js

Lines 46 to 72 in a2bb255

/* utility method to get incremental integer starting from 1 */
var getIncrementalInteger = (function () {
var count = 0;
return function () {
count++;
return count;
};
})();
function _getUniqueIdentifierStr() {
return getIncrementalInteger() + Math.random().toString(16).substr(2);
}
// generate a random string (to be used as a dynamic JSONP callback)
exports.getUniqueIdentifierStr = _getUniqueIdentifierStr;
/**
* Returns a random v4 UUID of the form xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx,
* where each x is replaced with a random hexadecimal digit from 0 to f,
* and y is replaced with a random hexadecimal digit from 8 to b.
* https://gist.github.com/jed/982883 via node-uuid
*/
exports.generateUUID = function generateUUID(placeholder) {
return placeholder
? (placeholder ^ Math.random() * 16 >> placeholder / 4).toString(16)
: ([1e7] + -1e3 + -4e3 + -8e3 + -1e11).replace(/[018]/g, generateUUID);
};

getUniqueIdentifierStr() is used for adId/bidId/requestId, and bidderRequestId.
generateUUID() is used for auctionId and transactionId.

Two suggested solutions are to either use Crypto.getRandomValues where available, or instead include a Mersenne Twister implementation (eg. https://gist.github.com/banksean/300494).

@GLStephen
Copy link
Collaborator Author

GLStephen commented Jul 7, 2018

@ScottKevill That is what I determined from reading the code and some of the same posts you mention above. Crypto.getRandomValues seems to have broad support. I'll look at patching it in for auctionId. Within the scope of an auction the current level of uniqueness appears to suffice. Performance difference between the two appears to be sub millisecond on a typical browser.

@GLStephen
Copy link
Collaborator Author

GLStephen commented Jul 9, 2018

@mkendall07 has this been discussed? Any reason not to improve this? It would make analytics much more accurate without having to rely only on streaming or other temporal analysis and we've found IDs repeating very near one another recently that have caused issues with that approach.

Happy to submit a PR on this. I've done some initial testing and a single GUID for at least auctionId would not be terribly performance impacting. The other IDs in that context should then have sufficient uniqueness.

The same gist used for the current version has a crypto.getRandomValues version also. https://gist.github.com/jed/982883

@GLStephen
Copy link
Collaborator Author

@dbemiller do you have any insight on this? I'd like to make PR for a change, but don't want to rehash old ground if there is existing discussion or work in the area which I could leverage.

@dbemiller
Copy link
Contributor

dbemiller commented Jul 10, 2018

I'm not aware of any previous discussions... but I probably wouldn't have been involved in the project that long ago.

My thoughts are below:


crypto.getRandomValues sounds good to me. I personally think performance is overrated, and if we profiled this and found out that it was a bottleneck, it would be very easy to optimize (since it's limited to a single function & has no architectural impact). That said, your comment:

Performance difference between the two appears to be sub millisecond on a typical browser.

is only fair if these functions are called a few times on a page. If you're worried about performance, it's also worth seeing how many calls are made when loading a typical page.


I also skimmed the most popular UUID-generating library that I could find, and it does use crypto whenever it's available.

Two other things worth noting:

  1. Their Math.random() fallback makes 16 calls to Math.random(), whereas Prebid's only makes 1. 2. Their code comments say "if all else fails, use Math.random(). It's fast, but is of unspecified quality."

This suggests two things:

  1. 16 Math.random() calls are still faster than a single crypto one.
  2. They still decided that the performance hit was worth it.

I don't think the Mersenne twister is a good option here. If you don't give it a seed, it uses use the current time. Since many users view your page simultaneously, I expect you would still see quite a few conflicts this way.

If we did give it a seed, would we generate that seed with Math or crypto? It hasn't solved the problem--just pushed it back a level.

@ScottKevill
Copy link

Not exactly, it's nuanced.

Even if Mersenne Twister were seeded with the current time, it would still be an improvement on Math.random, as Math.random has more problems than just its initial seed.

That said, the only reason to use Mersenne Twister would be if performance were a concern -- which I highly doubt, given how few random values are required with Prebid.

If enough random values were required that performance became a problem, then you'd seed Mersenne Twister with crypto.getRandomValues and only take the hit once. In this situation, if crypto.getRandomValues were not available, you could fall back to seeding with the current time and/or Math.random. The likelihood of collisions would still be lower than using time alone, because two users would need to be concurrent and be both missing browser support for crypto.getRandomValues.

TL;DR: Use crypto.getRandomValues and if anyone cares, Math.random where that is not available.

@dbemiller
Copy link
Contributor

Agreed with all that ^^

@mkendall07
Copy link
Member

hey all. Sorry was late to this thread. I'm in agreement with the suggested approach.
@GLStephen Feel free to open a PR. thanks!

@stale
Copy link

stale bot commented Jul 25, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jul 25, 2018
@stale stale bot closed this as completed Aug 1, 2018
@GLStephen
Copy link
Collaborator Author

@mkendall07 @dbemiller PR incoming after we finish internal testing, probably shouldn't be marked stale

@dbemiller dbemiller removed the stale label Aug 1, 2018
@dbemiller dbemiller reopened this Aug 1, 2018
@mkendall07
Copy link
Member

@GLStephen any updates on this?

@GLStephen
Copy link
Collaborator Author

@mkendall07 PR is incoming

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants