Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The output of the md5(), sha1() and sha256() generators is misleading #759

Open
TimWolla opened this issue Sep 15, 2023 · 4 comments · May be fixed by #880
Open

The output of the md5(), sha1() and sha256() generators is misleading #759

TimWolla opened this issue Sep 15, 2023 · 4 comments · May be fixed by #880
Labels
Projects

Comments

@TimWolla
Copy link

Summary

The md5(), sha1() and sha256() generators are documented to return a random MD5, SHA-1 or SHA-256 hash respectively. This is technically true, but at the same time it's also misleading, because the functions are unable to leverage the full output space of the respective hashes the way they're written. They don't return a random hash, they return the hash of a random 32 bit integer which is something entirely different.

I'm skipping the remainder of the template, because the issue is evident by looking at the code:

/**
* @example 'cfcd208495d565ef66e7dff9f98764da'
*
* @return string
*/
public static function md5()
{
return md5(self::numberBetween());
}
/**
* @example 'b5d86317c2a144cd04d0d7c03b2b02666fafadf2'
*
* @return string
*/
public static function sha1()
{
return sha1(self::numberBetween());
}
/**
* @example '85086017559ccc40638fcde2fecaf295e0de7ca51b7517b6aebeaaf75b4d4654'
*
* @return string
*/
public static function sha256()
{
return hash('sha256', self::numberBetween());
}

Possible solution:

Replace the implementation by bin2hex(random_bytes($bytes)) with $bytes being 16, 20 and 32 to make use of the entire output space [1]. But even then it would be slightly misleading, because hexadecimal is just one possible encoding for an 128/160/256 bit integer. Returning raw bytes or base64 encoding would also be valid representations that are actually used in practice in the context of a cryptographic hash.

[1] With PHP 8.2 use Randomizer::getBytes().

@pimjansen pimjansen added this to To do in Version 2.0 via automation Sep 16, 2023
kcassam added a commit to kcassam/Faker that referenced this issue Apr 2, 2024
md5(), sha1() and sha256() generators is misleading
@kcassam kcassam linked a pull request Apr 2, 2024 that will close this issue
7 tasks
@kcassam
Copy link

kcassam commented Apr 2, 2024

Seems to me that there is no need to use bin2hex(). random_bytes($bytes) returns a raw string of the desired lenght and does a better job.

@TimWolla
Copy link
Author

TimWolla commented Apr 3, 2024

random_bytes($bytes) returns a raw string of the desired lenght and does a better job.

The functions return hexadecimal characters, thus the output from random_bytes() needs to be hex encoded.

@kcassam
Copy link

kcassam commented Apr 3, 2024

The functions return hexadecimal characters, thus the output from random_bytes() needs to be hex encoded.

I don't see why.
Miscellaneous::md5, Miscellaneous::sha1 and Miscellaneous::sha256 return hexadecimal characters independently of their parameter/input

@TimWolla
Copy link
Author

TimWolla commented Apr 3, 2024

The bin2hex would replace the call to the hash function. Actually hashing is not necessary, the output is indistinguishable anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Version 2.0
  
To do
Development

Successfully merging a pull request may close this issue.

3 participants