Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reuse of createHash initializations #25857

Closed
ppKrauss opened this issue Jan 31, 2019 · 15 comments
Closed

Reuse of createHash initializations #25857

ppKrauss opened this issue Jan 31, 2019 · 15 comments
Labels
crypto Issues and PRs related to the crypto subsystem. duplicate Issues and PRs that are duplicates of other issues or PRs. feature request Issues that request new features to be added to Node.js.

Comments

@ppKrauss
Copy link

Reuse is a fundamental issue...

const crypto = require('crypto')
const H = crypto.createHash('md5')  // NEED REUSE THIS INITIALIZATION
console.log( H.update('some data to hash').digest('hex') ) // first ok...
console.log( H.update('some2').digest('hex') )  // ugly "Error: Digest already called"!

The problem that I am trying to solve: to reuse initialization.

The desired behavior: no error.

... No alternative solutions, but please, show me if there are some elegant one.

@ppKrauss ppKrauss changed the title Reuse of hash initialization Reuse of createHash initializations Jan 31, 2019
@sam-github
Copy link
Contributor

Could you describe what it is you need, and why?

Your example isn't helpful. I don't know if you want openssl style digest reuse, where after _final() the only call that can be made is an init, after which new data can be digested, or whether you want to continue on hashing new data as if the digest had not been extracted.

The latter would require calling EVP_MD_CTX_copy() every time before getting the digest value, so would be prohibitively expensive. It would be possible to expose a .copy() method on digests, if that's what you want, and you have a motivating use-case.

@sam-github sam-github added crypto Issues and PRs related to the crypto subsystem. feature request Issues that request new features to be added to Node.js. labels Jan 31, 2019
@ppKrauss
Copy link
Author

ppKrauss commented Jan 31, 2019

Hi @sam-github ,

Your example isn't helpful

any loop needs reuse:

const crypto = require('crypto')
const H = crypto.createHash('md5')  // NEED REUSE THIS INITIALIZATION

for (let a of  ["Ana Santos", "José Silva", "Bryan Lee", "Maria Silva", "Vitor Santos"]) 
    // ... or suppose looping over 10000 items
    console.log( a,  H.update(a).digest("hex") )  // ugly error

But the only valid (no error) syntax option is:

const H = require('crypto').createHash
for (let a of  ["Ana Santos", "José Silva", "Bryan Lee", "Maria Silva", "Vitor Santos"]) 
    console.log( a,  H('md5').update(a).digest("hex") )
    // of course, no error, but no reuse.

Please, show an example of reuse using the cited .copy().


Important: there are many crypto.getHashes() algorithms... This question is about only one, sha256, there are 2,493 pageviws of users that did not understand "Digest already called" error... This error not make sense: because reuse make sense.

@sam-github
Copy link
Contributor

This still does not answer my question, you provide code, but no explanation of how you want the code to work.

I don't know if you want openssl style digest reuse, where after _final() the only call that can be made is an init, after which new data can be digested, or whether you want to continue on hashing new data as if the digest had not been extracted.

Specifically, will iteration 2 output the hash of "José Silva", or of "Ana SantosJosé Silva"?

@ppKrauss
Copy link
Author

ppKrauss commented Feb 1, 2019

@sam-github sorry, I edited with complete example... But really not understand what you show as solution.

@sam-github
Copy link
Contributor

Specifically, will iteration 2 output the hash of "José Silva", or of "Ana SantosJosé Silva"?

It appears you want iteration 2 to output the hash of "José Silva", in other words a call to .digest() will reset the internal state back to the initial just-constructed state, is this correct?

But really not understand what you show as solution.

I offered no solution. Until now, I wasn't even sure what the problem was.

I'll try to find some time to benchmark initialization to see what kind of performance gains are possible.

Providing your own benchmarks might help motivate someone to pick this up, or you could PR the change. It seems reasonable to me.

@bnoordhuis
Copy link
Member

a call to .digest() will reset the internal state back to the initial just-constructed state

It would be kind of odd (in a 'no parity' sense) to have that work for hashes but not e.g. HMACs.

(You could theoretically make it work for HMACs by retaining the secret but I don't think we should.)

@sam-github
Copy link
Contributor

Differing from HMAC would be a definite downside.

I notice that the openssl APIs have the ability to reuse the HASH context. I assume this is for use-cases where a HASH with some params is created, then used over and over to digest large numbers of high-throughput blocks/records/whatever, and avoid having to do a digest-init for each one.

That OpenSSL does it suggests it might be useful, but whether it is relevant to any node users, or even noticable if it was done is an open question. I keep hoping someone will come up with a use-case backed by benchmarks showing that this would have a significant impact for them, but I haven't seen it yet.

@tniessen
Copy link
Member

I did a quick'n'dirty benchmark to see how much time createHash() needs in the following snippet (data is a buffer):

let hash = crypto.createHash('sha256');
hash.update(data);
hash.digest();
data createHash('sha256')
32 B 52%
256 B 40%
1 KB 25%
2 KB 16%
4 KB 9%
8 KB 5%
16 KB 3%
64 KB 0.1%

No surprises so far, the initialization time only matters for small inputs, and probably even less when dealing with strings instead of Buffers. Three questions arise:

  1. How many applications need to compute large numbers of cryptographic hash values of small inputs?
  2. What is the primary cause for the time spent, is it node or OpenSSL? (Likely node / v8.)
  3. Is there another way to solve the problem? (e.g. pooling)

@ppKrauss
Copy link
Author

ppKrauss commented Feb 22, 2019

Thanks @tniessen, perfect (!), now will be possible do finish this issue... after some discussion.

The main justification is not performance, but something like "syntax usability enhancement", as I commented before:
"... this question is about only one of many, the sha256, there are 2,650 pageviws of users that did not understand 'Digest already called' error... This error not make sense: because reuse make sense".

About performence: ok, the gain is 0 or, sometimes, greater tham zero. Is good, any gain will be a sum, a better justification, little better cost-benefit.

@tniessen
Copy link
Member

@ppKrauss You have expressed your personal opinion quite strongly, both here and on StackOverflow. Please keep the discussion productive with constructive arguments instead of repeating that things are "ugly" and that "reuse makes sense".

If the only benefit is "syntax usability enhancement", then consider the downsides:

@AshDee
Copy link

AshDee commented Apr 25, 2019

Regarding this thread.

In my code, I try to create hashes by reading different files by using createReadStream. But It is only possible to read one file. If I try to read the second file I get the error Digest already called.

From the documents, I came to know that digest can be used only once.

So, is there a workaround to read each file and generate corresponding hashes for each file?

As @sam-github had mentioned in his previous comment-

It appears you want iteration 2 to output the hash of "José Silva", in other words, a call to .digest() will reset the internal state back to the initial just-constructed state, is this correct? -

I would like to have something like this

@tniessen
Copy link
Member

In my code, I try to create hashes by reading different files by using createReadStream. But It is only possible to read one file. If I try to read the second file I get the error Digest already called.

@AshDee I don't understand, why don't you just create a new instance using createHash for each file? The additional overhead is minimal compared to IO delays.

@AshDee
Copy link

AshDee commented Apr 26, 2019

In my code, I try to create hashes by reading different files by using createReadStream. But It is only possible to read one file. If I try to read the second file I get the error Digest already called.

@AshDee I don't understand, why don't you just create a new instance using createHash for each file? The additional overhead is minimal compared to IO delays.

Thank You @tniessen .I'll try it out...

@bnoordhuis
Copy link
Member

I'm closing this out in favor of #29903.

@bnoordhuis bnoordhuis added the duplicate Issues and PRs that are duplicates of other issues or PRs. label Oct 9, 2019
@hippyaki
Copy link

hippyaki commented Mar 6, 2022

Hi @sam-github ,

Your example isn't helpful

any loop needs reuse:

const crypto = require('crypto')
const H = crypto.createHash('md5')  // NEED REUSE THIS INITIALIZATION

for (let a of  ["Ana Santos", "José Silva", "Bryan Lee", "Maria Silva", "Vitor Santos"]) 
    // ... or suppose looping over 10000 items
    console.log( a,  H.update(a).digest("hex") )  // ugly error

But the only valid (no error) syntax option is:

const H = require('crypto').createHash
for (let a of  ["Ana Santos", "José Silva", "Bryan Lee", "Maria Silva", "Vitor Santos"]) 
    console.log( a,  H('md5').update(a).digest("hex") )
    // of course, no error, but no reuse.

Please, show an example of reuse using the cited .copy().

Important: there are many crypto.getHashes() algorithms... This question is about only one, sha256, there are 2,493 pageviws of users that did not understand "Digest already called" error... This error not make sense: because reuse make sense.

@ppKrauss Love you mate. Awesome solution with that. Works GREAT for me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
crypto Issues and PRs related to the crypto subsystem. duplicate Issues and PRs that are duplicates of other issues or PRs. feature request Issues that request new features to be added to Node.js.
Projects
None yet
Development

No branches or pull requests

6 participants