feat: incremental-hasher #261

Gozala · 2023-07-19T17:38:14Z

Proposal for #260

alanshaw

LGTM - I left some suggestions.

alanshaw · 2023-07-20T08:40:16Z

src/hashes/interface.ts

+  digest(): Digest
+
+  /**
+   * Computes the digest of the given input and writes it into the provided


of the given input

Do you mean "of the bytes written so far"?

alanshaw · 2023-07-20T08:57:02Z

src/hashes/interface.ts

+   * can be used to control whether the multihash prefix is written; if `false`,
+   * only the raw digest is written, omitting the prefix.
+   */
+  digestInto(output: Uint8Array, offset?: number, asMultihash?: boolean): this


Return output? Not sure how useful it is to chain this method.

Suggested change

digestInto(output: Uint8Array, offset?: number, asMultihash?: boolean): this

digestInto(output: Uint8Array, offset?: number, asMultihash?: boolean): Uint8Array

Not sure if there's a better convention for method name when receiving a parameter to mutate. digestInto is a bit awkwardly named (IMO!).

digestBYOB? 🙄

asMultihash - do we need? The point of this library is multiformats.

I went into this in the linked issue. In practice I have encountered many instances where I do need to leave out the prefix. I could go and use another library in those instances, but it seems like we could just expose this. That said, I agree that this method is kind of meh and we could do better.

@vasco-santos suggested having a whole other method. Personally, I'm wondering if perhaps we should have methods just to get the digest without a prefix, as this is a low-level API anyway?

Suggested change

digestInto(output: Uint8Array, offset?: number, asMultihash?: boolean): this

// multihash prefix for it

header: Uint8Array

// only writes the digest without a prefix

digestInto(output: Uint8Array, offset?: number): this

Or alternatively something like this

Suggested change

digestInto(output: Uint8Array, offset?: number, asMultihash?: boolean): this

encodeMultihashInto(target: Uint8Array, offset?: number): this

encodeMultihashHeaderInto(target: Uint8Array, offset?: number): this

encodeDigestInto(target: Uint8Array, offset?: number): this

digestInto is a bit awkwardly named (IMO!).

I'm happy to call this something else; I was trying to align with varint.encodeInto, which as it turns out was called varint.encodeTo 😅

digestBYOB? 🙄

Works for me, although I had to google to figure out what BYOB stands for.

Alternatively we could just have read(bytes: Uint8Array, offset?: number): this

Return output? Not sure how useful it is to chain this method.

I personally find mutations that return values misleading; that said, I'm amenable to the idea.

I had to google to figure out what BYOB stands for.

BYOB was aiming for familiarity with https://developer.mozilla.org/en-US/docs/Web/API/ReadableStreamBYOBReader etc. but I guess that was a miss 😆
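To make the "header plus raw digestInto" variant discussed above concrete, here is a minimal sketch, assuming a sha2-256 hasher backed by Node's `node:crypto`. The class name `Sha256Hasher` and its exact shape are hypothetical and only illustrate the idea; they are not the API proposed in this PR. The header bytes rely on the multihash code for sha2-256 being 0x12 and its digest length being 32 (0x20), both of which fit in a single varint byte.

```typescript
import { createHash } from 'node:crypto'

// Hypothetical illustration only: not the interface from this PR.
class Sha256Hasher {
  // Multihash prefix for sha2-256: varint(0x12) followed by varint(32).
  readonly header = Uint8Array.of(0x12, 0x20)
  private hash = createHash('sha256')

  // Feeds more bytes into the hash state.
  write(bytes: Uint8Array): this {
    this.hash.update(bytes)
    return this
  }

  // Writes only the raw digest (no prefix) into `output` at `offset`.
  digestInto(output: Uint8Array, offset = 0): this {
    // Digest a copy so this hasher can keep accepting writes afterwards.
    const digest = this.hash.copy().digest()
    if (output.byteLength - offset < digest.byteLength) {
      throw new RangeError('output is too small for the digest')
    }
    output.set(digest, offset)
    return this
  }
}

// The caller assembles a full multihash in a buffer it owns.
const hasher = new Sha256Hasher().write(new TextEncoder().encode('hello'))
const multihash = new Uint8Array(hasher.header.length + 32)
multihash.set(hasher.header, 0)
hasher.digestInto(multihash, hasher.header.length)
```

With this split, callers that want the prefix copy `header` themselves, and callers that only need the raw digest skip it, so no `asMultihash` flag is needed.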

alanshaw · 2023-07-20T09:12:19Z

src/hashes/interface.ts

+  /**
+   * Number of bytes that were consumed.
+   */
+  count(): bigint


I mean if someone is hashing >9PiB of data in JS then 👏👏👏.

yeah ... is this overkill?
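For reference, the threshold mentioned above comes from the largest integer a JS `number` can represent exactly. A quick arithmetic sketch (variable names are illustrative) shows it is roughly 9 PB in decimal units, which is 8 PiB in binary units:

```typescript
// 2^53 is the first integer past Number.MAX_SAFE_INTEGER.
const limit = Number.MAX_SAFE_INTEGER + 1

const PiB = 2 ** 50 // pebibyte (binary)
const PB = 1e15 // petabyte (decimal)

const limitInPiB = limit / PiB // exact power-of-two division
const limitInPB = Math.floor(limit / PB)

// Past the limit a plain number counter silently loses precision,
// which is the case a bigint return type would guard against.
const losesPrecision = limit + 1 === limit
```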

Gozala · 2023-07-20T17:57:43Z

src/hashes/interface.ts

+  digest(): Digest
+
+  /**
+   * Computes the digest of the given input and writes it into the provided


Suggested change

* Computes the digest of the given input and writes it into the provided

* Computes the digest of the bytes written so far and writes it into the provided

Gozala · 2023-07-20T19:31:29Z

I'm realizing now that the interface here is intentionally non-destructive, as in I could compute the digest over and over. Unfortunately, the node crypto APIs are destructive:

The Hash object can not be used again after hash.digest() method has been called. Multiple calls will cause an error to be thrown.

Instead, node provides a copy method so you can continue writing into the hasher copy.

Given that this is the case with node, the proposed API seems impractical. Perhaps we could instead introduce the same constraint along with a copy() method. Better implementations could return the same instance from copy(), while ones that wrap node crypto APIs would avoid making a copy just in case.

On the other hand, copy on digest may be negligible overhead, in which case an API without copy would be nicer.
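The node behaviour described here can be sketched as follows, using only `createHash`, `update`, `copy` and `digest` from `node:crypto`. The variable names are illustrative. Digesting a copy leaves the original usable, while digesting the original finalizes it:

```typescript
import { createHash } from 'node:crypto'

const hash = createHash('sha256')
hash.update('hello ')

// Digest a copy so the original keeps accepting input.
const partial = hash.copy().digest('hex')

hash.update('world')
const full = hash.copy().digest('hex')

// Digesting the original finalizes it.
const final = hash.digest('hex')

// Any further use of the finalized hash throws.
let threw = false
try {
  hash.update('more')
} catch {
  threw = true
}
```

A non-destructive digest() built on top of node would therefore pay for one copy() per call, which is the overhead being weighed above.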

alanshaw · 2023-07-22T14:38:33Z

src/hashes/interface.ts

+  /**
+   * Writes bytes to be digested.
+   */
+  write(bytes: Uint8Array): this


Typically in streaming hashers this is called update.

I'm fine with calling it update although I do find that name confusing personally as I think of update as overwrite as opposed to append.

Right, but streaming hashers aren't appending to a buffer; they are updating their internal state with the new data you pass.
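A small sketch of that point, assuming node's `createHash` as the streaming hasher: feeding chunks one at a time yields the same digest as hashing the concatenated input in one call, and the caller never has to hold the whole input in a single buffer.

```typescript
import { createHash } from 'node:crypto'

const encoder = new TextEncoder()
const chunks = ['incremental', '-', 'hasher'].map((s) => encoder.encode(s))

// Each update() folds the chunk into the hash state.
const streaming = createHash('sha256')
for (const chunk of chunks) streaming.update(chunk)

// One-shot hash of the same bytes, for comparison.
const oneShot = createHash('sha256').update(encoder.encode('incremental-hasher'))

const same = streaming.digest('hex') === oneShot.digest('hex')
```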

alanshaw · 2023-07-22T14:40:54Z

What about digest() -> multihash and rawDigest() -> digest without multihash header?

alanshaw · 2023-08-04T15:31:24Z

I'd really like this to land! I find myself wanting to do this more and more...

rvagg · 2023-08-11T04:30:11Z

src/hashes/interface.ts

+   *
+   * @param [offset=0] - Byte offset in the `target`.
+   */
+  readDigest(target: Uint8Array, offset?: number): this


can you describe the use-case for this? it seems like this makes it an onerous API to have to implement

Sorry, my mistake, this is the output function!

I think maybe the naming could be better here. We have ample precedent of digest() in JS-land, so we could have digest() and multihash() (or multihashDigest() if you want to be more explicit). In Go-land Sum() is the standard for this action, which has grown on me to make sense (though it's taken time!).

Oh, I also see I'm discussing history here - read* being the new versions? I'm not a fan. I also wonder whether we could have nicer APIs that don't require you to pass in a target? I understand that's an important part of this, for efficiency, but casual use typically just wants that done for you. So could the APIs take a target? instead and always return Uint8Array? So you can either choose to supply the bytes to write in to (with optional offset) or not supply one, but either way you get back some bytes.

read* being the new versions? I'm not a fan.

I mean if you think of it as a transform stream, it makes sense to have write and read ops. I don't mind renaming it to something else, but please don't make me come up with a name that everyone will like.

I also wonder whether we could have nicer APIs that don't require you to pass in a target? I understand that's an important part of this, for efficiency, but casual use typically just wants that done for you. So could the APIs take a target? instead and always return Uint8Array? So you can either choose to supply the bytes to write in to (with optional offset) or not supply one, but either way you get back some bytes.

I'm not completely opposed to returning the target; however, I would caution against it, as it mixes two very different modes into one and can also lead to mistakes (e.g. you may have passed an undefined reference, which will not throw but will happily give you back a Uint8Array).

The idea was that if you want to compute a digest you just call the digest method, and use this only in those rare cases when you need to work with slabs of memory.
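The hazard with an optional target can be sketched like this. The `digest` helper below is hypothetical and exists only to illustrate the two modes being merged; it is not part of this PR or of the library.

```typescript
import { createHash } from 'node:crypto'

// Hypothetical helper: writes into `target` if given, otherwise allocates.
function digest(input: Uint8Array, target?: Uint8Array, offset = 0): Uint8Array {
  const bytes = createHash('sha256').update(input).digest()
  if (target === undefined) return new Uint8Array(bytes)
  target.set(bytes, offset)
  return target
}

const input = new TextEncoder().encode('hello')

// Intended slab usage: the digest lands in caller-owned memory.
const slab = new Uint8Array(64)
const written = digest(input, slab, 32)

// The mistake: a reference that happens to be undefined does not throw.
// The call silently allocates, and the slab the caller meant is never written.
const buffers: { slab?: Uint8Array } = {}
const result = digest(input, buffers.slab)
```

Keeping the allocating method and the write-into-target method separate turns the second call into a type error or a thrown exception instead of a silent allocation.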

Gozala · 2023-08-12T06:52:52Z

For what it's worth, the Rust Multihash hasher has a similar interface to the one proposed here:

/// Trait implemented by a hash function implementation.
pub trait Hasher {
    /// Consume input and update internal state.
    fn update(&mut self, input: &[u8]);

    /// Returns the final digest.
    fn finalize(&mut self) -> &[u8];

    /// Reset the internal hasher state.
    fn reset(&mut self);
}

Gozala · 2023-08-12T06:54:23Z

src/hashes/interface.ts

+  /**
+   * Number of bytes that were consumed.
+   */
+  count(): bigint


Suggested change

/**

* Number of bytes that were consumed.

*/

count(): bigint

Let's just drop this method, we can revisit if we find it really necessary.

Gozala added 3 commits July 19, 2023 10:29

feat: incremental-hasher

90c8afa

add MulticodecCode type

83513a8

import MulticodecCode

d9ffdbf

Gozala requested review from rvagg and achingbrain July 19, 2023 17:38

import as type

bdb1049

Gozala mentioned this pull request Jul 19, 2023

Proposal: StreamMultihasher interface #260

Open

alanshaw reviewed Jul 20, 2023

Gozala commented Jul 20, 2023

address some feedback

2b7743f

alanshaw reviewed Jul 22, 2023

rvagg reviewed Aug 11, 2023

Gozala commented Aug 12, 2023

achingbrain force-pushed the master branch from 6140796 to 56bbb96 Compare December 20, 2023 08:31

alanshaw mentioned this pull request Jan 22, 2024

feat: streaming sha256 CAR hash web3-storage/ipfs-car#162

Merged
