Add ability to peek at the next decoder character #18

brandonf2002 · 2024-03-03T01:11:26Z

Hi, I'm currently writing a parser/lexer for a language that makes use of UTF-8 characters as part of its grammar and thought it would be useful to be able to "peek" at the next character in the decoder stream without advancing the decoder position forward.

There are plenty of workarounds for this situation, but I thought it might be a useful feature to have built in.

I am proposing a function that would look like:

val decoder_peek : decoder ->
  [ `Await | `Uchar of Uchar.t | `End | `Malformed of string]

I would be more than happy to write this PR myself; I am just opening this issue to see whether a patch for this would be accepted.

dbuenzli · 2024-03-03T19:57:01Z

Thanks for making an issue first. The answer is no, for the following reasons:

It's not too hard to integrate that lookahead in your lexer or parser state. See here for an example.
In general uutf is in maintenance mode. I don't plan to add any new features, and I encourage people to directly use the stdlib decoders that have been integrated in OCaml 4.14.0 (see ocaml/ocaml#10710, "Add UTF codec and validations support to the Stdlib"). They are also likely more performant (at least they allocate less on decodes).
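To illustrate both points, here is a minimal sketch of a peekable cursor built only on the OCaml 4.14 stdlib decoders (`String.get_utf_8_uchar` and the `Uchar.utf_decode_*` functions). The `cursor` type and the `make`, `decode_at`, `peek` and `next` names are hypothetical and not part of uutf or the stdlib. The sketch assumes the whole input is already in memory as a string, so there is no `` `Await `` case.

```ocaml
(* Hypothetical lexer cursor over an in-memory UTF-8 string (OCaml >= 4.14). *)
type cursor = { src : string; mutable pos : int }

(* Same shape as the proposed result type, minus `Await: the input is a
   complete string, so the decoder never has to wait for more bytes. *)
type result = [ `Uchar of Uchar.t | `End | `Malformed of string ]

let make src = { src; pos = 0 }

(* Decode the character starting at [c.pos] without moving the cursor.
   Returns the result together with the number of bytes it spans. *)
let decode_at (c : cursor) : result * int =
  if c.pos >= String.length c.src then (`End, 0)
  else
    let d = String.get_utf_8_uchar c.src c.pos in
    let len = Uchar.utf_decode_length d in
    if Uchar.utf_decode_is_valid d
    then (`Uchar (Uchar.utf_decode_uchar d), len)
    else (`Malformed (String.sub c.src c.pos len), len)

(* Look at the next character; the position is left untouched. *)
let peek (c : cursor) : result = fst (decode_at c)

(* Return the next character and advance past it. *)
let next (c : cursor) : result =
  let r, len = decode_at c in
  c.pos <- c.pos + len;
  r
```

Because the lookahead is just a second decode at the same byte offset, the lexer state only needs to carry `pos`. A streaming lexer would instead have to cache the last decoded result in its own state.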

dbuenzli added enhancement wontfix labels Mar 3, 2024

dbuenzli closed this as completed Mar 3, 2024
