Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

I'd like to volunteer! #291

Open
agluszak opened this issue Apr 10, 2023 · 13 comments
Open

I'd like to volunteer! #291

agluszak opened this issue Apr 10, 2023 · 13 comments

Comments

@agluszak
Copy link
Contributor

I'd like to help with the rewrite mentioned here: https://github.com/maciejhirsz/logos/releases/tag/v0.13 :) I already have some experience with logos' internals, at least the "frontend" part.

@maciejhirsz
Copy link
Owner

maciejhirsz commented Apr 10, 2023

Thank you!

So I had a bit of a think about this, I think the best way to proceed that will futureproof the rewrite for inlining would be to define states using a macro_rules! macro, something like:

fn lex(lex: &mut ::logos::Lexer<'s, Self>) {
    use logos::internal::{CallbackResult, LexerInternal};

    enum State {
        GoTo1,
        GoTo2,
        GoTo3,
    }

    macro_rules! goto {
        (GoTo1, $lex:ident) => {/* ... */};
        (GoTo2, $lex:ident) => {/* ... */};
        (GoTo3, $lex:ident) => {/* ... */};
    }

    let mut state = State::Goto3;

    // Assuming Goto3 is the initial state
    goto!(GoTo3, lex);

    loop {
        match state {
            State::GoTo1 => goto!(GoTo1, lex),
            State::GoTo2 => goto!(GoTo2, lex),
            State::GoTo3 => goto!(GoTo3, lex),
        }
    }
}

Just passing around the lex ident is likely not enough, I think we'll need at least one, possibly two variables in stack declared at the top of the function body to account for:

  1. A possible match that should be emitted instead of an error if current state doesn't find a match. It should replace this mess.
  2. usize value for lookahead that would replace the need to duplicate states possible combination of at and with that current macro is doing, as in:
    #[inline]
    fn goto3_at1_with3<'s>(lex: &mut Lexer<'s>) {
        match lex.read_at::<&[u8; 2usize]>(1usize) {
            Some(b"-z") => {
                lex.bump_unchecked(3usize);
                goto1_x(lex)
            }
            _ => _error(lex),
        }
    }

This should make the whole codegen much easier to reason about and debug, particularly in issues like #279. The whole passing context around can potentially be scrapped.

For starters I'd say do not care about performance. When I first got the 0.10 rewrite working it was chugging at maybe 20% of what it ended up running at eventually, so feel free to skip any complexity you feel makes things difficult, we can always add optimizations later once tests are green. Even then I think I'd be fine if the final product of the rewrite is tiny bit slower if it means it's actually 100% correct.

Let me know if you need any more pointers, or if you have any questions about anything to get going.

Powodzenia! 🥇

@Evrey
Copy link

Evrey commented Apr 10, 2023

Just a side note:

You mentioned other data structures like ropes that could be easily lexed after a rewrite. I just wanna tune in as someone who also has used Logos a few times to lex streams of binary data. For now this is largely undocumented magic, but i think it’s useful enough to advertise this in the future, and also design the Regex-replacing (yes, thanks!) DSL in a way that is binary-data-friendly.

I’ve used Logos more often to verify arbitrary user inputs than i have to perform actual lexing for some parser.

@maciejhirsz
Copy link
Owner

DSL in a way that is binary-data-friendly.

Well, the good thing about going that route is is that macros allow for a lot of neat syntax that's not easily done in regex. We could potentially abuse hexadecimal integer literals like 0xDEADBEEF to denote a byte slice &[0xDE, 0xAD, 0xBE, 0xEF] (on top of also using byte string literals like b"...").

@Evrey
Copy link

Evrey commented Apr 10, 2023

Perfect. =)

That DSL has pretty much all Rust tokens at its disposal. It could look like EBNF, ABNF, Rosie, what have you. It does rise the question, what »subpatterns« would look like in the new DSL, for i use subpatterns heavily. But as you mentioned, these are all questions for after the rewrite.

@jeertmans
Copy link
Collaborator

As mentioned in a previous email we exchanged @maciejhirsz, I’d love to give some help too :)

thanks for the detailed message above as well as the one in the release :)

@uael
Copy link

uael commented Apr 12, 2023

@maciejhirsz I have some old WIP in this fork https://github.com/uael/logos.
I hacked bases for regex group and other experiments like sublexers or error reporting with expected set of chars.
Feel free to do whatever you want with it!

@maciejhirsz
Copy link
Owner

@uael cheers, will have a look around.

Sublexing will hopefully become much easier once we can remove this cursed field in the rewrite. It should be trivial to do something like:

fn foo(lex: &mut Lexer<OuterToken>) {
    let lex: &mut Lexer<InnerToken> = lex.as();
}

Or even:

fn foo(lex: &mut Lexer<OuterToken>) {
    // Same as .next(), but but with a different token type
    let inner = lex.sublex::<InnerToken>();
}

We could even go as far and remove the Token generic from Lexer altogether, and instead make it turn into an iterator of a concrete token type via the IntoIterator trait.

There is Lexer::morph available currently, but it's a bit painful to use.

@maciejhirsz
Copy link
Owner

@agluszak have you started any work on this yet? I think I have a big enough time window to dedicate to it so I could at least start it.

@jeertmans
Copy link
Collaborator

On a side note @maciejhirsz, but related to this issue and to

If there is a living soul willing to undertake the big rewrite and help push this along faster, do reach out. I can at very least help explain what the hell current codegen is doing. I'd be also happy to give a person or two the ability to review and merge community PRs.

in v0.13, I'd be happy to help you maintain this project :-) (so being able to review issues and PRs)

@maciejhirsz
Copy link
Owner

@jeertmans thanks, everything you did so far was stellar, I've sent you an invite!

@jeertmans
Copy link
Collaborator

Nice, thank you, @maciejhirsz! I think finishing up #301 will improve future PRs, so I'll have a look at Clippy's warnings when I have time to make the CI happy.

Then, I would probably want to work on #311

Once that CI/doc step is done, I would be happy to help on the more technical parts (i.e., the macro), if you ever need help on that :-)

@DasLixou
Copy link

🙋 is there any issue / project you guys are working on where I could volunteer?

@jeertmans
Copy link
Collaborator

Hey @agluszak and @DasLixou, if you still want to help, please feel free to read through old issue and comment if they should be closed or how you could fix them. Also, v0.13 listed a few important ideas.

Do not forget to checkout latest version, i.e., v0.14!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants