Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add initial version of redis-codegen #1

Draft
wants to merge 17 commits into
base: main
Choose a base branch
from
Draft

Conversation

valkum
Copy link

@valkum valkum commented Jun 16, 2022

redis-codegen generates medium level API interfaces based on the redis
command documention provided as commands.json from the redis-doc
repository.

redis-codegen generates medium level API interfaces based on the redis
command documention provided as commands.json from the redis-doc
repository.

Currently the generated commands still use the implement_commands macro
and are included using the include macro.

There are some breaking changes trough this. All shorthand medium level
commands from the old definition needs to be moved out and
reimplemented. Probably named as high level api.
This also adds _single fn variants that do not take a slice, as a
shorthand.
redis-code now generate a single module per type of implementation.
The traits Commands and AsyncCommands are implemented in the commands
and async_commands module.
Cmd gets a additional implementation in the command module.
The same holds for pipeline and cluster_pipeline.

Token enum generation is WIP.
We generate a tuple based interface for block arguments now. This is
over specific, but should provide a stricter API.
Copy link
Owner

@djc djc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, this is looking pretty great! I think we can probably do a PR to the real project soon.

Can you make a list of open questions/issues that we should discuss?

Cargo.toml Outdated
@@ -13,6 +13,12 @@ edition = "2018"
all-features = true
rustdoc-args = ["--cfg", "docsrs"]


[workspace]
members = [
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style nit: let's just put this on a single line?

serde_path_to_error = {version = "0.1"}
thiserror = "1.0"
log = "0.4"
itertools = "0.10"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what's going on with the indentation here?

(Also, would be nice to have trailing newlines in all the TOML files.)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the current state needs some more cleanup at various points. The commits are more or less just savepoints after a days worth of work at this point. There are some parts that changed drastically from commit 1 and I am thinking of squashing the commits for a reviewable PR.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think squashing makes a lot of sense here.

}
}

const BASE_URI: &str = "https://raw.githubusercontent.com/redis/redis-doc/master/commands.json";
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be unused. What's the reason to keep it here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to opentelemetry-rust I want to add a test that is ignored by default which downloads an up to date version of the commands.json. But that had low priority and therefore I have not implemented that yet.


// Compare the old version and new version and fail the test if they're different.

if versions[0] != versions[1] {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just return if they're the same instead? Saves a level of indentation.

Comment on lines 25 to 26
// the following keywords are not supported as raw identifiers and are therefore suffixed with an underscore.
"self" | "super" | "extern" | "crate" => ident += "_",
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this actually an issue?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not at this time. I can remove that.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, let's ditch those for now.

@valkum
Copy link
Author

valkum commented Jun 20, 2022

Hey, this is looking pretty great! I think we can probably do a PR to the real project soon.

Can you make a list of open questions/issues that we should discuss?

I will work on the oneof support and token creation tomorrow and will collect a list of open todos and questions after that.

@valkum
Copy link
Author

valkum commented Jun 22, 2022

Alright. I implemented a first version of argument token generation. token.rs should now contain all oneof definitions and arguments that have a token associated with it.
The enums and structs there implement all ToRedisArgs and write the token to the connection before their value. Currently these are not enforced to use in the other generated APIs.

No to open TODOs:

  • Currently I parse everything from the commands.json but not everything is used, leading to some clippy warnings.
  • We do not utilize the command key spec currently. I have not found a usecase for this at the current state.
  • The old implement_commands!() macro generated some function bodies with std::mem::replace usage. I guess these were for move related behavior which a newer compiler is able to figure out by itself? At least I cannot come up with a need currently. I implemented the generation without std::mem:replace in the current version but left the old way in as a comment. If you agree, I can remove comment with the old way.
  • I implemented the basic functionality to generate alias commands to allow for some backwards compatibility. These get a deprecation tag and need the redis-rs version entered there.
  • Make the command* APIs aware of the now generated tokens for a true high level API.
  • We probably want to sanitize comments similar to prost right?
  • I am not 100% happy with the Box<dyn Generator> usage in code_generator/mods.rs. Maybe you have a better pattern in mind?
  • Do we want to allow for a better usability for the types in token.rs by adding traits like ToRedisString, ToRedisKey? These currently have underlying types based on the redis RESP (String, i64, f64)
  • The functions that return an iterator are currently handwritten. I hope that is alright.
  • I am not 100% happy with the way the modules are laid out in redis_codegen::code_generator. Maybe you have a cleaner idea?

@djc
Copy link
Owner

djc commented Jun 23, 2022

If you agree, I can remove comment with the old way.

Sounds good to me!

We probably want to sanitize comments similar to prost right?

Can you be more specific? I guess so but not exactly sure.

I am not 100% happy with the Box<dyn Generator> usage in code_generator/mods.rs. Maybe you have a better pattern in mind?

Can you make a review comment in the relevant spot with specific suggestions on why you're not happy yet?

Do we want to allow for a better usability for the types in token.rs by adding traits like ToRedisString, ToRedisKey? These currently have underlying types based on the redis RESP (String, i64, f64)

Adding traits how? I think this too would benefit from a review comment in context, so it's easier for me to review.

The functions that return an iterator are currently handwritten. I hope that is alright.

Sure, that's fine. Do you think we could propose a way upstream to make that information part of commands.json?

I am not 100% happy with the way the modules are laid out in redis_codegen::code_generator. Maybe you have a cleaner idea?

Here too, can you be more specific about your concerns?

Copy link
Author

@valkum valkum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The functions that return an iterator are currently handwritten. I hope that is alright.

Sure, that's fine. Do you think we could propose a way upstream to make that information part of commands.json?

I guess that could be added. Maybe a flag in the hints is enough? There is request_policy:special but it seems this just means there is some kind of special handling needed.

We probably want to sanitize comments similar to prost right?

Can you be more specific? I guess so but not exactly sure.

Currently we take the strings from the commands.json to render docs. Prost sanitizes urls (adds < and > and escapes [ and ]) See https://github.com/tokio-rs/prost/blob/master/prost-build/src/ast.rs#L144. I guess we should add this here as well for our docs.

I am not 100% happy with the way the modules are laid out in redis_codegen::code_generator. Maybe you have a cleaner idea?

Here too, can you be more specific about your concerns?

This is related to the files/modules in the code_generator directory. I guess we could move all generators (the ones with the _generator suffix) into a new directory/module to get this a little bit cleaner, but I guess this is purely cosmetic regarding discoverability and maintainability.

I hope the inline comments are more helpful.

kind: GenerationKind::Full,
};

let generation_unit: Box<dyn Generator> = match generation_type {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one point I am not particular happy with. The Box seem overkill for this match.
I guess if we match here we can just directly call the generate fn on these different generators.

/// Redis Type: Filterby
pub enum Filterby {
/// MODULE
Module(String),
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to allow for a better usability for the types in token.rs by adding traits like ToRedisString, ToRedisKey? These currently have underlying types based on the redis RESP (String, i64, f64)

Adding traits how? I think this too would benefit from a review comment in context, so it's easier for me to review.

In this file in general we have types that just wrap some kind of base type. I am thinking of making this generic with a trait bound of ToRedisString so that I can put any type (that implements ToRedisString) in my code into this enum and don't have to convert it into a String first. Similar for the other types (i64, f64, and probably Keys which are currently Strings too).
But that might increase compilation times a lot as there are many types here and most of them are candidates to get a generic.

@valkum
Copy link
Author

valkum commented Jun 26, 2022

While shortly looking into adding the generated types as arguments to the API, I think I found some flaws in my current implementation. I currently check if we already generated a type by that name. I don't think I can assume uniqueness based on the commands.json. I will look into splitting the types into submodules base on the group of the command. For this the argument names or token names should be unique for each group. I probably have time on Tuesday and will look into that. I don't want to end up with a module per command, so we might need a smart way to avoid that when the arguments are not unique across a command group. We could add the command name itself to the name of the types, but I don't think that will result in a good usable API.

One quick example is the count argument from GEOSEARCHSTORE and from HRANDFIELD

This also fixes some logical flaws with the previous implementation.
Previously when two types hat the same token name or in the worst case
the same argument name, an identical type was created.

This adds a TypeRegistry which creates a submodule when such an
duplicate type is encountered. It also helps to resolve types when
generating code.
This solution is pretty regarding the names of modules, but at least
this works for now.
@valkum
Copy link
Author

valkum commented Jun 30, 2022

I fixed the issue with duplicate types by adding a TypeRegistry which will create modules on the fly when it encounters a duplicate type name.
The API should use the generated types now and also fixes the problem with duplicate issued argument identifiers in the API functions.

I guess there is still space for improvement regarding ergonomic usage. Maybe use some heuristic to create mutliple functions for each command when there are mutual exclusive arguments. This would need some support in the command.json. Currently these are just marked as optional.

@djc
Copy link
Owner

djc commented Jun 30, 2022

I think this is very promising -- I guess my biggest question is how we might get to merge this. I think a key thing that's hard to see right now is how the API changes from the old version to the new version. Maybe it would be possible to somehow (semi-manually) generate a list of function signatures from the current source code and create a diff from that to the new function signatures?

@valkum
Copy link
Author

valkum commented Jul 3, 2022

I created a cleaned up overview of all functions in the Commands trait for the old and the new api.
Normal diff output is bad at showing the differences though. diff --word-diff is working better but not optimal. difftastic had a huge memory footprint and failed to produce output.

I created a gist with both versions so you can use diff tool you prefer yourself. I hope I did not delete any commands while cleaning the output up.
The gist can be found here: https://gist.github.com/valkum/ad83391b94e76c6a684136dae79ea278
The aligned versions should output better diffs. I aligned the commands by line by hand.

@valkum
Copy link
Author

valkum commented Jul 17, 2022

A short recap of my talks with upstream:
They are happy that someone is using their json spec. They propose to query the commands.json from a redis instance instead of using the hosted version from their repository. This is something we can definitely add and replace the download "test".

Their reply schema PR looks pretty good. It is based on JSON schema which should allow us to generate types for the responses.
The question of de-duplicating response types would be important there to. So maybe we can come up with a generic way to generate these types and solve name clashes. (as posted earlier having a module for each command would probably yield a lot of identical types and is a waste of computation time when compiling redis-rs).

I proposed some additions to their PR to improve the generation of a more ergonomic high level API. Sadly before we can utilize this we would need add support for RESP3.

I am awaiting some feedback on adding a cursor hint.

I have resolved the CLIENT KILL mis-documentation and we can remove that from our denylist.

Redis Maintainers asked to use the output of a working redis instance to
generate the commands.json ourselves.
This adds a built_commands_json module which is a more typesafe copy of
the python script redis uses to generate the commands.json in the docs
repo.
This adds support to automatically generate cursor based commands using
an iterator result.
This currently is not finished, as the command json generator is missing
support for optional and multiple arguments.
The commands.json generation based on the redis-cli output now correctly
computes subcommands.
@valkum
Copy link
Author

valkum commented Sep 5, 2022

I need to retouch the type registry. The generation of the types (and deduplication) seems to be nondeterministic currently. Probably caused by iterating over a HashMap.
After that I plan to update all the tests and examples and do some small finishing touches before I submit a squashed PR to the upstream repo for broader feedback.

@djc
Copy link
Owner

djc commented Sep 7, 2022

Note, upstream now has optional support for the RedisJSON module. Should figure out how that fits into this...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants