Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move group management from generic to base oauthenticator #735

Open
wants to merge 19 commits into
base: main
Choose a base branch
from

Conversation

yuvipanda
Copy link
Collaborator

@yuvipanda yuvipanda commented Mar 29, 2024

Motivating use cases

External identity providers providing JupyterHub memberships is an extremely useful feature that should be present not just for GenericOAuthenticator but for all authenticators. But to do so in helpful ways, this PR considers two motivating use cases:

  1. A hub using Auth0OAuthenticator. In Auth0, scopes granted are what is used to provide the notion of 'can the user perform this task?', which can be used as group membership. This is what auth0 recommends, there is currently no other way to do 'groups' in Auth0. scope is inside auth_model, but not auth_state, since scope is granted each time the user is logged in.
  2. In GitHubOAuthenticator, we put the list of teams the user is in inside auth_state. This is the perfect piece of information to use for group membership.
  3. oauth_user gets put inside auth_state, and in general auth_state is a good place for this kinda group information to be in. Authenticators can put arbitrary stuff inside auth_state and use them as they wish.

Approaches considered and rejected

  1. Each authenticator figures out how groups work within it. This was tried out in gh: Allow setting JupyterHub groups from GitHub teams #739, and rejected by @minrk in Move group management from generic to base oauthenticator #735 (comment). I agree with this rejection.
  2. Allow group information to be picked up from the auth_model with a auth_model_groups_key. This would be same as the current claim_groups_key, but pick from the entire auth_model instead of just from the returned user object. This was the tack this PR was taking, primarily because I thought we needed it to handle use case 1 mentioned earlier. But turns out I was wrong - I had thought that scope was part of auth_model but not auth_state, but we do! And regardless, I also realized we don't expose auth_model anywhere, but we do expose auth_state. And I had a TODO for 'document what is in auth_model', and while trying to do that, decided we shouldn't expose that to configurable tweaks like this for now. So that was reverted in b337015 and a different approach was taken.

Approach this PR takes

The general approach to group management is:

  1. Authenticator puts something that may be relevant to group membership inside auth_state.
  2. There's an auth_state_groups_key that can be either a callable or dotted specification that generates a list of groups from something in auth_state.

This handles case (2) because list of teams is already in auth_state. And can handle (1) by us putting scope in some form inside auth_state. This also provides a clear extensible mechanism in the future for all group management - get it into auth_state (where it can be used for anything), and pick that out with auth_state_groups_key.

Backwards compatibility

  • The existing claim_groups_key behavior is preserved, by being passed on to auth_state_groups_key in the base. It has been marked as deprecated. This is not a backwards compatibility break.
  • All group related functionality now requires manage_groups to be True, which was not the case earlier. Before this, if manage_groups is false but any of the group related authorization functionality (allowed_groups and admin_groups) is used, they control group related behavior but don't show up as JupyterHub groups. This causes confusion, as the 'groups' field in the admin panel will be empty, and possible other group related behavior (such as future profile list filtering, for example) would not respect these groups. We basically would end up with two group concepts - First class JupyterHub groups (which will show up in admin panel, API, can be edited by admins, etc) as well as second class 'Authenticator' groups (which are only used for authorization and 'disappear' after that). I think this is unnecessary complication, and this is a good time to remove this distinction. Now, any kind of group related authorization functionality requires manage_groups to be True, and we are back to having only one notion of 'group'. We also remove the confusing part where you may have allowed_groups set to something, manually modify the groups the user is a part of in JupyterHub admin, and it silently has no effect. This is a breaking change for people who used groups functionality but set manage_groups to be False. However, I think that usage is fairly minor, because of the confusing behavior it causes. I have added the 'breaking' label here regardless.

TODO

  • Update docs to refer to new property
  • Reword the docstring for auth_state_groups_key

Fixes #709

@yuvipanda yuvipanda requested a review from minrk March 29, 2024 23:31
@yuvipanda
Copy link
Collaborator Author

All this functionality would be extremely useful to all the authenticators, so should be in base.

I think an important question now becomes - what is the point of GenericOAuthenticator? Is that just here for backwards compatibility?

I think the answer should be 'yes'. And in the future, anything that relies on OAuth2 functionality should just go to the base OAuth2 provider, and anything specific to any of the providers can go in their own files. And we suggest people who currently use Generic migrate to just using the base OAuth2 provider.

@minrk
Copy link
Member

minrk commented Apr 2, 2024

what is the point of GenericOAuthenticator?

I believe GenericOAuthenticator should eventually go away and be merged into the base class, as long as we can do it smoothly and it doesn't complicate things unnecessarily (I don't think it will). I think the big refactor in #526 was a big step toward making that feasible.

Not having done that yet, going forward in general I think we should avoid new features in Generic in favor of putting them in the base class, at least in most cases.

Copy link
Member

@minrk minrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea! I think it makes sense to promote this to the base class. I only think we should perhaps be a little more considered in naming config all classes will inherit rather than only on Generic.

@@ -264,6 +276,49 @@ class OAuthenticator(Authenticator):
""",
)

claim_groups_key = Union(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really want to start a naming conversation when this is just moving existing config up one level, but I think this config name is very unclear, and defining new config on a class is an opportunity to not inherit the confusing name. claim and key are really synonyms, so it's odd to have both at all, let alone one on either side of the field they populate. Could it be just groups_key (generic, Pythonic) or groups_claim (OIDC-specific, jargony, I suspect unclear to most) or userdata_groups_key to more clearly specify that it's the key in userdata where this info will be found?

If we rename, Generic can keep its claims_group_key as a deprecated trait that assigns to self.userdata_groups_key with a warning.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

userdata_groups_key sounds good to me as well

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The base one is called auth_state_groups_key, as the direct analog to userdata_groups_key, since now it gets info out of auth_state instead of userdata. Generic keeps its claims_group_key and issues a warning.

yuvipanda added a commit to yuvipanda/oauthenticator that referenced this pull request Apr 5, 2024
Looking at jupyterhub#735,
I realize it will not actually allow us to use GitHub teams or orgs as
JupyterHub groups, since that's an extra API call. This PR keeps it
simple, and adds config to pick this up from teams list directly.
@yuvipanda
Copy link
Collaborator Author

I was thinking about how we can do a very common thing - sync GitHub team memberships and org memberships to JupyterHub groups. With this PR as is, that's actually not possible - team memberships are not part of userdata! But they are part of the auth_model (because there's a populate_teams_in_auth_model property already).

I think there are two paths forward here for groups to work everywhere:

  1. Lift that into the base Authenticator class, but the key is for things from auth_model, not userdata. This also allows for using things like scope (which can control group membership with Auth0, from another community we are working with). We can name this appropriately, and the keep a deprecated claims_group_key in Generic that notifies it's deprecated but works the way it already does.
  2. gh: Allow setting JupyterHub groups from GitHub teams #739 or similar for each authenticator we want to support good groups.

My preference is to do (1)! There's a clean backwards compat story, and the model of 'put stuff into auth_state, that is available via auth_model, and you can pull stuff out of that for groups' seems clean enough to explain.

@minrk
Copy link
Member

minrk commented Apr 8, 2024

Option 1 sounds like a great choice, and also a great opportunity to not inherit the name while preserving compatibility.

A simplification of jupyterhub#735,
moving 2 of the 3 traitlets. This is a straight up move, without any functional
breaking changes.

- `admin_groups` allows setting members of some groups as admins.
- `allowed_groups` allows setting what groups should be allowed to login.

Both of these are more useful with claim_groups_key, as that allows
an *external* party to drive group memberships. Without that, I guess
primarily this depends on membership within the JupyterHub admin UI.

Splitting this up helps us get this moving faster, as figuring out how
to move `claim_groups_key` is going to be slightly more involved.
auth_model is currently not documented nor exposed to customizable
code (without inheriting from a class and modifying it). So instead
of documenting auth_model and trying to keep that stable, we rely instead
on auth_model being populated.
@yuvipanda yuvipanda marked this pull request as ready for review May 3, 2024 00:07
@yuvipanda
Copy link
Collaborator Author

This went through a few iterations, but is ready for review again! I've updated the PR body with more detailed information as well. Please take another look when you got time, @minrk :)

yuvipanda added a commit to yuvipanda/pilot-hubs that referenced this pull request May 3, 2024
- `allowed_groups` functionality is put in basehub, and hence
  available to everyone! Individual authenticators still need to
  figure out how to enable groups, but that's separated out from
  `profile_list` filtering functionality.
- Pending jupyterhub/oauthenticator#735,
  we explicitly also treat GitHub teams from auth_state as 'groups'.
  This allows us to bring all our existing users along, without issue.
- Get rid of the code duplication in earthscope
- Rename all `allowed_teams` to `allowed_groups`.
Copy link
Member

@manics manics left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we've agreed on the design can you also review the docs?
For example if claim_groups_key is deprecated we should update the example

c.GenericOAuthenticator.claim_groups_key = "groups"

oauthenticator/generic.py Outdated Show resolved Hide resolved
oauthenticator/generic.py Outdated Show resolved Hide resolved
Can be a string key name (use periods for nested keys), or a callable
that accepts the auth state (as a dict) and returns the groups list.

This configures how group membership in the upstream provider is determined
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does upstream provider mean JupyterHub?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@manics I copied this from the current docstring (

This configures how group membership in the upstream provider is determined
) but I agree it's actually quite confusing. I'll think about it and try to rewrite it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@manics I spent some time rewording this, and this led me to the conclusion that we should just not allow usage of these without manage_groups also being True. I've added a more full rationale under 'Backwards compatibility' describing it. I've also rewritten the docstring now! Let me know what you think :)

oauthenticator/oauth2.py Outdated Show resolved Hide resolved
yuvipanda added a commit to yuvipanda/pilot-hubs that referenced this pull request May 7, 2024
- `allowed_groups` functionality is put in basehub, and hence
  available to everyone! Individual authenticators still need to
  figure out how to enable groups, but that's separated out from
  `profile_list` filtering functionality.
- Pending jupyterhub/oauthenticator#735,
  we explicitly also treat GitHub teams from auth_state as 'groups'.
  This allows us to bring all our existing users along, without issue.
- Get rid of the code duplication in earthscope
- Rename all `allowed_teams` to `allowed_groups`.
yuvipanda added a commit to yuvipanda/pilot-hubs that referenced this pull request May 7, 2024
- `allowed_groups` functionality is put in basehub, and hence
  available to everyone! Individual authenticators still need to
  figure out how to enable groups, but that's separated out from
  `profile_list` filtering functionality.
- Pending jupyterhub/oauthenticator#735,
  we explicitly also treat GitHub teams from auth_state as 'groups'.
  This allows us to bring all our existing users along, without issue.
- Get rid of the code duplication in earthscope
- Rename all `allowed_teams` to `allowed_groups`.
yuvipanda and others added 3 commits May 7, 2024 10:41
Co-authored-by: Simon Li <orpheus+devel@gmail.com>
Co-authored-by: Simon Li <orpheus+devel@gmail.com>
Co-authored-by: Simon Li <orpheus+devel@gmail.com>
yuvipanda added a commit to yuvipanda/pilot-hubs that referenced this pull request May 7, 2024
- `allowed_groups` functionality is put in basehub, and hence
  available to everyone! Individual authenticators still need to
  figure out how to enable groups, but that's separated out from
  `profile_list` filtering functionality.
- Pending jupyterhub/oauthenticator#735,
  we explicitly also treat GitHub teams from auth_state as 'groups'.
  This allows us to bring all our existing users along, without issue.
- Get rid of the code duplication in earthscope
- Rename all `allowed_teams` to `allowed_groups`.
@yuvipanda
Copy link
Collaborator Author

After trying to write clear docs for the traitlet, I came to the conclusion that we shouldn't allow usage of these without manage_groups also being True. I've updated a paragraph under Backwards compatibiltiy with my rationale.

With that, this is ready to go!

@yuvipanda
Copy link
Collaborator Author

I found #709, which I think is closed by this PR. And my change with respect to manage_groups reflects:

do we need to deal with additional groups, not specified upstream somehow? I'd say no, at least for now.

Which I totally agree with. Groups should have a 'single source of truth'.

yuvipanda added a commit to yuvipanda/jupyterhub that referenced this pull request May 24, 2024
Similar to 'kubespawner_override' in KubeSpawner, this allows
admins to selectivel override spawner configuration based on
groups a user belongs to. This allows for low maintenance but
extremely powerful customization based on group membership.
This is particularly powerful when combined with
jupyterhub/oauthenticator#735

\#\# Dictionary vs List

Ordering is important here, but still I choose to implement this
configuration as a dictionary of dictionaries vs a list. This is
primarily to allow for easy overriding in z2jh (and similar places),
where Lists are just really hard to override. Ordering is provided
by lexicographically sorting the keys, similar to how we do it in z2jh.

\#\# Merging config

The merging code is literally copied from KubeSpawner, and provides
the exact same behavior. Documentation of how it acts is also copied.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consistently implement manage_groups
4 participants