Skip to content
This repository has been archived by the owner on Sep 2, 2022. It is now read-only.

exposing a scale resource and way to scale jobs #425

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

Padarn
Copy link

@Padarn Padarn commented Feb 28, 2021

Replaces #394 and addresses #389

This PR is to break down the complexity of adding autoscaling: It only attempts to add a scaling mechanism that will work for standalone Jobs. This is because the operator doesn't otherwise know what jobs are running on the cluster, or how to scale them in/out so it doesn't make much sense to try and auto scale them.

The PR adds:

  • a new scale and status sub resource so that an HPA can be attached (not done in this PR)
  • a new parameter inside the job spec that allows the user to specify how the parallelism should scale
  • calculating parallelism from this value when job is submitted

Note some limitations:

To Do

  • documentation
  • unit tests

@Padarn
Copy link
Author

Padarn commented Feb 28, 2021

@functicons I will work on the docs for this approach, but I think these are limitations we have to live with for now. Although it would be nice if it were possible to set max parallelism in the properties file.. I'll ask the flink mailing list about this.

@functicons functicons self-requested a review March 1, 2021 05:33
@functicons
Copy link
Collaborator

/gcbrun

api/v1beta1/flinkcluster_validate.go Outdated Show resolved Hide resolved
api/v1beta1/flinkcluster_validate.go Outdated Show resolved Hide resolved
api/v1beta1/flinkcluster_types.go Show resolved Hide resolved
@Padarn
Copy link
Author

Padarn commented Mar 14, 2021

Hey @functicons I've added some unit tests (and fixes while I wrote them), and some documentation. Let me know what you think.

@Padarn
Copy link
Author

Padarn commented Mar 22, 2021

Any comments on the documentation so far? Too little, too much?

@vvondruska
Copy link

Hi. Thanks a lot for this pull request. We really like the scaling functionality and we are currently testing at with a plan to use it for our work.
While testing the scaling feature we encountered a case where the operator crashed because of a nil pointer dereference - specifically here: controllers/flinkcluster_updater.go:392.
It occurs when the operator tries to get a selector for task manager(s) of a terminating job. At that point the observed stateful set is nil and therefore its properties are no longer available.
Is this crash considered a legitimate result of providing an incorrect or incomplete configuration or should it be fixed?
Possible fixes include providing an empty label or parsing it from the recorded component.
We chose to fix it by parsing the selector from the recorded component in case it's available. I can push the fix if you'd like.

@Padarn
Copy link
Author

Padarn commented Apr 13, 2021

Hey @vvondruska thanks a lot for the feedback! Happy to have your fix, you could either push it or create a small snippet and I'll incorporate and push it. Either way works.

@vvondruska
Copy link

Thank you for a quick response @Padarn. I can push the fix. I was wondering if it would be OK with you if I pushed the code directly to the repository with the scaling functionality (Padarn:expose-scale). I would need a temporary write privilege to do that. Or would you prefer a different way to apply the fix?

@Padarn
Copy link
Author

Padarn commented Apr 14, 2021

Makes sense to me. I sent you an invite to the repo so you can push.

@google-cla
Copy link

google-cla bot commented Apr 15, 2021

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and then comment @googlebot I fixed it.. If the bot doesn't comment, it means it doesn't think anything has changed.

ℹ️ Googlers: Go here for more info.

@google-cla
Copy link

google-cla bot commented Apr 15, 2021

All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter.

We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only @googlebot I consent. in this pull request.

Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the cla label to yes (if enabled on your project).

ℹ️ Googlers: Go here for more info.

1 similar comment
@google-cla
Copy link

google-cla bot commented Apr 15, 2021

All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter.

We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only @googlebot I consent. in this pull request.

Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the cla label to yes (if enabled on your project).

ℹ️ Googlers: Go here for more info.

@vvondruska

This comment has been minimized.

1 similar comment
@Padarn
Copy link
Author

Padarn commented Apr 16, 2021

@googlebot I consent.

@mvbakker
Copy link

mvbakker commented May 5, 2021

@Padarn @functicons is there anything we (@vvondruska and I) can do to help to get this merged in?

@Padarn
Copy link
Author

Padarn commented May 9, 2021

I'm happy to @mvbakker but cannot approve anything. Also happy to make any further changes if there is anything blocking.

@Padarn
Copy link
Author

Padarn commented Jul 1, 2021

Hi @functicons any change we can get this merged?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants