[DH-3] try using a test fork of the chp #5501

Merged
merged 9 commits into from Feb 27, 2024

@shaneknapp shaneknapp commented Feb 2, 2024

@consideRatio pointed me to this... there are fixes for a couple of memory leaks in node-http-proxy that haven't been rolled into a release yet, so let's test this out and see if it helps.

jupyterhub/configurable-http-proxy#434 (comment)
http-party/node-http-proxy#1559
Jimbly/http-proxy-node16@56283e3
https://github.com/consideRatio/configurable-http-proxy/commits/main/

@ryanlovett ryanlovett left a comment

This looks like valid chart yaml to me, based on https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/main/jupyterhub/templates/proxy/deployment.yaml and https://z2jh.jupyter.org/en/stable/resources/reference.html#proxy-chp-extracommandlineflags.

At first I thought the timeouts were 1000 days, but I see the units are milliseconds.
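
As a quick check of the units point above, here is a minimal sketch that reads the value 86400000 (the one passed to the timeout flags in this thread) both ways. The constant and function names are illustrative only, not part of the chart or of chp.

```python
from datetime import timedelta

# Value passed to the timeout flags in this PR (hypothetical constant name).
TIMEOUT = 86_400_000


def as_milliseconds(value):
    """Interpret the value as milliseconds, which is what chp expects."""
    return timedelta(milliseconds=value)


def as_seconds(value):
    """Interpret the value as seconds, the misreading mentioned above."""
    return timedelta(seconds=value)


print(as_milliseconds(TIMEOUT).days)  # 1
print(as_seconds(TIMEOUT).days)       # 1000
```

So the configured timeout is 24 hours, not 1000 days.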

shaneknapp commented Feb 3, 2024

hmm, interesting. i did a manual deployment to test this out on a staging hub and got the following error in the chp pod: error: unknown option '--proxyTimeout=86400000'

timeout worked just fine... i'll comment out proxyTimeout and re-test my deployment.

shaneknapp commented Feb 3, 2024

ok, commenting out proxyTimeout let me deploy, and timeout is showing up in the pod config:

Containers:
  chp:
    Container ID:  containerd://2400011a0b0b7fca551735393d81b8b6b1efc60b2080e3061ce669c526844098
    Image:         quay.io/jupyterhub/configurable-http-proxy:4.6.1-fork
    Image ID:      quay.io/jupyterhub/configurable-http-proxy@sha256:55345bfa2f47f4d0c995e25d8cd14a09ce20b0aaeed65df648ec818441629c87
    Ports:         8000/TCP, 8001/TCP
    Host Ports:    0/TCP, 0/TCP
    Command:
      configurable-http-proxy
      --ip=
      --api-ip=
      --api-port=8001
      --default-target=http://hub:$(HUB_SERVICE_PORT)
      --error-target=http://hub:$(HUB_SERVICE_PORT)/hub/error
      --port=8000
      --timeout=86400000

there we go!

Containers:
  chp:
    Container ID:  containerd://99cab96888c3d6609d224e6b78bc9936156a6f98a12d08105f1d7f6396d224fb
    Image:         quay.io/jupyterhub/configurable-http-proxy:4.6.1-fork
    Image ID:      quay.io/jupyterhub/configurable-http-proxy@sha256:55345bfa2f47f4d0c995e25d8cd14a09ce20b0aaeed65df648ec818441629c87
    Ports:         8000/TCP, 8001/TCP
    Host Ports:    0/TCP, 0/TCP
    Command:
      configurable-http-proxy
      --ip=
      --api-ip=
      --api-port=8001
      --default-target=http://hub:$(HUB_SERVICE_PORT)
      --error-target=http://hub:$(HUB_SERVICE_PORT)/hub/error
      --port=8000
      --timeout=86400000
      --proxy-timeout=86400000
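
The earlier failure came from the camelCase spelling `--proxyTimeout`, while the kebab-case `--proxy-timeout` shown here was accepted. A minimal sketch of a pre-deploy check for that kind of slip follows. The helper and its regex are hypothetical, and it assumes (based only on the flags visible in this thread) that chp option names are lowercase kebab-case.

```python
import re

# Hypothetical lint, not part of chp or the chart: "--name" or "--name=value",
# where name is lowercase words joined by single hyphens.
KEBAB_FLAG = re.compile(r"^--[a-z0-9]+(-[a-z0-9]+)*(=.*)?$")


def non_kebab_flags(flags):
    """Return the flags whose option name is not lowercase kebab-case."""
    return [flag for flag in flags if not KEBAB_FLAG.match(flag)]


# The first list mirrors the flags that failed, the second the ones that worked.
print(non_kebab_flags(["--timeout=86400000", "--proxyTimeout=86400000"]))
print(non_kebab_flags(["--timeout=86400000", "--proxy-timeout=86400000"]))
```

The first call reports `--proxyTimeout=86400000`; the second reports nothing. A check like this only catches naming style, so it does not confirm that a flag actually exists in a given chp release.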

@consideRatio fyi, this looks quite promising. i've been running the fork without timeouts on our two biggest/most active hubs since friday last week and have only had one chp oomkill between then and now.

here's the memory usage over the past 24 hours on one:
[image: chp memory usage over the past 24 hours on one hub]

i'm keeping a close eye on these two hubs for the rest of the week and will roll out to our remaining hubs on monday.

since our core node has plenty of available ram, i'm also bumping the chp's memory limit from 1Gi to 1.5Gi. this should give us a little bit more headroom and (hopefully) minimize any additional oomkills.

@shaneknapp shaneknapp merged commit 05bfa15 into berkeley-dsep-infra:staging Feb 27, 2024
21 checks passed
@shaneknapp shaneknapp deleted the potential-chp-fix branch February 27, 2024 23:22