porch: kpt alpha rpkg get fails when a couple hundred branches #599

Open
liamfallon opened this issue Apr 8, 2024 · 1 comment
Labels
area/platform area/porch Porch related issues bug Something isn't working triaged
Comments

@liamfallon
Member

Original issue URL: kptdev/kpt#3882
Original issue user: https://github.com/johnbelamaric
Original issue created at: 2023-03-14T16:58:44Z
Original issue last updated at: 2023-03-16T21:10:45Z
Original issue body:

### Expected behavior
Valid list of package revisions is returned.

### Actual behavior

jbelamaric@jbelamaric:~/proj/tmp/cachingdns-topology$ kpt alpha rpkg get
Error: Get "https://35.192.14.90/apis/porch.kpt.dev/v1alpha1/namespaces/default/packagerevisions": stream error: stream ID 1; INTERNAL_ERROR; received from peer 
jbelamaric@jbelamaric:~/proj/tmp/cachingdns-topology$ k get packagerevisions
Unable to connect to the server: stream error: stream ID 1; INTERNAL_ERROR; received from peer
jbelamaric@jbelamaric:~/proj/tmp/cachingdns-topology$ k get po -n porch-system
NAME                                 READY   STATUS    RESTARTS       AGE
function-runner-77946d6686-jv8kk     1/1     Running   0              5d18h
function-runner-77946d6686-rn57r     1/1     Running   0              5d18h
porch-controllers-5d67bb9fdf-4fs4l   1/1     Running   0              22h
porch-server-78dd559589-qmrvl        1/1     Running   17 (23h ago)   5d

### Information

Due to #3877 there are a couple hundred branches after running overnight (see image below).

Porch v0.0.15
kpt v1.0.0-beta.23

image

### Steps to reproduce the behavior

Original issue comments:
Comment user: https://github.com/johnbelamaric
Comment created at: 2023-03-14T16:59:36Z
Comment last updated at: 2023-03-14T16:59:36Z
Comment body: porch-server.log

Comment user: https://github.com/johnbelamaric
Comment created at: 2023-03-14T16:59:55Z
Comment last updated at: 2023-03-14T16:59:55Z
Comment body: I didn't see any obvious crashes in the porch server logs.

Comment user: https://github.com/johnbelamaric
Comment created at: 2023-03-14T18:18:29Z
Comment last updated at: 2023-03-14T18:18:29Z
Comment body: FYI, I manually deleted all those 200+ branches and now it's working again.

Comment user: https://github.com/natasha41575
Comment created at: 2023-03-16T16:56:28Z
Comment last updated at: 2023-03-16T16:57:28Z
Comment body: Hmm, not able to reproduce this one either. I thought maybe your packages might be too large but they all seem reasonably small. I tried to reproduce with https://github.com/natasha41575/blueprints (which has 333 branches atm) and it does take a second or two, but kpt alpha rpkg get still works with porch both running in kind and locally.

Might this be similar to kptdev/kpt#3877 (comment), where porch may have entered a strange error state near the beginning? Would you be able to recreate the 200 branches and see if the issue is still there?

If you need a quick way to create the branches, I created my 200 branches by setting deletionPolicy: orphan in my PackageVariant (PV) and then running: for i in {1..200}; do kubectl delete -f packagevariant.yaml; sleep 0.5; kubectl apply -f packagevariant.yaml; sleep 0.5; done
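
For anyone who wants a parameterized version of that delete/apply loop, here is a minimal Python sketch. It assumes kubectl is on the PATH and that packagevariant.yaml is the manifest mentioned above. The function names and the dry_run flag are hypothetical, added only for illustration.

```python
import subprocess
import time


def churn_commands(manifest, iterations):
    """Build the delete/apply command pairs that churn a PackageVariant."""
    commands = []
    for _ in range(iterations):
        commands.append(["kubectl", "delete", "-f", manifest])
        commands.append(["kubectl", "apply", "-f", manifest])
    return commands


def run_churn(manifest="packagevariant.yaml", iterations=200, pause=0.5, dry_run=True):
    """Run each command with a pause after it, or only print it when dry_run is set."""
    commands = churn_commands(manifest, iterations)
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            # check=False: a failed delete does not abort the loop,
            # which mirrors the ';'-separated shell one-liner.
            subprocess.run(argv, check=False)
            time.sleep(pause)
    return len(commands)
```

Calling run_churn() prints the 400 commands without touching a cluster; run_churn(dry_run=False) would execute them, so it is best pointed at a throwaway test cluster.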

Comment user: https://github.com/johnbelamaric
Comment created at: 2023-03-16T17:18:21Z
Comment last updated at: 2023-03-16T17:18:21Z
Comment body: I wonder if it has to do with running on an autopilot cluster with guaranteed pods (not burstable):

        name: porch-server
        resources:
          limits:
            cpu: 250m
            ephemeral-storage: 1Gi
            memory: 512Mi
          requests:
            cpu: 250m
            ephemeral-storage: 1Gi
            memory: 512Mi
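
To make the "guaranteed vs. burstable" distinction concrete, here is a small Python sketch of the Kubernetes QoS classification rules applied to the resources above. The function name and the dictionary layout are illustrative assumptions, and quantities are compared as literal strings (so "1" and "1000m" would count as different), which is a simplification.

```python
def qos_class(containers):
    """Classify a pod from its containers' resources using the Kubernetes QoS rules.

    Only cpu and memory count; ephemeral-storage does not affect the class.
    """
    any_set = False
    guaranteed = True
    for resources in containers:
        limits = resources.get("limits", {})
        requests = resources.get("requests", {})
        for name in ("cpu", "memory"):
            limit = limits.get(name)
            # If a limit is set and the request is omitted,
            # Kubernetes sets the request equal to the limit.
            request = requests.get(name, limit)
            if limit is not None or request is not None:
                any_set = True
            # Simplification: quantities are compared as literal strings.
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


# The porch-server resources quoted in this comment.
porch_server = {
    "limits": {"cpu": "250m", "ephemeral-storage": "1Gi", "memory": "512Mi"},
    "requests": {"cpu": "250m", "ephemeral-storage": "1Gi", "memory": "512Mi"},
}
print(qos_class([porch_server]))
```

With requests equal to limits for both cpu and memory, the sketch classifies this pod as Guaranteed. On a live cluster the assigned class can be read from the pod's status.qosClass field.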

Comment user: https://github.com/natasha41575
Comment created at: 2023-03-16T17:47:29Z
Comment last updated at: 2023-03-16T17:48:48Z
Comment body: Could you share the memory utilization of your pods to see if it is going over the limits? I spun up an autopilot cluster with the same limits to try it out but again did not hit the same issue.
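
One way to compare usage against the 512Mi limit is sketched below in Python. It assumes the three-column text layout printed by kubectl top pod (NAME, CPU(cores), MEMORY(bytes)) with a header row. The pod names and numbers in SAMPLE_TOP_OUTPUT are made up for illustration and are not measurements from this cluster; the function names and the 0.9 threshold are also hypothetical.

```python
_BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}


def parse_memory(quantity):
    """Convert a memory quantity such as '512Mi' to bytes.

    Handles binary suffixes and plain integers only.
    """
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def pods_near_limit(top_output, limit, threshold=0.9):
    """Return {pod name: usage/limit} for pods at or above the threshold."""
    flagged = {}
    limit_bytes = parse_memory(limit)
    for line in top_output.strip().splitlines()[1:]:  # skip the header row
        name, _cpu, memory = line.split()
        ratio = parse_memory(memory) / limit_bytes
        if ratio >= threshold:
            flagged[name] = ratio
    return flagged


# Hypothetical sample, not real data from the cluster in this issue.
SAMPLE_TOP_OUTPUT = """
NAME                      CPU(cores)   MEMORY(bytes)
porch-server-example      120m         500Mi
function-runner-example   5m           64Mi
"""

print(pods_near_limit(SAMPLE_TOP_OUTPUT, "512Mi"))
```

Feeding it the real output of kubectl top pod -n porch-system would show whether porch-server is running close to its memory limit.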

Comment user: https://github.com/natasha41575
Comment created at: 2023-03-16T21:10:45Z
Comment last updated at: 2023-03-16T21:10:45Z
Comment body: I said this on the other issue too, but I'm going to try to reproduce your setup with the script you sent me so I can investigate more productively.

@tliron transferred this issue from nephio-project/porch-issue-transfer Apr 23, 2024
@liamfallon
Member Author

Triaged
Triage Comment: Reproduce this, see how serious it is, part of scaling/stability work

@liamfallon added this to the R3 milestone May 17, 2024