
Downloads going over cdn.dl.k8s.io are much slower than direct downloads from the bucket #5755

Closed
xmudrii opened this issue Aug 24, 2023 · 21 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.
priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
sig/k8s-infra Categorizes an issue or PR as relevant to SIG K8s Infra.

Comments

@xmudrii
Member

xmudrii commented Aug 24, 2023

I've observed that downloads using curl going over cdn.dl.k8s.io (dl.k8s.io) are much slower than direct downloads from the bucket (storage.googleapis.com/kubernetes-release).

For example, downloading kubelet v1.28.1 directly from the bucket yields the following results:

curl -LO https://storage.googleapis.com/kubernetes-release/release/v1.28.1/bin/linux/amd64/kubelet
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  105M  100  105M    0     0  23.8M      0  0:00:04  0:00:04 --:--:-- 23.8M

The download took 4 seconds in total. However, downloading via the CDN yields very different results:

curl -LO https://dl.k8s.io/v1.28.1/bin/linux/amd64/kubelet
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   138  100   138    0     0    744      0 --:--:-- --:--:-- --:--:--   745
100  105M  100  105M    0     0  1643k      0  0:01:05  0:01:05 --:--:-- 1784k

It took one minute and five seconds to download the same file.

Update: it turns out that cache-miss downloads are slow, while cache-hit downloads are fast. This can be determined from the x-cache: MISS and x-cache: HIT headers. Once the file is cached on Fastly's side, downloads are fast, but prior to that, downloads are insanely slow.
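For reference, a quick way to check the cache status without downloading the whole file is a HEAD request (a rough probe only; caching behaviour for HEAD may not perfectly mirror a full GET):

# Probe only the response headers; look for x-cache: HIT/MISS and the cached object's age
curl -sIL https://dl.k8s.io/v1.28.1/bin/linux/amd64/kubelet | grep -iE 'x-cache|^age:'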

/sig k8s-infra
/priority important-soon
/kind bug
cc @ameukam @BenTheElder

@k8s-ci-robot k8s-ci-robot added sig/k8s-infra Categorizes an issue or PR as relevant to SIG K8s Infra. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. kind/bug Categorizes issue or PR as related to a bug. labels Aug 24, 2023
@xmudrii
Member Author

xmudrii commented Aug 24, 2023

Update: it turns out that cache-miss downloads are slow, while cache-hit downloads are fast. This can be determined from the x-cache: MISS and x-cache: HIT headers. Once the file is cached on Fastly's side, downloads are fast, but prior to that, downloads are insanely slow.

@xmudrii xmudrii changed the title Downloads using curl going over cdn.dl.k8s.io are much slower than direct downloads from the bucket Downloads going over cdn.dl.k8s.io are much slower than direct downloads from the bucket Aug 25, 2023
@xrstf

xrstf commented Sep 26, 2023

This might be related: the CDN is not just slow, it's also inconsistent. 1.29-alpha.1 was released yesterday, but depending on where you perform a curl -L https://dl.k8s.io/release/latest-1.29.txt, you receive either alpha.0 or alpha.1.

This even changes on the same computer if you just re-run the same curl command a few seconds later. I'm not sure whether individual CDN servers "downgrade" their data or whether I'm just hitting lots of random CDN nodes that all have inconsistent state, but it's weird and sadly unreliable :/

These two requests happened basically at the same time:

< HTTP/2 200 
< x-guploader-uploadid: ADPycdutDBgx7kyHbX7GUaTmNyxVRNVE82erWSx3_jmUaV5c01OeI7dkYmcu9pfg9gj5BTsgpYgYhWRUMYxkNtP4PVKi26f6HtKM
< expires: Sun, 24 Sep 2023 12:42:09 GMT
< last-modified: Wed, 26 Jul 2023 09:06:19 GMT
< etag: "9b59bd47d18f2395481cf230a43a56e0"
< content-type: text/plain
< cache-control: private, no-store
< accept-ranges: bytes
< date: Tue, 26 Sep 2023 10:40:55 GMT
< via: 1.1 varnish
< age: 165525
< x-served-by: cache-fra-etou8220117-FRA
< x-cache: HIT
< x-cache-hits: 1
< access-control-allow-origin: *
< content-length: 15
< 
* Connection #1 to host cdn.dl.k8s.io left intact
v1.29.0-alpha.0

and

< HTTP/2 200
< x-guploader-uploadid: ADPycds7gWeT690zb-SSaamOrnGHAi6AgaV_K0SWCSe5XMLoJ1zFIE0NiJNe0v8Nr0STrfLXh5GwEv5JBgB6RhU6cqOdVHcHyJIy
< expires: Tue, 26 Sep 2023 07:08:47 GMT
< last-modified: Mon, 25 Sep 2023 20:56:50 GMT
< etag: "7d852bf327f00c76b50173de7dbaebf6"
< content-type: text/plain
< cache-control: private, no-store
< accept-ranges: bytes
< date: Tue, 26 Sep 2023 10:40:50 GMT
< via: 1.1 varnish
< age: 12723
< x-served-by: cache-muc13944-MUC
< x-cache: HIT
< x-cache-hits: 1
< access-control-allow-origin: *
< content-length: 15
<
* Connection #1 to host cdn.dl.k8s.io left intact
v1.29.0-alpha.1

Both claim a cache hit, but return different results.
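For reference, a minimal way to reproduce this comparison from a single machine (plain curl plus grep; which POP you hit depends on DNS and routing, so results will vary):

# Fetch the marker a few times and print the value returned plus the serving POP and cache status
for i in 1 2 3; do
  curl -sSL -D /tmp/hdrs https://dl.k8s.io/release/latest-1.29.txt; echo
  grep -iE 'x-served-by|x-cache:|^age:' /tmp/hdrs
done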

@xmudrii
Member Author

xmudrii commented Sep 26, 2023

This can lead to serious issues. It looks like you're being served from FRA and MUC, and these nodes might indeed have different caches. I think we should exclude version markers from the cache; they can change often, especially the latest ones.
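For example, a stale marker is easy to spot by comparing what the CDN returns with what the bucket returns directly (URLs reused from earlier in this thread; this assumes the marker lives at the same path in the bucket):

# Value served via the CDN vs. the value currently in the bucket
curl -sSL https://dl.k8s.io/release/latest-1.29.txt; echo
curl -sSL https://storage.googleapis.com/kubernetes-release/release/latest-1.29.txt; echo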

@ameukam
Member

ameukam commented Sep 26, 2023

Yeah. The cache configuration is not specific about file extensions.

I'll open a PR to fix it this week. Another option could be to serve those version markers directly through the nginx instance instead of the CDN provider.

@ameukam
Member

ameukam commented Sep 26, 2023

@xrstf can you open a new issue with what you described, so we can better track what's happening? Thanks!

@xrstf

xrstf commented Sep 26, 2023

Can do, done => #5900.

@ameukam
Member

ameukam commented Sep 27, 2023

We increased the TTL for the different objects in #5871. Hopefully the situation will improve.

The current CDN is a "pull-through" cache, so a MISS is expected for any object at the POP close to the client on the first request. Our real issue is the number of objects that need to be cached at the edge: we have a lot of objects (in this case binaries) that are rarely pulled. I don't think there is an efficient mechanism to warm all the POPs of the CDN provider for all the objects we currently host, but I'm open to any suggestions.
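For illustration only, naive warming from a single machine would amount to something like the loop below (the binary names and version are just example placeholders), which is exactly why it doesn't scale: it primes only the POP closest to wherever it runs, and only for the objects it happens to list.

# Hypothetical warm-up loop: primes only the nearest POP, only for the listed objects
for bin in kubeadm kubelet kubectl; do
  curl -sSL -o /dev/null "https://dl.k8s.io/v1.28.1/bin/linux/amd64/${bin}"
done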

Note that our cache hit rate is currently over 99%. I don't think we can do much more than that.

[screenshot: cache hit rate above 99%]

@BenTheElder
Member

IIRC a mid-level cache was mentioned when talking to Fastly previously?

@ameukam
Member

ameukam commented Sep 27, 2023

IIRC a mid-level cache was mentioned when talking to Fastly previously?

Maybe you're talking about Origin Shield? If that's the case, the feature is mostly effective with regional buckets, which is not the case for gs://kubernetes-release. I'll ask about the exact requirements for this feature.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 29, 2024
@ameukam
Member

ameukam commented Jan 29, 2024

@xmudrii is the problem still happening?

@xmudrii
Member Author

xmudrii commented Jan 29, 2024

@ameukam I'll check and get back to you

@xmudrii
Member Author

xmudrii commented Feb 12, 2024

@ameukam This is still an issue for non-cached artifacts downloaded over dl.k8s.io; see the screenshot:

[screenshot: slow download of a non-cached artifact over dl.k8s.io]

@xmudrii
Member Author

xmudrii commented Feb 12, 2024

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 12, 2024
@ameukam
Member

ameukam commented Feb 12, 2024

Non-cached artifacts going through Fastly will always be slow for the first request at the POP close to the requester. Fastly doesn't replicate all the objects over its entire network; objects are cached based on requests. If an object is not present at the Fastly edge, fetching it will always be slower than from the origin.
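As a rough way to see this in practice (a sketch using curl's built-in timing, with the kubelet URL from earlier in the thread): the first attempt is typically a MISS at the local POP, the second a HIT.

# Compare total download time for a first (likely MISS) and second (likely HIT) request
for i in 1 2; do
  curl -sSL -o /dev/null -w "attempt $i: %{time_total}s\n" https://dl.k8s.io/v1.28.1/bin/linux/amd64/kubelet
done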

@xmudrii
Member Author

xmudrii commented Feb 12, 2024

@ameukam Is there anything we can do to make it at least a little faster? The difference is huge: it takes 5 seconds when downloading directly from the bucket, but about 1 minute and 30 seconds when downloading from the CDN. Subsequent requests might be slow as well because there's a chance you'll get redirected to some other edge location.

@ameukam
Member

ameukam commented Feb 12, 2024

One possibility could be Fastly Origin Shield, but we would need to switch the origin to a regional bucket.

@xmudrii
Member Author

xmudrii commented Feb 12, 2024

Even cached requests are much slower for me. Something that takes 3-5 seconds when downloaded directly from the bucket takes 30-40 seconds when downloaded via the CDN. I double-checked with @xrstf and he sees okay speeds on the 2nd and 3rd try (the 1st try is also slow for him), but that's not the case for me.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 12, 2024
@xmudrii
Member Author

xmudrii commented May 20, 2024

I think this has been mostly fixed; I haven't observed it for a while. Closing the issue for now.
/close

@k8s-ci-robot
Contributor

@xmudrii: Closing this issue.

In response to this:

I think this has been mostly fixed; I haven't observed it for a while. Closing the issue for now.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
