Increase resources for relay-stdout container #38064
Unanswered
trevordjones asked this question in Questions
Replies: 0 comments
Hello,
First off, thanks to all for your work on Airbyte. It's a great tool and I've enjoyed working with it!
I have a question about increasing resources for the relay-stdout container. I'll give a little background and then make my question more concrete.
We use Kubernetes to host and manage our Airbyte setup. Our connection has a Postgres source and a Snowflake destination. I noticed that syncing even a small amount of data (7 GB) was taking ~35-40 minutes. I read through Scaling Airbyte and adjusted the recommended environment variables (JOB_MAIN_CONTAINER_xxx) to increase CPU and memory. That cut the sync time roughly in half, but increasing resources beyond that made no further difference. Even with that speedup, it's still only processing ~7 MB/s, and each individual record is small (32-56 bytes), so I would expect this to be much faster.
I dug into the architecture of Airbyte and found Scaling Data Pipelines in Kubernetes (which was incredibly helpful) and learned about the sidecars that Airbyte uses to run socat.
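As a sanity check on the throughput figures above, here is a small back-of-the-envelope calculation. It assumes decimal units (1 GB = 10^9 bytes), takes 37.5 minutes as the midpoint of the original ~35-40 minute sync, and treats "about half the time" as exactly half; the function names are illustrative only.

```python
# Rough throughput estimate from the numbers quoted in this post.
# Assumptions: decimal units, 37.5 min as the midpoint of ~35-40 min,
# and "about half the time" taken as exactly half.
GB = 1000 ** 3
MB = 1000 ** 2


def throughput_mb_per_s(total_bytes: int, minutes: float) -> float:
    """Average throughput in MB/s for a sync of total_bytes lasting `minutes`."""
    return total_bytes / (minutes * 60) / MB


def records_per_s(mb_per_s: float, record_bytes: int) -> float:
    """Approximate records per second at a given throughput and record size."""
    return mb_per_s * MB / record_bytes


total = 7 * GB
before = throughput_mb_per_s(total, 37.5)      # before tuning main
after = throughput_mb_per_s(total, 37.5 / 2)   # after tuning main

print(f"before: {before:.1f} MB/s, after: {after:.1f} MB/s")
print(f"records/s after tuning: {records_per_s(after, 56):,.0f} "
      f"to {records_per_s(after, 32):,.0f}")
```

Under these assumptions the post-tuning rate comes out at roughly 6 MB/s, which is in line with the ~7 MB/s observed, and corresponds to somewhere between about one and two hundred thousand small records per second.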
My understanding from that article is that a new pod is launched with a "main" container. This container queries the source for data, processes it (maybe converts it to JSON?), and pipes it to the relay-stdout container. This is the part we were able to speed up by increasing resources for main.
The relay-stdout container is where socat is installed. It pipes the data to a port that other pods know about and can read from. I believe this is where our new bottleneck is: no matter how fast main processes data, it won't reach the destination pod any faster, because the resources available to relay-stdout are minimal (500m CPU and 25Mi memory).
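To make the data path described above concrete, here is a minimal sketch of the relay pattern in Python. This is not Airbyte's code: the actual sidecar uses socat, and the pipe, the local TCP listener, and the names `relay` and `run_demo` are all stand-ins invented for illustration. It only shows the shape of the flow: a producer writes to a pipe, a relay copies those bytes to a TCP connection, and a consumer reads them on the other end.

```python
import os
import socket
import threading


def relay(read_fd: int, host: str, port: int, chunk: int = 65536) -> None:
    """Copy bytes from a pipe to a TCP connection (a stand-in for socat)."""
    with socket.create_connection((host, port)) as conn, \
            os.fdopen(read_fd, "rb") as src:
        while data := src.read(chunk):
            conn.sendall(data)


def run_demo(payload: bytes) -> bytes:
    """Push payload through pipe -> relay -> TCP and return what arrives."""
    # "Destination" side: listen on an ephemeral local port.
    server = socket.socket()
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]
    received = bytearray()

    def destination() -> None:
        conn, _ = server.accept()
        with conn:
            while data := conn.recv(65536):
                received.extend(data)

    dest_thread = threading.Thread(target=destination)
    dest_thread.start()

    # "main" side: write records into a pipe standing in for stdout.
    read_fd, write_fd = os.pipe()
    relay_thread = threading.Thread(
        target=relay, args=(read_fd, "127.0.0.1", port)
    )
    relay_thread.start()
    with os.fdopen(write_fd, "wb") as out:
        out.write(payload)  # closing the pipe signals EOF to the relay

    relay_thread.join()
    dest_thread.join()
    server.close()
    return bytes(received)


if __name__ == "__main__":
    records = b'{"id": 1}\n' * 1000
    assert run_demo(records) == records
```

In this arrangement the relay step sits on the critical path: the consumer can never receive data faster than the relay copies it, which is why a starved relay could cap throughput regardless of how quickly the producer runs.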
Does this track? Or am I incorrect? It's possible the resources in relay-stdout do not matter and only those available in main matter. If so, we'll have to get bigger instances so we can increase resources more and see if that will make a difference.
If I am correct and we need to increase resources in relay-stdout, how do we do that? I see in this application.yml file that 500m and 25Mi seem to be hard-coded and are thus not configurable with environment variables.