Flow: bigger priority or fifo for jobs after going back from waiting-children to waiting when using the WaitingChildrenError to generate children #1899
Labels
enhancement
Is your feature request related to a problem? Please describe.
We recently refactored our customer-initialization workflow to use the flow pattern described here:
https://docs.bullmq.io/patterns/process-step-jobs#waiting-children
We first run a job that adds the required children and then throws a WaitingChildrenError. These children all have children of their own and use the same mechanism. (There are usually 3 steps: create the customer, fetch all data for a specific data type, then run one job per fetched item and add it to the database.)
This all works like a charm when initializing only a small batch of customers at the same time. However, a re-initialization of all customers (which we sometimes need to do) comes to about 80 million jobs run in total.
Because a parent is added to the end of the queue with the same priority as the parents that still have to create their children, the required Redis memory swells over time, and at some point Redis (we use AWS ElastiCache) runs out of memory and everything starts failing.
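To illustrate the effect, here is a toy in-memory model (not BullMQ code, and not a measurement). It assumes a single worker, that children always finish before the next parent is picked up, and that a flow stays "open" (holding memory) from the moment its parent creates children until the parent's second run completes. The names `peakOpenFlows` and `resumeAtFront` are made up for this sketch.

```typescript
type ParentJob = { id: number; phase: "initial" | "resumed" };

// Returns the peak number of flows that are open at the same time.
// resumeAtFront = false models today's behavior (parent re-enters at the tail);
// resumeAtFront = true models the requested behavior (parent is served next).
function peakOpenFlows(customers: number, resumeAtFront: boolean): number {
  const queue: ParentJob[] = [];
  for (let id = 0; id < customers; id++) {
    queue.push({ id, phase: "initial" });
  }

  let open = 0;
  let peak = 0;

  while (queue.length > 0) {
    const job = queue.shift()!;
    if (job.phase === "initial") {
      // First run: the parent creates its children and waits for them.
      open++;
      peak = Math.max(peak, open);
      // Assumption: the children complete right away, so the parent
      // moves from waiting-children back to waiting immediately.
      const resumed: ParentJob = { id: job.id, phase: "resumed" };
      if (resumeAtFront) {
        queue.unshift(resumed);
      } else {
        queue.push(resumed);
      }
    } else {
      // Second run: the parent finishes and the flow can be cleaned up.
      open--;
    }
  }
  return peak;
}

console.log(peakOpenFlows(5, false)); // 5: every flow is open before any finishes
console.log(peakOpenFlows(5, true)); // 1: each flow finishes before the next starts
```

In this model the tail re-insert makes the peak grow linearly with the number of customers, while serving resumed parents first keeps it constant.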
Describe the solution you'd like
The best solution would be an option for flow parents to change their priority when they move from waiting-children back to waiting. It would also help to be able to put them in as FIFO when they enter the waiting queue a second time.
Describe alternatives you've considered
As a current workaround, we limit the number of customers initializing at the same time. This works, but we always lose some time at the end of each batch of x customers, because the cron job that checks whether an update or initialization is required runs only once every 5 minutes due to API limits.
Additional context
Using FIFO directly is not an option. We run jobs with different priorities to make sure all child data is processed before the next parent is touched, to prevent Redis from running out of memory.
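A priority scheme of this kind could be sketched roughly as follows. This is only an illustration: it assumes BullMQ's convention that a lower `priority` number is served first, and the helper `priorityForDepth` and the depth numbering (0 = customer, 1 = data type, 2 = single item) are hypothetical names for this example.

```typescript
// Maps a job's depth in the flow to a priority value.
// Deeper jobs get a smaller number, so (assuming lower number = served first)
// all pending children are processed before the next parent is picked up.
function priorityForDepth(depth: number, maxDepth: number): number {
  if (!Number.isInteger(depth) || depth < 0 || depth > maxDepth) {
    throw new RangeError(`depth must be an integer between 0 and ${maxDepth}`);
  }
  // Deepest level maps to 1, the root level maps to maxDepth + 1.
  return maxDepth - depth + 1;
}

// With three levels: customer = 3, data type = 2, single item = 1.
const levels = [0, 1, 2].map((depth) => priorityForDepth(depth, 2));
console.log(levels);
```

The returned value would be passed as the `priority` option when adding a job. It does not solve the problem described above, because a resumed parent still shares its priority with parents that have not run yet.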
PS: If required, I can try to find some time to implement this and create a PR, but I would first like confirmation from the BullMQ project owners that this logic would be acceptable.