
[BUG] Default CRON causes extremely high memory usage until Out of Memory exception #5312

Open
rboylesDev opened this issue Apr 30, 2024 · 8 comments
Labels: bug (Something isn't working)

@rboylesDev

Description

We are using Elsa 3 with the default scheduling. Whenever we have a Timer or CRON-triggered workflow, we notice a jump in memory usage each time the workflow is triggered, and this memory is never garbage collected. We are running Elsa on Azure Container Apps with 1 GB of RAM. With a CRON expression that runs every 15 minutes, the app survives for about two hours until the container crashes with an out-of-memory exception and is automatically restarted. We see the same memory growth locally, but with significantly more RAM it only becomes a problem after running for days.

Steps to Reproduce

Create a simple workflow with a Timer or CRON trigger, even one that just writes to the console. Let it run on its schedule and observe the memory allocations.
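
For reference, a repro along these lines might look roughly like the sketch below, written against Elsa 3's programmatic workflow API. This is an illustration rather than the reporter's actual workflow: the class name is invented, and the namespaces, the `CanStartWorkflow` flag and the cron expression format may need adjusting for the exact 3.x version and cron parser in use.

```csharp
using Elsa.Scheduling.Activities;
using Elsa.Workflows;
using Elsa.Workflows.Activities;

// Hypothetical repro workflow: a CRON trigger that fires every 15 minutes,
// followed by a single console write.
public class CronReproWorkflow : WorkflowBase
{
    protected override void Build(IWorkflowBuilder builder)
    {
        builder.Root = new Sequence
        {
            Activities =
            {
                new Cron
                {
                    CronExpression = new("*/15 * * * *"), // every 15 minutes
                    CanStartWorkflow = true               // mark the activity as a workflow trigger
                },
                new WriteLine("CRON fired")
            }
        };
    }
}
```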

Expected Behavior

While a memory jump isn't unexpected, having GC run and release memory is expected.

Actual Behavior

Memory appears to not be de-allocated.

Screenshots

Here is a sample of memory usage from our container app showing memory rising until the app crashes and restarts.
[screenshot]

Environment

  • Elsa Package Version: Specify the version. If you've cloned the repo, mention if you are using the latest source code from the main branch.
  • Operating System: Include your OS version (e.g., Windows 10, macOS Big Sur, Ubuntu 18.04).
  • Browser and Version: If applicable, include your browser and version (e.g., Chrome 92).

Troubleshooting Attempts

Initially we thought it might be the amount of work that the workflow is trying to do. But even a simple workflow of writing a console message or dispatching another workflow will still continue to allocate memory.

@rboylesDev rboylesDev added the bug Something isn't working label Apr 30, 2024

glime-ai bot commented Apr 30, 2024

Based on the context provided, it seems that you are experiencing a memory leak issue with Elsa 3 when using Timer or CRON-triggered workflows. This is a known issue that has been reported by other users as well, as seen in GitHub issue #4884. That issue describes a memory leak related to CodeActivity, which may be similar to what you are experiencing with Timer or CRON activities.

To address this issue, you can try the following steps:

  1. Update Elsa to the latest version: Ensure that you are using the latest version of Elsa, as the issue might have been addressed in a newer release.

  2. Monitor Garbage Collection (GC): Use tools like dotMemory to monitor memory allocations and GC. This can help you identify if the memory is indeed not being collected; a minimal logging sketch is included after this list.

  3. Implement IDisposable: If you have custom activities, ensure that they implement IDisposable and properly dispose of any unmanaged resources.

  4. Check for Subscriptions: Ensure that any event handlers or message subscriptions are being properly unsubscribed or disposed of to prevent memory leaks.

  5. Review Configuration: Review your Elsa configuration to ensure that you are not unintentionally retaining workflow instances in memory. For example, check if you are using a persistent workflow store and if the configuration of the store is correct.

  6. Report the Issue: If none of the above steps help, and you are using the latest version of Elsa, consider reporting the issue on the Elsa GitHub repository. Provide detailed information about your environment, the version of Elsa you are using, and the steps to reproduce the issue.

  7. Seek Community Help: You can also seek help from the Elsa community on platforms like Stack Overflow or the Elsa Slack channel. Other users or the developers might have encountered similar issues and could have solutions.

  8. Workaround: As a temporary workaround, you might consider implementing a mechanism to periodically restart your service to clear the memory until a permanent fix is available.
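
To make point 2 above concrete, the following is a minimal, hypothetical heap-logging sketch (a plain .NET hosted service, not an Elsa API) that logs managed heap statistics once a minute so that growth can be correlated with the 15-minute schedule. Register it with `services.AddHostedService<HeapMonitor>()`; `TotalCommittedBytes` requires .NET 6 or later.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;

// Hypothetical diagnostic service: periodically logs GC heap figures so you can
// see whether memory keeps growing across scheduled runs or is eventually collected.
public class HeapMonitor : BackgroundService
{
    private readonly ILogger<HeapMonitor> _logger;

    public HeapMonitor(ILogger<HeapMonitor> logger) => _logger = logger;

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            var info = GC.GetGCMemoryInfo();

            _logger.LogInformation(
                "Heap: {HeapMb:F1} MB, committed: {CommittedMb:F1} MB, gen2 collections: {Gen2}",
                info.HeapSizeBytes / (1024d * 1024d),
                info.TotalCommittedBytes / (1024d * 1024d),
                GC.CollectionCount(2));

            await Task.Delay(TimeSpan.FromMinutes(1), stoppingToken);
        }
    }
}
```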

Please note that without specific details about your environment and the version of Elsa you are using, it's challenging to provide a precise solution. If you can provide more details, I may be able to offer more targeted advice.

This comment was generated by Glime.

@rboylesDev
Author

Oh, I forgot to mention, we are using version 3.1.2 of Elsa. So I do not believe it is the same issue as the Code Activity memory leak.

@sfmskywalker sfmskywalker self-assigned this May 6, 2024
@sfmskywalker
Member

Confirmed:

[screenshot]

@rboylesDev thanks for reporting!

@sfmskywalker sfmskywalker added this to the Elsa 3.2 milestone May 6, 2024
@sfmskywalker
Member

Actually, when I let it keep running, it seems to purge memory:

[screenshot]

So now I am not really sure there is a memory leak 🤔

@sfmskywalker
Member

Unless the "23,4 unreachable" is a hint.

@sfmskywalker
Member

@rboylesDev Just to make sure: is the application using non-memory stores for all of the modules:

  • Workflow Management
  • Workflow Runtime

Unless you explicitly configure the persistence provider for these two, they will use the Memory provider by default, which will most certainly lead to an increase in memory consumption as new workflow instances and execution records are stored in an in-memory dictionary.
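
For comparison, a minimal sketch of what an explicit EF Core + SQL Server configuration for both modules typically looks like in Elsa 3 is shown below. The extension method names assume the Elsa.EntityFrameworkCore and Elsa.EntityFrameworkCore.SqlServer packages and may differ slightly between 3.x releases; treat this as an illustration, not the reporter's actual setup.

```csharp
using Elsa.EntityFrameworkCore.Extensions;
using Elsa.EntityFrameworkCore.Modules.Management;
using Elsa.EntityFrameworkCore.Modules.Runtime;
using Elsa.Extensions;

var builder = WebApplication.CreateBuilder(args);
var connectionString = builder.Configuration.GetConnectionString("Elsa");

builder.Services.AddElsa(elsa =>
{
    // Workflow Management: persist definitions and instances in SQL Server
    // instead of the default in-memory store.
    elsa.UseWorkflowManagement(management =>
        management.UseEntityFrameworkCore(ef => ef.UseSqlServer(connectionString)));

    // Workflow Runtime: persist triggers, bookmarks and execution records too.
    elsa.UseWorkflowRuntime(runtime =>
        runtime.UseEntityFrameworkCore(ef => ef.UseSqlServer(connectionString)));
});

var app = builder.Build();
app.Run();
```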

@rboylesDev
Author

We are using Elsa with EF Core and SQL Server. I believe these are configured correctly as it is a very simple code setup.

[screenshot]

@cristinamudura cristinamudura removed this from the Elsa 3.2 milestone May 16, 2024
@rboylesDev
Author

Minor update on our end: we decided to try the Quartz scheduler instead of the built-in scheduler. The result was the same, allocating ~200 MB per scheduled workflow run and never appearing to release it. What is interesting is that running the same workflow manually does not show the same jump in allocation.
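
One way to tell whether that ~200 MB is genuinely rooted (a real leak) or simply has not been collected yet is to force a full, blocking collection after a scheduled run and compare successive readings. A hypothetical helper, plain .NET and illustrative only:

```csharp
using System;

public static class HeapCheck
{
    // Call after a scheduled run completes, e.g. from a temporary diagnostics
    // endpoint. If the "after" value climbs by roughly the same amount on every
    // run, the memory is still reachable from somewhere (a leak); if it drops
    // back, the growth is uncollected garbage the GC has not yet needed to reclaim.
    public static void Report()
    {
        var before = GC.GetTotalMemory(forceFullCollection: false) / (1024d * 1024d);

        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        var after = GC.GetTotalMemory(forceFullCollection: true) / (1024d * 1024d);
        Console.WriteLine($"Heap before forced GC: {before:F1} MB, after: {after:F1} MB");
    }
}
```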

Project status: In Progress
Development: no branches or pull requests
3 participants