
Memory ballooning with breadcrumbs #1276

Open
mjholtkamp opened this issue Jul 26, 2018 · 3 comments
@mjholtkamp

Possibly related to #929, but more specific. We think there is an easy fix, but we would like some feedback on whether we're on the right track.

The symptoms
Two weeks ago we noticed problems with a project that uses raven-python to send exceptions to sentry. We use aiohttp in combination with the raven-aiohttp package and the QueuedAioHttpTransport transport.

The issue occurred when we had problems with a component, causing lots of exceptions to be logged (and sent to sentry). This unfortunately caused sentry to slow down.

There are two issues here that we should solve on our own: lots of exceptions when a component fails, and not enough capacity for sentry. Regardless, we believe that raven-python made the problem worse, and we think that can be fixed.

Investigation
After investigating, it turned out that timeouts to sentry and a full outgoing message queue caused extra exceptions to be logged. These exceptions are excluded from being sent to sentry, so they cause no problem there.

However, they do end up in breadcrumbs. Breadcrumbs are limited in several ways:

  1. The number of breadcrumbs has an upper bound (default: 100)
  2. The size of each message has an upper bound (default: 1K)

So far so good.
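To make those two limits concrete, here is a minimal sketch of how they behave conceptually (this is not raven's actual implementation; the names and constants are assumptions based on the defaults above):

from collections import deque

MAX_BREADCRUMBS = 100   # assumed default: upper bound on the number of breadcrumbs
MAX_MESSAGE_LEN = 1024  # assumed default: upper bound on the message size ("1K")

# Bounded ring buffer: the oldest breadcrumb is dropped when the buffer is full.
_buffer = deque(maxlen=MAX_BREADCRUMBS)

def record_breadcrumb(message, category=None, data=None):
    _buffer.append({
        'message': message[:MAX_MESSAGE_LEN],  # the message is trimmed
        'category': category,
        'data': data,  # the data is stored as-is (see below)
    })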

The problem
The data attached to breadcrumbs is not limited, though. There is a TODO in the code about this:

# TODO(dcramer): we should trim data

We believe that this can cause memory to balloon, although with an upper bound, because the number of breadcrumbs is still limited as described above.

The process can recover from this, as long as it can eventually deliver the messages to sentry.
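To illustrate how untrimmed data can pin large objects in memory, here is a small example using raven.breadcrumbs.record (the payload size is arbitrary, just to make the effect visible):

from raven import breadcrumbs

# The breadcrumb keeps a reference to whatever is passed as data. Since data is
# never trimmed, up to 100 such payloads can be kept alive at the same time.
huge_payload = 'x' * (10 * 1024 * 1024)  # ~10 MB, arbitrary example size

breadcrumbs.record(
    message='component failed',
    category='example',
    data={'payload': huge_payload},  # stored untrimmed in the breadcrumb buffer
)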

A workaround
We worked around this problem by excluding certain loggers from the breadcrumbs (this is better than turning off breadcrumbs entirely):

from raven import breadcrumbs

POTENTIAL_BALLOONING_LOGGERS = ('raven.base.Client', 'sentry.errors.uncaught', 'sentry.errors')

for logger in POTENTIAL_BALLOONING_LOGGERS:
    breadcrumbs.ignore_logger(logger)
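Note that ignore_logger() matches logger names exactly: ignoring a parent logger does not also ignore its children, so every relevant logger has to be listed explicitly (see also the comment in the reproduction script below).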

Solution
We think the solution would be to:

  1. put an upper bound on the size of the data (a rough sketch of this option follows below), or
  2. exclude data from showing up in breadcrumbs for exceptions that happen during client.captureException(), or
  3. both.
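For option 1, a rough sketch of what trimming could look like (trim_breadcrumb_data, the serialization approach and the limit are hypothetical, not existing raven API):

import json

MAX_DATA_BYTES = 4096  # hypothetical upper bound on serialized breadcrumb data

def trim_breadcrumb_data(data):
    # Serialize defensively; fall back to repr() for objects json can't handle.
    try:
        serialized = json.dumps(data, default=repr)
    except Exception:
        return {'error': 'unserializable breadcrumb data'}
    if len(serialized) > MAX_DATA_BYTES:
        return {'trimmed': serialized[:MAX_DATA_BYTES]}
    return data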

Any thoughts?

@mjholtkamp
Author

Small example to reproduce the problem (it is still somewhat difficult to reproduce, since it depends on the speed of the machine, the network, and the speed of the sentry server):

from unittest import mock

import aiohttp.resolver

import asyncio
from functools import partial

from raven import Client
from raven import breadcrumbs
from raven_aiohttp import QueuedAioHttpTransport


SENTRY_DSN = ''
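# set this to a real sentry DSN to reproduce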

# logger names should be exact, ignoring a parent does not ignore its children
# ignore_loggers = ('raven.base.Client', 'sentry.errors.uncaught', 'sentry.errors')
# for logger in ignore_loggers:
#     breadcrumbs.ignore_logger(logger)


with mock.patch('aiohttp.resolver.DefaultResolver', aiohttp.resolver.AsyncResolver):
    client = Client(dsn=SENTRY_DSN, transport=partial(QueuedAioHttpTransport, workers=2, qsize=10, verify_ssl=False))

    class KeepMemory:
        def __init__(self, a, b, c):
            self.a = a
            self.b = b
            self.c = c


    async def produce():
        for i in range(1, 10000):
            try:
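                # length of the strings held by each exception; a larger value makes the memory growth easier to observe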
                big = 1
                a = 'a' * big
                b = 'b' * big + a
                c = 'c' * big + b
                raise Exception(KeepMemory(a, b, c))
            except Exception:
                client.captureException()
            await asyncio.sleep(0.001)
            if i % 100 == 0:
                print(i)


    loop = asyncio.get_event_loop()
    loop.run_until_complete(produce())
    loop.run_until_complete(client.remote.get_transport().close())


print('waiting 10 seconds to show memory')
import time
time.sleep(10)

@mjholtkamp
Author

I wonder if someone has already had a chance to look at this? :)

@mjholtkamp
Author

Is this the right place to report/discuss bugs? If I should report it somewhere else, I would like to know.
