
feat(core): Add Offline Transport wrapper #6884

Merged
10 commits merged into getsentry:master from feat/offline on Jan 31, 2023

Conversation

@timfish (Collaborator) commented on Jan 19, 2023

Closes #6403
Closes #3046
Also ref #3046

This PR adds a function that wraps an existing transport and uses a supplied store implementation to queue envelopes when they fail to send.

  • It draws on lessons and customer feedback from the Electron offline transport
  • This PR does not include platform-specific storage/persistence implementations

It's worth noting that this wrapper only queues after failures. A more fail-safe approach would be to queue the envelope and only remove it when it has been sent. I think a best-effort approach is better than hitting the store on every envelope send.

Currently it queues on:

  • Exception from underlying transport
  • Hit rate limits

Is this correct? Should we also queue on any specific status codes?

If an event is queued or dropped, an empty response ({}) is returned to the caller, since it doesn't make sense to return a status code here.
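
For illustration, the best-effort flow looks roughly like this (a minimal sketch; the store shape and helper names such as QueueStore and queueOnFailureSend are placeholders, not the actual implementation):

import type { Envelope, Transport, TransportMakeRequestResponse } from '@sentry/types';

// Placeholder store shape used only for this sketch
interface QueueStore {
  insert(env: Envelope): Promise<void>;
  pop(): Promise<Envelope | undefined>;
}

// Best-effort queueing: only touch the store when the wrapped send fails
function queueOnFailureSend(transport: Transport, store: QueueStore) {
  return async (envelope: Envelope): Promise<TransportMakeRequestResponse> => {
    try {
      // Happy path: hand the envelope straight to the wrapped transport
      return (await transport.send(envelope)) || {};
    } catch (error) {
      // The underlying send threw, so queue the envelope for a later retry
      await store.insert(envelope);
      // No status code applies here, hence the empty response
      return {};
    }
  };
}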

How would this be used to create offline transports?

For example, in the browser:

function indexedDbOfflineStore(maxQueueSize: number) {
  return {
    insert: (env) => { /* cool implementation here */ },
    pop: () => { /* cool implementation here */ },
  }
}

const makeOfflineFetchTransport = makeOfflineTransport(makeFetchTransport, indexedDbOfflineStore);

Other Notes

  • On sending failure, should we be offline queueing Replay envelopes? They could fill the queue rather quickly.
  • The Electron offline transport also has a maxAgeDays config option that drops envelopes over a certain age. Is that needed here too?
  • There's a flushAtStartup option which flushes 5 seconds after initialisation but is disabled by default. Should this be enabled by default?

timfish marked this pull request as ready for review on January 19, 2023 at 23:16
@mydea (Member) commented on Jan 20, 2023

Since we've spent quite some time recently with the replay team to think about how to handle failed replay event sends, it's exciting to see this work!

One thing right away: the conclusion we reached for replay regarding rate limits is that it's not safe to queue those events up, so I'd say we should not cache them but discard them.

Basically, the strategy we ended up on with replay - and I believe we can/should use the same one here - is:

  • When we hit a rate_limit, do not retry it
  • When we hit an API error (e.g. a 400 or 500 status code), do not retry it, for the same reason as above: we may hammer the already-down API
  • When we hit a network_error, retry --> good to cache
  • When we hit other exceptions from the transport, retry --> good to cache

So in this case here, I'd really only try-catch send, and only handle it when send throws (not for any other API error, rate limit, etc).

For reference, you can check this here: https://github.com/getsentry/sentry-javascript/blob/master/packages/replay/src/util/sendReplay.ts#L25

What we do in replay, is retry a single request up to three times (with increasing backoff), then discard it. The main difference in replay is that we depend on event order, and cannot afford to miss a single event, which is different here.
Still, I think it would be safer to limit the number of retries here as well: define an actual limit and discard the whole queue after e.g. 3 or 5 failed attempts to send it.

Also, we may want to debounce flushQueue here; what do you think? We have a debounce implementation in replay that we could move to utils, I think.
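
For reference, a debounce here would just collapse bursts of flush attempts into one; a minimal generic sketch (not the replay implementation) could look like:

function debounce(fn: () => void, wait: number): () => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return () => {
    // Restart the timer on every call so only the last call in a burst fires
    if (timer) {
      clearTimeout(timer);
    }
    timer = setTimeout(() => {
      timer = undefined;
      fn();
    }, wait);
  };
}

// e.g. const debouncedFlush = debounce(flushQueue, 100);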

@timfish (Collaborator, Author) commented on Jan 20, 2023

When we hit a rate_limit, do not retry it

only handle it when send throws

Yep, I tend to agree. I can't remember exactly why we settled on retrying rate-limited requests in Electron.

The main difference in replay is that we depend on event order, and cannot afford to miss a single event

It sounds like for Replay envelopes we need to pass through transparently, because you rely on the responses.

Still, I think it may be safer to also limit # of retries here as well. I think it's better to define an actual limit here, and just discard the whole queue after e.g. 3 or 5 (or whatever) failed tries of sending it.

One major issue is that navigator.onLine is next to useless, so there's no reliable way to detect that the device is online other than trying to send a request.

One of the things we need to cater for is "offline first" progressive web apps. How much customers care about events that occur while offline will vary wildly, so configuration is key. We have a maxAgeDays setting in the Electron transport which might be worth adding.
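
Purely as an illustration of what a maxAgeDays option could mean (not how either transport actually implements it; the QueuedEnvelope shape is hypothetical), the check would be something like:

import type { Envelope } from '@sentry/types';

// Hypothetical queued item: the store would need to record when an envelope was queued
interface QueuedEnvelope {
  envelope: Envelope;
  queuedAt: number; // epoch milliseconds at insert time
}

function isExpired(item: QueuedEnvelope, maxAgeDays: number): boolean {
  const maxAgeMs = maxAgeDays * 24 * 60 * 60 * 1000;
  return Date.now() - item.queuedAt > maxAgeMs;
}

// During a flush, expired items would be discarded instead of sent:
// if (isExpired(item, options.maxAgeDays)) { /* drop */ }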

Also, we may want to debounce flushQueue here, what do you think?

Ah yes, good point!

@mydea (Member) commented on Jan 20, 2023

Yeah, I see. Maybe we could add a maxRetries config option (defaulting to e.g. 5, where 0 means unlimited, i.e. the current behavior)?

Regarding the response: for the behavior I proposed (do not cache/retry when we hit either a rate limit or an API error) we don't actually need it; we only need to try/catch. In replay we do use the response (thanks to your great work updating the send API!), because there we handle rate limits specially: we pause the replay and restart it once the rate limit duration is over.

@AbhiPrasad (Member) commented:

We should never retry on rate limits - we could potentially DDoS Sentry if we do.

@AbhiPrasad (Member) commented:

Reminder that we also have getsentry/develop#476 and https://develop.sentry.dev/sdk/features/#buffer-to-disk - we should make sure we update the develop docs with our learnings here as well.

@Lms24 (Member) left a comment

+1 on not retrying for rate limits. Also, IMO we shouldn't retry forever. So if we provide a user-configurable maxRetries option, we should still set an upper boundary.

(A review thread on packages/core/src/transports/offline.ts was resolved.)
@timfish (Collaborator, Author) commented on Jan 20, 2023

provide a user-configurable maxRetries option,

My concern with maxRetries is that we should almost certainly be backing off non-linearly, which makes it difficult to reason about what maxRetries: 10 means in terms of time.
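
To put rough numbers on that (illustrative values only, not what this wrapper would necessarily use): with a doubling backoff starting at 5 seconds, the wall-clock time behind a retry count adds up quickly:

// Total wait for `retries` attempts with a doubling backoff starting at `baseMs`
function totalBackoffMs(retries: number, baseMs: number): number {
  let total = 0;
  for (let attempt = 0; attempt < retries; attempt++) {
    total += baseMs * 2 ** attempt;
  }
  return total;
}

// totalBackoffMs(10, 5_000) === 5_115_000 ms, i.e. roughly 85 minutes,
// which is why "maxRetries: 10" is hard to reason about as a duration.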

@Lms24 (Member) commented on Jan 20, 2023

My concern with maxRetries is that we should almost certainly be backing off non-linearly, which makes it difficult to reason about what maxRetries: 10 means in terms of time.

Good point, then let's not provide such an option. We can still add it later if requested.

timfish force-pushed the feat/offline branch 3 times, most recently from 7b26cea to 79b26f6, on January 20, 2023
@timfish (Collaborator, Author) commented on Jan 21, 2023

Reminder that we also have https://develop.sentry.dev/sdk/features/#buffer-to-disk

Thanks, I should have checked this first as a lot has been added there!

Now an event is only queued when:

  • The wrapped transport's send throws
  • If supplied, options?.shouldStore(envelope, error, retryDelay) returns true (see the sketch after these lists)

Replay envelopes are never queued; they are dropped and the error is returned.

When retrying, the wrapper:

  • Waits at least 100ms between envelopes
  • Respects the retry-after header
  • Is debounced
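
As a sketch of how the shouldStore hook could be used (the signature is as listed above; wiring it through transportOptions, the retryDelay units and the cut-off value are illustrative assumptions, and makeOfflineFetchTransport is the one from the earlier example):

import type { Envelope } from '@sentry/types';

// Illustrative shouldStore: queue failed envelopes, but give up once the
// computed retry delay (assumed to be in ms) has grown beyond an hour
const shouldStore = (envelope: Envelope, error: unknown, retryDelay: number): boolean => {
  return retryDelay < 60 * 60 * 1000;
};

Sentry.init({
  dsn: 'DSN',
  transport: makeOfflineFetchTransport,
  transportOptions: {
    shouldStore,
  },
});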

This PR does not include a means to disable sending while still queuing envelopes, as that seems too close to potential lifecycle hooks.

However, it's possible to create a simple transport wrapper that throws an error unless the user has granted permission, and then chain the transports offline > permission > fetch. This results in envelopes being queued until permission is granted:

function makePermissionTransport<T>(
  createTransport: (options: T) => Transport,
  hasPermission: () => Promise<boolean>,
): (options: T) => Transport {
  return options => {
    const transport = createTransport(options);
    return {
      send: async envelope => {
        // Only forward to the underlying transport once permission has been granted
        if (await hasPermission()) {
          return transport.send(envelope);
        } else {
          throw new Error('No permission');
        }
      },
      flush: timeout => transport.flush(timeout),
    };
  };
}

let has = false;

function grantPermission() {
  has = true;
}

function revokePermission() {
  has = false;
}

const makePermissionFetchTransport = makePermissionTransport(makeFetchTransport, async () => has);
const makeOfflinePermissionFetchTransport = makeOfflineTransport(makePermissionFetchTransport, indexedDbOfflineStore);

timfish self-assigned this on Jan 21, 2023
@AbhiPrasad (Member) left a comment

Such a good composition API!

(Several review threads on packages/core/src/transports/offline.ts were resolved.)
@timfish (Collaborator, Author) commented on Jan 26, 2023

One minor limitation with this API is that the store implementation cannot add its own properties to the transport options and have them passed in at initialisation.

An alternative would be to move the store into the options; the store could then be added by wrapping the transport creator. That way the store can be passed the full options object and can supplement the options type, much like makeOfflineTransport does.

Examples of options for the store:

  • Browser - IndexedDB database name
  • Node - path to the persistence location

@timfish (Collaborator, Author) commented on Jan 26, 2023

Right, changes complete.

So rather than being passed as a parameter of makeOfflineTransport, createStore is now a property of the transport options.

We can wrap it and supply createStore as shown below, which supplements the transport options types too:

// Options specific to the store
interface IndexedDbOptions {
  storeName?: string,
  maxQueueSize?: number,
}

// The store implementation 
function createIndexedDbStore(options: IndexedDbOptions) {
  return {
    insert: (env) => { /* cool implementation here */ },
    pop: () => { /* cool implementation here */ },
  }
}

// A wrapper that supplies the store and modifies the transport options so types/autocomplete work
function makeIndexedDbStoreTransport<T>(
  createTransport: (options: T) => Transport,
): (options: T & IndexedDbOptions) => Transport { 
  return options => createTransport({...options, createStore: createIndexedDbStore });
}

const makeOfflineFetchTransport = makeIndexedDbStoreTransport(makeOfflineTransport(makeFetchTransport));

Sentry.init({
  dsn: 'DSN',
  transport: makeOfflineFetchTransport,
  transportOptions: {
    storeName: 'something' // the types all work because of above...
  }
})

@AbhiPrasad (Member) left a comment

Nice, I like the refactor to move it into options.

const found = await store.pop();
if (found) {
  log('Attempting to send previously queued event');
  void send(found).catch(e => {
A reviewer (Member) asked:

Just to clarify: so if we try to re-send an envelope, we pop it from the store, but if it fails, we do not re-add it? So we would discard this envelope?

@timfish (Collaborator, Author) replied:

No, it will be re-added to the queue and we will continue to retry the envelope later. This means that while offline, an envelope will be popped and re-inserted multiple times. The initial retry delay is only 5 seconds, so if we didn't re-add it, this would be more of a "retry once" wrapper than an offline one.

I considered having the store peek and then delete after a successful send, but that increases the chance of envelopes being sent multiple times, for example if the delete fails or the page is closed between the peek and the delete.
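
In other words, the retry path is roughly this (a sketch of the behaviour described above, reusing the names from the code excerpts; not the actual implementation):

import type { Envelope } from '@sentry/types';

async function flushOnce(
  store: { pop(): Promise<Envelope | undefined>; insert(env: Envelope): Promise<void> },
  send: (env: Envelope) => Promise<unknown>,
  flushWithBackOff: () => void, // schedules the next attempt with a growing delay
): Promise<void> {
  const found = await store.pop();
  if (!found) {
    return; // nothing queued
  }
  try {
    await send(found); // success: pop() already removed it from the store
  } catch {
    await store.insert(found); // failure: put it back and retry later
    flushWithBackOff();
  }
}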

await store.insert(envelope);
flushWithBackOff();
log('Error sending. Event queued', e);
return {};
A reviewer (Member) commented:

l: Do we need to return {} here, or would void be fine?

@timfish (Collaborator, Author) replied:

The void return option will be removed in v8. It was only kept for now to avoid an API breaking change.

The reviewer (Member) replied:

Ah, right, makes sense! Just an idea: maybe we could return some content in that object that makes it clear this is a failed resend? Probably not a big deal, but it could help with debugging in the future ("Why am I getting an empty object back here???").

AbhiPrasad merged commit 56f6c7c into getsentry:master on Jan 31, 2023
timfish deleted the feat/offline branch on March 14, 2023