Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert span to ReadableSpan before sending to span processor #1902

Open
obecny opened this issue Feb 4, 2021 · 15 comments
Open

Convert span to ReadableSpan before sending to span processor #1902

obecny opened this issue Feb 4, 2021 · 15 comments
Assignees
Labels
enhancement New feature or request

Comments

@obecny
Copy link
Member

obecny commented Feb 4, 2021

Currently when Span ends it sends the whole self to the span processor. Span Processor expects the span to be ReadableSpan but in fact the span that is being sent is a whole object with methods etc. I think we should send only an object in correct format. We could have for example a private function "_serialize" or "_toReadableSpan" which will send only required data and nothing more. We should also think about sealing the data so it cannot be modified.
https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#interface-definition

@obecny obecny added the enhancement New feature or request label Feb 4, 2021
@obecny obecny self-assigned this Feb 4, 2021
@Flarna
Copy link
Member

Flarna commented Feb 4, 2021

As a SpanProcessor gets the real span in onStart it's not really a protection.

I could imagine that there are SpanProcessor implementations which add some property to the span received in onStart for pure internal use. Or if modification of the span should be avoided they could use it as key in a Map/WeakMap which is effective the same problematic scenario as they get something different in onEnd and want find their data.
Most likely they could use the SpanContext instead the span but if we intend to protect our spans it would be also needed to create a read only copy of them.

I assume a deep copy + sealing of all spans is quite costly.

Passing sealed/frozen objects to users is in my opinion not what one expects. Usually I assume that I can always add a private symbol to an object.

And still this doesn't protect 100% for missues as a SpanProcessor could simply monkey patch this part of the SDK to skip this sealing/deep copy.

We are in JavaScript world and if people want to shoot themself in their foot they will always find a way to do this.

How is this done in other languages?

@obecny
Copy link
Member Author

obecny commented Feb 4, 2021

@Flarna you weren't at last sig meeting. We won't be sealing the object, but we will map the values so only this what is required will be sent and nothing more. currently we are sending the whole span which is more then only a readable span.
The sealing / freeze is something we can discuss but we agreed that this might be too costly to do it.

@Flarna
Copy link
Member

Flarna commented Feb 4, 2021

So this means the object passed to onStart is never passed to onEnd?

@obecny
Copy link
Member Author

obecny commented Feb 4, 2021

So this means the object passed to onStart is never passed to onEnd?

if you look closely you will see that even the type of those 2 functions is different.

@dyladan
Copy link
Member

dyladan commented Feb 4, 2021

So this means the object passed to onStart is never passed to onEnd?

Hmm I didn't think about this before. IMO that might be a dealbreaker. A span processor needs to be able to use onEnd to clean up things done in onStart. If a different object is sent a WeakMap can't be used.

if you look closely you will see that even the type of those 2 functions is different.

The type can be different to encourage users not to modify the span or call functions in onEnd, but IMO it's not a problem to not match exactly as long as it matches the type at least.

@obecny
Copy link
Member Author

obecny commented Feb 4, 2021

Well the spec says that onEnd it should readable object, modification should not be allowed. I can make a deep freeze function then which will work fine across all different browsers too ass it is all well supported.

@dyladan
Copy link
Member

dyladan commented Feb 4, 2021

The spec also says exceptions are allowed in cases where it isn't practical. deep freeze may be quite costly.

@Flarna
Copy link
Member

Flarna commented Feb 4, 2021

Well, a SpanProcessor could use a Map<TraceId, Map<SpanId, Span>> or simialar to find back to the span seen in onStart. It's quite some overhead but it would effect only SpanProcessors which have the need for this.

@obecny
Copy link
Member Author

obecny commented Feb 4, 2021

deep freeze on my machine for 1k spans with 1 attribute takes around 5ms-6ms

@obecny
Copy link
Member Author

obecny commented Feb 4, 2021

const { BasicTracerProvider } = require('./build/src');
const provider = new BasicTracerProvider();

const tracer = provider.getTracer('test');

const span = tracer.startSpan('span');
span.setAttribute('test', 'value1');

const start = new Date().getTime();
function deepFreeze(obj) {

  // Retrieve the property names defined on obj
  var propNames = Object.getOwnPropertyNames(obj);

  // Freeze properties before freezing self
  propNames.forEach(function(name) {
    var prop = obj[name];

    // Freeze prop if it is an object
    if (typeof prop == 'object' && prop !== null)
      deepFreeze(prop);
  });

  // Freeze self (no-op if already frozen)
  return Object.freeze(obj);
}
for (let i = 0, j = 1000; i < j; i++) {

  const readableSpan = deepFreeze(span);
}
const end = new Date().getTime();

console.log(end - start, 'ms');

source for deep freeze
https://developer.mozilla.org/pl/docs/Web/JavaScript/Referencje/Obiekty/Object/freeze

@dyladan
Copy link
Member

dyladan commented Feb 4, 2021

Well, a SpanProcessor could use a Map<TraceId, Map<SpanId, Span>> or simialar to find back to the span seen in onStart. It's quite some overhead but it would effect only SpanProcessors which have the need for this.

This could be a memory leak in the case span is not ended

@Flarna
Copy link
Member

Flarna commented Feb 4, 2021

Well, a SpanProcessor could use a Map<TraceId, Map<SpanId, Span>> or simialar to find back to the span seen in onStart. It's quite some overhead but it would effect only SpanProcessors which have the need for this.

This could be a memory leak in the case span is not ended

That's the reason why I prefer to set a private symbol in such cases.

I still don't understand what the benefit of freezing/creation of a fresh readable span is. Which problem does/should this solve?

The SpanProcessor is the thing between SDK and Exporter. Even if it gets only deep frozen spans it can deep clone, modify and pass the modified span to exporter. An evil SpanProcessor can even discard/invent spans as needed.

IMHO doing this doesn't protect from misuse but adds overhead and risk of surprises for users.

But if there is consensus to go this path I will not block it.

There is a reason why I'm not a big fan of read-only/freezing: A while ago I added a read-only property to EventEmitter in node. It seemed to be the right to set it read-only as modification of this would be always wrong. It turned out that this broke quite something (see nodejs/node#30932 (comment)), was reverted and added again as "normal", writable property.

@obecny
Copy link
Member Author

obecny commented Feb 4, 2021

Well, a SpanProcessor could use a Map<TraceId, Map<SpanId, Span>> or simialar to find back to the span seen in onStart. It's quite some overhead but it would effect only SpanProcessors which have the need for this.

This could be a memory leak in the case span is not ended

According to spec the main purpose of onStart is to keep reference to the span. This is because you can then do some periodic export. But you can't use a WeakMap for that because WeakMap is not iterable object and thus is useless for such purpose. So you have to save reference in some array / map object anyway to be able to iterate over it periodically. Based on that having scenario that you are keeping a reference just to check for something when spans ends sounds unrealistic for me and it would not make any sense. This way if anyone wants to implement any extra logic for periodic export of unfinished spans will have to deal with such problem anyway and exporting the exactly the same object as in onStart doesn't sound for me as a realistic scenario.

@obecny
Copy link
Member Author

obecny commented Feb 4, 2021

I still don't understand what the benefit of freezing/creation of a fresh readable span is. Which problem does/should this solve?

The SpanProcessor is the thing between SDK and Exporter. Even if it gets only deep frozen spans it can deep clone, modify and pass the modified span to exporter. An evil SpanProcessor can even discard/invent spans as needed.

IMHO doing this doesn't protect from misuse but adds overhead and risk of surprises for users.

As a user I would read spec which says that when span is ended it is only readable and cannot be changed. Imagine now a scenario where you have multiple span processors or/and exporters etc. and one of such will modify the span. It will suddenly affect all processors and spans. If you freeze span it cannot be modified and thus nothing strange can happen. You have the same with badge I can guess that it was similar approach with that and similar reasons.

@Flarna
Copy link
Member

Flarna commented Feb 5, 2021

FWIW I implemented already a SpanProcessor sending non finished spans to our backend. It uses a Set to hold the spans to send regularly. To avoid leaks for never ended spans I wipe out all spans older then ~2h.

Currently I rely on onStart() and onEnd() providing the same span, different objects would result in two entries in my Set.

As mentioned above this can be fixed by replacing the single Set by a more complex, significant slower construct.

Object.freeze() doesn't break the identity so this would work without changes.

But considering the overhead I would vote for an opt-in.

if someone wonders why we send in progress spans to our backend: it allows to correlate the whole span tree early even if there is a long running span somewhere in the middle. e.g. ShortSpan1 => LongSpan2 => ShortSpan3
In this case our backend knows about LongSpan2 early and can show the whole tree and not after e.g. 1.5h when it's ended.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants