Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rumqttd: Changes remote start span from error to info. #782

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

nathansizemore
Copy link
Contributor

This fixes initial connection span being treated as error instead of info.

Type of change

Miscellaneous (related to maintenance)

Checklist:

  • Formatted with cargo fmt
  • Make an entry to CHANGELOG.md if it's relevant to the users of the library. If it's not relevant mention why.

@swanandx
Copy link
Member

swanandx commented Jan 6, 2024

it was set to error_span! because otherwise tracing won't capture / show additional fields with error log normally ( unless we set verbosity to -vvv )

see #510

@nathansizemore
Copy link
Contributor Author

This causes false error types to show up in logs, through the logging facade tracing is built upon. So in the instance of the remote link being established, an error log is produced:
image

This shows itself when logging blocks are built around the log facade directly (which we do for our network logger). This causes false alarming in our stuff, which we'd like to decrease.

@nathansizemore
Copy link
Contributor Author

This also would bring in the same span type as websocket remote. It uses info, this uses error.

@swanandx
Copy link
Member

swanandx commented Jan 6, 2024

This shows itself when logging blocks are built around the log facade directly

can you elaborate more? didn't get it 😅

@nathansizemore
Copy link
Contributor Author

nathansizemore commented Jan 6, 2024

This shows itself when logging blocks are built around the log facade directly

can you elaborate more? didn't get it 😅

Sure. If you look at the Rust ecosystem logging, after the dependency chain is done, you'll usually end with log. It doesn't do any logging itself, but provides the schemes to do so with the levels and macros. A common pattern is good, everyone can use it, and almost all crates do, this does, we do with ours, etc... Because it isn't a logger directly, just an interface, anyone can build a logger up around it to their own needs. Using us as an example, we have 3 impls in our stuff:

  • One that intercepts logs and sends them to stdout
  • One that intercepts logs and sends them to a file
  • One that intercepts logs and sends them to network store, where metrics, dashboards, alarming, etc... are centrailized.

Log is nice because each of those can have their own impl and format as needed. Log is also nice because you get all the other logs from applications that use it. However, this can also be a downside when people use logging differently. The case of rumqttd and us, it is great that we don't have to fork the broker and put in specialty logs because rumqttd uses log, but because rumqttd uses that error span, that shows up as error type on our end, because we log error types as error types. This causes a problem in the network logger because alarms and dashboards are built around it.

If for example, 100 clients connect at the same time, this produces 100 "error" logs of them connecting because of that error span. In alarming, if we have an alarm that is triggered when x error types reach a certain trreshold, this triggers that alarm, someone has to investigate, etc...

@swanandx
Copy link
Member

swanandx commented Jan 6, 2024

because rumqttd uses that error span, that shows up as error type on our end

wdym by error type ?

btw, error_span! is used in other places, like for all events:

let span = tracing::error_span!("[>] incoming", connection_id = id);
and other places in routing.rs so why only this error span trigger it?

maybe some issue with instrument()?

note: rumqttd uses tracing instead of log 😄

@nathansizemore
Copy link
Contributor Author

because rumqttd uses that error span, that shows up as error type on our end

wdym by error type ?

The log type comes in as "error", not info, debug, trace, etc...

btw, error_span! is used in other places, like for all events:

let span = tracing::error_span!("[>] incoming", connection_id = id);

and other places in routing.rs so why only this error span trigger it?

maybe some issue with instrument()?

Maybe it is just when instrument is called with it, this is the only spot it happens.

note: rumqttd uses tracing instead of log 😄

Yes, but if you run the dependency chain all the way down, you are using log :): https://github.com/tokio-rs/tracing/blob/master/tracing/Cargo.toml#L35

@swanandx
Copy link
Member

swanandx commented Jan 6, 2024

Maybe it is just when instrument is called with it, this is the only spot it happens.

Might be issue with instrument then? 👀 We should investigate it further to figure out actual cause!

@nathansizemore
Copy link
Contributor Author

nathansizemore commented Jan 6, 2024

Maybe it is just when instrument is called with it, this is the only spot it happens.

Might be issue with instrument then? 👀 We should investigate it further to figure out actual cause!

Not particularly a problem with instrumented. The problem is saying that events emitted inside this span are to be treated as errors, which I do not think is ultimately what is wanted.

When a thing is instrumented, anytime a poll or drop occurs, the span is entered:

impl<T: Future> Future for Instrumented<T> {
    type Output = T::Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let (span, inner) = self.project().span_and_inner_pin_mut();
        let _enter = span.enter();
        inner.poll(cx)
    }
}

In our case, instrumenting begins, new connection happens, thing is polled, event is emitted, that event is an error event because the span it is in is an error span.

In regards to #510, it was changed from info to error not because the level needed changed for events that occurred, but because changing it to error was a way to get auto information logged. Keeping the span as info, and adding in the pieces of information to the log where needed is the way to go imo.

After checking, this does effect all other logs. I just didn't notice immediately because I forgot I muted all of routing because of this reason as weill:

    journal::init(
        env!("CARGO_BIN_NAME"),
        env!("CARGO_PKG_VERSION"),
        Some(vec!["rumqttd::router::routing".to_owned()]),
    );

(The last param is a vec of modules to blacklist from our logging). Currently it is blacklisted because everything reports as error type.

@swanandx
Copy link
Member

swanandx commented Jan 8, 2024

After checking, this does effect all other logs. I just didn't notice immediately because I forgot I muted all of routing because of this reason as weill

oh, then it makes sense. ThenTtis means we should also change error_span! to appropriate span level in routing.rs as well right?

Keeping the span as info, and adding in the pieces of information to the log where needed is the way to go imo.

adding those pieces, like client_id/connection_id in every log defeats the purpose of span, we should think of some other way to tackle it.

I will see if there is something we can do at tracing_subscriber end 👀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants