rumqttd: Changes remote start span from error to info. #782

nathansizemore · 2024-01-06T12:51:06Z

This fixes initial connection span being treated as error instead of info.

Type of change

Miscellaneous (related to maintenance)

Checklist:

Formatted with cargo fmt
Make an entry to CHANGELOG.md if it's relevant to the users of the library. If it's not relevant mention why.

swanandx · 2024-01-06T13:45:14Z

it was set to error_span! because otherwise tracing won't capture / show additional fields with error log normally ( unless we set verbosity to -vvv )

see #510

nathansizemore · 2024-01-06T14:04:11Z

This causes false error types to show up in logs, through the logging facade tracing is built upon. So in the instance of the remote link being established, an error log is produced:

This shows itself when logging blocks are built around the log facade directly (which we do for our network logger). This causes false alarming in our stuff, which we'd like to decrease.

nathansizemore · 2024-01-06T14:12:57Z

This also would bring in the same span type as websocket remote. It uses info, this uses error.

swanandx · 2024-01-06T14:27:34Z

This shows itself when logging blocks are built around the log facade directly

can you elaborate more? didn't get it 😅

nathansizemore · 2024-01-06T14:47:59Z

This shows itself when logging blocks are built around the log facade directly

can you elaborate more? didn't get it 😅

Sure. If you look at the Rust ecosystem logging, after the dependency chain is done, you'll usually end with log. It doesn't do any logging itself, but provides the schemes to do so with the levels and macros. A common pattern is good, everyone can use it, and almost all crates do, this does, we do with ours, etc... Because it isn't a logger directly, just an interface, anyone can build a logger up around it to their own needs. Using us as an example, we have 3 impls in our stuff:

One that intercepts logs and sends them to stdout
One that intercepts logs and sends them to a file
One that intercepts logs and sends them to network store, where metrics, dashboards, alarming, etc... are centrailized.

Log is nice because each of those can have their own impl and format as needed. Log is also nice because you get all the other logs from applications that use it. However, this can also be a downside when people use logging differently. The case of rumqttd and us, it is great that we don't have to fork the broker and put in specialty logs because rumqttd uses log, but because rumqttd uses that error span, that shows up as error type on our end, because we log error types as error types. This causes a problem in the network logger because alarms and dashboards are built around it.

If for example, 100 clients connect at the same time, this produces 100 "error" logs of them connecting because of that error span. In alarming, if we have an alarm that is triggered when x error types reach a certain trreshold, this triggers that alarm, someone has to investigate, etc...

swanandx · 2024-01-06T15:25:45Z

because rumqttd uses that error span, that shows up as error type on our end

wdym by error type ?

btw, error_span! is used in other places, like for all events:

rumqtt/rumqttd/src/router/routing.rs

Line 236 in 8c04339

let span = tracing::error_span!("[>] incoming", connection_id = id);

and other places in routing.rs so why only this error span trigger it?

maybe some issue with instrument()?

note: rumqttd uses tracing instead of log 😄

nathansizemore · 2024-01-06T15:28:38Z

because rumqttd uses that error span, that shows up as error type on our end

wdym by error type ?

The log type comes in as "error", not info, debug, trace, etc...

btw, error_span! is used in other places, like for all events:

rumqtt/rumqttd/src/router/routing.rs

Line 236 in 8c04339

let span = tracing::error_span!("[>] incoming", connection_id = id);

and other places in routing.rs so why only this error span trigger it?

maybe some issue with instrument()?

Maybe it is just when instrument is called with it, this is the only spot it happens.

note: rumqttd uses tracing instead of log 😄

Yes, but if you run the dependency chain all the way down, you are using log :): https://github.com/tokio-rs/tracing/blob/master/tracing/Cargo.toml#L35

swanandx · 2024-01-06T16:04:26Z

Maybe it is just when instrument is called with it, this is the only spot it happens.

Might be issue with instrument then? 👀 We should investigate it further to figure out actual cause!

nathansizemore · 2024-01-06T17:24:37Z

Maybe it is just when instrument is called with it, this is the only spot it happens.

Might be issue with instrument then? 👀 We should investigate it further to figure out actual cause!

Not particularly a problem with instrumented. The problem is saying that events emitted inside this span are to be treated as errors, which I do not think is ultimately what is wanted.

When a thing is instrumented, anytime a poll or drop occurs, the span is entered:

impl<T: Future> Future for Instrumented<T> {
    type Output = T::Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let (span, inner) = self.project().span_and_inner_pin_mut();
        let _enter = span.enter();
        inner.poll(cx)
    }
}

In our case, instrumenting begins, new connection happens, thing is polled, event is emitted, that event is an error event because the span it is in is an error span.

In regards to #510, it was changed from info to error not because the level needed changed for events that occurred, but because changing it to error was a way to get auto information logged. Keeping the span as info, and adding in the pieces of information to the log where needed is the way to go imo.

After checking, this does effect all other logs. I just didn't notice immediately because I forgot I muted all of routing because of this reason as weill:

    journal::init(
        env!("CARGO_BIN_NAME"),
        env!("CARGO_PKG_VERSION"),
        Some(vec!["rumqttd::router::routing".to_owned()]),
    );

(The last param is a vec of modules to blacklist from our logging). Currently it is blacklisted because everything reports as error type.

swanandx · 2024-01-08T09:55:45Z

After checking, this does effect all other logs. I just didn't notice immediately because I forgot I muted all of routing because of this reason as weill

oh, then it makes sense. ThenTtis means we should also change error_span! to appropriate span level in routing.rs as well right?

Keeping the span as info, and adding in the pieces of information to the log where needed is the way to go imo.

adding those pieces, like client_id/connection_id in every log defeats the purpose of span, we should think of some other way to tackle it.

I will see if there is something we can do at tracing_subscriber end 👀

nathansizemore added 2 commits January 6, 2024 09:38

asdf

f0e642d

asdf

9666907

nathansizemore force-pushed the connection-start-span branch from 19a84e3 to 9666907 Compare January 6, 2024 14:38

de-sh force-pushed the main branch from 1fd3c83 to e63bcab Compare May 29, 2024 17:15

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rumqttd: Changes remote start span from error to info. #782

rumqttd: Changes remote start span from error to info. #782

nathansizemore commented Jan 6, 2024

swanandx commented Jan 6, 2024

nathansizemore commented Jan 6, 2024

nathansizemore commented Jan 6, 2024

swanandx commented Jan 6, 2024

nathansizemore commented Jan 6, 2024 •

edited

swanandx commented Jan 6, 2024

nathansizemore commented Jan 6, 2024

swanandx commented Jan 6, 2024

nathansizemore commented Jan 6, 2024 •

edited

swanandx commented Jan 8, 2024

rumqttd: Changes remote start span from error to info. #782

Are you sure you want to change the base?

rumqttd: Changes remote start span from error to info. #782

Conversation

nathansizemore commented Jan 6, 2024

Type of change

Checklist:

swanandx commented Jan 6, 2024

nathansizemore commented Jan 6, 2024

nathansizemore commented Jan 6, 2024

swanandx commented Jan 6, 2024

nathansizemore commented Jan 6, 2024 • edited

swanandx commented Jan 6, 2024

nathansizemore commented Jan 6, 2024

swanandx commented Jan 6, 2024

nathansizemore commented Jan 6, 2024 • edited

swanandx commented Jan 8, 2024

nathansizemore commented Jan 6, 2024 •

edited

nathansizemore commented Jan 6, 2024 •

edited