Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Path for Rocket Analytics should be the route name. #20

Open
twitchax opened this issue Aug 15, 2023 · 14 comments
Open

Path for Rocket Analytics should be the route name. #20

twitchax opened this issue Aug 15, 2023 · 14 comments

Comments

@twitchax
Copy link
Contributor

Hello!

Awesome project. For the rocket analytics, the "true path" is recorded. This ends up yielding endpoints in the UI like /api/things/some_unique_id, rather than likely the grouping that most people would want, like /api/things/<id>, which is the route name.

This would make all calls to /api/things/<id> grouped in the UI, which, I believe, is how most people would like to see the data grouped.

This can be fixed for rocket by encoding the route rather than the path (e.g., req.route().as_ref().uri.unwrap_or_else("UNKNOWN")).

I'm happy to contribute, but I wanted to verify that you are OK with this.

@twitchax
Copy link
Contributor Author

Or, maybe you could allow the user to pass in a closure that computes some of the fields. I run my services in fly.io, and the IP is from a gateway device; however, the real IP can be ascertained from the headers. Allowing me to custom override how IP is computed might be cool.

Again, I am willing to contribute on that, as well, but want to get your thoughts first.

@tom-draper
Copy link
Owner

Hi! Thanks a lot, great suggestions!

I could think of some situations in which capturing the true path would be beneficial, e.g. identifying the most active user, so maybe this could be an optional flag that you pass into the middleware at the same time you provide the API key. Maybe a true_path bool that defaults to true?

The closures would be a very cool upgrade but now you've mentioned it I think in general the IP address should be changed so it attempts to take from headers first and if not resorts to the IP from the gateway device.

I think we're going to need an optional parameters struct for these settings that we pass in along with the API key .attach(Analytics::new(<API-KEY>, parameters)).

I'm very happy for you to make these upgrades and I'll check them out and merge. If they work well I'll try and mirror them on actix and axum
Thanks :)

@twitchax
Copy link
Contributor Author

Cool, sounds good.

One issue with the IP is there are some serverless platforms like fly.io that uses custom headers. That's why it might be valuable to allow the application to specify a "mapper" function.

@tom-draper
Copy link
Owner

Ah, I didn't realise that! Passing in a closure is definitely for the todo list then. Is it only the IP address that would be affected by this?

@twitchax
Copy link
Contributor Author

twitchax commented Aug 17, 2023 via email

@twitchax
Copy link
Contributor Author

Ok, here is a pretty good way to implement what I was thinking. I am deploying using my client on a running service. Feel free to hit me up on Discord ('twitchax`) if you want to discuss.

#21

@twitchax
Copy link
Contributor Author

The API usage looks like this.

image

And, after deploying, location and request routes are working in the way the suits my needs.

image

Still some issues with the response times chart, and I just did a 100,000 request test, and only 4,000 got captured.

@twitchax
Copy link
Contributor Author

Ohhh, I just realized having a filter would be good, as well.

I want to filter away my health checks, so we could add a with_filter that takes a predicate, and just ignores some requests specified by the user.

@twitchax
Copy link
Contributor Author

Ok, I see why the response times are so low. Most of my handlers return within the low microsecond, sometimes nanosecond, timeframe, so the as_millis is just 0.

@twitchax
Copy link
Contributor Author

Also, some gzip or brotli from the server might be good.

I'd be curious to make this play-aroundable. Like, what if the frontend could also take an "endpoint", and we could try some other server technologies? I love what you have done here, and I'd love to help you offer it for free, but I think that moving to something like SurrealDB might make your response times cut by 10x - 100x. Would be cool to go off an experiment with a reimplementation, and just point the frontend at it.

@tom-draper
Copy link
Owner

tom-draper commented Aug 17, 2023

The API usage looks like this.

image And, after deploying, location and request routes are working in the way the suits my needs. image Still some issues with the response times chart, and I just did a 100,000 request test, and only 4,000 got captured.

That seems like a great way to do it! Thanks
Yeah response times don't capture any network latency so a 0ms is fairly common if you're not doing much computation.
4000/100,000 does not sound good. There is a I knew about an issue with inconsistent request loggings for FastAPI but this narrows it down to the server. I'll investigate that.
Edit: There's a temporary rate limiter set for 1000 requests logged per minute that has been set until I come up with a more scalable database solution that I forgot about.

@tom-draper
Copy link
Owner

tom-draper commented Aug 17, 2023

Also, some gzip or brotli from the server might be good.

I'd be curious to make this play-aroundable. Like, what if the frontend could also take an "endpoint", and we could try some other server technologies? I love what you have done here, and I'd love to help you offer it for free, but I think that moving to something like SurrealDB might make your response times cut by 10x - 100x. Would be cool to go off an experiment with a reimplementation, and just point the frontend at it.

Compression is definitely a great upgrade but I haven't had time to get around to it yet. At the moment I'm using a basic postgresql database but I've always thought migration to a time series database like timescaledb would be a big improvement. I've never used SurrealDB, but it looks like it could be cool.

Do you mean response times when loading the dashboard? I'm fairly ~90% is due to rendering the graphs, especially with a large amount of requests logged.

@twitchax
Copy link
Contributor Author

Yeah, the network call to the backend takes about 4 seconds. The frontend render is nearly instantaneous.

@tom-draper
Copy link
Owner

Hey, thanks a lot for your help and sorry about the delay I've had a lot going on. I've merged your pull request, but I've removed the user configurable batch_length_seconds for the time being to avoid the potential for abuse/overloading the server or causing API performance issues due to frequent thread creation. The only downside to this is that the dashboard data can be up to 60 seconds delayed which I think is still acceptable. This would be a good one to re-include in the future when the server is more stable and I have a more scalable storage solution.
I'll add those features to the analytics packages for the other frameworks and release them as new versions soon. I'll also try and improve that response time when I get a chance.
Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants