A80: gRPC Metrics for TCP connection #428

Open
wants to merge 21 commits into
base: master
dedfb16
Create A80-grpc-metrics-for-tcp-connection
nanahpang Apr 22, 2024
ffaeb22
Update A80-grpc-metrics-for-tcp-connection
nanahpang Apr 23, 2024
d413291
Update A80-grpc-metrics-for-tcp-connection
nanahpang Apr 24, 2024
5b5ba3f
Update A80-grpc-metrics-for-tcp-connection
nanahpang Apr 25, 2024
583e6b3
Update and rename A80-grpc-metrics-for-tcp-connection to A80-grpc-met…
nanahpang Apr 29, 2024
8aa21c1
Update A80-grpc-metrics-for-tcp-connection.md
nanahpang Apr 29, 2024
9f8038c
Update A80-grpc-metrics-for-tcp-connection.md
nanahpang May 1, 2024
59ab138
Update A80-grpc-metrics-for-tcp-connection.md
nanahpang May 2, 2024
ce27a69
Update A80-grpc-metrics-for-tcp-connection.md
nanahpang May 2, 2024
d239c39
Update A80-grpc-metrics-for-tcp-connection.md
nanahpang May 10, 2024
0726f6e
Update A80-grpc-metrics-for-tcp-connection.md
nanahpang May 15, 2024
3bfe76b
Update A80-grpc-metrics-for-tcp-connection.md
nanahpang May 15, 2024
2ccf768
Update A80-grpc-metrics-for-tcp-connection.md
nanahpang May 21, 2024
83ac908
Update A80-grpc-metrics-for-tcp-connection.md
nanahpang May 22, 2024
2a11aea
Update A80-grpc-metrics-for-tcp-connection.md
nanahpang May 22, 2024
b6dc6d9
Update A80-grpc-metrics-for-tcp-connection.md
nanahpang May 22, 2024
0aceebe
Update A80-grpc-metrics-for-tcp-connection.md
nanahpang May 22, 2024
052d5cf
Update A80-grpc-metrics-for-tcp-connection.md
nanahpang May 22, 2024
7e5bc86
Update A80-grpc-metrics-for-tcp-connection.md
nanahpang May 24, 2024
092fbc1
Update A80-grpc-metrics-for-tcp-connection.md
nanahpang May 24, 2024
bd18940
Update A80-grpc-metrics-for-tcp-connection.md
nanahpang May 24, 2024
62 changes: 0 additions & 62 deletions A80-grpc-metrics-for-tcp-connection

This file was deleted.

74 changes: 74 additions & 0 deletions A80-grpc-metrics-for-tcp-connection.md
@@ -0,0 +1,74 @@
A80: gRPC Metrics for TCP connection
----
* Author(s): Yash Tibrewal (@yashykt), Nana Pang (@nanahpang), Yousuk Seung (@yousukseung)
* Approver: Craig Tiller (@ctiller), Mark Roth (@markdroth)
* Status: {Draft, In Review, Ready for Implementation, Implemented}
* language: {...}
* Last updated: 2024-04-18
* Discussion at: https://groups.google.com/g/grpc-io/c/AyT0LVgoqFs

## Abstract

This document proposes adding new TCP connection metrics to gRPC for improved network analysis and debugging.

## Background

To improve the network debugging capabilities for gRPC users, we propose adding per-connection TCP metrics in gRPC. The metrics will utilize the metrics framework outlined in [A79].

### Related Proposals:
* [A79]: gRPC Non-Per-Call Metrics Framework (pending)
markdroth marked this conversation as resolved.

[A79]: https://github.com/grpc/proposal/pull/421

## Proposal

This document proposes changes to the following gRPC components.

#### Per-Connection TCP Metrics


nit: this can be ### instead of ####.

Author

Done, thanks for the suggestion.


We will provide the following metrics:
- `grpc.tcp.min_rtt`
- `grpc.tcp.delivery_rate`
- `grpc.tcp.packets_sent`
- `grpc.tcp.packets_retransmitted`
- `grpc.tcp.packets_spurious_retransmitted`

The metrics will have the following labels:

| Name | Disposition | Description |
| ----------- | ----------- | ----------- |
| grpc.tcp.peer_address | optional | The peer address in URI format, such as `ipv4:1.2.3.4:567`. |
| grpc.tcp.local_address | optional | The local address in URI format, such as `ipv4:1.2.3.4:567`. |
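
As an illustration of the label format above, the following is a minimal sketch of rendering an IPv4 socket address as `ipv4:<ip>:<port>`. The helper names `FormatIpv4Uri` and `MakeIpv4Address` are hypothetical and are not part of gRPC; the proposal does not specify how C-core builds these strings.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>

#include <cstdint>
#include <string>

// Hypothetical helper (not a gRPC API): renders an IPv4 socket address in
// the URI format used by the labels, e.g. "ipv4:1.2.3.4:567".
// Returns an empty string if the address cannot be converted.
std::string FormatIpv4Uri(const sockaddr_in& addr) {
  char ip[INET_ADDRSTRLEN] = {0};
  if (inet_ntop(AF_INET, &addr.sin_addr, ip, sizeof(ip)) == nullptr) {
    return "";
  }
  return "ipv4:" + std::string(ip) + ":" +
         std::to_string(ntohs(addr.sin_port));
}

// Hypothetical helper for the example: builds a sockaddr_in from a dotted
// quad string and a host-order port.
sockaddr_in MakeIpv4Address(const char* ip, uint16_t port) {
  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_port = htons(port);
  inet_pton(AF_INET, ip, &addr.sin_addr);
  return addr;
}
```

For example, `FormatIpv4Uri(MakeIpv4Address("1.2.3.4", 567))` yields the string shown in the table. IPv6 addresses would need a separate branch, which this sketch omits.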

The metrics will be exported as:

| Name | Type | Unit | Labels | Description |
| ------------- | ----- | ----- | ------- | ----------- |
| grpc.tcp.min_rtt | Histogram (double) | s | grpc.tcp.peer_address, grpc.tcp.local_address | Records TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints. |
| grpc.tcp.delivery_rate | Histogram (double) | bit/s | grpc.tcp.peer_address, grpc.tcp.local_address | Records the latest measured throughput of the TCP connection. |
| grpc.tcp.packets_sent | Counter (int64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets TCP sends in the calculation period. |
Member

@yashykt What types should we be defining here?

Author

I updated the Counter type to uint64 based on the register methods in metrics.h.
@yashykt Feel free to modify if it will use other types.

Member

int64 and double are very C/C++ specific. I prefer integer and floating-point to keep it generic.

Author

Done, thanks for the suggestion.

| grpc.tcp.packets_retransmitted | Counter (int64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets retransmitted in the calculation period, covering both packets retransmitted due to loss and spurious retransmissions. |
| grpc.tcp.packets_spurious_retransmitted | Counter (int64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets spuriously retransmitted in the calculation period. These are retransmissions that TCP later discovered to be unnecessary. |
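
The table's units and "calculation period" wording can be made concrete with a small sketch. It assumes the raw values arrive the way Linux `struct tcp_info` reports them: minimum RTT in microseconds, delivery rate in bytes per second, and packet counts as cumulative totals. The types `TcpSnapshot` and `TcpMetricSample` and the function `ComputeSample` are hypothetical names for illustration, not part of the proposal or of gRPC.

```cpp
#include <cstdint>

// Hypothetical snapshot of raw, cumulative per-connection statistics.
// Units are assumptions modeled on Linux struct tcp_info.
struct TcpSnapshot {
  uint32_t min_rtt_us;                    // microseconds
  uint64_t delivery_rate_bytes_per_sec;   // bytes per second
  uint64_t packets_sent;                  // cumulative
  uint64_t packets_retransmitted;         // cumulative
  uint64_t packets_spurious_retransmitted;  // cumulative
};

// Values in the units listed in the metrics table.
struct TcpMetricSample {
  double min_rtt_s;                // seconds
  double delivery_rate_bit_per_s;  // bit/s
  uint64_t packets_sent;           // delta over the period
  uint64_t packets_retransmitted;  // delta over the period
  uint64_t packets_spurious_retransmitted;  // delta over the period
};

// Converts gauges to the exported units and turns cumulative counters into
// per-period increments by subtracting the previous snapshot.
TcpMetricSample ComputeSample(const TcpSnapshot& prev, const TcpSnapshot& cur) {
  TcpMetricSample s;
  s.min_rtt_s = cur.min_rtt_us / 1e6;
  s.delivery_rate_bit_per_s = cur.delivery_rate_bytes_per_sec * 8.0;
  s.packets_sent = cur.packets_sent - prev.packets_sent;
  s.packets_retransmitted =
      cur.packets_retransmitted - prev.packets_retransmitted;
  s.packets_spurious_retransmitted =
      cur.packets_spurious_retransmitted - prev.packets_spurious_retransmitted;
  return s;
}
```

In this sketch the histogram metrics take the latest value from the current snapshot, while the counter metrics are incremented by the difference between two consecutive snapshots.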

The metrics are acquired by enabling the `SO_TIMESTAMPING` option in the kernel's TCP stack via the `setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val))` system call. This configuration allows the kernel to capture packet timestamps during transmission and subsequently provide relevant socket information when `getsockopt(TCP_INFO)` is invoked.
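
The two system calls described above can be sketched as follows on Linux. This is an illustration only, not the C-core implementation. The helper names `EnableTxTimestamping` and `ReadTcpInfo` are hypothetical, the choice of `SOF_TIMESTAMPING_*` flags is an assumption, and the suggested mapping from `struct tcp_info` fields to the proposed metrics (in the comments) is a guess that the proposal does not confirm.

```cpp
#include <linux/net_tstamp.h>  // SOF_TIMESTAMPING_* flags
#include <linux/tcp.h>         // struct tcp_info, TCP_INFO
#include <netinet/in.h>        // IPPROTO_TCP
#include <sys/socket.h>        // setsockopt, getsockopt, SO_TIMESTAMPING
#include <unistd.h>            // close

#include <cstring>

// Hypothetical helper: asks the kernel to timestamp transmitted packets.
// The flag set here is an assumption chosen for illustration.
bool EnableTxTimestamping(int fd) {
  unsigned int val = SOF_TIMESTAMPING_TX_SOFTWARE | SOF_TIMESTAMPING_TX_ACK |
                     SOF_TIMESTAMPING_SOFTWARE | SOF_TIMESTAMPING_OPT_TSONLY;
  return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val)) == 0;
}

// Hypothetical helper: reads the kernel's per-connection TCP statistics.
// Fields that plausibly relate to the proposed metrics (assumed mapping):
//   tcpi_min_rtt        (microseconds)      -> grpc.tcp.min_rtt
//   tcpi_delivery_rate  (bytes per second)  -> grpc.tcp.delivery_rate
//   tcpi_data_segs_out                      -> grpc.tcp.packets_sent
//   tcpi_total_retrans                      -> grpc.tcp.packets_retransmitted
//   tcpi_dsack_dups                         -> grpc.tcp.packets_spurious_retransmitted
bool ReadTcpInfo(int fd, struct tcp_info* info) {
  std::memset(info, 0, sizeof(*info));
  socklen_t len = sizeof(*info);
  return getsockopt(fd, IPPROTO_TCP, TCP_INFO, info, &len) == 0;
}
```

A caller would typically invoke `EnableTxTimestamping` once after the connection is established and `ReadTcpInfo` whenever a sample is needed. Both helpers return `false` on failure so that metrics collection can be skipped on kernels or sandboxes that do not support these options.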

#### References:
* Fathom: https://dl.acm.org/doi/pdf/10.1145/3603269.3604815
* Kernel TCP Timestamping: https://www.kernel.org/doc/Documentation/networking/timestamping.rst

### Metric Stability

All metrics added in this proposal will start as experimental. The long-term goal is to
de-experimentalize them and have them be on by default, but the exact
criteria for that change are TBD.

### Temporary environment variable protection

This proposal does not include any features enabled via external I/O, so
it does not need environment variable protection.

## Implementation

This will be implemented in C-core. There are currently no plans to implement it in other languages.