New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce Valkey Over RDMA transport #477
base: unstable
Are you sure you want to change the base?
Conversation
RDMA is the abbreviation of remote direct memory access. It is a technology that enables computers in a network to exchange data in the main memory without involving the processor, cache, or operating system of either computer. This means RDMA has a better performance than TCP, the test results show Valkey Over RDMA has a ~2.5X QPS and lower latency. In recent years, RDMA gets popular in the data center, especially RoCE(RDMA over Converged Ethernet) architecture has been widely used. Introduce Valkey Over RDMA protocol as a new transport for Valkey. For now, we defined 4 commands: - GetServerFeature & SetClientFeature: the two commands are used to negotiate features for further extension. There is no feature definition in this version. Flow control and multi-buffer may be supported in the future, this needs feature negotiation. - Keepalive - RegisterXferMemory: the heart to transfer the real payload. The 'TX buffer' and 'RX buffer' are designed by RDMA remote memory with RDMA write/write with imm, it's similar to several mechanisms introduced by papers(but not same): - Socksdirect: datacenter sockets can be fast and compatible <https://dl.acm.org/doi/10.1145/3341302.3342071> - LITE Kernel RDMA Support for Datacenter Applications <https://dl.acm.org/doi/abs/10.1145/3132747.3132762> - FaRM: Fast Remote Memory <https://www.usenix.org/system/files/conference/nsdi14/nsdi14-paper-dragojevic.pdf> Co-authored-by: Xinhao Kong <xinhao.kong@duke.edu> Co-authored-by: Huaping Zhou <zhouhuaping.san@bytedance.com> Co-authored-by: zhuo jiang <jiangzhuo.cs@bytedance.com> Co-authored-by: Yiming Zhang <zhangyiming1201@bytedance.com> Co-authored-by: Jianxi Ye <jianxi.ye@bytedance.com> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Main changes in this patch: * implement *Valkey Over RDMA* protocol, see *Protocol* section in RDMA.md * implement server side of connection module only, this means we can *NOT* compile RDMA support as built-in. * add necessary information in RDMA.md * support 'CONFIG SET/GET', for example, 'CONFIG Set rdma.port 6380', then check this by 'rdma res show cm_id' and valkey-cli(with RDMA support, but not implemented in this patch) * the full listeners show like(): listener0:name=tcp,bind=*,bind=-::*,port=6379 listener1:name=unix,bind=/var/run/valkey.sock listener2:name=rdma,bind=xx.xx.xx.xx,bind=yy.yy.yy.yy,port=6379 listener3:name=tls,bind=*,bind=-::*,port=16379 valgrind test works fine: valgrind --track-origins=yes --suppressions=./src/valgrind.sup --show-reachable=no --show-possibly-lost=no --leak-check=full --log-file=err.txt ./src/valkey-server --port 6379 --loadmodule src/valkey-rdma.so port=6379 bind=xx.xx.xx.xx --loglevel verbose --protected-mode no --server_cpulist 2 --bio_cpulist 3 --aof_rewrite_cpulist 3 --bgsave_cpulist 3 --appendonly no performance test: server side: ./src/valkey-server --port 6379 # TCP port 6379 has no conflict with RDMA port 6379 --loadmodule src/valkey-rdma.so port=6379 bind=xx.xx.xx.xx bind=yy.yy.yy.yy --loglevel verbose --protected-mode no --server_cpulist 2 --bio_cpulist 3 --aof_rewrite_cpulist 3 --bgsave_cpulist 3 --appendonly no build a valkey-benchmark with RDMA support(not implemented in this patch), run on a x86(Intel Platinum 8260) with RoCEv2 interface(Mellanox ConnectX-5): client side: ./src/valkey-benchmark -h xx.xx.xx.xx -p 6379 -c 30 -n 10000000 --threads 4 -d 1024 -t ping,get,set --rdma ====== PING_INLINE ====== 480561.28 requests per second, 0.060 msec avg latency. ====== PING_MBULK ====== 540482.06 requests per second, 0.053 msec avg latency. ====== SET ====== 399952.00 requests per second, 0.073 msec avg latency. ====== GET ====== 443498.31 requests per second, 0.065 msec avg latency. Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## unstable #477 +/- ##
============================================
+ Coverage 68.91% 69.81% +0.90%
============================================
Files 109 109
Lines 61792 61792
============================================
+ Hits 42581 43138 +557
+ Misses 19211 18654 -557 |
This PR could be tested by client. To build client with RDMA:
To test by commands:
|
Many cloud providers offer RDMA acceleration on their cloud platforms, and I think that there is a foundational basis for the application of Valkey over RDMA. We performed some performance tests on this PR on the 8th generation ECS instances (g8ae.4xlarge, 16 vCPUs, 64G DDR) provided by Alibaba Cloud. Test results indicate that, compared to TCP sockets, the use of RDMA can significantly enhance performance. Test command of server side:
Test command of client side:
The performance test results are as shown in the following table. Apart from LRANGE_100 (performance improvement but not substantially), in other scenarios (PING, SET, GET) the throughput can be increased by at least 76%, and the average (AVG) and P99 latencies can be reduced by at least 40%.
|
Hi, @hz-cheng I notice that you are the author of alibaba-cloud erdma driver for both linux kernel and rdma-core. Cooooooooool! |
More, If necessary, I could try reaching out to relevant colleagues to see if we can offer some Alibaba Cloud ECS instances to the community for free, so that the community can use and test Valkey over RDMA, as well as for future CI/CD purposes. |
Is there a corresponding client that enables RDMA? |
See this comment please. |
Hi @madolson , |
Hi,
Since 2021/06, I created a PR for Redis Over RDMA proposal. Then I did some work to fully abstract connection and make TLS dynamically loadable, a new connection type could be built into Redis statically, or a separated shared library(loaded by Redis on startup) since Redis 7.2.0.
Base on the new connection framework, I created a new PR, some guys(@xiezhq-hermann @zhangyiming1201 @JSpewock @uvletter @FujiZ) noticed, played and tested this PR. However, because of the lack of time and knowledge from the maintainers, this PR has been pending about 2 years.
Changes in this PR:
Finally, if this feature is considered to merge, I volunteer to maintain it.