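osu_mbw_mr D D shows much lower large-message bandwidth with UCX_PROTO_ENABLE=y than with UCX_PROTO_ENABLE=n. Up to 64 KB, UCX_PROTO_ENABLE=y is actually ahead of the default UCX_PROTO_ENABLE=n run, but from 128 KB on it plateaus at about 18,600-20,400 MB/s, while UCX_PROTO_ENABLE=n reaches about 95,100-98,700 MB/s.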
Bad (UCX_PROTO_ENABLE=y):
yt_slot_7@vla2-1174-gpu-node-hahn:~$ mpirun --prefix /opt/hpcx/ompi-ipv6 -x LD_LIBRARY_PATH --allow-run-as-root --bind-to none -mca orte_keep_fqdn_hostnames true -mca oob_tcp_if_include veth -mca plm_rsh_num_concurrent 300 -mca routed_radix 600 -mca plm_rsh_no_tree_spawn 1 -mca pmix_base_async_modex 1 --mca btl ^openib -mca pml ucx -x HPC_WORKSPACE --report-bindings -hostfile /slot/sandbox/mpi_hosts.txt -np 16 -x UCX_PROTO_ENABLE=y -x UCX_TLS=rc,cuda_copy /opt/osu-micro-benchmarks/install/mpi/pt2pt/osu_mbw_mr D D
[vla2-1174-gpu-node-hahn.vla.yp-c.yandex.net:01694] Warning: could not find environment variable "HPC_WORKSPACE"
Warning: Permanently added '[vla2-0693-gpu-node-hahn.vla.yp-c.yandex.net]:8089,[2a02:6b8:c1d:6e8d:10d:e7ae:bcd7:0]:8089' (RSA) to the list of known hosts.
Warning: Permanently added '[vla2-1174-gpu-node-hahn.vla.yp-c.yandex.net]:8087,[2a02:6b8:c1d:6d91:10d:e7ae:5e93:0]:8087' (RSA) to the list of known hosts.
[vla2-0693-gpu-node-hahn.vla.yp-c.yandex.net:01634] MCW rank 9 is not bound (or bound to all available processors)
[vla2-0693-gpu-node-hahn.vla.yp-c.yandex.net:01633] MCW rank 8 is not bound (or bound to all available processors)
[vla2-1174-gpu-node-hahn.vla.yp-c.yandex.net:01716] MCW rank 5 is not bound (or bound to all available processors)
[vla2-1174-gpu-node-hahn.vla.yp-c.yandex.net:01713] MCW rank 2 is not bound (or bound to all available processors)
[vla2-1174-gpu-node-hahn.vla.yp-c.yandex.net:01711] MCW rank 0 is not bound (or bound to all available processors)
[vla2-1174-gpu-node-hahn.vla.yp-c.yandex.net:01717] MCW rank 6 is not bound (or bound to all available processors)
[vla2-1174-gpu-node-hahn.vla.yp-c.yandex.net:01715] MCW rank 4 is not bound (or bound to all available processors)
[vla2-0693-gpu-node-hahn.vla.yp-c.yandex.net:01636] MCW rank 11 is not bound (or bound to all available processors)
[vla2-0693-gpu-node-hahn.vla.yp-c.yandex.net:01637] MCW rank 12 is not bound (or bound to all available processors)
[vla2-0693-gpu-node-hahn.vla.yp-c.yandex.net:01638] MCW rank 13 is not bound (or bound to all available processors)
[vla2-0693-gpu-node-hahn.vla.yp-c.yandex.net:01639] MCW rank 14 is not bound (or bound to all available processors)
[vla2-1174-gpu-node-hahn.vla.yp-c.yandex.net:01714] MCW rank 3 is not bound (or bound to all available processors)
[vla2-1174-gpu-node-hahn.vla.yp-c.yandex.net:01712] MCW rank 1 is not bound (or bound to all available processors)
[vla2-1174-gpu-node-hahn.vla.yp-c.yandex.net:01718] MCW rank 7 is not bound (or bound to all available processors)
[vla2-0693-gpu-node-hahn.vla.yp-c.yandex.net:01635] MCW rank 10 is not bound (or bound to all available processors)
[vla2-0693-gpu-node-hahn.vla.yp-c.yandex.net:01640] MCW rank 15 is not bound (or bound to all available processors)
# OSU MPI Multiple Bandwidth / Message Rate Test v7.2
# OSU MPI Multiple Bandwidth / Message Rate Test v7.2
# Send Buffer on DEVICE (D) and Receive Buffer on DEVICE (D)
# [ pairs: 8 ] [ window size: 64 ]
# Size MB/s Messages/s
# Datatype: MPI_CHAR.
1 1.30 1301272.82
2 2.59 1292791.72
4 5.23 1307511.58
8 10.37 1295688.95
16 20.81 1300802.14
32 41.28 1289854.91
64 83.16 1299317.21
128 165.58 1293585.26
256 718.29 2805839.37
512 1406.10 2746282.98
1024 2843.87 2777215.61
2048 5729.96 2797829.95
4096 10108.80 2467968.94
8192 16898.19 2062767.02
16384 18847.49 1150359.41
32768 19343.57 590319.06
65536 19699.38 300588.63
131072 18589.76 141828.64
262144 18676.20 71244.04
524288 19973.50 38096.43
1048576 20399.55 19454.53
2097152 19899.64 9488.89
4194304 19865.26 4736.25
Good (UCX_PROTO_ENABLE=n; see the results from 128 KB up):
yt_slot_7@vla2-1174-gpu-node-hahn:~$ mpirun --prefix /opt/hpcx/ompi-ipv6 -x LD_LIBRARY_PATH --allow-run-as-root --bind-to none -mca orte_keep_fqdn_hostnames true -mca oob_tcp_if_include veth -mca plm_rsh_num_concurrent 300 -mca routed_radix 600 -mca plm_rsh_no_tree_spawn 1 -mca pmix_base_async_modex 1 --mca btl ^openib -mca pml ucx -x HPC_WORKSPACE --report-bindings -hostfile /slot/sandbox/mpi_hosts.txt -np 16 -x UCX_PROTO_ENABLE=n -x UCX_TLS=rc,cuda_copy /opt/osu-micro-benchmarks/install/mpi/pt2pt/osu_mbw_mr D D
[vla2-1174-gpu-node-hahn.vla.yp-c.yandex.net:01767] Warning: could not find environment variable "HPC_WORKSPACE"
Warning: Permanently added '[vla2-1174-gpu-node-hahn.vla.yp-c.yandex.net]:8087,[2a02:6b8:c1d:6d91:10d:e7ae:5e93:0]:8087' (RSA) to the list of known hosts.
Warning: Permanently added '[vla2-0693-gpu-node-hahn.vla.yp-c.yandex.net]:8089,[2a02:6b8:c1d:6e8d:10d:e7ae:bcd7:0]:8089' (RSA) to the list of known hosts.
[vla2-0693-gpu-node-hahn.vla.yp-c.yandex.net:01700] MCW rank 8 is not bound (or bound to all available processors)
[vla2-0693-gpu-node-hahn.vla.yp-c.yandex.net:01702] MCW rank 10 is not bound (or bound to all available processors)
[vla2-0693-gpu-node-hahn.vla.yp-c.yandex.net:01704] MCW rank 12 is not bound (or bound to all available processors)
[vla2-0693-gpu-node-hahn.vla.yp-c.yandex.net:01703] MCW rank 11 is not bound (or bound to all available processors)
[vla2-0693-gpu-node-hahn.vla.yp-c.yandex.net:01706] MCW rank 14 is not bound (or bound to all available processors)
[vla2-0693-gpu-node-hahn.vla.yp-c.yandex.net:01701] MCW rank 9 is not bound (or bound to all available processors)
[vla2-0693-gpu-node-hahn.vla.yp-c.yandex.net:01705] MCW rank 13 is not bound (or bound to all available processors)
[vla2-0693-gpu-node-hahn.vla.yp-c.yandex.net:01707] MCW rank 15 is not bound (or bound to all available processors)
[vla2-1174-gpu-node-hahn.vla.yp-c.yandex.net:01785] MCW rank 1 is not bound (or bound to all available processors)
[vla2-1174-gpu-node-hahn.vla.yp-c.yandex.net:01790] MCW rank 6 is not bound (or bound to all available processors)
[vla2-1174-gpu-node-hahn.vla.yp-c.yandex.net:01789] MCW rank 5 is not bound (or bound to all available processors)
[vla2-1174-gpu-node-hahn.vla.yp-c.yandex.net:01786] MCW rank 2 is not bound (or bound to all available processors)
[vla2-1174-gpu-node-hahn.vla.yp-c.yandex.net:01787] MCW rank 3 is not bound (or bound to all available processors)
[vla2-1174-gpu-node-hahn.vla.yp-c.yandex.net:01784] MCW rank 0 is not bound (or bound to all available processors)
[vla2-1174-gpu-node-hahn.vla.yp-c.yandex.net:01788] MCW rank 4 is not bound (or bound to all available processors)
[vla2-1174-gpu-node-hahn.vla.yp-c.yandex.net:01791] MCW rank 7 is not bound (or bound to all available processors)
# OSU MPI Multiple Bandwidth / Message Rate Test v7.2
# OSU MPI Multiple Bandwidth / Message Rate Test v7.2
# Send Buffer on DEVICE (D) and Receive Buffer on DEVICE (D)
# [ pairs: 8 ] [ window size: 64 ]
# Size MB/s Messages/s
# Datatype: MPI_CHAR.
1 0.65 654098.23
2 1.47 734891.91
4 2.91 726428.07
8 5.87 734064.96
16 11.67 729143.99
32 23.45 732678.31
64 45.97 718281.94
128 91.89 717865.58
256 183.69 717543.84
512 369.53 721732.15
1024 726.79 709751.45
2048 1422.15 694410.14
4096 2778.02 678226.51
8192 5499.50 671325.74
16384 5514.58 336583.34
32768 5610.06 171205.37
65536 5646.15 86153.46
131072 95134.90 725821.70
262144 96816.85 369326.97
524288 97889.50 186709.40
1048576 98319.89 93765.15
2097152 98569.71 47001.70
4194304 98699.50 23531.79
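To make the comparison between the two runs explicit, the minimal Python sketch below parses osu_mbw_mr data rows and prints the per-size bandwidth ratio. It assumes the output format shown in this report (Size, MB/s and Messages/s columns, header lines starting with '#'); parse_osu_rows and compare are hypothetical helpers, not part of the OSU suite, and only a few rows from each run are embedded.

```python
# Minimal sketch: compare two osu_mbw_mr outputs size by size.
# parse_osu_rows() and compare() are hypothetical helpers, not OSU tools.


def parse_osu_rows(text):
    """Return {size_bytes: mb_per_s} from osu_mbw_mr output, skipping '#' lines."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly three columns: Size, MB/s, Messages/s.
        if len(fields) != 3 or line.lstrip().startswith("#"):
            continue
        try:
            size, mbps = int(fields[0]), float(fields[1])
        except ValueError:
            continue
        rows[size] = mbps
    return rows


# A few rows copied from the UCX_PROTO_ENABLE=y and =n runs in this report.
PROTO_Y = """
# Size MB/s Messages/s
8192 16898.19 2062767.02
65536 19699.38 300588.63
131072 18589.76 141828.64
4194304 19865.26 4736.25
"""

PROTO_N = """
# Size MB/s Messages/s
8192 5499.50 671325.74
65536 5646.15 86153.46
131072 95134.90 725821.70
4194304 98699.50 23531.79
"""


def compare(bad_text, good_text):
    """Yield (size, bad MB/s, good MB/s, good/bad ratio) for sizes present in both runs."""
    bad, good = parse_osu_rows(bad_text), parse_osu_rows(good_text)
    for size in sorted(bad.keys() & good.keys()):
        yield size, bad[size], good[size], good[size] / bad[size]


if __name__ == "__main__":
    for size, b, g, ratio in compare(PROTO_Y, PROTO_N):
        print(f"{size:>8} B  y={b:>9.2f}  n={g:>9.2f}  n/y={ratio:5.2f}x")
```

A ratio below 1 means UCX_PROTO_ENABLE=y was faster at that size; above 1 means it was slower.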
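The threshold selection is also suboptimal with UCX_PROTO_ENABLE=n: setting UCX_RNDV_THRESH=0 improves performance. With the default threshold, bandwidth stalls at about 5,500-5,650 MB/s for 8 KB-64 KB messages; with UCX_RNDV_THRESH=0 it reaches about 45,800-89,500 MB/s over the same range, and the message rate for 128 B-8 KB rises from roughly 0.67-0.72 million to about 5.3-5.6 million messages/s.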
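One way to check whether 0 is the best value, rather than just better than the default, is to rerun the benchmark over a range of UCX_RNDV_THRESH values. The Python sketch below only builds the command lines. It assumes an abbreviated form of the mpirun command used in this report (several -mca options and -x LD_LIBRARY_PATH dropped for brevity), and apart from 0 the values in THRESHOLDS are arbitrary examples, assumed to be interpreted as bytes.

```python
import shlex

# Abbreviated form of the mpirun command from this report (some options dropped).
BASE = [
    "mpirun", "--prefix", "/opt/hpcx/ompi-ipv6",
    "--bind-to", "none",
    "-mca", "pml", "ucx",
    "-hostfile", "/slot/sandbox/mpi_hosts.txt", "-np", "16",
    "-x", "UCX_PROTO_ENABLE=n",
    "-x", "UCX_TLS=rc,cuda_copy",
]
BENCH = ["/opt/osu-micro-benchmarks/install/mpi/pt2pt/osu_mbw_mr", "D", "D"]

# Example sweep values (hypothetical choices; only 0 comes from the report).
THRESHOLDS = ["0", "1024", "8192", "65536"]


def sweep_commands(thresholds):
    """Return one argv list per UCX_RNDV_THRESH value, exported to all ranks with -x."""
    return [BASE + ["-x", f"UCX_RNDV_THRESH={t}"] + BENCH for t in thresholds]


if __name__ == "__main__":
    for argv in sweep_commands(THRESHOLDS):
        # shlex.join quotes arguments so the line can be pasted into a shell.
        print(shlex.join(argv))
```

Each printed line can be pasted into a shell; alternatively, pass the argv lists directly to subprocess.run to avoid shell quoting altogether.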
UCX_RNDV_THRESH default:
yt_slot_7@vla2-1174-gpu-node-hahn:~$ mpirun --prefix /opt/hpcx/ompi-ipv6 -x LD_LIBRARY_PATH --allow-run-as-root --bind-to none -mca orte_keep_fqdn_hostnames true -mca oob_tcp_if_include veth -mca plm_rsh_num_concurrent 300 -mca routed_radix 600 -mca plm_rsh_no_tree_spawn 1 -mca pmix_base_async_modex 1 --mca btl ^openib -mca pml ucx -x HPC_WORKSPACE --report-bindings -hostfile /slot/sandbox/mpi_hosts.txt -np 16 -x UCX_PROTO_ENABLE=n -x UCX_TLS=rc,cuda_copy /opt/osu-micro-benchmarks/install/mpi/pt2pt/osu_mbw_mr D D
(Output identical to the UCX_PROTO_ENABLE=n run above, including process IDs.)
UCX_RNDV_THRESH=0:
yt_slot_7@vla2-1174-gpu-node-hahn:~$ mpirun --prefix /opt/hpcx/ompi-ipv6 -x LD_LIBRARY_PATH --allow-run-as-root --bind-to none -mca orte_keep_fqdn_hostnames true -mca oob_tcp_if_include veth -mca plm_rsh_num_concurrent 300 -mca routed_radix 600 -mca plm_rsh_no_tree_spawn 1 -mca pmix_base_async_modex 1 --mca btl ^openib -mca pml ucx -x HPC_WORKSPACE --report-bindings -hostfile /slot/sandbox/mpi_hosts.txt -np 16 -x UCX_PROTO_ENABLE=n -x UCX_TLS=rc,cuda_copy -x UCX_RNDV_THRESH=0 /opt/osu-micro-benchmarks/install/mpi/pt2pt/osu_mbw_mr D D
[vla2-1174-gpu-node-hahn.vla.yp-c.yandex.net:01840] Warning: could not find environment variable "HPC_WORKSPACE"
Warning: Permanently added '[vla2-0693-gpu-node-hahn.vla.yp-c.yandex.net]:8089,[2a02:6b8:c1d:6e8d:10d:e7ae:bcd7:0]:8089' (RSA) to the list of known hosts.
Warning: Permanently added '[vla2-1174-gpu-node-hahn.vla.yp-c.yandex.net]:8087,[2a02:6b8:c1d:6d91:10d:e7ae:5e93:0]:8087' (RSA) to the list of known hosts.
[vla2-0693-gpu-node-hahn.vla.yp-c.yandex.net:01768] MCW rank 9 is not bound (or bound to all available processors)
[vla2-1174-gpu-node-hahn.vla.yp-c.yandex.net:01863] MCW rank 6 is not bound (or bound to all available processors)
[vla2-0693-gpu-node-hahn.vla.yp-c.yandex.net:01769] MCW rank 10 is not bound (or bound to all available processors)
[vla2-0693-gpu-node-hahn.vla.yp-c.yandex.net:01772] MCW rank 13 is not bound (or bound to all available processors)
[vla2-0693-gpu-node-hahn.vla.yp-c.yandex.net:01771] MCW rank 12 is not bound (or bound to all available processors)
[vla2-1174-gpu-node-hahn.vla.yp-c.yandex.net:01859] MCW rank 2 is not bound (or bound to all available processors)
[vla2-0693-gpu-node-hahn.vla.yp-c.yandex.net:01773] MCW rank 14 is not bound (or bound to all available processors)
[vla2-0693-gpu-node-hahn.vla.yp-c.yandex.net:01770] MCW rank 11 is not bound (or bound to all available processors)
[vla2-0693-gpu-node-hahn.vla.yp-c.yandex.net:01774] MCW rank 15 is not bound (or bound to all available processors)
[vla2-0693-gpu-node-hahn.vla.yp-c.yandex.net:01767] MCW rank 8 is not bound (or bound to all available processors)
[vla2-1174-gpu-node-hahn.vla.yp-c.yandex.net:01860] MCW rank 3 is not bound (or bound to all available processors)
[vla2-1174-gpu-node-hahn.vla.yp-c.yandex.net:01864] MCW rank 7 is not bound (or bound to all available processors)
[vla2-1174-gpu-node-hahn.vla.yp-c.yandex.net:01858] MCW rank 1 is not bound (or bound to all available processors)
[vla2-1174-gpu-node-hahn.vla.yp-c.yandex.net:01862] MCW rank 5 is not bound (or bound to all available processors)
[vla2-1174-gpu-node-hahn.vla.yp-c.yandex.net:01861] MCW rank 4 is not bound (or bound to all available processors)
[vla2-1174-gpu-node-hahn.vla.yp-c.yandex.net:01857] MCW rank 0 is not bound (or bound to all available processors)
# OSU MPI Multiple Bandwidth / Message Rate Test v7.2
# OSU MPI Multiple Bandwidth / Message Rate Test v7.2
# Send Buffer on DEVICE (D) and Receive Buffer on DEVICE (D)
# [ pairs: 8 ] [ window size: 64 ]
# Size MB/s Messages/s
# Datatype: MPI_CHAR.
1 1.09 1087855.69
2 2.18 1090733.52
4 4.34 1083929.55
8 8.71 1088492.09
16 17.37 1085410.99
32 34.89 1090425.40
64 69.67 1088545.76
128 715.01 5585980.85
256 1420.29 5548017.93
512 2851.20 5568754.76
1024 5727.37 5593134.43
2048 11417.72 5575059.19
4096 21845.33 5333331.67
8192 45847.66 5596638.26
16384 70479.59 4301732.96
32768 82019.35 2503031.99
65536 89499.37 1365652.02
131072 94064.08 717651.99
262144 96358.34 367577.91
524288 97573.52 186106.72
1048576 98179.40 93631.18
2097152 98501.57 46969.21
4194304 98661.19 23522.66
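The bandwidth curves in these runs show where the transfer path changes: while a path is latency-bound, MB/s roughly doubles each time the message size doubles, so a jump well beyond 2x between adjacent sizes hints at a protocol or threshold switch. The Python sketch below applies that heuristic to the three runs in this report. The 3x cut-off is an assumption, and the heuristic only locates a switch; it does not tell which UCX protocol was selected.

```python
# Bandwidth (MB/s) copied from the three osu_mbw_mr runs in this report.
SIZES = [2 ** i for i in range(23)]  # 1 B .. 4 MB

PROTO_Y = [
    1.30, 2.59, 5.23, 10.37, 20.81, 41.28, 83.16, 165.58,
    718.29, 1406.10, 2843.87, 5729.96, 10108.80, 16898.19, 18847.49, 19343.57,
    19699.38, 18589.76, 18676.20, 19973.50, 20399.55, 19899.64, 19865.26,
]
PROTO_N_DEFAULT = [
    0.65, 1.47, 2.91, 5.87, 11.67, 23.45, 45.97, 91.89,
    183.69, 369.53, 726.79, 1422.15, 2778.02, 5499.50, 5514.58, 5610.06,
    5646.15, 95134.90, 96816.85, 97889.50, 98319.89, 98569.71, 98699.50,
]
PROTO_N_RNDV0 = [
    1.09, 2.18, 4.34, 8.71, 17.37, 34.89, 69.67, 715.01,
    1420.29, 2851.20, 5727.37, 11417.72, 21845.33, 45847.66, 70479.59, 82019.35,
    89499.37, 94064.08, 96358.34, 97573.52, 98179.40, 98501.57, 98661.19,
]


def find_switch_points(rows, factor=3.0):
    """Return sizes whose MB/s exceeds `factor` times the MB/s of the previous size."""
    sizes = sorted(rows)
    return [cur for prev, cur in zip(sizes, sizes[1:]) if rows[cur] > factor * rows[prev]]


if __name__ == "__main__":
    runs = {
        "UCX_PROTO_ENABLE=y": PROTO_Y,
        "UCX_PROTO_ENABLE=n, default UCX_RNDV_THRESH": PROTO_N_DEFAULT,
        "UCX_PROTO_ENABLE=n, UCX_RNDV_THRESH=0": PROTO_N_RNDV0,
    }
    for name, bw in runs.items():
        assert len(bw) == len(SIZES)
        print(name, find_switch_points(dict(zip(SIZES, bw))))
```

On this data it flags 256 B for UCX_PROTO_ENABLE=y, 128 KB for UCX_PROTO_ENABLE=n with the default threshold, and 128 B for UCX_PROTO_ENABLE=n with UCX_RNDV_THRESH=0.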
Steps to Reproduce
The host file contains two hosts with slots=8 each.
(ucx_info -v) Nope
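The report does not include the host file itself. The minimal sketch below produces an equivalent Open MPI host file (one "host slots=N" line per host, 16 slots in total to match -np 16). The host names are taken from the mpirun log in this report and are an assumption about what the real file contained.

```python
# Hypothetical reconstruction of the host file; the real file was not posted.
HOSTS = [
    "vla2-1174-gpu-node-hahn.vla.yp-c.yandex.net",
    "vla2-0693-gpu-node-hahn.vla.yp-c.yandex.net",
]
SLOTS_PER_HOST = 8


def hostfile_text(hosts, slots):
    """Return Open MPI host file contents: one 'host slots=N' line per host."""
    return "".join(f"{host} slots={slots}\n" for host in hosts)


if __name__ == "__main__":
    # 2 hosts x 8 slots = 16 ranks, matching -np 16 in the commands above.
    assert len(HOSTS) * SLOTS_PER_HOST == 16
    print(hostfile_text(HOSTS, SLOTS_PER_HOST), end="")
```

Redirect the output to the path passed to -hostfile (here /slot/sandbox/mpi_hosts.txt).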
Setup and versions
- cat /etc/issue or cat /etc/redhat-release + uname -a
- cat /etc/mlnx-release (the string identifies software and firmware setup)
- rpm -q rdma-core or rpm -q libibverbs
- ofed_info -s
- ibstat or ibv_devinfo -vv command
Additional information (depending on the issue)
- ucx_info -d to show transports and devices recognized by UCX
- See ucx_info -v above
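The setup commands listed in the template can be gathered in one pass before filing a report. The Python sketch below runs each command whose binary is on PATH and records None for missing tools. The command list is an assumption drawn from the template items; ofed_info, ibstat and ucx_info may not exist on every system.

```python
import shlex
import shutil
import subprocess

# Assumed command list, taken from the template items above.
COMMANDS = [
    "cat /etc/issue",
    "uname -a",
    "ofed_info -s",
    "ibstat",
    "ucx_info -v",
    "ucx_info -d",
]


def collect(commands, timeout=60):
    """Run each command whose binary is on PATH; return {command: output or None}."""
    results = {}
    for cmd in commands:
        argv = shlex.split(cmd)
        if shutil.which(argv[0]) is None:
            results[cmd] = None  # tool not installed on this node
            continue
        try:
            proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            results[cmd] = f"timed out after {timeout} s"
            continue
        # Keep stderr when the command fails (e.g. a missing /etc/*-release file).
        results[cmd] = proc.stdout if proc.returncode == 0 else proc.stderr
    return results


if __name__ == "__main__":
    for cmd, out in collect(COMMANDS).items():
        print(f"$ {cmd}")
        print(out if out is not None else "(not installed)")
```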