Commit 3b99149

btl tcp: Don't set socket buffer size by default
Set the default send and receive socket buffer sizes to 0, which means Open MPI will not try to set a buffer size during startup. Since near day one of the TCP BTL, the default has been to set both the send and receive socket buffer sizes to 128 KiB, a number that works well on 1 GbE but poorly on 10 GbE fabrics of any real size. Modern TCP stacks, particularly on Linux, have become much smarter about buffer sizing and are much less efficient when a buffer size is set explicitly (even if set to something large).

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
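The "0 means do not set" convention described above can be sketched in C as follows. This is a minimal illustration, not the TCP BTL's actual code: the helper names `apply_buf_size` and `query_buf_size` are hypothetical, and the only real APIs used are the POSIX `setsockopt` and `getsockopt` calls with `SO_SNDBUF` and `SO_RCVBUF`.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper (not Open MPI code): apply a socket buffer size only
 * when the caller asked for one. A size of 0 or less leaves the socket
 * untouched, so the kernel keeps managing the buffer itself.
 * Returns 0 on success or when nothing was requested, -1 on error. */
static int apply_buf_size(int sd, int optname, int size)
{
    assert(optname == SO_SNDBUF || optname == SO_RCVBUF);

    if (size <= 0) {
        return 0;   /* no explicit request: skip setsockopt entirely */
    }
    return setsockopt(sd, SOL_SOCKET, optname, &size, sizeof(size));
}

/* Hypothetical helper: report the buffer size the kernel currently uses
 * for this socket, or -1 if the query fails. */
static int query_buf_size(int sd, int optname)
{
    int size = 0;
    socklen_t len = sizeof(size);

    if (getsockopt(sd, SOL_SOCKET, optname, &size, &len) != 0) {
        return -1;
    }
    return size;
}
```

Calling `apply_buf_size(sd, SO_SNDBUF, 0)` on a fresh TCP socket leaves the value reported by `query_buf_size` unchanged, while a positive size pins the buffer, which on Linux generally takes the socket out of the kernel's automatic buffer tuning.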
1 parent c793dc8 commit 3b99149

1 file changed: +14 -2 lines changed

opal/mca/btl/tcp/btl_tcp_component.c

Lines changed: 14 additions & 2 deletions
@@ -250,8 +250,20 @@ static int mca_btl_tcp_component_register(void)
     mca_btl_tcp_param_register_int ("free_list_num", NULL, 8, OPAL_INFO_LVL_5, &mca_btl_tcp_component.tcp_free_list_num);
     mca_btl_tcp_param_register_int ("free_list_max", NULL, -1, OPAL_INFO_LVL_5, &mca_btl_tcp_component.tcp_free_list_max);
     mca_btl_tcp_param_register_int ("free_list_inc", NULL, 32, OPAL_INFO_LVL_5, &mca_btl_tcp_component.tcp_free_list_inc);
-    mca_btl_tcp_param_register_int ("sndbuf", NULL, 128*1024, OPAL_INFO_LVL_4, &mca_btl_tcp_component.tcp_sndbuf);
-    mca_btl_tcp_param_register_int ("rcvbuf", NULL, 128*1024, OPAL_INFO_LVL_4, &mca_btl_tcp_component.tcp_rcvbuf);
+    mca_btl_tcp_param_register_int ("sndbuf",
+                                    "The size of the send buffer socket option for each connection. "
+                                    "Modern TCP stacks generally are smarter than a fixed size and in some "
+                                    "situations setting a buffer size explicitly can actually lower "
+                                    "performance. 0 means the tcp btl will not try to set a send buffer "
+                                    "size.",
+                                    0, OPAL_INFO_LVL_4, &mca_btl_tcp_component.tcp_sndbuf);
+    mca_btl_tcp_param_register_int ("rcvbuf",
+                                    "The size of the receive buffer socket option for each connection. "
+                                    "Modern TCP stacks generally are smarter than a fixed size and in some "
+                                    "situations setting a buffer size explicitly can actually lower "
+                                    "performance. 0 means the tcp btl will not try to set a send buffer "
+                                    "size.",
+                                    0, OPAL_INFO_LVL_4, &mca_btl_tcp_component.tcp_rcvbuf);
     mca_btl_tcp_param_register_int ("endpoint_cache",
                                     "The size of the internal cache for each TCP connection. This cache is"
                                     " used to reduce the number of syscalls, by replacing them with memcpy."
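With the defaults now 0, a fixed buffer size applies only when a user sets `sndbuf` or `rcvbuf` explicitly. The sketch below shows that default-versus-override logic in isolation. It is not Open MPI's MCA parameter machinery: `buf_size_from_env` is a hypothetical helper, and reading the value straight from an environment variable (for example `OMPI_MCA_btl_tcp_sndbuf`, following Open MPI's `OMPI_MCA_<param>` naming) is a simplifying assumption made for illustration.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical stand-in for an MCA parameter lookup (not Open MPI code):
 * return the integer stored in the environment variable env_name, or
 * default_value when the variable is unset, empty, negative, or malformed. */
static int buf_size_from_env(const char *env_name, int default_value)
{
    const char *text = getenv(env_name);
    char *end = NULL;
    long value;

    if (text == NULL || *text == '\0') {
        return default_value;   /* nothing requested: keep the default */
    }

    errno = 0;
    value = strtol(text, &end, 10);
    if (errno != 0 || *end != '\0' || value < 0 || value > INT_MAX) {
        return default_value;   /* reject garbage rather than guess */
    }
    return (int) value;
}
```

A caller would use `buf_size_from_env("OMPI_MCA_btl_tcp_sndbuf", 0)` and skip the `setsockopt` call whenever the result is 0, which matches the new default of leaving buffer sizing to the TCP stack.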
