netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Wen Gu <guwen@linux.alibaba.com>
To: "Iordache Costin (XC-AS/EAE-UK)" <Costin.Iordache2@bosch.com>,
	Gerd Bayer <gbayer@linux.ibm.com>,
	"D . Wythe" <alibuda@linux.alibaba.com>,
	Tony Lu <tonylu@linux.alibaba.com>,
	Nils Hoppmann <niho@linux.ibm.com>
Cc: "linux-s390@vger.kernel.org" <linux-s390@vger.kernel.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	Wenjia Zhang <wenjia@linux.ibm.com>,
	Jan Karcher <jaka@linux.ibm.com>,
	Dust Li <dust.li@linux.alibaba.com>,
	Alexandra Winter <wintera@linux.ibm.com>
Subject: Re: SMC-R throughput drops for specific message sizes
Date: Mon, 19 Feb 2024 16:44:58 +0800	[thread overview]
Message-ID: <c9bf102b-a084-4c9a-8a19-1d884910912c@linux.alibaba.com> (raw)
In-Reply-To: <95858f61-f47b-452e-98e0-bc18a20b4687@linux.alibaba.com>

[-- Attachment #1: Type: text/plain, Size: 15782 bytes --]



On 2024/2/5 11:50, Wen Gu wrote:
> 
> 
> On 2024/2/1 21:50, Iordache Costin (XC-AS/EAE-UK) wrote:
>> Hi,
>>
>> This is Costin, Alex's colleague. We've got additional updates which we thought would be helpful to share with the 
>> community.
>>
>> Brief reminder, our hardware/software context is as follows:
>>       - 2 PCs, each equipped with one Mellanox ConnectX-5 HCA (MT27800), dual port
>>       - only one HCA port is active/connected on each side (QSFP28 cable)
>>       - max HCA throughput: 25Gbps ~ 3.12GBs.
>>       - max/active MTU: 4096
>>       - kernel: 6.5.0-14
>>       - benchmarking tool: qperf 0.4.11
>>
>> Our goal has been to gauge the SMC-R benefits vs TCP/IP . We are particularly interested in maximizing the throughput 
>> whilst reducing CPU utilisation and DRAM memory bandwidth for large data (> 2MB) transfers.
>>
>> Our main issue so far has been SMC-R halves the throughput for some specific message sizes (as opposed to TCP/IP) - 
>> see "SMC-R vs TCP" plot.
>>
>> Since our last post the kernel was upgraded from 5.4 to 6.5.0-14 hoping it would alleviate the throughput drops, but 
>> it did not, so we bit the bullet and delved into the SMC-R code.
>>
> 
> Hi Costin,
> 
> FYI, I have also reproduced this in my environment(see attached), with Linux6.8-rc1
> 
> - 2 VMs, each with 2 passthrough ConnectX-4.
> - kernel: Linux6.8-rc1
> - benchmarking tool: qperf 0.4.11
> 
> But it might take me some time to dive deeper into what exactly happened.
> 

Hi Costin and community,

I would like to share some findings with you. The performance drop might be
related to the SMC window size update strategy. Under certain message sizes,
the RMB utilization may drop and transmission is inefficient.

SMC window size update strategy can be referred from page 63 of RFC7609:
https://datatracker.ietf.org/doc/html/rfc7609#page-63 .
and the linux implementation is smc_tx_consumer_update() in net/smc/smc_tx.c.

"
The current window size (from a sender's perspective) is less than
half of the receive buffer space, and the consumer cursor update
will result in a minimum increase in the window size of 10% of the
receive buffer space.
"

void smc_tx_consumer_update(struct smc_connection *conn, bool force)
{
<...>
	if (conn->local_rx_ctrl.prod_flags.cons_curs_upd_req ||
	    force ||
	    ((to_confirm > conn->rmbe_update_limit) &&
	     ((sender_free <= (conn->rmb_desc->len / 2)) ||
	      conn->local_rx_ctrl.prod_flags.write_blocked))) {
		if (conn->killed ||
		    conn->local_rx_ctrl.conn_state_flags.peer_conn_abort)
			return;
		if ((smc_cdc_get_slot_and_msg_send(conn) < 0) &&            <- update the window
		    !conn->killed) {
			queue_delayed_work(conn->lgr->tx_wq, &conn->tx_work,
					   SMC_TX_WORK_DELAY);
			return;
		}
	}
<...>
}


# Test results

In my qperf test, one of performance drops can be observed when msgsize
changes from (90% * RMB - 1) to (90% * RMB).

Take 128KB sndbuf and RMB as an example,

#sysctl net.smc.wmem=131072	# both sides
#sysctl net.smc.rmem=131072

10% * RMB is 13107 bytes, so compare the qperf results when msgsize is
117964 bytes (128KB - 13107B - 1) and 117965 bytes (128KB - 13107B):

- msgsize 117964

# smc_run taskset -c <serv_cpu> qperf
# smc_run taskset -c <clnt_cpu> qperf <serv_ip> -m 117964 -t 10 -vu tcp_bw
tcp_bw:
     bw        =  4.12 GB/sec
     msg_size  =   118 KB
     time      =    10 sec
CPU:
%serv_cpu  :  1.8 us, 54.6 sy,  0.0 ni, 42.9 id,  0.0 wa,  0.7 hi,  0.0 si,  0.0 st
%clnt_cpu  :  2.5 us, 40.0 sy,  0.0 ni, 57.1 id,  0.0 wa,  0.4 hi,  0.0 si,  0.0 st


- msgsize 117965

# smc_run taskset -c <serv_cpu> qperf
# smc_run taskset -c <clnt_cpu> qperf <serv_ip> -m 117965 -t 10 -vu tcp_bw
tcp_bw:
     bw        =  2.86 GB/sec
     msg_size  =   118 KB
     time      =    10 sec
CPU:
%serv_cpu  :  1.7 us, 30.0 sy,  0.0 ni, 68.0 id,  0.0 wa,  0.3 hi,  0.0 si,  0.0 st
%clnt_cpu  :  1.0 us, 23.1 sy,  0.0 ni, 75.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st

The bandwidth drop is obvious and the CPU utilization is relatively low.


# Analysis

I traced the copylen in smc_tx_sendmsg and smc_rx_recvmsg, and found that:

tracepoint:smc:smc_tx_sendmsg
{
         @tx_len = hist(args->len);
}

tracepoint:smc:smc_rx_recvmsg
{
         @rx_len = hist(args->len);
}

- msgsize 117964

@rx_len:
[4, 8)                 1 |                                                    |
[8, 16)            24184 |@@                                                  |
[16, 32)              40 |                                                    |
[32, 64)               0 |                                                    |
[64, 128)              0 |                                                    |
[128, 256)             1 |                                                    |
[256, 512)             0 |                                                    |
[512, 1K)              0 |                                                    |
[1K, 2K)               0 |                                                    |
[2K, 4K)               0 |                                                    |
[4K, 8K)               0 |                                                    |
[8K, 16K)         177963 |@@@@@@@@@@@@@@@@@@                                  |
[16K, 32K)        212122 |@@@@@@@@@@@@@@@@@@@@@@                              |
[32K, 64K)        490755 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[64K, 128K)       111529 |@@@@@@@@@@@                                         |


@tx_len:
[8, 16)            30768 |@@@                                                 |
[16, 32)              61 |                                                    |
[32, 64)               0 |                                                    |
[64, 128)              0 |                                                    |
[128, 256)             1 |                                                    |
[256, 512)             0 |                                                    |
[512, 1K)              0 |                                                    |
[1K, 2K)               0 |                                                    |
[2K, 4K)               0 |                                                    |
[4K, 8K)               0 |                                                    |
[8K, 16K)         138977 |@@@@@@@@@@@@@@@                                     |
[16K, 32K)        265929 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                       |
[32K, 64K)        469183 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[64K, 128K)       111082 |@@@@@@@@@@@@                                        |


- msgsize 117965

@rx_len:
[4, 8)                 1 |                                                    |
[8, 16)                0 |                                                    |
[16, 32)               2 |                                                    |
[32, 64)               0 |                                                    |
[64, 128)              0 |                                                    |
[128, 256)             1 |                                                    |
[256, 512)             0 |                                                    |
[512, 1K)              0 |                                                    |
[1K, 2K)               0 |                                                    |
[2K, 4K)               0 |                                                    |
[4K, 8K)               0 |                                                    |
[8K, 16K)         237415 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[16K, 32K)             0 |                                                    |
[32K, 64K)             0 |                                                    |
[64K, 128K)       237389 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |


@tx_len:
[16, 32)               2 |                                                    |
[32, 64)               0 |                                                    |
[64, 128)              0 |                                                    |
[128, 256)             1 |                                                    |
[256, 512)             0 |                                                    |
[512, 1K)              0 |                                                    |
[1K, 2K)               0 |                                                    |
[2K, 4K)               0 |                                                    |
[4K, 8K)               0 |                                                    |
[8K, 16K)             22 |                                                    |
[16K, 32K)        237357 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[32K, 64K)            23 |                                                    |
[64K, 128K)       237390 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|

According to the trace results, when msgsize is 117964 (good), the Tx or Rx data
size are distributed between 8K-128K, but when msgsize is 117965 (bad), the Tx or
Rx data size are concentrated in the two intervals [16K, 32K) and [64K, 128K).

To further see what's going on, I wrote a simple reproduce program (see server.c
and client.c in attachment, they transfer 10 specific size message and acts like
qperf tcp_bw) and added some debug information in SMC kernel
(see 0001-debug-record-cursor-update-process.patch in attachment).

Then run my simple reproducer and check the dmesg.

# smc_run ./server -p <serv_port> -m <msgsize: 117964 or 117965>
# smc_run ./client -i <serv_ip> -p <serv_port> -m <msgsize: 117964 or 117965>

According to the dmesg debug information (see dmesg.txt in attachment),
we can recover how data is produced and consumed in RMB when msgsize is 117964
or 117965 (see diagram.svg in attachment, it is made based on server-side
demsg information).

(Please check the above diagram and dmesg information first)

When msgsize is 117964 (good), each time the server consumes data in RMB, the
window size will be updated to client in time (cfed cursor in linux) because
the size of data consumed is always larger than 10% * RMB, and then new data
can be produced in RMB.

However, when msgsize is 117965 (bad), the window will only be updated when the
entire 117965 bytes data has been consumed since the first part of 117965 is
always no larger than 10% * RMB, which is 13107 (see diagram.svg for details).
This results in low utilization of RMB.

To verify this, I made a simple modification to the SMC kernel code, forcing the
receiver always updates the window size.
(However, this modification is far from a solution for the period drop issue,
SMC window can't be always updated for silly window syndrome avoidance.)

"
diff --git a/net/smc/smc_rx.c b/net/smc/smc_rx.c
index 9a2f3638d161..6a4d8041d6b5 100644
--- a/net/smc/smc_rx.c
+++ b/net/smc/smc_rx.c
@@ -61,7 +61,7 @@ static int smc_rx_update_consumer(struct smc_sock *smc,
  {
         struct smc_connection *conn = &smc->conn;
         struct sock *sk = &smc->sk;
-       bool force = false;
+       bool force = true;
         int diff, rc = 0;

         smc_curs_add(conn->rmb_desc->len, &cons, len);
"

Then run qperf test under message size of 117964 and 117965:

# smc_run taskset -c <cpu> qperf <ip> -m 117964 -t 10 -vu tcp_bw
tcp_bw:
     bw        =  4.13 GB/sec
     msg_size  =   118 KB
     time      =    10 sec
# smc_run taskset -c <cpu> qperf <ip> -m 117965 -t 10 -vu tcp_bw
tcp_bw:
     bw        =  4.18 GB/sec
     msg_size  =   118 KB
     time      =    10 sec

The performance drop disapper, and copylen tracing results under the two
msgsize are similar:

- msgsize 117964

@rx_len:
[4, 8)                 1 |                                                    |
[8, 16)             7521 |                                                    |
[16, 32)             422 |                                                    |
[32, 64)               0 |                                                    |
[64, 128)              0 |                                                    |
[128, 256)             1 |                                                    |
[256, 512)             0 |                                                    |
[512, 1K)              0 |                                                    |
[1K, 2K)               0 |                                                    |
[2K, 4K)               0 |                                                    |
[4K, 8K)               0 |                                                    |
[8K, 16K)         225144 |@@@@@@@@@@@@@@@@@@@@                                |
[16K, 32K)        311010 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@                        |
[32K, 64K)        571816 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[64K, 128K)        62061 |@@@@@                                               |

- msgsize 117965

@rx_len:
[2, 4)             54360 |@@@@                                                |
[4, 8)              1025 |                                                    |
[8, 16)                2 |                                                    |
[16, 32)               2 |                                                    |
[32, 64)               0 |                                                    |
[64, 128)              0 |                                                    |
[128, 256)             1 |                                                    |
[256, 512)             0 |                                                    |
[512, 1K)              0 |                                                    |
[1K, 2K)               0 |                                                    |
[2K, 4K)               0 |                                                    |
[4K, 8K)               0 |                                                    |
[8K, 16K)         214217 |@@@@@@@@@@@@@@@@@@                                  |
[16K, 32K)        288575 |@@@@@@@@@@@@@@@@@@@@@@@@@                           |
[32K, 64K)        593644 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[64K, 128K)        50187 |@@@@                                                |


Hope the above analysis can be of some help for you.

Thanks,
Wen Gu


>> The SMC-R source code revealed that __smc_buf_create / smc_compress_bufsize functions are in charge of computing the 
>> size of the RMB buffer and allocating either physical or virtual contiguous memory. We suspected that the throughput 
>> drops were caused by the size of this buffer being too small.
>>
>> We set out to determine whether there is a correlation between the drops and the size of the RMB buffer, and for that 
>> we set the size of the RMB buffer to 128KB, 256KB, 512KB, 1MB, 2MB, 4MB and 8MB and benchmarked the throughput for 
>> different message size ranges.
>>
>> The attached plot collates the benchmark results and shows that the period of the drops coincides with the size of the 
>> RMB buffer. Whilst increasing the size of the buffer seems to attenuate the throughput drops, we believe that the real 
>> root of the drops might lie somewhere else in the SMC-R code. We are suspecting that, for reasons unknown to us, the 
>> CDC messages that are sent after the RDMA WRITE operation are delayed in some circumstances.
>>
>> cheers,
>> Costin.
>>
>> PS. for the sake of brevity many details have been omitted on purpose but we'd be happy to provide them if need be, 
>> e.g. by default the RMB buffer size is capped to 512KB so we remove the cap and recompile the SMC module; we use 
>> alternative tools such as iperf and iperf 3 for benchmarking to dismiss the possibility of the drops to be tool 
>> specific; corking has been disabled; etc.

[-- Attachment #2: 0001-debug-record-cursor-update-process.patch --]
[-- Type: text/plain, Size: 2950 bytes --]

From 5a871abe32fb493b06b63f2ef75fd2686881f2f3 Mon Sep 17 00:00:00 2001
From: Wen Gu <guwen@linux.alibaba.com>
Date: Sun, 18 Feb 2024 14:51:37 +0800
Subject: [PATCH] debug: record cursor update process

Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
---
 net/smc/smc_cdc.c | 8 ++++++++
 net/smc/smc_rx.c  | 3 +++
 net/smc/smc_tx.c  | 6 ++++++
 3 files changed, 17 insertions(+)

diff --git a/net/smc/smc_cdc.c b/net/smc/smc_cdc.c
index e938fe3bcc7c..da08c578546f 100644
--- a/net/smc/smc_cdc.c
+++ b/net/smc/smc_cdc.c
@@ -369,6 +369,10 @@ static void smc_cdc_msg_recv_action(struct smc_sock *smc,
 
 			smc_tx_sndbuf_nonfull(smc);
 		}
+		pr_warn("%s: peer consumed %d, cons_cur {%d:%d}, peer_rmbe_space %d\n",
+			__func__, diff_cons, conn->local_rx_ctrl.cons.wrap,
+			conn->local_rx_ctrl.cons.count,
+			atomic_read(&conn->peer_rmbe_space));
 	}
 
 	diff_prod = smc_curs_diff(conn->rmb_desc->len, &prod_old,
@@ -381,6 +385,10 @@ static void smc_cdc_msg_recv_action(struct smc_sock *smc,
 		atomic_add(diff_prod, &conn->bytes_to_rcv);
 		/* guarantee 0 <= bytes_to_rcv <= rmb_desc->len */
 		smp_mb__after_atomic();
+		pr_warn("%s: peer produced %d, prod_cur {%d:%d}, bytes_to_recv %d\n",
+			__func__, diff_prod, conn->local_rx_ctrl.prod.wrap,
+			conn->local_rx_ctrl.prod.count,
+			atomic_read(&conn->bytes_to_rcv));
 		smc->sk.sk_data_ready(&smc->sk);
 	} else {
 		if (conn->local_rx_ctrl.prod_flags.write_blocked)
diff --git a/net/smc/smc_rx.c b/net/smc/smc_rx.c
index 9a2f3638d161..cbb42a203370 100644
--- a/net/smc/smc_rx.c
+++ b/net/smc/smc_rx.c
@@ -89,6 +89,9 @@ static int smc_rx_update_consumer(struct smc_sock *smc,
 	}
 
 	smc_curs_copy(&conn->local_tx_ctrl.cons, &cons, conn);
+	pr_warn("%s: local consumed %ld B, cons_cur {%d:%d}, bytes_to_recv %d\n",
+		__func__, len, cons.wrap, cons.count,
+		atomic_read(&conn->bytes_to_rcv));
 
 	/* send consumer cursor update if required */
 	/* similar to advertising new TCP rcv_wnd if required */
diff --git a/net/smc/smc_tx.c b/net/smc/smc_tx.c
index 214ac3cbcf9a..f290777d4a43 100644
--- a/net/smc/smc_tx.c
+++ b/net/smc/smc_tx.c
@@ -538,6 +538,10 @@ static int smc_tx_rdma_writes(struct smc_connection *conn,
 							/* dst: peer RMBE */
 	smc_curs_copy(&conn->tx_curs_sent, &sent, conn);/* src: local sndbuf */
 
+	pr_warn("%s: local produced %ld, prod {%d:%d}, peer_rmbe_space %d\n",
+		__func__, len, conn->local_tx_ctrl.prod.wrap,
+		conn->local_tx_ctrl.prod.count,
+		atomic_read(&conn->peer_rmbe_space));
 	return 0;
 }
 
@@ -713,6 +717,8 @@ void smc_tx_consumer_update(struct smc_connection *conn, bool force)
 		if (conn->killed ||
 		    conn->local_rx_ctrl.conn_state_flags.peer_conn_abort)
 			return;
+		pr_warn("%s: notify RMB wnd increase %d, cons_cur {%d:%d}\n",
+			__func__, to_confirm, cons.wrap, cons.count);
 		if ((smc_cdc_get_slot_and_msg_send(conn) < 0) &&
 		    !conn->killed) {
 			queue_delayed_work(conn->lgr->tx_wq, &conn->tx_work,
-- 
2.32.0.3.g01195cf9f


[-- Attachment #3: server.c --]
[-- Type: text/plain, Size: 2379 bytes --]

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <errno.h>
#include <stdbool.h>
#include <netinet/tcp.h>
#include <pthread.h>

#ifndef AF_SMC
#define AF_SMC		43
#endif
#define NET_PROTOCAL	AF_INET
#define SERV_IP		INADDR_ANY
#define SERV_PORT	10012

#define BUF_SIZE	(5 * 128 * 1024)

int stream_recv(int fd, char *buf, int msgsize)
{
	int n = msgsize;

	while (n) {
		int i = read(fd, buf, n);

		if (i < 0)
			return i;
		buf += i;
		n   -= i;
		if (i == 0)
			break;
		printf("Successfully recv %d B message\n", i);
	}
	return msgsize-n;
}
int net_serv(int port, int msgsize)
{
	int count, recv;

	if (!port)
		port = SERV_PORT;

	int l_sock = socket(NET_PROTOCAL, SOCK_STREAM, 0);

	struct sockaddr_in s_addr;
	memset(&s_addr, 0, sizeof(struct sockaddr_in));
	s_addr.sin_family = NET_PROTOCAL;
	s_addr.sin_addr.s_addr = SERV_IP;
	s_addr.sin_port = htons(port);
	// bind listen socket
	if (bind(l_sock, (struct sockaddr*)&s_addr, sizeof(s_addr))) {
		printf("bind listen socket error %d\n", errno);
		return 0;
	}
	// listen
	if (listen(l_sock, 20)) {
		printf("listen error\n");
		return 0;
	}

	struct sockaddr_in c_addr;
	socklen_t c_addr_len = sizeof(c_addr);
	int s_sock = accept(l_sock, (struct sockaddr*)&c_addr,
			    &c_addr_len);
	if (s_sock < 0) {
		printf("accept fail\n");
		return 0;
	} else {
		char ip[16] = { 0 };
		inet_ntop(NET_PROTOCAL, &(c_addr.sin_addr), ip, INET_ADDRSTRLEN);
		printf("accept connection: ip %s port %d\n",
			ip, c_addr.sin_port);
	}

	char *buf = (char *)malloc(sizeof(char) * BUF_SIZE);
	while (1) {
		if (msgsize > BUF_SIZE)
			break;
		printf("Recv msgsize: %d\n", msgsize);
		recv = stream_recv(s_sock, buf, msgsize);
		if (recv <= 0) {
			if (recv)
				printf("Error recv %d\n", recv);
			break;
		}
	}

	close(s_sock);
	close(l_sock);
	return 0;
}

int main(int argc, char **argv)
{
	bool wrong_param = false;
	int msgsize = BUF_SIZE;
	int port = 0;
	int c;
	while(!wrong_param &&
	      (-1 != (c = getopt(argc, argv, "p:m:")))) {
		switch (c) {
			case 'p':
				port = atoi(optarg);
				break;
			case 'm':
				msgsize = atoi(optarg);
				break;
			case '?':
				printf("usage: ./server -p <port> -m <msgsize>\n");
				wrong_param = true;
				break;
		}
	}
	if (!wrong_param)
		net_serv(port, msgsize);
	return 0;
}

[-- Attachment #4: client.c --]
[-- Type: text/plain, Size: 2048 bytes --]

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <errno.h>
#include <netinet/tcp.h>

#ifndef AF_SMC
#define AF_SMC          43
#endif
#define NET_PROTOCAL    AF_INET
#define SERV_IP         "11.213.5.33"
#define SERV_PORT       10012

#define BUF_SIZE	(5 * 128 * 1024)

int stream_send(int fd, char *buf, int msgsize)
{
	int n = msgsize;

	while (n) {
		int i = write(fd, buf, n);

		if (i < 0)
			return i;
		buf += i;
		n   -= i;
		if (i == 0)
			break;
		printf("Successfully send %d B message\n", i);
	}
	return msgsize-n;
}

int net_clnt(char *ip, int port, int msgsize, int msgcnt)
{
	int count = 0, send;
	char c;

	if (!ip)
		ip = SERV_IP;
	if (!port)
		port = SERV_PORT;
	int sock = socket(NET_PROTOCAL, SOCK_STREAM, 0);

	struct sockaddr_in s_addr;
	memset(&s_addr, 0, sizeof(s_addr));
	s_addr.sin_family = NET_PROTOCAL;
	s_addr.sin_addr.s_addr = inet_addr(ip);
	s_addr.sin_port = htons(port);
	if (connect(sock, (struct sockaddr*)&s_addr, sizeof(s_addr))){
		printf("connect fail\n");
		return 0;
	}

	char *buf = (char *)malloc(sizeof(char) * BUF_SIZE);
	while (--msgcnt) {
		if (msgsize > BUF_SIZE)
			break;
		printf("Send msgsize: %d\n", msgsize);
		send = stream_send(sock, buf, msgsize);
		if (send <= 0) {
			printf("Error send %d\n", send);
			break;
		}
	}
	close(sock);
}
int main(int argc, char **argv){
	int msgsize = BUF_SIZE, msgcnt = 10;
	char *ip = NULL;
	bool wrong_param = false;
	int port = 0;
	int c;
	while(!wrong_param &&
	      (-1 != (c = getopt(argc, argv, "i:p:m:c:")))) {
		switch (c) {
			case 'i':
				ip = optarg;
				break;
			case 'p':
				port = atoi(optarg);
				break;
			case 'm':
				msgsize = atoi(optarg);
				break;
			case 'c':
				msgcnt = atoi(optarg);
				break;
			case '?':
				printf("usage: ./client -i <ip> -p <port> -m <msgsize> -c <cnt>\n");
				wrong_param = true;
				break;
		}
	}
	if (!wrong_param)
		net_clnt(ip, port, msgsize, msgcnt);
	return 0;
}

[-- Attachment #5: diagram.svg --]
[-- Type: image/svg+xml, Size: 534635 bytes --]

[-- Attachment #6: dmesg.txt --]
[-- Type: text/plain, Size: 13445 bytes --]


====== msgsize 117964 =======


server:

[  +0.001754] smc_cdc_msg_recv_action: peer produced 117964, prod_cur {0:117964}, bytes_to_recv 117964
[  +0.000850] smc_cdc_msg_recv_action: peer produced 13108, prod_cur {1:0}, bytes_to_recv 131072
[  +0.001020] smc_rx_update_consumer: local consumed 117964 B, cons_cur {0:117964}, bytes_to_recv 13108
[  +0.000791] smc_tx_consumer_update: notify RMB wnd increase 117964, cons_cur {0:117964}
[  +0.000713] smc_rx_update_consumer: local consumed 13108 B, cons_cur {1:0}, bytes_to_recv 0
[  +0.000760] smc_tx_consumer_update: notify RMB wnd increase 13108, cons_cur {1:0}
[  +0.000827] smc_cdc_msg_recv_action: peer produced 117964, prod_cur {1:117964}, bytes_to_recv 117964
[  +0.000847] smc_rx_update_consumer: local consumed 104856 B, cons_cur {1:104856}, bytes_to_recv 13108
[  +0.000630] smc_cdc_msg_recv_action: peer produced 13108, prod_cur {2:0}, bytes_to_recv 26216
[  +0.000203] smc_tx_consumer_update: notify RMB wnd increase 104856, cons_cur {1:104856}
[  +0.001445] smc_rx_update_consumer: local consumed 26216 B, cons_cur {2:0}, bytes_to_recv 0
[  +0.000733] smc_tx_consumer_update: notify RMB wnd increase 26216, cons_cur {2:0}
[  +0.000862] smc_cdc_msg_recv_action: peer produced 104856, prod_cur {2:104856}, bytes_to_recv 104856
[  +0.000845] smc_rx_update_consumer: local consumed 91748 B, cons_cur {2:91748}, bytes_to_recv 13108
[  +0.000718] smc_cdc_msg_recv_action: peer produced 26216, prod_cur {3:0}, bytes_to_recv 39324
[  +0.000190] smc_tx_consumer_update: notify RMB wnd increase 91748, cons_cur {2:91748}
[  +0.001414] smc_rx_update_consumer: local consumed 39324 B, cons_cur {3:0}, bytes_to_recv 0
[  +0.000763] smc_tx_consumer_update: notify RMB wnd increase 39324, cons_cur {3:0}
[  +0.000806] smc_cdc_msg_recv_action: peer produced 91748, prod_cur {3:91748}, bytes_to_recv 91748
[  +0.000832] smc_rx_update_consumer: local consumed 78640 B, cons_cur {3:78640}, bytes_to_recv 13108
[  +0.000662] smc_cdc_msg_recv_action: peer produced 39324, prod_cur {4:0}, bytes_to_recv 52432
[  +0.000170] smc_tx_consumer_update: notify RMB wnd increase 78640, cons_cur {3:78640}
[  +0.001397] smc_rx_update_consumer: local consumed 52432 B, cons_cur {4:0}, bytes_to_recv 0
[  +0.000753] smc_tx_consumer_update: notify RMB wnd increase 52432, cons_cur {4:0}
[  +0.000818] smc_cdc_msg_recv_action: peer produced 78640, prod_cur {4:78640}, bytes_to_recv 78640
[  +0.000821] smc_rx_update_consumer: local consumed 65532 B, cons_cur {4:65532}, bytes_to_recv 13108
[  +0.000662] smc_cdc_msg_recv_action: peer produced 52432, prod_cur {5:0}, bytes_to_recv 65540
[  +0.000155] smc_tx_consumer_update: notify RMB wnd increase 65532, cons_cur {4:65532}
[  +0.001402] smc_rx_update_consumer: local consumed 65540 B, cons_cur {5:0}, bytes_to_recv 0
[  +0.000754] smc_tx_consumer_update: notify RMB wnd increase 65540, cons_cur {5:0}
[  +0.000802] smc_cdc_msg_recv_action: peer produced 65532, prod_cur {5:65532}, bytes_to_recv 65532
[  +0.000855] smc_rx_update_consumer: local consumed 52424 B, cons_cur {5:52424}, bytes_to_recv 13108
[  +0.000651] smc_cdc_msg_recv_action: peer produced 65540, prod_cur {6:0}, bytes_to_recv 78648
[  +0.000179] smc_tx_consumer_update: notify RMB wnd increase 52424, cons_cur {5:52424}
[  +0.001512] smc_rx_update_consumer: local consumed 78648 B, cons_cur {6:0}, bytes_to_recv 0
[  +0.000051] smc_cdc_msg_recv_action: peer produced 52424, prod_cur {6:52424}, bytes_to_recv 52424
[  +0.000296] smc_tx_consumer_update: notify RMB wnd increase 78648, cons_cur {6:0}
[  +0.002754] smc_cdc_msg_recv_action: peer produced 78648, prod_cur {7:0}, bytes_to_recv 131072
[  +0.001050] smc_rx_update_consumer: local consumed 39316 B, cons_cur {6:39316}, bytes_to_recv 91756
[  +0.001071] smc_tx_consumer_update: notify RMB wnd increase 39316, cons_cur {6:39316}
[  +0.000994] smc_rx_update_consumer: local consumed 91756 B, cons_cur {7:0}, bytes_to_recv 0
[  +0.001046] smc_tx_consumer_update: notify RMB wnd increase 91756, cons_cur {7:0}
[  +0.000507] smc_cdc_msg_recv_action: peer produced 39316, prod_cur {7:39316}, bytes_to_recv 39316
[  +0.001617] smc_rx_update_consumer: local consumed 26208 B, cons_cur {7:26208}, bytes_to_recv 13108
[  +0.000314] smc_cdc_msg_recv_action: peer produced 91756, prod_cur {8:0}, bytes_to_recv 104864
[  +0.000144] smc_tx_consumer_update: notify RMB wnd increase 26208, cons_cur {7:26208}
[  +0.004014] smc_rx_update_consumer: local consumed 104864 B, cons_cur {8:0}, bytes_to_recv 0
[  +0.001536] smc_cdc_msg_recv_action: peer produced 13100, prod_cur {8:13100}, bytes_to_recv 13100
[  +0.000158] smc_tx_consumer_update: notify RMB wnd increase 104864, cons_cur {8:0}
[  +0.003356] smc_rx_update_consumer: local consumed 13100 B, cons_cur {8:13100}, bytes_to_recv 0


client:

[  +0.000996] smc_tx_rdma_writes: local produced 117964, prod {0:117964}, peer_rmbe_space 13108
[  +0.000835] smc_tx_rdma_writes: local produced 13108, prod {1:0}, peer_rmbe_space 0
[  +0.003357] smc_cdc_msg_recv_action: peer consumed 117964, cons_cur {0:117964}, peer_rmbe_space 117964
[  +0.000857] smc_tx_rdma_writes: local produced 117964, prod {1:117964}, peer_rmbe_space 0
[  +0.000734] smc_cdc_msg_recv_action: peer consumed 13108, cons_cur {1:0}, peer_rmbe_space 13108
[  +0.000789] smc_tx_rdma_writes: local produced 13108, prod {2:0}, peer_rmbe_space 0
[  +0.002335] smc_cdc_msg_recv_action: peer consumed 104856, cons_cur {1:104856}, peer_rmbe_space 104856
[  +0.000841] smc_tx_rdma_writes: local produced 104856, prod {2:104856}, peer_rmbe_space 0
[  +0.000754] smc_cdc_msg_recv_action: peer consumed 26216, cons_cur {2:0}, peer_rmbe_space 26216
[  +0.000792] smc_tx_rdma_writes: local produced 26216, prod {3:0}, peer_rmbe_space 0
[  +0.002378] smc_cdc_msg_recv_action: peer consumed 91748, cons_cur {2:91748}, peer_rmbe_space 91748
[  +0.000840] smc_tx_rdma_writes: local produced 91748, prod {3:91748}, peer_rmbe_space 0
[  +0.000726] smc_cdc_msg_recv_action: peer consumed 39324, cons_cur {3:0}, peer_rmbe_space 39324
[  +0.000777] smc_tx_rdma_writes: local produced 39324, prod {4:0}, peer_rmbe_space 0
[  +0.002283] smc_cdc_msg_recv_action: peer consumed 78640, cons_cur {3:78640}, peer_rmbe_space 78640
[  +0.000825] smc_tx_rdma_writes: local produced 78640, prod {4:78640}, peer_rmbe_space 0
[  +0.000747] smc_cdc_msg_recv_action: peer consumed 52432, cons_cur {4:0}, peer_rmbe_space 52432
[  +0.000793] smc_tx_rdma_writes: local produced 52432, prod {5:0}, peer_rmbe_space 0
[  +0.002246] smc_cdc_msg_recv_action: peer consumed 65532, cons_cur {4:65532}, peer_rmbe_space 65532
[  +0.000814] smc_tx_rdma_writes: local produced 65532, prod {5:65532}, peer_rmbe_space 0
[  +0.000743] smc_cdc_msg_recv_action: peer consumed 65540, cons_cur {5:0}, peer_rmbe_space 65540
[  +0.000800] smc_tx_rdma_writes: local produced 65540, prod {6:0}, peer_rmbe_space 0
[  +0.000907] smc_cdc_msg_recv_action: peer consumed 52424, cons_cur {5:52424}, peer_rmbe_space 52424
[  +0.000817] smc_tx_rdma_writes: local produced 52424, prod {6:52424}, peer_rmbe_space 0
[  +0.001039] smc_cdc_msg_recv_action: peer consumed 78648, cons_cur {6:0}, peer_rmbe_space 78648
[  +0.000776] smc_tx_rdma_writes: local produced 78648, prod {7:0}, peer_rmbe_space 0
[  +0.005074] smc_cdc_msg_recv_action: peer consumed 39316, cons_cur {6:39316}, peer_rmbe_space 39316
[  +0.000825] smc_tx_rdma_writes: local produced 39316, prod {7:39316}, peer_rmbe_space 0
[  +0.001213] smc_cdc_msg_recv_action: peer consumed 91756, cons_cur {7:0}, peer_rmbe_space 91756
[  +0.000770] smc_tx_rdma_writes: local produced 91756, prod {8:0}, peer_rmbe_space 0
[  +0.004822] smc_cdc_msg_recv_action: peer consumed 26208, cons_cur {7:26208}, peer_rmbe_space 26208
[  +0.000803] smc_tx_rdma_writes: local produced 13100, prod {8:13100}, peer_rmbe_space 13108
[  +0.003763] smc_cdc_msg_recv_action: peer consumed 104864, cons_cur {8:0}, peer_rmbe_space 117972
[  +0.001634] smc_cdc_msg_recv_action: peer consumed 13100, cons_cur {8:13100}, peer_rmbe_space 131072






====== msgsize 117965 =======


server:

[  +0.002255] smc_cdc_msg_recv_action: peer produced 117965, prod_cur {0:117965}, bytes_to_recv 117965
[  +0.001229] smc_rx_update_consumer: local consumed 117965 B, cons_cur {0:117965}, bytes_to_recv 0
[  +0.000175] smc_cdc_msg_recv_action: peer produced 13107, prod_cur {1:0}, bytes_to_recv 13107
[  +0.000155] smc_tx_consumer_update: notify RMB wnd increase 117965, cons_cur {0:117965}
[  +0.002855] smc_rx_update_consumer: local consumed 13107 B, cons_cur {1:0}, bytes_to_recv 0
[  +0.002232] smc_cdc_msg_recv_action: peer produced 117965, prod_cur {1:117965}, bytes_to_recv 117965
[  +0.001161] smc_rx_update_consumer: local consumed 104858 B, cons_cur {1:104858}, bytes_to_recv 13107
[  +0.001112] smc_tx_consumer_update: notify RMB wnd increase 117965, cons_cur {1:104858}
[  +0.001039] smc_rx_update_consumer: local consumed 13107 B, cons_cur {1:117965}, bytes_to_recv 0
[  +0.002270] smc_cdc_msg_recv_action: peer produced 117965, prod_cur {2:104858}, bytes_to_recv 117965
[  +0.001156] smc_rx_update_consumer: local consumed 104858 B, cons_cur {2:91751}, bytes_to_recv 13107
[  +0.001088] smc_tx_consumer_update: notify RMB wnd increase 117965, cons_cur {2:91751}
[  +0.001365] smc_rx_update_consumer: local consumed 13107 B, cons_cur {2:104858}, bytes_to_recv 0
[  +0.001906] smc_cdc_msg_recv_action: peer produced 117965, prod_cur {3:91751}, bytes_to_recv 117965
[  +0.001136] smc_rx_update_consumer: local consumed 104858 B, cons_cur {3:78644}, bytes_to_recv 13107
[  +0.001109] smc_tx_consumer_update: notify RMB wnd increase 117965, cons_cur {3:78644}
[  +0.001388] smc_rx_update_consumer: local consumed 13107 B, cons_cur {3:91751}, bytes_to_recv 0
[  +0.001952] smc_cdc_msg_recv_action: peer produced 117965, prod_cur {4:78644}, bytes_to_recv 117965
[  +0.001127] smc_rx_update_consumer: local consumed 104858 B, cons_cur {4:65537}, bytes_to_recv 13107
[  +0.001097] smc_tx_consumer_update: notify RMB wnd increase 117965, cons_cur {4:65537}
[  +0.001366] smc_rx_update_consumer: local consumed 13107 B, cons_cur {4:78644}, bytes_to_recv 0
[  +0.001876] smc_cdc_msg_recv_action: peer produced 117965, prod_cur {5:65537}, bytes_to_recv 117965
[  +0.001135] smc_rx_update_consumer: local consumed 104858 B, cons_cur {5:52430}, bytes_to_recv 13107
[  +0.001133] smc_tx_consumer_update: notify RMB wnd increase 117965, cons_cur {5:52430}
[  +0.001376] smc_rx_update_consumer: local consumed 13107 B, cons_cur {5:65537}, bytes_to_recv 0
[  +0.001944] smc_cdc_msg_recv_action: peer produced 117965, prod_cur {6:52430}, bytes_to_recv 117965
[  +0.001149] smc_rx_update_consumer: local consumed 104858 B, cons_cur {6:39323}, bytes_to_recv 13107
[  +0.001131] smc_tx_consumer_update: notify RMB wnd increase 117965, cons_cur {6:39323}
[  +0.001315] smc_rx_update_consumer: local consumed 13107 B, cons_cur {6:52430}, bytes_to_recv 0
[  +0.001906] smc_cdc_msg_recv_action: peer produced 117965, prod_cur {7:39323}, bytes_to_recv 117965
[  +0.001167] smc_rx_update_consumer: local consumed 104858 B, cons_cur {7:26216}, bytes_to_recv 13107
[  +0.001108] smc_tx_consumer_update: notify RMB wnd increase 117965, cons_cur {7:26216}
[  +0.001015] smc_rx_update_consumer: local consumed 13107 B, cons_cur {7:39323}, bytes_to_recv 0
[  +0.002255] smc_cdc_msg_recv_action: peer produced 104858, prod_cur {8:13109}, bytes_to_recv 104858
[  +0.001140] smc_rx_update_consumer: local consumed 104858 B, cons_cur {8:13109}, bytes_to_recv 0
[  +0.001079] smc_tx_consumer_update: notify RMB wnd increase 117965, cons_cur {8:13109}


client:

[  +0.001136] smc_tx_rdma_writes: local produced 117965, prod {0:117965}, peer_rmbe_space 13107
[  +0.001481] smc_tx_rdma_writes: local produced 13107, prod {1:0}, peer_rmbe_space 0
[  +0.004065] smc_cdc_msg_recv_action: peer consumed 117965, cons_cur {0:117965}, peer_rmbe_space 117965
[  +0.001183] smc_tx_rdma_writes: local produced 117965, prod {1:117965}, peer_rmbe_space 0
[  +0.004355] smc_cdc_msg_recv_action: peer consumed 117965, cons_cur {1:104858}, peer_rmbe_space 117965
[  +0.001205] smc_tx_rdma_writes: local produced 117965, prod {2:104858}, peer_rmbe_space 0
[  +0.004334] smc_cdc_msg_recv_action: peer consumed 117965, cons_cur {2:91751}, peer_rmbe_space 117965
[  +0.001227] smc_tx_rdma_writes: local produced 117965, prod {3:91751}, peer_rmbe_space 0
[  +0.004303] smc_cdc_msg_recv_action: peer consumed 117965, cons_cur {3:78644}, peer_rmbe_space 117965
[  +0.001175] smc_tx_rdma_writes: local produced 117965, prod {4:78644}, peer_rmbe_space 0
[  +0.004368] smc_cdc_msg_recv_action: peer consumed 117965, cons_cur {4:65537}, peer_rmbe_space 117965
[  +0.001196] smc_tx_rdma_writes: local produced 117965, prod {5:65537}, peer_rmbe_space 0
[  +0.004322] smc_cdc_msg_recv_action: peer consumed 117965, cons_cur {5:52430}, peer_rmbe_space 117965
[  +0.001165] smc_tx_rdma_writes: local produced 117965, prod {6:52430}, peer_rmbe_space 0
[  +0.004455] smc_cdc_msg_recv_action: peer consumed 117965, cons_cur {6:39323}, peer_rmbe_space 117965
[  +0.001159] smc_tx_rdma_writes: local produced 117965, prod {7:39323}, peer_rmbe_space 0
[  +0.004317] smc_cdc_msg_recv_action: peer consumed 117965, cons_cur {7:26216}, peer_rmbe_space 117965
[  +0.001195] smc_tx_rdma_writes: local produced 104858, prod {8:13109}, peer_rmbe_space 13107
[  +0.004305] smc_cdc_msg_recv_action: peer consumed 117965, cons_cur {8:13109}, peer_rmbe_space 131072



  reply	other threads:[~2024-02-19  8:45 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-01 13:50 SMC-R throughput drops for specific message sizes Iordache Costin (XC-AS/EAE-UK)
2024-02-05  3:50 ` Wen Gu
2024-02-19  8:44   ` Wen Gu [this message]
2024-02-27 11:28     ` Iordache Costin (XC-AS/EAE-UK)
     [not found] <GV2PR10MB8037B30A9D2CE67F267D5E61BB3B2@GV2PR10MB8037.EURPRD10.PROD.OUTLOOK.COM>
     [not found] ` <GV2PR10MB80376BEB9EE8E03F98CC86A1BB3B2@GV2PR10MB8037.EURPRD10.PROD.OUTLOOK.COM>
2024-03-28 12:18   ` Goerlitz Andreas (SO/PAF1-Mb)
  -- strict thread matches above, loose matches on Subject: below --
2024-02-01 10:30 Iordache Costin (XC-AS/EAE-UK)
2023-12-13 15:52 Nikolaou Alexandros (SO/PAF1-Mb)
     [not found] <PAWPR10MB72701758A24DD8DF8063BEE6C081A@PAWPR10MB7270.EURPRD10.PROD.OUTLOOK.COM>
     [not found] ` <ccc03f00-02ee-4af6-8e57-b6de3bc019be@linux.ibm.com>
     [not found]   ` <PAWPR10MB7270731C91544AEF25E0A33CC084A@PAWPR10MB7270.EURPRD10.PROD.OUTLOOK.COM>
2023-12-13 12:17     ` Gerd Bayer
2023-12-13 13:38       ` Wen Gu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c9bf102b-a084-4c9a-8a19-1d884910912c@linux.alibaba.com \
    --to=guwen@linux.alibaba.com \
    --cc=Costin.Iordache2@bosch.com \
    --cc=alibuda@linux.alibaba.com \
    --cc=dust.li@linux.alibaba.com \
    --cc=gbayer@linux.ibm.com \
    --cc=jaka@linux.ibm.com \
    --cc=linux-s390@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=niho@linux.ibm.com \
    --cc=tonylu@linux.alibaba.com \
    --cc=wenjia@linux.ibm.com \
    --cc=wintera@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).