* [PATCH net-next v3 0/2] net/smc: send and write inline optimization for smc
@ 2022-05-16  5:51 Guangguan Wang
From: Guangguan Wang @ 2022-05-16  5:51 UTC
  To: kgraul, davem, edumazet, kuba, pabeni, leon, tonylu
  Cc: linux-s390, netdev, linux-kernel

Send cdc msgs and write data inline if the qp has sufficient inline
space, which helps reduce latency.
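
For illustration, a minimal userspace sketch of the capability check,
written against libibverbs rather than the kernel verbs API the patches
use (the helper name and queue sizes here are invented). ibv_create_qp()
writes the values actually granted by the device back into the cap
struct, which is what a check like this relies on:

/* Sketch only: userspace analogue of the inline-capability check. */
#include <infiniband/verbs.h>
#include <stdint.h>

static int qp_grants_inline(struct ibv_pd *pd, struct ibv_cq *cq,
			    uint32_t needed)
{
	struct ibv_qp_init_attr attr = {
		.send_cq = cq,
		.recv_cq = cq,
		.cap = {
			.max_send_wr	 = 16,
			.max_recv_wr	 = 16,
			.max_send_sge	 = 1,
			.max_recv_sge	 = 1,
			.max_inline_data = needed,
		},
		.qp_type = IBV_QPT_RC,
	};
	struct ibv_qp *qp = ibv_create_qp(pd, &attr);
	int ok;

	if (!qp)
		return 0;
	/* ibv_create_qp() updated attr.cap with the granted values */
	ok = attr.cap.max_inline_data >= needed;
	ibv_destroy_qp(qp);
	return ok;
}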

In my test environment, two VMs running on the same physical host
with ConnectX-4 Lx NICs working in SR-IOV mode, qperf shows a
0.4us-1.3us latency improvement.

Test command:
server: smc_run taskset -c 1 qperf
client: smc_run taskset -c 1 qperf <server ip> -oo \
		msg_size:1:2K:*2 -t 30 -vu tcp_lat

The results are shown below:
msgsize     before       after
1B          11.9 us      10.6 us (-1.3 us)
2B          11.7 us      10.7 us (-1.0 us)
4B          11.7 us      10.7 us (-1.0 us)
8B          11.6 us      10.6 us (-1.0 us)
16B         11.7 us      10.7 us (-1.0 us)
32B         11.7 us      10.6 us (-1.1 us)
64B         11.7 us      11.2 us (-0.5 us)
128B        11.6 us      11.2 us (-0.4 us)
256B        11.8 us      11.2 us (-0.6 us)
512B        11.8 us      11.3 us (-0.5 us)
1KB         11.9 us      11.5 us (-0.4 us)
2KB         12.1 us      11.5 us (-0.6 us)

Guangguan Wang (2):
  net/smc: send cdc msg inline if qp has sufficient inline space
  net/smc: rdma write inline if qp has sufficient inline space

 net/smc/smc_ib.c |  1 +
 net/smc/smc_tx.c | 17 ++++++++++++-----
 net/smc/smc_wr.c |  5 ++++-
 3 files changed, 17 insertions(+), 6 deletions(-)

-- 
2.24.3 (Apple Git-128)



* [PATCH net-next v3 1/2] net/smc: send cdc msg inline if qp has sufficient inline space
From: Guangguan Wang @ 2022-05-16  5:51 UTC
  To: kgraul, davem, edumazet, kuba, pabeni, leon, tonylu
  Cc: linux-s390, netdev, linux-kernel, kernel test robot

As a cdc msg is only 44B long, cdc msgs can be sent inline on most
rdma devices, which helps reduce send latency.
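
To illustrate the mechanism, a userspace libibverbs sketch (not the
kernel code changed below; the function name and parameters are
invented): with the inline flag set, the provider copies the payload
into the work queue entry at post time, so the sge may carry a plain
CPU address instead of a DMA address and the buffer can be reused as
soon as the post returns.

/* Sketch only: post a small message inline when the QP allows it. */
#include <infiniband/verbs.h>
#include <stdint.h>

static int post_send_maybe_inline(struct ibv_qp *qp, const void *msg,
				  uint32_t len, uint32_t lkey,
				  uint32_t max_inline)
{
	struct ibv_send_wr *bad_wr;
	struct ibv_sge sge = {
		.addr	= (uintptr_t)msg, /* CPU address, no DMA mapping */
		.length	= len,
		.lkey	= lkey,		  /* used only on the non-inline path */
	};
	struct ibv_send_wr wr = {
		.sg_list    = &sge,
		.num_sge    = 1,
		.opcode	    = IBV_WR_SEND,
		.send_flags = IBV_SEND_SIGNALED,
	};

	if (len <= max_inline)	/* only inline what the QP can take */
		wr.send_flags |= IBV_SEND_INLINE;

	return ibv_post_send(qp, &wr, &bad_wr);
}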

In my test environment, two VMs running on the same physical host
with ConnectX-4 Lx NICs working in SR-IOV mode, qperf shows a
0.4us-0.7us latency improvement.

Test command:
server: smc_run taskset -c 1 qperf
client: smc_run taskset -c 1 qperf <server ip> -oo \
		msg_size:1:2K:*2 -t 30 -vu tcp_lat

The results are shown below:
msgsize     before       after
1B          11.9 us      11.2 us (-0.7 us)
2B          11.7 us      11.2 us (-0.5 us)
4B          11.7 us      11.3 us (-0.4 us)
8B          11.6 us      11.2 us (-0.4 us)
16B         11.7 us      11.3 us (-0.4 us)
32B         11.7 us      11.3 us (-0.4 us)
64B         11.7 us      11.2 us (-0.5 us)
128B        11.6 us      11.2 us (-0.4 us)
256B        11.8 us      11.2 us (-0.6 us)
512B        11.8 us      11.4 us (-0.4 us)
1KB         11.9 us      11.4 us (-0.5 us)
2KB         12.1 us      11.5 us (-0.6 us)

Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Tested-by: kernel test robot <lkp@intel.com>
---
 net/smc/smc_ib.c | 1 +
 net/smc/smc_wr.c | 5 ++++-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c
index a3e2d3b89568..dcda4165d107 100644
--- a/net/smc/smc_ib.c
+++ b/net/smc/smc_ib.c
@@ -671,6 +671,7 @@ int smc_ib_create_queue_pair(struct smc_link *lnk)
 			.max_recv_wr = SMC_WR_BUF_CNT * 3,
 			.max_send_sge = SMC_IB_MAX_SEND_SGE,
 			.max_recv_sge = sges_per_buf,
+			.max_inline_data = 0,
 		},
 		.sq_sig_type = IB_SIGNAL_REQ_WR,
 		.qp_type = IB_QPT_RC,
diff --git a/net/smc/smc_wr.c b/net/smc/smc_wr.c
index 24be1d03fef9..26f8f240d9e8 100644
--- a/net/smc/smc_wr.c
+++ b/net/smc/smc_wr.c
@@ -554,10 +554,11 @@ void smc_wr_remember_qp_attr(struct smc_link *lnk)
 static void smc_wr_init_sge(struct smc_link *lnk)
 {
 	int sges_per_buf = (lnk->lgr->smc_version == SMC_V2) ? 2 : 1;
+	bool send_inline = (lnk->qp_attr.cap.max_inline_data > SMC_WR_TX_SIZE);
 	u32 i;
 
 	for (i = 0; i < lnk->wr_tx_cnt; i++) {
-		lnk->wr_tx_sges[i].addr =
+		lnk->wr_tx_sges[i].addr = send_inline ? (uintptr_t)(&lnk->wr_tx_bufs[i]) :
 			lnk->wr_tx_dma_addr + i * SMC_WR_BUF_SIZE;
 		lnk->wr_tx_sges[i].length = SMC_WR_TX_SIZE;
 		lnk->wr_tx_sges[i].lkey = lnk->roce_pd->local_dma_lkey;
@@ -575,6 +576,8 @@ static void smc_wr_init_sge(struct smc_link *lnk)
 		lnk->wr_tx_ibs[i].opcode = IB_WR_SEND;
 		lnk->wr_tx_ibs[i].send_flags =
 			IB_SEND_SIGNALED | IB_SEND_SOLICITED;
+		if (send_inline)
+			lnk->wr_tx_ibs[i].send_flags |= IB_SEND_INLINE;
 		lnk->wr_tx_rdmas[i].wr_tx_rdma[0].wr.opcode = IB_WR_RDMA_WRITE;
 		lnk->wr_tx_rdmas[i].wr_tx_rdma[1].wr.opcode = IB_WR_RDMA_WRITE;
 		lnk->wr_tx_rdmas[i].wr_tx_rdma[0].wr.sg_list =
-- 
2.24.3 (Apple Git-128)



* [PATCH net-next v3 2/2] net/smc: rdma write inline if qp has sufficient inline space
From: Guangguan Wang @ 2022-05-16  5:51 UTC
  To: kgraul, davem, edumazet, kuba, pabeni, leon, tonylu
  Cc: linux-s390, netdev, linux-kernel, kernel test robot

Issuing rdma writes with the inline flag when sending small packets,
whose length is shorter than the qp's max_inline_data, helps reduce
latency.
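
As a userspace libibverbs sketch of the same decision (illustrative
names, not the kernel code changed below): the inline flag must also be
cleared explicitly on the large-payload path, because the work request
structure is reused across transfers; this is why the patch masks out
IB_SEND_INLINE in the else branch.

/* Sketch only: per-transfer inline decision on a reused RDMA-write WR.
 * Assumes sg_list, num_sge and wr.rdma.{remote_addr,rkey} were already
 * set up by the caller. */
#include <infiniband/verbs.h>
#include <stdint.h>

static int post_rdma_write(struct ibv_qp *qp, struct ibv_send_wr *wr,
			   uint32_t len, uint32_t max_inline)
{
	struct ibv_send_wr *bad_wr;

	wr->opcode = IBV_WR_RDMA_WRITE;
	if (len < max_inline)
		wr->send_flags |= IBV_SEND_INLINE;  /* small: inline it */
	else
		wr->send_flags &= ~IBV_SEND_INLINE; /* reused WR: clear stale flag */

	return ibv_post_send(qp, wr, &bad_wr);
}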

In my test environment, two VMs running on the same physical host
with ConnectX-4 Lx NICs working in SR-IOV mode, qperf shows a
0.5us-0.7us latency improvement.

Test command:
server: smc_run taskset -c 1 qperf
client: smc_run taskset -c 1 qperf <server ip> -oo \
		msg_size:1:2K:*2 -t 30 -vu tcp_lat

The results are shown below:
msgsize     before       after
1B          11.2 us      10.6 us (-0.6 us)
2B          11.2 us      10.7 us (-0.5 us)
4B          11.3 us      10.7 us (-0.6 us)
8B          11.2 us      10.6 us (-0.6 us)
16B         11.3 us      10.7 us (-0.6 us)
32B         11.3 us      10.6 us (-0.7 us)
64B         11.2 us      11.2 us (0 us)
128B        11.2 us      11.2 us (0 us)
256B        11.2 us      11.2 us (0 us)
512B        11.4 us      11.3 us (-0.1 us)
1KB         11.4 us      11.5 us (+0.1 us)
2KB         11.5 us      11.5 us (0 us)

Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Tested-by: kernel test robot <lkp@intel.com>
---
 net/smc/smc_tx.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/net/smc/smc_tx.c b/net/smc/smc_tx.c
index 98ca9229fe87..805a546e8c04 100644
--- a/net/smc/smc_tx.c
+++ b/net/smc/smc_tx.c
@@ -391,12 +391,20 @@ static int smcr_tx_rdma_writes(struct smc_connection *conn, size_t len,
 	int rc;
 
 	for (dstchunk = 0; dstchunk < 2; dstchunk++) {
-		struct ib_sge *sge =
-			wr_rdma_buf->wr_tx_rdma[dstchunk].wr.sg_list;
+		struct ib_rdma_wr *wr = &wr_rdma_buf->wr_tx_rdma[dstchunk];
+		struct ib_sge *sge = wr->wr.sg_list;
+		u64 base_addr = dma_addr;
+
+		if (dst_len < link->qp_attr.cap.max_inline_data) {
+			base_addr = (uintptr_t)conn->sndbuf_desc->cpu_addr;
+			wr->wr.send_flags |= IB_SEND_INLINE;
+		} else {
+			wr->wr.send_flags &= ~IB_SEND_INLINE;
+		}
 
 		num_sges = 0;
 		for (srcchunk = 0; srcchunk < 2; srcchunk++) {
-			sge[srcchunk].addr = dma_addr + src_off;
+			sge[srcchunk].addr = base_addr + src_off;
 			sge[srcchunk].length = src_len;
 			num_sges++;
 
@@ -410,8 +418,7 @@ static int smcr_tx_rdma_writes(struct smc_connection *conn, size_t len,
 			src_len = dst_len - src_len; /* remainder */
 			src_len_sum += src_len;
 		}
-		rc = smc_tx_rdma_write(conn, dst_off, num_sges,
-				       &wr_rdma_buf->wr_tx_rdma[dstchunk]);
+		rc = smc_tx_rdma_write(conn, dst_off, num_sges, wr);
 		if (rc)
 			return rc;
 		if (dst_len_sum == len)
-- 
2.24.3 (Apple Git-128)



* Re: [PATCH net-next v3 0/2] net/smc: send and write inline optimization for smc
From: Karsten Graul @ 2022-05-16 16:58 UTC
  To: Guangguan Wang, davem, edumazet, kuba, pabeni, leon, tonylu
  Cc: linux-s390, netdev, linux-kernel

On 16/05/2022 07:51, Guangguan Wang wrote:
> Send cdc msgs and write data inline if the qp has sufficient inline
> space, which helps reduce latency.
> 
> In my test environment, two VMs running on the same physical host
> with ConnectX-4 Lx NICs working in SR-IOV mode, qperf shows a
> 0.4us-1.3us latency improvement.
> 
> Test command:
> server: smc_run taskset -c 1 qperf
> client: smc_run taskset -c 1 qperf <server ip> -oo \
> 		msg_size:1:2K:*2 -t 30 -vu tcp_lat
> 
> The results are shown below:
> msgsize     before       after
> 1B          11.9 us      10.6 us (-1.3 us)
> 2B          11.7 us      10.7 us (-1.0 us)
> 4B          11.7 us      10.7 us (-1.0 us)
> 8B          11.6 us      10.6 us (-1.0 us)
> 16B         11.7 us      10.7 us (-1.0 us)
> 32B         11.7 us      10.6 us (-1.1 us)
> 64B         11.7 us      11.2 us (-0.5 us)
> 128B        11.6 us      11.2 us (-0.4 us)
> 256B        11.8 us      11.2 us (-0.6 us)
> 512B        11.8 us      11.3 us (-0.5 us)
> 1KB         11.9 us      11.5 us (-0.4 us)
> 2KB         12.1 us      11.5 us (-0.6 us)
> 
> Guangguan Wang (2):
>   net/smc: send cdc msg inline if qp has sufficient inline space
>   net/smc: rdma write inline if qp has sufficient inline space
> 
>  net/smc/smc_ib.c |  1 +
>  net/smc/smc_tx.c | 17 ++++++++++++-----
>  net/smc/smc_wr.c |  5 ++++-
>  3 files changed, 17 insertions(+), 6 deletions(-)
> 

I like that change, thank you!

For the series:

Acked-by: Karsten Graul <kgraul@linux.ibm.com>



* Re: [PATCH net-next v3 0/2] net/smc: send and write inline optimization for smc
From: patchwork-bot+netdevbpf @ 2022-05-18  0:40 UTC
  To: Guangguan Wang
  Cc: kgraul, davem, edumazet, kuba, pabeni, leon, tonylu, linux-s390,
	netdev, linux-kernel

Hello:

This series was applied to netdev/net-next.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Mon, 16 May 2022 13:51:35 +0800 you wrote:
> Send cdc msgs and write data inline if the qp has sufficient inline
> space, which helps reduce latency.
> 
> In my test environment, two VMs running on the same physical host
> with ConnectX-4 Lx NICs working in SR-IOV mode, qperf shows a
> 0.4us-1.3us latency improvement.
> 
> [...]

Here is the summary with links:
  - [net-next,v3,1/2] net/smc: send cdc msg inline if qp has sufficient inline space
    https://git.kernel.org/netdev/net-next/c/b632eb069732
  - [net-next,v3,2/2] net/smc: rdma write inline if qp has sufficient inline space
    https://git.kernel.org/netdev/net-next/c/793a7df63071

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




