All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next 0/3] net/smc: Improvements for TCP_CORK and sendfile()
@ 2022-01-30 18:02 Tony Lu
  2022-01-30 18:02 ` [PATCH net-next 1/3] net/smc: Send directly when TCP_CORK is cleared Tony Lu
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Tony Lu @ 2022-01-30 18:02 UTC (permalink / raw)
  To: kgraul, kuba, davem; +Cc: netdev, linux-s390

Currently, SMC use default implement for syscall sendfile() [1], which
is wildly used in nginx and big data sences. Usually, applications use
sendfile() with TCP_CORK:

fstat(20, {st_mode=S_IFREG|0644, st_size=4096, ...}) = 0
setsockopt(19, SOL_TCP, TCP_CORK, [1], 4) = 0
writev(19, [{iov_base="HTTP/1.1 200 OK\r\nServer: nginx/1"..., iov_len=240}], 1) = 240
sendfile(19, 20, [0] => [4096], 4096)   = 4096
close(20)                               = 0
setsockopt(19, SOL_TCP, TCP_CORK, [0], 4) = 0

The above is an example of Nginx, when sendfile() on, Nginx first
enables TCP_CORK, write headers, the data will not be sent. Then call
sendfile(), it reads file and write to sndbuf. When TCP_CORK is cleared,
all pending data is sent out.

The performance of the default implement of sendfile is lower than when
it is off. After investigation, it shows two parts to improve:
- unnecessary lock contention of delayed work
- less data per send than when sendfile off

Patch #1 tries to reduce lock_sock() contention in smc_tx_work().
Patch #2 removes timed work for corking, and let applications control
it. See TCP_CORK [2] MSG_MORE [3].
Patch #3 adds MSG_SENDPAGE_NOTLAST for corking more data when
sendfile().

Test environments:
- CPU Intel Xeon Platinum 8 core, mem 32 GiB, nic Mellanox CX4
- socket sndbuf / rcvbuf: 16384 / 131072 bytes
- server: smc_run nginx
- client: smc_run ./wrk -c 100 -t 2 -d 30 http://192.168.100.1:8080/4k.html
- payload: 4KB local disk file

Items                     QPS
sendfile off        272477.10
sendfile on (orig)  223622.79
sendfile on (this)  395847.21

This benchmark shows +45.28% improvement compared with sendfile off, and
+77.02% compared with original sendfile implement.

[1] https://man7.org/linux/man-pages/man2/sendfile.2.html
[2] https://linux.die.net/man/7/tcp
[3] https://man7.org/linux/man-pages/man2/send.2.html

Tony Lu (3):
  net/smc: Send directly when TCP_CORK is cleared
  net/smc: Remove corked dealyed work
  net/smc: Cork when sendpage with MSG_SENDPAGE_NOTLAST flag

 net/smc/af_smc.c |  8 ++++---
 net/smc/smc_tx.c | 59 ++++++++++++++++++++++++++++++++----------------
 net/smc/smc_tx.h |  3 +++
 3 files changed, 47 insertions(+), 23 deletions(-)

-- 
2.32.0.3.g01195cf9f


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH net-next 1/3] net/smc: Send directly when TCP_CORK is cleared
  2022-01-30 18:02 [PATCH net-next 0/3] net/smc: Improvements for TCP_CORK and sendfile() Tony Lu
@ 2022-01-30 18:02 ` Tony Lu
  2022-01-31 19:13   ` Stefan Raspl
  2022-01-30 18:02 ` [PATCH net-next 2/3] net/smc: Remove corked dealyed work Tony Lu
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Tony Lu @ 2022-01-30 18:02 UTC (permalink / raw)
  To: kgraul, kuba, davem; +Cc: netdev, linux-s390

According to the man page of TCP_CORK [1], if set, don't send out
partial frames. All queued partial frames are sent when option is
cleared again.

When applications call setsockopt to disable TCP_CORK, this call is
protected by lock_sock(), and tries to mod_delayed_work() to 0, in order
to send pending data right now. However, the delayed work smc_tx_work is
also protected by lock_sock(). There introduces lock contention for
sending data.

To fix it, send pending data directly which acts like TCP, without
lock_sock() protected in the context of setsockopt (already lock_sock()ed),
and cancel unnecessary dealyed work, which is protected by lock.

[1] https://linux.die.net/man/7/tcp

Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
---
 net/smc/af_smc.c |  4 ++--
 net/smc/smc_tx.c | 25 +++++++++++++++----------
 net/smc/smc_tx.h |  1 +
 3 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index ffab9cee747d..ef021ec6b361 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -2600,8 +2600,8 @@ static int smc_setsockopt(struct socket *sock, int level, int optname,
 		    sk->sk_state != SMC_CLOSED) {
 			if (!val) {
 				SMC_STAT_INC(smc, cork_cnt);
-				mod_delayed_work(smc->conn.lgr->tx_wq,
-						 &smc->conn.tx_work, 0);
+				smc_tx_pending(&smc->conn);
+				cancel_delayed_work(&smc->conn.tx_work);
 			}
 		}
 		break;
diff --git a/net/smc/smc_tx.c b/net/smc/smc_tx.c
index be241d53020f..7b0b6e24582f 100644
--- a/net/smc/smc_tx.c
+++ b/net/smc/smc_tx.c
@@ -597,27 +597,32 @@ int smc_tx_sndbuf_nonempty(struct smc_connection *conn)
 	return rc;
 }
 
-/* Wakeup sndbuf consumers from process context
- * since there is more data to transmit
- */
-void smc_tx_work(struct work_struct *work)
+void smc_tx_pending(struct smc_connection *conn)
 {
-	struct smc_connection *conn = container_of(to_delayed_work(work),
-						   struct smc_connection,
-						   tx_work);
 	struct smc_sock *smc = container_of(conn, struct smc_sock, conn);
 	int rc;
 
-	lock_sock(&smc->sk);
 	if (smc->sk.sk_err)
-		goto out;
+		return;
 
 	rc = smc_tx_sndbuf_nonempty(conn);
 	if (!rc && conn->local_rx_ctrl.prod_flags.write_blocked &&
 	    !atomic_read(&conn->bytes_to_rcv))
 		conn->local_rx_ctrl.prod_flags.write_blocked = 0;
+}
+
+/* Wakeup sndbuf consumers from process context
+ * since there is more data to transmit
+ */
+void smc_tx_work(struct work_struct *work)
+{
+	struct smc_connection *conn = container_of(to_delayed_work(work),
+						   struct smc_connection,
+						   tx_work);
+	struct smc_sock *smc = container_of(conn, struct smc_sock, conn);
 
-out:
+	lock_sock(&smc->sk);
+	smc_tx_pending(conn);
 	release_sock(&smc->sk);
 }
 
diff --git a/net/smc/smc_tx.h b/net/smc/smc_tx.h
index 07e6ad76224a..a59f370b8b43 100644
--- a/net/smc/smc_tx.h
+++ b/net/smc/smc_tx.h
@@ -27,6 +27,7 @@ static inline int smc_tx_prepared_sends(struct smc_connection *conn)
 	return smc_curs_diff(conn->sndbuf_desc->len, &sent, &prep);
 }
 
+void smc_tx_pending(struct smc_connection *conn);
 void smc_tx_work(struct work_struct *work);
 void smc_tx_init(struct smc_sock *smc);
 int smc_tx_sendmsg(struct smc_sock *smc, struct msghdr *msg, size_t len);
-- 
2.32.0.3.g01195cf9f


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH net-next 2/3] net/smc: Remove corked dealyed work
  2022-01-30 18:02 [PATCH net-next 0/3] net/smc: Improvements for TCP_CORK and sendfile() Tony Lu
  2022-01-30 18:02 ` [PATCH net-next 1/3] net/smc: Send directly when TCP_CORK is cleared Tony Lu
@ 2022-01-30 18:02 ` Tony Lu
  2022-01-31 19:40   ` Stefan Raspl
  2022-01-30 18:02 ` [PATCH net-next 3/3] net/smc: Cork when sendpage with MSG_SENDPAGE_NOTLAST flag Tony Lu
  2022-01-31 19:42 ` [PATCH net-next 0/3] net/smc: Improvements for TCP_CORK and sendfile() Jakub Kicinski
  3 siblings, 1 reply; 14+ messages in thread
From: Tony Lu @ 2022-01-30 18:02 UTC (permalink / raw)
  To: kgraul, kuba, davem; +Cc: netdev, linux-s390

Based on the manual of TCP_CORK [1] and MSG_MORE [2], these two options
have the same effect. Applications can set these options and informs the
kernel to pend the data, and send them out only when the socket or
syscall does not specify this flag. In other words, there's no need to
send data out by a delayed work, which will queue a lot of work.

This removes corked delayed work with SMC_TX_CORK_DELAY (250ms), and the
applications control how/when to send them out. It improves the
performance for sendfile and throughput, and remove unnecessary race of
lock_sock(). This also unlocks the limitation of sndbuf, and try to fill
it up before sending.

[1] https://linux.die.net/man/7/tcp
[2] https://man7.org/linux/man-pages/man2/send.2.html

Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
---
 net/smc/smc_tx.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/net/smc/smc_tx.c b/net/smc/smc_tx.c
index 7b0b6e24582f..9cec62cae7cb 100644
--- a/net/smc/smc_tx.c
+++ b/net/smc/smc_tx.c
@@ -31,7 +31,6 @@
 #include "smc_tracepoint.h"
 
 #define SMC_TX_WORK_DELAY	0
-#define SMC_TX_CORK_DELAY	(HZ >> 2)	/* 250 ms */
 
 /***************************** sndbuf producer *******************************/
 
@@ -237,15 +236,13 @@ int smc_tx_sendmsg(struct smc_sock *smc, struct msghdr *msg, size_t len)
 		if ((msg->msg_flags & MSG_OOB) && !send_remaining)
 			conn->urg_tx_pend = true;
 		if ((msg->msg_flags & MSG_MORE || smc_tx_is_corked(smc)) &&
-		    (atomic_read(&conn->sndbuf_space) >
-						(conn->sndbuf_desc->len >> 1)))
-			/* for a corked socket defer the RDMA writes if there
-			 * is still sufficient sndbuf_space available
+		    (atomic_read(&conn->sndbuf_space)))
+			/* for a corked socket defer the RDMA writes if
+			 * sndbuf_space is still available. The applications
+			 * should known how/when to uncork it.
 			 */
-			queue_delayed_work(conn->lgr->tx_wq, &conn->tx_work,
-					   SMC_TX_CORK_DELAY);
-		else
-			smc_tx_sndbuf_nonempty(conn);
+			continue;
+		smc_tx_sndbuf_nonempty(conn);
 
 		trace_smc_tx_sendmsg(smc, copylen);
 	} /* while (msg_data_left(msg)) */
-- 
2.32.0.3.g01195cf9f


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH net-next 3/3] net/smc: Cork when sendpage with MSG_SENDPAGE_NOTLAST flag
  2022-01-30 18:02 [PATCH net-next 0/3] net/smc: Improvements for TCP_CORK and sendfile() Tony Lu
  2022-01-30 18:02 ` [PATCH net-next 1/3] net/smc: Send directly when TCP_CORK is cleared Tony Lu
  2022-01-30 18:02 ` [PATCH net-next 2/3] net/smc: Remove corked dealyed work Tony Lu
@ 2022-01-30 18:02 ` Tony Lu
  2022-01-31 19:46   ` Stefan Raspl
  2022-01-31 19:42 ` [PATCH net-next 0/3] net/smc: Improvements for TCP_CORK and sendfile() Jakub Kicinski
  3 siblings, 1 reply; 14+ messages in thread
From: Tony Lu @ 2022-01-30 18:02 UTC (permalink / raw)
  To: kgraul, kuba, davem; +Cc: netdev, linux-s390

This introduces a new corked flag, MSG_SENDPAGE_NOTLAST, which is
involved in syscall sendfile() [1], it indicates this is not the last
page. So we can cork the data until the page is not specify this flag.
It has the same effect as MSG_MORE, but existed in sendfile() only.

This patch adds a option MSG_SENDPAGE_NOTLAST for corking data, try to
cork more data before sending when using sendfile(), which acts like
TCP's behaviour. Also, this reimplements the default sendpage to inform
that it is supported to some extent.

[1] https://man7.org/linux/man-pages/man2/sendfile.2.html

Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
---
 net/smc/af_smc.c |  4 +++-
 net/smc/smc_tx.c | 19 ++++++++++++++++++-
 net/smc/smc_tx.h |  2 ++
 3 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index ef021ec6b361..8b78010afe01 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -2729,8 +2729,10 @@ static ssize_t smc_sendpage(struct socket *sock, struct page *page,
 		rc = kernel_sendpage(smc->clcsock, page, offset,
 				     size, flags);
 	} else {
+		lock_sock(sk);
+		rc = smc_tx_sendpage(smc, page, offset, size, flags);
+		release_sock(sk);
 		SMC_STAT_INC(smc, sendpage_cnt);
-		rc = sock_no_sendpage(sock, page, offset, size, flags);
 	}
 
 out:
diff --git a/net/smc/smc_tx.c b/net/smc/smc_tx.c
index 9cec62cae7cb..a96ce162825e 100644
--- a/net/smc/smc_tx.c
+++ b/net/smc/smc_tx.c
@@ -235,7 +235,8 @@ int smc_tx_sendmsg(struct smc_sock *smc, struct msghdr *msg, size_t len)
 		 */
 		if ((msg->msg_flags & MSG_OOB) && !send_remaining)
 			conn->urg_tx_pend = true;
-		if ((msg->msg_flags & MSG_MORE || smc_tx_is_corked(smc)) &&
+		if ((msg->msg_flags & MSG_MORE || smc_tx_is_corked(smc) ||
+		     msg->msg_flags & MSG_SENDPAGE_NOTLAST) &&
 		    (atomic_read(&conn->sndbuf_space)))
 			/* for a corked socket defer the RDMA writes if
 			 * sndbuf_space is still available. The applications
@@ -257,6 +258,22 @@ int smc_tx_sendmsg(struct smc_sock *smc, struct msghdr *msg, size_t len)
 	return rc;
 }
 
+int smc_tx_sendpage(struct smc_sock *smc, struct page *page, int offset,
+		    size_t size, int flags)
+{
+	struct msghdr msg = {.msg_flags = flags};
+	char *kaddr = kmap(page);
+	struct kvec iov;
+	int rc;
+
+	iov.iov_base = kaddr + offset;
+	iov.iov_len = size;
+	iov_iter_kvec(&msg.msg_iter, WRITE, &iov, 1, size);
+	rc = smc_tx_sendmsg(smc, &msg, size);
+	kunmap(page);
+	return rc;
+}
+
 /***************************** sndbuf consumer *******************************/
 
 /* sndbuf consumer: actual data transfer of one target chunk with ISM write */
diff --git a/net/smc/smc_tx.h b/net/smc/smc_tx.h
index a59f370b8b43..34b578498b1f 100644
--- a/net/smc/smc_tx.h
+++ b/net/smc/smc_tx.h
@@ -31,6 +31,8 @@ void smc_tx_pending(struct smc_connection *conn);
 void smc_tx_work(struct work_struct *work);
 void smc_tx_init(struct smc_sock *smc);
 int smc_tx_sendmsg(struct smc_sock *smc, struct msghdr *msg, size_t len);
+int smc_tx_sendpage(struct smc_sock *smc, struct page *page, int offset,
+		    size_t size, int flags);
 int smc_tx_sndbuf_nonempty(struct smc_connection *conn);
 void smc_tx_sndbuf_nonfull(struct smc_sock *smc);
 void smc_tx_consumer_update(struct smc_connection *conn, bool force);
-- 
2.32.0.3.g01195cf9f


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 1/3] net/smc: Send directly when TCP_CORK is cleared
  2022-01-30 18:02 ` [PATCH net-next 1/3] net/smc: Send directly when TCP_CORK is cleared Tony Lu
@ 2022-01-31 19:13   ` Stefan Raspl
  2022-02-07 10:03     ` Tony Lu
  2022-02-11  6:52     ` [PATCH net-next] net/smc: Add comment for smc_tx_pending Tony Lu
  0 siblings, 2 replies; 14+ messages in thread
From: Stefan Raspl @ 2022-01-31 19:13 UTC (permalink / raw)
  To: Tony Lu, kgraul, kuba, davem; +Cc: netdev, linux-s390

On 1/30/22 19:02, Tony Lu wrote:
> According to the man page of TCP_CORK [1], if set, don't send out
> partial frames. All queued partial frames are sent when option is
> cleared again.
> 
> When applications call setsockopt to disable TCP_CORK, this call is
> protected by lock_sock(), and tries to mod_delayed_work() to 0, in order
> to send pending data right now. However, the delayed work smc_tx_work is
> also protected by lock_sock(). There introduces lock contention for
> sending data.
> 
> To fix it, send pending data directly which acts like TCP, without
> lock_sock() protected in the context of setsockopt (already lock_sock()ed),
> and cancel unnecessary dealyed work, which is protected by lock.
> 
> [1] https://linux.die.net/man/7/tcp
> 
> Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
> ---
>   net/smc/af_smc.c |  4 ++--
>   net/smc/smc_tx.c | 25 +++++++++++++++----------
>   net/smc/smc_tx.h |  1 +
>   3 files changed, 18 insertions(+), 12 deletions(-)
> 
> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
> index ffab9cee747d..ef021ec6b361 100644
> --- a/net/smc/af_smc.c
> +++ b/net/smc/af_smc.c
> @@ -2600,8 +2600,8 @@ static int smc_setsockopt(struct socket *sock, int level, int optname,
>   		    sk->sk_state != SMC_CLOSED) {
>   			if (!val) {
>   				SMC_STAT_INC(smc, cork_cnt);
> -				mod_delayed_work(smc->conn.lgr->tx_wq,
> -						 &smc->conn.tx_work, 0);
> +				smc_tx_pending(&smc->conn);
> +				cancel_delayed_work(&smc->conn.tx_work);
>   			}
>   		}
>   		break;
> diff --git a/net/smc/smc_tx.c b/net/smc/smc_tx.c
> index be241d53020f..7b0b6e24582f 100644
> --- a/net/smc/smc_tx.c
> +++ b/net/smc/smc_tx.c
> @@ -597,27 +597,32 @@ int smc_tx_sndbuf_nonempty(struct smc_connection *conn)
>   	return rc;
>   }
>   
> -/* Wakeup sndbuf consumers from process context
> - * since there is more data to transmit
> - */
> -void smc_tx_work(struct work_struct *work)
> +void smc_tx_pending(struct smc_connection *conn)

Could you add a comment that we're expecting lock_sock() to be held when calling 
this function?

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 2/3] net/smc: Remove corked dealyed work
  2022-01-30 18:02 ` [PATCH net-next 2/3] net/smc: Remove corked dealyed work Tony Lu
@ 2022-01-31 19:40   ` Stefan Raspl
  2022-02-11  9:10     ` Tony Lu
  0 siblings, 1 reply; 14+ messages in thread
From: Stefan Raspl @ 2022-01-31 19:40 UTC (permalink / raw)
  To: Tony Lu, kgraul, kuba, davem; +Cc: netdev, linux-s390

On 1/30/22 19:02, Tony Lu wrote:
> Based on the manual of TCP_CORK [1] and MSG_MORE [2], these two options
> have the same effect. Applications can set these options and informs the
> kernel to pend the data, and send them out only when the socket or
> syscall does not specify this flag. In other words, there's no need to
> send data out by a delayed work, which will queue a lot of work.
> 
> This removes corked delayed work with SMC_TX_CORK_DELAY (250ms), and the
> applications control how/when to send them out. It improves the
> performance for sendfile and throughput, and remove unnecessary race of
> lock_sock(). This also unlocks the limitation of sndbuf, and try to fill
> it up before sending.
> 
> [1] https://linux.die.net/man/7/tcp
> [2] https://man7.org/linux/man-pages/man2/send.2.html
> 
> Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
> ---
>   net/smc/smc_tx.c | 15 ++++++---------
>   1 file changed, 6 insertions(+), 9 deletions(-)
> 
> diff --git a/net/smc/smc_tx.c b/net/smc/smc_tx.c
> index 7b0b6e24582f..9cec62cae7cb 100644
> --- a/net/smc/smc_tx.c
> +++ b/net/smc/smc_tx.c
> @@ -31,7 +31,6 @@
>   #include "smc_tracepoint.h"
>   
>   #define SMC_TX_WORK_DELAY	0
> -#define SMC_TX_CORK_DELAY	(HZ >> 2)	/* 250 ms */
>   
>   /***************************** sndbuf producer *******************************/
>   
> @@ -237,15 +236,13 @@ int smc_tx_sendmsg(struct smc_sock *smc, struct msghdr *msg, size_t len)
>   		if ((msg->msg_flags & MSG_OOB) && !send_remaining)
>   			conn->urg_tx_pend = true;
>   		if ((msg->msg_flags & MSG_MORE || smc_tx_is_corked(smc)) &&
> -		    (atomic_read(&conn->sndbuf_space) >
> -						(conn->sndbuf_desc->len >> 1)))
> -			/* for a corked socket defer the RDMA writes if there
> -			 * is still sufficient sndbuf_space available
> +		    (atomic_read(&conn->sndbuf_space)))
> +			/* for a corked socket defer the RDMA writes if
> +			 * sndbuf_space is still available. The applications
> +			 * should known how/when to uncork it.
>   			 */
> -			queue_delayed_work(conn->lgr->tx_wq, &conn->tx_work,
> -					   SMC_TX_CORK_DELAY);
> -		else
> -			smc_tx_sndbuf_nonempty(conn);
> +			continue;

In case we just corked the final bytes in this call, wouldn't this 'continue' 
prevent us from accounting the Bytes that we just staged to be sent out later in 
the trace_smc_tx_sendmsg() call below?

> +		smc_tx_sndbuf_nonempty(conn);
>   
>   		trace_smc_tx_sendmsg(smc, copylen);



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 0/3] net/smc: Improvements for TCP_CORK and sendfile()
  2022-01-30 18:02 [PATCH net-next 0/3] net/smc: Improvements for TCP_CORK and sendfile() Tony Lu
                   ` (2 preceding siblings ...)
  2022-01-30 18:02 ` [PATCH net-next 3/3] net/smc: Cork when sendpage with MSG_SENDPAGE_NOTLAST flag Tony Lu
@ 2022-01-31 19:42 ` Jakub Kicinski
  3 siblings, 0 replies; 14+ messages in thread
From: Jakub Kicinski @ 2022-01-31 19:42 UTC (permalink / raw)
  To: Tony Lu; +Cc: kgraul, davem, netdev, linux-s390

On Mon, 31 Jan 2022 02:02:54 +0800 Tony Lu wrote:
> Currently, SMC use default implement for syscall sendfile() [1], which
> is wildly used in nginx and big data sences. Usually, applications use
> sendfile() with TCP_CORK:
> 
> fstat(20, {st_mode=S_IFREG|0644, st_size=4096, ...}) = 0
> setsockopt(19, SOL_TCP, TCP_CORK, [1], 4) = 0
> writev(19, [{iov_base="HTTP/1.1 200 OK\r\nServer: nginx/1"..., iov_len=240}], 1) = 240
> sendfile(19, 20, [0] => [4096], 4096)   = 4096
> close(20)                               = 0
> setsockopt(19, SOL_TCP, TCP_CORK, [0], 4) = 0
> 
> The above is an example of Nginx, when sendfile() on, Nginx first
> enables TCP_CORK, write headers, the data will not be sent. Then call
> sendfile(), it reads file and write to sndbuf. When TCP_CORK is cleared,
> all pending data is sent out.
> 
> The performance of the default implement of sendfile is lower than when
> it is off. After investigation, it shows two parts to improve:
> - unnecessary lock contention of delayed work
> - less data per send than when sendfile off
> 
> Patch #1 tries to reduce lock_sock() contention in smc_tx_work().
> Patch #2 removes timed work for corking, and let applications control
> it. See TCP_CORK [2] MSG_MORE [3].
> Patch #3 adds MSG_SENDPAGE_NOTLAST for corking more data when
> sendfile().
> 
> Test environments:
> - CPU Intel Xeon Platinum 8 core, mem 32 GiB, nic Mellanox CX4
> - socket sndbuf / rcvbuf: 16384 / 131072 bytes
> - server: smc_run nginx
> - client: smc_run ./wrk -c 100 -t 2 -d 30 http://192.168.100.1:8080/4k.html
> - payload: 4KB local disk file
> 
> Items                     QPS
> sendfile off        272477.10
> sendfile on (orig)  223622.79
> sendfile on (this)  395847.21
> 
> This benchmark shows +45.28% improvement compared with sendfile off, and
> +77.02% compared with original sendfile implement.
> 
> [1] https://man7.org/linux/man-pages/man2/sendfile.2.html
> [2] https://linux.die.net/man/7/tcp
> [3] https://man7.org/linux/man-pages/man2/send.2.html

I believe this is now commit 780bf05f44c2 ("Merge branch
'smc-improvements'") in net-next. Thanks!

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 3/3] net/smc: Cork when sendpage with MSG_SENDPAGE_NOTLAST flag
  2022-01-30 18:02 ` [PATCH net-next 3/3] net/smc: Cork when sendpage with MSG_SENDPAGE_NOTLAST flag Tony Lu
@ 2022-01-31 19:46   ` Stefan Raspl
  0 siblings, 0 replies; 14+ messages in thread
From: Stefan Raspl @ 2022-01-31 19:46 UTC (permalink / raw)
  To: Tony Lu, kgraul, kuba, davem; +Cc: netdev, linux-s390

On 1/30/22 19:02, Tony Lu wrote:
> This introduces a new corked flag, MSG_SENDPAGE_NOTLAST, which is
> involved in syscall sendfile() [1], it indicates this is not the last
> page. So we can cork the data until the page is not specify this flag.
> It has the same effect as MSG_MORE, but existed in sendfile() only.
> 
> This patch adds a option MSG_SENDPAGE_NOTLAST for corking data, try to
> cork more data before sending when using sendfile(), which acts like
> TCP's behaviour. Also, this reimplements the default sendpage to inform
> that it is supported to some extent.
> 
> [1] https://man7.org/linux/man-pages/man2/sendfile.2.html
> 
> Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
> ---
>   net/smc/af_smc.c |  4 +++-
>   net/smc/smc_tx.c | 19 ++++++++++++++++++-
>   net/smc/smc_tx.h |  2 ++
>   3 files changed, 23 insertions(+), 2 deletions(-)
> 
> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
> index ef021ec6b361..8b78010afe01 100644
> --- a/net/smc/af_smc.c
> +++ b/net/smc/af_smc.c
> @@ -2729,8 +2729,10 @@ static ssize_t smc_sendpage(struct socket *sock, struct page *page,
>   		rc = kernel_sendpage(smc->clcsock, page, offset,
>   				     size, flags);
>   	} else {
> +		lock_sock(sk);
> +		rc = smc_tx_sendpage(smc, page, offset, size, flags);
> +		release_sock(sk);
>   		SMC_STAT_INC(smc, sendpage_cnt);
> -		rc = sock_no_sendpage(sock, page, offset, size, flags);
>   	}
>   
>   out:
> diff --git a/net/smc/smc_tx.c b/net/smc/smc_tx.c
> index 9cec62cae7cb..a96ce162825e 100644
> --- a/net/smc/smc_tx.c
> +++ b/net/smc/smc_tx.c
> @@ -235,7 +235,8 @@ int smc_tx_sendmsg(struct smc_sock *smc, struct msghdr *msg, size_t len)
>   		 */
>   		if ((msg->msg_flags & MSG_OOB) && !send_remaining)
>   			conn->urg_tx_pend = true;
> -		if ((msg->msg_flags & MSG_MORE || smc_tx_is_corked(smc)) &&
> +		if ((msg->msg_flags & MSG_MORE || smc_tx_is_corked(smc) ||
> +		     msg->msg_flags & MSG_SENDPAGE_NOTLAST) &&
>   		    (atomic_read(&conn->sndbuf_space)))
>   			/* for a corked socket defer the RDMA writes if
>   			 * sndbuf_space is still available. The applications
> @@ -257,6 +258,22 @@ int smc_tx_sendmsg(struct smc_sock *smc, struct msghdr *msg, size_t len)
>   	return rc;
>   }
>   
> +int smc_tx_sendpage(struct smc_sock *smc, struct page *page, int offset,
> +		    size_t size, int flags)
> +{
> +	struct msghdr msg = {.msg_flags = flags};
> +	char *kaddr = kmap(page);
> +	struct kvec iov;
> +	int rc;
> +
> +	iov.iov_base = kaddr + offset;
> +	iov.iov_len = size;
> +	iov_iter_kvec(&msg.msg_iter, WRITE, &iov, 1, size);
> +	rc = smc_tx_sendmsg(smc, &msg, size);
> +	kunmap(page);
> +	return rc;
> +}
> +
>   /***************************** sndbuf consumer *******************************/
>   
>   /* sndbuf consumer: actual data transfer of one target chunk with ISM write */
> diff --git a/net/smc/smc_tx.h b/net/smc/smc_tx.h
> index a59f370b8b43..34b578498b1f 100644
> --- a/net/smc/smc_tx.h
> +++ b/net/smc/smc_tx.h
> @@ -31,6 +31,8 @@ void smc_tx_pending(struct smc_connection *conn);
>   void smc_tx_work(struct work_struct *work);
>   void smc_tx_init(struct smc_sock *smc);
>   int smc_tx_sendmsg(struct smc_sock *smc, struct msghdr *msg, size_t len);
> +int smc_tx_sendpage(struct smc_sock *smc, struct page *page, int offset,
> +		    size_t size, int flags);
>   int smc_tx_sndbuf_nonempty(struct smc_connection *conn);
>   void smc_tx_sndbuf_nonfull(struct smc_sock *smc);
>   void smc_tx_consumer_update(struct smc_connection *conn, bool force);
> 

Reviewed-by: Stefan Raspl <raspl@linux.ibm.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 1/3] net/smc: Send directly when TCP_CORK is cleared
  2022-01-31 19:13   ` Stefan Raspl
@ 2022-02-07 10:03     ` Tony Lu
  2022-02-11  6:52     ` [PATCH net-next] net/smc: Add comment for smc_tx_pending Tony Lu
  1 sibling, 0 replies; 14+ messages in thread
From: Tony Lu @ 2022-02-07 10:03 UTC (permalink / raw)
  To: Stefan Raspl; +Cc: kgraul, kuba, davem, netdev, linux-s390

On Mon, Jan 31, 2022 at 08:13:53PM +0100, Stefan Raspl wrote:
> On 1/30/22 19:02, Tony Lu wrote:
> > According to the man page of TCP_CORK [1], if set, don't send out
> > partial frames. All queued partial frames are sent when option is
> > cleared again.
> > 
> > When applications call setsockopt to disable TCP_CORK, this call is
> > protected by lock_sock(), and tries to mod_delayed_work() to 0, in order
> > to send pending data right now. However, the delayed work smc_tx_work is
> > also protected by lock_sock(). There introduces lock contention for
> > sending data.
> > 
> > To fix it, send pending data directly which acts like TCP, without
> > lock_sock() protected in the context of setsockopt (already lock_sock()ed),
> > and cancel unnecessary dealyed work, which is protected by lock.
> > 
> > [1] https://linux.die.net/man/7/tcp
> > 
> > Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
> > ---
> >   net/smc/af_smc.c |  4 ++--
> >   net/smc/smc_tx.c | 25 +++++++++++++++----------
> >   net/smc/smc_tx.h |  1 +
> >   3 files changed, 18 insertions(+), 12 deletions(-)
> > 
> > diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
> > index ffab9cee747d..ef021ec6b361 100644
> > --- a/net/smc/af_smc.c
> > +++ b/net/smc/af_smc.c
> > @@ -2600,8 +2600,8 @@ static int smc_setsockopt(struct socket *sock, int level, int optname,
> >   		    sk->sk_state != SMC_CLOSED) {
> >   			if (!val) {
> >   				SMC_STAT_INC(smc, cork_cnt);
> > -				mod_delayed_work(smc->conn.lgr->tx_wq,
> > -						 &smc->conn.tx_work, 0);
> > +				smc_tx_pending(&smc->conn);
> > +				cancel_delayed_work(&smc->conn.tx_work);
> >   			}
> >   		}
> >   		break;
> > diff --git a/net/smc/smc_tx.c b/net/smc/smc_tx.c
> > index be241d53020f..7b0b6e24582f 100644
> > --- a/net/smc/smc_tx.c
> > +++ b/net/smc/smc_tx.c
> > @@ -597,27 +597,32 @@ int smc_tx_sndbuf_nonempty(struct smc_connection *conn)
> >   	return rc;
> >   }
> > -/* Wakeup sndbuf consumers from process context
> > - * since there is more data to transmit
> > - */
> > -void smc_tx_work(struct work_struct *work)
> > +void smc_tx_pending(struct smc_connection *conn)
> 
> Could you add a comment that we're expecting lock_sock() to be held when
> calling this function?

I will add it in the next separated patch.

Thank you,
Tony Lu

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH net-next] net/smc: Add comment for smc_tx_pending
  2022-01-31 19:13   ` Stefan Raspl
  2022-02-07 10:03     ` Tony Lu
@ 2022-02-11  6:52     ` Tony Lu
  2022-02-14 11:20       ` patchwork-bot+netdevbpf
  1 sibling, 1 reply; 14+ messages in thread
From: Tony Lu @ 2022-02-11  6:52 UTC (permalink / raw)
  To: raspl; +Cc: kgraul, kuba, davem, netdev, linux-s390

The previous patch introduces a lock-free version of smc_tx_work() to
solve unnecessary lock contention, which is expected to be held lock.
So this adds comment to remind people to keep an eye out for locks.

Suggested-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
---
 net/smc/smc_tx.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/smc/smc_tx.c b/net/smc/smc_tx.c
index a96ce162825e..5df3940d4543 100644
--- a/net/smc/smc_tx.c
+++ b/net/smc/smc_tx.c
@@ -611,6 +611,10 @@ int smc_tx_sndbuf_nonempty(struct smc_connection *conn)
 	return rc;
 }
 
+/* Wakeup sndbuf consumers from process context
+ * since there is more data to transmit. The caller
+ * must hold sock lock.
+ */
 void smc_tx_pending(struct smc_connection *conn)
 {
 	struct smc_sock *smc = container_of(conn, struct smc_sock, conn);
@@ -626,7 +630,8 @@ void smc_tx_pending(struct smc_connection *conn)
 }
 
 /* Wakeup sndbuf consumers from process context
- * since there is more data to transmit
+ * since there is more data to transmit in locked
+ * sock.
  */
 void smc_tx_work(struct work_struct *work)
 {
-- 
2.32.0.3.g01195cf9f


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 2/3] net/smc: Remove corked dealyed work
  2022-01-31 19:40   ` Stefan Raspl
@ 2022-02-11  9:10     ` Tony Lu
  2022-02-14 10:29       ` Stefan Raspl
  0 siblings, 1 reply; 14+ messages in thread
From: Tony Lu @ 2022-02-11  9:10 UTC (permalink / raw)
  To: Stefan Raspl; +Cc: kgraul, kuba, davem, netdev, linux-s390

On Mon, Jan 31, 2022 at 08:40:47PM +0100, Stefan Raspl wrote:
> On 1/30/22 19:02, Tony Lu wrote:
> > Based on the manual of TCP_CORK [1] and MSG_MORE [2], these two options
> > have the same effect. Applications can set these options and informs the
> > kernel to pend the data, and send them out only when the socket or
> > syscall does not specify this flag. In other words, there's no need to
> > send data out by a delayed work, which will queue a lot of work.
> > 
> > This removes corked delayed work with SMC_TX_CORK_DELAY (250ms), and the
> > applications control how/when to send them out. It improves the
> > performance for sendfile and throughput, and remove unnecessary race of
> > lock_sock(). This also unlocks the limitation of sndbuf, and try to fill
> > it up before sending.
> > 
> > [1] https://linux.die.net/man/7/tcp
> > [2] https://man7.org/linux/man-pages/man2/send.2.html
> > 
> > Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
> > ---
> >   net/smc/smc_tx.c | 15 ++++++---------
> >   1 file changed, 6 insertions(+), 9 deletions(-)
> > 
> > diff --git a/net/smc/smc_tx.c b/net/smc/smc_tx.c
> > index 7b0b6e24582f..9cec62cae7cb 100644
> > --- a/net/smc/smc_tx.c
> > +++ b/net/smc/smc_tx.c
> > @@ -31,7 +31,6 @@
> >   #include "smc_tracepoint.h"
> >   #define SMC_TX_WORK_DELAY	0
> > -#define SMC_TX_CORK_DELAY	(HZ >> 2)	/* 250 ms */
> >   /***************************** sndbuf producer *******************************/
> > @@ -237,15 +236,13 @@ int smc_tx_sendmsg(struct smc_sock *smc, struct msghdr *msg, size_t len)
> >   		if ((msg->msg_flags & MSG_OOB) && !send_remaining)
> >   			conn->urg_tx_pend = true;
> >   		if ((msg->msg_flags & MSG_MORE || smc_tx_is_corked(smc)) &&
> > -		    (atomic_read(&conn->sndbuf_space) >
> > -						(conn->sndbuf_desc->len >> 1)))
> > -			/* for a corked socket defer the RDMA writes if there
> > -			 * is still sufficient sndbuf_space available
> > +		    (atomic_read(&conn->sndbuf_space)))
> > +			/* for a corked socket defer the RDMA writes if
> > +			 * sndbuf_space is still available. The applications
> > +			 * should known how/when to uncork it.
> >   			 */
> > -			queue_delayed_work(conn->lgr->tx_wq, &conn->tx_work,
> > -					   SMC_TX_CORK_DELAY);
> > -		else
> > -			smc_tx_sndbuf_nonempty(conn);
> > +			continue;
> 
> In case we just corked the final bytes in this call, wouldn't this
> 'continue' prevent us from accounting the Bytes that we just staged to be
> sent out later in the trace_smc_tx_sendmsg() call below?
> 
> > +		smc_tx_sndbuf_nonempty(conn);
> >   		trace_smc_tx_sendmsg(smc, copylen);
> 

If the application send out the final bytes in this call, the
application should also clear MSG_MORE or TCP_CORK flag, this action is
required based on the manuals [1] and [2]. So it is safe to cork the data
if flag is setted, and continue to the next loop until application
clears the flag.

[1] https://linux.die.net/man/7/tcp
[2] https://man7.org/linux/man-pages/man2/send.2.html

Thank you,
Tony Lu

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 2/3] net/smc: Remove corked dealyed work
  2022-02-11  9:10     ` Tony Lu
@ 2022-02-14 10:29       ` Stefan Raspl
  2022-02-14 12:10         ` Tony Lu
  0 siblings, 1 reply; 14+ messages in thread
From: Stefan Raspl @ 2022-02-14 10:29 UTC (permalink / raw)
  To: Tony Lu; +Cc: kgraul, kuba, davem, netdev, linux-s390

On 2/11/22 10:10, Tony Lu wrote:
> On Mon, Jan 31, 2022 at 08:40:47PM +0100, Stefan Raspl wrote:
>> On 1/30/22 19:02, Tony Lu wrote:
>>> Based on the manual of TCP_CORK [1] and MSG_MORE [2], these two options
>>> have the same effect. Applications can set these options and informs the
>>> kernel to pend the data, and send them out only when the socket or
>>> syscall does not specify this flag. In other words, there's no need to
>>> send data out by a delayed work, which will queue a lot of work.
>>>
>>> This removes corked delayed work with SMC_TX_CORK_DELAY (250ms), and the
>>> applications control how/when to send them out. It improves the
>>> performance for sendfile and throughput, and remove unnecessary race of
>>> lock_sock(). This also unlocks the limitation of sndbuf, and try to fill
>>> it up before sending.
>>>
>>> [1] https://linux.die.net/man/7/tcp
>>> [2] https://man7.org/linux/man-pages/man2/send.2.html
>>>
>>> Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
>>> ---
>>>    net/smc/smc_tx.c | 15 ++++++---------
>>>    1 file changed, 6 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/net/smc/smc_tx.c b/net/smc/smc_tx.c
>>> index 7b0b6e24582f..9cec62cae7cb 100644
>>> --- a/net/smc/smc_tx.c
>>> +++ b/net/smc/smc_tx.c
>>> @@ -31,7 +31,6 @@
>>>    #include "smc_tracepoint.h"
>>>    #define SMC_TX_WORK_DELAY	0
>>> -#define SMC_TX_CORK_DELAY	(HZ >> 2)	/* 250 ms */
>>>    /***************************** sndbuf producer *******************************/
>>> @@ -237,15 +236,13 @@ int smc_tx_sendmsg(struct smc_sock *smc, struct msghdr *msg, size_t len)
>>>    		if ((msg->msg_flags & MSG_OOB) && !send_remaining)
>>>    			conn->urg_tx_pend = true;
>>>    		if ((msg->msg_flags & MSG_MORE || smc_tx_is_corked(smc)) &&
>>> -		    (atomic_read(&conn->sndbuf_space) >
>>> -						(conn->sndbuf_desc->len >> 1)))
>>> -			/* for a corked socket defer the RDMA writes if there
>>> -			 * is still sufficient sndbuf_space available
>>> +		    (atomic_read(&conn->sndbuf_space)))
>>> +			/* for a corked socket defer the RDMA writes if
>>> +			 * sndbuf_space is still available. The applications
>>> +			 * should known how/when to uncork it.
>>>    			 */
>>> -			queue_delayed_work(conn->lgr->tx_wq, &conn->tx_work,
>>> -					   SMC_TX_CORK_DELAY);
>>> -		else
>>> -			smc_tx_sndbuf_nonempty(conn);
>>> +			continue;
>>
>> In case we just corked the final bytes in this call, wouldn't this
>> 'continue' prevent us from accounting the Bytes that we just staged to be
>> sent out later in the trace_smc_tx_sendmsg() call below?
>>
>>> +		smc_tx_sndbuf_nonempty(conn);
>>>    		trace_smc_tx_sendmsg(smc, copylen);
>>
> 
> If the application send out the final bytes in this call, the
> application should also clear MSG_MORE or TCP_CORK flag, this action is
> required based on the manuals [1] and [2]. So it is safe to cork the data
> if flag is setted, and continue to the next loop until application
> clears the flag.

Yes, I understand. But trace_smc_tx_sendmsg(smc, copylen) should be called for 
each portion of data that we transmit, i.e. each time we run through this loop. 
That is because parameter copylen is reset during each iteration.
Now your patch adds a 'continue', which prevents that trace_smc_tc... call from 
being made. Which means the information that 'copylen' Bytes were transferred is 
lost forever, and the accounting of tx Bytes is off by 'copylen' Bytes, I believe!

Ciao,
Stefan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next] net/smc: Add comment for smc_tx_pending
  2022-02-11  6:52     ` [PATCH net-next] net/smc: Add comment for smc_tx_pending Tony Lu
@ 2022-02-14 11:20       ` patchwork-bot+netdevbpf
  0 siblings, 0 replies; 14+ messages in thread
From: patchwork-bot+netdevbpf @ 2022-02-14 11:20 UTC (permalink / raw)
  To: Tony Lu; +Cc: raspl, kgraul, kuba, davem, netdev, linux-s390

Hello:

This patch was applied to netdev/net-next.git (master)
by David S. Miller <davem@davemloft.net>:

On Fri, 11 Feb 2022 14:52:21 +0800 you wrote:
> The previous patch introduces a lock-free version of smc_tx_work() to
> solve unnecessary lock contention, which is expected to be held lock.
> So this adds comment to remind people to keep an eye out for locks.
> 
> Suggested-by: Stefan Raspl <raspl@linux.ibm.com>
> Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
> 
> [...]

Here is the summary with links:
  - [net-next] net/smc: Add comment for smc_tx_pending
    https://git.kernel.org/netdev/net-next/c/2e13bde13153

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 2/3] net/smc: Remove corked dealyed work
  2022-02-14 10:29       ` Stefan Raspl
@ 2022-02-14 12:10         ` Tony Lu
  0 siblings, 0 replies; 14+ messages in thread
From: Tony Lu @ 2022-02-14 12:10 UTC (permalink / raw)
  To: Stefan Raspl; +Cc: kgraul, kuba, davem, netdev, linux-s390

On Mon, Feb 14, 2022 at 11:29:10AM +0100, Stefan Raspl wrote:
> On 2/11/22 10:10, Tony Lu wrote:
> > On Mon, Jan 31, 2022 at 08:40:47PM +0100, Stefan Raspl wrote:
> > > On 1/30/22 19:02, Tony Lu wrote:
> > > > Based on the manual of TCP_CORK [1] and MSG_MORE [2], these two options
> > > > have the same effect. Applications can set these options and informs the
> > > > kernel to pend the data, and send them out only when the socket or
> > > > syscall does not specify this flag. In other words, there's no need to
> > > > send data out by a delayed work, which will queue a lot of work.
> > > > 
> > > > This removes corked delayed work with SMC_TX_CORK_DELAY (250ms), and the
> > > > applications control how/when to send them out. It improves the
> > > > performance for sendfile and throughput, and remove unnecessary race of
> > > > lock_sock(). This also unlocks the limitation of sndbuf, and try to fill
> > > > it up before sending.
> > > > 
> > > > [1] https://linux.die.net/man/7/tcp
> > > > [2] https://man7.org/linux/man-pages/man2/send.2.html
> > > > 
> > > > Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
> > > > ---
> > > >    net/smc/smc_tx.c | 15 ++++++---------
> > > >    1 file changed, 6 insertions(+), 9 deletions(-)
> > > > 
> > > > diff --git a/net/smc/smc_tx.c b/net/smc/smc_tx.c
> > > > index 7b0b6e24582f..9cec62cae7cb 100644
> > > > --- a/net/smc/smc_tx.c
> > > > +++ b/net/smc/smc_tx.c
> > > > @@ -31,7 +31,6 @@
> > > >    #include "smc_tracepoint.h"
> > > >    #define SMC_TX_WORK_DELAY	0
> > > > -#define SMC_TX_CORK_DELAY	(HZ >> 2)	/* 250 ms */
> > > >    /***************************** sndbuf producer *******************************/
> > > > @@ -237,15 +236,13 @@ int smc_tx_sendmsg(struct smc_sock *smc, struct msghdr *msg, size_t len)
> > > >    		if ((msg->msg_flags & MSG_OOB) && !send_remaining)
> > > >    			conn->urg_tx_pend = true;
> > > >    		if ((msg->msg_flags & MSG_MORE || smc_tx_is_corked(smc)) &&
> > > > -		    (atomic_read(&conn->sndbuf_space) >
> > > > -						(conn->sndbuf_desc->len >> 1)))
> > > > -			/* for a corked socket defer the RDMA writes if there
> > > > -			 * is still sufficient sndbuf_space available
> > > > +		    (atomic_read(&conn->sndbuf_space)))
> > > > +			/* for a corked socket defer the RDMA writes if
> > > > +			 * sndbuf_space is still available. The applications
> > > > +			 * should known how/when to uncork it.
> > > >    			 */
> > > > -			queue_delayed_work(conn->lgr->tx_wq, &conn->tx_work,
> > > > -					   SMC_TX_CORK_DELAY);
> > > > -		else
> > > > -			smc_tx_sndbuf_nonempty(conn);
> > > > +			continue;
> > > 
> > > In case we just corked the final bytes in this call, wouldn't this
> > > 'continue' prevent us from accounting the Bytes that we just staged to be
> > > sent out later in the trace_smc_tx_sendmsg() call below?
> > > 
> > > > +		smc_tx_sndbuf_nonempty(conn);
> > > >    		trace_smc_tx_sendmsg(smc, copylen);
> > > 
> > 
> > If the application send out the final bytes in this call, the
> > application should also clear MSG_MORE or TCP_CORK flag, this action is
> > required based on the manuals [1] and [2]. So it is safe to cork the data
> > if flag is setted, and continue to the next loop until application
> > clears the flag.
> 
> Yes, I understand. But trace_smc_tx_sendmsg(smc, copylen) should be called
> for each portion of data that we transmit, i.e. each time we run through
> this loop. That is because parameter copylen is reset during each iteration.
> Now your patch adds a 'continue', which prevents that trace_smc_tc... call
> from being made. Which means the information that 'copylen' Bytes were
> transferred is lost forever, and the accounting of tx Bytes is off by
> 'copylen' Bytes, I believe!

This makes sense to me. It shouldn't be ignored if data was corked. I
will fix it in the next patch.

Thank you,
Tony Lu

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2022-02-14 12:10 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-30 18:02 [PATCH net-next 0/3] net/smc: Improvements for TCP_CORK and sendfile() Tony Lu
2022-01-30 18:02 ` [PATCH net-next 1/3] net/smc: Send directly when TCP_CORK is cleared Tony Lu
2022-01-31 19:13   ` Stefan Raspl
2022-02-07 10:03     ` Tony Lu
2022-02-11  6:52     ` [PATCH net-next] net/smc: Add comment for smc_tx_pending Tony Lu
2022-02-14 11:20       ` patchwork-bot+netdevbpf
2022-01-30 18:02 ` [PATCH net-next 2/3] net/smc: Remove corked dealyed work Tony Lu
2022-01-31 19:40   ` Stefan Raspl
2022-02-11  9:10     ` Tony Lu
2022-02-14 10:29       ` Stefan Raspl
2022-02-14 12:10         ` Tony Lu
2022-01-30 18:02 ` [PATCH net-next 3/3] net/smc: Cork when sendpage with MSG_SENDPAGE_NOTLAST flag Tony Lu
2022-01-31 19:46   ` Stefan Raspl
2022-01-31 19:42 ` [PATCH net-next 0/3] net/smc: Improvements for TCP_CORK and sendfile() Jakub Kicinski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.