* [RESEND PATCHv7 0/6] net/tls: fixes for NVMe-over-TLS
@ 2023-07-19 11:38 Hannes Reinecke
  2023-07-19 11:38 ` [PATCH 1/6] net/tls: handle MSG_EOR for tls_sw TX flow Hannes Reinecke
                   ` (5 more replies)
  0 siblings, 6 replies; 15+ messages in thread
From: Hannes Reinecke @ 2023-07-19 11:38 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sagi Grimberg, Keith Busch, linux-nvme, Jakub Kicinski,
	Eric Dumazet, Paolo Abeni, netdev, Hannes Reinecke

Hi all,

here are some small fixes to get NVMe-over-TLS up and running.
The first set contains minor modifications to have MSG_EOR handled
for TLS; the second set implements the ->read_sock() callback for
tls_sw, which could do with some reviews.

As usual, comments and reviews are welcome.

Changes to the original submission:
- Add a testcase for MSG_EOR handling

Changes to v2:
- Bail out on conflicting message flags
- Rework flag handling

Changes to v3:
- Return -EINVAL on conflicting flags
- Rebase on top of net-next

Changes to v4:
- Add tls_rx_reader_lock() to read_sock()
- Add MSG_EOR handling to tls_sw_read_sock()

Changes to v5:
- Rebase to latest upstream
- Split tls_rx_reader_lock() as suggested by Sagi

Changes to v6:
- Fixup tls_strp_read_copyin() to avoid infinite recursion
  in tls_read_sock()
- Rework tls_read_sock() to read all available data

Hannes Reinecke (6):
  net/tls: handle MSG_EOR for tls_sw TX flow
  net/tls: handle MSG_EOR for tls_device TX flow
  selftests/net/tls: add test for MSG_EOR
  net/tls: Use tcp_read_sock() instead of ops->read_sock()
  net/tls: split tls_rx_reader_lock
  net/tls: implement ->read_sock()

 net/tls/tls.h                     |   2 +
 net/tls/tls_device.c              |   6 +-
 net/tls/tls_main.c                |   2 +
 net/tls/tls_strp.c                |   3 +-
 net/tls/tls_sw.c                  | 139 ++++++++++++++++++++++++++----
 tools/testing/selftests/net/tls.c |  11 +++
 6 files changed, 143 insertions(+), 20 deletions(-)

-- 
2.35.3



* [PATCH 1/6] net/tls: handle MSG_EOR for tls_sw TX flow
  2023-07-19 11:38 [RESEND PATCHv7 0/6] net/tls: fixes for NVMe-over-TLS Hannes Reinecke
@ 2023-07-19 11:38 ` Hannes Reinecke
  2023-07-19 11:38 ` [PATCH 2/6] net/tls: handle MSG_EOR for tls_device " Hannes Reinecke
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Hannes Reinecke @ 2023-07-19 11:38 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sagi Grimberg, Keith Busch, linux-nvme, Jakub Kicinski,
	Eric Dumazet, Paolo Abeni, netdev, Hannes Reinecke

tls_sw_sendmsg() already handles MSG_MORE, but bails
out on MSG_EOR.
Seeing that MSG_EOR is effectively the opposite of
MSG_MORE, this patch adds MSG_EOR handling by treating
it as the negation of MSG_MORE, and errors out with
-EINVAL if MSG_EOR is specified together with MSG_MORE.
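
To illustrate the intended semantics, a hypothetical userspace
sender would use the flags like this (minimal sketch, error
handling omitted; 'fd' is a connected TLS socket):

	/* two chunks belonging to the same logical record */
	send(fd, buf1, len1, MSG_MORE);	/* more data will follow */
	send(fd, buf2, len2, MSG_EOR);	/* this ends the record */

	/* conflicting flags are rejected after this patch */
	send(fd, buf, len, MSG_MORE | MSG_EOR);	/* -1, errno == EINVAL */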

Cc: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 net/tls/tls_sw.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 53f944e6d8ef..9aef45e870a5 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -984,6 +984,9 @@ static int tls_sw_sendmsg_locked(struct sock *sk, struct msghdr *msg,
 	int ret = 0;
 	int pending;
 
+	if (!eor && (msg->msg_flags & MSG_EOR))
+		return -EINVAL;
+
 	if (unlikely(msg->msg_controllen)) {
 		ret = tls_process_cmsg(sk, msg, &record_type);
 		if (ret) {
@@ -1193,7 +1196,7 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 	int ret;
 
 	if (msg->msg_flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
-			       MSG_CMSG_COMPAT | MSG_SPLICE_PAGES |
+			       MSG_CMSG_COMPAT | MSG_SPLICE_PAGES | MSG_EOR |
 			       MSG_SENDPAGE_NOPOLICY))
 		return -EOPNOTSUPP;
 
-- 
2.35.3



* [PATCH 2/6] net/tls: handle MSG_EOR for tls_device TX flow
  2023-07-19 11:38 [RESEND PATCHv7 0/6] net/tls: fixes for NVMe-over-TLS Hannes Reinecke
  2023-07-19 11:38 ` [PATCH 1/6] net/tls: handle MSG_EOR for tls_sw TX flow Hannes Reinecke
@ 2023-07-19 11:38 ` Hannes Reinecke
  2023-07-19 11:38 ` [PATCH 3/6] selftests/net/tls: add test for MSG_EOR Hannes Reinecke
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Hannes Reinecke @ 2023-07-19 11:38 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sagi Grimberg, Keith Busch, linux-nvme, Jakub Kicinski,
	Eric Dumazet, Paolo Abeni, netdev, Hannes Reinecke

tls_push_data() already handles MSG_MORE, but bails out on MSG_EOR.
Seeing that MSG_EOR is effectively the opposite of MSG_MORE,
this patch adds MSG_EOR handling by treating it as the
absence of MSG_MORE.
Consequently we return an error when both flags are set.

Cc: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 net/tls/tls_device.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index 2021fe557e50..5df18f696d7f 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -441,9 +441,13 @@ static int tls_push_data(struct sock *sk,
 	long timeo;
 
 	if (flags &
-	    ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL | MSG_SPLICE_PAGES))
+	    ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
+	      MSG_SPLICE_PAGES | MSG_EOR))
 		return -EOPNOTSUPP;
 
+	if ((flags & (MSG_MORE | MSG_EOR)) == (MSG_MORE | MSG_EOR))
+		return -EINVAL;
+
 	if (unlikely(sk->sk_err))
 		return -sk->sk_err;
 
-- 
2.35.3



* [PATCH 3/6] selftests/net/tls: add test for MSG_EOR
  2023-07-19 11:38 [RESEND PATCHv7 0/6] net/tls: fixes for NVMe-over-TLS Hannes Reinecke
  2023-07-19 11:38 ` [PATCH 1/6] net/tls: handle MSG_EOR for tls_sw TX flow Hannes Reinecke
  2023-07-19 11:38 ` [PATCH 2/6] net/tls: handle MSG_EOR for tls_device " Hannes Reinecke
@ 2023-07-19 11:38 ` Hannes Reinecke
  2023-07-19 11:38 ` [PATCH 4/6] net/tls: Use tcp_read_sock() instead of ops->read_sock() Hannes Reinecke
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Hannes Reinecke @ 2023-07-19 11:38 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sagi Grimberg, Keith Busch, linux-nvme, Jakub Kicinski,
	Eric Dumazet, Paolo Abeni, netdev, Hannes Reinecke

As recent patches modify the TLS behaviour with respect to MSG_EOR
handling, add a test for it.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
---
 tools/testing/selftests/net/tls.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/tools/testing/selftests/net/tls.c b/tools/testing/selftests/net/tls.c
index a3c57004344c..4b63708c6a81 100644
--- a/tools/testing/selftests/net/tls.c
+++ b/tools/testing/selftests/net/tls.c
@@ -486,6 +486,17 @@ TEST_F(tls, msg_more_unsent)
 	EXPECT_EQ(recv(self->cfd, buf, send_len, MSG_DONTWAIT), -1);
 }
 
+TEST_F(tls, msg_eor)
+{
+	char const *test_str = "test_read";
+	int send_len = 10;
+	char buf[10];
+
+	EXPECT_EQ(send(self->fd, test_str, send_len, MSG_EOR), send_len);
+	EXPECT_EQ(recv(self->cfd, buf, send_len, MSG_WAITALL), send_len);
+	EXPECT_EQ(memcmp(buf, test_str, send_len), 0);
+}
+
 TEST_F(tls, sendmsg_single)
 {
 	struct msghdr msg;
-- 
2.35.3



* [PATCH 4/6] net/tls: Use tcp_read_sock() instead of ops->read_sock()
  2023-07-19 11:38 [RESEND PATCHv7 0/6] net/tls: fixes for NVMe-over-TLS Hannes Reinecke
                   ` (2 preceding siblings ...)
  2023-07-19 11:38 ` [PATCH 3/6] selftests/net/tls: add test for MSG_EOR Hannes Reinecke
@ 2023-07-19 11:38 ` Hannes Reinecke
  2023-07-19 11:54   ` Sagi Grimberg
  2023-07-19 11:38 ` [PATCH 5/6] net/tls: split tls_rx_reader_lock Hannes Reinecke
  2023-07-19 11:38 ` [PATCH 6/6] net/tls: implement ->read_sock() Hannes Reinecke
  5 siblings, 1 reply; 15+ messages in thread
From: Hannes Reinecke @ 2023-07-19 11:38 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sagi Grimberg, Keith Busch, linux-nvme, Jakub Kicinski,
	Eric Dumazet, Paolo Abeni, netdev, Hannes Reinecke

TLS resets the protocol operations, so the read_sock() callback
might be changed, too.
In that case using sock->ops->read_sock() in tls_strp_read_copyin()
enters an infinite recursion if the read_sock() callback calls
tls_rx_rec_wait(), which will call back into sock->ops->read_sock()
via tls_strp_read_copyin().
But as tls_strp_read_copyin() is supposed to produce data from the
consumed socket, and that socket is always a TCP socket, we can call
tcp_read_sock() directly without having to deal with callbacks.
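
For illustration, the recursion would look like this (a sketch of
the call chain once the TLS ->read_sock() from the last patch of
this series is installed):

	tls_strp_read_copyin()
	  -> sock->ops->read_sock()        /* now the TLS implementation */
	    -> tls_rx_rec_wait()
	      -> tls_strp_read_copyin()    /* ...and around again */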

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 net/tls/tls_strp.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/tls/tls_strp.c b/net/tls/tls_strp.c
index f37f4a0fcd3c..ca1e0e198ceb 100644
--- a/net/tls/tls_strp.c
+++ b/net/tls/tls_strp.c
@@ -369,7 +369,6 @@ static int tls_strp_copyin(read_descriptor_t *desc, struct sk_buff *in_skb,
 
 static int tls_strp_read_copyin(struct tls_strparser *strp)
 {
-	struct socket *sock = strp->sk->sk_socket;
 	read_descriptor_t desc;
 
 	desc.arg.data = strp;
@@ -377,7 +376,7 @@ static int tls_strp_read_copyin(struct tls_strparser *strp)
 	desc.count = 1; /* give more than one skb per call */
 
 	/* sk should be locked here, so okay to do read_sock */
-	sock->ops->read_sock(strp->sk, &desc, tls_strp_copyin);
+	tcp_read_sock(strp->sk, &desc, tls_strp_copyin);
 
 	return desc.error;
 }
-- 
2.35.3



* [PATCH 5/6] net/tls: split tls_rx_reader_lock
  2023-07-19 11:38 [RESEND PATCHv7 0/6] net/tls: fixes for NVMe-over-TLS Hannes Reinecke
                   ` (3 preceding siblings ...)
  2023-07-19 11:38 ` [PATCH 4/6] net/tls: Use tcp_read_sock() instead of ops->read_sock() Hannes Reinecke
@ 2023-07-19 11:38 ` Hannes Reinecke
  2023-07-19 11:55   ` Sagi Grimberg
  2023-07-19 11:38 ` [PATCH 6/6] net/tls: implement ->read_sock() Hannes Reinecke
  5 siblings, 1 reply; 15+ messages in thread
From: Hannes Reinecke @ 2023-07-19 11:38 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sagi Grimberg, Keith Busch, linux-nvme, Jakub Kicinski,
	Eric Dumazet, Paolo Abeni, netdev, Hannes Reinecke

Split tls_rx_reader_{lock,unlock} into an 'acquire/release' part and
the actual socket locking part.
With that the RX reader lock can also be taken in situations where
the socket is already locked.
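
A minimal sketch of the resulting usage patterns (assuming the
function names introduced by this patch):

	/* caller does not hold the socket lock: unchanged behaviour */
	err = tls_rx_reader_lock(sk, ctx, nonblock);  /* lock_sock + acquire */
	if (err < 0)
		return err;
	...
	tls_rx_reader_unlock(sk, ctx);    /* release + release_sock */

	/* caller already holds the socket lock */
	err = tls_rx_reader_acquire(sk, ctx, nonblock);
	if (err < 0)
		return err;
	...
	tls_rx_reader_release(sk, ctx);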

Suggested-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 net/tls/tls_sw.c | 38 ++++++++++++++++++++++----------------
 1 file changed, 22 insertions(+), 16 deletions(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 9aef45e870a5..d0636ea13009 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1848,13 +1848,10 @@ tls_read_flush_backlog(struct sock *sk, struct tls_prot_info *prot,
 	return sk_flush_backlog(sk);
 }
 
-static int tls_rx_reader_lock(struct sock *sk, struct tls_sw_context_rx *ctx,
-			      bool nonblock)
+static int tls_rx_reader_acquire(struct sock *sk, struct tls_sw_context_rx *ctx,
+				 bool nonblock)
 {
 	long timeo;
-	int err;
-
-	lock_sock(sk);
 
 	timeo = sock_rcvtimeo(sk, nonblock);
 
@@ -1868,26 +1865,30 @@ static int tls_rx_reader_lock(struct sock *sk, struct tls_sw_context_rx *ctx,
 			      !READ_ONCE(ctx->reader_present), &wait);
 		remove_wait_queue(&ctx->wq, &wait);
 
-		if (timeo <= 0) {
-			err = -EAGAIN;
-			goto err_unlock;
-		}
-		if (signal_pending(current)) {
-			err = sock_intr_errno(timeo);
-			goto err_unlock;
-		}
+		if (timeo <= 0)
+			return -EAGAIN;
+		if (signal_pending(current))
+			return sock_intr_errno(timeo);
 	}
 
 	WRITE_ONCE(ctx->reader_present, 1);
 
 	return 0;
+}
 
-err_unlock:
-	release_sock(sk);
+static int tls_rx_reader_lock(struct sock *sk, struct tls_sw_context_rx *ctx,
+			      bool nonblock)
+{
+	int err;
+
+	lock_sock(sk);
+	err = tls_rx_reader_acquire(sk, ctx, nonblock);
+	if (err)
+		release_sock(sk);
 	return err;
 }
 
-static void tls_rx_reader_unlock(struct sock *sk, struct tls_sw_context_rx *ctx)
+static void tls_rx_reader_release(struct sock *sk, struct tls_sw_context_rx *ctx)
 {
 	if (unlikely(ctx->reader_contended)) {
 		if (wq_has_sleeper(&ctx->wq))
@@ -1899,6 +1900,11 @@ static void tls_rx_reader_unlock(struct sock *sk, struct tls_sw_context_rx *ctx)
 	}
 
 	WRITE_ONCE(ctx->reader_present, 0);
+}
+
+static void tls_rx_reader_unlock(struct sock *sk, struct tls_sw_context_rx *ctx)
+{
+	tls_rx_reader_release(sk, ctx);
 	release_sock(sk);
 }
 
-- 
2.35.3



* [PATCH 6/6] net/tls: implement ->read_sock()
  2023-07-19 11:38 [RESEND PATCHv7 0/6] net/tls: fixes for NVMe-over-TLS Hannes Reinecke
                   ` (4 preceding siblings ...)
  2023-07-19 11:38 ` [PATCH 5/6] net/tls: split tls_rx_reader_lock Hannes Reinecke
@ 2023-07-19 11:38 ` Hannes Reinecke
  2023-07-19 11:55   ` Sagi Grimberg
                     ` (2 more replies)
  5 siblings, 3 replies; 15+ messages in thread
From: Hannes Reinecke @ 2023-07-19 11:38 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sagi Grimberg, Keith Busch, linux-nvme, Jakub Kicinski,
	Eric Dumazet, Paolo Abeni, netdev, Hannes Reinecke,
	Boris Pismenny

Implement ->read_sock() function for use with nvme-tcp.
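
As an illustration, a consumer such as nvme-tcp would call it
roughly like this (hypothetical sketch; nvme_tcp_recv_actor() and
'queue' are assumed names, not part of this patch):

	static int nvme_tcp_recv_actor(read_descriptor_t *desc,
				       struct sk_buff *skb,
				       unsigned int offset, size_t len)
	{
		/* consume up to 'len' decrypted bytes from 'skb' */
		return len;	/* bytes actually consumed */
	}

	read_descriptor_t desc = { .arg.data = queue, .count = 1 };

	queue->sock->ops->read_sock(queue->sock->sk, &desc,
				    nvme_tcp_recv_actor);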

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Cc: Boris Pismenny <boris.pismenny@gmail.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org
---
 net/tls/tls.h      |  2 +
 net/tls/tls_main.c |  2 +
 net/tls/tls_sw.c   | 96 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 100 insertions(+)

diff --git a/net/tls/tls.h b/net/tls/tls.h
index 86cef1c68e03..7e4d45537deb 100644
--- a/net/tls/tls.h
+++ b/net/tls/tls.h
@@ -110,6 +110,8 @@ bool tls_sw_sock_is_readable(struct sock *sk);
 ssize_t tls_sw_splice_read(struct socket *sock, loff_t *ppos,
 			   struct pipe_inode_info *pipe,
 			   size_t len, unsigned int flags);
+int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
+		     sk_read_actor_t read_actor);
 
 int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
 void tls_device_splice_eof(struct socket *sock);
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index b6896126bb92..7dbb8cd8f809 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -962,10 +962,12 @@ static void build_proto_ops(struct proto_ops ops[TLS_NUM_CONFIG][TLS_NUM_CONFIG]
 	ops[TLS_BASE][TLS_SW  ] = ops[TLS_BASE][TLS_BASE];
 	ops[TLS_BASE][TLS_SW  ].splice_read	= tls_sw_splice_read;
 	ops[TLS_BASE][TLS_SW  ].poll		= tls_sk_poll;
+	ops[TLS_BASE][TLS_SW  ].read_sock	= tls_sw_read_sock;
 
 	ops[TLS_SW  ][TLS_SW  ] = ops[TLS_SW  ][TLS_BASE];
 	ops[TLS_SW  ][TLS_SW  ].splice_read	= tls_sw_splice_read;
 	ops[TLS_SW  ][TLS_SW  ].poll		= tls_sk_poll;
+	ops[TLS_SW  ][TLS_SW  ].read_sock	= tls_sw_read_sock;
 
 #ifdef CONFIG_TLS_DEVICE
 	ops[TLS_HW  ][TLS_BASE] = ops[TLS_BASE][TLS_BASE];
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index d0636ea13009..4829d2cb9a7c 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -2202,6 +2202,102 @@ ssize_t tls_sw_splice_read(struct socket *sock,  loff_t *ppos,
 	goto splice_read_end;
 }
 
+int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
+		     sk_read_actor_t read_actor)
+{
+	struct tls_context *tls_ctx = tls_get_ctx(sk);
+	struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
+	struct strp_msg *rxm = NULL;
+	struct tls_msg *tlm;
+	struct sk_buff *skb;
+	struct sk_psock *psock;
+	ssize_t copied = 0;
+	bool bpf_strp_enabled;
+	int err, used;
+
+	psock = sk_psock_get(sk);
+	err = tls_rx_reader_acquire(sk, ctx, true);
+	if (err < 0)
+		goto psock_put;
+	bpf_strp_enabled = sk_psock_strp_enabled(psock);
+
+	/* If crypto failed the connection is broken */
+	err = ctx->async_wait.err;
+	if (err)
+		goto read_sock_end;
+
+	do {
+		if (!skb_queue_empty(&ctx->rx_list)) {
+			skb = __skb_dequeue(&ctx->rx_list);
+			rxm = strp_msg(skb);
+		} else {
+			struct tls_decrypt_arg darg;
+
+			err = tls_rx_rec_wait(sk, psock, true, true);
+			if (err <= 0)
+				goto read_sock_end;
+
+			memset(&darg.inargs, 0, sizeof(darg.inargs));
+			darg.zc = !bpf_strp_enabled && ctx->zc_capable;
+
+			rxm = strp_msg(tls_strp_msg(ctx));
+			tlm = tls_msg(tls_strp_msg(ctx));
+
+			/* read_sock does not support reading control messages */
+			if (tlm->control != TLS_RECORD_TYPE_DATA) {
+				err = -EINVAL;
+				goto read_sock_requeue;
+			}
+
+			if (!bpf_strp_enabled)
+				darg.async = ctx->async_capable;
+			else
+				darg.async = false;
+
+			err = tls_rx_one_record(sk, NULL, &darg);
+			if (err < 0) {
+				tls_err_abort(sk, -EBADMSG);
+				goto read_sock_end;
+			}
+
+			sk_flush_backlog(sk);
+			skb = darg.skb;
+			rxm = strp_msg(skb);
+
+			tls_rx_rec_done(ctx);
+		}
+
+		used = read_actor(desc, skb, rxm->offset, rxm->full_len);
+		if (used <= 0) {
+			if (!copied)
+				err = used;
+			goto read_sock_end;
+		}
+		copied += used;
+		if (used < rxm->full_len) {
+			rxm->offset += used;
+			rxm->full_len -= used;
+			if (!desc->count)
+				goto read_sock_requeue;
+		} else {
+			consume_skb(skb);
+			if (!desc->count)
+				skb = NULL;
+		}
+	} while (skb);
+
+read_sock_end:
+	tls_rx_reader_release(sk, ctx);
+psock_put:
+	if (psock)
+		sk_psock_put(sk, psock);
+	return copied ? : err;
+
+read_sock_requeue:
+	__skb_queue_head(&ctx->rx_list, skb);
+	goto read_sock_end;
+}
+
 bool tls_sw_sock_is_readable(struct sock *sk)
 {
 	struct tls_context *tls_ctx = tls_get_ctx(sk);
-- 
2.35.3



* Re: [PATCH 4/6] net/tls: Use tcp_read_sock() instead of ops->read_sock()
  2023-07-19 11:38 ` [PATCH 4/6] net/tls: Use tcp_read_sock() instead of ops->read_sock() Hannes Reinecke
@ 2023-07-19 11:54   ` Sagi Grimberg
  0 siblings, 0 replies; 15+ messages in thread
From: Sagi Grimberg @ 2023-07-19 11:54 UTC (permalink / raw)
  To: Hannes Reinecke, Christoph Hellwig
  Cc: Keith Busch, linux-nvme, Jakub Kicinski, Eric Dumazet,
	Paolo Abeni, netdev

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>


* Re: [PATCH 6/6] net/tls: implement ->read_sock()
  2023-07-19 11:38 ` [PATCH 6/6] net/tls: implement ->read_sock() Hannes Reinecke
@ 2023-07-19 11:55   ` Sagi Grimberg
  2023-07-20 18:32   ` Simon Horman
  2023-07-21  3:02   ` Jakub Kicinski
  2 siblings, 0 replies; 15+ messages in thread
From: Sagi Grimberg @ 2023-07-19 11:55 UTC (permalink / raw)
  To: Hannes Reinecke, Christoph Hellwig
  Cc: Keith Busch, linux-nvme, Jakub Kicinski, Eric Dumazet,
	Paolo Abeni, netdev, Boris Pismenny


> Implement ->read_sock() function for use with nvme-tcp.

This implementation looks better.
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>


* Re: [PATCH 5/6] net/tls: split tls_rx_reader_lock
  2023-07-19 11:38 ` [PATCH 5/6] net/tls: split tls_rx_reader_lock Hannes Reinecke
@ 2023-07-19 11:55   ` Sagi Grimberg
  0 siblings, 0 replies; 15+ messages in thread
From: Sagi Grimberg @ 2023-07-19 11:55 UTC (permalink / raw)
  To: Hannes Reinecke, Christoph Hellwig
  Cc: Keith Busch, linux-nvme, Jakub Kicinski, Eric Dumazet,
	Paolo Abeni, netdev

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>


* Re: [PATCH 6/6] net/tls: implement ->read_sock()
  2023-07-19 11:38 ` [PATCH 6/6] net/tls: implement ->read_sock() Hannes Reinecke
  2023-07-19 11:55   ` Sagi Grimberg
@ 2023-07-20 18:32   ` Simon Horman
  2023-07-21  6:03     ` Hannes Reinecke
  2023-07-21  3:02   ` Jakub Kicinski
  2 siblings, 1 reply; 15+ messages in thread
From: Simon Horman @ 2023-07-20 18:32 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Christoph Hellwig, Sagi Grimberg, Keith Busch, linux-nvme,
	Jakub Kicinski, Eric Dumazet, Paolo Abeni, netdev,
	Boris Pismenny

On Wed, Jul 19, 2023 at 01:38:36PM +0200, Hannes Reinecke wrote:

...

Hi Hannes,

> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> index d0636ea13009..4829d2cb9a7c 100644
> --- a/net/tls/tls_sw.c
> +++ b/net/tls/tls_sw.c
> @@ -2202,6 +2202,102 @@ ssize_t tls_sw_splice_read(struct socket *sock,  loff_t *ppos,
>  	goto splice_read_end;
>  }
>  
> +int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
> +		     sk_read_actor_t read_actor)
> +{
> +	struct tls_context *tls_ctx = tls_get_ctx(sk);
> +	struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
> +	struct strp_msg *rxm = NULL;
> +	struct tls_msg *tlm;
> +	struct sk_buff *skb;
> +	struct sk_psock *psock;
> +	ssize_t copied = 0;
> +	bool bpf_strp_enabled;
> +	int err, used;
> +
> +	psock = sk_psock_get(sk);
> +	err = tls_rx_reader_acquire(sk, ctx, true);
> +	if (err < 0)
> +		goto psock_put;

skb is uninitialised here; however, it is used
in the psock_put unwind path.

Flagged by gcc-12 [-Wmaybe-uninitialized] and Smatch.
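
A minimal sketch of one possible fix would be to initialise the
pointer at its declaration:

	struct sk_buff *skb = NULL;

so the loop condition and any early exits see a defined value.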

> +	bpf_strp_enabled = sk_psock_strp_enabled(psock);
> +
> +	/* If crypto failed the connection is broken */
> +	err = ctx->async_wait.err;
> +	if (err)
> +		goto read_sock_end;

Likewise, here.

> +
> +	do {
> +		if (!skb_queue_empty(&ctx->rx_list)) {
> +			skb = __skb_dequeue(&ctx->rx_list);
> +			rxm = strp_msg(skb);
> +		} else {
> +			struct tls_decrypt_arg darg;
> +
> +			err = tls_rx_rec_wait(sk, psock, true, true);
> +			if (err <= 0)
> +				goto read_sock_end;
> +
> +			memset(&darg.inargs, 0, sizeof(darg.inargs));
> +			darg.zc = !bpf_strp_enabled && ctx->zc_capable;
> +
> +			rxm = strp_msg(tls_strp_msg(ctx));
> +			tlm = tls_msg(tls_strp_msg(ctx));
> +
> +			/* read_sock does not support reading control messages */
> +			if (tlm->control != TLS_RECORD_TYPE_DATA) {
> +				err = -EINVAL;
> +				goto read_sock_requeue;
> +			}
> +
> +			if (!bpf_strp_enabled)
> +				darg.async = ctx->async_capable;
> +			else
> +				darg.async = false;
> +
> +			err = tls_rx_one_record(sk, NULL, &darg);
> +			if (err < 0) {
> +				tls_err_abort(sk, -EBADMSG);
> +				goto read_sock_end;
> +			}
> +
> +			sk_flush_backlog(sk);
> +			skb = darg.skb;
> +			rxm = strp_msg(skb);
> +
> +			tls_rx_rec_done(ctx);
> +		}
> +
> +		used = read_actor(desc, skb, rxm->offset, rxm->full_len);
> +		if (used <= 0) {
> +			if (!copied)
> +				err = used;
> +			goto read_sock_end;
> +		}
> +		copied += used;
> +		if (used < rxm->full_len) {
> +			rxm->offset += used;
> +			rxm->full_len -= used;
> +			if (!desc->count)
> +				goto read_sock_requeue;
> +		} else {
> +			consume_skb(skb);
> +			if (!desc->count)
> +				skb = NULL;
> +		}
> +	} while (skb);
> +
> +read_sock_end:
> +	tls_rx_reader_release(sk, ctx);
> +psock_put:
> +	if (psock)
> +		sk_psock_put(sk, psock);
> +	return copied ? : err;
> +
> +read_sock_requeue:
> +	__skb_queue_head(&ctx->rx_list, skb);
> +	goto read_sock_end;
> +}
> +
>  bool tls_sw_sock_is_readable(struct sock *sk)
>  {
>  	struct tls_context *tls_ctx = tls_get_ctx(sk);


* Re: [PATCH 6/6] net/tls: implement ->read_sock()
  2023-07-19 11:38 ` [PATCH 6/6] net/tls: implement ->read_sock() Hannes Reinecke
  2023-07-19 11:55   ` Sagi Grimberg
  2023-07-20 18:32   ` Simon Horman
@ 2023-07-21  3:02   ` Jakub Kicinski
  2023-07-21 13:53     ` Hannes Reinecke
  2 siblings, 1 reply; 15+ messages in thread
From: Jakub Kicinski @ 2023-07-21  3:02 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Christoph Hellwig, Sagi Grimberg, Keith Busch, linux-nvme,
	Eric Dumazet, Paolo Abeni, netdev, Boris Pismenny

On Wed, 19 Jul 2023 13:38:36 +0200 Hannes Reinecke wrote:
> Implement ->read_sock() function for use with nvme-tcp.

> +int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
> +		     sk_read_actor_t read_actor)
> +{
> +	struct tls_context *tls_ctx = tls_get_ctx(sk);
> +	struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
> +	struct strp_msg *rxm = NULL;
> +	struct tls_msg *tlm;
> +	struct sk_buff *skb;
> +	struct sk_psock *psock;
> +	ssize_t copied = 0;
> +	bool bpf_strp_enabled;

bubble up the longer lines, like this:

+	struct tls_context *tls_ctx = tls_get_ctx(sk);
+	struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
+	struct strp_msg *rxm = NULL;
+	struct sk_psock *psock;
+	bool bpf_strp_enabled;
+	struct tls_msg *tlm;
+	struct sk_buff *skb;
+	ssize_t copied = 0;
+	int err, used;

> +	int err, used;
> +
> +	psock = sk_psock_get(sk);
> +	err = tls_rx_reader_acquire(sk, ctx, true);
> +	if (err < 0)
> +		goto psock_put;
> +	bpf_strp_enabled = sk_psock_strp_enabled(psock);

You're not servicing the BPF out of band queue, just error out if
the BPF psock is enabled. It's barely used and endlessly buggy anyway.
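
A sketch of what that could look like (the errno choice here is an
assumption, not taken from this thread):

	psock = sk_psock_get(sk);
	if (psock && sk_psock_strp_enabled(psock)) {
		sk_psock_put(sk, psock);
		return -EOPNOTSUPP;	/* BPF strparser not serviced here */
	}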

> +	/* If crypto failed the connection is broken */
> +	err = ctx->async_wait.err;
> +	if (err)
> +		goto read_sock_end;
> +
> +	do {
> +		if (!skb_queue_empty(&ctx->rx_list)) {
> +			skb = __skb_dequeue(&ctx->rx_list);
> +			rxm = strp_msg(skb);
> +		} else {
> +			struct tls_decrypt_arg darg;
> +
> +			err = tls_rx_rec_wait(sk, psock, true, true);
> +			if (err <= 0)
> +				goto read_sock_end;
> +
> +			memset(&darg.inargs, 0, sizeof(darg.inargs));
> +			darg.zc = !bpf_strp_enabled && ctx->zc_capable;

And what are you zero-copying into my friend? zc == zero copy.
Leave the zc be 0, like splice does, otherwise passing msg=NULL
to tls_rx_one_record() may explode. Testing with TLS 1.2 should
show that.

> +			rxm = strp_msg(tls_strp_msg(ctx));
> +			tlm = tls_msg(tls_strp_msg(ctx));
> +
> +			/* read_sock does not support reading control messages */
> +			if (tlm->control != TLS_RECORD_TYPE_DATA) {
> +				err = -EINVAL;
> +				goto read_sock_requeue;
> +			}
> +
> +			if (!bpf_strp_enabled)
> +				darg.async = ctx->async_capable;
> +			else
> +				darg.async = false;

Also don't bother with async. TLS 1.3 can't do async, anyway,
and I don't think you wait for the completion :S

> +			err = tls_rx_one_record(sk, NULL, &darg);
> +			if (err < 0) {
> +				tls_err_abort(sk, -EBADMSG);
> +				goto read_sock_end;
> +			}
> +
> +			sk_flush_backlog(sk);

Hm, could be a bit often but okay.

> +			skb = darg.skb;
> +			rxm = strp_msg(skb);
> +
> +			tls_rx_rec_done(ctx);
> +		}
> +
> +		used = read_actor(desc, skb, rxm->offset, rxm->full_len);
> +		if (used <= 0) {
> +			if (!copied)
> +				err = used;
> +			goto read_sock_end;

You have to requeue on error.
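
A minimal sketch of the requeue (reusing the read_sock_requeue label
from the patch above, so the record is put back on rx_list instead
of being lost):

	used = read_actor(desc, skb, rxm->offset, rxm->full_len);
	if (used <= 0) {
		if (!copied)
			err = used;
		goto read_sock_requeue;
	}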

> +		}
> +		copied += used;
> +		if (used < rxm->full_len) {
> +			rxm->offset += used;
> +			rxm->full_len -= used;
> +			if (!desc->count)
> +				goto read_sock_requeue;

And here. Like splice_read does. Otherwise you leak the skb.

> +		} else {
> +			consume_skb(skb);
> +			if (!desc->count)
> +				skb = NULL;
> +		}
> +	} while (skb);
> +
> +read_sock_end:
> +	tls_rx_reader_release(sk, ctx);
> +psock_put:
> +	if (psock)
> +		sk_psock_put(sk, psock);
> +	return copied ? : err;
> +
> +read_sock_requeue:
> +	__skb_queue_head(&ctx->rx_list, skb);
> +	goto read_sock_end;
> +}
> +
>  bool tls_sw_sock_is_readable(struct sock *sk)
>  {
>  	struct tls_context *tls_ctx = tls_get_ctx(sk);



* Re: [PATCH 6/6] net/tls: implement ->read_sock()
  2023-07-20 18:32   ` Simon Horman
@ 2023-07-21  6:03     ` Hannes Reinecke
  0 siblings, 0 replies; 15+ messages in thread
From: Hannes Reinecke @ 2023-07-21  6:03 UTC (permalink / raw)
  To: Simon Horman
  Cc: Christoph Hellwig, Sagi Grimberg, Keith Busch, linux-nvme,
	Jakub Kicinski, Eric Dumazet, Paolo Abeni, netdev,
	Boris Pismenny

On 7/20/23 20:32, Simon Horman wrote:
> On Wed, Jul 19, 2023 at 01:38:36PM +0200, Hannes Reinecke wrote:
> 
> ...
> 
> Hi Hannes,
> 
>> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
>> index d0636ea13009..4829d2cb9a7c 100644
>> --- a/net/tls/tls_sw.c
>> +++ b/net/tls/tls_sw.c
>> @@ -2202,6 +2202,102 @@ ssize_t tls_sw_splice_read(struct socket *sock,  loff_t *ppos,
>>   	goto splice_read_end;
>>   }
>>   
>> +int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
>> +		     sk_read_actor_t read_actor)
>> +{
>> +	struct tls_context *tls_ctx = tls_get_ctx(sk);
>> +	struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
>> +	struct strp_msg *rxm = NULL;
>> +	struct tls_msg *tlm;
>> +	struct sk_buff *skb;
>> +	struct sk_psock *psock;
>> +	ssize_t copied = 0;
>> +	bool bpf_strp_enabled;
>> +	int err, used;
>> +
>> +	psock = sk_psock_get(sk);
>> +	err = tls_rx_reader_acquire(sk, ctx, true);
>> +	if (err < 0)
>> +		goto psock_put;
> 
> skb is uninitialised here,
> however, it is used in the psock_put unwind path.
> 
> Flagged by gcc-12 [-Wmaybe-uninitialized] and Smatch.
> 
Will fix it up in the next round.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman



* Re: [PATCH 6/6] net/tls: implement ->read_sock()
  2023-07-21  3:02   ` Jakub Kicinski
@ 2023-07-21 13:53     ` Hannes Reinecke
  2023-07-21 14:50       ` Jakub Kicinski
  0 siblings, 1 reply; 15+ messages in thread
From: Hannes Reinecke @ 2023-07-21 13:53 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Christoph Hellwig, Sagi Grimberg, Keith Busch, linux-nvme,
	Eric Dumazet, Paolo Abeni, netdev, Boris Pismenny

On 7/21/23 05:02, Jakub Kicinski wrote:
> On Wed, 19 Jul 2023 13:38:36 +0200 Hannes Reinecke wrote:
>> Implement ->read_sock() function for use with nvme-tcp.
> 
>> +int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
>> +		     sk_read_actor_t read_actor)
>> +{
>> +	struct tls_context *tls_ctx = tls_get_ctx(sk);
>> +	struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
>> +	struct strp_msg *rxm = NULL;
>> +	struct tls_msg *tlm;
>> +	struct sk_buff *skb;
>> +	struct sk_psock *psock;
>> +	ssize_t copied = 0;
>> +	bool bpf_strp_enabled;
> 
> bubble up the longer lines, like this:
> 
> +	struct tls_context *tls_ctx = tls_get_ctx(sk);
> +	struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
> +	struct strp_msg *rxm = NULL;
> +	struct sk_psock *psock;
> +	bool bpf_strp_enabled;
> +	struct tls_msg *tlm;
> +	struct sk_buff *skb;
> +	ssize_t copied = 0;
> +	int err, used;
> 
Ok.

>> +	int err, used;
>> +
>> +	psock = sk_psock_get(sk);
>> +	err = tls_rx_reader_acquire(sk, ctx, true);
>> +	if (err < 0)
>> +		goto psock_put;
>> +	bpf_strp_enabled = sk_psock_strp_enabled(psock);
> 
> You're not servicing the BPF out of band queue, just error out if
> the BPF psock is enabled. It's barely used and endlessly buggy anyway.
> 
Have been wondering about that; will do.

>> +	/* If crypto failed the connection is broken */
>> +	err = ctx->async_wait.err;
>> +	if (err)
>> +		goto read_sock_end;
>> +
>> +	do {
>> +		if (!skb_queue_empty(&ctx->rx_list)) {
>> +			skb = __skb_dequeue(&ctx->rx_list);
>> +			rxm = strp_msg(skb);
>> +		} else {
>> +			struct tls_decrypt_arg darg;
>> +
>> +			err = tls_rx_rec_wait(sk, psock, true, true);
>> +			if (err <= 0)
>> +				goto read_sock_end;
>> +
>> +			memset(&darg.inargs, 0, sizeof(darg.inargs));
>> +			darg.zc = !bpf_strp_enabled && ctx->zc_capable;
> 
> And what are you zero-copying into my friend? zc == zero copy.
> Leave the zc be 0, like splice does, otherwise passing msg=NULL
> to tls_rx_one_record() may explode. Testing with TLS 1.2 should
> show that.
> 
Still beginning to learn how the stream parser works.
(And have been wondering about the 'msg == NULL' case, too).
Will fix it.

>> +			rxm = strp_msg(tls_strp_msg(ctx));
>> +			tlm = tls_msg(tls_strp_msg(ctx));
>> +
>> +			/* read_sock does not support reading control messages */
>> +			if (tlm->control != TLS_RECORD_TYPE_DATA) {
>> +				err = -EINVAL;
>> +				goto read_sock_requeue;
>> +			}
>> +
>> +			if (!bpf_strp_enabled)
>> +				darg.async = ctx->async_capable;
>> +			else
>> +				darg.async = false;
> 
> Also don't bother with async. TLS 1.3 can't do async, anyway,
> and I don't think you wait for the completion :S
> 
Ok.

>> +			err = tls_rx_one_record(sk, NULL, &darg);
>> +			if (err < 0) {
>> +				tls_err_abort(sk, -EBADMSG);
>> +				goto read_sock_end;
>> +			}
>> +
>> +			sk_flush_backlog(sk);
> 
> Hm, could be a bit often but okay.
> 
When would you suggest to do it?
(Do I need to do it at all?)

>> +			skb = darg.skb;
>> +			rxm = strp_msg(skb);
>> +
>> +			tls_rx_rec_done(ctx);
>> +		}
>> +
>> +		used = read_actor(desc, skb, rxm->offset, rxm->full_len);
>> +		if (used <= 0) {
>> +			if (!copied)
>> +				err = used;
>> +			goto read_sock_end;
> 
> You have to requeue on error.
> 
Ah, right. Did it in the previous version, but somehow got
lost here.

Will be fixing it up.

>> +		}
>> +		copied += used;
>> +		if (used < rxm->full_len) {
>> +			rxm->offset += used;
>> +			rxm->full_len -= used;
>> +			if (!desc->count)
>> +				goto read_sock_requeue;
> 
> And here. Like splice_read does. Otherwise you leak the skb.
> 
Will do.

Thanks for the review!

Cheers,

Hannes




* Re: [PATCH 6/6] net/tls: implement ->read_sock()
  2023-07-21 13:53     ` Hannes Reinecke
@ 2023-07-21 14:50       ` Jakub Kicinski
  0 siblings, 0 replies; 15+ messages in thread
From: Jakub Kicinski @ 2023-07-21 14:50 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Christoph Hellwig, Sagi Grimberg, Keith Busch, linux-nvme,
	Eric Dumazet, Paolo Abeni, netdev, Boris Pismenny

On Fri, 21 Jul 2023 15:53:05 +0200 Hannes Reinecke wrote:
> >> +			err = tls_rx_one_record(sk, NULL, &darg);
> >> +			if (err < 0) {
> >> +				tls_err_abort(sk, -EBADMSG);
> >> +				goto read_sock_end;
> >> +			}
> >> +
> >> +			sk_flush_backlog(sk);  
> > 
> > Hm, could be a bit often but okay.
> >   
> When would you suggest to do it?
> (Do I need to do it at all?)

I picked every 128kB for the normal Rx path. I looked thru my old notes
and it seems I didn't measure smaller batches :( Only 128kB - 4MB.
Flush every 128kB was showing a 4% throughput hit, but much better TCP
behavior. Not sure how the perf hit would scale below 128kB, maybe 
the lower the threshold the lower the overhead because statistically
only one of every 4 calls will have something to do? (GRO coalesces
to 64kB == 4 TLS records). Dunno. You'd have to measure.

But it's not a blocker for me; FWIW, you can keep the flushing on
every record.
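
For reference, batching the flush would look roughly like this
(hypothetical sketch; 'decrypted' is an illustrative byte counter,
with the 128kB threshold taken from the numbers above):

	decrypted += rxm->full_len;
	if (decrypted >= 128 * 1024) {
		sk_flush_backlog(sk);
		decrypted = 0;
	}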



Thread overview: 15+ messages
2023-07-19 11:38 [RESEND PATCHv7 0/6] net/tls: fixes for NVMe-over-TLS Hannes Reinecke
2023-07-19 11:38 ` [PATCH 1/6] net/tls: handle MSG_EOR for tls_sw TX flow Hannes Reinecke
2023-07-19 11:38 ` [PATCH 2/6] net/tls: handle MSG_EOR for tls_device " Hannes Reinecke
2023-07-19 11:38 ` [PATCH 3/6] selftests/net/tls: add test for MSG_EOR Hannes Reinecke
2023-07-19 11:38 ` [PATCH 4/6] net/tls: Use tcp_read_sock() instead of ops->read_sock() Hannes Reinecke
2023-07-19 11:54   ` Sagi Grimberg
2023-07-19 11:38 ` [PATCH 5/6] net/tls: split tls_rx_reader_lock Hannes Reinecke
2023-07-19 11:55   ` Sagi Grimberg
2023-07-19 11:38 ` [PATCH 6/6] net/tls: implement ->read_sock() Hannes Reinecke
2023-07-19 11:55   ` Sagi Grimberg
2023-07-20 18:32   ` Simon Horman
2023-07-21  6:03     ` Hannes Reinecke
2023-07-21  3:02   ` Jakub Kicinski
2023-07-21 13:53     ` Hannes Reinecke
2023-07-21 14:50       ` Jakub Kicinski
