* [PATCH 0/2] tls, add unhash callback
@ 2019-06-27 17:36 John Fastabend
  2019-06-27 17:36 ` [PATCH 1/2] tls: remove close callback sock unlock/lock and flush_sync John Fastabend
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: John Fastabend @ 2019-06-27 17:36 UTC (permalink / raw)
  To: daniel, jakub.kicinski, ast; +Cc: netdev, edumazet, john.fastabend, bpf

Resolve a series of splats discovered by syzbot and noted by
Eric Dumazet. The primary problem here is that we resolved an issue
on the BPF sockmap side by adding an unhash callback, which is
required to ensure sockmap sockets do not transition out of
ESTABLISHED state into a LISTEN state. Doing this created a case
where the interaction between the TLS and sockmap callbacks, when
used together, could break. This resulted in leaked TLS memory and
the potential to build loops of callbacks where sockmap called into
TLS and TLS called back into BPF.

Additionally, TLS was releasing the sock lock and then
reacquiring it during the teardown process, which could hang
if another sock operation happened while the lock was not
held.

To fix this, first refactor the TLS code so the lock is held for
the entire teardown operation. Then add an unhash callback to ensure
TLS cannot transition from ESTABLISHED to LISTEN state. This
transition is a similar bug to the one found and fixed previously
in sockmap. The series also cleans up the callbacks, which fixes
the syzbot errors.

---

John Fastabend (2):
      tls: remove close callback sock unlock/lock and flush_sync
      bpf: tls, implement unhash to avoid transition out of ESTABLISHED


 include/net/tls.h  |    6 ++-
 net/tls/tls_main.c |   96 ++++++++++++++++++++++++++++++++++++----------------
 net/tls/tls_sw.c   |   50 ++++++++++++++++++---------
 3 files changed, 103 insertions(+), 49 deletions(-)

--
Signature

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/2] tls: remove close callback sock unlock/lock and flush_sync
  2019-06-27 17:36 [PATCH 0/2] tls, add unhash callback John Fastabend
@ 2019-06-27 17:36 ` John Fastabend
  2019-06-27 23:44   ` Jakub Kicinski
  2019-06-27 17:36 ` [PATCH 2/2] bpf: tls, implement unhash to avoid transition out of ESTABLISHED John Fastabend
  2019-06-27 18:16 ` [PATCH 0/2] tls, add unhash callback John Fastabend
  2 siblings, 1 reply; 12+ messages in thread
From: John Fastabend @ 2019-06-27 17:36 UTC (permalink / raw)
  To: daniel, jakub.kicinski, ast; +Cc: netdev, edumazet, john.fastabend, bpf

The tls close() callback currently drops the sock lock, makes a
cancel_delayed_work_sync() call, and then relocks the sock. This
seems suspect at best. The lock_sock() is applied to stop concurrent
operations on the socket while tearing the sock down. Further, we
will need to add support for unhash() shortly, which complicates
matters because the lock may or may not be held at that point.

So, to fix the above situation and simplify the next patch that adds
unhash, this patch creates a function tls_sk_proto_cleanup() that
tears down the socket without calling lock_sock/release_sock. In
order to flush the workqueue we then do the following:

  - Add a new bit to ctx, BIT_TX_CLOSING, that is set when the
    tls resources are being removed.
  - Check this bit before scheduling any new work. This way we
    avoid queueing new work after teardown has started.
  - With BIT_TX_CLOSING ensuring no new work is being added,
    convert the cancel_delayed_work_sync() to flush_delayed_work().
  - Finally, call tls_tx_records() to complete any available records
    before releasing and removing the tls ctx (a condensed sketch of
    this ordering follows below).
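
Condensed, and for orientation only (the real changes are in the
diff below), the ordering amounts to something like the following;
tls_tx_teardown_sketch() is an illustrative name, not a function
added by this patch:

   /* Orientation sketch of the teardown ordering above; uses the
    * fields from struct tls_sw_context_tx.
    */
   static void tls_tx_teardown_sketch(struct sock *sk,
                                      struct tls_sw_context_tx *ctx)
   {
           set_bit(BIT_TX_CLOSING, &ctx->tx_bitmask); /* no new tx work */
           /* ... wait for pending async encrypt requests ... */
           flush_delayed_work(&ctx->tx_work.work);    /* not cancel_..._sync() */
           tls_tx_records(sk, -1);                    /* send what we can */
           /* ... free remaining tx_list records, release and free ctx ... */
   }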

The above is implemented for the software case, namely any of
the following configurations from build_protos:

   prot[TLS_SW][TLS_BASE]
   prot[TLS_BASE][TLS_SW]
   prot[TLS_SW][TLS_SW]

The implication is that a follow-up patch is needed to resolve the
hardware offload case.

Tested with net selftests and bpf selftests.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 include/net/tls.h  |    4 ++--
 net/tls/tls_main.c |   54 ++++++++++++++++++++++++++--------------------------
 net/tls/tls_sw.c   |   50 ++++++++++++++++++++++++++++++++----------------
 3 files changed, 62 insertions(+), 46 deletions(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index 4a55ce6a303f..6fe1f5c96f4a 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -105,9 +105,7 @@ struct tls_device {
 enum {
 	TLS_BASE,
 	TLS_SW,
-#ifdef CONFIG_TLS_DEVICE
 	TLS_HW,
-#endif
 	TLS_HW_RECORD,
 	TLS_NUM_CONFIG,
 };
@@ -160,6 +158,7 @@ struct tls_sw_context_tx {
 	int async_capable;
 
 #define BIT_TX_SCHEDULED	0
+#define BIT_TX_CLOSING		1
 	unsigned long tx_bitmask;
 };
 
@@ -327,6 +326,7 @@ void tls_sw_close(struct sock *sk, long timeout);
 void tls_sw_free_resources_tx(struct sock *sk);
 void tls_sw_free_resources_rx(struct sock *sk);
 void tls_sw_release_resources_rx(struct sock *sk);
+void tls_sw_release_strp_rx(struct tls_context *tls_ctx);
 int tls_sw_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 		   int nonblock, int flags, int *addr_len);
 bool tls_sw_stream_read(const struct sock *sk);
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index fc81ae18cc44..51cb19e24dd9 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -261,24 +261,9 @@ static void tls_ctx_free(struct tls_context *ctx)
 	kfree(ctx);
 }
 
-static void tls_sk_proto_close(struct sock *sk, long timeout)
+static void tls_sk_proto_cleanup(struct sock *sk,
+				 struct tls_context *ctx, long timeo)
 {
-	struct tls_context *ctx = tls_get_ctx(sk);
-	long timeo = sock_sndtimeo(sk, 0);
-	void (*sk_proto_close)(struct sock *sk, long timeout);
-	bool free_ctx = false;
-
-	lock_sock(sk);
-	sk_proto_close = ctx->sk_proto_close;
-
-	if (ctx->tx_conf == TLS_HW_RECORD && ctx->rx_conf == TLS_HW_RECORD)
-		goto skip_tx_cleanup;
-
-	if (ctx->tx_conf == TLS_BASE && ctx->rx_conf == TLS_BASE) {
-		free_ctx = true;
-		goto skip_tx_cleanup;
-	}
-
 	if (!tls_complete_pending_work(sk, ctx, 0, &timeo))
 		tls_handle_open_record(sk, 0);
 
@@ -299,22 +284,37 @@ static void tls_sk_proto_close(struct sock *sk, long timeout)
 #ifdef CONFIG_TLS_DEVICE
 	if (ctx->rx_conf == TLS_HW)
 		tls_device_offload_cleanup_rx(sk);
-
-	if (ctx->tx_conf != TLS_HW && ctx->rx_conf != TLS_HW) {
-#else
-	{
 #endif
-		tls_ctx_free(ctx);
-		ctx = NULL;
+}
+
+static void tls_sk_proto_close(struct sock *sk, long timeout)
+{
+	struct tls_context *ctx = tls_get_ctx(sk);
+	long timeo = sock_sndtimeo(sk, 0);
+	void (*sk_proto_close)(struct sock *sk, long timeout);
+	bool free_ctx = false;
+
+	lock_sock(sk);
+	sk_proto_close = ctx->sk_proto_close;
+
+	if (ctx->tx_conf == TLS_HW_RECORD && ctx->rx_conf == TLS_HW_RECORD)
+		goto skip_tx_cleanup;
+
+	if (ctx->tx_conf == TLS_BASE && ctx->rx_conf == TLS_BASE) {
+		free_ctx = true;
+		goto skip_tx_cleanup;
 	}
 
+	tls_sk_proto_cleanup(sk, ctx, timeo);
+
 skip_tx_cleanup:
 	release_sock(sk);
+	if (ctx->rx_conf == TLS_SW)
+		tls_sw_release_strp_rx(ctx);
 	sk_proto_close(sk, timeout);
-	/* free ctx for TLS_HW_RECORD, used by tcp_set_state
-	 * for sk->sk_prot->unhash [tls_hw_unhash]
-	 */
-	if (free_ctx)
+
+	if (ctx->tx_conf != TLS_HW && ctx->rx_conf != TLS_HW &&
+	    ctx->tx_conf != TLS_HW_RECORD && ctx->rx_conf != TLS_HW_RECORD)
 		tls_ctx_free(ctx);
 }
 
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 455a782c7658..d234a6b818e6 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -473,7 +473,8 @@ static void tls_encrypt_done(struct crypto_async_request *req, int err)
 		return;
 
 	/* Schedule the transmission */
-	if (!test_and_set_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask))
+	if (!test_and_set_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask) &&
+	    !test_bit(BIT_TX_CLOSING, &ctx->tx_bitmask))
 		schedule_delayed_work(&ctx->tx_work.work, 1);
 }
 
@@ -2058,16 +2059,26 @@ void tls_sw_free_resources_tx(struct sock *sk)
 	struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx);
 	struct tls_rec *rec, *tmp;
 
+	/* Set TX CLOSING bit to stop tx_work from being scheduled
+	 * while tearing down TX context. We will flush any pending
+	 * work before free'ing ctx anyways. If already set then
+	 * another call is already free'ing resources.
+	 */
+	if (test_and_set_bit(BIT_TX_CLOSING, &ctx->tx_bitmask))
+		return;
+
 	/* Wait for any pending async encryptions to complete */
 	smp_store_mb(ctx->async_notify, true);
 	if (atomic_read(&ctx->encrypt_pending))
 		crypto_wait_req(-EINPROGRESS, &ctx->async_wait);
 
-	release_sock(sk);
-	cancel_delayed_work_sync(&ctx->tx_work.work);
-	lock_sock(sk);
-
-	/* Tx whatever records we can transmit and abandon the rest */
+	/* Flush work queue and then Tx whatever records we can
+	 * transmit and abandon the rest, lock_sock(sk) must be
+	 * held here. We ensure no further work is enqueue by
+	 * checking CLOSING bit before queueing new work and
+	 * setting it above.
+	 */
+	flush_delayed_work(&ctx->tx_work.work);
 	tls_tx_records(sk, -1);
 
 	/* Free up un-sent records in tx_list. First, free
@@ -2111,22 +2122,22 @@ void tls_sw_release_resources_rx(struct sock *sk)
 		write_lock_bh(&sk->sk_callback_lock);
 		sk->sk_data_ready = ctx->saved_data_ready;
 		write_unlock_bh(&sk->sk_callback_lock);
-		release_sock(sk);
-		strp_done(&ctx->strp);
-		lock_sock(sk);
 	}
 }
 
-void tls_sw_free_resources_rx(struct sock *sk)
+void tls_sw_release_strp_rx(struct tls_context *tls_ctx)
 {
-	struct tls_context *tls_ctx = tls_get_ctx(sk);
 	struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
 
-	tls_sw_release_resources_rx(sk);
-
+	strp_done(&ctx->strp);
 	kfree(ctx);
 }
 
+void tls_sw_free_resources_rx(struct sock *sk)
+{
+	tls_sw_release_resources_rx(sk);
+}
+
 /* The work handler to transmitt the encrypted records in tx_list */
 static void tx_work_handler(struct work_struct *work)
 {
@@ -2140,9 +2151,14 @@ static void tx_work_handler(struct work_struct *work)
 	if (!test_and_clear_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask))
 		return;
 
-	lock_sock(sk);
+	/* If we are running from a socket close operation then the
+	 * lock is already held so we do not need to hold it.
+	 */
+	if (likely(!test_bit(BIT_TX_CLOSING, &ctx->tx_bitmask)))
+		lock_sock(sk);
 	tls_tx_records(sk, -1);
-	release_sock(sk);
+	if (likely(!test_bit(BIT_TX_CLOSING, &ctx->tx_bitmask)))
+		release_sock(sk);
 }
 
 void tls_sw_write_space(struct sock *sk, struct tls_context *ctx)
@@ -2152,8 +2168,8 @@ void tls_sw_write_space(struct sock *sk, struct tls_context *ctx)
 	/* Schedule the transmission if tx list is ready */
 	if (is_tx_ready(tx_ctx) && !sk->sk_write_pending) {
 		/* Schedule the transmission */
-		if (!test_and_set_bit(BIT_TX_SCHEDULED,
-				      &tx_ctx->tx_bitmask))
+		if (!test_and_set_bit(BIT_TX_SCHEDULED, &tx_ctx->tx_bitmask) &&
+		    !test_bit(BIT_TX_CLOSING, &tx_ctx->tx_bitmask))
 			schedule_delayed_work(&tx_ctx->tx_work.work, 0);
 	}
 }


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/2] bpf: tls, implement unhash to avoid transition out of ESTABLISHED
  2019-06-27 17:36 [PATCH 0/2] tls, add unhash callback John Fastabend
  2019-06-27 17:36 ` [PATCH 1/2] tls: remove close callback sock unlock/lock and flush_sync John Fastabend
@ 2019-06-27 17:36 ` John Fastabend
  2019-06-27 18:16 ` [PATCH 0/2] tls, add unhash callback John Fastabend
  2 siblings, 0 replies; 12+ messages in thread
From: John Fastabend @ 2019-06-27 17:36 UTC (permalink / raw)
  To: daniel, jakub.kicinski, ast; +Cc: netdev, edumazet, john.fastabend, bpf

It is possible (via shutdown()) for TCP socks to go through the
TCP_CLOSE state via tcp_disconnect() without calling into the close
callback. This would allow a kTLS-enabled socket to exist outside of
ESTABLISHED state, which is not supported.

Solve this the same way we solved the sock{map|hash} case, by adding
an unhash hook to tear down the TLS state.

Tested with bpf and net selftests, plus ran syzkaller reproducers
for the issues listed below.

Fixes: d91c3e17f75f2 ("net/tls: Only attach to sockets in ESTABLISHED state")
Reported-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot+4207c7f3a443366d8aa2@syzkaller.appspotmail.com
Reported-by: syzbot+06537213db7ba2745c4a@syzkaller.appspotmail.com
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 include/net/tls.h  |    2 ++
 net/tls/tls_main.c |   50 +++++++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 45 insertions(+), 7 deletions(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index 6fe1f5c96f4a..935d65606bb3 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -264,6 +264,8 @@ struct tls_context {
 	bool in_tcp_sendpages;
 	bool pending_open_record_frags;
 
+	struct proto *sk_proto;
+
 	int (*push_pending_record)(struct sock *sk, int flags);
 
 	void (*sk_write_space)(struct sock *sk);
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 51cb19e24dd9..e1750634a53a 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -251,11 +251,16 @@ static void tls_write_space(struct sock *sk)
 	ctx->sk_write_space(sk);
 }
 
-static void tls_ctx_free(struct tls_context *ctx)
+static void tls_ctx_free(struct sock *sk, struct tls_context *ctx)
 {
+	struct inet_connection_sock *icsk = inet_csk(sk);
+
 	if (!ctx)
 		return;
 
+	sk->sk_prot = ctx->sk_proto;
+	icsk->icsk_ulp_data = NULL;
+
 	memzero_explicit(&ctx->crypto_send, sizeof(ctx->crypto_send));
 	memzero_explicit(&ctx->crypto_recv, sizeof(ctx->crypto_recv));
 	kfree(ctx);
@@ -287,23 +292,49 @@ static void tls_sk_proto_cleanup(struct sock *sk,
 #endif
 }
 
+static void tls_sk_proto_unhash(struct sock *sk)
+{
+	struct tls_context *ctx = tls_get_ctx(sk);
+	void (*sk_proto_unhash)(struct sock *sk);
+	long timeo = sock_sndtimeo(sk, 0);
+
+	if (unlikely(!ctx)) {
+		if (sk->sk_prot->unhash)
+			sk->sk_prot->unhash(sk);
+		return;
+	}
+
+	sk->sk_prot = ctx->sk_proto;
+	sk_proto_unhash = ctx->unhash;
+	tls_sk_proto_cleanup(sk, ctx, timeo);
+	if (ctx->rx_conf == TLS_SW)
+		tls_sw_release_strp_rx(ctx);
+	tls_ctx_free(sk, ctx);
+	if (sk_proto_unhash)
+		sk_proto_unhash(sk);
+}
+
 static void tls_sk_proto_close(struct sock *sk, long timeout)
 {
 	struct tls_context *ctx = tls_get_ctx(sk);
 	long timeo = sock_sndtimeo(sk, 0);
 	void (*sk_proto_close)(struct sock *sk, long timeout);
-	bool free_ctx = false;
+
+	if (unlikely(!ctx)) {
+		if (sk->sk_prot->close)
+			sk->sk_prot->close(sk, timeout);
+		return;
+	}
 
 	lock_sock(sk);
+	sk->sk_prot = ctx->sk_proto;
 	sk_proto_close = ctx->sk_proto_close;
 
 	if (ctx->tx_conf == TLS_HW_RECORD && ctx->rx_conf == TLS_HW_RECORD)
 		goto skip_tx_cleanup;
 
-	if (ctx->tx_conf == TLS_BASE && ctx->rx_conf == TLS_BASE) {
-		free_ctx = true;
+	if (ctx->tx_conf == TLS_BASE && ctx->rx_conf == TLS_BASE)
 		goto skip_tx_cleanup;
-	}
 
 	tls_sk_proto_cleanup(sk, ctx, timeo);
 
@@ -311,11 +342,12 @@ static void tls_sk_proto_close(struct sock *sk, long timeout)
 	release_sock(sk);
 	if (ctx->rx_conf == TLS_SW)
 		tls_sw_release_strp_rx(ctx);
-	sk_proto_close(sk, timeout);
 
 	if (ctx->tx_conf != TLS_HW && ctx->rx_conf != TLS_HW &&
 	    ctx->tx_conf != TLS_HW_RECORD && ctx->rx_conf != TLS_HW_RECORD)
-		tls_ctx_free(ctx);
+		tls_ctx_free(sk, ctx);
+	if (sk_proto_close)
+		sk_proto_close(sk, timeout);
 }
 
 static int do_tls_getsockopt_tx(struct sock *sk, char __user *optval,
@@ -733,16 +765,19 @@ static void build_protos(struct proto prot[TLS_NUM_CONFIG][TLS_NUM_CONFIG],
 	prot[TLS_SW][TLS_BASE] = prot[TLS_BASE][TLS_BASE];
 	prot[TLS_SW][TLS_BASE].sendmsg		= tls_sw_sendmsg;
 	prot[TLS_SW][TLS_BASE].sendpage		= tls_sw_sendpage;
+	prot[TLS_SW][TLS_BASE].unhash		= tls_sk_proto_unhash;
 
 	prot[TLS_BASE][TLS_SW] = prot[TLS_BASE][TLS_BASE];
 	prot[TLS_BASE][TLS_SW].recvmsg		  = tls_sw_recvmsg;
 	prot[TLS_BASE][TLS_SW].stream_memory_read = tls_sw_stream_read;
 	prot[TLS_BASE][TLS_SW].close		  = tls_sk_proto_close;
+	prot[TLS_BASE][TLS_SW].unhash		= tls_sk_proto_unhash;
 
 	prot[TLS_SW][TLS_SW] = prot[TLS_SW][TLS_BASE];
 	prot[TLS_SW][TLS_SW].recvmsg		= tls_sw_recvmsg;
 	prot[TLS_SW][TLS_SW].stream_memory_read	= tls_sw_stream_read;
 	prot[TLS_SW][TLS_SW].close		= tls_sk_proto_close;
+	prot[TLS_SW][TLS_SW].unhash		= tls_sk_proto_unhash;
 
 #ifdef CONFIG_TLS_DEVICE
 	prot[TLS_HW][TLS_BASE] = prot[TLS_BASE][TLS_BASE];
@@ -793,6 +828,7 @@ static int tls_init(struct sock *sk)
 	tls_build_proto(sk);
 	ctx->tx_conf = TLS_BASE;
 	ctx->rx_conf = TLS_BASE;
+	ctx->sk_proto = sk->sk_prot;
 	update_sk_prot(sk, ctx);
 out:
 	return rc;


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* RE: [PATCH 0/2] tls, add unhash callback
  2019-06-27 17:36 [PATCH 0/2] tls, add unhash callback John Fastabend
  2019-06-27 17:36 ` [PATCH 1/2] tls: remove close callback sock unlock/lock and flush_sync John Fastabend
  2019-06-27 17:36 ` [PATCH 2/2] bpf: tls, implement unhash to avoid transition out of ESTABLISHED John Fastabend
@ 2019-06-27 18:16 ` John Fastabend
  2 siblings, 0 replies; 12+ messages in thread
From: John Fastabend @ 2019-06-27 18:16 UTC (permalink / raw)
  To: John Fastabend, daniel, jakub.kicinski, ast
  Cc: netdev, edumazet, john.fastabend, bpf

John Fastabend wrote:
> Resolve a series of splats discovered by syzbot and noted by
> Eric Dumazet. The primary problem here is we resolved an issue on
> the BPF sockmap side by adding an unhash callback. This is
> required to ensure sockmap sockets do not transition out of
> ESTABLISHED state into a LISTEN state. When we did this it
> created a case where the interaction between callbacks in TLS
> and sockmap when used together could break. This resulted in
> leaking TLS memory and potential to build loops of callbacks
> where sockmap called into TLS and TLS called back into BPF.
> 
> Additionally, TLS was releasing the sock lock and then
> reaquiring it during the tear down process which could hang
> if another sock operation happened while the lock was not
> held.
> 
> To fix this first refactor TLS code so lock is held for the
> entire teardown operation. Then add an unhash callback to ensure
> TLS can not transition from ESTABLISHED to LISTEN state. This
> transition is a similar bug to the one found and fixed previously
> in sockmap. And cleans up the callbacks to fix the syzbot
> errors.
> 
> ---
>

Jakub,

If you could test this for the offload case that would
be helpful. I don't have any hardware here. We will still need
a few fixes in the unhash/hardware case, but it would be good to
know we don't cause any regressions here.

Thanks,
John

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] tls: remove close callback sock unlock/lock and flush_sync
  2019-06-27 17:36 ` [PATCH 1/2] tls: remove close callback sock unlock/lock and flush_sync John Fastabend
@ 2019-06-27 23:44   ` Jakub Kicinski
  2019-06-28 14:12     ` John Fastabend
  0 siblings, 1 reply; 12+ messages in thread
From: Jakub Kicinski @ 2019-06-27 23:44 UTC (permalink / raw)
  To: John Fastabend; +Cc: daniel, ast, netdev, edumazet, bpf

On Thu, 27 Jun 2019 10:36:42 -0700, John Fastabend wrote:
> The tls close() callback currently drops the sock lock, makes a
> cancel_delayed_work_sync() call, and then relocks the sock. This
> seems suspect at best. The lock_sock() is applied to stop concurrent
> operations on the socket while tearing the sock down. Further we
> will need to add support for unhash() shortly and this complicates
> matters because the lock may or may not be held then.
> 
> So to fix the above situation and simplify the next patch to add
> unhash this patch creates a function tls_sk_proto_cleanup() that
> tears down the socket without calling lock_sock/release_sock. In
> order to flush the workqueue then we do the following,
> 
>   - Add a new bit to ctx, BIT_TX_CLOSING that is set when the
>     tls resources are being removed.
>   - Check this bit before scheduling any new work. This way we
>     avoid queueing new work after tear down has started.
>   - With the BIT_TX_CLOSING ensuring no new work is being added
>     convert the cancel_delayed_work_sync to flush_delayed_work()
>   - Finally call tlx_tx_records() to complete any available records
>     before,
>   - releasing and removing tls ctx.
> 
> The above is implemented for the software case namely any of
> the following configurations from build_protos,
> 
>    prot[TLS_SW][TLS_BASE]
>    prot[TLS_BASE][TLS_SW]
>    prot[TLS_SW][TLS_SW]
> 
> The implication is a follow up patch is needed to resolve the
> hardware offload case.
> 
> Tested with net selftests and bpf selftests.
> 
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> ---
>  include/net/tls.h  |    4 ++--
>  net/tls/tls_main.c |   54 ++++++++++++++++++++++++++--------------------------
>  net/tls/tls_sw.c   |   50 ++++++++++++++++++++++++++++++++----------------
>  3 files changed, 62 insertions(+), 46 deletions(-)
> 
> diff --git a/include/net/tls.h b/include/net/tls.h
> index 4a55ce6a303f..6fe1f5c96f4a 100644
> --- a/include/net/tls.h
> +++ b/include/net/tls.h
> @@ -105,9 +105,7 @@ struct tls_device {
>  enum {
>  	TLS_BASE,
>  	TLS_SW,
> -#ifdef CONFIG_TLS_DEVICE
>  	TLS_HW,
> -#endif
>  	TLS_HW_RECORD,
>  	TLS_NUM_CONFIG,
>  };
> @@ -160,6 +158,7 @@ struct tls_sw_context_tx {
>  	int async_capable;
>  
>  #define BIT_TX_SCHEDULED	0

BTW do you understand why we track this bit separately?  Just to avoid
the irq operations in the workqueue code?

> +#define BIT_TX_CLOSING		1

But since we do have the above, and I think it's tested everywhere,
wouldn't setting SCHEDULED without actually scheduling have
effectively the same result?

>  	unsigned long tx_bitmask;
>  };
>  
> @@ -327,6 +326,7 @@ void tls_sw_close(struct sock *sk, long timeout);
>  void tls_sw_free_resources_tx(struct sock *sk);
>  void tls_sw_free_resources_rx(struct sock *sk);
>  void tls_sw_release_resources_rx(struct sock *sk);
> +void tls_sw_release_strp_rx(struct tls_context *tls_ctx);
>  int tls_sw_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
>  		   int nonblock, int flags, int *addr_len);
>  bool tls_sw_stream_read(const struct sock *sk);
> diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
> index fc81ae18cc44..51cb19e24dd9 100644
> --- a/net/tls/tls_main.c
> +++ b/net/tls/tls_main.c
> @@ -261,24 +261,9 @@ static void tls_ctx_free(struct tls_context *ctx)
>  	kfree(ctx);
>  }
>  
> -static void tls_sk_proto_close(struct sock *sk, long timeout)
> +static void tls_sk_proto_cleanup(struct sock *sk,
> +				 struct tls_context *ctx, long timeo)
>  {
> -	struct tls_context *ctx = tls_get_ctx(sk);
> -	long timeo = sock_sndtimeo(sk, 0);
> -	void (*sk_proto_close)(struct sock *sk, long timeout);
> -	bool free_ctx = false;
> -
> -	lock_sock(sk);
> -	sk_proto_close = ctx->sk_proto_close;
> -
> -	if (ctx->tx_conf == TLS_HW_RECORD && ctx->rx_conf == TLS_HW_RECORD)
> -		goto skip_tx_cleanup;
> -
> -	if (ctx->tx_conf == TLS_BASE && ctx->rx_conf == TLS_BASE) {
> -		free_ctx = true;
> -		goto skip_tx_cleanup;
> -	}
> -
>  	if (!tls_complete_pending_work(sk, ctx, 0, &timeo))
>  		tls_handle_open_record(sk, 0);
>  
> @@ -299,22 +284,37 @@ static void tls_sk_proto_close(struct sock *sk, long timeout)
>  #ifdef CONFIG_TLS_DEVICE
>  	if (ctx->rx_conf == TLS_HW)
>  		tls_device_offload_cleanup_rx(sk);
> -
> -	if (ctx->tx_conf != TLS_HW && ctx->rx_conf != TLS_HW) {
> -#else
> -	{
>  #endif
> -		tls_ctx_free(ctx);
> -		ctx = NULL;
> +}
> +
> +static void tls_sk_proto_close(struct sock *sk, long timeout)
> +{
> +	struct tls_context *ctx = tls_get_ctx(sk);
> +	long timeo = sock_sndtimeo(sk, 0);
> +	void (*sk_proto_close)(struct sock *sk, long timeout);
> +	bool free_ctx = false;

Set but not used?

> +
> +	lock_sock(sk);
> +	sk_proto_close = ctx->sk_proto_close;
> +
> +	if (ctx->tx_conf == TLS_HW_RECORD && ctx->rx_conf == TLS_HW_RECORD)
> +		goto skip_tx_cleanup;
> +
> +	if (ctx->tx_conf == TLS_BASE && ctx->rx_conf == TLS_BASE) {
> +		free_ctx = true;
> +		goto skip_tx_cleanup;
>  	}
>  
> +	tls_sk_proto_cleanup(sk, ctx, timeo);
> +
>  skip_tx_cleanup:
>  	release_sock(sk);
> +	if (ctx->rx_conf == TLS_SW)
> +		tls_sw_release_strp_rx(ctx);
>  	sk_proto_close(sk, timeout);
> -	/* free ctx for TLS_HW_RECORD, used by tcp_set_state
> -	 * for sk->sk_prot->unhash [tls_hw_unhash]
> -	 */
> -	if (free_ctx)
> +
> +	if (ctx->tx_conf != TLS_HW && ctx->rx_conf != TLS_HW &&
> +	    ctx->tx_conf != TLS_HW_RECORD && ctx->rx_conf != TLS_HW_RECORD)
>  		tls_ctx_free(ctx);
>  }
>  
> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> index 455a782c7658..d234a6b818e6 100644
> --- a/net/tls/tls_sw.c
> +++ b/net/tls/tls_sw.c
> @@ -473,7 +473,8 @@ static void tls_encrypt_done(struct crypto_async_request *req, int err)
>  		return;
>  
>  	/* Schedule the transmission */
> -	if (!test_and_set_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask))
> +	if (!test_and_set_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask) &&
> +	    !test_bit(BIT_TX_CLOSING, &ctx->tx_bitmask))

Probably doesn't matter, but it seems like the CLOSING test should
come before the test_and_set().

>  		schedule_delayed_work(&ctx->tx_work.work, 1);
>  }
>  
> @@ -2058,16 +2059,26 @@ void tls_sw_free_resources_tx(struct sock *sk)
>  	struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx);
>  	struct tls_rec *rec, *tmp;
>  
> +	/* Set TX CLOSING bit to stop tx_work from being scheduled
> +	 * while tearing down TX context. We will flush any pending
> +	 * work before free'ing ctx anyways. If already set then
> +	 * another call is already free'ing resources.
> +	 */

Oh, can we get multiple calls here?  Is this prep for unhash?

> +	if (test_and_set_bit(BIT_TX_CLOSING, &ctx->tx_bitmask))
> +		return;
> +
>  	/* Wait for any pending async encryptions to complete */
>  	smp_store_mb(ctx->async_notify, true);
>  	if (atomic_read(&ctx->encrypt_pending))
>  		crypto_wait_req(-EINPROGRESS, &ctx->async_wait);
>  
> -	release_sock(sk);
> -	cancel_delayed_work_sync(&ctx->tx_work.work);
> -	lock_sock(sk);
> -
> -	/* Tx whatever records we can transmit and abandon the rest */
> +	/* Flush work queue and then Tx whatever records we can
> +	 * transmit and abandon the rest, lock_sock(sk) must be
> +	 * held here. We ensure no further work is enqueue by
> +	 * checking CLOSING bit before queueing new work and
> +	 * setting it above.
> +	 */
> +	flush_delayed_work(&ctx->tx_work.work);
>  	tls_tx_records(sk, -1);
>  
>  	/* Free up un-sent records in tx_list. First, free
> @@ -2111,22 +2122,22 @@ void tls_sw_release_resources_rx(struct sock *sk)
>  		write_lock_bh(&sk->sk_callback_lock);
>  		sk->sk_data_ready = ctx->saved_data_ready;
>  		write_unlock_bh(&sk->sk_callback_lock);
> -		release_sock(sk);
> -		strp_done(&ctx->strp);
> -		lock_sock(sk);
>  	}
>  }
>  
> -void tls_sw_free_resources_rx(struct sock *sk)
> +void tls_sw_release_strp_rx(struct tls_context *tls_ctx)
>  {
> -	struct tls_context *tls_ctx = tls_get_ctx(sk);
>  	struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
>  
> -	tls_sw_release_resources_rx(sk);
> -
> +	strp_done(&ctx->strp);
>  	kfree(ctx);
>  }
>  
> +void tls_sw_free_resources_rx(struct sock *sk)
> +{
> +	tls_sw_release_resources_rx(sk);
> +}

I don't understand the RX side well enough, but perhaps a separate
patch would make sense here?

>  /* The work handler to transmitt the encrypted records in tx_list */
>  static void tx_work_handler(struct work_struct *work)
>  {
> @@ -2140,9 +2151,14 @@ static void tx_work_handler(struct work_struct *work)
>  	if (!test_and_clear_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask))
>  		return;
>  
> -	lock_sock(sk);
> +	/* If we are running from a socket close operation then the
> +	 * lock is already held so we do not need to hold it.
> +	 */
> +	if (likely(!test_bit(BIT_TX_CLOSING, &ctx->tx_bitmask)))
> +		lock_sock(sk);

	CPU 0 (free)		CPU 1 (wq)
				test_bit()
	lock(sk)
	set_bit()
				lock(sk)
	flush_work()

No?

>  	tls_tx_records(sk, -1);
> -	release_sock(sk);
> +	if (likely(!test_bit(BIT_TX_CLOSING, &ctx->tx_bitmask)))
> +		release_sock(sk);
>  }
>  
>  void tls_sw_write_space(struct sock *sk, struct tls_context *ctx)
> @@ -2152,8 +2168,8 @@ void tls_sw_write_space(struct sock *sk, struct tls_context *ctx)
>  	/* Schedule the transmission if tx list is ready */
>  	if (is_tx_ready(tx_ctx) && !sk->sk_write_pending) {
>  		/* Schedule the transmission */
> -		if (!test_and_set_bit(BIT_TX_SCHEDULED,
> -				      &tx_ctx->tx_bitmask))
> +		if (!test_and_set_bit(BIT_TX_SCHEDULED, &tx_ctx->tx_bitmask) &&
> +		    !test_bit(BIT_TX_CLOSING, &tx_ctx->tx_bitmask))
>  			schedule_delayed_work(&tx_ctx->tx_work.work, 0);
>  	}
>  }
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] tls: remove close callback sock unlock/lock and flush_sync
  2019-06-27 23:44   ` Jakub Kicinski
@ 2019-06-28 14:12     ` John Fastabend
  2019-06-28 18:31       ` Jakub Kicinski
  0 siblings, 1 reply; 12+ messages in thread
From: John Fastabend @ 2019-06-28 14:12 UTC (permalink / raw)
  To: Jakub Kicinski, John Fastabend; +Cc: daniel, ast, netdev, edumazet, bpf

Jakub Kicinski wrote:
> On Thu, 27 Jun 2019 10:36:42 -0700, John Fastabend wrote:
> > The tls close() callback currently drops the sock lock, makes a
> > cancel_delayed_work_sync() call, and then relocks the sock. This
> > seems suspect at best. The lock_sock() is applied to stop concurrent
> > operations on the socket while tearing the sock down. Further we
> > will need to add support for unhash() shortly and this complicates
> > matters because the lock may or may not be held then.
> > 
> > So to fix the above situation and simplify the next patch to add
> > unhash this patch creates a function tls_sk_proto_cleanup() that
> > tears down the socket without calling lock_sock/release_sock. In
> > order to flush the workqueue then we do the following,
> > 
> >   - Add a new bit to ctx, BIT_TX_CLOSING that is set when the
> >     tls resources are being removed.
> >   - Check this bit before scheduling any new work. This way we
> >     avoid queueing new work after tear down has started.
> >   - With the BIT_TX_CLOSING ensuring no new work is being added
> >     convert the cancel_delayed_work_sync to flush_delayed_work()
> >   - Finally call tlx_tx_records() to complete any available records
> >     before,
> >   - releasing and removing tls ctx.
> > 
> > The above is implemented for the software case namely any of
> > the following configurations from build_protos,
> > 
> >    prot[TLS_SW][TLS_BASE]
> >    prot[TLS_BASE][TLS_SW]
> >    prot[TLS_SW][TLS_SW]
> > 
> > The implication is a follow up patch is needed to resolve the
> > hardware offload case.
> > 
> > Tested with net selftests and bpf selftests.
> > 
> > Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> > ---
> >  include/net/tls.h  |    4 ++--
> >  net/tls/tls_main.c |   54 ++++++++++++++++++++++++++--------------------------
> >  net/tls/tls_sw.c   |   50 ++++++++++++++++++++++++++++++++----------------
> >  3 files changed, 62 insertions(+), 46 deletions(-)
> > 
> > diff --git a/include/net/tls.h b/include/net/tls.h
> > index 4a55ce6a303f..6fe1f5c96f4a 100644
> > --- a/include/net/tls.h
> > +++ b/include/net/tls.h
> > @@ -105,9 +105,7 @@ struct tls_device {
> >  enum {
> >  	TLS_BASE,
> >  	TLS_SW,
> > -#ifdef CONFIG_TLS_DEVICE
> >  	TLS_HW,
> > -#endif
> >  	TLS_HW_RECORD,
> >  	TLS_NUM_CONFIG,
> >  };
> > @@ -160,6 +158,7 @@ struct tls_sw_context_tx {
> >  	int async_capable;
> >  
> >  #define BIT_TX_SCHEDULED	0
> 
> BTW do you understand why we track this bit separately?  Just to avoid
> the irq operations in the workqueue code?
> 

Sorry, not sure I understand. You mean vs. simply scheduling the work
without checking the bit? Presumably it's better to avoid scheduling
unnecessary work.

> > +#define BIT_TX_CLOSING		1
> 
> But since we do have the above, and I think it's tested everywhere,
> wouldn't setting SCHEDULED without accentually scheduling have
> effectively the same result?

It would block a send from calling tls_tx_records(), but I guess that
is OK because this is a teardown operation and we are about to call
tls_tx_records() anyway.

Sure, we can do it this way; it might be slightly nicer to avoid
checking two bits.

> 
> >  	unsigned long tx_bitmask;
> >  };
> >  
> > @@ -327,6 +326,7 @@ void tls_sw_close(struct sock *sk, long timeout);
> >  void tls_sw_free_resources_tx(struct sock *sk);
> >  void tls_sw_free_resources_rx(struct sock *sk);
> >  void tls_sw_release_resources_rx(struct sock *sk);
> > +void tls_sw_release_strp_rx(struct tls_context *tls_ctx);
> >  int tls_sw_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
> >  		   int nonblock, int flags, int *addr_len);
> >  bool tls_sw_stream_read(const struct sock *sk);
> > diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
> > index fc81ae18cc44..51cb19e24dd9 100644
> > --- a/net/tls/tls_main.c
> > +++ b/net/tls/tls_main.c
> > @@ -261,24 +261,9 @@ static void tls_ctx_free(struct tls_context *ctx)
> >  	kfree(ctx);
> >  }
> >  
> > -static void tls_sk_proto_close(struct sock *sk, long timeout)
> > +static void tls_sk_proto_cleanup(struct sock *sk,
> > +				 struct tls_context *ctx, long timeo)
> >  {
> > -	struct tls_context *ctx = tls_get_ctx(sk);
> > -	long timeo = sock_sndtimeo(sk, 0);
> > -	void (*sk_proto_close)(struct sock *sk, long timeout);
> > -	bool free_ctx = false;
> > -
> > -	lock_sock(sk);
> > -	sk_proto_close = ctx->sk_proto_close;
> > -
> > -	if (ctx->tx_conf == TLS_HW_RECORD && ctx->rx_conf == TLS_HW_RECORD)
> > -		goto skip_tx_cleanup;
> > -
> > -	if (ctx->tx_conf == TLS_BASE && ctx->rx_conf == TLS_BASE) {
> > -		free_ctx = true;
> > -		goto skip_tx_cleanup;
> > -	}
> > -
> >  	if (!tls_complete_pending_work(sk, ctx, 0, &timeo))
> >  		tls_handle_open_record(sk, 0);
> >  
> > @@ -299,22 +284,37 @@ static void tls_sk_proto_close(struct sock *sk, long timeout)
> >  #ifdef CONFIG_TLS_DEVICE
> >  	if (ctx->rx_conf == TLS_HW)
> >  		tls_device_offload_cleanup_rx(sk);
> > -
> > -	if (ctx->tx_conf != TLS_HW && ctx->rx_conf != TLS_HW) {
> > -#else
> > -	{
> >  #endif
> > -		tls_ctx_free(ctx);
> > -		ctx = NULL;
> > +}
> > +
> > +static void tls_sk_proto_close(struct sock *sk, long timeout)
> > +{
> > +	struct tls_context *ctx = tls_get_ctx(sk);
> > +	long timeo = sock_sndtimeo(sk, 0);
> > +	void (*sk_proto_close)(struct sock *sk, long timeout);
> > +	bool free_ctx = false;
> 
> Set but not used?
> 

Removed in the second patch, but right, it should be removed here.

> > +
> > +	lock_sock(sk);
> > +	sk_proto_close = ctx->sk_proto_close;
> > +
> > +	if (ctx->tx_conf == TLS_HW_RECORD && ctx->rx_conf == TLS_HW_RECORD)
> > +		goto skip_tx_cleanup;
> > +
> > +	if (ctx->tx_conf == TLS_BASE && ctx->rx_conf == TLS_BASE) {
> > +		free_ctx = true;
> > +		goto skip_tx_cleanup;
> >  	}
> >  
> > +	tls_sk_proto_cleanup(sk, ctx, timeo);
> > +
> >  skip_tx_cleanup:
> >  	release_sock(sk);
> > +	if (ctx->rx_conf == TLS_SW)
> > +		tls_sw_release_strp_rx(ctx);
> >  	sk_proto_close(sk, timeout);
> > -	/* free ctx for TLS_HW_RECORD, used by tcp_set_state
> > -	 * for sk->sk_prot->unhash [tls_hw_unhash]
> > -	 */
> > -	if (free_ctx)
> > +
> > +	if (ctx->tx_conf != TLS_HW && ctx->rx_conf != TLS_HW &&
> > +	    ctx->tx_conf != TLS_HW_RECORD && ctx->rx_conf != TLS_HW_RECORD)
> >  		tls_ctx_free(ctx);
> >  }
> >  
> > diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> > index 455a782c7658..d234a6b818e6 100644
> > --- a/net/tls/tls_sw.c
> > +++ b/net/tls/tls_sw.c
> > @@ -473,7 +473,8 @@ static void tls_encrypt_done(struct crypto_async_request *req, int err)
> >  		return;
> >  
> >  	/* Schedule the transmission */
> > -	if (!test_and_set_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask))
> > +	if (!test_and_set_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask) &&
> > +	    !test_bit(BIT_TX_CLOSING, &ctx->tx_bitmask))
> 
> Probably doesn't matter but seems like CLOSING test should be before
> the test_and_set().
> 

Yeah, looks like we can drop the CLOSING bit and use the SCHEDULED
bit, which makes these a bit nicer.

> >  		schedule_delayed_work(&ctx->tx_work.work, 1);
> >  }
> >  
> > @@ -2058,16 +2059,26 @@ void tls_sw_free_resources_tx(struct sock *sk)
> >  	struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx);
> >  	struct tls_rec *rec, *tmp;
> >  
> > +	/* Set TX CLOSING bit to stop tx_work from being scheduled
> > +	 * while tearing down TX context. We will flush any pending
> > +	 * work before free'ing ctx anyways. If already set then
> > +	 * another call is already free'ing resources.
> > +	 */
> 
> Oh, can we get multiple calls here?  Is this prep for unhash?
> 

It was prep for unhash(), but there is a nicer way to get this, so
we can drop it and just ensure we reset the prot callbacks beforehand.

> > +	if (test_and_set_bit(BIT_TX_CLOSING, &ctx->tx_bitmask))
> > +		return;
> > +
> >  	/* Wait for any pending async encryptions to complete */
> >  	smp_store_mb(ctx->async_notify, true);
> >  	if (atomic_read(&ctx->encrypt_pending))
> >  		crypto_wait_req(-EINPROGRESS, &ctx->async_wait);
> >  
> > -	release_sock(sk);
> > -	cancel_delayed_work_sync(&ctx->tx_work.work);
> > -	lock_sock(sk);
> > -
> > -	/* Tx whatever records we can transmit and abandon the rest */
> > +	/* Flush work queue and then Tx whatever records we can
> > +	 * transmit and abandon the rest, lock_sock(sk) must be
> > +	 * held here. We ensure no further work is enqueue by
> > +	 * checking CLOSING bit before queueing new work and
> > +	 * setting it above.
> > +	 */
> > +	flush_delayed_work(&ctx->tx_work.work);
> >  	tls_tx_records(sk, -1);
> >  
> >  	/* Free up un-sent records in tx_list. First, free
> > @@ -2111,22 +2122,22 @@ void tls_sw_release_resources_rx(struct sock *sk)
> >  		write_lock_bh(&sk->sk_callback_lock);
> >  		sk->sk_data_ready = ctx->saved_data_ready;
> >  		write_unlock_bh(&sk->sk_callback_lock);
> > -		release_sock(sk);
> > -		strp_done(&ctx->strp);
> > -		lock_sock(sk);
> >  	}
> >  }
> >  
> > -void tls_sw_free_resources_rx(struct sock *sk)
> > +void tls_sw_release_strp_rx(struct tls_context *tls_ctx)
> >  {
> > -	struct tls_context *tls_ctx = tls_get_ctx(sk);
> >  	struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
> >  
> > -	tls_sw_release_resources_rx(sk);
> > -
> > +	strp_done(&ctx->strp);
> >  	kfree(ctx);
> >  }
> >  
> > +void tls_sw_free_resources_rx(struct sock *sk)
> > +{
> > +	tls_sw_release_resources_rx(sk);
> > +}
> 
> I don't understand the RX side well enough, but perhaps a separate
> patch would make sense here?
> 

Sure. It's actually its own fix, I guess.

> >  /* The work handler to transmitt the encrypted records in tx_list */
> >  static void tx_work_handler(struct work_struct *work)
> >  {
> > @@ -2140,9 +2151,14 @@ static void tx_work_handler(struct work_struct *work)
> >  	if (!test_and_clear_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask))
> >  		return;
> >  
> > -	lock_sock(sk);
> > +	/* If we are running from a socket close operation then the
> > +	 * lock is already held so we do not need to hold it.
> > +	 */
> > +	if (likely(!test_bit(BIT_TX_CLOSING, &ctx->tx_bitmask)))
> > +		lock_sock(sk);
> 
> 	CPU 0 (free)		CPU 1 (wq)
> 				test_bit()
> 	lock(sk)
> 	set_bit()
> 				lock(sk)
> 	flush_work()
> 
> No?
> 

Yeah, seems possible, although I never saw it in my testing. So I'll
move the test_bit() inside the lock and do a ctx check to ensure we
still have the reference.

  CPU 0 (free)           CPU 1 (wq)

  lock(sk)
                         lock(sk)
  set_bit()
  cancel_work()
  release
                         ctx = tls_get_ctx(sk)
                         unlikely(!ctx) <- we may have free'd 
                         test_bit()
                         ...
                         release()

or

  CPU 0 (free)           CPU 1 (wq)

                         lock(sk)
  lock(sk)
                         ctx = tls_get_ctx(sk)
                         unlikely(!ctx)
                         test_bit()
                         ...
                         release()
  set_bit()
  cancel_work()
  release

> >  	tls_tx_records(sk, -1);
> > -	release_sock(sk);
> > +	if (likely(!test_bit(BIT_TX_CLOSING, &ctx->tx_bitmask)))
> > +		release_sock(sk);
> >  }
> >  
> >  void tls_sw_write_space(struct sock *sk, struct tls_context *ctx)
> > @@ -2152,8 +2168,8 @@ void tls_sw_write_space(struct sock *sk, struct tls_context *ctx)
> >  	/* Schedule the transmission if tx list is ready */
> >  	if (is_tx_ready(tx_ctx) && !sk->sk_write_pending) {
> >  		/* Schedule the transmission */
> > -		if (!test_and_set_bit(BIT_TX_SCHEDULED,
> > -				      &tx_ctx->tx_bitmask))
> > +		if (!test_and_set_bit(BIT_TX_SCHEDULED, &tx_ctx->tx_bitmask) &&
> > +		    !test_bit(BIT_TX_CLOSING, &tx_ctx->tx_bitmask))
> >  			schedule_delayed_work(&tx_ctx->tx_work.work, 0);
> >  	}
> >  }
> > 
> 

Thanks,
John

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] tls: remove close callback sock unlock/lock and flush_sync
  2019-06-28 14:12     ` John Fastabend
@ 2019-06-28 18:31       ` Jakub Kicinski
  2019-06-28 19:40         ` John Fastabend
  0 siblings, 1 reply; 12+ messages in thread
From: Jakub Kicinski @ 2019-06-28 18:31 UTC (permalink / raw)
  To: John Fastabend; +Cc: daniel, ast, netdev, edumazet, bpf

On Fri, 28 Jun 2019 07:12:07 -0700, John Fastabend wrote:
> Yeah seems possible although never seen in my testing. So I'll
> move the test_bit() inside the lock and do a ctx check to ensure
> still have the reference.
> 
>   CPU 0 (free)           CPU 1 (wq)
> 
>   lock(sk)
>                          lock(sk)
>   set_bit()
>   cancel_work()
>   release
>                          ctx = tls_get_ctx(sk)
>                          unlikely(!ctx) <- we may have free'd 
>                          test_bit()
>                          ...
>                          release()
> 
> or
> 
>   CPU 0 (free)           CPU 1 (wq)
> 
>                          lock(sk)
>   lock(sk)
>                          ctx = tls_get_ctx(sk)
>                          unlikely(!ctx)
>                          test_bit()
>                          ...
>                          release()
>   set_bit()
>   cancel_work()
>   release

Hmm... perhaps it's cleanest to stop the work from scheduling before we
proceed?

close():
	while (!test_and_set(SHED))
		flush();

	lock(sk);
	...

We just need to move init work, no?
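
Spelled out as a rough C sketch (illustrative only, not a tested
patch; tls_tx_work_quiesce() is a made-up name, and the loop here
flushes while the bit is found already set, so an in-flight work item
is drained before the bit is claimed):

        static void tls_tx_work_quiesce(struct tls_sw_context_tx *ctx)
        {
                while (test_and_set_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask))
                        flush_delayed_work(&ctx->tx_work.work);
        }

Once this returns, BIT_TX_SCHEDULED stays set, so the existing
test_and_set_bit() checks keep any further tx work from being queued.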

FWIW I never tested his async crypto stuff; I wonder if there is a way
to convince normal CPU crypto to pretend to be async?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] tls: remove close callback sock unlock/lock and flush_sync
  2019-06-28 18:31       ` Jakub Kicinski
@ 2019-06-28 19:40         ` John Fastabend
  2019-06-28 22:48           ` Jakub Kicinski
  0 siblings, 1 reply; 12+ messages in thread
From: John Fastabend @ 2019-06-28 19:40 UTC (permalink / raw)
  To: Jakub Kicinski, John Fastabend; +Cc: daniel, ast, netdev, edumazet, bpf

Jakub Kicinski wrote:
> On Fri, 28 Jun 2019 07:12:07 -0700, John Fastabend wrote:
> > Yeah seems possible although never seen in my testing. So I'll
> > move the test_bit() inside the lock and do a ctx check to ensure
> > still have the reference.
> > 
> >   CPU 0 (free)           CPU 1 (wq)
> > 
> >   lock(sk)
> >                          lock(sk)
> >   set_bit()
> >   cancel_work()
> >   release
> >                          ctx = tls_get_ctx(sk)
> >                          unlikely(!ctx) <- we may have free'd 
> >                          test_bit()
> >                          ...
> >                          release()
> > 
> > or
> > 
> >   CPU 0 (free)           CPU 1 (wq)
> > 
> >                          lock(sk)
> >   lock(sk)
> >                          ctx = tls_get_ctx(sk)
> >                          unlikely(!ctx)
> >                          test_bit()
> >                          ...
> >                          release()
> >   set_bit()
> >   cancel_work()
> >   release
> 
> Hmm... perhaps it's cleanest to stop the work from scheduling before we
> proceed?
> 
> close():
> 	while (!test_and_set(SHED))
> 		flush();
> 
> 	lock(sk);
> 	...
> 
> We just need to move init work, no?

The lock is already held when entering the unhash() side, so we need
to handle this case as well:

CPU 0 (free)          CPU 1 (wq)

lock(sk)              ctx = tls_get_ctx(sk) <- need to be check null ptr
sk_prot->unhash()
  set_bit()
  cancel_work()
  ...
  kfree(ctx)
unlock(sk)

but using cancel and doing an unlikely(!ctx) check should be
sufficient to handle the wq. What I'm not sure how to solve now is
that in patch 2 of this series unhash is still calling strp_done()
with the sock lock held. Maybe we need to do a deferred release
like the sockmap side?

Trying to drop the lock and then grab it again doesn't
seem right to me; based on the comment in tcp_abort we
could potentially "race with userspace socket closes such
as tcp_close". IIRC one of the tls splats from syzbot
looked like something like this may have happened.

For now I'm considering adding a strp_cancel() op. Seeing as
we are closing the socket and tearing it down, we can probably
be OK with throwing out the strp results.

> 
> FWIW I never tested his async crypto stuff, I wonder if there is a way
> to convince normal CPU crypto to pretend to be async?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] tls: remove close callback sock unlock/lock and flush_sync
  2019-06-28 19:40         ` John Fastabend
@ 2019-06-28 22:48           ` Jakub Kicinski
  2019-06-29  0:20             ` John Fastabend
  0 siblings, 1 reply; 12+ messages in thread
From: Jakub Kicinski @ 2019-06-28 22:48 UTC (permalink / raw)
  To: John Fastabend; +Cc: daniel, ast, netdev, edumazet, bpf

On Fri, 28 Jun 2019 12:40:29 -0700, John Fastabend wrote:
> The lock() is already held when entering unhash() side so need to
> handle this case as well,
> 
> CPU 0 (free)          CPU 1 (wq)
> 
> lock(sk)              ctx = tls_get_ctx(sk) <- need to be check null ptr
> sk_prot->unhash()
>   set_bit()
>   cancel_work()
>   ...
>   kfree(ctx)
> unlock(sk)
> 
> but using cancel and doing an unlikely(!ctx) check should be
> sufficient to handle wq. 

I'm not sure we can kfree ctx, the work struct itself is in it, no?

> What I'm not sure how to solve now is
> in patch 2 of this series unhash is still calling strp_done
> with the sock lock. Maybe we need to do a deferred release
> like sockmap side?

Right, we can't do anything that sleeps in unhash, since we're holding
the spinlock there, not the "owner" lock.

> Trying to drop the lock and then grabbing it again doesn't
> seem right to me seems based on comment in tcp_abort we
> could potentially "race with userspace socket closes such
> as tcp_close". iirc I think one of the tls splats from syzbot
> looked something like this may have happened.
> 
> For now I'm considering adding a strp_cancel() op. Seeing
> we are closing() the socket and tearkng down we can probably
> be OK with throwing out strp results.

But don't we have to flush the work queues before we free ctx?  We'd
need to alloc a workqueue and schedule a work to flush the other works
and then free?

Why can't tls sockets exist outside of established state?  If shutdown
doesn't call close, perhaps we can add a shutdown callback?  It doesn't
seem to be called from BH?

Sorry for all the questions, I'm not really able to fully wrap my head
around this. I also feel like I'm missing the sockmap piece that may
be why you prefer unhash over disconnect.

FWIW Davide's ULP diag support patches will most likely require us to
free ctx with kfree_rcu(). diag only has a ref on struct sock, so if
we want to access ctx we need RCU or to lock every socket. It's a
little bit of an abuse of RCU, because the data under our feet may
actually change, but the fields we dump will only get inited once
after the ulp is installed.
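
For reference, the diag read side being described would look roughly
like this (a sketch only, assuming icsk_ulp_data were published with
rcu_assign_pointer() and the ctx freed via kfree_rcu();
tls_diag_dump_ctx() is a made-up name):

        static void tls_diag_dump_ctx(struct sock *sk)
        {
                struct tls_context *ctx;

                rcu_read_lock();
                ctx = rcu_dereference(inet_csk(sk)->icsk_ulp_data);
                if (ctx) {
                        /* dump only fields written once at ULP setup,
                         * e.g. ctx->tx_conf / ctx->rx_conf
                         */
                }
                rcu_read_unlock();
        }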

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] tls: remove close callback sock unlock/lock and flush_sync
  2019-06-28 22:48           ` Jakub Kicinski
@ 2019-06-29  0:20             ` John Fastabend
  2019-06-29  0:59               ` Jakub Kicinski
  0 siblings, 1 reply; 12+ messages in thread
From: John Fastabend @ 2019-06-29  0:20 UTC (permalink / raw)
  To: Jakub Kicinski, John Fastabend; +Cc: daniel, ast, netdev, edumazet, bpf

Jakub Kicinski wrote:
> On Fri, 28 Jun 2019 12:40:29 -0700, John Fastabend wrote:
> > The lock() is already held when entering unhash() side so need to
> > handle this case as well,
> > 
> > CPU 0 (free)          CPU 1 (wq)
> > 
> > lock(sk)              ctx = tls_get_ctx(sk) <- need to be check null ptr
> > sk_prot->unhash()
> >   set_bit()
> >   cancel_work()
> >   ...
> >   kfree(ctx)
> > unlock(sk)
> > 
> > but using cancel and doing an unlikely(!ctx) check should be
> > sufficient to handle wq. 
> 
> I'm not sure we can kfree ctx, the work struct itself is in it, no?

Should be OK as long as we have no outstanding work. So cancel_work()
needs to be cancel_work_sync(), but we can't do this in unhash, so
see below...

> 
> > What I'm not sure how to solve now is
> > in patch 2 of this series unhash is still calling strp_done
> > with the sock lock. Maybe we need to do a deferred release
> > like sockmap side?
> 
> Right, we can't do anything that sleeps in unhash, since we're holding
> the spinlock there, not the "owner" lock.

yep.

> 
> > Trying to drop the lock and then grabbing it again doesn't
> > seem right to me seems based on comment in tcp_abort we
> > could potentially "race with userspace socket closes such
> > as tcp_close". iirc I think one of the tls splats from syzbot
> > looked something like this may have happened.
> > 
> > For now I'm considering adding a strp_cancel() op. Seeing
> > we are closing() the socket and tearkng down we can probably
> > be OK with throwing out strp results.
> 
> But don't we have to flush the work queues before we free ctx?  We'd
> need to alloc a workqueue and schedule a work to flush the other works
> and then free?

Agree, I just wrote this code now and am testing it. So the new flow is:

 lock(sk)
 sk_prot->unhash()
   set_bit(CLOSING)
   ...
   tls_ctx_wq_free(ctx) <- sets up a work queue to do the *sync and kfree
 unlock(sk)
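
A minimal sketch of that tls_ctx_wq_free() step (illustrative only;
it assumes a work_struct member added to struct tls_context, which is
not in the patches posted here):

        static void tls_ctx_destruct_work(struct work_struct *work)
        {
                struct tls_context *ctx = container_of(work, struct tls_context,
                                                       destruct_work);

                /* Safe to sleep here: do the *_sync()/strp_done() calls
                 * and only then free the ctx, outside of unhash().
                 */
                kfree(ctx);
        }

        static void tls_ctx_wq_free(struct tls_context *ctx)
        {
                INIT_WORK(&ctx->destruct_work, tls_ctx_destruct_work);
                schedule_work(&ctx->destruct_work);
        }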

FWIW sockmap has a similar problem on unhash(), where it needs to
wait an RCU grace period and then do *_sync operations on workqueues.
So it also schedules work to do the cleanup outside unhash(), with
the additional complication of waiting for an RCU grace period before
scheduling.

The new patches are not too ugly IMO; the change only impacts this one
unhash() case.

> 
> Why can't tls sockets exist outside of established state?  If shutdown
> doesn't call close, perhaps we can add a shutdown callback?  It doesn't
> seem to be called from BH?
>

Because the ulp would be shared in this case,

	/* The TLS ulp is currently supported only for TCP sockets
	 * in ESTABLISHED state.
	 * Supporting sockets in LISTEN state will require us
	 * to modify the accept implementation to clone rather then
	 * share the ulp context.
	 */
	if (sk->sk_state != TCP_ESTABLISHED)
		return -ENOTSUPP;

In general I was trying to avoid modifying the core TCP layer to fix
this corner case in tls.
 
> Sorry for all the questions, I'm not really able to fully wrap my head
> around this. I also feel like I'm missing the sockmap piece that may
> be why you prefer unhash over disconnect.

Yep, if we try to support listening sockets we need some more
core infrastructure to push around the ulp and user_data portions of
sockets. It's not going to be nice for stable. Also, at least in the
TLS and sockmap cases, it's not really needed for any use case I know
of.

> 
> FWIW Davide's ULP diag support patches will require us to most likely
> free ctx with kfree_rcu(). diag only has a ref on struct sock, so if 
> we want to access ctx we need RCU or to lock every socket. It's a
> little bit of an abuse of RCU, because the data under our feet may
> actually change, but the fields we dump will only get inited once
> after ulp is installed.

Ah great. I'll push a v2 this afternoon so we get syzbot running
its reproducers over it and maybe it will fit into Davide's work
as well.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] tls: remove close callback sock unlock/lock and flush_sync
  2019-06-29  0:20             ` John Fastabend
@ 2019-06-29  0:59               ` Jakub Kicinski
  2019-06-29  3:46                 ` Jakub Kicinski
  0 siblings, 1 reply; 12+ messages in thread
From: Jakub Kicinski @ 2019-06-29  0:59 UTC (permalink / raw)
  To: John Fastabend; +Cc: daniel, ast, netdev, edumazet, bpf

On Fri, 28 Jun 2019 17:20:23 -0700, John Fastabend wrote:
> > Why can't tls sockets exist outside of established state?  If shutdown
> > doesn't call close, perhaps we can add a shutdown callback?  It doesn't
> > seem to be called from BH?
> >  
> 
> Because the ulp would be shared in this case,
> 
> 	/* The TLS ulp is currently supported only for TCP sockets
> 	 * in ESTABLISHED state.
> 	 * Supporting sockets in LISTEN state will require us
> 	 * to modify the accept implementation to clone rather then
> 	 * share the ulp context.
> 	 */
> 	if (sk->sk_state != TCP_ESTABLISHED)
> 		return -ENOTSUPP;
> 
> In general I was trying to avoid modifying core TCP layer to fix
> this corner case in tls.

I see, thanks for clarifying! I was wondering if there's anything wrong
with being in CLOSE/SYN/FIN states.

> > Sorry for all the questions, I'm not really able to fully wrap my head
> > around this. I also feel like I'm missing the sockmap piece that may
> > be why you prefer unhash over disconnect.  
> 
> Yep, if we try to support listening sockets we need a some more
> core infrastructure to push around ulp and user_data portions of
> sockets. Its not going to be nice for stable. Also at least in TLS
> and sockmap case its not really needed for any use case I know
> of.

IIUC we can't go from ESTABLISHED to LISTEN without calling close()
or disconnect(), so I'm not clear on why we are hooking into unhash() 😕

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] tls: remove close callback sock unlock/lock and flush_sync
  2019-06-29  0:59               ` Jakub Kicinski
@ 2019-06-29  3:46                 ` Jakub Kicinski
  0 siblings, 0 replies; 12+ messages in thread
From: Jakub Kicinski @ 2019-06-29  3:46 UTC (permalink / raw)
  To: John Fastabend; +Cc: daniel, ast, netdev, edumazet, bpf

On Fri, 28 Jun 2019 17:59:25 -0700, Jakub Kicinski wrote:
> > > Sorry for all the questions, I'm not really able to fully wrap my head
> > > around this. I also feel like I'm missing the sockmap piece that may
> > > be why you prefer unhash over disconnect.    
> > 
> > Yep, if we try to support listening sockets we need a some more
> > core infrastructure to push around ulp and user_data portions of
> > sockets. Its not going to be nice for stable. Also at least in TLS
> > and sockmap case its not really needed for any use case I know
> > of.  
> 
> IIUC we can't go from ESTABLISHED to LISTEN without calling close() 
> or disconnect() so I'm not clear on why are we hooking into unhash() 😕

Ah, disconnect() is also called with the socket already locked.
So no BH, but still not great..

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2019-06-29  3:46 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-27 17:36 [PATCH 0/2] tls, add unhash callback John Fastabend
2019-06-27 17:36 ` [PATCH 1/2] tls: remove close callback sock unlock/lock and flush_sync John Fastabend
2019-06-27 23:44   ` Jakub Kicinski
2019-06-28 14:12     ` John Fastabend
2019-06-28 18:31       ` Jakub Kicinski
2019-06-28 19:40         ` John Fastabend
2019-06-28 22:48           ` Jakub Kicinski
2019-06-29  0:20             ` John Fastabend
2019-06-29  0:59               ` Jakub Kicinski
2019-06-29  3:46                 ` Jakub Kicinski
2019-06-27 17:36 ` [PATCH 2/2] bpf: tls, implement unhash to avoid transition out of ESTABLISHED John Fastabend
2019-06-27 18:16 ` [PATCH 0/2] tls, add unhash callback John Fastabend
