[PATCH net 0/2] Fix use-after-free after the TLS device goes down and up

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH net 0/2] Fix use-after-free after the TLS device goes down and up
@ 2021-05-24 12:12 Maxim Mikityanskiy
  2021-05-24 12:12 ` [PATCH net 1/2] net/tls: Replace TLS_RX_SYNC_RUNNING with RCU Maxim Mikityanskiy
  2021-05-24 12:12 ` [PATCH net 2/2] net/tls: Fix use-after-free after the TLS device goes down and up Maxim Mikityanskiy
  0 siblings, 2 replies; 14+ messages in thread
From: Maxim Mikityanskiy @ 2021-05-24 12:12 UTC (permalink / raw)
  To: Boris Pismenny, John Fastabend, Daniel Borkmann, Jakub Kicinski,
	David S. Miller, Aviad Yehezkel
  Cc: Tariq Toukan, netdev, Maxim Mikityanskiy

This small series fixes a use-after-free bug in the TLS offload code.
The first patch is a preparation for the second one, and the second is
the fix itself.

Maxim Mikityanskiy (2):
  net/tls: Replace TLS_RX_SYNC_RUNNING with RCU
  net/tls: Fix use-after-free after the TLS device goes down and up

 include/net/tls.h             | 10 +++++-
 net/tls/tls_device.c          | 60 ++++++++++++++++++++++++++++-------
 net/tls/tls_device_fallback.c |  8 +++++
 net/tls/tls_main.c            |  1 +
 4 files changed, 67 insertions(+), 12 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH net 1/2] net/tls: Replace TLS_RX_SYNC_RUNNING with RCU
  2021-05-24 12:12 [PATCH net 0/2] Fix use-after-free after the TLS device goes down and up Maxim Mikityanskiy
@ 2021-05-24 12:12 ` Maxim Mikityanskiy
  2021-05-24 16:05   ` Jakub Kicinski
  2021-05-24 12:12 ` [PATCH net 2/2] net/tls: Fix use-after-free after the TLS device goes down and up Maxim Mikityanskiy
  1 sibling, 1 reply; 14+ messages in thread
From: Maxim Mikityanskiy @ 2021-05-24 12:12 UTC (permalink / raw)
  To: Boris Pismenny, John Fastabend, Daniel Borkmann, Jakub Kicinski,
	David S. Miller, Aviad Yehezkel
  Cc: Tariq Toukan, netdev, Maxim Mikityanskiy

RCU synchronization is guaranteed to finish in finite time, unlike a
busy loop that polls a flag. This patch is a preparation for the bugfix
in the next patch, where the same synchronize_net() call will also be
used to sync with the TX datapath.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
---
 include/net/tls.h    |  1 -
 net/tls/tls_device.c | 10 +++-------
 2 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index 3eccb525e8f7..6531ace2a68b 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -193,7 +193,6 @@ struct tls_offload_context_tx {
 	(sizeof(struct tls_offload_context_tx) + TLS_DRIVER_STATE_SIZE_TX)
 
 enum tls_context_flags {
-	TLS_RX_SYNC_RUNNING = 0,
 	/* Unlike RX where resync is driven entirely by the core in TX only
 	 * the driver knows when things went out of sync, so we need the flag
 	 * to be atomic.
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index 76a6f8c2eec4..171752cd6910 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -680,15 +680,13 @@ static void tls_device_resync_rx(struct tls_context *tls_ctx,
 	struct tls_offload_context_rx *rx_ctx = tls_offload_ctx_rx(tls_ctx);
 	struct net_device *netdev;
 
-	if (WARN_ON(test_and_set_bit(TLS_RX_SYNC_RUNNING, &tls_ctx->flags)))
-		return;
-
 	trace_tls_device_rx_resync_send(sk, seq, rcd_sn, rx_ctx->resync_type);
+	rcu_read_lock();
 	netdev = READ_ONCE(tls_ctx->netdev);
 	if (netdev)
 		netdev->tlsdev_ops->tls_dev_resync(netdev, sk, seq, rcd_sn,
 						   TLS_OFFLOAD_CTX_DIR_RX);
-	clear_bit_unlock(TLS_RX_SYNC_RUNNING, &tls_ctx->flags);
+	rcu_read_unlock();
 	TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSRXDEVICERESYNC);
 }
 
@@ -1300,9 +1298,7 @@ static int tls_device_down(struct net_device *netdev)
 			netdev->tlsdev_ops->tls_dev_del(netdev, ctx,
 							TLS_OFFLOAD_CTX_DIR_RX);
 		WRITE_ONCE(ctx->netdev, NULL);
-		smp_mb__before_atomic(); /* pairs with test_and_set_bit() */
-		while (test_bit(TLS_RX_SYNC_RUNNING, &ctx->flags))
-			usleep_range(10, 200);
+		synchronize_net();
 		dev_put(netdev);
 		list_del_init(&ctx->list);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH net 2/2] net/tls: Fix use-after-free after the TLS device goes down and up
  2021-05-24 12:12 [PATCH net 0/2] Fix use-after-free after the TLS device goes down and up Maxim Mikityanskiy
  2021-05-24 12:12 ` [PATCH net 1/2] net/tls: Replace TLS_RX_SYNC_RUNNING with RCU Maxim Mikityanskiy
@ 2021-05-24 12:12 ` Maxim Mikityanskiy
  2021-05-24 16:15   ` Jakub Kicinski
  2021-05-25 17:39   ` Jakub Kicinski
  1 sibling, 2 replies; 14+ messages in thread
From: Maxim Mikityanskiy @ 2021-05-24 12:12 UTC (permalink / raw)
  To: Boris Pismenny, John Fastabend, Daniel Borkmann, Jakub Kicinski,
	David S. Miller, Aviad Yehezkel
  Cc: Tariq Toukan, netdev, Maxim Mikityanskiy

When a netdev with active TLS offload goes down, tls_device_down is
called to stop the offload and tear down the TLS context. However, the
socket stays alive, and it still points to the TLS context, which is now
deallocated. If a netdev goes up, while the connection is still active,
and the data flow resumes after a number of TCP retransmissions, it will
lead to a use-after-free of the TLS context.

This commit addresses this bug by keeping the context alive until its
normal destruction, and implements the necessary fallbacks, so that the
connection can resume in software (non-offloaded) kTLS mode.

On the TX side tls_sw_fallback is used to encrypt all packets. The RX
side already has all the necessary fallbacks, because receiving
non-decrypted packets is supported. The thing needed on the RX side is
to block resync requests, which are normally produced after receiving
non-decrypted packets.

The necessary synchronization is implemented for a graceful teardown:
first the fallbacks are deployed, then the driver resources are released
(it used to be possible to have a tls_dev_resync after tls_dev_del).

A new flag called TLS_RX_DEV_DEGRADED is added to indicate the fallback
mode. It's used to skip the RX resync logic completely, as it becomes
useless, and some objects may be released (for example, resync_async,
which is allocated and freed by the driver).

Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
---
 include/net/tls.h             |  9 ++++++
 net/tls/tls_device.c          | 52 +++++++++++++++++++++++++++++++----
 net/tls/tls_device_fallback.c |  8 ++++++
 net/tls/tls_main.c            |  1 +
 4 files changed, 65 insertions(+), 5 deletions(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index 6531ace2a68b..8341a8d1e807 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -193,6 +193,11 @@ struct tls_offload_context_tx {
 	(sizeof(struct tls_offload_context_tx) + TLS_DRIVER_STATE_SIZE_TX)
 
 enum tls_context_flags {
+	/* tls_device_down was called after the netdev went down, device state
+	 * was released, and kTLS works in software, even though rx_conf is
+	 * still TLS_HW (needed for transition).
+	 */
+	TLS_RX_DEV_DEGRADED = 0,
 	/* Unlike RX where resync is driven entirely by the core in TX only
 	 * the driver knows when things went out of sync, so we need the flag
 	 * to be atomic.
@@ -265,6 +270,7 @@ struct tls_context {
 
 	/* cache cold stuff */
 	struct proto *sk_proto;
+	struct sock *sk;
 
 	void (*sk_destruct)(struct sock *sk);
 
@@ -447,6 +453,9 @@ static inline u16 tls_user_config(struct tls_context *ctx, bool tx)
 struct sk_buff *
 tls_validate_xmit_skb(struct sock *sk, struct net_device *dev,
 		      struct sk_buff *skb);
+struct sk_buff *
+tls_validate_xmit_skb_sw(struct sock *sk, struct net_device *dev,
+			 struct sk_buff *skb);
 
 static inline bool tls_is_sk_tx_device_offloaded(struct sock *sk)
 {
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index 171752cd6910..bd9f1567aa39 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -50,6 +50,7 @@ static void tls_device_gc_task(struct work_struct *work);
 static DECLARE_WORK(tls_device_gc_work, tls_device_gc_task);
 static LIST_HEAD(tls_device_gc_list);
 static LIST_HEAD(tls_device_list);
+static LIST_HEAD(tls_device_down_list);
 static DEFINE_SPINLOCK(tls_device_lock);
 
 static void tls_device_free_ctx(struct tls_context *ctx)
@@ -759,6 +760,8 @@ void tls_device_rx_resync_new_rec(struct sock *sk, u32 rcd_len, u32 seq)
 
 	if (tls_ctx->rx_conf != TLS_HW)
 		return;
+	if (unlikely(test_bit(TLS_RX_DEV_DEGRADED, &tls_ctx->flags)))
+		return;
 
 	prot = &tls_ctx->prot_info;
 	rx_ctx = tls_offload_ctx_rx(tls_ctx);
@@ -961,6 +964,17 @@ int tls_device_decrypted(struct sock *sk, struct tls_context *tls_ctx,
 
 	ctx->sw.decrypted |= is_decrypted;
 
+	if (unlikely(test_bit(TLS_RX_DEV_DEGRADED, &tls_ctx->flags))) {
+		if (likely(is_encrypted || is_decrypted))
+			return 0;
+
+		/* After tls_device_down disables the offload, the next SKB will
+		 * likely have initial fragments decrypted, and final ones not
+		 * decrypted. We need to reencrypt that single SKB.
+		 */
+		return tls_device_reencrypt(sk, skb);
+	}
+
 	/* Return immediately if the record is either entirely plaintext or
 	 * entirely ciphertext. Otherwise handle reencrypt partially decrypted
 	 * record.
@@ -1290,6 +1304,26 @@ static int tls_device_down(struct net_device *netdev)
 	spin_unlock_irqrestore(&tls_device_lock, flags);
 
 	list_for_each_entry_safe(ctx, tmp, &list, list)	{
+		/* Stop offloaded TX and switch to the fallback.
+		 * tls_is_sk_tx_device_offloaded will return false.
+		 */
+		WRITE_ONCE(ctx->sk->sk_validate_xmit_skb, tls_validate_xmit_skb_sw);
+
+		/* Stop the RX and TX resync.
+		 * tls_dev_resync must not be called after tls_dev_del.
+		 */
+		WRITE_ONCE(ctx->netdev, NULL);
+
+		/* Start skipping the RX resync logic completely. */
+		set_bit(TLS_RX_DEV_DEGRADED, &ctx->flags);
+
+		/* Sync with inflight packets. After this point:
+		 * TX: no non-encrypted packets will be passed to the driver.
+		 * RX: resync requests from the driver will be ignored.
+		 */
+		synchronize_net();
+
+		/* Release the offload context on the driver side. */
 		if (ctx->tx_conf == TLS_HW)
 			netdev->tlsdev_ops->tls_dev_del(netdev, ctx,
 							TLS_OFFLOAD_CTX_DIR_TX);
@@ -1297,13 +1331,21 @@ static int tls_device_down(struct net_device *netdev)
 		    !test_bit(TLS_RX_DEV_CLOSED, &ctx->flags))
 			netdev->tlsdev_ops->tls_dev_del(netdev, ctx,
 							TLS_OFFLOAD_CTX_DIR_RX);
-		WRITE_ONCE(ctx->netdev, NULL);
-		synchronize_net();
+
 		dev_put(netdev);
-		list_del_init(&ctx->list);
 
-		if (refcount_dec_and_test(&ctx->refcount))
-			tls_device_free_ctx(ctx);
+		/* Move the context to a separate list for two reasons:
+		 * 1. When the context is deallocated, list_del is called.
+		 * 2. It's no longer an offloaded context, so we don't want to
+		 *    run offload-specific code on this context.
+		 */
+		spin_lock_irqsave(&tls_device_lock, flags);
+		list_move_tail(&ctx->list, &tls_device_down_list);
+		spin_unlock_irqrestore(&tls_device_lock, flags);
+
+		/* Device contexts for RX and TX will be freed in on sk_destruct
+		 * by tls_device_free_ctx. rx_conf and tx_conf stay in TLS_HW.
+		 */
 	}
 
 	up_write(&device_offload_lock);
diff --git a/net/tls/tls_device_fallback.c b/net/tls/tls_device_fallback.c
index cacf040872c7..a72c89b48a59 100644
--- a/net/tls/tls_device_fallback.c
+++ b/net/tls/tls_device_fallback.c
@@ -431,6 +431,14 @@ struct sk_buff *tls_validate_xmit_skb(struct sock *sk,
 }
 EXPORT_SYMBOL_GPL(tls_validate_xmit_skb);
 
+struct sk_buff *tls_validate_xmit_skb_sw(struct sock *sk,
+					 struct net_device *dev,
+					 struct sk_buff *skb)
+{
+	return tls_sw_fallback(sk, skb);
+}
+EXPORT_SYMBOL_GPL(tls_validate_xmit_skb_sw);
+
 struct sk_buff *tls_encrypt_skb(struct sk_buff *skb)
 {
 	return tls_sw_fallback(skb->sk, skb);
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 47b7c5334c34..fde56ff49163 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -636,6 +636,7 @@ struct tls_context *tls_ctx_create(struct sock *sk)
 	mutex_init(&ctx->tx_lock);
 	rcu_assign_pointer(icsk->icsk_ulp_data, ctx);
 	ctx->sk_proto = READ_ONCE(sk->sk_prot);
+	ctx->sk = sk;
 	return ctx;
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH net 1/2] net/tls: Replace TLS_RX_SYNC_RUNNING with RCU
  2021-05-24 12:12 ` [PATCH net 1/2] net/tls: Replace TLS_RX_SYNC_RUNNING with RCU Maxim Mikityanskiy
@ 2021-05-24 16:05   ` Jakub Kicinski
  2021-05-25  8:52     ` Maxim Mikityanskiy
  0 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2021-05-24 16:05 UTC (permalink / raw)
  To: Maxim Mikityanskiy
  Cc: Boris Pismenny, John Fastabend, Daniel Borkmann, David S. Miller,
	Aviad Yehezkel, Tariq Toukan, netdev

On Mon, 24 May 2021 15:12:19 +0300 Maxim Mikityanskiy wrote:
> RCU synchronization is guaranteed to finish in finite time, unlike a
> busy loop that polls a flag. This patch is a preparation for the bugfix
> in the next patch, where the same synchronize_net() call will also be
> used to sync with the TX datapath.
> 
> Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
> Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
> ---
>  include/net/tls.h    |  1 -
>  net/tls/tls_device.c | 10 +++-------
>  2 files changed, 3 insertions(+), 8 deletions(-)
> 
> diff --git a/include/net/tls.h b/include/net/tls.h
> index 3eccb525e8f7..6531ace2a68b 100644
> --- a/include/net/tls.h
> +++ b/include/net/tls.h
> @@ -193,7 +193,6 @@ struct tls_offload_context_tx {
>  	(sizeof(struct tls_offload_context_tx) + TLS_DRIVER_STATE_SIZE_TX)
>  
>  enum tls_context_flags {
> -	TLS_RX_SYNC_RUNNING = 0,
>  	/* Unlike RX where resync is driven entirely by the core in TX only
>  	 * the driver knows when things went out of sync, so we need the flag
>  	 * to be atomic.
> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
> index 76a6f8c2eec4..171752cd6910 100644
> --- a/net/tls/tls_device.c
> +++ b/net/tls/tls_device.c
> @@ -680,15 +680,13 @@ static void tls_device_resync_rx(struct tls_context *tls_ctx,
>  	struct tls_offload_context_rx *rx_ctx = tls_offload_ctx_rx(tls_ctx);
>  	struct net_device *netdev;
>  
> -	if (WARN_ON(test_and_set_bit(TLS_RX_SYNC_RUNNING, &tls_ctx->flags)))
> -		return;
> -
>  	trace_tls_device_rx_resync_send(sk, seq, rcd_sn, rx_ctx->resync_type);
> +	rcu_read_lock();
>  	netdev = READ_ONCE(tls_ctx->netdev);
>  	if (netdev)
>  		netdev->tlsdev_ops->tls_dev_resync(netdev, sk, seq, rcd_sn,
>  						   TLS_OFFLOAD_CTX_DIR_RX);

Now this can't sleep right? No bueno.

> -	clear_bit_unlock(TLS_RX_SYNC_RUNNING, &tls_ctx->flags);
> +	rcu_read_unlock();
>  	TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSRXDEVICERESYNC);
>  }
>  


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 2/2] net/tls: Fix use-after-free after the TLS device goes down and up
  2021-05-24 12:12 ` [PATCH net 2/2] net/tls: Fix use-after-free after the TLS device goes down and up Maxim Mikityanskiy
@ 2021-05-24 16:15   ` Jakub Kicinski
  2021-05-25  8:52     ` Maxim Mikityanskiy
  2021-05-25 17:39   ` Jakub Kicinski
  1 sibling, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2021-05-24 16:15 UTC (permalink / raw)
  To: Maxim Mikityanskiy
  Cc: Boris Pismenny, John Fastabend, Daniel Borkmann, David S. Miller,
	Aviad Yehezkel, Tariq Toukan, netdev

On Mon, 24 May 2021 15:12:20 +0300 Maxim Mikityanskiy wrote:
> @@ -1290,6 +1304,26 @@ static int tls_device_down(struct net_device *netdev)
>  	spin_unlock_irqrestore(&tls_device_lock, flags);
>  
>  	list_for_each_entry_safe(ctx, tmp, &list, list)	{
> +		/* Stop offloaded TX and switch to the fallback.
> +		 * tls_is_sk_tx_device_offloaded will return false.
> +		 */
> +		WRITE_ONCE(ctx->sk->sk_validate_xmit_skb, tls_validate_xmit_skb_sw);
> +
> +		/* Stop the RX and TX resync.
> +		 * tls_dev_resync must not be called after tls_dev_del.
> +		 */
> +		WRITE_ONCE(ctx->netdev, NULL);
> +
> +		/* Start skipping the RX resync logic completely. */
> +		set_bit(TLS_RX_DEV_DEGRADED, &ctx->flags);
> +
> +		/* Sync with inflight packets. After this point:
> +		 * TX: no non-encrypted packets will be passed to the driver.
> +		 * RX: resync requests from the driver will be ignored.
> +		 */
> +		synchronize_net();
> +
> +		/* Release the offload context on the driver side. */
>  		if (ctx->tx_conf == TLS_HW)
>  			netdev->tlsdev_ops->tls_dev_del(netdev, ctx,
>  							TLS_OFFLOAD_CTX_DIR_TX);

Can we have the Rx resync take the device_offload_lock for read instead?
Like Tx already does?

> +EXPORT_SYMBOL_GPL(tls_validate_xmit_skb_sw);

Why the export?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 1/2] net/tls: Replace TLS_RX_SYNC_RUNNING with RCU
  2021-05-24 16:05   ` Jakub Kicinski
@ 2021-05-25  8:52     ` Maxim Mikityanskiy
  2021-05-25 17:14       ` Jakub Kicinski
  0 siblings, 1 reply; 14+ messages in thread
From: Maxim Mikityanskiy @ 2021-05-25  8:52 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Boris Pismenny, John Fastabend, Daniel Borkmann, David S. Miller,
	Aviad Yehezkel, Tariq Toukan, netdev

On 2021-05-24 19:05, Jakub Kicinski wrote:
> On Mon, 24 May 2021 15:12:19 +0300 Maxim Mikityanskiy wrote:
>> RCU synchronization is guaranteed to finish in finite time, unlike a
>> busy loop that polls a flag. This patch is a preparation for the bugfix
>> in the next patch, where the same synchronize_net() call will also be
>> used to sync with the TX datapath.
>>
>> Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
>> Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
>> ---
>>   include/net/tls.h    |  1 -
>>   net/tls/tls_device.c | 10 +++-------
>>   2 files changed, 3 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/net/tls.h b/include/net/tls.h
>> index 3eccb525e8f7..6531ace2a68b 100644
>> --- a/include/net/tls.h
>> +++ b/include/net/tls.h
>> @@ -193,7 +193,6 @@ struct tls_offload_context_tx {
>>   	(sizeof(struct tls_offload_context_tx) + TLS_DRIVER_STATE_SIZE_TX)
>>   
>>   enum tls_context_flags {
>> -	TLS_RX_SYNC_RUNNING = 0,
>>   	/* Unlike RX where resync is driven entirely by the core in TX only
>>   	 * the driver knows when things went out of sync, so we need the flag
>>   	 * to be atomic.
>> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
>> index 76a6f8c2eec4..171752cd6910 100644
>> --- a/net/tls/tls_device.c
>> +++ b/net/tls/tls_device.c
>> @@ -680,15 +680,13 @@ static void tls_device_resync_rx(struct tls_context *tls_ctx,
>>   	struct tls_offload_context_rx *rx_ctx = tls_offload_ctx_rx(tls_ctx);
>>   	struct net_device *netdev;
>>   
>> -	if (WARN_ON(test_and_set_bit(TLS_RX_SYNC_RUNNING, &tls_ctx->flags)))
>> -		return;
>> -
>>   	trace_tls_device_rx_resync_send(sk, seq, rcd_sn, rx_ctx->resync_type);
>> +	rcu_read_lock();
>>   	netdev = READ_ONCE(tls_ctx->netdev);
>>   	if (netdev)
>>   		netdev->tlsdev_ops->tls_dev_resync(netdev, sk, seq, rcd_sn,
>>   						   TLS_OFFLOAD_CTX_DIR_RX);
> 
> Now this can't sleep right? No bueno.

No, it can't sleep under RCU. However, are you sure it was allowed to 
sleep before my change? I don't think so. Your commit e52972c11d6b 
("net/tls: replace the sleeping lock around RX resync with a bit lock") 
mentions that "RX resync may get called from soft IRQ", which 
essentially means that it can't sleep.

Furthermore, no implementations try to sleep in RX resync, as far as I 
see from reviewing the code. For example, nfp_net_tls_resync uses 
GFP_ATOMIC for RX resync and GFP_KERNEL for TX resync. 
mlx5_fpga_tls_resync_rx also uses GFP_ATOMIC.

So, I don't think I'm breaking anything with my change.

> 
>> -	clear_bit_unlock(TLS_RX_SYNC_RUNNING, &tls_ctx->flags);
>> +	rcu_read_unlock();
>>   	TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSRXDEVICERESYNC);
>>   }
>>   
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 2/2] net/tls: Fix use-after-free after the TLS device goes down and up
  2021-05-24 16:15   ` Jakub Kicinski
@ 2021-05-25  8:52     ` Maxim Mikityanskiy
  2021-05-25 17:21       ` Jakub Kicinski
  0 siblings, 1 reply; 14+ messages in thread
From: Maxim Mikityanskiy @ 2021-05-25  8:52 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Boris Pismenny, John Fastabend, Daniel Borkmann, David S. Miller,
	Aviad Yehezkel, Tariq Toukan, netdev

On 2021-05-24 19:15, Jakub Kicinski wrote:
> On Mon, 24 May 2021 15:12:20 +0300 Maxim Mikityanskiy wrote:
>> @@ -1290,6 +1304,26 @@ static int tls_device_down(struct net_device *netdev)
>>   	spin_unlock_irqrestore(&tls_device_lock, flags);
>>   
>>   	list_for_each_entry_safe(ctx, tmp, &list, list)	{
>> +		/* Stop offloaded TX and switch to the fallback.
>> +		 * tls_is_sk_tx_device_offloaded will return false.
>> +		 */
>> +		WRITE_ONCE(ctx->sk->sk_validate_xmit_skb, tls_validate_xmit_skb_sw);
>> +
>> +		/* Stop the RX and TX resync.
>> +		 * tls_dev_resync must not be called after tls_dev_del.
>> +		 */
>> +		WRITE_ONCE(ctx->netdev, NULL);
>> +
>> +		/* Start skipping the RX resync logic completely. */
>> +		set_bit(TLS_RX_DEV_DEGRADED, &ctx->flags);
>> +
>> +		/* Sync with inflight packets. After this point:
>> +		 * TX: no non-encrypted packets will be passed to the driver.
>> +		 * RX: resync requests from the driver will be ignored.
>> +		 */
>> +		synchronize_net();
>> +
>> +		/* Release the offload context on the driver side. */
>>   		if (ctx->tx_conf == TLS_HW)
>>   			netdev->tlsdev_ops->tls_dev_del(netdev, ctx,
>>   							TLS_OFFLOAD_CTX_DIR_TX);
> 
> Can we have the Rx resync take the device_offload_lock for read instead?
> Like Tx already does?

I believe you previously made this attempt in commit 38030d7cb779 
("net/tls: avoid NULL-deref on resync during device removal"), and this 
approach turned out to be problematic, as explained in commit 
e52972c11d6b ("net/tls: replace the sleeping lock around RX resync with 
a bit lock"): "RX resync may get called from soft IRQ, so we can't use 
the rwsem".

> 
>> +EXPORT_SYMBOL_GPL(tls_validate_xmit_skb_sw);
> 
> Why the export?

Because tls_validate_xmit_skb was also exported. Now I see it's needed 
for tls_validate_xmit_skb, because tls_is_sk_tx_device_offloaded needs 
its address and can be called from the drivers. There is no similar 
public function for tls_validate_xmit_skb_sw, so you are probably right 
that we don't need to export it.

> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 1/2] net/tls: Replace TLS_RX_SYNC_RUNNING with RCU
  2021-05-25  8:52     ` Maxim Mikityanskiy
@ 2021-05-25 17:14       ` Jakub Kicinski
  0 siblings, 0 replies; 14+ messages in thread
From: Jakub Kicinski @ 2021-05-25 17:14 UTC (permalink / raw)
  To: Maxim Mikityanskiy
  Cc: Boris Pismenny, John Fastabend, Daniel Borkmann, David S. Miller,
	Aviad Yehezkel, Tariq Toukan, netdev

On Tue, 25 May 2021 11:52:20 +0300 Maxim Mikityanskiy wrote:
> On 2021-05-24 19:05, Jakub Kicinski wrote:
> > On Mon, 24 May 2021 15:12:19 +0300 Maxim Mikityanskiy wrote:  
> >> RCU synchronization is guaranteed to finish in finite time, unlike a
> >> busy loop that polls a flag. This patch is a preparation for the bugfix
> >> in the next patch, where the same synchronize_net() call will also be
> >> used to sync with the TX datapath.
> >>
> >> Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
> >> Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
> >> ---
> >>   include/net/tls.h    |  1 -
> >>   net/tls/tls_device.c | 10 +++-------
> >>   2 files changed, 3 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/include/net/tls.h b/include/net/tls.h
> >> index 3eccb525e8f7..6531ace2a68b 100644
> >> --- a/include/net/tls.h
> >> +++ b/include/net/tls.h
> >> @@ -193,7 +193,6 @@ struct tls_offload_context_tx {
> >>   	(sizeof(struct tls_offload_context_tx) + TLS_DRIVER_STATE_SIZE_TX)
> >>   
> >>   enum tls_context_flags {
> >> -	TLS_RX_SYNC_RUNNING = 0,
> >>   	/* Unlike RX where resync is driven entirely by the core in TX only
> >>   	 * the driver knows when things went out of sync, so we need the flag
> >>   	 * to be atomic.
> >> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
> >> index 76a6f8c2eec4..171752cd6910 100644
> >> --- a/net/tls/tls_device.c
> >> +++ b/net/tls/tls_device.c
> >> @@ -680,15 +680,13 @@ static void tls_device_resync_rx(struct tls_context *tls_ctx,
> >>   	struct tls_offload_context_rx *rx_ctx = tls_offload_ctx_rx(tls_ctx);
> >>   	struct net_device *netdev;
> >>   
> >> -	if (WARN_ON(test_and_set_bit(TLS_RX_SYNC_RUNNING, &tls_ctx->flags)))
> >> -		return;
> >> -
> >>   	trace_tls_device_rx_resync_send(sk, seq, rcd_sn, rx_ctx->resync_type);
> >> +	rcu_read_lock();
> >>   	netdev = READ_ONCE(tls_ctx->netdev);
> >>   	if (netdev)
> >>   		netdev->tlsdev_ops->tls_dev_resync(netdev, sk, seq, rcd_sn,
> >>   						   TLS_OFFLOAD_CTX_DIR_RX);  
> > 
> > Now this can't sleep right? No bueno.  
> 
> No, it can't sleep under RCU. However, are you sure it was allowed to 
> sleep before my change? I don't think so. Your commit e52972c11d6b 
> ("net/tls: replace the sleeping lock around RX resync with a bit lock") 
> mentions that "RX resync may get called from soft IRQ", which 
> essentially means that it can't sleep.
> 
> Furthermore, no implementations try to sleep in RX resync, as far as I 
> see from reviewing the code. For example, nfp_net_tls_resync uses 
> GFP_ATOMIC for RX resync and GFP_KERNEL for TX resync. 
> mlx5_fpga_tls_resync_rx also uses GFP_ATOMIC.
> 
> So, I don't think I'm breaking anything with my change.

You're right.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 2/2] net/tls: Fix use-after-free after the TLS device goes down and up
  2021-05-25  8:52     ` Maxim Mikityanskiy
@ 2021-05-25 17:21       ` Jakub Kicinski
  0 siblings, 0 replies; 14+ messages in thread
From: Jakub Kicinski @ 2021-05-25 17:21 UTC (permalink / raw)
  To: Maxim Mikityanskiy
  Cc: Boris Pismenny, John Fastabend, Daniel Borkmann, David S. Miller,
	Aviad Yehezkel, Tariq Toukan, netdev

On Tue, 25 May 2021 11:52:31 +0300 Maxim Mikityanskiy wrote:
> > Can we have the Rx resync take the device_offload_lock for read instead?
> > Like Tx already does?  
> 
> I believe you previously made this attempt in commit 38030d7cb779 
> ("net/tls: avoid NULL-deref on resync during device removal"), and this 
> approach turned out to be problematic, as explained in commit 
> e52972c11d6b ("net/tls: replace the sleeping lock around RX resync with 
> a bit lock"): "RX resync may get called from soft IRQ, so we can't use 
> the rwsem".

If only my memory wasn't this shit.. :) Let me take a look at the patch
again.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 2/2] net/tls: Fix use-after-free after the TLS device goes down and up
  2021-05-24 12:12 ` [PATCH net 2/2] net/tls: Fix use-after-free after the TLS device goes down and up Maxim Mikityanskiy
  2021-05-24 16:15   ` Jakub Kicinski
@ 2021-05-25 17:39   ` Jakub Kicinski
  2021-05-28 12:40     ` Maxim Mikityanskiy
  1 sibling, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2021-05-25 17:39 UTC (permalink / raw)
  To: Maxim Mikityanskiy
  Cc: Boris Pismenny, John Fastabend, Daniel Borkmann, David S. Miller,
	Aviad Yehezkel, Tariq Toukan, netdev

On Mon, 24 May 2021 15:12:20 +0300 Maxim Mikityanskiy wrote:
> When a netdev with active TLS offload goes down, tls_device_down is
> called to stop the offload and tear down the TLS context. However, the
> socket stays alive, and it still points to the TLS context, which is now
> deallocated. If a netdev goes up, while the connection is still active,
> and the data flow resumes after a number of TCP retransmissions, it will
> lead to a use-after-free of the TLS context.
> 
> This commit addresses this bug by keeping the context alive until its
> normal destruction, and implements the necessary fallbacks, so that the
> connection can resume in software (non-offloaded) kTLS mode.
> 
> On the TX side tls_sw_fallback is used to encrypt all packets. The RX
> side already has all the necessary fallbacks, because receiving
> non-decrypted packets is supported. The thing needed on the RX side is
> to block resync requests, which are normally produced after receiving
> non-decrypted packets.
> 
> The necessary synchronization is implemented for a graceful teardown:
> first the fallbacks are deployed, then the driver resources are released
> (it used to be possible to have a tls_dev_resync after tls_dev_del).
> 
> A new flag called TLS_RX_DEV_DEGRADED is added to indicate the fallback
> mode. It's used to skip the RX resync logic completely, as it becomes
> useless, and some objects may be released (for example, resync_async,
> which is allocated and freed by the driver).
> 
> Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure")
> Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
> Reviewed-by: Tariq Toukan <tariqt@nvidia.com>

> @@ -961,6 +964,17 @@ int tls_device_decrypted(struct sock *sk, struct tls_context *tls_ctx,
>  
>  	ctx->sw.decrypted |= is_decrypted;
>  
> +	if (unlikely(test_bit(TLS_RX_DEV_DEGRADED, &tls_ctx->flags))) {

Why not put the check in tls_device_core_ctrl_rx_resync()?
Would be less code, right?

> +		if (likely(is_encrypted || is_decrypted))
> +			return 0;
> +
> +		/* After tls_device_down disables the offload, the next SKB will
> +		 * likely have initial fragments decrypted, and final ones not
> +		 * decrypted. We need to reencrypt that single SKB.
> +		 */
> +		return tls_device_reencrypt(sk, skb);
> +	}
> +
>  	/* Return immediately if the record is either entirely plaintext or
>  	 * entirely ciphertext. Otherwise handle reencrypt partially decrypted
>  	 * record.



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 2/2] net/tls: Fix use-after-free after the TLS device goes down and up
  2021-05-25 17:39   ` Jakub Kicinski
@ 2021-05-28 12:40     ` Maxim Mikityanskiy
  2021-05-28 19:44       ` Jakub Kicinski
  0 siblings, 1 reply; 14+ messages in thread
From: Maxim Mikityanskiy @ 2021-05-28 12:40 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Boris Pismenny, John Fastabend, Daniel Borkmann, David S. Miller,
	Aviad Yehezkel, Tariq Toukan, netdev

On 2021-05-25 20:39, Jakub Kicinski wrote:
> On Mon, 24 May 2021 15:12:20 +0300 Maxim Mikityanskiy wrote:
>> When a netdev with active TLS offload goes down, tls_device_down is
>> called to stop the offload and tear down the TLS context. However, the
>> socket stays alive, and it still points to the TLS context, which is now
>> deallocated. If a netdev goes up, while the connection is still active,
>> and the data flow resumes after a number of TCP retransmissions, it will
>> lead to a use-after-free of the TLS context.
>>
>> This commit addresses this bug by keeping the context alive until its
>> normal destruction, and implements the necessary fallbacks, so that the
>> connection can resume in software (non-offloaded) kTLS mode.
>>
>> On the TX side tls_sw_fallback is used to encrypt all packets. The RX
>> side already has all the necessary fallbacks, because receiving
>> non-decrypted packets is supported. The thing needed on the RX side is
>> to block resync requests, which are normally produced after receiving
>> non-decrypted packets.
>>
>> The necessary synchronization is implemented for a graceful teardown:
>> first the fallbacks are deployed, then the driver resources are released
>> (it used to be possible to have a tls_dev_resync after tls_dev_del).
>>
>> A new flag called TLS_RX_DEV_DEGRADED is added to indicate the fallback
>> mode. It's used to skip the RX resync logic completely, as it becomes
>> useless, and some objects may be released (for example, resync_async,
>> which is allocated and freed by the driver).
>>
>> Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure")
>> Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
>> Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
> 
>> @@ -961,6 +964,17 @@ int tls_device_decrypted(struct sock *sk, struct tls_context *tls_ctx,
>>   
>>   	ctx->sw.decrypted |= is_decrypted;
>>   
>> +	if (unlikely(test_bit(TLS_RX_DEV_DEGRADED, &tls_ctx->flags))) {
> 
> Why not put the check in tls_device_core_ctrl_rx_resync()?
> Would be less code, right?

I see what you mean, and I considered this option, but I think my option 
has better readability and is more future-proof. By doing an early 
return, I skip all code irrelevant to the degraded mode, and even though 
changing ctx->resync_nh_reset won't have effect in the degraded mode, it 
will be easier for readers to understand that this part of code is not 
relevant. Furthermore, if someone decides to add more code to 
!is_encrypted branches in the future, there is a chance that the 
degraded mode will be missed from consideration. With the early return 
there is not problem, but if I follow your suggestion and do the check 
only under is_encrypted, a future contributor unfamiliar with this 
"degraded flow" might fail to add that check where it will be needed.

This was the reason I implemented it this way. What do you think?

>> +		if (likely(is_encrypted || is_decrypted))
>> +			return 0;
>> +
>> +		/* After tls_device_down disables the offload, the next SKB will
>> +		 * likely have initial fragments decrypted, and final ones not
>> +		 * decrypted. We need to reencrypt that single SKB.
>> +		 */
>> +		return tls_device_reencrypt(sk, skb);
>> +	}
>> +
>>   	/* Return immediately if the record is either entirely plaintext or
>>   	 * entirely ciphertext. Otherwise handle reencrypt partially decrypted
>>   	 * record.
> 
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 2/2] net/tls: Fix use-after-free after the TLS device goes down and up
  2021-05-28 12:40     ` Maxim Mikityanskiy
@ 2021-05-28 19:44       ` Jakub Kicinski
  2021-05-31 11:09         ` Maxim Mikityanskiy
  0 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2021-05-28 19:44 UTC (permalink / raw)
  To: Maxim Mikityanskiy
  Cc: Boris Pismenny, John Fastabend, Daniel Borkmann, David S. Miller,
	Aviad Yehezkel, Tariq Toukan, netdev

On Fri, 28 May 2021 15:40:38 +0300 Maxim Mikityanskiy wrote:
> >> @@ -961,6 +964,17 @@ int tls_device_decrypted(struct sock *sk, struct tls_context *tls_ctx,
> >>   
> >>   	ctx->sw.decrypted |= is_decrypted;
> >>   
> >> +	if (unlikely(test_bit(TLS_RX_DEV_DEGRADED, &tls_ctx->flags))) {  
> > 
> > Why not put the check in tls_device_core_ctrl_rx_resync()?
> > Would be less code, right?  
> 
> I see what you mean, and I considered this option, but I think my option 
> has better readability and is more future-proof. By doing an early 
> return, I skip all code irrelevant to the degraded mode, and even though 
> changing ctx->resync_nh_reset won't have effect in the degraded mode, it 
> will be easier for readers to understand that this part of code is not 
> relevant. Furthermore, if someone decides to add more code to 
> !is_encrypted branches in the future, there is a chance that the 
> degraded mode will be missed from consideration. With the early return 
> there is not problem, but if I follow your suggestion and do the check 
> only under is_encrypted, a future contributor unfamiliar with this 
> "degraded flow" might fail to add that check where it will be needed.
> 
> This was the reason I implemented it this way. What do you think?

In general "someone may miss this in the future" is better served by
adding a test case than code duplication. But we don't have infra to 
fake-offload TLS so I don't feel strongly. You can keep as is if that's
your preference.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 2/2] net/tls: Fix use-after-free after the TLS device goes down and up
  2021-05-28 19:44       ` Jakub Kicinski
@ 2021-05-31 11:09         ` Maxim Mikityanskiy
  2021-06-01  4:16           ` Jakub Kicinski
  0 siblings, 1 reply; 14+ messages in thread
From: Maxim Mikityanskiy @ 2021-05-31 11:09 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Boris Pismenny, John Fastabend, Daniel Borkmann, David S. Miller,
	Aviad Yehezkel, Tariq Toukan, netdev

On 2021-05-28 22:44, Jakub Kicinski wrote:
> On Fri, 28 May 2021 15:40:38 +0300 Maxim Mikityanskiy wrote:
>>>> @@ -961,6 +964,17 @@ int tls_device_decrypted(struct sock *sk, struct tls_context *tls_ctx,
>>>>    
>>>>    	ctx->sw.decrypted |= is_decrypted;
>>>>    
>>>> +	if (unlikely(test_bit(TLS_RX_DEV_DEGRADED, &tls_ctx->flags))) {
>>>
>>> Why not put the check in tls_device_core_ctrl_rx_resync()?
>>> Would be less code, right?
>>
>> I see what you mean, and I considered this option, but I think my option
>> has better readability and is more future-proof. By doing an early
>> return, I skip all code irrelevant to the degraded mode, and even though
>> changing ctx->resync_nh_reset won't have effect in the degraded mode, it
>> will be easier for readers to understand that this part of code is not
>> relevant. Furthermore, if someone decides to add more code to
>> !is_encrypted branches in the future, there is a chance that the
>> degraded mode will be missed from consideration. With the early return
>> there is not problem, but if I follow your suggestion and do the check
>> only under is_encrypted, a future contributor unfamiliar with this
>> "degraded flow" might fail to add that check where it will be needed.
>>
>> This was the reason I implemented it this way. What do you think?
> 
> In general "someone may miss this in the future" is better served by
> adding a test case than code duplication. But we don't have infra to
> fake-offload TLS so I don't feel strongly. You can keep as is if that's
> your preference.

Yeah, I agree that we would benefit from having unit tests for such 
flows. But as we don't have the needed infrastructure, and you are fine 
with the current implementation, I'll keep it. The only thing I need to 
fix when resubmitting is the unneeded EXPORT_SYMBOL_GPL, right?

Thanks for reviewing.

> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 2/2] net/tls: Fix use-after-free after the TLS device goes down and up
  2021-05-31 11:09         ` Maxim Mikityanskiy
@ 2021-06-01  4:16           ` Jakub Kicinski
  0 siblings, 0 replies; 14+ messages in thread
From: Jakub Kicinski @ 2021-06-01  4:16 UTC (permalink / raw)
  To: Maxim Mikityanskiy
  Cc: Boris Pismenny, John Fastabend, Daniel Borkmann, David S. Miller,
	Aviad Yehezkel, Tariq Toukan, netdev

On Mon, 31 May 2021 14:09:56 +0300 Maxim Mikityanskiy wrote:
> On 2021-05-28 22:44, Jakub Kicinski wrote:
> > In general "someone may miss this in the future" is better served by
> > adding a test case than code duplication. But we don't have infra to
> > fake-offload TLS so I don't feel strongly. You can keep as is if that's
> > your preference.  
> 
> Yeah, I agree that we would benefit from having unit tests for such 
> flows. But as we don't have the needed infrastructure, and you are fine 
> with the current implementation, I'll keep it. The only thing I need to 
> fix when resubmitting is the unneeded EXPORT_SYMBOL_GPL, right?

Yup!

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2021-06-01  4:17 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-24 12:12 [PATCH net 0/2] Fix use-after-free after the TLS device goes down and up Maxim Mikityanskiy
2021-05-24 12:12 ` [PATCH net 1/2] net/tls: Replace TLS_RX_SYNC_RUNNING with RCU Maxim Mikityanskiy
2021-05-24 16:05   ` Jakub Kicinski
2021-05-25  8:52     ` Maxim Mikityanskiy
2021-05-25 17:14       ` Jakub Kicinski
2021-05-24 12:12 ` [PATCH net 2/2] net/tls: Fix use-after-free after the TLS device goes down and up Maxim Mikityanskiy
2021-05-24 16:15   ` Jakub Kicinski
2021-05-25  8:52     ` Maxim Mikityanskiy
2021-05-25 17:21       ` Jakub Kicinski
2021-05-25 17:39   ` Jakub Kicinski
2021-05-28 12:40     ` Maxim Mikityanskiy
2021-05-28 19:44       ` Jakub Kicinski
2021-05-31 11:09         ` Maxim Mikityanskiy
2021-06-01  4:16           ` Jakub Kicinski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.