netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net 0/3] net/tls: fix device surprise removal with offload
@ 2019-05-22  2:01 Jakub Kicinski
  2019-05-22  2:02 ` [PATCH net 1/3] net/tls: avoid NULL-deref on resync during device removal Jakub Kicinski
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Jakub Kicinski @ 2019-05-22  2:01 UTC (permalink / raw)
  To: davem; +Cc: netdev, oss-drivers, borisp, alexei.starovoitov, Jakub Kicinski

Hi!

This series fixes two issues with device surprise removal.
First we need to take a read lock around resync, otherwise
netdev notifier handler may clear the structures from under
our feet.

Secondly we need to be careful about the interpretation
of device features.  Offload has to be properly cleaned
up even if the TLS device features got cleared after
connection state was installed.

Jakub Kicinski (3):
  net/tls: avoid NULL-deref on resync during device removal
  net/tls: fix state removal with feature flags off
  net/tls: don't ignore netdev notifications if no TLS features

 net/tls/tls_device.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

-- 
2.21.0


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH net 1/3] net/tls: avoid NULL-deref on resync during device removal
  2019-05-22  2:01 [PATCH net 0/3] net/tls: fix device surprise removal with offload Jakub Kicinski
@ 2019-05-22  2:02 ` Jakub Kicinski
  2019-05-30 23:40   ` Jakub Kicinski
  2019-05-22  2:02 ` [PATCH net 2/3] net/tls: fix state removal with feature flags off Jakub Kicinski
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 6+ messages in thread
From: Jakub Kicinski @ 2019-05-22  2:02 UTC (permalink / raw)
  To: davem
  Cc: netdev, oss-drivers, borisp, alexei.starovoitov, Jakub Kicinski,
	Dirk van der Merwe

When netdev with active kTLS sockets in unregistered
notifier callback walks the offloaded sockets and
cleans up offload state.  RX data may still be processed,
however, and if resync was requested prior to device
removal we would hit a NULL pointer dereference on
ctx->netdev use.

Make sure resync is under the device offload lock
and NULL-check the netdev pointer.

This should be safe, because the pointer is set to
NULL either in the netdev notifier (under said lock)
or when socket is completely dead and no resync can
happen.

The other access to ctx->netdev in tls_validate_xmit_skb()
does not dereference the pointer, it just checks it against
other device pointer, so it should be pretty safe (perhaps
we can add a READ_ONCE/WRITE_ONCE there, if paranoid).

Fixes: 4799ac81e52a ("tls: Add rx inline crypto offload")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
---
 net/tls/tls_device.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index ca54a7c7ec81..aa33e4accc32 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -553,8 +553,8 @@ void tls_device_write_space(struct sock *sk, struct tls_context *ctx)
 void handle_device_resync(struct sock *sk, u32 seq, u64 rcd_sn)
 {
 	struct tls_context *tls_ctx = tls_get_ctx(sk);
-	struct net_device *netdev = tls_ctx->netdev;
 	struct tls_offload_context_rx *rx_ctx;
+	struct net_device *netdev;
 	u32 is_req_pending;
 	s64 resync_req;
 	u32 req_seq;
@@ -568,10 +568,15 @@ void handle_device_resync(struct sock *sk, u32 seq, u64 rcd_sn)
 	is_req_pending = resync_req;
 
 	if (unlikely(is_req_pending) && req_seq == seq &&
-	    atomic64_try_cmpxchg(&rx_ctx->resync_req, &resync_req, 0))
-		netdev->tlsdev_ops->tls_dev_resync_rx(netdev, sk,
-						      seq + TLS_HEADER_SIZE - 1,
-						      rcd_sn);
+	    atomic64_try_cmpxchg(&rx_ctx->resync_req, &resync_req, 0)) {
+		seq += TLS_HEADER_SIZE - 1;
+		down_read(&device_offload_lock);
+		netdev = tls_ctx->netdev;
+		if (netdev)
+			netdev->tlsdev_ops->tls_dev_resync_rx(netdev, sk, seq,
+							      rcd_sn);
+		up_read(&device_offload_lock);
+	}
 }
 
 static int tls_device_reencrypt(struct sock *sk, struct sk_buff *skb)
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH net 2/3] net/tls: fix state removal with feature flags off
  2019-05-22  2:01 [PATCH net 0/3] net/tls: fix device surprise removal with offload Jakub Kicinski
  2019-05-22  2:02 ` [PATCH net 1/3] net/tls: avoid NULL-deref on resync during device removal Jakub Kicinski
@ 2019-05-22  2:02 ` Jakub Kicinski
  2019-05-22  2:02 ` [PATCH net 3/3] net/tls: don't ignore netdev notifications if no TLS features Jakub Kicinski
  2019-05-22 19:28 ` [PATCH net 0/3] net/tls: fix device surprise removal with offload David Miller
  3 siblings, 0 replies; 6+ messages in thread
From: Jakub Kicinski @ 2019-05-22  2:02 UTC (permalink / raw)
  To: davem
  Cc: netdev, oss-drivers, borisp, alexei.starovoitov, Jakub Kicinski,
	Dirk van der Merwe

TLS offload drivers shouldn't (and currently don't) block
the TLS offload feature changes based on whether there are
active offloaded connections or not.

This seems to be a good idea, because we want the admin to
be able to disable the TLS offload at any time, and there
is no clean way of disabling it for active connections
(TX side is quite problematic).  So if features are cleared
existing connections will stay offloaded until they close,
and new connections will not attempt offload to a given
device.

However, the offload state removal handling is currently
broken if feature flags get cleared while there are
active TLS offloads.

RX side will completely bail from cleanup, even on normal
remove path, leaving device state dangling, potentially
causing issues when the 5-tuple is reused.  It will also
fail to release the netdev reference.

Remove the RX-side warning message, in next release cycle
it should be printed when features are disabled, rather
than when connection dies, but for that we need a more
efficient method of finding connection of a given netdev
(a'la BPF offload code).

Fixes: 4799ac81e52a ("tls: Add rx inline crypto offload")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
---
 net/tls/tls_device.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index aa33e4accc32..07650446e892 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -939,12 +939,6 @@ void tls_device_offload_cleanup_rx(struct sock *sk)
 	if (!netdev)
 		goto out;
 
-	if (!(netdev->features & NETIF_F_HW_TLS_RX)) {
-		pr_err_ratelimited("%s: device is missing NETIF_F_HW_TLS_RX cap\n",
-				   __func__);
-		goto out;
-	}
-
 	netdev->tlsdev_ops->tls_dev_del(netdev, tls_ctx,
 					TLS_OFFLOAD_CTX_DIR_RX);
 
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH net 3/3] net/tls: don't ignore netdev notifications if no TLS features
  2019-05-22  2:01 [PATCH net 0/3] net/tls: fix device surprise removal with offload Jakub Kicinski
  2019-05-22  2:02 ` [PATCH net 1/3] net/tls: avoid NULL-deref on resync during device removal Jakub Kicinski
  2019-05-22  2:02 ` [PATCH net 2/3] net/tls: fix state removal with feature flags off Jakub Kicinski
@ 2019-05-22  2:02 ` Jakub Kicinski
  2019-05-22 19:28 ` [PATCH net 0/3] net/tls: fix device surprise removal with offload David Miller
  3 siblings, 0 replies; 6+ messages in thread
From: Jakub Kicinski @ 2019-05-22  2:02 UTC (permalink / raw)
  To: davem
  Cc: netdev, oss-drivers, borisp, alexei.starovoitov, Jakub Kicinski,
	Dirk van der Merwe

On device surprise removal path (the notifier) we can't
bail just because the features are disabled.  They may
have been enabled during the lifetime of the device.
This bug leads to leaking netdev references and
use-after-frees if there are active connections while
device features are cleared.

Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
---
 net/tls/tls_device.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index 07650446e892..b95c408fd771 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -997,7 +997,8 @@ static int tls_dev_event(struct notifier_block *this, unsigned long event,
 {
 	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
 
-	if (!(dev->features & (NETIF_F_HW_TLS_RX | NETIF_F_HW_TLS_TX)))
+	if (!dev->tlsdev_ops &&
+	    !(dev->features & (NETIF_F_HW_TLS_RX | NETIF_F_HW_TLS_TX)))
 		return NOTIFY_DONE;
 
 	switch (event) {
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH net 0/3] net/tls: fix device surprise removal with offload
  2019-05-22  2:01 [PATCH net 0/3] net/tls: fix device surprise removal with offload Jakub Kicinski
                   ` (2 preceding siblings ...)
  2019-05-22  2:02 ` [PATCH net 3/3] net/tls: don't ignore netdev notifications if no TLS features Jakub Kicinski
@ 2019-05-22 19:28 ` David Miller
  3 siblings, 0 replies; 6+ messages in thread
From: David Miller @ 2019-05-22 19:28 UTC (permalink / raw)
  To: jakub.kicinski; +Cc: netdev, oss-drivers, borisp, alexei.starovoitov

From: Jakub Kicinski <jakub.kicinski@netronome.com>
Date: Tue, 21 May 2019 19:01:59 -0700

> This series fixes two issues with device surprise removal.
> First we need to take a read lock around resync, otherwise
> netdev notifier handler may clear the structures from under
> our feet.
> 
> Secondly we need to be careful about the interpretation
> of device features.  Offload has to be properly cleaned
> up even if the TLS device features got cleared after
> connection state was installed.

Series applied and queued up for -stable, thanks.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net 1/3] net/tls: avoid NULL-deref on resync during device removal
  2019-05-22  2:02 ` [PATCH net 1/3] net/tls: avoid NULL-deref on resync during device removal Jakub Kicinski
@ 2019-05-30 23:40   ` Jakub Kicinski
  0 siblings, 0 replies; 6+ messages in thread
From: Jakub Kicinski @ 2019-05-30 23:40 UTC (permalink / raw)
  To: davem; +Cc: netdev, oss-drivers, borisp, alexei.starovoitov, Dirk van der Merwe

On Tue, 21 May 2019 19:02:00 -0700, Jakub Kicinski wrote:
> When netdev with active kTLS sockets in unregistered
> notifier callback walks the offloaded sockets and
> cleans up offload state.  RX data may still be processed,
> however, and if resync was requested prior to device
> removal we would hit a NULL pointer dereference on
> ctx->netdev use.
> 
> Make sure resync is under the device offload lock
> and NULL-check the netdev pointer.
> 
> This should be safe, because the pointer is set to
> NULL either in the netdev notifier (under said lock)
> or when socket is completely dead and no resync can
> happen.
> 
> The other access to ctx->netdev in tls_validate_xmit_skb()
> does not dereference the pointer, it just checks it against
> other device pointer, so it should be pretty safe (perhaps
> we can add a READ_ONCE/WRITE_ONCE there, if paranoid).
> 
> Fixes: 4799ac81e52a ("tls: Add rx inline crypto offload")
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
> ---
>  net/tls/tls_device.c | 15 ++++++++++-----
>  1 file changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
> index ca54a7c7ec81..aa33e4accc32 100644
> --- a/net/tls/tls_device.c
> +++ b/net/tls/tls_device.c
> @@ -553,8 +553,8 @@ void tls_device_write_space(struct sock *sk, struct tls_context *ctx)
>  void handle_device_resync(struct sock *sk, u32 seq, u64 rcd_sn)
>  {
>  	struct tls_context *tls_ctx = tls_get_ctx(sk);
> -	struct net_device *netdev = tls_ctx->netdev;
>  	struct tls_offload_context_rx *rx_ctx;
> +	struct net_device *netdev;
>  	u32 is_req_pending;
>  	s64 resync_req;
>  	u32 req_seq;
> @@ -568,10 +568,15 @@ void handle_device_resync(struct sock *sk, u32 seq, u64 rcd_sn)
>  	is_req_pending = resync_req;
>  
>  	if (unlikely(is_req_pending) && req_seq == seq &&
> -	    atomic64_try_cmpxchg(&rx_ctx->resync_req, &resync_req, 0))
> -		netdev->tlsdev_ops->tls_dev_resync_rx(netdev, sk,
> -						      seq + TLS_HEADER_SIZE - 1,
> -						      rcd_sn);
> +	    atomic64_try_cmpxchg(&rx_ctx->resync_req, &resync_req, 0)) {
> +		seq += TLS_HEADER_SIZE - 1;
> +		down_read(&device_offload_lock);

Sorry this may actually cause a sleep in atomic, turns out resync may
get called directly from softirq under certain conditions.

Would it be possible to drop this from stable?  I can post a revert +
new fix (probably on a refcount..) or should I post an incremental fix?

> +		netdev = tls_ctx->netdev;
> +		if (netdev)
> +			netdev->tlsdev_ops->tls_dev_resync_rx(netdev, sk, seq,
> +							      rcd_sn);
> +		up_read(&device_offload_lock);
> +	}
>  }
>  
>  static int tls_device_reencrypt(struct sock *sk, struct sk_buff *skb)


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2019-05-30 23:40 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-22  2:01 [PATCH net 0/3] net/tls: fix device surprise removal with offload Jakub Kicinski
2019-05-22  2:02 ` [PATCH net 1/3] net/tls: avoid NULL-deref on resync during device removal Jakub Kicinski
2019-05-30 23:40   ` Jakub Kicinski
2019-05-22  2:02 ` [PATCH net 2/3] net/tls: fix state removal with feature flags off Jakub Kicinski
2019-05-22  2:02 ` [PATCH net 3/3] net/tls: don't ignore netdev notifications if no TLS features Jakub Kicinski
2019-05-22 19:28 ` [PATCH net 0/3] net/tls: fix device surprise removal with offload David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).