All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH bpf-next v2 0/4] xsk: various CPU barrier and {READ, WRITE}_ONCE fixes
@ 2019-08-26  6:10 Björn Töpel
  2019-08-26  6:10 ` [PATCH bpf-next v2 1/4] xsk: avoid store-tearing when assigning queues Björn Töpel
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Björn Töpel @ 2019-08-26  6:10 UTC (permalink / raw)
  To: ast, daniel, netdev
  Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, bpf,
	bjorn.topel, jonathan.lemon, syzbot+c82697e3043781e08802,
	hdanton, i.maximets

This is a four patch series of various barrier, {READ, WRITE}_ONCE
cleanups in the AF_XDP socket code. More details can be found in the
corresponding commit message.

For an AF_XDP socket, most control plane operations are done under the
control mutex (struct xdp_sock, mutex), but there are some places
where members of the struct is read outside the control mutex. This,
as pointed out by Daniel in [1], requires proper {READ,
WRITE}_ONCE-correctness [2] [3]. To address this, and to simplify the
code, the state variable (introduced by Ilya), is now used a point of
synchronization ("is the socket in a valid state, or not").


Thanks,
Björn

[1] https://lore.kernel.org/bpf/beef16bb-a09b-40f1-7dd0-c323b4b89b17@iogearbox.net/
[2] https://lwn.net/Articles/793253/
[3] https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE

v1->v2:
  Removed redundant dev check. (Jonathan)

Björn Töpel (4):
  xsk: avoid store-tearing when assigning queues
  xsk: add proper barriers and {READ, WRITE}_ONCE-correctness for state
  xsk: avoid store-tearing when assigning umem
  xsk: lock the control mutex in sock_diag interface

 net/xdp/xsk.c      | 61 +++++++++++++++++++++++++++++++---------------
 net/xdp/xsk_diag.c |  3 +++
 2 files changed, 45 insertions(+), 19 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH bpf-next v2 1/4] xsk: avoid store-tearing when assigning queues
  2019-08-26  6:10 [PATCH bpf-next v2 0/4] xsk: various CPU barrier and {READ, WRITE}_ONCE fixes Björn Töpel
@ 2019-08-26  6:10 ` Björn Töpel
  2019-08-26  6:10 ` [PATCH bpf-next v2 2/4] xsk: add proper barriers and {READ, WRITE}_ONCE-correctness for state Björn Töpel
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Björn Töpel @ 2019-08-26  6:10 UTC (permalink / raw)
  To: ast, daniel, netdev
  Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, bpf,
	jonathan.lemon, syzbot+c82697e3043781e08802, hdanton, i.maximets

From: Björn Töpel <bjorn.topel@intel.com>

Use WRITE_ONCE when doing the store of tx, rx, fq, and cq, to avoid
potential store-tearing. These members are read outside of the control
mutex in the mmap implementation.

Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Fixes: 37b076933a8e ("xsk: add missing write- and data-dependency barrier")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 net/xdp/xsk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index ee4428a892fa..f3351013c2a5 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -409,7 +409,7 @@ static int xsk_init_queue(u32 entries, struct xsk_queue **queue,
 
 	/* Make sure queue is ready before it can be seen by others */
 	smp_wmb();
-	*queue = q;
+	WRITE_ONCE(*queue, q);
 	return 0;
 }
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH bpf-next v2 2/4] xsk: add proper barriers and {READ, WRITE}_ONCE-correctness for state
  2019-08-26  6:10 [PATCH bpf-next v2 0/4] xsk: various CPU barrier and {READ, WRITE}_ONCE fixes Björn Töpel
  2019-08-26  6:10 ` [PATCH bpf-next v2 1/4] xsk: avoid store-tearing when assigning queues Björn Töpel
@ 2019-08-26  6:10 ` Björn Töpel
  2019-08-26 15:24   ` Ilya Maximets
                     ` (2 more replies)
  2019-08-26  6:10 ` [PATCH bpf-next v2 3/4] xsk: avoid store-tearing when assigning umem Björn Töpel
  2019-08-26  6:10 ` [PATCH bpf-next v2 4/4] xsk: lock the control mutex in sock_diag interface Björn Töpel
  3 siblings, 3 replies; 11+ messages in thread
From: Björn Töpel @ 2019-08-26  6:10 UTC (permalink / raw)
  To: ast, daniel, netdev
  Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, bpf,
	jonathan.lemon, syzbot+c82697e3043781e08802, hdanton, i.maximets

From: Björn Töpel <bjorn.topel@intel.com>

The state variable was read, and written outside the control mutex
(struct xdp_sock, mutex), without proper barriers and {READ,
WRITE}_ONCE correctness.

In this commit this issue is addressed, and the state member is now
used a point of synchronization whether the socket is setup correctly
or not.

This also fixes a race, found by syzcaller, in xsk_poll() where umem
could be accessed when stale.

Suggested-by: Hillf Danton <hdanton@sina.com>
Reported-by: syzbot+c82697e3043781e08802@syzkaller.appspotmail.com
Fixes: 77cd0d7b3f25 ("xsk: add support for need_wakeup flag in AF_XDP rings")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 net/xdp/xsk.c | 57 ++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 40 insertions(+), 17 deletions(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index f3351013c2a5..8fafa3ce3ae6 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -162,10 +162,23 @@ static int __xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
 	return err;
 }
 
+static bool xsk_is_bound(struct xdp_sock *xs)
+{
+	if (READ_ONCE(xs->state) == XSK_BOUND) {
+		/* Matches smp_wmb() in bind(). */
+		smp_rmb();
+		return true;
+	}
+	return false;
+}
+
 int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 {
 	u32 len;
 
+	if (!xsk_is_bound(xs))
+		return -EINVAL;
+
 	if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
 		return -EINVAL;
 
@@ -362,7 +375,7 @@ static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
 	struct sock *sk = sock->sk;
 	struct xdp_sock *xs = xdp_sk(sk);
 
-	if (unlikely(!xs->dev))
+	if (unlikely(!xsk_is_bound(xs)))
 		return -ENXIO;
 	if (unlikely(!(xs->dev->flags & IFF_UP)))
 		return -ENETDOWN;
@@ -378,10 +391,15 @@ static unsigned int xsk_poll(struct file *file, struct socket *sock,
 			     struct poll_table_struct *wait)
 {
 	unsigned int mask = datagram_poll(file, sock, wait);
-	struct sock *sk = sock->sk;
-	struct xdp_sock *xs = xdp_sk(sk);
-	struct net_device *dev = xs->dev;
-	struct xdp_umem *umem = xs->umem;
+	struct xdp_sock *xs = xdp_sk(sock->sk);
+	struct net_device *dev;
+	struct xdp_umem *umem;
+
+	if (unlikely(!xsk_is_bound(xs)))
+		return mask;
+
+	dev = xs->dev;
+	umem = xs->umem;
 
 	if (umem->need_wakeup)
 		dev->netdev_ops->ndo_xsk_wakeup(dev, xs->queue_id,
@@ -417,10 +435,9 @@ static void xsk_unbind_dev(struct xdp_sock *xs)
 {
 	struct net_device *dev = xs->dev;
 
-	if (!dev || xs->state != XSK_BOUND)
+	if (xs->state != XSK_BOUND)
 		return;
-
-	xs->state = XSK_UNBOUND;
+	WRITE_ONCE(xs->state, XSK_UNBOUND);
 
 	/* Wait for driver to stop using the xdp socket. */
 	xdp_del_sk_umem(xs->umem, xs);
@@ -495,7 +512,9 @@ static int xsk_release(struct socket *sock)
 	local_bh_enable();
 
 	xsk_delete_from_maps(xs);
+	mutex_lock(&xs->mutex);
 	xsk_unbind_dev(xs);
+	mutex_unlock(&xs->mutex);
 
 	xskq_destroy(xs->rx);
 	xskq_destroy(xs->tx);
@@ -589,19 +608,18 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 		}
 
 		umem_xs = xdp_sk(sock->sk);
-		if (!umem_xs->umem) {
-			/* No umem to inherit. */
+		if (!xsk_is_bound(umem_xs)) {
 			err = -EBADF;
 			sockfd_put(sock);
 			goto out_unlock;
-		} else if (umem_xs->dev != dev || umem_xs->queue_id != qid) {
+		}
+		if (umem_xs->dev != dev || umem_xs->queue_id != qid) {
 			err = -EINVAL;
 			sockfd_put(sock);
 			goto out_unlock;
 		}
-
 		xdp_get_umem(umem_xs->umem);
-		xs->umem = umem_xs->umem;
+		WRITE_ONCE(xs->umem, umem_xs->umem);
 		sockfd_put(sock);
 	} else if (!xs->umem || !xdp_umem_validate_queues(xs->umem)) {
 		err = -EINVAL;
@@ -626,10 +644,15 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 	xdp_add_sk_umem(xs->umem, xs);
 
 out_unlock:
-	if (err)
+	if (err) {
 		dev_put(dev);
-	else
-		xs->state = XSK_BOUND;
+	} else {
+		/* Matches smp_rmb() in bind() for shared umem
+		 * sockets, and xsk_is_bound().
+		 */
+		smp_wmb();
+		WRITE_ONCE(xs->state, XSK_BOUND);
+	}
 out_release:
 	mutex_unlock(&xs->mutex);
 	rtnl_unlock();
@@ -869,7 +892,7 @@ static int xsk_mmap(struct file *file, struct socket *sock,
 	unsigned long pfn;
 	struct page *qpg;
 
-	if (xs->state != XSK_READY)
+	if (READ_ONCE(xs->state) != XSK_READY)
 		return -EBUSY;
 
 	if (offset == XDP_PGOFF_RX_RING) {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH bpf-next v2 3/4] xsk: avoid store-tearing when assigning umem
  2019-08-26  6:10 [PATCH bpf-next v2 0/4] xsk: various CPU barrier and {READ, WRITE}_ONCE fixes Björn Töpel
  2019-08-26  6:10 ` [PATCH bpf-next v2 1/4] xsk: avoid store-tearing when assigning queues Björn Töpel
  2019-08-26  6:10 ` [PATCH bpf-next v2 2/4] xsk: add proper barriers and {READ, WRITE}_ONCE-correctness for state Björn Töpel
@ 2019-08-26  6:10 ` Björn Töpel
  2019-08-26  6:10 ` [PATCH bpf-next v2 4/4] xsk: lock the control mutex in sock_diag interface Björn Töpel
  3 siblings, 0 replies; 11+ messages in thread
From: Björn Töpel @ 2019-08-26  6:10 UTC (permalink / raw)
  To: ast, daniel, netdev
  Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, bpf,
	jonathan.lemon, syzbot+c82697e3043781e08802, hdanton, i.maximets

From: Björn Töpel <bjorn.topel@intel.com>

The umem member of struct xdp_sock is read outside of the control
mutex, in the mmap implementation, and needs a WRITE_ONCE to avoid
potentional store-tearing.

Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Fixes: 423f38329d26 ("xsk: add umem fill queue support and mmap")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 net/xdp/xsk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 8fafa3ce3ae6..e3e99ee5631b 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -716,7 +716,7 @@ static int xsk_setsockopt(struct socket *sock, int level, int optname,
 
 		/* Make sure umem is ready before it can be seen by others */
 		smp_wmb();
-		xs->umem = umem;
+		WRITE_ONCE(xs->umem, umem);
 		mutex_unlock(&xs->mutex);
 		return 0;
 	}
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH bpf-next v2 4/4] xsk: lock the control mutex in sock_diag interface
  2019-08-26  6:10 [PATCH bpf-next v2 0/4] xsk: various CPU barrier and {READ, WRITE}_ONCE fixes Björn Töpel
                   ` (2 preceding siblings ...)
  2019-08-26  6:10 ` [PATCH bpf-next v2 3/4] xsk: avoid store-tearing when assigning umem Björn Töpel
@ 2019-08-26  6:10 ` Björn Töpel
  3 siblings, 0 replies; 11+ messages in thread
From: Björn Töpel @ 2019-08-26  6:10 UTC (permalink / raw)
  To: ast, daniel, netdev
  Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, bpf,
	jonathan.lemon, syzbot+c82697e3043781e08802, hdanton, i.maximets

From: Björn Töpel <bjorn.topel@intel.com>

When accessing the members of an XDP socket, the control mutex should
be held. This commit fixes that.

Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Fixes: a36b38aa2af6 ("xsk: add sock_diag interface for AF_XDP")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 net/xdp/xsk_diag.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/xdp/xsk_diag.c b/net/xdp/xsk_diag.c
index d5e06c8e0cbf..c8f4f11edbbc 100644
--- a/net/xdp/xsk_diag.c
+++ b/net/xdp/xsk_diag.c
@@ -97,6 +97,7 @@ static int xsk_diag_fill(struct sock *sk, struct sk_buff *nlskb,
 	msg->xdiag_ino = sk_ino;
 	sock_diag_save_cookie(sk, msg->xdiag_cookie);
 
+	mutex_lock(&xs->mutex);
 	if ((req->xdiag_show & XDP_SHOW_INFO) && xsk_diag_put_info(xs, nlskb))
 		goto out_nlmsg_trim;
 
@@ -117,10 +118,12 @@ static int xsk_diag_fill(struct sock *sk, struct sk_buff *nlskb,
 	    sock_diag_put_meminfo(sk, nlskb, XDP_DIAG_MEMINFO))
 		goto out_nlmsg_trim;
 
+	mutex_unlock(&xs->mutex);
 	nlmsg_end(nlskb, nlh);
 	return 0;
 
 out_nlmsg_trim:
+	mutex_unlock(&xs->mutex);
 	nlmsg_cancel(nlskb, nlh);
 	return -EMSGSIZE;
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf-next v2 2/4] xsk: add proper barriers and {READ, WRITE}_ONCE-correctness for state
  2019-08-26  6:10 ` [PATCH bpf-next v2 2/4] xsk: add proper barriers and {READ, WRITE}_ONCE-correctness for state Björn Töpel
@ 2019-08-26 15:24   ` Ilya Maximets
  2019-08-26 16:34     ` Björn Töpel
  2019-08-26 17:54   ` Jonathan Lemon
  2019-09-03 15:22   ` Daniel Borkmann
  2 siblings, 1 reply; 11+ messages in thread
From: Ilya Maximets @ 2019-08-26 15:24 UTC (permalink / raw)
  To: Björn Töpel, ast, daniel, netdev
  Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, bpf,
	jonathan.lemon, syzbot+c82697e3043781e08802, hdanton

On 26.08.2019 9:10, Björn Töpel wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
> 
> The state variable was read, and written outside the control mutex
> (struct xdp_sock, mutex), without proper barriers and {READ,
> WRITE}_ONCE correctness.
> 
> In this commit this issue is addressed, and the state member is now
> used a point of synchronization whether the socket is setup correctly
> or not.
> 
> This also fixes a race, found by syzcaller, in xsk_poll() where umem
> could be accessed when stale.
> 
> Suggested-by: Hillf Danton <hdanton@sina.com>
> Reported-by: syzbot+c82697e3043781e08802@syzkaller.appspotmail.com
> Fixes: 77cd0d7b3f25 ("xsk: add support for need_wakeup flag in AF_XDP rings")
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---
>  net/xdp/xsk.c | 57 ++++++++++++++++++++++++++++++++++++---------------
>  1 file changed, 40 insertions(+), 17 deletions(-)
> 
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index f3351013c2a5..8fafa3ce3ae6 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -162,10 +162,23 @@ static int __xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
>  	return err;
>  }
>  
> +static bool xsk_is_bound(struct xdp_sock *xs)
> +{
> +	if (READ_ONCE(xs->state) == XSK_BOUND) {
> +		/* Matches smp_wmb() in bind(). */
> +		smp_rmb();
> +		return true;
> +	}
> +	return false;
> +}
> +
>  int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
>  {
>  	u32 len;
>  
> +	if (!xsk_is_bound(xs))
> +		return -EINVAL;
> +
>  	if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
>  		return -EINVAL;
>  
> @@ -362,7 +375,7 @@ static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
>  	struct sock *sk = sock->sk;
>  	struct xdp_sock *xs = xdp_sk(sk);
>  
> -	if (unlikely(!xs->dev))
> +	if (unlikely(!xsk_is_bound(xs)))
>  		return -ENXIO;
>  	if (unlikely(!(xs->dev->flags & IFF_UP)))
>  		return -ENETDOWN;
> @@ -378,10 +391,15 @@ static unsigned int xsk_poll(struct file *file, struct socket *sock,
>  			     struct poll_table_struct *wait)
>  {
>  	unsigned int mask = datagram_poll(file, sock, wait);
> -	struct sock *sk = sock->sk;
> -	struct xdp_sock *xs = xdp_sk(sk);
> -	struct net_device *dev = xs->dev;
> -	struct xdp_umem *umem = xs->umem;
> +	struct xdp_sock *xs = xdp_sk(sock->sk);
> +	struct net_device *dev;
> +	struct xdp_umem *umem;
> +
> +	if (unlikely(!xsk_is_bound(xs)))
> +		return mask;
> +
> +	dev = xs->dev;
> +	umem = xs->umem;
>  
>  	if (umem->need_wakeup)
>  		dev->netdev_ops->ndo_xsk_wakeup(dev, xs->queue_id,
> @@ -417,10 +435,9 @@ static void xsk_unbind_dev(struct xdp_sock *xs)
>  {
>  	struct net_device *dev = xs->dev;
>  
> -	if (!dev || xs->state != XSK_BOUND)
> +	if (xs->state != XSK_BOUND)
>  		return;
> -
> -	xs->state = XSK_UNBOUND;
> +	WRITE_ONCE(xs->state, XSK_UNBOUND);
>  
>  	/* Wait for driver to stop using the xdp socket. */
>  	xdp_del_sk_umem(xs->umem, xs);
> @@ -495,7 +512,9 @@ static int xsk_release(struct socket *sock)
>  	local_bh_enable();
>  
>  	xsk_delete_from_maps(xs);
> +	mutex_lock(&xs->mutex);
>  	xsk_unbind_dev(xs);
> +	mutex_unlock(&xs->mutex);
>  
>  	xskq_destroy(xs->rx);
>  	xskq_destroy(xs->tx);
> @@ -589,19 +608,18 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
>  		}
>  
>  		umem_xs = xdp_sk(sock->sk);
> -		if (!umem_xs->umem) {
> -			/* No umem to inherit. */
> +		if (!xsk_is_bound(umem_xs)) {

This changes the error code a bit.
Previously:
   umem exists + xs unbound    --> EINVAL
   no umem     + xs unbound    --> EBADF
   xs bound to different dev/q --> EINVAL

With this change:
   umem exists + xs unbound    --> EBADF
   no umem     + xs unbound    --> EBADF
   xs bound to different dev/q --> EINVAL

Just a note. Not sure if this is important.


>  			err = -EBADF;
>  			sockfd_put(sock);
>  			goto out_unlock;
> -		} else if (umem_xs->dev != dev || umem_xs->queue_id != qid) {
> +		}
> +		if (umem_xs->dev != dev || umem_xs->queue_id != qid) {
>  			err = -EINVAL;
>  			sockfd_put(sock);
>  			goto out_unlock;
>  		}
> -
>  		xdp_get_umem(umem_xs->umem);
> -		xs->umem = umem_xs->umem;
> +		WRITE_ONCE(xs->umem, umem_xs->umem);
>  		sockfd_put(sock);
>  	} else if (!xs->umem || !xdp_umem_validate_queues(xs->umem)) {
>  		err = -EINVAL;
> @@ -626,10 +644,15 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
>  	xdp_add_sk_umem(xs->umem, xs);
>  
>  out_unlock:
> -	if (err)
> +	if (err) {
>  		dev_put(dev);
> -	else
> -		xs->state = XSK_BOUND;
> +	} else {
> +		/* Matches smp_rmb() in bind() for shared umem
> +		 * sockets, and xsk_is_bound().
> +		 */
> +		smp_wmb();
> +		WRITE_ONCE(xs->state, XSK_BOUND);
> +	}
>  out_release:
>  	mutex_unlock(&xs->mutex);
>  	rtnl_unlock();
> @@ -869,7 +892,7 @@ static int xsk_mmap(struct file *file, struct socket *sock,
>  	unsigned long pfn;
>  	struct page *qpg;
>  
> -	if (xs->state != XSK_READY)
> +	if (READ_ONCE(xs->state) != XSK_READY)
>  		return -EBUSY;
>  
>  	if (offset == XDP_PGOFF_RX_RING) {
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf-next v2 2/4] xsk: add proper barriers and {READ, WRITE}_ONCE-correctness for state
  2019-08-26 15:24   ` Ilya Maximets
@ 2019-08-26 16:34     ` Björn Töpel
  2019-08-26 17:57       ` Jonathan Lemon
  0 siblings, 1 reply; 11+ messages in thread
From: Björn Töpel @ 2019-08-26 16:34 UTC (permalink / raw)
  To: Ilya Maximets, Björn Töpel, ast, daniel, netdev
  Cc: magnus.karlsson, magnus.karlsson, bpf, jonathan.lemon,
	syzbot+c82697e3043781e08802, hdanton

On 2019-08-26 17:24, Ilya Maximets wrote:
> This changes the error code a bit.
> Previously:
>     umem exists + xs unbound    --> EINVAL
>     no umem     + xs unbound    --> EBADF
>     xs bound to different dev/q --> EINVAL
> 
> With this change:
>     umem exists + xs unbound    --> EBADF
>     no umem     + xs unbound    --> EBADF
>     xs bound to different dev/q --> EINVAL
> 
> Just a note. Not sure if this is important.
> 

Note that this is for *shared* umem, so it's very seldom used. Still,
you're right, that strictly this is an uapi break, but I'd vote for the
change still. I find it hard to see that anyone relies on EINVAL/EBADF
for shared umem bind.

Opinions? :-)


Björn

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf-next v2 2/4] xsk: add proper barriers and {READ, WRITE}_ONCE-correctness for state
  2019-08-26  6:10 ` [PATCH bpf-next v2 2/4] xsk: add proper barriers and {READ, WRITE}_ONCE-correctness for state Björn Töpel
  2019-08-26 15:24   ` Ilya Maximets
@ 2019-08-26 17:54   ` Jonathan Lemon
  2019-09-03 15:22   ` Daniel Borkmann
  2 siblings, 0 replies; 11+ messages in thread
From: Jonathan Lemon @ 2019-08-26 17:54 UTC (permalink / raw)
  To: Björn Töpel
  Cc: ast, daniel, netdev, Björn Töpel, magnus.karlsson,
	magnus.karlsson, bpf, syzbot+c82697e3043781e08802, hdanton,
	i.maximets



On 25 Aug 2019, at 23:10, Björn Töpel wrote:

> From: Björn Töpel <bjorn.topel@intel.com>
>
> The state variable was read, and written outside the control mutex
> (struct xdp_sock, mutex), without proper barriers and {READ,
> WRITE}_ONCE correctness.
>
> In this commit this issue is addressed, and the state member is now
> used a point of synchronization whether the socket is setup correctly
> or not.
>
> This also fixes a race, found by syzcaller, in xsk_poll() where umem
> could be accessed when stale.
>
> Suggested-by: Hillf Danton <hdanton@sina.com>
> Reported-by: syzbot+c82697e3043781e08802@syzkaller.appspotmail.com
> Fixes: 77cd0d7b3f25 ("xsk: add support for need_wakeup flag in AF_XDP 
> rings")
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>

Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf-next v2 2/4] xsk: add proper barriers and {READ, WRITE}_ONCE-correctness for state
  2019-08-26 16:34     ` Björn Töpel
@ 2019-08-26 17:57       ` Jonathan Lemon
  0 siblings, 0 replies; 11+ messages in thread
From: Jonathan Lemon @ 2019-08-26 17:57 UTC (permalink / raw)
  To: Björn Töpel
  Cc: Ilya Maximets, Björn Töpel, ast, daniel, netdev,
	magnus.karlsson, magnus.karlsson, bpf,
	syzbot+c82697e3043781e08802, hdanton



On 26 Aug 2019, at 9:34, Björn Töpel wrote:

> On 2019-08-26 17:24, Ilya Maximets wrote:
>> This changes the error code a bit.
>> Previously:
>>     umem exists + xs unbound    --> EINVAL
>>     no umem     + xs unbound    --> EBADF
>>     xs bound to different dev/q --> EINVAL
>>
>> With this change:
>>     umem exists + xs unbound    --> EBADF
>>     no umem     + xs unbound    --> EBADF
>>     xs bound to different dev/q --> EINVAL
>>
>> Just a note. Not sure if this is important.
>>
>
> Note that this is for *shared* umem, so it's very seldom used. Still,
> you're right, that strictly this is an uapi break, but I'd vote for the
> change still. I find it hard to see that anyone relies on EINVAL/EBADF
> for shared umem bind.
>
> Opinions? :-)

I'd agree - if it isn't documented somewhere, it's not an API break. :)
-- 
Jonathan

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf-next v2 2/4] xsk: add proper barriers and {READ, WRITE}_ONCE-correctness for state
  2019-08-26  6:10 ` [PATCH bpf-next v2 2/4] xsk: add proper barriers and {READ, WRITE}_ONCE-correctness for state Björn Töpel
  2019-08-26 15:24   ` Ilya Maximets
  2019-08-26 17:54   ` Jonathan Lemon
@ 2019-09-03 15:22   ` Daniel Borkmann
  2019-09-03 15:26     ` Björn Töpel
  2 siblings, 1 reply; 11+ messages in thread
From: Daniel Borkmann @ 2019-09-03 15:22 UTC (permalink / raw)
  To: Björn Töpel, ast, netdev
  Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, bpf,
	jonathan.lemon, syzbot+c82697e3043781e08802, hdanton, i.maximets

On 8/26/19 8:10 AM, Björn Töpel wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
> 
> The state variable was read, and written outside the control mutex
> (struct xdp_sock, mutex), without proper barriers and {READ,
> WRITE}_ONCE correctness.
> 
> In this commit this issue is addressed, and the state member is now
> used a point of synchronization whether the socket is setup correctly
> or not.
> 
> This also fixes a race, found by syzcaller, in xsk_poll() where umem
> could be accessed when stale.
> 
> Suggested-by: Hillf Danton <hdanton@sina.com>
> Reported-by: syzbot+c82697e3043781e08802@syzkaller.appspotmail.com
> Fixes: 77cd0d7b3f25 ("xsk: add support for need_wakeup flag in AF_XDP rings")
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>

Sorry for the delay.

> ---
>   net/xdp/xsk.c | 57 ++++++++++++++++++++++++++++++++++++---------------
>   1 file changed, 40 insertions(+), 17 deletions(-)
> 
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index f3351013c2a5..8fafa3ce3ae6 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -162,10 +162,23 @@ static int __xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
>   	return err;
>   }
>   
> +static bool xsk_is_bound(struct xdp_sock *xs)
> +{
> +	if (READ_ONCE(xs->state) == XSK_BOUND) {
> +		/* Matches smp_wmb() in bind(). */
> +		smp_rmb();
> +		return true;
> +	}
> +	return false;
> +}
> +
>   int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
>   {
>   	u32 len;
>   
> +	if (!xsk_is_bound(xs))
> +		return -EINVAL;
> +
>   	if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
>   		return -EINVAL;
>   
> @@ -362,7 +375,7 @@ static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
>   	struct sock *sk = sock->sk;
>   	struct xdp_sock *xs = xdp_sk(sk);
>   
> -	if (unlikely(!xs->dev))
> +	if (unlikely(!xsk_is_bound(xs)))
>   		return -ENXIO;
>   	if (unlikely(!(xs->dev->flags & IFF_UP)))
>   		return -ENETDOWN;
> @@ -378,10 +391,15 @@ static unsigned int xsk_poll(struct file *file, struct socket *sock,
>   			     struct poll_table_struct *wait)
>   {
>   	unsigned int mask = datagram_poll(file, sock, wait);
> -	struct sock *sk = sock->sk;
> -	struct xdp_sock *xs = xdp_sk(sk);
> -	struct net_device *dev = xs->dev;
> -	struct xdp_umem *umem = xs->umem;
> +	struct xdp_sock *xs = xdp_sk(sock->sk);
> +	struct net_device *dev;
> +	struct xdp_umem *umem;
> +
> +	if (unlikely(!xsk_is_bound(xs)))
> +		return mask;
> +
> +	dev = xs->dev;
> +	umem = xs->umem;
>   
>   	if (umem->need_wakeup)
>   		dev->netdev_ops->ndo_xsk_wakeup(dev, xs->queue_id,
> @@ -417,10 +435,9 @@ static void xsk_unbind_dev(struct xdp_sock *xs)
>   {
>   	struct net_device *dev = xs->dev;
>   
> -	if (!dev || xs->state != XSK_BOUND)
> +	if (xs->state != XSK_BOUND)
>   		return;
> -
> -	xs->state = XSK_UNBOUND;
> +	WRITE_ONCE(xs->state, XSK_UNBOUND);
>   
>   	/* Wait for driver to stop using the xdp socket. */
>   	xdp_del_sk_umem(xs->umem, xs);
> @@ -495,7 +512,9 @@ static int xsk_release(struct socket *sock)
>   	local_bh_enable();
>   
>   	xsk_delete_from_maps(xs);
> +	mutex_lock(&xs->mutex);
>   	xsk_unbind_dev(xs);
> +	mutex_unlock(&xs->mutex);
>   
>   	xskq_destroy(xs->rx);
>   	xskq_destroy(xs->tx);
> @@ -589,19 +608,18 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
>   		}
>   
>   		umem_xs = xdp_sk(sock->sk);
> -		if (!umem_xs->umem) {
> -			/* No umem to inherit. */
> +		if (!xsk_is_bound(umem_xs)) {
>   			err = -EBADF;
>   			sockfd_put(sock);
>   			goto out_unlock;
> -		} else if (umem_xs->dev != dev || umem_xs->queue_id != qid) {
> +		}
> +		if (umem_xs->dev != dev || umem_xs->queue_id != qid) {
>   			err = -EINVAL;
>   			sockfd_put(sock);
>   			goto out_unlock;
>   		}
> -
>   		xdp_get_umem(umem_xs->umem);
> -		xs->umem = umem_xs->umem;
> +		WRITE_ONCE(xs->umem, umem_xs->umem);
>   		sockfd_put(sock);
>   	} else if (!xs->umem || !xdp_umem_validate_queues(xs->umem)) {
>   		err = -EINVAL;
> @@ -626,10 +644,15 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
>   	xdp_add_sk_umem(xs->umem, xs);
>   
>   out_unlock:
> -	if (err)
> +	if (err) {
>   		dev_put(dev);
> -	else
> -		xs->state = XSK_BOUND;
> +	} else {
> +		/* Matches smp_rmb() in bind() for shared umem
> +		 * sockets, and xsk_is_bound().
> +		 */
> +		smp_wmb();

You write with what this barrier matches/pairs, but would be useful for readers
to have an explanation against what it protects. I presume to have things like
xs->umem public as you seem to guard it behind xsk_is_bound() in xsk_poll() and
other cases? Would be great to have a detailed analysis of all this e.g. in the
commit message so one wouldn't need to guess; right now it feels this is doing
many things at once and w/o further explanation of why READ_ONCE() or others are
omitted sometimes. Would be great to get a lot more clarity into this, perhaps
splitting it up a bit might also help.

> +		WRITE_ONCE(xs->state, XSK_BOUND);
> +	}

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf-next v2 2/4] xsk: add proper barriers and {READ, WRITE}_ONCE-correctness for state
  2019-09-03 15:22   ` Daniel Borkmann
@ 2019-09-03 15:26     ` Björn Töpel
  0 siblings, 0 replies; 11+ messages in thread
From: Björn Töpel @ 2019-09-03 15:26 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, Netdev, Björn Töpel, Karlsson,
	Magnus, Magnus Karlsson, bpf, Jonathan Lemon,
	syzbot+c82697e3043781e08802, Hillf Danton, Ilya Maximets

On Tue, 3 Sep 2019 at 17:22, Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 8/26/19 8:10 AM, Björn Töpel wrote:
> > From: Björn Töpel <bjorn.topel@intel.com>
> >
> > The state variable was read, and written outside the control mutex
> > (struct xdp_sock, mutex), without proper barriers and {READ,
> > WRITE}_ONCE correctness.
> >
> > In this commit this issue is addressed, and the state member is now
> > used a point of synchronization whether the socket is setup correctly
> > or not.
> >
> > This also fixes a race, found by syzcaller, in xsk_poll() where umem
> > could be accessed when stale.
> >
> > Suggested-by: Hillf Danton <hdanton@sina.com>
> > Reported-by: syzbot+c82697e3043781e08802@syzkaller.appspotmail.com
> > Fixes: 77cd0d7b3f25 ("xsk: add support for need_wakeup flag in AF_XDP rings")
> > Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
>
> Sorry for the delay.
>
> > ---
> >   net/xdp/xsk.c | 57 ++++++++++++++++++++++++++++++++++++---------------
> >   1 file changed, 40 insertions(+), 17 deletions(-)
> >
> > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > index f3351013c2a5..8fafa3ce3ae6 100644
> > --- a/net/xdp/xsk.c
> > +++ b/net/xdp/xsk.c
> > @@ -162,10 +162,23 @@ static int __xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
> >       return err;
> >   }
> >
> > +static bool xsk_is_bound(struct xdp_sock *xs)
> > +{
> > +     if (READ_ONCE(xs->state) == XSK_BOUND) {
> > +             /* Matches smp_wmb() in bind(). */
> > +             smp_rmb();
> > +             return true;
> > +     }
> > +     return false;
> > +}
> > +
> >   int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
> >   {
> >       u32 len;
> >
> > +     if (!xsk_is_bound(xs))
> > +             return -EINVAL;
> > +
> >       if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
> >               return -EINVAL;
> >
> > @@ -362,7 +375,7 @@ static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
> >       struct sock *sk = sock->sk;
> >       struct xdp_sock *xs = xdp_sk(sk);
> >
> > -     if (unlikely(!xs->dev))
> > +     if (unlikely(!xsk_is_bound(xs)))
> >               return -ENXIO;
> >       if (unlikely(!(xs->dev->flags & IFF_UP)))
> >               return -ENETDOWN;
> > @@ -378,10 +391,15 @@ static unsigned int xsk_poll(struct file *file, struct socket *sock,
> >                            struct poll_table_struct *wait)
> >   {
> >       unsigned int mask = datagram_poll(file, sock, wait);
> > -     struct sock *sk = sock->sk;
> > -     struct xdp_sock *xs = xdp_sk(sk);
> > -     struct net_device *dev = xs->dev;
> > -     struct xdp_umem *umem = xs->umem;
> > +     struct xdp_sock *xs = xdp_sk(sock->sk);
> > +     struct net_device *dev;
> > +     struct xdp_umem *umem;
> > +
> > +     if (unlikely(!xsk_is_bound(xs)))
> > +             return mask;
> > +
> > +     dev = xs->dev;
> > +     umem = xs->umem;
> >
> >       if (umem->need_wakeup)
> >               dev->netdev_ops->ndo_xsk_wakeup(dev, xs->queue_id,
> > @@ -417,10 +435,9 @@ static void xsk_unbind_dev(struct xdp_sock *xs)
> >   {
> >       struct net_device *dev = xs->dev;
> >
> > -     if (!dev || xs->state != XSK_BOUND)
> > +     if (xs->state != XSK_BOUND)
> >               return;
> > -
> > -     xs->state = XSK_UNBOUND;
> > +     WRITE_ONCE(xs->state, XSK_UNBOUND);
> >
> >       /* Wait for driver to stop using the xdp socket. */
> >       xdp_del_sk_umem(xs->umem, xs);
> > @@ -495,7 +512,9 @@ static int xsk_release(struct socket *sock)
> >       local_bh_enable();
> >
> >       xsk_delete_from_maps(xs);
> > +     mutex_lock(&xs->mutex);
> >       xsk_unbind_dev(xs);
> > +     mutex_unlock(&xs->mutex);
> >
> >       xskq_destroy(xs->rx);
> >       xskq_destroy(xs->tx);
> > @@ -589,19 +608,18 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
> >               }
> >
> >               umem_xs = xdp_sk(sock->sk);
> > -             if (!umem_xs->umem) {
> > -                     /* No umem to inherit. */
> > +             if (!xsk_is_bound(umem_xs)) {
> >                       err = -EBADF;
> >                       sockfd_put(sock);
> >                       goto out_unlock;
> > -             } else if (umem_xs->dev != dev || umem_xs->queue_id != qid) {
> > +             }
> > +             if (umem_xs->dev != dev || umem_xs->queue_id != qid) {
> >                       err = -EINVAL;
> >                       sockfd_put(sock);
> >                       goto out_unlock;
> >               }
> > -
> >               xdp_get_umem(umem_xs->umem);
> > -             xs->umem = umem_xs->umem;
> > +             WRITE_ONCE(xs->umem, umem_xs->umem);
> >               sockfd_put(sock);
> >       } else if (!xs->umem || !xdp_umem_validate_queues(xs->umem)) {
> >               err = -EINVAL;
> > @@ -626,10 +644,15 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
> >       xdp_add_sk_umem(xs->umem, xs);
> >
> >   out_unlock:
> > -     if (err)
> > +     if (err) {
> >               dev_put(dev);
> > -     else
> > -             xs->state = XSK_BOUND;
> > +     } else {
> > +             /* Matches smp_rmb() in bind() for shared umem
> > +              * sockets, and xsk_is_bound().
> > +              */
> > +             smp_wmb();
>
> You write with what this barrier matches/pairs, but would be useful for readers
> to have an explanation against what it protects. I presume to have things like
> xs->umem public as you seem to guard it behind xsk_is_bound() in xsk_poll() and
> other cases? Would be great to have a detailed analysis of all this e.g. in the
> commit message so one wouldn't need to guess; right now it feels this is doing
> many things at once and w/o further explanation of why READ_ONCE() or others are
> omitted sometimes. Would be great to get a lot more clarity into this, perhaps
> splitting it up a bit might also help.
>

I'll address that. Thanks for the review!


Björn

> > +             WRITE_ONCE(xs->state, XSK_BOUND);
> > +     }
>
> Thanks,
> Daniel

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2019-09-03 15:27 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-26  6:10 [PATCH bpf-next v2 0/4] xsk: various CPU barrier and {READ, WRITE}_ONCE fixes Björn Töpel
2019-08-26  6:10 ` [PATCH bpf-next v2 1/4] xsk: avoid store-tearing when assigning queues Björn Töpel
2019-08-26  6:10 ` [PATCH bpf-next v2 2/4] xsk: add proper barriers and {READ, WRITE}_ONCE-correctness for state Björn Töpel
2019-08-26 15:24   ` Ilya Maximets
2019-08-26 16:34     ` Björn Töpel
2019-08-26 17:57       ` Jonathan Lemon
2019-08-26 17:54   ` Jonathan Lemon
2019-09-03 15:22   ` Daniel Borkmann
2019-09-03 15:26     ` Björn Töpel
2019-08-26  6:10 ` [PATCH bpf-next v2 3/4] xsk: avoid store-tearing when assigning umem Björn Töpel
2019-08-26  6:10 ` [PATCH bpf-next v2 4/4] xsk: lock the control mutex in sock_diag interface Björn Töpel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.