* [PATCH bpf-next v2 0/4] xsk: various CPU barrier and {READ, WRITE}_ONCE fixes
@ 2019-08-26 6:10 Björn Töpel
2019-08-26 6:10 ` [PATCH bpf-next v2 1/4] xsk: avoid store-tearing when assigning queues Björn Töpel
` (3 more replies)
0 siblings, 4 replies; 11+ messages in thread
From: Björn Töpel @ 2019-08-26 6:10 UTC (permalink / raw)
To: ast, daniel, netdev
Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, bpf,
bjorn.topel, jonathan.lemon, syzbot+c82697e3043781e08802,
hdanton, i.maximets
This is a four-patch series of various barrier and {READ, WRITE}_ONCE
cleanups in the AF_XDP socket code. More details can be found in the
corresponding commit messages.
For an AF_XDP socket, most control plane operations are done under the
control mutex (struct xdp_sock, mutex), but there are some places
where members of the struct are read outside the control mutex. This,
as pointed out by Daniel in [1], requires proper {READ,
WRITE}_ONCE-correctness [2] [3]. To address this, and to simplify the
code, the state variable (introduced by Ilya) is now used as a point
of synchronization ("is the socket in a valid state, or not").
Thanks,
Björn
[1] https://lore.kernel.org/bpf/beef16bb-a09b-40f1-7dd0-c323b4b89b17@iogearbox.net/
[2] https://lwn.net/Articles/793253/
[3] https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE
v1->v2:
Removed redundant dev check. (Jonathan)
Björn Töpel (4):
xsk: avoid store-tearing when assigning queues
xsk: add proper barriers and {READ, WRITE}_ONCE-correctness for state
xsk: avoid store-tearing when assigning umem
xsk: lock the control mutex in sock_diag interface
net/xdp/xsk.c | 61 +++++++++++++++++++++++++++++++---------------
net/xdp/xsk_diag.c | 3 +++
2 files changed, 45 insertions(+), 19 deletions(-)
--
2.20.1
* [PATCH bpf-next v2 1/4] xsk: avoid store-tearing when assigning queues
2019-08-26 6:10 [PATCH bpf-next v2 0/4] xsk: various CPU barrier and {READ, WRITE}_ONCE fixes Björn Töpel
@ 2019-08-26 6:10 ` Björn Töpel
2019-08-26 6:10 ` [PATCH bpf-next v2 2/4] xsk: add proper barriers and {READ, WRITE}_ONCE-correctness for state Björn Töpel
` (2 subsequent siblings)
3 siblings, 0 replies; 11+ messages in thread
From: Björn Töpel @ 2019-08-26 6:10 UTC (permalink / raw)
To: ast, daniel, netdev
Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, bpf,
jonathan.lemon, syzbot+c82697e3043781e08802, hdanton, i.maximets
From: Björn Töpel <bjorn.topel@intel.com>
Use WRITE_ONCE when doing the store of tx, rx, fq, and cq to avoid
potential store-tearing. These members are read outside of the control
mutex in the mmap implementation.
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Fixes: 37b076933a8e ("xsk: add missing write- and data-dependency barrier")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
net/xdp/xsk.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index ee4428a892fa..f3351013c2a5 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -409,7 +409,7 @@ static int xsk_init_queue(u32 entries, struct xsk_queue **queue,
/* Make sure queue is ready before it can be seen by others */
smp_wmb();
- *queue = q;
+ WRITE_ONCE(*queue, q);
return 0;
}
--
2.20.1
* [PATCH bpf-next v2 2/4] xsk: add proper barriers and {READ, WRITE}_ONCE-correctness for state
2019-08-26 6:10 [PATCH bpf-next v2 0/4] xsk: various CPU barrier and {READ, WRITE}_ONCE fixes Björn Töpel
2019-08-26 6:10 ` [PATCH bpf-next v2 1/4] xsk: avoid store-tearing when assigning queues Björn Töpel
@ 2019-08-26 6:10 ` Björn Töpel
2019-08-26 15:24 ` Ilya Maximets
` (2 more replies)
2019-08-26 6:10 ` [PATCH bpf-next v2 3/4] xsk: avoid store-tearing when assigning umem Björn Töpel
2019-08-26 6:10 ` [PATCH bpf-next v2 4/4] xsk: lock the control mutex in sock_diag interface Björn Töpel
3 siblings, 3 replies; 11+ messages in thread
From: Björn Töpel @ 2019-08-26 6:10 UTC (permalink / raw)
To: ast, daniel, netdev
Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, bpf,
jonathan.lemon, syzbot+c82697e3043781e08802, hdanton, i.maximets
From: Björn Töpel <bjorn.topel@intel.com>
The state variable was read and written outside the control mutex
(struct xdp_sock, mutex), without proper barriers and {READ,
WRITE}_ONCE correctness.
In this commit this issue is addressed, and the state member is now
used as a point of synchronization for whether the socket is set up
correctly or not.
This also fixes a race, found by syzkaller, in xsk_poll() where umem
could be accessed when stale.
Suggested-by: Hillf Danton <hdanton@sina.com>
Reported-by: syzbot+c82697e3043781e08802@syzkaller.appspotmail.com
Fixes: 77cd0d7b3f25 ("xsk: add support for need_wakeup flag in AF_XDP rings")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
net/xdp/xsk.c | 57 ++++++++++++++++++++++++++++++++++++---------------
1 file changed, 40 insertions(+), 17 deletions(-)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index f3351013c2a5..8fafa3ce3ae6 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -162,10 +162,23 @@ static int __xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
return err;
}
+static bool xsk_is_bound(struct xdp_sock *xs)
+{
+ if (READ_ONCE(xs->state) == XSK_BOUND) {
+ /* Matches smp_wmb() in bind(). */
+ smp_rmb();
+ return true;
+ }
+ return false;
+}
+
int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
{
u32 len;
+ if (!xsk_is_bound(xs))
+ return -EINVAL;
+
if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
return -EINVAL;
@@ -362,7 +375,7 @@ static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
struct sock *sk = sock->sk;
struct xdp_sock *xs = xdp_sk(sk);
- if (unlikely(!xs->dev))
+ if (unlikely(!xsk_is_bound(xs)))
return -ENXIO;
if (unlikely(!(xs->dev->flags & IFF_UP)))
return -ENETDOWN;
@@ -378,10 +391,15 @@ static unsigned int xsk_poll(struct file *file, struct socket *sock,
struct poll_table_struct *wait)
{
unsigned int mask = datagram_poll(file, sock, wait);
- struct sock *sk = sock->sk;
- struct xdp_sock *xs = xdp_sk(sk);
- struct net_device *dev = xs->dev;
- struct xdp_umem *umem = xs->umem;
+ struct xdp_sock *xs = xdp_sk(sock->sk);
+ struct net_device *dev;
+ struct xdp_umem *umem;
+
+ if (unlikely(!xsk_is_bound(xs)))
+ return mask;
+
+ dev = xs->dev;
+ umem = xs->umem;
if (umem->need_wakeup)
dev->netdev_ops->ndo_xsk_wakeup(dev, xs->queue_id,
@@ -417,10 +435,9 @@ static void xsk_unbind_dev(struct xdp_sock *xs)
{
struct net_device *dev = xs->dev;
- if (!dev || xs->state != XSK_BOUND)
+ if (xs->state != XSK_BOUND)
return;
-
- xs->state = XSK_UNBOUND;
+ WRITE_ONCE(xs->state, XSK_UNBOUND);
/* Wait for driver to stop using the xdp socket. */
xdp_del_sk_umem(xs->umem, xs);
@@ -495,7 +512,9 @@ static int xsk_release(struct socket *sock)
local_bh_enable();
xsk_delete_from_maps(xs);
+ mutex_lock(&xs->mutex);
xsk_unbind_dev(xs);
+ mutex_unlock(&xs->mutex);
xskq_destroy(xs->rx);
xskq_destroy(xs->tx);
@@ -589,19 +608,18 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
}
umem_xs = xdp_sk(sock->sk);
- if (!umem_xs->umem) {
- /* No umem to inherit. */
+ if (!xsk_is_bound(umem_xs)) {
err = -EBADF;
sockfd_put(sock);
goto out_unlock;
- } else if (umem_xs->dev != dev || umem_xs->queue_id != qid) {
+ }
+ if (umem_xs->dev != dev || umem_xs->queue_id != qid) {
err = -EINVAL;
sockfd_put(sock);
goto out_unlock;
}
-
xdp_get_umem(umem_xs->umem);
- xs->umem = umem_xs->umem;
+ WRITE_ONCE(xs->umem, umem_xs->umem);
sockfd_put(sock);
} else if (!xs->umem || !xdp_umem_validate_queues(xs->umem)) {
err = -EINVAL;
@@ -626,10 +644,15 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
xdp_add_sk_umem(xs->umem, xs);
out_unlock:
- if (err)
+ if (err) {
dev_put(dev);
- else
- xs->state = XSK_BOUND;
+ } else {
+ /* Matches smp_rmb() in bind() for shared umem
+ * sockets, and xsk_is_bound().
+ */
+ smp_wmb();
+ WRITE_ONCE(xs->state, XSK_BOUND);
+ }
out_release:
mutex_unlock(&xs->mutex);
rtnl_unlock();
@@ -869,7 +892,7 @@ static int xsk_mmap(struct file *file, struct socket *sock,
unsigned long pfn;
struct page *qpg;
- if (xs->state != XSK_READY)
+ if (READ_ONCE(xs->state) != XSK_READY)
return -EBUSY;
if (offset == XDP_PGOFF_RX_RING) {
--
2.20.1
* [PATCH bpf-next v2 3/4] xsk: avoid store-tearing when assigning umem
2019-08-26 6:10 [PATCH bpf-next v2 0/4] xsk: various CPU barrier and {READ, WRITE}_ONCE fixes Björn Töpel
2019-08-26 6:10 ` [PATCH bpf-next v2 1/4] xsk: avoid store-tearing when assigning queues Björn Töpel
2019-08-26 6:10 ` [PATCH bpf-next v2 2/4] xsk: add proper barriers and {READ, WRITE}_ONCE-correctness for state Björn Töpel
@ 2019-08-26 6:10 ` Björn Töpel
2019-08-26 6:10 ` [PATCH bpf-next v2 4/4] xsk: lock the control mutex in sock_diag interface Björn Töpel
3 siblings, 0 replies; 11+ messages in thread
From: Björn Töpel @ 2019-08-26 6:10 UTC (permalink / raw)
To: ast, daniel, netdev
Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, bpf,
jonathan.lemon, syzbot+c82697e3043781e08802, hdanton, i.maximets
From: Björn Töpel <bjorn.topel@intel.com>
The umem member of struct xdp_sock is read outside of the control
mutex, in the mmap implementation, and needs a WRITE_ONCE to avoid
potential store-tearing.
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Fixes: 423f38329d26 ("xsk: add umem fill queue support and mmap")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
net/xdp/xsk.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 8fafa3ce3ae6..e3e99ee5631b 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -716,7 +716,7 @@ static int xsk_setsockopt(struct socket *sock, int level, int optname,
/* Make sure umem is ready before it can be seen by others */
smp_wmb();
- xs->umem = umem;
+ WRITE_ONCE(xs->umem, umem);
mutex_unlock(&xs->mutex);
return 0;
}
--
2.20.1
* [PATCH bpf-next v2 4/4] xsk: lock the control mutex in sock_diag interface
2019-08-26 6:10 [PATCH bpf-next v2 0/4] xsk: various CPU barrier and {READ, WRITE}_ONCE fixes Björn Töpel
` (2 preceding siblings ...)
2019-08-26 6:10 ` [PATCH bpf-next v2 3/4] xsk: avoid store-tearing when assigning umem Björn Töpel
@ 2019-08-26 6:10 ` Björn Töpel
3 siblings, 0 replies; 11+ messages in thread
From: Björn Töpel @ 2019-08-26 6:10 UTC (permalink / raw)
To: ast, daniel, netdev
Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, bpf,
jonathan.lemon, syzbot+c82697e3043781e08802, hdanton, i.maximets
From: Björn Töpel <bjorn.topel@intel.com>
When accessing the members of an XDP socket, the control mutex should
be held. This commit fixes that.
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Fixes: a36b38aa2af6 ("xsk: add sock_diag interface for AF_XDP")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
net/xdp/xsk_diag.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/net/xdp/xsk_diag.c b/net/xdp/xsk_diag.c
index d5e06c8e0cbf..c8f4f11edbbc 100644
--- a/net/xdp/xsk_diag.c
+++ b/net/xdp/xsk_diag.c
@@ -97,6 +97,7 @@ static int xsk_diag_fill(struct sock *sk, struct sk_buff *nlskb,
msg->xdiag_ino = sk_ino;
sock_diag_save_cookie(sk, msg->xdiag_cookie);
+ mutex_lock(&xs->mutex);
if ((req->xdiag_show & XDP_SHOW_INFO) && xsk_diag_put_info(xs, nlskb))
goto out_nlmsg_trim;
@@ -117,10 +118,12 @@ static int xsk_diag_fill(struct sock *sk, struct sk_buff *nlskb,
sock_diag_put_meminfo(sk, nlskb, XDP_DIAG_MEMINFO))
goto out_nlmsg_trim;
+ mutex_unlock(&xs->mutex);
nlmsg_end(nlskb, nlh);
return 0;
out_nlmsg_trim:
+ mutex_unlock(&xs->mutex);
nlmsg_cancel(nlskb, nlh);
return -EMSGSIZE;
}
--
2.20.1
* Re: [PATCH bpf-next v2 2/4] xsk: add proper barriers and {READ, WRITE}_ONCE-correctness for state
2019-08-26 6:10 ` [PATCH bpf-next v2 2/4] xsk: add proper barriers and {READ, WRITE}_ONCE-correctness for state Björn Töpel
@ 2019-08-26 15:24 ` Ilya Maximets
2019-08-26 16:34 ` Björn Töpel
2019-08-26 17:54 ` Jonathan Lemon
2019-09-03 15:22 ` Daniel Borkmann
2 siblings, 1 reply; 11+ messages in thread
From: Ilya Maximets @ 2019-08-26 15:24 UTC (permalink / raw)
To: Björn Töpel, ast, daniel, netdev
Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, bpf,
jonathan.lemon, syzbot+c82697e3043781e08802, hdanton
On 26.08.2019 9:10, Björn Töpel wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
>
> The state variable was read and written outside the control mutex
> (struct xdp_sock, mutex), without proper barriers and {READ,
> WRITE}_ONCE correctness.
>
> In this commit this issue is addressed, and the state member is now
> used as a point of synchronization for whether the socket is set up
> correctly or not.
>
> This also fixes a race, found by syzkaller, in xsk_poll() where umem
> could be accessed when stale.
>
> Suggested-by: Hillf Danton <hdanton@sina.com>
> Reported-by: syzbot+c82697e3043781e08802@syzkaller.appspotmail.com
> Fixes: 77cd0d7b3f25 ("xsk: add support for need_wakeup flag in AF_XDP rings")
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---
> net/xdp/xsk.c | 57 ++++++++++++++++++++++++++++++++++++---------------
> 1 file changed, 40 insertions(+), 17 deletions(-)
>
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index f3351013c2a5..8fafa3ce3ae6 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -162,10 +162,23 @@ static int __xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
> return err;
> }
>
> +static bool xsk_is_bound(struct xdp_sock *xs)
> +{
> + if (READ_ONCE(xs->state) == XSK_BOUND) {
> + /* Matches smp_wmb() in bind(). */
> + smp_rmb();
> + return true;
> + }
> + return false;
> +}
> +
> int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
> {
> u32 len;
>
> + if (!xsk_is_bound(xs))
> + return -EINVAL;
> +
> if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
> return -EINVAL;
>
> @@ -362,7 +375,7 @@ static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
> struct sock *sk = sock->sk;
> struct xdp_sock *xs = xdp_sk(sk);
>
> - if (unlikely(!xs->dev))
> + if (unlikely(!xsk_is_bound(xs)))
> return -ENXIO;
> if (unlikely(!(xs->dev->flags & IFF_UP)))
> return -ENETDOWN;
> @@ -378,10 +391,15 @@ static unsigned int xsk_poll(struct file *file, struct socket *sock,
> struct poll_table_struct *wait)
> {
> unsigned int mask = datagram_poll(file, sock, wait);
> - struct sock *sk = sock->sk;
> - struct xdp_sock *xs = xdp_sk(sk);
> - struct net_device *dev = xs->dev;
> - struct xdp_umem *umem = xs->umem;
> + struct xdp_sock *xs = xdp_sk(sock->sk);
> + struct net_device *dev;
> + struct xdp_umem *umem;
> +
> + if (unlikely(!xsk_is_bound(xs)))
> + return mask;
> +
> + dev = xs->dev;
> + umem = xs->umem;
>
> if (umem->need_wakeup)
> dev->netdev_ops->ndo_xsk_wakeup(dev, xs->queue_id,
> @@ -417,10 +435,9 @@ static void xsk_unbind_dev(struct xdp_sock *xs)
> {
> struct net_device *dev = xs->dev;
>
> - if (!dev || xs->state != XSK_BOUND)
> + if (xs->state != XSK_BOUND)
> return;
> -
> - xs->state = XSK_UNBOUND;
> + WRITE_ONCE(xs->state, XSK_UNBOUND);
>
> /* Wait for driver to stop using the xdp socket. */
> xdp_del_sk_umem(xs->umem, xs);
> @@ -495,7 +512,9 @@ static int xsk_release(struct socket *sock)
> local_bh_enable();
>
> xsk_delete_from_maps(xs);
> + mutex_lock(&xs->mutex);
> xsk_unbind_dev(xs);
> + mutex_unlock(&xs->mutex);
>
> xskq_destroy(xs->rx);
> xskq_destroy(xs->tx);
> @@ -589,19 +608,18 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
> }
>
> umem_xs = xdp_sk(sock->sk);
> - if (!umem_xs->umem) {
> - /* No umem to inherit. */
> + if (!xsk_is_bound(umem_xs)) {
This changes the error code a bit.
Previously:
umem exists + xs unbound --> EINVAL
no umem + xs unbound --> EBADF
xs bound to different dev/q --> EINVAL
With this change:
umem exists + xs unbound --> EBADF
no umem + xs unbound --> EBADF
xs bound to different dev/q --> EINVAL
Just a note. Not sure if this is important.
> err = -EBADF;
> sockfd_put(sock);
> goto out_unlock;
> - } else if (umem_xs->dev != dev || umem_xs->queue_id != qid) {
> + }
> + if (umem_xs->dev != dev || umem_xs->queue_id != qid) {
> err = -EINVAL;
> sockfd_put(sock);
> goto out_unlock;
> }
> -
> xdp_get_umem(umem_xs->umem);
> - xs->umem = umem_xs->umem;
> + WRITE_ONCE(xs->umem, umem_xs->umem);
> sockfd_put(sock);
> } else if (!xs->umem || !xdp_umem_validate_queues(xs->umem)) {
> err = -EINVAL;
> @@ -626,10 +644,15 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
> xdp_add_sk_umem(xs->umem, xs);
>
> out_unlock:
> - if (err)
> + if (err) {
> dev_put(dev);
> - else
> - xs->state = XSK_BOUND;
> + } else {
> + /* Matches smp_rmb() in bind() for shared umem
> + * sockets, and xsk_is_bound().
> + */
> + smp_wmb();
> + WRITE_ONCE(xs->state, XSK_BOUND);
> + }
> out_release:
> mutex_unlock(&xs->mutex);
> rtnl_unlock();
> @@ -869,7 +892,7 @@ static int xsk_mmap(struct file *file, struct socket *sock,
> unsigned long pfn;
> struct page *qpg;
>
> - if (xs->state != XSK_READY)
> + if (READ_ONCE(xs->state) != XSK_READY)
> return -EBUSY;
>
> if (offset == XDP_PGOFF_RX_RING) {
>
* Re: [PATCH bpf-next v2 2/4] xsk: add proper barriers and {READ, WRITE}_ONCE-correctness for state
2019-08-26 15:24 ` Ilya Maximets
@ 2019-08-26 16:34 ` Björn Töpel
2019-08-26 17:57 ` Jonathan Lemon
0 siblings, 1 reply; 11+ messages in thread
From: Björn Töpel @ 2019-08-26 16:34 UTC (permalink / raw)
To: Ilya Maximets, Björn Töpel, ast, daniel, netdev
Cc: magnus.karlsson, magnus.karlsson, bpf, jonathan.lemon,
syzbot+c82697e3043781e08802, hdanton
On 2019-08-26 17:24, Ilya Maximets wrote:
> This changes the error code a bit.
> Previously:
> umem exists + xs unbound --> EINVAL
> no umem + xs unbound --> EBADF
> xs bound to different dev/q --> EINVAL
>
> With this change:
> umem exists + xs unbound --> EBADF
> no umem + xs unbound --> EBADF
> xs bound to different dev/q --> EINVAL
>
> Just a note. Not sure if this is important.
>
Note that this is for *shared* umem, so it's very seldom used. Still,
you're right that, strictly speaking, this is a uapi break, but I'd
still vote for the change. I find it hard to see that anyone relies
on EINVAL vs. EBADF for shared umem bind.
Opinions? :-)
Björn
* Re: [PATCH bpf-next v2 2/4] xsk: add proper barriers and {READ, WRITE}_ONCE-correctness for state
2019-08-26 6:10 ` [PATCH bpf-next v2 2/4] xsk: add proper barriers and {READ, WRITE}_ONCE-correctness for state Björn Töpel
2019-08-26 15:24 ` Ilya Maximets
@ 2019-08-26 17:54 ` Jonathan Lemon
2019-09-03 15:22 ` Daniel Borkmann
2 siblings, 0 replies; 11+ messages in thread
From: Jonathan Lemon @ 2019-08-26 17:54 UTC (permalink / raw)
To: Björn Töpel
Cc: ast, daniel, netdev, Björn Töpel, magnus.karlsson,
magnus.karlsson, bpf, syzbot+c82697e3043781e08802, hdanton,
i.maximets
On 25 Aug 2019, at 23:10, Björn Töpel wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
>
> The state variable was read and written outside the control mutex
> (struct xdp_sock, mutex), without proper barriers and {READ,
> WRITE}_ONCE correctness.
>
> In this commit this issue is addressed, and the state member is now
> used as a point of synchronization for whether the socket is set up
> correctly or not.
>
> This also fixes a race, found by syzkaller, in xsk_poll() where umem
> could be accessed when stale.
>
> Suggested-by: Hillf Danton <hdanton@sina.com>
> Reported-by: syzbot+c82697e3043781e08802@syzkaller.appspotmail.com
> Fixes: 77cd0d7b3f25 ("xsk: add support for need_wakeup flag in AF_XDP
> rings")
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
* Re: [PATCH bpf-next v2 2/4] xsk: add proper barriers and {READ, WRITE}_ONCE-correctness for state
2019-08-26 16:34 ` Björn Töpel
@ 2019-08-26 17:57 ` Jonathan Lemon
0 siblings, 0 replies; 11+ messages in thread
From: Jonathan Lemon @ 2019-08-26 17:57 UTC (permalink / raw)
To: Björn Töpel
Cc: Ilya Maximets, Björn Töpel, ast, daniel, netdev,
magnus.karlsson, magnus.karlsson, bpf,
syzbot+c82697e3043781e08802, hdanton
On 26 Aug 2019, at 9:34, Björn Töpel wrote:
> On 2019-08-26 17:24, Ilya Maximets wrote:
>> This changes the error code a bit.
>> Previously:
>> umem exists + xs unbound --> EINVAL
>> no umem + xs unbound --> EBADF
>> xs bound to different dev/q --> EINVAL
>>
>> With this change:
>> umem exists + xs unbound --> EBADF
>> no umem + xs unbound --> EBADF
>> xs bound to different dev/q --> EINVAL
>>
>> Just a note. Not sure if this is important.
>>
>
> Note that this is for *shared* umem, so it's very seldom used. Still,
> you're right, that strictly this is an uapi break, but I'd vote for the
> change still. I find it hard to see that anyone relies on EINVAL/EBADF
> for shared umem bind.
>
> Opinions? :-)
I'd agree - if it isn't documented somewhere, it's not an API break. :)
--
Jonathan
* Re: [PATCH bpf-next v2 2/4] xsk: add proper barriers and {READ, WRITE}_ONCE-correctness for state
2019-08-26 6:10 ` [PATCH bpf-next v2 2/4] xsk: add proper barriers and {READ, WRITE}_ONCE-correctness for state Björn Töpel
2019-08-26 15:24 ` Ilya Maximets
2019-08-26 17:54 ` Jonathan Lemon
@ 2019-09-03 15:22 ` Daniel Borkmann
2019-09-03 15:26 ` Björn Töpel
2 siblings, 1 reply; 11+ messages in thread
From: Daniel Borkmann @ 2019-09-03 15:22 UTC (permalink / raw)
To: Björn Töpel, ast, netdev
Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, bpf,
jonathan.lemon, syzbot+c82697e3043781e08802, hdanton, i.maximets
On 8/26/19 8:10 AM, Björn Töpel wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
>
> The state variable was read and written outside the control mutex
> (struct xdp_sock, mutex), without proper barriers and {READ,
> WRITE}_ONCE correctness.
>
> In this commit this issue is addressed, and the state member is now
> used as a point of synchronization for whether the socket is set up
> correctly or not.
>
> This also fixes a race, found by syzkaller, in xsk_poll() where umem
> could be accessed when stale.
>
> Suggested-by: Hillf Danton <hdanton@sina.com>
> Reported-by: syzbot+c82697e3043781e08802@syzkaller.appspotmail.com
> Fixes: 77cd0d7b3f25 ("xsk: add support for need_wakeup flag in AF_XDP rings")
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Sorry for the delay.
> ---
> net/xdp/xsk.c | 57 ++++++++++++++++++++++++++++++++++++---------------
> 1 file changed, 40 insertions(+), 17 deletions(-)
>
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index f3351013c2a5..8fafa3ce3ae6 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -162,10 +162,23 @@ static int __xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
> return err;
> }
>
> +static bool xsk_is_bound(struct xdp_sock *xs)
> +{
> + if (READ_ONCE(xs->state) == XSK_BOUND) {
> + /* Matches smp_wmb() in bind(). */
> + smp_rmb();
> + return true;
> + }
> + return false;
> +}
> +
> int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
> {
> u32 len;
>
> + if (!xsk_is_bound(xs))
> + return -EINVAL;
> +
> if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
> return -EINVAL;
>
> @@ -362,7 +375,7 @@ static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
> struct sock *sk = sock->sk;
> struct xdp_sock *xs = xdp_sk(sk);
>
> - if (unlikely(!xs->dev))
> + if (unlikely(!xsk_is_bound(xs)))
> return -ENXIO;
> if (unlikely(!(xs->dev->flags & IFF_UP)))
> return -ENETDOWN;
> @@ -378,10 +391,15 @@ static unsigned int xsk_poll(struct file *file, struct socket *sock,
> struct poll_table_struct *wait)
> {
> unsigned int mask = datagram_poll(file, sock, wait);
> - struct sock *sk = sock->sk;
> - struct xdp_sock *xs = xdp_sk(sk);
> - struct net_device *dev = xs->dev;
> - struct xdp_umem *umem = xs->umem;
> + struct xdp_sock *xs = xdp_sk(sock->sk);
> + struct net_device *dev;
> + struct xdp_umem *umem;
> +
> + if (unlikely(!xsk_is_bound(xs)))
> + return mask;
> +
> + dev = xs->dev;
> + umem = xs->umem;
>
> if (umem->need_wakeup)
> dev->netdev_ops->ndo_xsk_wakeup(dev, xs->queue_id,
> @@ -417,10 +435,9 @@ static void xsk_unbind_dev(struct xdp_sock *xs)
> {
> struct net_device *dev = xs->dev;
>
> - if (!dev || xs->state != XSK_BOUND)
> + if (xs->state != XSK_BOUND)
> return;
> -
> - xs->state = XSK_UNBOUND;
> + WRITE_ONCE(xs->state, XSK_UNBOUND);
>
> /* Wait for driver to stop using the xdp socket. */
> xdp_del_sk_umem(xs->umem, xs);
> @@ -495,7 +512,9 @@ static int xsk_release(struct socket *sock)
> local_bh_enable();
>
> xsk_delete_from_maps(xs);
> + mutex_lock(&xs->mutex);
> xsk_unbind_dev(xs);
> + mutex_unlock(&xs->mutex);
>
> xskq_destroy(xs->rx);
> xskq_destroy(xs->tx);
> @@ -589,19 +608,18 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
> }
>
> umem_xs = xdp_sk(sock->sk);
> - if (!umem_xs->umem) {
> - /* No umem to inherit. */
> + if (!xsk_is_bound(umem_xs)) {
> err = -EBADF;
> sockfd_put(sock);
> goto out_unlock;
> - } else if (umem_xs->dev != dev || umem_xs->queue_id != qid) {
> + }
> + if (umem_xs->dev != dev || umem_xs->queue_id != qid) {
> err = -EINVAL;
> sockfd_put(sock);
> goto out_unlock;
> }
> -
> xdp_get_umem(umem_xs->umem);
> - xs->umem = umem_xs->umem;
> + WRITE_ONCE(xs->umem, umem_xs->umem);
> sockfd_put(sock);
> } else if (!xs->umem || !xdp_umem_validate_queues(xs->umem)) {
> err = -EINVAL;
> @@ -626,10 +644,15 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
> xdp_add_sk_umem(xs->umem, xs);
>
> out_unlock:
> - if (err)
> + if (err) {
> dev_put(dev);
> - else
> - xs->state = XSK_BOUND;
> + } else {
> + /* Matches smp_rmb() in bind() for shared umem
> + * sockets, and xsk_is_bound().
> + */
> + smp_wmb();
You write what this barrier matches/pairs with, but it would be useful
for readers to have an explanation of what it protects against. I
presume the goal is to make things like xs->umem public, as you guard
it behind xsk_is_bound() in xsk_poll() and other cases? It would be
great to have a detailed analysis of all this, e.g. in the commit
message, so one wouldn't need to guess; right now it feels like this
is doing many things at once, w/o further explanation of why
READ_ONCE() or others are sometimes omitted. It would be great to get
a lot more clarity into this; perhaps splitting it up a bit might also
help.
> + WRITE_ONCE(xs->state, XSK_BOUND);
> + }
Thanks,
Daniel
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH bpf-next v2 2/4] xsk: add proper barriers and {READ, WRITE}_ONCE-correctness for state
2019-09-03 15:22 ` Daniel Borkmann
@ 2019-09-03 15:26 ` Björn Töpel
0 siblings, 0 replies; 11+ messages in thread
From: Björn Töpel @ 2019-09-03 15:26 UTC (permalink / raw)
To: Daniel Borkmann
Cc: Alexei Starovoitov, Netdev, Björn Töpel, Karlsson,
Magnus, Magnus Karlsson, bpf, Jonathan Lemon,
syzbot+c82697e3043781e08802, Hillf Danton, Ilya Maximets
On Tue, 3 Sep 2019 at 17:22, Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 8/26/19 8:10 AM, Björn Töpel wrote:
> > From: Björn Töpel <bjorn.topel@intel.com>
> >
> > The state variable was read and written outside the control mutex
> > (struct xdp_sock, mutex), without proper barriers and {READ,
> > WRITE}_ONCE correctness.
> >
> > In this commit this issue is addressed, and the state member is now
> > used as a point of synchronization for whether the socket is set up
> > correctly or not.
> >
> > This also fixes a race, found by syzkaller, in xsk_poll() where umem
> > could be accessed when stale.
> >
> > Suggested-by: Hillf Danton <hdanton@sina.com>
> > Reported-by: syzbot+c82697e3043781e08802@syzkaller.appspotmail.com
> > Fixes: 77cd0d7b3f25 ("xsk: add support for need_wakeup flag in AF_XDP rings")
> > Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
>
> Sorry for the delay.
>
> > ---
> > net/xdp/xsk.c | 57 ++++++++++++++++++++++++++++++++++++---------------
> > 1 file changed, 40 insertions(+), 17 deletions(-)
> >
> > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > index f3351013c2a5..8fafa3ce3ae6 100644
> > --- a/net/xdp/xsk.c
> > +++ b/net/xdp/xsk.c
> > @@ -162,10 +162,23 @@ static int __xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
> > return err;
> > }
> >
> > +static bool xsk_is_bound(struct xdp_sock *xs)
> > +{
> > + if (READ_ONCE(xs->state) == XSK_BOUND) {
> > + /* Matches smp_wmb() in bind(). */
> > + smp_rmb();
> > + return true;
> > + }
> > + return false;
> > +}
> > +
> > int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
> > {
> > u32 len;
> >
> > + if (!xsk_is_bound(xs))
> > + return -EINVAL;
> > +
> > if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
> > return -EINVAL;
> >
> > @@ -362,7 +375,7 @@ static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
> > struct sock *sk = sock->sk;
> > struct xdp_sock *xs = xdp_sk(sk);
> >
> > - if (unlikely(!xs->dev))
> > + if (unlikely(!xsk_is_bound(xs)))
> > return -ENXIO;
> > if (unlikely(!(xs->dev->flags & IFF_UP)))
> > return -ENETDOWN;
> > @@ -378,10 +391,15 @@ static unsigned int xsk_poll(struct file *file, struct socket *sock,
> > struct poll_table_struct *wait)
> > {
> > unsigned int mask = datagram_poll(file, sock, wait);
> > - struct sock *sk = sock->sk;
> > - struct xdp_sock *xs = xdp_sk(sk);
> > - struct net_device *dev = xs->dev;
> > - struct xdp_umem *umem = xs->umem;
> > + struct xdp_sock *xs = xdp_sk(sock->sk);
> > + struct net_device *dev;
> > + struct xdp_umem *umem;
> > +
> > + if (unlikely(!xsk_is_bound(xs)))
> > + return mask;
> > +
> > + dev = xs->dev;
> > + umem = xs->umem;
> >
> > if (umem->need_wakeup)
> > dev->netdev_ops->ndo_xsk_wakeup(dev, xs->queue_id,
> > @@ -417,10 +435,9 @@ static void xsk_unbind_dev(struct xdp_sock *xs)
> > {
> > struct net_device *dev = xs->dev;
> >
> > - if (!dev || xs->state != XSK_BOUND)
> > + if (xs->state != XSK_BOUND)
> > return;
> > -
> > - xs->state = XSK_UNBOUND;
> > + WRITE_ONCE(xs->state, XSK_UNBOUND);
> >
> > /* Wait for driver to stop using the xdp socket. */
> > xdp_del_sk_umem(xs->umem, xs);
> > @@ -495,7 +512,9 @@ static int xsk_release(struct socket *sock)
> > local_bh_enable();
> >
> > xsk_delete_from_maps(xs);
> > + mutex_lock(&xs->mutex);
> > xsk_unbind_dev(xs);
> > + mutex_unlock(&xs->mutex);
> >
> > xskq_destroy(xs->rx);
> > xskq_destroy(xs->tx);
> > @@ -589,19 +608,18 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
> > }
> >
> > umem_xs = xdp_sk(sock->sk);
> > - if (!umem_xs->umem) {
> > - /* No umem to inherit. */
> > + if (!xsk_is_bound(umem_xs)) {
> > err = -EBADF;
> > sockfd_put(sock);
> > goto out_unlock;
> > - } else if (umem_xs->dev != dev || umem_xs->queue_id != qid) {
> > + }
> > + if (umem_xs->dev != dev || umem_xs->queue_id != qid) {
> > err = -EINVAL;
> > sockfd_put(sock);
> > goto out_unlock;
> > }
> > -
> > xdp_get_umem(umem_xs->umem);
> > - xs->umem = umem_xs->umem;
> > + WRITE_ONCE(xs->umem, umem_xs->umem);
> > sockfd_put(sock);
> > } else if (!xs->umem || !xdp_umem_validate_queues(xs->umem)) {
> > err = -EINVAL;
> > @@ -626,10 +644,15 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
> > xdp_add_sk_umem(xs->umem, xs);
> >
> > out_unlock:
> > - if (err)
> > + if (err) {
> > dev_put(dev);
> > - else
> > - xs->state = XSK_BOUND;
> > + } else {
> > + /* Matches smp_rmb() in bind() for shared umem
> > + * sockets, and xsk_is_bound().
> > + */
> > + smp_wmb();
>
> You write with what this barrier matches/pairs, but would be useful for readers
> to have an explanation against what it protects. I presume to have things like
> xs->umem public as you seem to guard it behind xsk_is_bound() in xsk_poll() and
> other cases? Would be great to have a detailed analysis of all this e.g. in the
> commit message so one wouldn't need to guess; right now it feels this is doing
> many things at once and w/o further explanation of why READ_ONCE() or others are
> omitted sometimes. Would be great to get a lot more clarity into this, perhaps
> splitting it up a bit might also help.
>
I'll address that. Thanks for the review!
Björn
> > + WRITE_ONCE(xs->state, XSK_BOUND);
> > + }
>
> Thanks,
> Daniel