* [PATCH net 0/5] net: avoid KCSAN splats
@ 2019-10-24  5:44 Eric Dumazet
  2019-10-24  5:44 ` [PATCH net 1/5] net: add skb_queue_empty_lockless() Eric Dumazet
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: Eric Dumazet @ 2019-10-24  5:44 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Eric Dumazet, Eric Dumazet

Oftentimes we use skb_queue_empty() without holding a lock,
meaning that other CPUs (or interrupts) can change the queue
under us. This is fine, but we need to properly annotate
the lockless intent to make sure the compiler won't
over-optimize things.
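
For example, without annotations the compiler is free to hoist the
load of the queue head pointer out of a loop and spin on a stale
value. A minimal sketch of the problem and of the fix (the spin loop
below is hypothetical, not code from this series):

	/* Unannotated: the compiler may load list->next once and
	 * never observe a concurrent enqueue.
	 */
	while (list->next == (struct sk_buff *)list)
		cpu_relax();

	/* Annotated: READ_ONCE() forces a fresh load each iteration. */
	while (READ_ONCE(list->next) == (struct sk_buff *)list)
		cpu_relax();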

Eric Dumazet (5):
  net: add skb_queue_empty_lockless()
  udp: use skb_queue_empty_lockless()
  net: use skb_queue_empty_lockless() in poll() handlers
  net: use skb_queue_empty_lockless() in busy poll contexts
  net: add READ_ONCE() annotation in __skb_wait_for_more_packets()

 drivers/crypto/chelsio/chtls/chtls_io.c |  2 +-
 drivers/isdn/capi/capi.c                |  2 +-
 drivers/nvme/host/tcp.c                 |  2 +-
 include/linux/skbuff.h                  | 33 ++++++++++++++++++-------
 net/atm/common.c                        |  2 +-
 net/bluetooth/af_bluetooth.c            |  4 +--
 net/caif/caif_socket.c                  |  2 +-
 net/core/datagram.c                     |  8 +++---
 net/core/sock.c                         |  2 +-
 net/decnet/af_decnet.c                  |  2 +-
 net/ipv4/tcp.c                          |  4 +--
 net/ipv4/udp.c                          |  8 +++---
 net/nfc/llcp_sock.c                     |  4 +--
 net/phonet/socket.c                     |  4 +--
 net/sctp/socket.c                       |  6 ++---
 net/tipc/socket.c                       |  4 +--
 net/unix/af_unix.c                      |  6 ++---
 net/vmw_vsock/af_vsock.c                |  2 +-
 18 files changed, 56 insertions(+), 41 deletions(-)

-- 
2.23.0.866.gb869b98d4c-goog


* [PATCH net 1/5] net: add skb_queue_empty_lockless()
  2019-10-24  5:44 [PATCH net 0/5] net: avoid KCSAN splats Eric Dumazet
@ 2019-10-24  5:44 ` Eric Dumazet
  2019-10-24  5:44 ` [PATCH net 2/5] udp: use skb_queue_empty_lockless() Eric Dumazet
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Eric Dumazet @ 2019-10-24  5:44 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Eric Dumazet, Eric Dumazet

Some paths call skb_queue_empty() without holding
the queue lock. We must use a barrier in order
not to let the compiler do strange things, and to avoid
KCSAN splats.

Adding a barrier in skb_queue_empty() might be overkill,
so I prefer adding a new helper to clearly identify
points where the callers might be lockless. This might
help us find real bugs.

The corresponding WRITE_ONCE() should add zero cost
for current compilers.
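
A rough sketch of the intended pairing, with hypothetical call
sites (the real conversions follow in patches 2-5):

	/* Writer, under the queue lock: __skb_insert() now
	 * publishes the list pointers with WRITE_ONCE().
	 */
	spin_lock_bh(&sk->sk_receive_queue.lock);
	__skb_queue_tail(&sk->sk_receive_queue, skb);
	spin_unlock_bh(&sk->sk_receive_queue.lock);

	/* Lockless reader: the READ_ONCE() inside the new helper
	 * pairs with the writes above, so checking for emptiness
	 * is safe without taking the queue lock.
	 */
	if (!skb_queue_empty_lockless(&sk->sk_receive_queue))
		consume_queue(sk); /* hypothetical consumer */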

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/skbuff.h | 33 ++++++++++++++++++++++++---------
 1 file changed, 24 insertions(+), 9 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index a391147c03d4090252117e106a0df8f612955634..64a395c7f6895ce4dc439571de2eec8673b393e4 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1495,6 +1495,19 @@ static inline int skb_queue_empty(const struct sk_buff_head *list)
 	return list->next == (const struct sk_buff *) list;
 }
 
+/**
+ *	skb_queue_empty_lockless - check if a queue is empty
+ *	@list: queue head
+ *
+ *	Returns true if the queue is empty, false otherwise.
+ *	This variant can be used in lockless contexts.
+ */
+static inline bool skb_queue_empty_lockless(const struct sk_buff_head *list)
+{
+	return READ_ONCE(list->next) == (const struct sk_buff *) list;
+}
+
+
 /**
  *	skb_queue_is_last - check if skb is the last entry in the queue
  *	@list: queue head
@@ -1848,9 +1861,11 @@ static inline void __skb_insert(struct sk_buff *newsk,
 				struct sk_buff *prev, struct sk_buff *next,
 				struct sk_buff_head *list)
 {
-	newsk->next = next;
-	newsk->prev = prev;
-	next->prev  = prev->next = newsk;
+	/* see skb_queue_empty_lockless() for the opposite READ_ONCE() */
+	WRITE_ONCE(newsk->next, next);
+	WRITE_ONCE(newsk->prev, prev);
+	WRITE_ONCE(next->prev, newsk);
+	WRITE_ONCE(prev->next, newsk);
 	list->qlen++;
 }
 
@@ -1861,11 +1876,11 @@ static inline void __skb_queue_splice(const struct sk_buff_head *list,
 	struct sk_buff *first = list->next;
 	struct sk_buff *last = list->prev;
 
-	first->prev = prev;
-	prev->next = first;
+	WRITE_ONCE(first->prev, prev);
+	WRITE_ONCE(prev->next, first);
 
-	last->next = next;
-	next->prev = last;
+	WRITE_ONCE(last->next, next);
+	WRITE_ONCE(next->prev, last);
 }
 
 /**
@@ -2006,8 +2021,8 @@ static inline void __skb_unlink(struct sk_buff *skb, struct sk_buff_head *list)
 	next	   = skb->next;
 	prev	   = skb->prev;
 	skb->next  = skb->prev = NULL;
-	next->prev = prev;
-	prev->next = next;
+	WRITE_ONCE(next->prev, prev);
+	WRITE_ONCE(prev->next, next);
 }
 
 /**
-- 
2.23.0.866.gb869b98d4c-goog


* [PATCH net 2/5] udp: use skb_queue_empty_lockless()
  2019-10-24  5:44 [PATCH net 0/5] net: avoid KCSAN splats Eric Dumazet
  2019-10-24  5:44 ` [PATCH net 1/5] net: add skb_queue_empty_lockless() Eric Dumazet
@ 2019-10-24  5:44 ` Eric Dumazet
  2019-10-24  5:44 ` [PATCH net 3/5] net: use skb_queue_empty_lockless() in poll() handlers Eric Dumazet
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Eric Dumazet @ 2019-10-24  5:44 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Eric Dumazet, Eric Dumazet, syzbot

syzbot reported a data-race [1].

We should use skb_queue_empty_lockless() to document that we are
not ensuring mutual exclusion, and to silence KCSAN.

[1]
BUG: KCSAN: data-race in __skb_recv_udp / __udp_enqueue_schedule_skb

write to 0xffff888122474b50 of 8 bytes by interrupt on cpu 0:
 __skb_insert include/linux/skbuff.h:1852 [inline]
 __skb_queue_before include/linux/skbuff.h:1958 [inline]
 __skb_queue_tail include/linux/skbuff.h:1991 [inline]
 __udp_enqueue_schedule_skb+0x2c1/0x410 net/ipv4/udp.c:1470
 __udp_queue_rcv_skb net/ipv4/udp.c:1940 [inline]
 udp_queue_rcv_one_skb+0x7bd/0xc70 net/ipv4/udp.c:2057
 udp_queue_rcv_skb+0xb5/0x400 net/ipv4/udp.c:2074
 udp_unicast_rcv_skb.isra.0+0x7e/0x1c0 net/ipv4/udp.c:2233
 __udp4_lib_rcv+0xa44/0x17c0 net/ipv4/udp.c:2300
 udp_rcv+0x2b/0x40 net/ipv4/udp.c:2470
 ip_protocol_deliver_rcu+0x4d/0x420 net/ipv4/ip_input.c:204
 ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
 NF_HOOK include/linux/netfilter.h:305 [inline]
 NF_HOOK include/linux/netfilter.h:299 [inline]
 ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
 dst_input include/net/dst.h:442 [inline]
 ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
 NF_HOOK include/linux/netfilter.h:305 [inline]
 NF_HOOK include/linux/netfilter.h:299 [inline]
 ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
 __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010
 __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124
 process_backlog+0x1d3/0x420 net/core/dev.c:5955

read to 0xffff888122474b50 of 8 bytes by task 8921 on cpu 1:
 skb_queue_empty include/linux/skbuff.h:1494 [inline]
 __skb_recv_udp+0x18d/0x500 net/ipv4/udp.c:1653
 udp_recvmsg+0xe1/0xb10 net/ipv4/udp.c:1712
 inet_recvmsg+0xbb/0x250 net/ipv4/af_inet.c:838
 sock_recvmsg_nosec+0x5c/0x70 net/socket.c:871
 ___sys_recvmsg+0x1a0/0x3e0 net/socket.c:2480
 do_recvmmsg+0x19a/0x5c0 net/socket.c:2601
 __sys_recvmmsg+0x1ef/0x200 net/socket.c:2680
 __do_sys_recvmmsg net/socket.c:2703 [inline]
 __se_sys_recvmmsg net/socket.c:2696 [inline]
 __x64_sys_recvmmsg+0x89/0xb0 net/socket.c:2696
 do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 8921 Comm: syz-executor.4 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
---
 net/ipv4/udp.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 14bc654b6842003b325efe268041749c6770947f..2cc259736c2e0d10f16dcd5333925e2a369843b9 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1577,7 +1577,7 @@ static int first_packet_length(struct sock *sk)
 
 	spin_lock_bh(&rcvq->lock);
 	skb = __first_packet_length(sk, rcvq, &total);
-	if (!skb && !skb_queue_empty(sk_queue)) {
+	if (!skb && !skb_queue_empty_lockless(sk_queue)) {
 		spin_lock(&sk_queue->lock);
 		skb_queue_splice_tail_init(sk_queue, rcvq);
 		spin_unlock(&sk_queue->lock);
@@ -1650,7 +1650,7 @@ struct sk_buff *__skb_recv_udp(struct sock *sk, unsigned int flags,
 				return skb;
 			}
 
-			if (skb_queue_empty(sk_queue)) {
+			if (skb_queue_empty_lockless(sk_queue)) {
 				spin_unlock_bh(&queue->lock);
 				goto busy_check;
 			}
@@ -1676,7 +1676,7 @@ struct sk_buff *__skb_recv_udp(struct sock *sk, unsigned int flags,
 				break;
 
 			sk_busy_loop(sk, flags & MSG_DONTWAIT);
-		} while (!skb_queue_empty(sk_queue));
+		} while (!skb_queue_empty_lockless(sk_queue));
 
 		/* sk_queue is empty, reader_queue may contain peeked packets */
 	} while (timeo &&
-- 
2.23.0.866.gb869b98d4c-goog


* [PATCH net 3/5] net: use skb_queue_empty_lockless() in poll() handlers
  2019-10-24  5:44 [PATCH net 0/5] net: avoid KCSAN splats Eric Dumazet
  2019-10-24  5:44 ` [PATCH net 1/5] net: add skb_queue_empty_lockless() Eric Dumazet
  2019-10-24  5:44 ` [PATCH net 2/5] udp: use skb_queue_empty_lockless() Eric Dumazet
@ 2019-10-24  5:44 ` Eric Dumazet
  2019-10-24  5:44 ` [PATCH net 4/5] net: use skb_queue_empty_lockless() in busy poll contexts Eric Dumazet
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Eric Dumazet @ 2019-10-24  5:44 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Eric Dumazet, Eric Dumazet

Many poll() handlers are lockless. Using skb_queue_empty_lockless()
instead of skb_queue_empty() is more appropriate.
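
These handlers only sample the queues to compute an event mask and
take no queue lock. A simplified sketch of the common shape (not one
of the handlers changed below):

	static __poll_t foo_poll(struct file *file, struct socket *sock,
				 poll_table *wait)
	{
		struct sock *sk = sock->sk;
		__poll_t mask = 0;

		/* Registers on the wait queue; no queue lock taken. */
		sock_poll_wait(file, sock, wait);

		if (!skb_queue_empty_lockless(&sk->sk_receive_queue))
			mask |= EPOLLIN | EPOLLRDNORM;
		return mask;
	}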

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/isdn/capi/capi.c     | 2 +-
 net/atm/common.c             | 2 +-
 net/bluetooth/af_bluetooth.c | 4 ++--
 net/caif/caif_socket.c       | 2 +-
 net/core/datagram.c          | 4 ++--
 net/decnet/af_decnet.c       | 2 +-
 net/ipv4/tcp.c               | 2 +-
 net/ipv4/udp.c               | 2 +-
 net/nfc/llcp_sock.c          | 4 ++--
 net/phonet/socket.c          | 4 ++--
 net/sctp/socket.c            | 4 ++--
 net/tipc/socket.c            | 4 ++--
 net/unix/af_unix.c           | 6 +++---
 net/vmw_vsock/af_vsock.c     | 2 +-
 14 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/drivers/isdn/capi/capi.c b/drivers/isdn/capi/capi.c
index c92b405b7646cfbceaed3b6544fb71d145c9bb42..ba8619524231ccde29093076e6196652d17dd69b 100644
--- a/drivers/isdn/capi/capi.c
+++ b/drivers/isdn/capi/capi.c
@@ -744,7 +744,7 @@ capi_poll(struct file *file, poll_table *wait)
 
 	poll_wait(file, &(cdev->recvwait), wait);
 	mask = EPOLLOUT | EPOLLWRNORM;
-	if (!skb_queue_empty(&cdev->recvqueue))
+	if (!skb_queue_empty_lockless(&cdev->recvqueue))
 		mask |= EPOLLIN | EPOLLRDNORM;
 	return mask;
 }
diff --git a/net/atm/common.c b/net/atm/common.c
index b7528e77997c88b7cf15f0b48412e3abef0620e3..0ce530af534ddfed26a32b1b24f202ee7c582198 100644
--- a/net/atm/common.c
+++ b/net/atm/common.c
@@ -668,7 +668,7 @@ __poll_t vcc_poll(struct file *file, struct socket *sock, poll_table *wait)
 		mask |= EPOLLHUP;
 
 	/* readable? */
-	if (!skb_queue_empty(&sk->sk_receive_queue))
+	if (!skb_queue_empty_lockless(&sk->sk_receive_queue))
 		mask |= EPOLLIN | EPOLLRDNORM;
 
 	/* writable? */
diff --git a/net/bluetooth/af_bluetooth.c b/net/bluetooth/af_bluetooth.c
index 94ddf19998c7ee7098b52c9ef37f3e0296089fab..5f508c50649d0abc317ed83cc0768f6b7a64de96 100644
--- a/net/bluetooth/af_bluetooth.c
+++ b/net/bluetooth/af_bluetooth.c
@@ -460,7 +460,7 @@ __poll_t bt_sock_poll(struct file *file, struct socket *sock,
 	if (sk->sk_state == BT_LISTEN)
 		return bt_accept_poll(sk);
 
-	if (sk->sk_err || !skb_queue_empty(&sk->sk_error_queue))
+	if (sk->sk_err || !skb_queue_empty_lockless(&sk->sk_error_queue))
 		mask |= EPOLLERR |
 			(sock_flag(sk, SOCK_SELECT_ERR_QUEUE) ? EPOLLPRI : 0);
 
@@ -470,7 +470,7 @@ __poll_t bt_sock_poll(struct file *file, struct socket *sock,
 	if (sk->sk_shutdown == SHUTDOWN_MASK)
 		mask |= EPOLLHUP;
 
-	if (!skb_queue_empty(&sk->sk_receive_queue))
+	if (!skb_queue_empty_lockless(&sk->sk_receive_queue))
 		mask |= EPOLLIN | EPOLLRDNORM;
 
 	if (sk->sk_state == BT_CLOSED)
diff --git a/net/caif/caif_socket.c b/net/caif/caif_socket.c
index 13ea920600aeb66af017134cbe5fcfd5ccc6020e..ef14da50a9819183c538f14b97b5e4986adafb45 100644
--- a/net/caif/caif_socket.c
+++ b/net/caif/caif_socket.c
@@ -953,7 +953,7 @@ static __poll_t caif_poll(struct file *file,
 		mask |= EPOLLRDHUP;
 
 	/* readable? */
-	if (!skb_queue_empty(&sk->sk_receive_queue) ||
+	if (!skb_queue_empty_lockless(&sk->sk_receive_queue) ||
 		(sk->sk_shutdown & RCV_SHUTDOWN))
 		mask |= EPOLLIN | EPOLLRDNORM;
 
diff --git a/net/core/datagram.c b/net/core/datagram.c
index c210fc116103d9915a2a4abc5225e0eb75825b0b..5b685e110affab8f6d7cd3050ce88dfddb1357f5 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -767,7 +767,7 @@ __poll_t datagram_poll(struct file *file, struct socket *sock,
 	mask = 0;
 
 	/* exceptional events? */
-	if (sk->sk_err || !skb_queue_empty(&sk->sk_error_queue))
+	if (sk->sk_err || !skb_queue_empty_lockless(&sk->sk_error_queue))
 		mask |= EPOLLERR |
 			(sock_flag(sk, SOCK_SELECT_ERR_QUEUE) ? EPOLLPRI : 0);
 
@@ -777,7 +777,7 @@ __poll_t datagram_poll(struct file *file, struct socket *sock,
 		mask |= EPOLLHUP;
 
 	/* readable? */
-	if (!skb_queue_empty(&sk->sk_receive_queue))
+	if (!skb_queue_empty_lockless(&sk->sk_receive_queue))
 		mask |= EPOLLIN | EPOLLRDNORM;
 
 	/* Connection-based need to check for termination and startup */
diff --git a/net/decnet/af_decnet.c b/net/decnet/af_decnet.c
index 0ea75286abf46ad9da2e6c58d4facc91f731b8bf..3349ea81f9016fb785ec888fadf86faa4d859ed7 100644
--- a/net/decnet/af_decnet.c
+++ b/net/decnet/af_decnet.c
@@ -1205,7 +1205,7 @@ static __poll_t dn_poll(struct file *file, struct socket *sock, poll_table  *wai
 	struct dn_scp *scp = DN_SK(sk);
 	__poll_t mask = datagram_poll(file, sock, wait);
 
-	if (!skb_queue_empty(&scp->other_receive_queue))
+	if (!skb_queue_empty_lockless(&scp->other_receive_queue))
 		mask |= EPOLLRDBAND;
 
 	return mask;
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 42187a3b82f4f6280d831c34d9ca6b7bcffc191f..ffef502f52920170478d9fd000a507659e17de15 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -584,7 +584,7 @@ __poll_t tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
 	}
 	/* This barrier is coupled with smp_wmb() in tcp_reset() */
 	smp_rmb();
-	if (sk->sk_err || !skb_queue_empty(&sk->sk_error_queue))
+	if (sk->sk_err || !skb_queue_empty_lockless(&sk->sk_error_queue))
 		mask |= EPOLLERR;
 
 	return mask;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 2cc259736c2e0d10f16dcd5333925e2a369843b9..345a3d43f5a655e009e99c16bb19e047cdf003c6 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2712,7 +2712,7 @@ __poll_t udp_poll(struct file *file, struct socket *sock, poll_table *wait)
 	__poll_t mask = datagram_poll(file, sock, wait);
 	struct sock *sk = sock->sk;
 
-	if (!skb_queue_empty(&udp_sk(sk)->reader_queue))
+	if (!skb_queue_empty_lockless(&udp_sk(sk)->reader_queue))
 		mask |= EPOLLIN | EPOLLRDNORM;
 
 	/* Check for false positives due to checksum errors */
diff --git a/net/nfc/llcp_sock.c b/net/nfc/llcp_sock.c
index ccdd790e163a81d0be5c69842fb3d25a57c743c4..28604414dec1b7d03f0d175cc87c10e0922c7043 100644
--- a/net/nfc/llcp_sock.c
+++ b/net/nfc/llcp_sock.c
@@ -554,11 +554,11 @@ static __poll_t llcp_sock_poll(struct file *file, struct socket *sock,
 	if (sk->sk_state == LLCP_LISTEN)
 		return llcp_accept_poll(sk);
 
-	if (sk->sk_err || !skb_queue_empty(&sk->sk_error_queue))
+	if (sk->sk_err || !skb_queue_empty_lockless(&sk->sk_error_queue))
 		mask |= EPOLLERR |
 			(sock_flag(sk, SOCK_SELECT_ERR_QUEUE) ? EPOLLPRI : 0);
 
-	if (!skb_queue_empty(&sk->sk_receive_queue))
+	if (!skb_queue_empty_lockless(&sk->sk_receive_queue))
 		mask |= EPOLLIN | EPOLLRDNORM;
 
 	if (sk->sk_state == LLCP_CLOSED)
diff --git a/net/phonet/socket.c b/net/phonet/socket.c
index 96ea9f254ae90ae957f800566b05a7f75d13948f..76d499f6af9ab32aa17422ef911497585b6649b7 100644
--- a/net/phonet/socket.c
+++ b/net/phonet/socket.c
@@ -338,9 +338,9 @@ static __poll_t pn_socket_poll(struct file *file, struct socket *sock,
 
 	if (sk->sk_state == TCP_CLOSE)
 		return EPOLLERR;
-	if (!skb_queue_empty(&sk->sk_receive_queue))
+	if (!skb_queue_empty_lockless(&sk->sk_receive_queue))
 		mask |= EPOLLIN | EPOLLRDNORM;
-	if (!skb_queue_empty(&pn->ctrlreq_queue))
+	if (!skb_queue_empty_lockless(&pn->ctrlreq_queue))
 		mask |= EPOLLPRI;
 	if (!mask && sk->sk_state == TCP_CLOSE_WAIT)
 		return EPOLLHUP;
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 5ca0ec0e823c4a7c62e794a05129fbd80ff29331..cfb25391b8b0fb66b77db995933103007113aa32 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -8476,7 +8476,7 @@ __poll_t sctp_poll(struct file *file, struct socket *sock, poll_table *wait)
 	mask = 0;
 
 	/* Is there any exceptional events?  */
-	if (sk->sk_err || !skb_queue_empty(&sk->sk_error_queue))
+	if (sk->sk_err || !skb_queue_empty_lockless(&sk->sk_error_queue))
 		mask |= EPOLLERR |
 			(sock_flag(sk, SOCK_SELECT_ERR_QUEUE) ? EPOLLPRI : 0);
 	if (sk->sk_shutdown & RCV_SHUTDOWN)
@@ -8485,7 +8485,7 @@ __poll_t sctp_poll(struct file *file, struct socket *sock, poll_table *wait)
 		mask |= EPOLLHUP;
 
 	/* Is it readable?  Reconsider this code with TCP-style support.  */
-	if (!skb_queue_empty(&sk->sk_receive_queue))
+	if (!skb_queue_empty_lockless(&sk->sk_receive_queue))
 		mask |= EPOLLIN | EPOLLRDNORM;
 
 	/* The association is either gone or not ready.  */
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index f8bbc4aab21397ecb184a6b869327a5b124cb7e4..4b92b196cfa668c9ee59b9dae4e83bb7b5b5ba3d 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -740,7 +740,7 @@ static __poll_t tipc_poll(struct file *file, struct socket *sock,
 		/* fall through */
 	case TIPC_LISTEN:
 	case TIPC_CONNECTING:
-		if (!skb_queue_empty(&sk->sk_receive_queue))
+		if (!skb_queue_empty_lockless(&sk->sk_receive_queue))
 			revents |= EPOLLIN | EPOLLRDNORM;
 		break;
 	case TIPC_OPEN:
@@ -748,7 +748,7 @@ static __poll_t tipc_poll(struct file *file, struct socket *sock,
 			revents |= EPOLLOUT;
 		if (!tipc_sk_type_connectionless(sk))
 			break;
-		if (skb_queue_empty(&sk->sk_receive_queue))
+		if (skb_queue_empty_lockless(&sk->sk_receive_queue))
 			break;
 		revents |= EPOLLIN | EPOLLRDNORM;
 		break;
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 67e87db5877f1132b52b0c93d349983873fec38f..0d8da809bea2e6053c6bd5687ae19ab3f3ca617a 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2599,7 +2599,7 @@ static __poll_t unix_poll(struct file *file, struct socket *sock, poll_table *wa
 		mask |= EPOLLRDHUP | EPOLLIN | EPOLLRDNORM;
 
 	/* readable? */
-	if (!skb_queue_empty(&sk->sk_receive_queue))
+	if (!skb_queue_empty_lockless(&sk->sk_receive_queue))
 		mask |= EPOLLIN | EPOLLRDNORM;
 
 	/* Connection-based need to check for termination and startup */
@@ -2628,7 +2628,7 @@ static __poll_t unix_dgram_poll(struct file *file, struct socket *sock,
 	mask = 0;
 
 	/* exceptional events? */
-	if (sk->sk_err || !skb_queue_empty(&sk->sk_error_queue))
+	if (sk->sk_err || !skb_queue_empty_lockless(&sk->sk_error_queue))
 		mask |= EPOLLERR |
 			(sock_flag(sk, SOCK_SELECT_ERR_QUEUE) ? EPOLLPRI : 0);
 
@@ -2638,7 +2638,7 @@ static __poll_t unix_dgram_poll(struct file *file, struct socket *sock,
 		mask |= EPOLLHUP;
 
 	/* readable? */
-	if (!skb_queue_empty(&sk->sk_receive_queue))
+	if (!skb_queue_empty_lockless(&sk->sk_receive_queue))
 		mask |= EPOLLIN | EPOLLRDNORM;
 
 	/* Connection-based need to check for termination and startup */
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 2ab43b2bba31bca91de7fe7840c487e07683a922..582a3e4dfce295fa67100468335bc8dbf8dc1095 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -870,7 +870,7 @@ static __poll_t vsock_poll(struct file *file, struct socket *sock,
 		 * the queue and write as long as the socket isn't shutdown for
 		 * sending.
 		 */
-		if (!skb_queue_empty(&sk->sk_receive_queue) ||
+		if (!skb_queue_empty_lockless(&sk->sk_receive_queue) ||
 		    (sk->sk_shutdown & RCV_SHUTDOWN)) {
 			mask |= EPOLLIN | EPOLLRDNORM;
 		}
-- 
2.23.0.866.gb869b98d4c-goog


* [PATCH net 4/5] net: use skb_queue_empty_lockless() in busy poll contexts
  2019-10-24  5:44 [PATCH net 0/5] net: avoid KCSAN splats Eric Dumazet
                   ` (2 preceding siblings ...)
  2019-10-24  5:44 ` [PATCH net 3/5] net: use skb_queue_empty_lockless() in poll() handlers Eric Dumazet
@ 2019-10-24  5:44 ` Eric Dumazet
  2019-10-24  5:44 ` [PATCH net 5/5] net: add READ_ONCE() annotation in __skb_wait_for_more_packets() Eric Dumazet
  2019-10-28 20:34 ` [PATCH net 0/5] net: avoid KCSAN splats David Miller
  5 siblings, 0 replies; 7+ messages in thread
From: Eric Dumazet @ 2019-10-24  5:44 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Eric Dumazet, Eric Dumazet

Busy polling usually runs without locks.
Let's use skb_queue_empty_lockless() instead of skb_queue_empty().

Also use READ_ONCE() in __skb_try_recv_datagram() to address
a similar potential problem.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/crypto/chelsio/chtls/chtls_io.c | 2 +-
 drivers/nvme/host/tcp.c                 | 2 +-
 net/core/datagram.c                     | 2 +-
 net/core/sock.c                         | 2 +-
 net/ipv4/tcp.c                          | 2 +-
 net/sctp/socket.c                       | 2 +-
 6 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/crypto/chelsio/chtls/chtls_io.c b/drivers/crypto/chelsio/chtls/chtls_io.c
index 0891ab829b1b6b1318353e945e6394fab29f7c1e..98bc5a4cd5e7014990f064a92777308ae98b13e4 100644
--- a/drivers/crypto/chelsio/chtls/chtls_io.c
+++ b/drivers/crypto/chelsio/chtls/chtls_io.c
@@ -1702,7 +1702,7 @@ int chtls_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 		return peekmsg(sk, msg, len, nonblock, flags);
 
 	if (sk_can_busy_loop(sk) &&
-	    skb_queue_empty(&sk->sk_receive_queue) &&
+	    skb_queue_empty_lockless(&sk->sk_receive_queue) &&
 	    sk->sk_state == TCP_ESTABLISHED)
 		sk_busy_loop(sk, nonblock);
 
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 770dbcbc999e0bff81b8643644cab20599a0c2e3..7544be84ab3582e27ded19298b45960ce3460d96 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2219,7 +2219,7 @@ static int nvme_tcp_poll(struct blk_mq_hw_ctx *hctx)
 	struct nvme_tcp_queue *queue = hctx->driver_data;
 	struct sock *sk = queue->sock->sk;
 
-	if (sk_can_busy_loop(sk) && skb_queue_empty(&sk->sk_receive_queue))
+	if (sk_can_busy_loop(sk) && skb_queue_empty_lockless(&sk->sk_receive_queue))
 		sk_busy_loop(sk, true);
 	nvme_tcp_try_recv(queue);
 	return queue->nr_cqe;
diff --git a/net/core/datagram.c b/net/core/datagram.c
index 5b685e110affab8f6d7cd3050ce88dfddb1357f5..03515e46a49ab60cdd5f643efb3459d16f6021e5 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -278,7 +278,7 @@ struct sk_buff *__skb_try_recv_datagram(struct sock *sk, unsigned int flags,
 			break;
 
 		sk_busy_loop(sk, flags & MSG_DONTWAIT);
-	} while (sk->sk_receive_queue.prev != *last);
+	} while (READ_ONCE(sk->sk_receive_queue.prev) != *last);
 
 	error = -EAGAIN;
 
diff --git a/net/core/sock.c b/net/core/sock.c
index a515392ba84b67b2bf5400e0cfb7c3454fa87af8..b8e758bcb6ad65c93e93fb64f70a61e353a5737e 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -3600,7 +3600,7 @@ bool sk_busy_loop_end(void *p, unsigned long start_time)
 {
 	struct sock *sk = p;
 
-	return !skb_queue_empty(&sk->sk_receive_queue) ||
+	return !skb_queue_empty_lockless(&sk->sk_receive_queue) ||
 	       sk_busy_loop_timeout(sk, start_time);
 }
 EXPORT_SYMBOL(sk_busy_loop_end);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index ffef502f52920170478d9fd000a507659e17de15..d8876f0e9672718b4e02bc7aaaac30ecfd4903cb 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1964,7 +1964,7 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock,
 	if (unlikely(flags & MSG_ERRQUEUE))
 		return inet_recv_error(sk, msg, len, addr_len);
 
-	if (sk_can_busy_loop(sk) && skb_queue_empty(&sk->sk_receive_queue) &&
+	if (sk_can_busy_loop(sk) && skb_queue_empty_lockless(&sk->sk_receive_queue) &&
 	    (sk->sk_state == TCP_ESTABLISHED))
 		sk_busy_loop(sk, nonblock);
 
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index cfb25391b8b0fb66b77db995933103007113aa32..ca81e06df1651f16ab332cd9fc880c21b89a5c6d 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -8871,7 +8871,7 @@ struct sk_buff *sctp_skb_recv_datagram(struct sock *sk, int flags,
 		if (sk_can_busy_loop(sk)) {
 			sk_busy_loop(sk, noblock);
 
-			if (!skb_queue_empty(&sk->sk_receive_queue))
+			if (!skb_queue_empty_lockless(&sk->sk_receive_queue))
 				continue;
 		}
 
-- 
2.23.0.866.gb869b98d4c-goog


* [PATCH net 5/5] net: add READ_ONCE() annotation in __skb_wait_for_more_packets()
  2019-10-24  5:44 [PATCH net 0/5] net: avoid KCSAN splats Eric Dumazet
                   ` (3 preceding siblings ...)
  2019-10-24  5:44 ` [PATCH net 4/5] net: use skb_queue_empty_lockless() in busy poll contexts Eric Dumazet
@ 2019-10-24  5:44 ` Eric Dumazet
  2019-10-28 20:34 ` [PATCH net 0/5] net: avoid KCSAN splats David Miller
  5 siblings, 0 replies; 7+ messages in thread
From: Eric Dumazet @ 2019-10-24  5:44 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Eric Dumazet, Eric Dumazet, syzbot

__skb_wait_for_more_packets() can be called while other CPUs
feed packets to the socket receive queue.

KCSAN reported:

BUG: KCSAN: data-race in __skb_wait_for_more_packets / __udp_enqueue_schedule_skb

write to 0xffff888102e40b58 of 8 bytes by interrupt on cpu 0:
 __skb_insert include/linux/skbuff.h:1852 [inline]
 __skb_queue_before include/linux/skbuff.h:1958 [inline]
 __skb_queue_tail include/linux/skbuff.h:1991 [inline]
 __udp_enqueue_schedule_skb+0x2d7/0x410 net/ipv4/udp.c:1470
 __udp_queue_rcv_skb net/ipv4/udp.c:1940 [inline]
 udp_queue_rcv_one_skb+0x7bd/0xc70 net/ipv4/udp.c:2057
 udp_queue_rcv_skb+0xb5/0x400 net/ipv4/udp.c:2074
 udp_unicast_rcv_skb.isra.0+0x7e/0x1c0 net/ipv4/udp.c:2233
 __udp4_lib_rcv+0xa44/0x17c0 net/ipv4/udp.c:2300
 udp_rcv+0x2b/0x40 net/ipv4/udp.c:2470
 ip_protocol_deliver_rcu+0x4d/0x420 net/ipv4/ip_input.c:204
 ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
 NF_HOOK include/linux/netfilter.h:305 [inline]
 NF_HOOK include/linux/netfilter.h:299 [inline]
 ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
 dst_input include/net/dst.h:442 [inline]
 ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
 NF_HOOK include/linux/netfilter.h:305 [inline]
 NF_HOOK include/linux/netfilter.h:299 [inline]
 ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
 __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010
 __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124
 process_backlog+0x1d3/0x420 net/core/dev.c:5955

read to 0xffff888102e40b58 of 8 bytes by task 13035 on cpu 1:
 __skb_wait_for_more_packets+0xfa/0x320 net/core/datagram.c:100
 __skb_recv_udp+0x374/0x500 net/ipv4/udp.c:1683
 udp_recvmsg+0xe1/0xb10 net/ipv4/udp.c:1712
 inet_recvmsg+0xbb/0x250 net/ipv4/af_inet.c:838
 sock_recvmsg_nosec+0x5c/0x70 net/socket.c:871
 ___sys_recvmsg+0x1a0/0x3e0 net/socket.c:2480
 do_recvmmsg+0x19a/0x5c0 net/socket.c:2601
 __sys_recvmmsg+0x1ef/0x200 net/socket.c:2680
 __do_sys_recvmmsg net/socket.c:2703 [inline]
 __se_sys_recvmmsg net/socket.c:2696 [inline]
 __x64_sys_recvmmsg+0x89/0xb0 net/socket.c:2696
 do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 13035 Comm: syz-executor.3 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
---
 net/core/datagram.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/datagram.c b/net/core/datagram.c
index 03515e46a49ab60cdd5f643efb3459d16f6021e5..da3c24ed129cd64db8bdb1916afa552a47c1a5a3 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -97,7 +97,7 @@ int __skb_wait_for_more_packets(struct sock *sk, int *err, long *timeo_p,
 	if (error)
 		goto out_err;
 
-	if (sk->sk_receive_queue.prev != skb)
+	if (READ_ONCE(sk->sk_receive_queue.prev) != skb)
 		goto out;
 
 	/* Socket shut down? */
-- 
2.23.0.866.gb869b98d4c-goog


* Re: [PATCH net 0/5] net: avoid KCSAN splats
  2019-10-24  5:44 [PATCH net 0/5] net: avoid KCSAN splats Eric Dumazet
                   ` (4 preceding siblings ...)
  2019-10-24  5:44 ` [PATCH net 5/5] net: add READ_ONCE() annotation in __skb_wait_for_more_packets() Eric Dumazet
@ 2019-10-28 20:34 ` David Miller
  5 siblings, 0 replies; 7+ messages in thread
From: David Miller @ 2019-10-28 20:34 UTC (permalink / raw)
  To: edumazet; +Cc: netdev, eric.dumazet

From: Eric Dumazet <edumazet@google.com>
Date: Wed, 23 Oct 2019 22:44:47 -0700

> Oftentimes we use skb_queue_empty() without holding a lock,
> meaning that other CPUs (or interrupts) can change the queue
> under us. This is fine, but we need to properly annotate
> the lockless intent to make sure the compiler won't
> over-optimize things.

Series applied and I'll queue this up for -stable.

Thanks Eric.
