netdev.vger.kernel.org archive mirror
* [PATCH bpf v2 0/2] xsk: fix two bugs in the SKB Tx path
@ 2020-12-18 13:45 Magnus Karlsson
  2020-12-18 13:45 ` [PATCH bpf v2 1/2] xsk: fix race in SKB mode transmit with shared cq Magnus Karlsson
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Magnus Karlsson @ 2020-12-18 13:45 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev, jonathan.lemon
  Cc: Magnus Karlsson, bpf, A.Zema, maciej.fijalkowski, maciejromanfijalkowski

This patch set contains two bug fixes to the Tx SKB path. Details can
be found in the individual commit messages. Special thanks to Xuan
Zhuo for spotting both of them.

v1 -> v2:
* Rebase

Thanks: Magnus

Magnus Karlsson (2):
  xsk: fix race in SKB mode transmit with shared cq
  xsk: rollback reservation at NETDEV_TX_BUSY

 include/net/xdp_sock.h      |  4 ----
 include/net/xsk_buff_pool.h |  5 +++++
 net/xdp/xsk.c               | 12 +++++++++---
 net/xdp/xsk_buff_pool.c     |  1 +
 net/xdp/xsk_queue.h         |  5 +++++
 5 files changed, 20 insertions(+), 7 deletions(-)


base-commit: 8bee683384087a6275c9183a483435225f7bb209
--
2.29.0


* [PATCH bpf v2 1/2] xsk: fix race in SKB mode transmit with shared cq
  2020-12-18 13:45 [PATCH bpf v2 0/2] xsk: fix two bugs in the SKB Tx path Magnus Karlsson
@ 2020-12-18 13:45 ` Magnus Karlsson
  2020-12-18 13:45 ` [PATCH bpf v2 2/2] xsk: rollback reservation at NETDEV_TX_BUSY Magnus Karlsson
  2020-12-18 15:20 ` [PATCH bpf v2 0/2] xsk: fix two bugs in the SKB Tx path patchwork-bot+netdevbpf
  2 siblings, 0 replies; 4+ messages in thread
From: Magnus Karlsson @ 2020-12-18 13:45 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev, jonathan.lemon
  Cc: bpf, A.Zema, maciej.fijalkowski, maciejromanfijalkowski, Xuan Zhuo

From: Magnus Karlsson <magnus.karlsson@intel.com>

Fix a race when multiple sockets are simultaneously calling sendto()
when the completion ring is shared in the SKB case. This is the case
when you share the same netdev and queue id through the
XDP_SHARED_UMEM bind flag. The problem is that multiple processes can
be in xsk_generic_xmit() and call the backpressure mechanism in
xskq_prod_reserve(xs->pool->cq). As this is a shared resource in this
specific scenario, a race might occur since the rings are
single-producer single-consumer.

Fix this by moving the tx_completion_lock from the socket to the pool,
as the pool is shared between the sockets that share the completion
ring (and is not shared otherwise), and by protecting the accesses to
xskq_prod_reserve() with this lock. The tx_completion_lock is renamed
cq_lock to better reflect that it protects accesses to the potentially
shared completion ring.

Fixes: 35fcde7f8deb ("xsk: support for Tx")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Reported-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
---
 include/net/xdp_sock.h      | 4 ----
 include/net/xsk_buff_pool.h | 5 +++++
 net/xdp/xsk.c               | 9 ++++++---
 net/xdp/xsk_buff_pool.c     | 1 +
 4 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index 4f4e93bf814c..cc17bc957548 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -58,10 +58,6 @@ struct xdp_sock {
 
 	struct xsk_queue *tx ____cacheline_aligned_in_smp;
 	struct list_head tx_list;
-	/* Mutual exclusion of NAPI TX thread and sendmsg error paths
-	 * in the SKB destructor callback.
-	 */
-	spinlock_t tx_completion_lock;
 	/* Protects generic receive. */
 	spinlock_t rx_lock;
 
diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
index 01755b838c74..eaa8386dbc63 100644
--- a/include/net/xsk_buff_pool.h
+++ b/include/net/xsk_buff_pool.h
@@ -73,6 +73,11 @@ struct xsk_buff_pool {
 	bool dma_need_sync;
 	bool unaligned;
 	void *addrs;
+	/* Mutual exclusion of the completion ring in the SKB mode. Two cases to protect:
+	 * NAPI TX thread and sendmsg error paths in the SKB destructor callback and when
+	 * sockets share a single cq when the same netdev and queue id is shared.
+	 */
+	spinlock_t cq_lock;
 	struct xdp_buff_xsk *free_heads[];
 };
 
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index c6532d77fde7..d531f9cd0de6 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -423,9 +423,9 @@ static void xsk_destruct_skb(struct sk_buff *skb)
 	struct xdp_sock *xs = xdp_sk(skb->sk);
 	unsigned long flags;
 
-	spin_lock_irqsave(&xs->tx_completion_lock, flags);
+	spin_lock_irqsave(&xs->pool->cq_lock, flags);
 	xskq_prod_submit_addr(xs->pool->cq, addr);
-	spin_unlock_irqrestore(&xs->tx_completion_lock, flags);
+	spin_unlock_irqrestore(&xs->pool->cq_lock, flags);
 
 	sock_wfree(skb);
 }
@@ -437,6 +437,7 @@ static int xsk_generic_xmit(struct sock *sk)
 	bool sent_frame = false;
 	struct xdp_desc desc;
 	struct sk_buff *skb;
+	unsigned long flags;
 	int err = 0;
 
 	mutex_lock(&xs->mutex);
@@ -468,10 +469,13 @@ static int xsk_generic_xmit(struct sock *sk)
 		 * if there is space in it. This avoids having to implement
 		 * any buffering in the Tx path.
 		 */
+		spin_lock_irqsave(&xs->pool->cq_lock, flags);
 		if (unlikely(err) || xskq_prod_reserve(xs->pool->cq)) {
+			spin_unlock_irqrestore(&xs->pool->cq_lock, flags);
 			kfree_skb(skb);
 			goto out;
 		}
+		spin_unlock_irqrestore(&xs->pool->cq_lock, flags);
 
 		skb->dev = xs->dev;
 		skb->priority = sk->sk_priority;
@@ -1303,7 +1307,6 @@ static int xsk_create(struct net *net, struct socket *sock, int protocol,
 	xs->state = XSK_READY;
 	mutex_init(&xs->mutex);
 	spin_lock_init(&xs->rx_lock);
-	spin_lock_init(&xs->tx_completion_lock);
 
 	INIT_LIST_HEAD(&xs->map_list);
 	spin_lock_init(&xs->map_list_lock);
diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index 818b75060922..20598eea658c 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -71,6 +71,7 @@ struct xsk_buff_pool *xp_create_and_assign_umem(struct xdp_sock *xs,
 	INIT_LIST_HEAD(&pool->free_list);
 	INIT_LIST_HEAD(&pool->xsk_tx_list);
 	spin_lock_init(&pool->xsk_tx_list_lock);
+	spin_lock_init(&pool->cq_lock);
 	refcount_set(&pool->users, 1);
 
 	pool->fq = xs->fq_tmp;
-- 
2.29.0



* [PATCH bpf v2 2/2] xsk: rollback reservation at NETDEV_TX_BUSY
  2020-12-18 13:45 [PATCH bpf v2 0/2] xsk: fix two bugs in the SKB Tx path Magnus Karlsson
  2020-12-18 13:45 ` [PATCH bpf v2 1/2] xsk: fix race in SKB mode transmit with shared cq Magnus Karlsson
@ 2020-12-18 13:45 ` Magnus Karlsson
  2020-12-18 15:20 ` [PATCH bpf v2 0/2] xsk: fix two bugs in the SKB Tx path patchwork-bot+netdevbpf
  2 siblings, 0 replies; 4+ messages in thread
From: Magnus Karlsson @ 2020-12-18 13:45 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev, jonathan.lemon
  Cc: bpf, A.Zema, maciej.fijalkowski, maciejromanfijalkowski, Xuan Zhuo

From: Magnus Karlsson <magnus.karlsson@intel.com>

Roll back the reservation in the completion ring when we get a
NETDEV_TX_BUSY. When the driver returns this error, the user
application is supposed to retry the transmit, so we need to roll back
the failed send so it can be retried. Unfortunately, we did not cancel
the reservation we had made in the completion ring. This makes the
completion ring one entry smaller for every NETDEV_TX_BUSY error we
get, and after enough of these errors the completion ring shrinks to
size zero and transmit stops working.

Fix this by cancelling the reservation when we get a NETDEV_TX_BUSY
error.

Fixes: 642e450b6b59 ("xsk: Do not discard packet when NETDEV_TX_BUSY")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Reported-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
---
 net/xdp/xsk.c       | 3 +++
 net/xdp/xsk_queue.h | 5 +++++
 2 files changed, 8 insertions(+)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index d531f9cd0de6..8037b04a9edd 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -487,6 +487,9 @@ static int xsk_generic_xmit(struct sock *sk)
 		if  (err == NETDEV_TX_BUSY) {
 			/* Tell user-space to retry the send */
 			skb->destructor = sock_wfree;
+			spin_lock_irqsave(&xs->pool->cq_lock, flags);
+			xskq_prod_cancel(xs->pool->cq);
+			spin_unlock_irqrestore(&xs->pool->cq_lock, flags);
 			/* Free skb without triggering the perf drop trace */
 			consume_skb(skb);
 			err = -EAGAIN;
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index 4a9663aa7afe..2823b7c3302d 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -334,6 +334,11 @@ static inline bool xskq_prod_is_full(struct xsk_queue *q)
 	return xskq_prod_nb_free(q, 1) ? false : true;
 }
 
+static inline void xskq_prod_cancel(struct xsk_queue *q)
+{
+	q->cached_prod--;
+}
+
 static inline int xskq_prod_reserve(struct xsk_queue *q)
 {
 	if (xskq_prod_is_full(q))
-- 
2.29.0



* Re: [PATCH bpf v2 0/2] xsk: fix two bugs in the SKB Tx path
  2020-12-18 13:45 [PATCH bpf v2 0/2] xsk: fix two bugs in the SKB Tx path Magnus Karlsson
  2020-12-18 13:45 ` [PATCH bpf v2 1/2] xsk: fix race in SKB mode transmit with shared cq Magnus Karlsson
  2020-12-18 13:45 ` [PATCH bpf v2 2/2] xsk: rollback reservation at NETDEV_TX_BUSY Magnus Karlsson
@ 2020-12-18 15:20 ` patchwork-bot+netdevbpf
  2 siblings, 0 replies; 4+ messages in thread
From: patchwork-bot+netdevbpf @ 2020-12-18 15:20 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: magnus.karlsson, bjorn.topel, ast, daniel, netdev,
	jonathan.lemon, bpf, A.Zema, maciej.fijalkowski,
	maciejromanfijalkowski

Hello:

This series was applied to bpf/bpf.git (refs/heads/master):

On Fri, 18 Dec 2020 14:45:23 +0100 you wrote:
> This patch set contains two bug fixes to the Tx SKB path. Details can
> be found in the individual commit messages. Special thanks to Xuan
> Zhuo for spotting both of them.
> 
> v1 -> v2:
> * Rebase
> 
> [...]

Here is the summary with links:
  - [bpf,v2,1/2] xsk: fix race in SKB mode transmit with shared cq
    https://git.kernel.org/bpf/bpf/c/f09ced4053bc
  - [bpf,v2,2/2] xsk: rollback reservation at NETDEV_TX_BUSY
    https://git.kernel.org/bpf/bpf/c/b1b95cb5c0a9

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



