From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>,
	Dust Li <dust.li@linux.alibaba.com>,
	Jason Wang <jasowang@redhat.com>,
	"David S . Miller" <davem@davemloft.net>,
	Sasha Levin <sashal@kernel.org>
Subject: [PATCH 5.4 37/61] virtio-net: support XDP when not more queues
Date: Tue, 24 Aug 2021 13:00:42 -0400
Message-ID: <20210824170106.710221-38-sashal@kernel.org>
In-Reply-To: <20210824170106.710221-1-sashal@kernel.org>

From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>

[ Upstream commit 97c2c69e1926260c78c7f1c0b2c987934f1dc7a1 ]

Many virtio backends implement only a limited number of queues, which is
a problem especially on machines with a large number of CPUs. In that
case it is often impossible to allocate a separate queue for
XDP_TX/XDP_REDIRECT, so an XDP program cannot be loaded at all, even if
it never uses XDP_TX/XDP_REDIRECT.

This patch allows XDP_TX/XDP_REDIRECT to run by reusing the existing SQ,
with __netif_tx_lock() held, when there are not enough queues.
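
As an illustration only, the queue selection described above can be modelled
in user space roughly as follows. This is a minimal sketch, not the driver
code: the struct xdp_sq_choice and the function xdp_pick_sq() are hypothetical
names, the parameters mirror the fields the patch reads
(curr_queue_pairs, xdp_queue_pairs, nr_cpu_ids, the current CPU id), and the
needs_lock flag stands in for the __netif_tx_lock()/__netif_tx_acquire()
distinction made by the real macro.

```c
#include <stdbool.h>

/* Hypothetical user-space model of the send-queue choice; not kernel code. */
struct xdp_sq_choice {
	unsigned int qp;  /* index into the send-queue array */
	bool needs_lock;  /* true when the queue is shared with the normal tx path */
};

static struct xdp_sq_choice xdp_pick_sq(unsigned int curr_queue_pairs,
					unsigned int xdp_queue_pairs,
					unsigned int nr_cpu_ids,
					unsigned int cpu)
{
	struct xdp_sq_choice c;

	if (curr_queue_pairs > nr_cpu_ids) {
		/* Dedicated XDP queues exist: one per CPU, placed after the
		 * regular queues, so no real locking is required.
		 */
		c.qp = curr_queue_pairs - xdp_queue_pairs + cpu;
		c.needs_lock = false;
	} else {
		/* Not enough queues: fall back to sharing a regular queue,
		 * which must be protected by the tx lock.
		 */
		c.qp = cpu % curr_queue_pairs;
		c.needs_lock = true;
	}
	return c;
}
```

With 4 CPUs, 2 regular queue pairs and 4 dedicated XDP queue pairs
(curr_queue_pairs = 6), CPU 3 maps to queue 5 without locking. With only 2
queue pairs in total and no dedicated XDP queues, CPU 3 maps to queue 1 and
the lock is required.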

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/virtio_net.c | 62 +++++++++++++++++++++++++++++++---------
 1 file changed, 49 insertions(+), 13 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 15453d6fcc23..36f8aeb113a8 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -195,6 +195,9 @@ struct virtnet_info {
 	/* # of XDP queue pairs currently used by the driver */
 	u16 xdp_queue_pairs;
 
+	/* xdp_queue_pairs may be 0, when xdp is already loaded. So add this. */
+	bool xdp_enabled;
+
 	/* I like... big packets and I cannot lie! */
 	bool big_packets;
 
@@ -485,12 +488,41 @@ static int __virtnet_xdp_xmit_one(struct virtnet_info *vi,
 	return 0;
 }
 
-static struct send_queue *virtnet_xdp_sq(struct virtnet_info *vi)
-{
-	unsigned int qp;
-
-	qp = vi->curr_queue_pairs - vi->xdp_queue_pairs + smp_processor_id();
-	return &vi->sq[qp];
+/* when vi->curr_queue_pairs > nr_cpu_ids, the txq/sq is only used for xdp tx on
+ * the current cpu, so it does not need to be locked.
+ *
+ * Here we use macro instead of inline functions because we have to deal with
+ * three issues at the same time: 1. the choice of sq. 2. judge and execute the
+ * lock/unlock of txq 3. make sparse happy. It is difficult for two inline
+ * functions to perfectly solve these three problems at the same time.
+ */
+#define virtnet_xdp_get_sq(vi) ({                                       \
+	struct netdev_queue *txq;                                       \
+	typeof(vi) v = (vi);                                            \
+	unsigned int qp;                                                \
+									\
+	if (v->curr_queue_pairs > nr_cpu_ids) {                         \
+		qp = v->curr_queue_pairs - v->xdp_queue_pairs;          \
+		qp += smp_processor_id();                               \
+		txq = netdev_get_tx_queue(v->dev, qp);                  \
+		__netif_tx_acquire(txq);                                \
+	} else {                                                        \
+		qp = smp_processor_id() % v->curr_queue_pairs;          \
+		txq = netdev_get_tx_queue(v->dev, qp);                  \
+		__netif_tx_lock(txq, raw_smp_processor_id());           \
+	}                                                               \
+	v->sq + qp;                                                     \
+})
+
+#define virtnet_xdp_put_sq(vi, q) {                                     \
+	struct netdev_queue *txq;                                       \
+	typeof(vi) v = (vi);                                            \
+									\
+	txq = netdev_get_tx_queue(v->dev, (q) - v->sq);                 \
+	if (v->curr_queue_pairs > nr_cpu_ids)                           \
+		__netif_tx_release(txq);                                \
+	else                                                            \
+		__netif_tx_unlock(txq);                                 \
 }
 
 static int virtnet_xdp_xmit(struct net_device *dev,
@@ -516,7 +548,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
 	if (!xdp_prog)
 		return -ENXIO;
 
-	sq = virtnet_xdp_sq(vi);
+	sq = virtnet_xdp_get_sq(vi);
 
 	if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK)) {
 		ret = -EINVAL;
@@ -564,12 +596,13 @@ out:
 	sq->stats.kicks += kicks;
 	u64_stats_update_end(&sq->stats.syncp);
 
+	virtnet_xdp_put_sq(vi, sq);
 	return ret;
 }
 
 static unsigned int virtnet_get_headroom(struct virtnet_info *vi)
 {
-	return vi->xdp_queue_pairs ? VIRTIO_XDP_HEADROOM : 0;
+	return vi->xdp_enabled ? VIRTIO_XDP_HEADROOM : 0;
 }
 
 /* We copy the packet for XDP in the following cases:
@@ -1458,12 +1491,13 @@ static int virtnet_poll(struct napi_struct *napi, int budget)
 		xdp_do_flush_map();
 
 	if (xdp_xmit & VIRTIO_XDP_TX) {
-		sq = virtnet_xdp_sq(vi);
+		sq = virtnet_xdp_get_sq(vi);
 		if (virtqueue_kick_prepare(sq->vq) && virtqueue_notify(sq->vq)) {
 			u64_stats_update_begin(&sq->stats.syncp);
 			sq->stats.kicks++;
 			u64_stats_update_end(&sq->stats.syncp);
 		}
+		virtnet_xdp_put_sq(vi, sq);
 	}
 
 	return received;
@@ -2480,10 +2514,9 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
 
 	/* XDP requires extra queues for XDP_TX */
 	if (curr_qp + xdp_qp > vi->max_queue_pairs) {
-		NL_SET_ERR_MSG_MOD(extack, "Too few free TX rings available");
-		netdev_warn(dev, "request %i queues but max is %i\n",
+		netdev_warn(dev, "XDP request %i queues but max is %i. XDP_TX and XDP_REDIRECT will operate in a slower locked tx mode.\n",
 			    curr_qp + xdp_qp, vi->max_queue_pairs);
-		return -ENOMEM;
+		xdp_qp = 0;
 	}
 
 	old_prog = rtnl_dereference(vi->rq[0].xdp_prog);
@@ -2520,11 +2553,14 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
 	vi->xdp_queue_pairs = xdp_qp;
 
 	if (prog) {
+		vi->xdp_enabled = true;
 		for (i = 0; i < vi->max_queue_pairs; i++) {
 			rcu_assign_pointer(vi->rq[i].xdp_prog, prog);
 			if (i == 0 && !old_prog)
 				virtnet_clear_guest_offloads(vi);
 		}
+	} else {
+		vi->xdp_enabled = false;
 	}
 
 	for (i = 0; i < vi->max_queue_pairs; i++) {
@@ -2609,7 +2645,7 @@ static int virtnet_set_features(struct net_device *dev,
 	int err;
 
 	if ((dev->features ^ features) & NETIF_F_LRO) {
-		if (vi->xdp_queue_pairs)
+		if (vi->xdp_enabled)
 			return -EBUSY;
 
 		if (features & NETIF_F_LRO)
-- 
2.30.2