From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org,
	Laurent Bernaille <laurent.bernaille@datadoghq.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Maciej Fijalkowski <maciej.fijalkowski@intel.com>,
	Toshiaki Makita <toshiaki.makita1@gmail.com>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	Paolo Abeni <pabeni@redhat.com>,
	John Fastabend <john.fastabend@gmail.com>,
	Willem de Bruijn <willemb@google.com>,
	Eric Dumazet <edumazet@google.com>,
	"David S. Miller" <davem@davemloft.net>
Subject: [PATCH 5.15 28/41] veth: Do not record rx queue hint in veth_xmit
Date: Fri, 14 Jan 2022 09:16:28 +0100
Message-ID: <20220114081546.093902853@linuxfoundation.org>
In-Reply-To: <20220114081545.158363487@linuxfoundation.org>

From: Daniel Borkmann <daniel@iogearbox.net>

commit 710ad98c363a66a0cd8526465426c5c5f8377ee0 upstream.

Laurent reported that they have seen a significant amount of TCP retransmissions
at high throughput from applications residing in network namespaces talking to
the outside world via veths. The drops were seen at the qdisc layer (fq_codel,
as per systemd default) of the physical device, such as ena or virtio_net,
because all traffic hit a _single_ TX queue _despite_ the device being
multi-queue. (Note that the setup was _not_ using XDP on veths as the issue is
generic.)

More specifically, after edbea9220251 ("veth: Store queue_mapping independently
of XDP prog presence"), which made it all the way back to v4.19.184+,
skb_record_rx_queue() would set skb->queue_mapping to 1 (given 1 RX and 1 TX
queue by default for veths) instead of leaving it at 0.

This is eventually retained, and callbacks like ena_select_queue() will also pick
a single queue via netdev_core_pick_tx()'s ndo_select_queue() once all the traffic
is forwarded to that device via the upper stack or other means. Similarly, for
others not implementing ndo_select_queue(), if XPS is disabled, netdev_pick_tx()
might call into skb_tx_hash() and check for a prior skb_rx_queue_recorded() as well.

In general, it is a _bad_ idea for virtual devices like veth to mess around with
queue selection [by default]. Given dev->real_num_tx_queues is by default 1,
the skb->queue_mapping was left untouched, and so prior to edbea9220251 the
netdev_core_pick_tx() could do its job upon __dev_queue_xmit() on the phys device.

Unbreak this and restore prior behavior by removing the skb_record_rx_queue()
from veth_xmit() altogether.

If the veth peer has an XDP program attached, then it would return the first RX
queue index in xdp_md->rx_queue_index (unless configured in non-default manner).
However, this is still better than breaking the generic case.

Fixes: edbea9220251 ("veth: Store queue_mapping independently of XDP prog presence")
Fixes: 638264dc9022 ("veth: Support per queue XDP ring")
Reported-by: Laurent Bernaille <laurent.bernaille@datadoghq.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Cc: Toshiaki Makita <toshiaki.makita1@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/veth.c |    1 -
 1 file changed, 1 deletion(-)

--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -342,7 +342,6 @@ static netdev_tx_t veth_xmit(struct sk_b
 		 */
 		use_napi = rcu_access_pointer(rq->napi) &&
 			   veth_skb_is_eligible_for_gro(dev, rcv, skb);
-		skb_record_rx_queue(skb, rxq);
 	}
 
 	skb_tx_timestamp(skb);



