All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org,
	Laurent Bernaille <laurent.bernaille@datadoghq.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Maciej Fijalkowski <maciej.fijalkowski@intel.com>,
	Toshiaki Makita <toshiaki.makita1@gmail.com>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	Paolo Abeni <pabeni@redhat.com>,
	John Fastabend <john.fastabend@gmail.com>,
	Willem de Bruijn <willemb@google.com>,
	Eric Dumazet <edumazet@google.com>,
	"David S. Miller" <davem@davemloft.net>
Subject: [PATCH 5.4 07/18] veth: Do not record rx queue hint in veth_xmit
Date: Fri, 14 Jan 2022 09:16:14 +0100	[thread overview]
Message-ID: <20220114081541.713024234@linuxfoundation.org> (raw)
In-Reply-To: <20220114081541.465841464@linuxfoundation.org>

From: Daniel Borkmann <daniel@iogearbox.net>

commit 710ad98c363a66a0cd8526465426c5c5f8377ee0 upstream.

Laurent reported that they have seen a significant amount of TCP retransmissions
at high throughput from applications residing in network namespaces talking to
the outside world via veths. The drops were seen on the qdisc layer (fq_codel,
as per systemd default) of the phys device such as ena or virtio_net due to all
traffic hitting a _single_ TX queue _despite_ multi-queue device. (Note that the
setup was _not_ using XDP on veths as the issue is generic.)

More specifically, after edbea9220251 ("veth: Store queue_mapping independently
of XDP prog presence") which made it all the way back to v4.19.184+,
skb_record_rx_queue() would set skb->queue_mapping to 1 (given 1 RX and 1 TX
queue by default for veths) instead of leaving at 0.

This is eventually retained and callbacks like ena_select_queue() will also pick
single queue via netdev_core_pick_tx()'s ndo_select_queue() once all the traffic
is forwarded to that device via upper stack or other means. Similarly, for others
not implementing ndo_select_queue() if XPS is disabled, netdev_pick_tx() might
call into the skb_tx_hash() and check for prior skb_rx_queue_recorded() as well.

In general, it is a _bad_ idea for virtual devices like veth to mess around with
queue selection [by default]. Given dev->real_num_tx_queues is by default 1,
the skb->queue_mapping was left untouched, and so prior to edbea9220251 the
netdev_core_pick_tx() could do its job upon __dev_queue_xmit() on the phys device.

Unbreak this and restore prior behavior by removing the skb_record_rx_queue()
from veth_xmit() altogether.

If the veth peer has an XDP program attached, then it would return the first RX
queue index in xdp_md->rx_queue_index (unless configured in non-default manner).
However, this is still better than breaking the generic case.

Fixes: edbea9220251 ("veth: Store queue_mapping independently of XDP prog presence")
Fixes: 638264dc9022 ("veth: Support per queue XDP ring")
Reported-by: Laurent Bernaille <laurent.bernaille@datadoghq.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Cc: Toshiaki Makita <toshiaki.makita1@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/veth.c |    1 -
 1 file changed, 1 deletion(-)

--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -254,7 +254,6 @@ static netdev_tx_t veth_xmit(struct sk_b
 	if (rxq < rcv->real_num_rx_queues) {
 		rq = &rcv_priv->rq[rxq];
 		rcv_xdp = rcu_access_pointer(rq->xdp_prog);
-		skb_record_rx_queue(skb, rxq);
 	}
 
 	skb_tx_timestamp(skb);



  parent reply	other threads:[~2022-01-14  8:17 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-14  8:16 [PATCH 5.4 00/18] 5.4.172-rc1 review Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.4 01/18] workqueue: Fix unbind_workers() VS wq_worker_running() race Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.4 02/18] Bluetooth: btusb: fix memory leak in btusb_mtk_submit_wmt_recv_urb() Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.4 03/18] Bluetooth: bfusb: fix division by zero in send path Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.4 04/18] USB: core: Fix bug in resuming hubs handling of wakeup requests Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.4 05/18] USB: Fix "slab-out-of-bounds Write" bug in usb_hcd_poll_rh_status Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.4 06/18] mmc: sdhci-pci: Add PCI ID for Intel ADL Greg Kroah-Hartman
2022-01-14  8:16 ` Greg Kroah-Hartman [this message]
2022-01-14  8:16 ` [PATCH 5.4 08/18] mfd: intel-lpss: Fix too early PM enablement in the ACPI ->probe() Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.4 09/18] drivers core: Use sysfs_emit and sysfs_emit_at for show(device *...) functions Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.4 10/18] can: gs_usb: fix use of uninitialized variable, detach device on reception of invalid USB data Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.4 11/18] can: gs_usb: gs_can_start_xmit(): zero-initialize hf->{flags,reserved} Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.4 12/18] random: fix data race on crng_node_pool Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.4 13/18] random: fix data race on crng init time Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.4 14/18] random: fix crash on multiple early calls to add_bootloader_randomness() Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.4 15/18] media: Revert "media: uvcvideo: Set unique vdev name based in type" Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.4 16/18] staging: wlan-ng: Avoid bitwise vs logical OR warning in hfa384x_usb_throttlefn() Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.4 17/18] drm/i915: Avoid bitwise vs logical OR warning in snb_wm_latency_quirk() Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.4 18/18] staging: greybus: fix stack size warning with UBSAN Greg Kroah-Hartman
2022-01-14 18:09 ` [PATCH 5.4 00/18] 5.4.172-rc1 review Jon Hunter
2022-01-14 20:15 ` Florian Fainelli
2022-01-15  0:26 ` Shuah Khan
2022-01-15  6:05 ` Naresh Kamboju
2022-01-15 11:09 ` Sudip Mukherjee
2022-01-15 16:38 ` Guenter Roeck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220114081541.713024234@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=eric.dumazet@gmail.com \
    --cc=john.fastabend@gmail.com \
    --cc=laurent.bernaille@datadoghq.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maciej.fijalkowski@intel.com \
    --cc=pabeni@redhat.com \
    --cc=stable@vger.kernel.org \
    --cc=toshiaki.makita1@gmail.com \
    --cc=willemb@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.