From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Eric Dumazet <edumazet@google.com>,
	John Fastabend <john.fastabend@gmail.com>,
	"David S. Miller" <davem@davemloft.net>
Subject: [PATCH 4.19 09/50] net: sched: fix reordering issues
Date: Wed, 18 Sep 2019 08:18:52 +0200
Message-ID: <20190918061223.855532778@linuxfoundation.org>
In-Reply-To: <20190918061223.116178343@linuxfoundation.org>

From: Eric Dumazet <edumazet@google.com>

[ Upstream commit b88dd52c62bb5c5d58f0963287f41fd084352c57 ]

Whenever MQ is not used on a multiqueue device, we experience
serious reordering problems. Bisection pointed to the commit
cited in the Fixes: tag below.

The issue can be described this way:

- A single qdisc hierarchy is shared by all transmit queues
  (e.g. tc qdisc replace dev eth0 root fq_codel).

- When try_bulk_dequeue_skb_slow() dequeues a packet targeting
  a different transmit queue than the one used to build a packet train,
  we stop building the current list and save the 'bad' skb (P1) in a
  special queue (bad_txq).

- When dequeue_skb() calls qdisc_dequeue_skb_bad_txq() and finds this
  skb (P1), it checks whether the associated transmit queue is still
  frozen. If the queue is still blocked (by BQL, or because the NIC TX
  ring is full), we leave the skb in bad_txq and return NULL.

- dequeue_skb() then calls q->dequeue() to get another packet (P2).

  P2 can target the problematic queue (the one we just found frozen
  for the bad_txq packet), but in the meantime another CPU ran TX
  completion and made room in that txq, which is now ready to accept
  new packets.

- Packet P2 is sent while P1 is still held in bad_txq; P1 might only be
  sent in the next round. In practice P2 is the head of a big packet train
  (P2, P3, P4, ...) that fills the BQL budget and delays P1 by many packets :/
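The sequence above can be reproduced in a deliberately simplified user-space model. This is a hypothetical sketch, not kernel code. Packets are plain sequence numbers, a small FIFO stands in for the shared qdisc, and every packet targets the same transmit queue. The names (`struct model`, `fifo_pop()`, `dequeue_old()`, `other_cpu_completes_tx`) are illustrative assumptions. The model shows how falling through to the qdisc after skipping a frozen bad_txq packet lets P2 overtake P1.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model, NOT kernel code. Packets are their
 * sequence numbers (0 means "no packet") and all of them target the
 * same transmit queue. */
struct model {
	int fifo[8];      /* stands in for the shared root qdisc */
	size_t head, tail;
	int bad_txq;      /* packet parked by the bulk dequeue (P1) */
	bool txq_frozen;  /* BQL limit hit or NIC TX ring full */
};

static int fifo_pop(struct model *m)
{
	return m->head == m->tail ? 0 : m->fifo[m->head++];
}

/* Pre-fix behaviour: a parked packet whose queue is frozen is skipped,
 * and dequeueing falls through to the shared qdisc. */
static int dequeue_old(struct model *m, bool other_cpu_completes_tx)
{
	if (m->bad_txq && !m->txq_frozen) {
		int pkt = m->bad_txq;

		m->bad_txq = 0;
		return pkt;
	}
	/* Race window: TX completion on another CPU unfreezes the queue
	 * after the bad_txq check but before the next packet is sent. */
	if (other_cpu_completes_tx)
		m->txq_frozen = false;
	return fifo_pop(m);
}
```

Start with P1 parked, the queue frozen, and P2, P3, P4 in the FIFO. A first call with `other_cpu_completes_tx` set returns P2, and only the next call returns P1. In this one-step model P1 loses a single place. As described above, in practice P2 heads a whole packet train, so P1 falls much further behind.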

To solve this problem, we have to block the dequeue process as long
as the first packet in bad_txq cannot be sent. With this change the
reordering issues disappear, and no side effects have been seen.

Fixes: a53851e2c321 ("net: sched: explicit locking in gso_cpu fallback")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/sched/sch_generic.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -49,6 +49,8 @@ EXPORT_SYMBOL(default_qdisc_ops);
  * - updates to tree and tree walking are only done under the rtnl mutex.
  */
 
+#define SKB_XOFF_MAGIC ((struct sk_buff *)1UL)
+
 static inline struct sk_buff *__skb_dequeue_bad_txq(struct Qdisc *q)
 {
 	const struct netdev_queue *txq = q->dev_queue;
@@ -74,7 +76,7 @@ static inline struct sk_buff *__skb_dequ
 				q->q.qlen--;
 			}
 		} else {
-			skb = NULL;
+			skb = SKB_XOFF_MAGIC;
 		}
 	}
 
@@ -272,8 +274,11 @@ validate:
 		return skb;
 
 	skb = qdisc_dequeue_skb_bad_txq(q);
-	if (unlikely(skb))
+	if (unlikely(skb)) {
+		if (skb == SKB_XOFF_MAGIC)
+			return NULL;
 		goto bulk;
+	}
 	skb = q->dequeue(q);
 	if (skb) {
 bulk:
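The patch needs `__skb_dequeue_bad_txq()` to report three outcomes through a single pointer:

- a packet to send;
- NULL, when nothing is parked;
- SKB_XOFF_MAGIC, when a packet is parked but its queue is still frozen.

As the diff shows, `dequeue_skb()` turns the magic value back into NULL before returning.

The sketch below shows that sentinel-pointer idiom in user space. `struct pkt`, `PKT_XOFF_MAGIC`, `struct fifo`, `take_parked()` and `next_pkt()` are hypothetical stand-ins, not kernel APIs. The value 1 works as a sentinel because, on common ABIs, it can never be the address of a properly aligned structure. It is only ever compared against, never dereferenced.

```c
#include <stdbool.h>
#include <stddef.h>

struct pkt {
	int seq;
};

/* Hypothetical counterpart of SKB_XOFF_MAGIC: a non-NULL pointer that can
 * never address a properly aligned struct pkt on common ABIs. It is only
 * ever compared against, never dereferenced. */
#define PKT_XOFF_MAGIC ((struct pkt *)1UL)

/* Tiny FIFO standing in for q->dequeue(). */
struct fifo {
	struct pkt *slot[8];
	size_t head, tail;
};

static struct pkt *fifo_dequeue(struct fifo *q)
{
	return q->head == q->tail ? NULL : q->slot[q->head++];
}

/* Three outcomes: the parked packet (now removed), NULL if nothing is
 * parked, or PKT_XOFF_MAGIC if a packet is parked but its queue is frozen. */
static struct pkt *take_parked(struct pkt **parked, bool frozen)
{
	struct pkt *p = *parked;

	if (!p)
		return NULL;
	if (frozen)
		return PKT_XOFF_MAGIC;	/* leave it parked */
	*parked = NULL;
	return p;
}

/* Mirrors the shape of the patched dequeue_skb(). */
static struct pkt *next_pkt(struct pkt **parked, bool frozen, struct fifo *q)
{
	struct pkt *p = take_parked(parked, frozen);

	if (p == PKT_XOFF_MAGIC)
		return NULL;		/* blocked: do not touch the FIFO */
	if (p)
		return p;
	return fifo_dequeue(q);
}
```

A separate status out-parameter or enum would express the same thing. The sentinel keeps the existing pointer-returning signatures unchanged, which keeps a stable backport like this one small.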