From mboxrd@z Thu Jan 1 00:00:00 1970
From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Eric Dumazet, John Fastabend, "David S. Miller"
Subject: [PATCH 5.2 10/85] net: sched: fix reordering issues
Date: Wed, 18 Sep 2019 08:18:28 +0200
Message-Id: <20190918061234.460383725@linuxfoundation.org>
X-Mailer: git-send-email 2.23.0
In-Reply-To: <20190918061234.107708857@linuxfoundation.org>
References: <20190918061234.107708857@linuxfoundation.org>
User-Agent: quilt/0.66
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

From: Eric Dumazet

[ Upstream commit b88dd52c62bb5c5d58f0963287f41fd084352c57 ]

Whenever MQ is not used on a multiqueue device, we experience
serious reordering problems. Bisection found the cited
commit.

The issue can be described this way:

- A single qdisc hierarchy is shared by all transmit queues.
  (e.g.: tc qdisc replace dev eth0 root fq_codel)

- When/if try_bulk_dequeue_skb_slow() dequeues a packet targeting
  a different transmit queue than the one used to build a packet train,
  we stop building the current list and save the 'bad' skb (P1) in a
  special queue. (bad_txq)

- When dequeue_skb() calls qdisc_dequeue_skb_bad_txq() and finds this
  skb (P1), it checks if the associated transmit queue is still in frozen
  state. If the queue is still blocked (by BQL or NIC tx ring full),
  we leave the skb in bad_txq and return NULL.

- dequeue_skb() calls q->dequeue() to get another packet (P2)

  The other packet can target the problematic queue (that we found
  in frozen state for the bad_txq packet), but another cpu just ran
  TX completion and made room in the txq that is now ready to accept
  new packets.

- Packet P2 is sent while P1 is still held in bad_txq, P1 might be sent
  at next round. In practice P2 is the lead of a big packet train
  (P2,P3,P4 ...) filling the BQL budget and delaying P1 by many packets :/

To solve this problem, we have to block the dequeue process as long
as the first packet in bad_txq can not be sent. Reordering issues
disappear and no side effects have been seen.

Fixes: a53851e2c321 ("net: sched: explicit locking in gso_cpu fallback")
Signed-off-by: Eric Dumazet
Cc: John Fastabend
Acked-by: John Fastabend
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
---
 net/sched/sch_generic.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -46,6 +46,8 @@ EXPORT_SYMBOL(default_qdisc_ops);
  * - updates to tree and tree walking are only done under the rtnl mutex.
  */
 
+#define SKB_XOFF_MAGIC ((struct sk_buff *)1UL)
+
 static inline struct sk_buff *__skb_dequeue_bad_txq(struct Qdisc *q)
 {
 	const struct netdev_queue *txq = q->dev_queue;
@@ -71,7 +73,7 @@ static inline struct sk_buff *__skb_dequ
 				q->q.qlen--;
 			}
 		} else {
-			skb = NULL;
+			skb = SKB_XOFF_MAGIC;
 		}
 	}
 
@@ -253,8 +255,11 @@ validate:
 		return skb;
 
 	skb = qdisc_dequeue_skb_bad_txq(q);
-	if (unlikely(skb))
+	if (unlikely(skb)) {
+		if (skb == SKB_XOFF_MAGIC)
+			return NULL;
 		goto bulk;
+	}
 	skb = q->dequeue(q);
 	if (skb) {
 bulk:
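
For readers who want to see the sentinel idea outside the kernel, here is a
minimal userspace sketch of the behaviour described in the changelog. It is
not the kernel code: struct pkt, struct toy_qdisc, PKT_XOFF_MAGIC,
dequeue_bad_txq() and toy_dequeue() are all hypothetical names made up for
illustration, bad_txq is reduced to a single slot, and locking, stats and
bulk dequeue are left out. It only shows how a distinct non-NULL marker lets
the caller tell "nothing parked" apart from "a packet is parked but its txq
is still stopped", so that the main queue is not consulted in the second case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_TXQ 2

/* Hypothetical stand-in for struct sk_buff; not the kernel definition. */
struct pkt {
	int id;
	int txq;	/* index of the transmit queue this packet targets */
};

/* Marker meaning "a parked packet exists but cannot be sent yet".
 * It is only ever compared against, never dereferenced.
 */
#define PKT_XOFF_MAGIC ((struct pkt *)1UL)

/* Hypothetical, heavily simplified qdisc shared by all transmit queues. */
struct toy_qdisc {
	struct pkt *bad_txq;		/* single parked packet (P1), or NULL */
	struct pkt **backlog;		/* FIFO of packets not yet dequeued */
	size_t head, len;		/* next index to dequeue, FIFO length */
	bool txq_stopped[NUM_TXQ];	/* frozen/stopped state per txq */
};

/* Returns NULL if nothing is parked, PKT_XOFF_MAGIC if the parked packet's
 * txq is still stopped, otherwise the parked packet itself.
 */
static struct pkt *dequeue_bad_txq(struct toy_qdisc *q)
{
	struct pkt *p = q->bad_txq;

	if (!p)
		return NULL;
	if (q->txq_stopped[p->txq])
		return PKT_XOFF_MAGIC;
	q->bad_txq = NULL;
	return p;
}

/* The parked packet gates the whole dequeue path: while it cannot be sent,
 * no later packet (P2, P3, ...) is pulled from the FIFO.
 */
static struct pkt *toy_dequeue(struct toy_qdisc *q)
{
	struct pkt *p = dequeue_bad_txq(q);

	if (p) {
		if (p == PKT_XOFF_MAGIC)
			return NULL;
		return p;
	}
	if (q->head == q->len)
		return NULL;
	return q->backlog[q->head++];
}
```

With P1 parked for a stopped txq and P2 waiting in the FIFO, toy_dequeue()
returns NULL and leaves P2 in place; once the txq is woken, P1 comes out
first and P2 follows, which is the ordering the patch is meant to preserve.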