From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 75D1BC7619A for ; Wed, 5 Apr 2023 22:20:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229608AbjDEWUn (ORCPT ); Wed, 5 Apr 2023 18:20:43 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54140 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229815AbjDEWUm (ORCPT ); Wed, 5 Apr 2023 18:20:42 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5BFDD198 for ; Wed, 5 Apr 2023 15:20:41 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id E114260BFE for ; Wed, 5 Apr 2023 22:20:40 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 4D2A8C433D2; Wed, 5 Apr 2023 22:20:40 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1680733240; bh=8WIKf/lhOSq4UZQnUDxG0ueB9hFQUlHsa6oJ0QOxsbY=; h=Date:From:To:Cc:Subject:Reply-To:References:In-Reply-To:From; b=BPBP0y7MeU1+bqslSYDftV6gyy9c0YZIx794GGLhWninG2VmkU6tFljGtU/dlIUlZ dHkchLtv8Ln0FRRIPqtvyB+8aXT8aAIypVAQs3/0sd7nEjMNS3WzclWgA4QwMqSJG7 6hSx9MZByBi1Lprp2eChqjg3R+0wCiK6cZ8dIEGcbiLB7A2C2tzEP394gd8sZUzmhS +0g0A0RCKtlZp9gXWG/RDRr720pPMKx1VYW3ZCXKyx+VHiGGQs/vszaECAnEGILvqB 197glgtBNG5phI/J2+fkUjWKTGqt6Fr9G75pTJJDT+s9ab+mdb4XgNlf/9PqyItigI gDOVw7x63ZH9g== Received: by paulmck-ThinkPad-P72.home (Postfix, from userid 1000) id C872515404B4; Wed, 5 Apr 2023 15:20:39 -0700 (PDT) Date: Wed, 5 Apr 2023 15:20:39 -0700 From: "Paul E. McKenney" To: Alexander Duyck Cc: Jakub Kicinski , Heiner Kallweit , davem@davemloft.net, netdev@vger.kernel.org, edumazet@google.com, pabeni@redhat.com, Herbert Xu Subject: Re: [PATCH net-next 1/3] net: provide macros for commonly copied lockless queue stop/wake code Message-ID: <1e9bbdde-df97-4319-a4b7-e426c4351317@paulmck-laptop> Reply-To: paulmck@kernel.org References: <20230401051221.3160913-1-kuba@kernel.org> <20230401051221.3160913-2-kuba@kernel.org> <20230401115854.371a5b4c@kernel.org> <20230403085601.44f04cd2@kernel.org> <20230403120345.0c02232c@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org On Mon, Apr 03, 2023 at 01:27:44PM -0700, Alexander Duyck wrote: > On Mon, Apr 3, 2023 at 12:03 PM Jakub Kicinski wrote: > > > > On Mon, 3 Apr 2023 11:11:35 -0700 Alexander Duyck wrote: > > > On Mon, Apr 3, 2023 at 8:56 AM Jakub Kicinski wrote: > > > > I don't think in terms of flushes. Let me add line numbers to the > > > > producer and the consumer. > > > > > > > > c1. WRITE cons > > > > c2. mb() # A > > > > c3. READ stopped > > > > c4. rmb() # C > > > > c5. READ prod, cons > > > > > > > > p1. WRITE prod > > > > p2. READ prod, cons > > > > p3. mb() # B > > > > p4. WRITE stopped > > > > p5. READ prod, cons > > > > > > > > The way I think the mb() orders c1 and c3 vs p2 and p4. The rmb() > > > > orders c3 and c5 vs p1 and p4. Let me impenitently add Paul.. > > > > > > So which function is supposed to be consumer vs producer here? > > > > producer is xmit consumer is NAPI > > > > > I think your write stopped is on the wrong side of the memory barrier. > > > It should be writing prod and stopped both before the barrier. > > > > Indeed, Paul pointed out over chat that we need two barriers there > > to be correct :( Should be fine in practice, first one is BQL, > > second one is on the slow path. > > > > > The maybe/try stop should essentially be: > > > 1. write tail > > > 2. read prod/cons > > > 3. if unused >= 1x packet > > > 3.a return > > > > > > 4. set stop > > > 5. mb() > > > 6. Re-read prod/cons > > > 7. if unused >= 1x packet > > > 7.a. test_and_clear stop > > > > > > The maybe/try wake would be: > > > 1. write head > > > 2. read prod/cons > > > 3. if consumed == 0 || unused < 2x packet > > > 3.a. return > > > > > > 4. mb() > > > 5. test_and_clear stop > > > > > > > > One other thing to keep in mind is that the wake gives itself a pretty > > > > > good runway. We are talking about enough to transmit at least 2 > > > > > frames. So if another consumer is stopping it we aren't waking it > > > > > unless there is enough space for yet another frame after the current > > > > > consumer. > > > > > > > > Ack, the race is very unlikely, basically the completing CPU would have > > > > to take an expensive IRQ between checking the descriptor count and > > > > checking if stopped -- to let the sending CPU queue multiple frames. > > > > > > > > But in theory the race is there, right? > > > > > > I don't think this is so much a race as a skid. Specifically when we > > > wake the queue it will only run for one more packet in such a > > > scenario. I think it is being run more like a flow control threshold > > > rather than some sort of lock. > > > > > > I think I see what you are getting at though. Basically if the xmit > > > function were to cycle several times between steps 3.a and 4 in the > > > maybe/try wake it could fill the queue and then trigger the wake even > > > though the queue is full and the unused space was already consumed. > > > > Yup, exactly. So we either need to sprinkle a couple more barriers > > and tests in, or document that the code is only 99.999999% safe > > against false positive restarts and drivers need to check for ring > > full at the beginning of xmit. > > > > I'm quite tempted to add the barriers, because on the NAPI/consumer > > side we could use this as an opportunity to start piggy backing on > > the BQL barrier. > > The thing is the more barriers we add the more it will hurt > performance. I'd be tempted to just increase the runway we have as we > could afford a 1 packet skid if we had a 2 packet runway for the > start/stop thresholds. > > I suspect that is probably why we haven't seen any issues as the > DESC_NEEDED is pretty generous since it is assuming worst case > scenarios. Mightn't preemption or interrupts cause further issues? Or are preemption and/or interrupts disabled across the relevant sections of code? Thanx, Paul