From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.2 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS,USER_AGENT_SANE_2 autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id ACF99C433E7 for ; Mon, 19 Oct 2020 23:30:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4E1B82242A for ; Mon, 19 Oct 2020 23:30:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2389065AbgJSXaO (ORCPT ); Mon, 19 Oct 2020 19:30:14 -0400 Received: from kernel.crashing.org ([76.164.61.194]:42630 "EHLO kernel.crashing.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2389000AbgJSXaO (ORCPT ); Mon, 19 Oct 2020 19:30:14 -0400 Received: from localhost (gate.crashing.org [63.228.1.57]) (authenticated bits=0) by kernel.crashing.org (8.14.7/8.14.7) with ESMTP id 09JNNgCF024309 (version=TLSv1/SSLv3 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Mon, 19 Oct 2020 18:23:45 -0500 Message-ID: <1a02e57b6b7d425a19dc59f84091c38ca4edcf47.camel@kernel.crashing.org> Subject: Re: [PATCH] net: ftgmac100: Fix missing TX-poll issue From: Benjamin Herrenschmidt To: Jakub Kicinski , Joel Stanley Cc: Dylan Hung , "David S . Miller" , netdev@vger.kernel.org, Linux Kernel Mailing List , Po-Yu Chuang , linux-aspeed , OpenBMC Maillist , BMC-SW Date: Tue, 20 Oct 2020 10:23:41 +1100 In-Reply-To: <20201019120040.3152ea0b@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com> References: <20201019073908.32262-1-dylan_hung@aspeedtech.com> <20201019120040.3152ea0b@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com> Content-Type: text/plain; charset="UTF-8" X-Mailer: Evolution 3.28.5-0ubuntu0.18.04.2 Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, 2020-10-19 at 12:00 -0700, Jakub Kicinski wrote: > On Mon, 19 Oct 2020 08:57:03 +0000 Joel Stanley wrote: > > > diff --git a/drivers/net/ethernet/faraday/ftgmac100.c > > > b/drivers/net/ethernet/faraday/ftgmac100.c > > > index 00024dd41147..9a99a87f29f3 100644 > > > --- a/drivers/net/ethernet/faraday/ftgmac100.c > > > +++ b/drivers/net/ethernet/faraday/ftgmac100.c > > > @@ -804,7 +804,8 @@ static netdev_tx_t > > > ftgmac100_hard_start_xmit(struct sk_buff *skb, > > > * before setting the OWN bit on the first descriptor. > > > */ > > > dma_wmb(); > > > - first->txdes0 = cpu_to_le32(f_ctl_stat); > > > + WRITE_ONCE(first->txdes0, cpu_to_le32(f_ctl_stat)); > > > + READ_ONCE(first->txdes0); > > > > I understand what you're trying to do here, but I'm not sure that > > this > > is the correct way to go about it. > > > > It does cause the compiler to produce a store and then a load. > > +1 @first is system memory from dma_alloc_coherent(), right? > > You shouldn't have to do this. Is coherent DMA memory broken > on your platform? I suspect the problem is that the HW (and yes this would be a HW bug) doesn't order the CPU -> memory and the CPU -> MMIO path. What I think happens is that the store to txde0 is potentially still in a buffer somewhere on its way to memory, gets bypassed by the store to MMIO, causing the MAC to try to read the descriptor, and getting the "old" data from memory. It's ... fishy, but that isn't the first time an Aspeed chip has that type of bug (there's a similar one in the USB device controler iirc). Cheers, Ben. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.2 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS,USER_AGENT_SANE_2 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8BE9FC433DF for ; Mon, 19 Oct 2020 23:25:33 +0000 (UTC) Received: from lists.ozlabs.org (lists.ozlabs.org [203.11.71.2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id AEC89208B8 for ; Mon, 19 Oct 2020 23:25:32 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org AEC89208B8 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.crashing.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=openbmc-bounces+openbmc=archiver.kernel.org@lists.ozlabs.org Received: from bilbo.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) by lists.ozlabs.org (Postfix) with ESMTP id 4CFXtL0JD2zDqTp for ; Tue, 20 Oct 2020 10:25:30 +1100 (AEDT) Authentication-Results: lists.ozlabs.org; spf=permerror (SPF Permanent Error: Unknown mechanism found: ip:192.40.192.88/32) smtp.mailfrom=kernel.crashing.org (client-ip=76.164.61.194; helo=kernel.crashing.org; envelope-from=benh@kernel.crashing.org; receiver=) Authentication-Results: lists.ozlabs.org; dmarc=none (p=none dis=none) header.from=kernel.crashing.org Received: from kernel.crashing.org (kernel.crashing.org [76.164.61.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 4CFXrh72fczDqMg; Tue, 20 Oct 2020 10:24:04 +1100 (AEDT) Received: from localhost (gate.crashing.org [63.228.1.57]) (authenticated bits=0) by kernel.crashing.org (8.14.7/8.14.7) with ESMTP id 09JNNgCF024309 (version=TLSv1/SSLv3 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Mon, 19 Oct 2020 18:23:45 -0500 Message-ID: <1a02e57b6b7d425a19dc59f84091c38ca4edcf47.camel@kernel.crashing.org> Subject: Re: [PATCH] net: ftgmac100: Fix missing TX-poll issue From: Benjamin Herrenschmidt To: Jakub Kicinski , Joel Stanley Date: Tue, 20 Oct 2020 10:23:41 +1100 In-Reply-To: <20201019120040.3152ea0b@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com> References: <20201019073908.32262-1-dylan_hung@aspeedtech.com> <20201019120040.3152ea0b@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com> Content-Type: text/plain; charset="UTF-8" X-Mailer: Evolution 3.28.5-0ubuntu0.18.04.2 Mime-Version: 1.0 Content-Transfer-Encoding: 7bit X-BeenThere: openbmc@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Development list for OpenBMC List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: BMC-SW , Po-Yu Chuang , linux-aspeed , netdev@vger.kernel.org, OpenBMC Maillist , Linux Kernel Mailing List , Dylan Hung , "David S . Miller" Errors-To: openbmc-bounces+openbmc=archiver.kernel.org@lists.ozlabs.org Sender: "openbmc" On Mon, 2020-10-19 at 12:00 -0700, Jakub Kicinski wrote: > On Mon, 19 Oct 2020 08:57:03 +0000 Joel Stanley wrote: > > > diff --git a/drivers/net/ethernet/faraday/ftgmac100.c > > > b/drivers/net/ethernet/faraday/ftgmac100.c > > > index 00024dd41147..9a99a87f29f3 100644 > > > --- a/drivers/net/ethernet/faraday/ftgmac100.c > > > +++ b/drivers/net/ethernet/faraday/ftgmac100.c > > > @@ -804,7 +804,8 @@ static netdev_tx_t > > > ftgmac100_hard_start_xmit(struct sk_buff *skb, > > > * before setting the OWN bit on the first descriptor. > > > */ > > > dma_wmb(); > > > - first->txdes0 = cpu_to_le32(f_ctl_stat); > > > + WRITE_ONCE(first->txdes0, cpu_to_le32(f_ctl_stat)); > > > + READ_ONCE(first->txdes0); > > > > I understand what you're trying to do here, but I'm not sure that > > this > > is the correct way to go about it. > > > > It does cause the compiler to produce a store and then a load. > > +1 @first is system memory from dma_alloc_coherent(), right? > > You shouldn't have to do this. Is coherent DMA memory broken > on your platform? I suspect the problem is that the HW (and yes this would be a HW bug) doesn't order the CPU -> memory and the CPU -> MMIO path. What I think happens is that the store to txde0 is potentially still in a buffer somewhere on its way to memory, gets bypassed by the store to MMIO, causing the MAC to try to read the descriptor, and getting the "old" data from memory. It's ... fishy, but that isn't the first time an Aspeed chip has that type of bug (there's a similar one in the USB device controler iirc). Cheers, Ben.