From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: Arnd Bergmann <arnd@arndb.de>, Dylan Hung <dylan_hung@aspeedtech.com>
Cc: Jakub Kicinski <kuba@kernel.org>, Joel Stanley <joel@jms.id.au>,
	"David S . Miller" <davem@davemloft.net>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Po-Yu Chuang <ratbert@faraday-tech.com>,
	linux-aspeed <linux-aspeed@lists.ozlabs.org>,
	OpenBMC Maillist <openbmc@lists.ozlabs.org>,
	BMC-SW <BMC-SW@aspeedtech.com>
Subject: Re: [PATCH] net: ftgmac100: Fix missing TX-poll issue
Date: Wed, 21 Oct 2020 09:10:02 +1100
Message-ID: <32bfb619bbb3cd6f52f9e5da205673702fed228f.camel@kernel.crashing.org>
In-Reply-To: <CAK8P3a2pEfbLDWTppVHmGxXduOWPCwBw-8bMY9h3EbEecsVfTA@mail.gmail.com>

On Tue, 2020-10-20 at 21:49 +0200, Arnd Bergmann wrote:
> On Tue, Oct 20, 2020 at 11:37 AM Dylan Hung <dylan_hung@aspeedtech.com> wrote:
> > > +1 @first is system memory from dma_alloc_coherent(), right?
> > > 
> > > You shouldn't have to do this. Is coherent DMA memory broken on your
> > > platform?
> > 
> > It is about the arbitration on the DRAM controller.  There are two queues in the dram controller, one is for the CPU access and the other is for the HW engines.
> > When CPU issues a store command, the dram controller just acknowledges cpu's request and pushes the request into the queue.  Then CPU triggers the HW MAC engine, the HW engine starts to fetch the DMA memory.
> > But since the cpu's request may still stay in the queue, the HW engine may fetch the wrong data.

Actually, I take back what I said earlier; the above seems to imply
this is more generic.

Dylan, please confirm: does this affect *all* DMA-capable devices? If
yes, then it's unfortunately a really bad design bug in your chips, and
the proper fix is indeed to make dma_wmb() do a dummy read of some sort
(what address though? would any dummy non-cacheable page do?) to force
the data out, as *all* drivers will potentially be affected.

I was under the impression that it was a specific timing issue in the
vhub and ethernet parts, but if it's more generic then it needs to be
fixed globally.
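
To make the "dummy read forces the data out" idea concrete, here is a
toy userspace model of the behaviour Dylan describes. It is not kernel
code and not a proposed patch. It assumes that a CPU read through the
controller drains the CPU write queue, which is what the dummy-read
approach presupposes; every name in it (dram_model, cpu_store,
dev_fetch, model_dma_wmb) is made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define MEM_WORDS   16
#define QUEUE_DEPTH 8

/* One CPU store that the controller has acknowledged but not yet
 * written to DRAM. */
struct posted_write {
	size_t   addr;
	uint32_t val;
};

/* Hypothetical model: DRAM plus the CPU-side write queue. */
struct dram_model {
	uint32_t            mem[MEM_WORDS];
	struct posted_write q[QUEUE_DEPTH];
	size_t              q_len;
};

/* Commit all queued CPU stores to DRAM, in order. */
static void drain(struct dram_model *d)
{
	for (size_t i = 0; i < d->q_len; i++)
		d->mem[d->q[i].addr % MEM_WORDS] = d->q[i].val;
	d->q_len = 0;
}

/* CPU store: acknowledged immediately, only queued. */
static void cpu_store(struct dram_model *d, size_t addr, uint32_t val)
{
	if (d->q_len == QUEUE_DEPTH)
		drain(d);
	d->q[d->q_len].addr = addr;
	d->q[d->q_len].val = val;
	d->q_len++;
}

/* CPU load: ASSUMPTION - a read forces the queue out first. */
static uint32_t cpu_load(struct dram_model *d, size_t addr)
{
	drain(d);
	return d->mem[addr % MEM_WORDS];
}

/* HW engine fetch: uses the other queue, so it only sees DRAM. */
static uint32_t dev_fetch(const struct dram_model *d, size_t addr)
{
	return d->mem[addr % MEM_WORDS];
}

/* Stand-in for a dma_wmb() that does a dummy read-back. */
static void model_dma_wmb(struct dram_model *d)
{
	(void)cpu_load(d, 0);
}
```

In this model a dev_fetch() issued right after cpu_store() still returns
the old word, and returns the new one only once model_dma_wmb() has run.
Whether the real controller behaves this way, and for which addresses,
is exactly the open question above.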

> There is still something missing in the explanation: The iowrite32()
> only tells the device that it should check the queue, but not where the
> data is. I would expect the device to either see the correct data that
> was marked valid by the
> 'dma_wmb();first->txdes0 = cpu_to_le32(f_ctl_stat);' operation, or it
> would see the old f_ctl_stat value telling it that the data is not yet
> valid and not look at the rest of the descriptor. In the second case you
> would see the data not getting sent out until the next start_xmit(), but
> the device should not fetch wrong data.
> 
> There are two possible scenarios in which your patch would still help:
> 
> a) the dma_wmb() does not serialize the stores as seen by DMA the
>     way it is supposed to, so the device can observe the new value of txdes0
>     before it observes the correct data.
> 
> b) The txdes0 field sometimes contains stale data that marks the
>     descriptor as valid before the correct data is written. This field
>     should have been set in ftgmac100_tx_complete_packet() earlier
> 
> If either of the two is the case, then the READ_ONCE() would just
> introduce a long delay before the iowrite32() that makes it more likely
> that the data is there, but the inconsistent state would still be observable
> by the device if it is still working on previous frames.

I think it just gets stuck until we try another packet, i.e., it doesn't
see the new descriptor valid bit. But Dylan can elaborate.
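
As a rough illustration of that "stuck until the next packet" reading,
here is a small userspace sketch, not driver code. It assumes the MAC
only walks the ring when poked and stops at the first descriptor whose
valid bit it cannot see; OWN_BIT, mac_model and tx_poll are hypothetical
names, not the driver's.

```c
#include <stdint.h>

#define RING    4
#define OWN_BIT 0x80000000u /* hypothetical "valid, owned by HW" flag */

struct txdes {
	uint32_t txdes0;
};

/* What the device can currently see in DRAM, plus its ring position. */
struct mac_model {
	struct txdes visible[RING];
	unsigned int next; /* next descriptor the device will examine */
	unsigned int sent; /* frames transmitted so far */
};

/* Doorbell: the device sends every descriptor it sees as valid and
 * stops at the first one that is not. It does not retry by itself. */
static void tx_poll(struct mac_model *m)
{
	while (m->visible[m->next].txdes0 & OWN_BIT) {
		m->visible[m->next].txdes0 &= ~OWN_BIT;
		m->next = (m->next + 1) % RING;
		m->sent++;
	}
}
```

If the doorbell arrives before the valid bit of descriptor 0 is visible,
tx_poll() sends nothing. When the store lands later nothing happens
either, because nobody pokes the device. The frame only goes out with
the tx_poll() for the following packet, which then sends both.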

Cheers,
Ben.



