netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Joel Stanley <joel@jms.id.au>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dylan Hung <dylan_hung@aspeedtech.com>,
	"David S . Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	netdev@vger.kernel.org,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Po-Yu Chuang <ratbert@faraday-tech.com>,
	linux-aspeed <linux-aspeed@lists.ozlabs.org>,
	OpenBMC Maillist <openbmc@lists.ozlabs.org>,
	BMC-SW <BMC-SW@aspeedtech.com>
Subject: Re: [PATCH 1/4] ftgmac100: Fix race issue on TX descriptor[0]
Date: Tue, 20 Oct 2020 07:13:05 +0000	[thread overview]
Message-ID: <CACPK8Xdyw7eSw6RqJcku_PHANviovmkbjNSONAB4_K+_+Tt1KQ@mail.gmail.com> (raw)
In-Reply-To: <d5c4682b3e049f7dac66b17e7a726b8c20ee5789.camel@kernel.crashing.org>

On Tue, 20 Oct 2020 at 06:23, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
>
> On Tue, 2020-10-20 at 04:13 +0000, Joel Stanley wrote:
> > On Mon, 19 Oct 2020 at 23:20, Benjamin Herrenschmidt
> > <benh@kernel.crashing.org> wrote:
> > >
> > > On Mon, 2020-10-19 at 16:57 +0800, Dylan Hung wrote:
> > > > These rules must be followed when accessing the TX descriptor:
> > > >
> > > > 1. A TX descriptor is "cleanable" only when its value is non-zero
> > > > and the owner bit is set to "software"
> > >
> > > Can you elaborate ? What is the point of that change ? The owner
> > > bit
> > > should be sufficient, why do we need to check other fields ?
> >
> > I would like Dylan to clarify too. The datasheet has a footnote below
> > the descriptor layout:
> >
> >  - TXDES#0: Bits 27 ~ 14 are valid only when FTS = 1
> >  - TXDES#1: Bits 31 ~ 0 are valid only when FTS = 1
> >
> > So the ownership bit (31) is not valid unless FTS is set. However,
> > this isn't what his patch does. It adds checks for EDOTR.
>
> No I think it adds a check for everything except EDOTR which just marks
> the end of ring and needs to be ignored in the comparison.

Of course. I missed the invert.

I did some testing with just this patch (and "[4/4] ftgmac100: Restart
MAC HW once") from Dylan. It seemed to resolve the hang, but there
were occasional retries. Putting in some tracing I only hit the
condition in ftgmac100_tx_complete_packet, never in
ftgmac100_hard_start_xmit.

> That said, we do need a better explanation.
>
> One potential bug I did find by looking at my code however is:
>
> static bool ftgmac100_tx_complete_packet(struct ftgmac100 *priv)
> {
>         struct net_device *netdev = priv->netdev;
>         struct ftgmac100_txdes *txdes;
>         struct sk_buff *skb;
>         unsigned int pointer;
>         u32 ctl_stat;
>
>         pointer = priv->tx_clean_pointer;
>         txdes = &priv->txdes[pointer];
>
>         ctl_stat = le32_to_cpu(txdes->txdes0);
>         if (ctl_stat & FTGMAC100_TXDES0_TXDMA_OWN)
>                 return false;
>
>         skb = priv->tx_skbs[pointer];
>         netdev->stats.tx_packets++;
>         netdev->stats.tx_bytes += skb->len;
>         ftgmac100_free_tx_packet(priv, pointer, skb, txdes, ctl_stat);
>         txdes->txdes0 = cpu_to_le32(ctl_stat & priv->txdes0_edotr_mask);
>
>   ^^^^ There should probably be an smp_wmb() here to ensure that all the above
> stores are visible before the tx clean pointer is updated.
>
>         priv->tx_clean_pointer = ftgmac100_next_tx_pointer(priv, pointer);
>
>         return true;
> }
>
> Similarly we probablu should have one before setting tx_pointer in start_xmit().

I added the two smp_wmb you suggested (with only 4/4 applied). This
did the trick; iperf on a gigabit link is running well with no
retries.

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c
b/drivers/net/ethernet/faraday/ftgmac100.c
index 331d4bdd4a67..15cdfeb135b0 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -653,6 +653,11 @@ static bool ftgmac100_tx_complete_packet(struct
ftgmac100 *priv)
     ftgmac100_free_tx_packet(priv, pointer, skb, txdes, ctl_stat);
     txdes->txdes0 = cpu_to_le32(ctl_stat & priv->txdes0_edotr_mask);

+    /* Ensure the descriptor config is visible before setting the tx
+     * pointer.
+     */
+    smp_wmb();
+
     priv->tx_clean_pointer = ftgmac100_next_tx_pointer(priv, pointer);

     return true;
@@ -806,6 +811,11 @@ static netdev_tx_t
ftgmac100_hard_start_xmit(struct sk_buff *skb,
     dma_wmb();
     first->txdes0 = cpu_to_le32(f_ctl_stat);

+    /* Ensure the descriptor config is visible before setting the tx
+     * pointer.
+     */
+    smp_wmb();
+
     /* Update next TX pointer */
     priv->tx_pointer = pointer;

I left the test running while writing this email and I did start to
see some retries. I'm not sure if that's because my laptop is one of
the test machines, or if we have another issue.

I will do some further testing over night.

Cheers,

Joel

>
> As for the read side of this, I'm not 100% sure, I'll have to think more about
> it, it *think* the existing barriers are sufficient at first sight.
>
> Cheers,
> Ben.
>
> > >
> > > > 2. A TX descriptor is "writable" only when its value is zero
> > > > regardless the edotr mask.
> > >
> > > Again, why is that ? Can you elaborate ? What race are you trying
> > > to
> > > address here ?
> > >
> > > Cheers,
> > > Ben.
> > >
> > > > Fixes: 52c0cae87465 ("ftgmac100: Remove tx descriptor accessors")
> > > > Signed-off-by: Dylan Hung <dylan_hung@aspeedtech.com>
> > > > Signed-off-by: Joel Stanley <joel@jms.id.au>
> > > > ---
> > > >  drivers/net/ethernet/faraday/ftgmac100.c | 10 ++++++++++
> > > >  1 file changed, 10 insertions(+)
> > > >
> > > > diff --git a/drivers/net/ethernet/faraday/ftgmac100.c
> > > > b/drivers/net/ethernet/faraday/ftgmac100.c
> > > > index 00024dd41147..7cacbe4aecb7 100644
> > > > --- a/drivers/net/ethernet/faraday/ftgmac100.c
> > > > +++ b/drivers/net/ethernet/faraday/ftgmac100.c
> > > > @@ -647,6 +647,9 @@ static bool
> > > > ftgmac100_tx_complete_packet(struct
> > > > ftgmac100 *priv)
> > > >       if (ctl_stat & FTGMAC100_TXDES0_TXDMA_OWN)
> > > >               return false;
> > > >
> > > > +     if ((ctl_stat & ~(priv->txdes0_edotr_mask)) == 0)
> > > > +             return false;
> > > > +
> > > >       skb = priv->tx_skbs[pointer];
> > > >       netdev->stats.tx_packets++;
> > > >       netdev->stats.tx_bytes += skb->len;
> > > > @@ -756,6 +759,9 @@ static netdev_tx_t
> > > > ftgmac100_hard_start_xmit(struct sk_buff *skb,
> > > >       pointer = priv->tx_pointer;
> > > >       txdes = first = &priv->txdes[pointer];
> > > >
> > > > +     if (le32_to_cpu(txdes->txdes0) & ~priv->txdes0_edotr_mask)
> > > > +             goto drop;
> > > > +
> > > >       /* Setup it up with the packet head. Don't write the head
> > > > to
> > > > the
> > > >        * ring just yet
> > > >        */
> > > > @@ -787,6 +793,10 @@ static netdev_tx_t
> > > > ftgmac100_hard_start_xmit(struct sk_buff *skb,
> > > >               /* Setup descriptor */
> > > >               priv->tx_skbs[pointer] = skb;
> > > >               txdes = &priv->txdes[pointer];
> > > > +
> > > > +             if (le32_to_cpu(txdes->txdes0) & ~priv-
> > > > > txdes0_edotr_mask)
> > > >
> > > > +                     goto dma_err;
> > > > +
> > > >               ctl_stat = ftgmac100_base_tx_ctlstat(priv,
> > > > pointer);
> > > >               ctl_stat |= FTGMAC100_TXDES0_TXDMA_OWN;
> > > >               ctl_stat |= FTGMAC100_TXDES0_TXBUF_SIZE(len);
>

  reply	other threads:[~2020-10-20  7:13 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-19  8:57 [PATCH 0/4] fix ftgmac100 issues on aspeed soc Dylan Hung
2020-10-19  8:57 ` [PATCH 1/4] ftgmac100: Fix race issue on TX descriptor[0] Dylan Hung
2020-10-19 23:19   ` Benjamin Herrenschmidt
2020-10-20  4:13     ` Joel Stanley
2020-10-20  6:23       ` Benjamin Herrenschmidt
2020-10-20  7:13         ` Joel Stanley [this message]
2020-10-19  8:57 ` [PATCH 2/4] ftgmac100: Fix missing-poll issue Dylan Hung
2020-10-19  8:57 ` [PATCH 3/4] ftgmac100: Add a dummy read to ensure running sequence Dylan Hung
2020-10-19 23:25   ` Benjamin Herrenschmidt
2020-10-19  8:57 ` [PATCH 4/4] ftgmac100: Restart MAC HW once Dylan Hung
2020-10-19 23:26   ` Benjamin Herrenschmidt
2020-10-20  4:14   ` Joel Stanley
2021-03-12  0:26     ` Joel Stanley
2021-03-12  0:28       ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CACPK8Xdyw7eSw6RqJcku_PHANviovmkbjNSONAB4_K+_+Tt1KQ@mail.gmail.com \
    --to=joel@jms.id.au \
    --cc=BMC-SW@aspeedtech.com \
    --cc=benh@kernel.crashing.org \
    --cc=davem@davemloft.net \
    --cc=dylan_hung@aspeedtech.com \
    --cc=kuba@kernel.org \
    --cc=linux-aspeed@lists.ozlabs.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=openbmc@lists.ozlabs.org \
    --cc=ratbert@faraday-tech.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).