All of lore.kernel.org
 help / color / mirror / Atom feed
From: Heiner Kallweit <hkallweit1@gmail.com>
To: Chris Packham <Chris.Packham@alliedtelesis.co.nz>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Nicholas Piggin <npiggin@gmail.com>,
	"linux-spi@vger.kernel.org" <linux-spi@vger.kernel.org>,
	"broonie@kernel.org" <broonie@kernel.org>,
	"paulus@samba.org" <paulus@samba.org>,
	"linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>
Subject: Re: fsl_espi errors on v5.7.15
Date: Fri, 4 Sep 2020 19:23:05 +0200	[thread overview]
Message-ID: <CAFSsGVvRMQoEoBN1hpao_4uM1yF2wwuKPbMkAcwFWyE1X8HDbQ@mail.gmail.com> (raw)
In-Reply-To: <6054f0ec-d994-105b-6399-6cdb65ddd1b6@alliedtelesis.co.nz>

[-- Attachment #1: Type: text/plain, Size: 10949 bytes --]

On Fri 4. Sep 2020 at 01:58, Chris Packham <
Chris.Packham@alliedtelesis.co.nz> wrote:

>
>
> On 1/09/20 6:14 pm, Nicholas Piggin wrote:
>
> > Excerpts from Chris Packham's message of September 1, 2020 11:25 am:
>
> >> On 1/09/20 12:33 am, Heiner Kallweit wrote:
>
> >>> On 30.08.2020 23:59, Chris Packham wrote:
>
> >>>> On 31/08/20 9:41 am, Heiner Kallweit wrote:
>
> >>>>> On 30.08.2020 23:00, Chris Packham wrote:
>
> >>>>>> On 31/08/20 12:30 am, Nicholas Piggin wrote:
>
> >>>>>>> Excerpts from Chris Packham's message of August 28, 2020 8:07 am:
>
> >>>>>> <snip>
>
> >>>>>>
>
> >>>>>>>>>>>> I've also now seen the RX FIFO not empty error on the T2080RDB
>
> >>>>>>>>>>>>
>
> >>>>>>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>
> >>>>>>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>
> >>>>>>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>
> >>>>>>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>
> >>>>>>>>>>>> fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't
> empty!
>
> >>>>>>>>>>>> fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>
> >>>>>>>>>>>>
>
> >>>>>>>>>>>> With my current workaround of emptying the RX FIFO. It seems
>
> >>>>>>>>>>>> survivable. Interestingly it only ever seems to be 1 extra
> byte in the
>
> >>>>>>>>>>>> RX FIFO and it seems to be after either a READ_SR or a
> READ_FSR.
>
> >>>>>>>>>>>>
>
> >>>>>>>>>>>> fsl_espi ffe110000.spi: tx 70
>
> >>>>>>>>>>>> fsl_espi ffe110000.spi: rx 03
>
> >>>>>>>>>>>> fsl_espi ffe110000.spi: Extra RX 00
>
> >>>>>>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>
> >>>>>>>>>>>> fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't
> empty!
>
> >>>>>>>>>>>> fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>
> >>>>>>>>>>>> fsl_espi ffe110000.spi: tx 05
>
> >>>>>>>>>>>> fsl_espi ffe110000.spi: rx 00
>
> >>>>>>>>>>>> fsl_espi ffe110000.spi: Extra RX 03
>
> >>>>>>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>
> >>>>>>>>>>>> fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't
> empty!
>
> >>>>>>>>>>>> fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>
> >>>>>>>>>>>> fsl_espi ffe110000.spi: tx 05
>
> >>>>>>>>>>>> fsl_espi ffe110000.spi: rx 00
>
> >>>>>>>>>>>> fsl_espi ffe110000.spi: Extra RX 03
>
> >>>>>>>>>>>>
>
> >>>>>>>>>>>>      From all the Micron SPI-NOR datasheets I've got access
> to it is
>
> >>>>>>>>>>>> possible to continually read the SR/FSR. But I've no idea why
> it
>
> >>>>>>>>>>>> happens some times and not others.
>
> >>>>>>>>>>> So I think I've got a reproduction and I think I've bisected
> the problem
>
> >>>>>>>>>>> to commit 3282a3da25bd ("powerpc/64: Implement soft interrupt
> replay in
>
> >>>>>>>>>>> C"). My day is just finishing now so I haven't applied too
> much scrutiny
>
> >>>>>>>>>>> to this result. Given the various rabbit holes I've been down
> on this
>
> >>>>>>>>>>> issue already I'd take this information with a good degree of
> skepticism.
>
> >>>>>>>>>>>
>
> >>>>>>>>>> OK, so an easy test should be to re-test with a 5.4 kernel.
>
> >>>>>>>>>> It doesn't have yet the change you're referring to, and the
> fsl-espi driver
>
> >>>>>>>>>> is basically the same as in 5.7 (just two small changes in 5.7).
>
> >>>>>>>>> There's 6cc0c16d82f88 and maybe also other interrupt related
> patches
>
> >>>>>>>>> around this time that could affect book E, so it's good if that
> exact
>
> >>>>>>>>> patch is confirmed.
>
> >>>>>>>> My confirmation is basically that I can induce the issue in a 5.4
> kernel
>
> >>>>>>>> by cherry-picking 3282a3da25bd. I'm also able to "fix" the issue
> in
>
> >>>>>>>> 5.9-rc2 by reverting that one commit.
>
> >>>>>>>>
>
> >>>>>>>> I both cases it's not exactly a clean cherry-pick/revert so I also
>
> >>>>>>>> confirmed the bisection result by building at 3282a3da25bd (which
> sees
>
> >>>>>>>> the issue) and the commit just before (which does not).
>
> >>>>>>> Thanks for testing, that confirms it well.
>
> >>>>>>>
>
> >>>>>>> [snip patch]
>
> >>>>>>>
>
> >>>>>>>> I still saw the issue with this change applied.
> PPC_IRQ_SOFT_MASK_DEBUG
>
> >>>>>>>> didn't report anything (either with or without the change above).
>
> >>>>>>> Okay, it was a bit of a shot in the dark. I still can't see what
>
> >>>>>>> else has changed.
>
> >>>>>>>
>
> >>>>>>> What would cause this, a lost interrupt? A spurious interrupt? Or
>
> >>>>>>> higher interrupt latency?
>
> >>>>>>>
>
> >>>>>>> I don't think the patch should cause significantly worse latency,
>
> >>>>>>> (it's supposed to be a bit better if anything because it doesn't
> set
>
> >>>>>>> up the full interrupt frame). But it's possible.
>
> >>>>>> My working theory is that the SPI_DON indication is all about the TX
>
> >>>>>> direction an now that the interrupts are faster we're hitting an
> error
>
> >>>>>> because there is still RX activity going on. Heiner disagrees with
> my
>
> >>>>>> interpretation of the SPI_DON indication and the fact that it
> doesn't
>
> >>>>>> happen every time does throw doubt on it.
>
> >>>>>>
>
> >>>>> It's right that the eSPI spec can be interpreted that SPI_DON refers
> to
>
> >>>>> TX only. However this wouldn't really make sense, because also for RX
>
> >>>>> we program the frame length, and therefore want to be notified once
> the
>
> >>>>> full frame was received. Also practical experience shows that SPI_DON
>
> >>>>> is set also after RX-only transfers.
>
> >>>>> Typical SPI NOR use case is that you write read command + start
> address,
>
> >>>>> followed by a longer read. If the TX-only interpretation would be
> right,
>
> >>>>> we'd always end up with SPI_DON not being set.
>
> >>>>>
>
> >>>>>> I can't really explain the extra RX byte in the fifo. We know how
> many
>
> >>>>>> bytes to expect and we pull that many from the fifo so it's not as
> if
>
> >>>>>> we're missing an interrupt causing us to skip the last byte. I've
> been
>
> >>>>>> looking for some kind of off-by-one calculation but again if it were
>
> >>>>>> something like that it'd happen all the time.
>
> >>>>>>
>
> >>>>> Maybe it helps to know what value this extra byte in the FIFO has.
> Is it:
>
> >>>>> - a duplicate of the last read byte
>
> >>>>> - or the next byte (at <end address> + 1)
>
> >>>>> - or a fixed value, e.g. always 0x00 or 0xff
>
> >>>> The values were up thread a bit but I'll repeat them here
>
> >>>>
>
> >>>> fsl_espi ffe110000.spi: tx 70
>
> >>>> fsl_espi ffe110000.spi: rx 03
>
> >>>> fsl_espi ffe110000.spi: Extra RX 00
>
> >>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>
> >>>> fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty!
>
> >>>> fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>
> >>>> fsl_espi ffe110000.spi: tx 05
>
> >>>> fsl_espi ffe110000.spi: rx 00
>
> >>>> fsl_espi ffe110000.spi: Extra RX 03
>
> >>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>
> >>>> fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty!
>
> >>>> fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>
> >>>> fsl_espi ffe110000.spi: tx 05
>
> >>>> fsl_espi ffe110000.spi: rx 00
>
> >>>> fsl_espi ffe110000.spi: Extra RX 03
>
> >>>>
>
> >>>>
>
> >>>> The rx 00 Extra RX 03 is a bit concerning. I've only ever seen them
> with
>
> >>>> either a READ_SR or a READ_FSR. Never a data read.
>
> >>>>
>
> >>> Just remembered something about SPIE_DON:
>
> >>> Transfers are always full duplex, therefore in case of a read the chip
>
> >>> sends dummy zero's. Having said that in case of a read SPIE_DON means
>
> >>> that the last dummy zero was shifted out.
>
> >>>
>
> >>> READ_SR and READ_FSR are the shortest transfers, 1 byte out and 1 byte
> in.
>
> >>> So the issue may have a dependency on the length of the transfer.
>
> >>> However I see no good explanation so far. You can try adding a delay of
>
> >>> a few miroseconds between the following to commands in fsl_espi_bufs().
>
> >>>
>
> >>>     fsl_espi_write_reg(espi, ESPI_SPIM, mask);
>
> >>>
>
> >>>     /* Prevent filling the fifo from getting interrupted */
>
> >>>     spin_lock_irq(&espi->lock);
>
> >>>
>
> >>> Maybe enabling interrupts and seeing the SPIE_DON interrupt are too
> close.
>
> >> I think this might be heading in the right direction. Playing about with
>
> >> a delay does seem to make the two symptoms less likely. Although I have
>
> >> to set it quite high (i.e. msleep(100)) to completely avoid any
>
> >> possibility of seeing either message.
>
> > The patch might replay the interrupt a little bit faster, but it would
>
> > be a few microseconds at most I think (just from improved code).
>
> >
>
> > Would you be able to ftrace the interrupt handler function and see if you
>
> > can see a difference in number or timing of interrupts? I'm at a bit of
>
> > a loss.
>
>
>
> I tried ftrace but I really wasn't sure what I was looking for.
>
> Capturing a "bad" case was pretty tricky. But I think I've identified a
>
> fix (I'll send it as a proper patch shortly). The gist is
>
>
>
> diff --git a/drivers/spi/spi-fsl-espi.c b/drivers/spi/spi-fsl-espi.c
>
> index 7e7c92cafdbb..cb120b68c0e2 100644
>
> --- a/drivers/spi/spi-fsl-espi.c
>
> +++ b/drivers/spi/spi-fsl-espi.c
>
> @@ -574,13 +574,14 @@ static void fsl_espi_cpu_irq(struct fsl_espi
>
> *espi, u32 events)
>
>   static irqreturn_t fsl_espi_irq(s32 irq, void *context_data)
>
>   {
>
>          struct fsl_espi *espi = context_data;
>
> -       u32 events;
>
> +       u32 events, mask;
>
>
>
>          spin_lock(&espi->lock);
>
>
>
>          /* Get interrupt events(tx/rx) */
>
>          events = fsl_espi_read_reg(espi, ESPI_SPIE);
>
> -       if (!events) {
>
> +       mask = fsl_espi_read_reg(espi, ESPI_SPIM);
>
> +       if (!(events & mask)) {
>
>                  spin_unlock(&espi->lock);
>
>                  return IRQ_NONE;
>
>          }
>
>
>
> The SPIE register contains the TXCNT so events is pretty much always
>
> going to have something set. By checking events against what we've
>
> actually requested interrupts for we don't see any spurious events.
>
>
Usually we shouldn’t receive interrupts we’re not interested in, except the
interrupt is shared. This leads to the question: is the SPI interrupt
shared with another device on your system? Do you see spurious interrupts
with the patch under /proc/irq/(irq)/spurious?


>
> I've tested this on the T2080RDB and on our custom hardware and it seems
>
> to resolve the problem.
>
>
>
>

[-- Attachment #2: Type: text/html, Size: 14916 bytes --]

  reply	other threads:[~2020-09-05  8:41 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-14  2:48 fsl_espi errors on v5.7.15 Chris Packham
2020-08-14  2:48 ` Chris Packham
2020-08-14  6:19 ` Heiner Kallweit
2020-08-14  6:19   ` Heiner Kallweit
2020-08-16 21:09   ` Chris Packham
2020-08-16 21:09     ` Chris Packham
2020-08-18 22:44     ` Chris Packham
2020-08-18 22:44       ` Chris Packham
2020-08-19  6:15       ` Heiner Kallweit
2020-08-19  6:15         ` Heiner Kallweit
2020-08-19 21:08         ` Chris Packham
2020-08-19 21:08           ` Chris Packham
2020-08-24 22:04           ` Chris Packham
2020-08-24 22:04             ` Chris Packham
2020-08-25  3:54             ` Chris Packham
2020-08-25  3:54               ` Chris Packham
2020-08-25  7:22               ` Heiner Kallweit
2020-08-25  7:22                 ` Heiner Kallweit
2020-08-25 22:22                 ` Chris Packham
2020-08-25 22:22                   ` Chris Packham
2020-08-26  1:48                   ` Chris Packham
2020-08-26  1:48                     ` Chris Packham
2020-08-26  6:07                     ` Chris Packham
2020-08-26  6:07                       ` Chris Packham
2020-08-26  6:38                       ` Heiner Kallweit
2020-08-26  6:38                         ` Heiner Kallweit
2020-08-26 22:26                         ` Chris Packham
2020-08-26 22:26                           ` Chris Packham
2020-08-27  7:12                         ` Nicholas Piggin
2020-08-27  7:12                           ` Nicholas Piggin
2020-08-27 22:07                           ` Chris Packham
2020-08-27 22:07                             ` Chris Packham
2020-08-30 12:30                             ` Nicholas Piggin
2020-08-30 12:30                               ` Nicholas Piggin
2020-08-30 13:55                               ` Heiner Kallweit
2020-08-30 13:55                                 ` Heiner Kallweit
2020-08-30 21:00                               ` Chris Packham
2020-08-30 21:00                                 ` Chris Packham
2020-08-30 21:41                                 ` Heiner Kallweit
2020-08-30 21:41                                   ` Heiner Kallweit
2020-08-30 21:59                                   ` Chris Packham
2020-08-30 21:59                                     ` Chris Packham
2020-08-31 12:33                                     ` Heiner Kallweit
2020-08-31 12:33                                       ` Heiner Kallweit
2020-09-01  1:25                                       ` Chris Packham
2020-09-01  1:25                                         ` Chris Packham
2020-09-01  6:14                                         ` Nicholas Piggin
2020-09-01  6:14                                           ` Nicholas Piggin
2020-09-01 23:29                                           ` Chris Packham
2020-09-01 23:29                                             ` Chris Packham
2020-09-01 23:31                                             ` Chris Packham
2020-09-01 23:31                                               ` Chris Packham
2020-09-03 23:58                                           ` Chris Packham
2020-09-03 23:58                                             ` Chris Packham
2020-09-04 17:23                                             ` Heiner Kallweit [this message]
2020-09-06 20:59                                               ` Chris Packham
2020-09-06 21:03                                               ` Chris Packham
2020-09-06 21:03                                                 ` Chris Packham
2020-09-07  9:53                                             ` Joakim Tjernlund
2020-09-07  9:53                                               ` Joakim Tjernlund
2020-09-07 15:38                                               ` Joakim Tjernlund
2020-09-07 15:38                                                 ` Joakim Tjernlund
2020-09-01  2:52                                       ` Chris Packham
2020-09-01  2:52                                         ` Chris Packham
2020-09-01  7:38                                         ` Heiner Kallweit
2020-09-01  7:38                                           ` Heiner Kallweit

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAFSsGVvRMQoEoBN1hpao_4uM1yF2wwuKPbMkAcwFWyE1X8HDbQ@mail.gmail.com \
    --to=hkallweit1@gmail.com \
    --cc=Chris.Packham@alliedtelesis.co.nz \
    --cc=broonie@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-spi@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=npiggin@gmail.com \
    --cc=paulus@samba.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.