From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.5 required=3.0 tests=BAYES_00,DKIM_ADSP_CUSTOM_MED, DKIM_INVALID,DKIM_SIGNED,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,HTML_MESSAGE,INCLUDES_PATCH,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0D6F4C43461 for ; Sat, 5 Sep 2020 08:41:27 +0000 (UTC) Received: from lists.ozlabs.org (lists.ozlabs.org [203.11.71.2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id CDBF92074D for ; Sat, 5 Sep 2020 08:41:25 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="YSI+ZBRC" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org CDBF92074D Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=linuxppc-dev-bounces+linuxppc-dev=archiver.kernel.org@lists.ozlabs.org Received: from bilbo.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) by lists.ozlabs.org (Postfix) with ESMTP id 4Bk7LW4Ft9zDqwW for ; Sat, 5 Sep 2020 18:41:23 +1000 (AEST) Authentication-Results: lists.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gmail.com (client-ip=2a00:1450:4864:20::432; helo=mail-wr1-x432.google.com; envelope-from=hkallweit1@gmail.com; receiver=) Authentication-Results: lists.ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256 header.s=20161025 header.b=YSI+ZBRC; dkim-atps=neutral Received: from mail-wr1-x432.google.com (mail-wr1-x432.google.com [IPv6:2a00:1450:4864:20::432]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 4BjkzG1VpxzDqpM for ; Sat, 5 Sep 2020 03:23:20 +1000 (AEST) Received: by mail-wr1-x432.google.com with SMTP id k15so7451254wrn.10 for ; Fri, 04 Sep 2020 10:23:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=E8FqCt6t38dImRby4iUwfizmSh/d4KR9tnYbKg3B/eQ=; b=YSI+ZBRCs/5Nvq8g5P85uNlZ+Qjvymf0Z8KPoBQev9L+hoIp/gIyFp47CoP/ZU0GvA QYqx41dk9zz/luwPsOkcA1VBBK5q2nslMBJQvTP42vbRLOi6ePbK/umrncp1iPRpDVz0 LGXRwXdJetqXllgb68v5tE+2zOGWusfwjUwCCdoHQME1k6k2mGe5PzDS6m0MS+G2CxxP 13PX7HV3tg8KghDnKOa97xNbTCfiDnzWF91xNUPrZQlIR+XOHiWiYf7Gn3NM+GP8s6Wg 63zWRAE1L7e7louDXaGb3oc2Ew26tKEbrC7WkpDsPBrUYryK7dY3Xu1ooXnU7kMVrw/g +pow== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=E8FqCt6t38dImRby4iUwfizmSh/d4KR9tnYbKg3B/eQ=; b=KRU6JDMt1MJqnmfoT1vL77DD1Cfe5dxHdQivS2ftTi90iNI5wQR24o56EUHQ2CEy2s 2VlXEF4xlBq7gQgqiND6qCeiSd/LWkAsUVNmPo2dkY3UhJFh2UL3pwgKf27sF5SPzVdE k/hAP+2gKymIcSYdLpIx9fp/tOMgByisoTg1Nw5jj075U52GApBfqLpU85jlBIGCC/VN 2ZyXmXcsl/Z1Um3aA0/PpG551Sx0K6v3wDVVzSjjVI1sapa74tTU+3IY1EArhdYWmjeX /V9X9dJnT7ITB1BC6RC017SozxO1DKo1iPXUWX3L58ZlbMr5Dld2CL9og9NEJchFelsm dqHQ== X-Gm-Message-State: AOAM531aN3IU08FsTzCBRUwF8+EHdZI1DYbMM91wweRXFUBB6YoIOpI1 HH3sXQuSkH04WqlmHNGgWDXLR1dvDqVmHHqh7/Y= X-Google-Smtp-Source: ABdhPJzZbm/lXPnbc0DhAMosvgWOhh1osZwTonHob0iZla64Gha6jca4b7y6JtA5gIY5gVM+BELGvWmeJvBwqLmh+Ic= X-Received: by 2002:adf:e74d:: with SMTP id c13mr8246340wrn.45.1599240196353; Fri, 04 Sep 2020 10:23:16 -0700 (PDT) MIME-Version: 1.0 References: <1598940515.6e06nwgi0c.astroid@bobo.none> <6054f0ec-d994-105b-6399-6cdb65ddd1b6@alliedtelesis.co.nz> In-Reply-To: <6054f0ec-d994-105b-6399-6cdb65ddd1b6@alliedtelesis.co.nz> From: Heiner Kallweit Date: Fri, 4 Sep 2020 19:23:05 +0200 Message-ID: Subject: Re: fsl_espi errors on v5.7.15 To: Chris Packham Content-Type: multipart/alternative; boundary="000000000000d0723905ae801f03" X-Mailman-Approved-At: Sat, 05 Sep 2020 18:36:43 +1000 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: "linux-kernel@vger.kernel.org" , Nicholas Piggin , "linux-spi@vger.kernel.org" , "broonie@kernel.org" , "paulus@samba.org" , "linuxppc-dev@lists.ozlabs.org" Errors-To: linuxppc-dev-bounces+linuxppc-dev=archiver.kernel.org@lists.ozlabs.org Sender: "Linuxppc-dev" --000000000000d0723905ae801f03 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Fri 4. Sep 2020 at 01:58, Chris Packham < Chris.Packham@alliedtelesis.co.nz> wrote: > > > On 1/09/20 6:14 pm, Nicholas Piggin wrote: > > > Excerpts from Chris Packham's message of September 1, 2020 11:25 am: > > >> On 1/09/20 12:33 am, Heiner Kallweit wrote: > > >>> On 30.08.2020 23:59, Chris Packham wrote: > > >>>> On 31/08/20 9:41 am, Heiner Kallweit wrote: > > >>>>> On 30.08.2020 23:00, Chris Packham wrote: > > >>>>>> On 31/08/20 12:30 am, Nicholas Piggin wrote: > > >>>>>>> Excerpts from Chris Packham's message of August 28, 2020 8:07 am: > > >>>>>> > > >>>>>> > > >>>>>>>>>>>> I've also now seen the RX FIFO not empty error on the T2080R= DB > > >>>>>>>>>>>> > > >>>>>>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set= ! > > >>>>>>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set= ! > > >>>>>>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set= ! > > >>>>>>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set= ! > > >>>>>>>>>>>> fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren'= t > empty! > > >>>>>>>>>>>> fsl_espi ffe110000.spi: SPIE_RXCNT =3D 1, SPIE_TXCNT =3D 32 > > >>>>>>>>>>>> > > >>>>>>>>>>>> With my current workaround of emptying the RX FIFO. It seems > > >>>>>>>>>>>> survivable. Interestingly it only ever seems to be 1 extra > byte in the > > >>>>>>>>>>>> RX FIFO and it seems to be after either a READ_SR or a > READ_FSR. > > >>>>>>>>>>>> > > >>>>>>>>>>>> fsl_espi ffe110000.spi: tx 70 > > >>>>>>>>>>>> fsl_espi ffe110000.spi: rx 03 > > >>>>>>>>>>>> fsl_espi ffe110000.spi: Extra RX 00 > > >>>>>>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set= ! > > >>>>>>>>>>>> fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren'= t > empty! > > >>>>>>>>>>>> fsl_espi ffe110000.spi: SPIE_RXCNT =3D 1, SPIE_TXCNT =3D 32 > > >>>>>>>>>>>> fsl_espi ffe110000.spi: tx 05 > > >>>>>>>>>>>> fsl_espi ffe110000.spi: rx 00 > > >>>>>>>>>>>> fsl_espi ffe110000.spi: Extra RX 03 > > >>>>>>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set= ! > > >>>>>>>>>>>> fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren'= t > empty! > > >>>>>>>>>>>> fsl_espi ffe110000.spi: SPIE_RXCNT =3D 1, SPIE_TXCNT =3D 32 > > >>>>>>>>>>>> fsl_espi ffe110000.spi: tx 05 > > >>>>>>>>>>>> fsl_espi ffe110000.spi: rx 00 > > >>>>>>>>>>>> fsl_espi ffe110000.spi: Extra RX 03 > > >>>>>>>>>>>> > > >>>>>>>>>>>> From all the Micron SPI-NOR datasheets I've got access > to it is > > >>>>>>>>>>>> possible to continually read the SR/FSR. But I've no idea wh= y > it > > >>>>>>>>>>>> happens some times and not others. > > >>>>>>>>>>> So I think I've got a reproduction and I think I've bisected > the problem > > >>>>>>>>>>> to commit 3282a3da25bd ("powerpc/64: Implement soft interrupt > replay in > > >>>>>>>>>>> C"). My day is just finishing now so I haven't applied too > much scrutiny > > >>>>>>>>>>> to this result. Given the various rabbit holes I've been down > on this > > >>>>>>>>>>> issue already I'd take this information with a good degree of > skepticism. > > >>>>>>>>>>> > > >>>>>>>>>> OK, so an easy test should be to re-test with a 5.4 kernel. > > >>>>>>>>>> It doesn't have yet the change you're referring to, and the > fsl-espi driver > > >>>>>>>>>> is basically the same as in 5.7 (just two small changes in 5.7= ). > > >>>>>>>>> There's 6cc0c16d82f88 and maybe also other interrupt related > patches > > >>>>>>>>> around this time that could affect book E, so it's good if that > exact > > >>>>>>>>> patch is confirmed. > > >>>>>>>> My confirmation is basically that I can induce the issue in a 5.= 4 > kernel > > >>>>>>>> by cherry-picking 3282a3da25bd. I'm also able to "fix" the issue > in > > >>>>>>>> 5.9-rc2 by reverting that one commit. > > >>>>>>>> > > >>>>>>>> I both cases it's not exactly a clean cherry-pick/revert so I al= so > > >>>>>>>> confirmed the bisection result by building at 3282a3da25bd (whic= h > sees > > >>>>>>>> the issue) and the commit just before (which does not). > > >>>>>>> Thanks for testing, that confirms it well. > > >>>>>>> > > >>>>>>> [snip patch] > > >>>>>>> > > >>>>>>>> I still saw the issue with this change applied. > PPC_IRQ_SOFT_MASK_DEBUG > > >>>>>>>> didn't report anything (either with or without the change above)= . > > >>>>>>> Okay, it was a bit of a shot in the dark. I still can't see what > > >>>>>>> else has changed. > > >>>>>>> > > >>>>>>> What would cause this, a lost interrupt? A spurious interrupt? Or > > >>>>>>> higher interrupt latency? > > >>>>>>> > > >>>>>>> I don't think the patch should cause significantly worse latency, > > >>>>>>> (it's supposed to be a bit better if anything because it doesn't > set > > >>>>>>> up the full interrupt frame). But it's possible. > > >>>>>> My working theory is that the SPI_DON indication is all about the = TX > > >>>>>> direction an now that the interrupts are faster we're hitting an > error > > >>>>>> because there is still RX activity going on. Heiner disagrees with > my > > >>>>>> interpretation of the SPI_DON indication and the fact that it > doesn't > > >>>>>> happen every time does throw doubt on it. > > >>>>>> > > >>>>> It's right that the eSPI spec can be interpreted that SPI_DON refer= s > to > > >>>>> TX only. However this wouldn't really make sense, because also for = RX > > >>>>> we program the frame length, and therefore want to be notified once > the > > >>>>> full frame was received. Also practical experience shows that SPI_D= ON > > >>>>> is set also after RX-only transfers. > > >>>>> Typical SPI NOR use case is that you write read command + start > address, > > >>>>> followed by a longer read. If the TX-only interpretation would be > right, > > >>>>> we'd always end up with SPI_DON not being set. > > >>>>> > > >>>>>> I can't really explain the extra RX byte in the fifo. We know how > many > > >>>>>> bytes to expect and we pull that many from the fifo so it's not as > if > > >>>>>> we're missing an interrupt causing us to skip the last byte. I've > been > > >>>>>> looking for some kind of off-by-one calculation but again if it we= re > > >>>>>> something like that it'd happen all the time. > > >>>>>> > > >>>>> Maybe it helps to know what value this extra byte in the FIFO has. > Is it: > > >>>>> - a duplicate of the last read byte > > >>>>> - or the next byte (at + 1) > > >>>>> - or a fixed value, e.g. always 0x00 or 0xff > > >>>> The values were up thread a bit but I'll repeat them here > > >>>> > > >>>> fsl_espi ffe110000.spi: tx 70 > > >>>> fsl_espi ffe110000.spi: rx 03 > > >>>> fsl_espi ffe110000.spi: Extra RX 00 > > >>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set! > > >>>> fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty! > > >>>> fsl_espi ffe110000.spi: SPIE_RXCNT =3D 1, SPIE_TXCNT =3D 32 > > >>>> fsl_espi ffe110000.spi: tx 05 > > >>>> fsl_espi ffe110000.spi: rx 00 > > >>>> fsl_espi ffe110000.spi: Extra RX 03 > > >>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set! > > >>>> fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty! > > >>>> fsl_espi ffe110000.spi: SPIE_RXCNT =3D 1, SPIE_TXCNT =3D 32 > > >>>> fsl_espi ffe110000.spi: tx 05 > > >>>> fsl_espi ffe110000.spi: rx 00 > > >>>> fsl_espi ffe110000.spi: Extra RX 03 > > >>>> > > >>>> > > >>>> The rx 00 Extra RX 03 is a bit concerning. I've only ever seen them > with > > >>>> either a READ_SR or a READ_FSR. Never a data read. > > >>>> > > >>> Just remembered something about SPIE_DON: > > >>> Transfers are always full duplex, therefore in case of a read the chi= p > > >>> sends dummy zero's. Having said that in case of a read SPIE_DON means > > >>> that the last dummy zero was shifted out. > > >>> > > >>> READ_SR and READ_FSR are the shortest transfers, 1 byte out and 1 byt= e > in. > > >>> So the issue may have a dependency on the length of the transfer. > > >>> However I see no good explanation so far. You can try adding a delay = of > > >>> a few miroseconds between the following to commands in fsl_espi_bufs(= ). > > >>> > > >>> fsl_espi_write_reg(espi, ESPI_SPIM, mask); > > >>> > > >>> /* Prevent filling the fifo from getting interrupted */ > > >>> spin_lock_irq(&espi->lock); > > >>> > > >>> Maybe enabling interrupts and seeing the SPIE_DON interrupt are too > close. > > >> I think this might be heading in the right direction. Playing about wi= th > > >> a delay does seem to make the two symptoms less likely. Although I hav= e > > >> to set it quite high (i.e. msleep(100)) to completely avoid any > > >> possibility of seeing either message. > > > The patch might replay the interrupt a little bit faster, but it would > > > be a few microseconds at most I think (just from improved code). > > > > > > Would you be able to ftrace the interrupt handler function and see if y= ou > > > can see a difference in number or timing of interrupts? I'm at a bit of > > > a loss. > > > > I tried ftrace but I really wasn't sure what I was looking for. > > Capturing a "bad" case was pretty tricky. But I think I've identified a > > fix (I'll send it as a proper patch shortly). The gist is > > > > diff --git a/drivers/spi/spi-fsl-espi.c b/drivers/spi/spi-fsl-espi.c > > index 7e7c92cafdbb..cb120b68c0e2 100644 > > --- a/drivers/spi/spi-fsl-espi.c > > +++ b/drivers/spi/spi-fsl-espi.c > > @@ -574,13 +574,14 @@ static void fsl_espi_cpu_irq(struct fsl_espi > > *espi, u32 events) > > static irqreturn_t fsl_espi_irq(s32 irq, void *context_data) > > { > > struct fsl_espi *espi =3D context_data; > > - u32 events; > > + u32 events, mask; > > > > spin_lock(&espi->lock); > > > > /* Get interrupt events(tx/rx) */ > > events =3D fsl_espi_read_reg(espi, ESPI_SPIE); > > - if (!events) { > > + mask =3D fsl_espi_read_reg(espi, ESPI_SPIM); > > + if (!(events & mask)) { > > spin_unlock(&espi->lock); > > return IRQ_NONE; > > } > > > > The SPIE register contains the TXCNT so events is pretty much always > > going to have something set. By checking events against what we've > > actually requested interrupts for we don't see any spurious events. > > Usually we shouldn=E2=80=99t receive interrupts we=E2=80=99re not intereste= d in, except the interrupt is shared. This leads to the question: is the SPI interrupt shared with another device on your system? Do you see spurious interrupts with the patch under /proc/irq/(irq)/spurious? > > I've tested this on the T2080RDB and on our custom hardware and it seems > > to resolve the problem. > > > > --000000000000d0723905ae801f03 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable


On Fri 4. Sep 2020 at 01:58, Chris Packham <Chris.Packham@alliedtelesis.co.nz<= /a>> wrote:


On 1/09/20 6= :14 pm, Nicholas Piggin wrote:

> Excerpts from Chris Packham'= s message of September 1, 2020 11:25 am:

>> On 1/09/20 12:33 a= m, Heiner Kallweit wrote:

>>> On 30.08.2020 23:59, Chris Pa= ckham wrote:

>>>> On 31/08/20 9:41 am, Heiner Kallweit w= rote:

>>>>> On 30.08.2020 23:00, Chris Packham wrote:=

>>>>>> On 31/08/20 12:30 am, Nicholas Piggin wrot= e:

>>>>>>> Excerpts from Chris Packham's me= ssage of August 28, 2020 8:07 am:

>>>>>> <snip&= gt;

>>>>>>

>>>>>>>>= >>>> I've also now seen the RX FIFO not empty error on the = T2080RDB

>>>>>>>>>>>>

>= ;>>>>>>>>>>> fsl_espi ffe110000.spi: Trans= fer done but SPIE_DON isn't set!

>>>>>>>>= ;>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn= 9;t set!

>>>>>>>>>>>> fsl_espi f= fe110000.spi: Transfer done but SPIE_DON isn't set!

>>>= >>>>>>>>> fsl_espi ffe110000.spi: Transfer done = but SPIE_DON isn't set!

>>>>>>>>>>= >> fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren= 9;t empty!

>>>>>>>>>>>> fsl_espi= ffe110000.spi: SPIE_RXCNT =3D 1, SPIE_TXCNT =3D 32

>>>>= >>>>>>>>

>>>>>>>>>= ;>>> With my current workaround of emptying the RX FIFO. It seems<= br>
>>>>>>>>>>>> survivable. Interes= tingly it only ever seems to be 1 extra byte in the

>>>>= >>>>>>>> RX FIFO and it seems to be after either a = READ_SR or a READ_FSR.

>>>>>>>>>>>&= gt;

>>>>>>>>>>>> fsl_espi ffe110= 000.spi: tx 70

>>>>>>>>>>>> fsl_= espi ffe110000.spi: rx 03

>>>>>>>>>>&g= t;> fsl_espi ffe110000.spi: Extra RX 00

>>>>>>&= gt;>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON = isn't set!

>>>>>>>>>>>> fsl_= espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty!
>>>>>>>>>>>> fsl_espi ffe110000.sp= i: SPIE_RXCNT =3D 1, SPIE_TXCNT =3D 32

>>>>>>>&= gt;>>>> fsl_espi ffe110000.spi: tx 05

>>>>&g= t;>>>>>>> fsl_espi ffe110000.spi: rx 00

>>= ;>>>>>>>>>> fsl_espi ffe110000.spi: Extra RX = 03

>>>>>>>>>>>> fsl_espi ffe1100= 00.spi: Transfer done but SPIE_DON isn't set!

>>>>&g= t;>>>>>>> fsl_espi ffe110000.spi: Transfer done but rx= /tx fifo's aren't empty!

>>>>>>>>>= ;>>> fsl_espi ffe110000.spi: SPIE_RXCNT =3D 1, SPIE_TXCNT =3D 32
>>>>>>>>>>>> fsl_espi ffe110000.s= pi: tx 05

>>>>>>>>>>>> fsl_espi = ffe110000.spi: rx 00

>>>>>>>>>>>>= ; fsl_espi ffe110000.spi: Extra RX 03

>>>>>>>&g= t;>>>>

>>>>>>>>>>>>= =C2=A0 =C2=A0 =C2=A0 From all the Micron SPI-NOR datasheets I've got ac= cess to it is

>>>>>>>>>>>> possi= ble to continually read the SR/FSR. But I've no idea why it

>= >>>>>>>>>>> happens some times and not oth= ers.

>>>>>>>>>>> So I think I'v= e got a reproduction and I think I've bisected the problem

>&= gt;>>>>>>>>> to commit 3282a3da25bd ("power= pc/64: Implement soft interrupt replay in

>>>>>>&g= t;>>>> C"). My day is just finishing now so I haven't = applied too much scrutiny

>>>>>>>>>>&g= t; to this result. Given the various rabbit holes I've been down on thi= s

>>>>>>>>>>> issue already I'd= take this information with a good degree of skepticism.

>>>= ;>>>>>>>>

>>>>>>>>&g= t;> OK, so an easy test should be to re-test with a 5.4 kernel.

&= gt;>>>>>>>>> It doesn't have yet the change = you're referring to, and the fsl-espi driver

>>>>>= ;>>>>> is basically the same as in 5.7 (just two small chang= es in 5.7).

>>>>>>>>> There's 6cc0c16= d82f88 and maybe also other interrupt related patches

>>>&g= t;>>>>> around this time that could affect book E, so it'= ;s good if that exact

>>>>>>>>> patch is = confirmed.

>>>>>>>> My confirmation is basic= ally that I can induce the issue in a 5.4 kernel

>>>>>= ;>>> by cherry-picking 3282a3da25bd. I'm also able to "fi= x" the issue in

>>>>>>>> 5.9-rc2 by rev= erting that one commit.

>>>>>>>>

>= >>>>>>> I both cases it's not exactly a clean cher= ry-pick/revert so I also

>>>>>>>> confirmed = the bisection result by building at 3282a3da25bd (which sees

>>= ;>>>>>> the issue) and the commit just before (which does= not).

>>>>>>> Thanks for testing, that confirm= s it well.

>>>>>>>

>>>>>&= gt;> [snip patch]

>>>>>>>

>>>= ;>>>>> I still saw the issue with this change applied. PPC_I= RQ_SOFT_MASK_DEBUG

>>>>>>>> didn't repor= t anything (either with or without the change above).

>>>&g= t;>>> Okay, it was a bit of a shot in the dark. I still can't = see what

>>>>>>> else has changed.

>&= gt;>>>>>

>>>>>>> What would caus= e this, a lost interrupt? A spurious interrupt? Or

>>>>&= gt;>> higher interrupt latency?

>>>>>>>
>>>>>>> I don't think the patch should cause= significantly worse latency,

>>>>>>> (it's= supposed to be a bit better if anything because it doesn't set

= >>>>>>> up the full interrupt frame). But it's pos= sible.

>>>>>> My working theory is that the SPI_DO= N indication is all about the TX

>>>>>> direction = an now that the interrupts are faster we're hitting an error

>= ;>>>>> because there is still RX activity going on. Heiner d= isagrees with my

>>>>>> interpretation of the SPI_= DON indication and the fact that it doesn't

>>>>>= > happen every time does throw doubt on it.

>>>>>&= gt;

>>>>> It's right that the eSPI spec can be in= terpreted that SPI_DON refers to

>>>>> TX only. Howev= er this wouldn't really make sense, because also for RX

>>= >>> we program the frame length, and therefore want to be notified= once the

>>>>> full frame was received. Also practic= al experience shows that SPI_DON

>>>>> is set also af= ter RX-only transfers.

>>>>> Typical SPI NOR use case= is that you write read command + start address,

>>>>>= ; followed by a longer read. If the TX-only interpretation would be right,<= br>
>>>>> we'd always end up with SPI_DON not being s= et.

>>>>>

>>>>>> I can't= really explain the extra RX byte in the fifo. We know how many

>= >>>>> bytes to expect and we pull that many from the fifo so= it's not as if

>>>>>> we're missing an in= terrupt causing us to skip the last byte. I've been

>>>= >>> looking for some kind of off-by-one calculation but again if i= t were

>>>>>> something like that it'd happen = all the time.

>>>>>>

>>>>> M= aybe it helps to know what value this extra byte in the FIFO has. Is it:
>>>>> - a duplicate of the last read byte

>&g= t;>>> - or the next byte (at <end address> + 1)

>&= gt;>>> - or a fixed value, e.g. always 0x00 or 0xff

>>= ;>> The values were up thread a bit but I'll repeat them here
=
>>>>

>>>> fsl_espi ffe110000.spi: tx 70<= br>
>>>> fsl_espi ffe110000.spi: rx 03

>>>&g= t; fsl_espi ffe110000.spi: Extra RX 00

>>>> fsl_espi ffe= 110000.spi: Transfer done but SPIE_DON isn't set!

>>>&g= t; fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't em= pty!

>>>> fsl_espi ffe110000.spi: SPIE_RXCNT =3D 1, SPIE= _TXCNT =3D 32

>>>> fsl_espi ffe110000.spi: tx 05

= >>>> fsl_espi ffe110000.spi: rx 00

>>>> fsl_= espi ffe110000.spi: Extra RX 03

>>>> fsl_espi ffe110000.= spi: Transfer done but SPIE_DON isn't set!

>>>> fsl_= espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty!
>>>> fsl_espi ffe110000.spi: SPIE_RXCNT =3D 1, SPIE_TXCNT = =3D 32

>>>> fsl_espi ffe110000.spi: tx 05

>>= ;>> fsl_espi ffe110000.spi: rx 00

>>>> fsl_espi ff= e110000.spi: Extra RX 03

>>>>

>>>>
>>>> The rx 00 Extra RX 03 is a bit concerning. I've o= nly ever seen them with

>>>> either a READ_SR or a READ_= FSR. Never a data read.

>>>>

>>> Just re= membered something about SPIE_DON:

>>> Transfers are always= full duplex, therefore in case of a read the chip

>>> send= s dummy zero's. Having said that in case of a read SPIE_DON means
>>> that the last dummy zero was shifted out.

>>>= ;

>>> READ_SR and READ_FSR are the shortest transfers, 1 by= te out and 1 byte in.

>>> So the issue may have a dependenc= y on the length of the transfer.

>>> However I see no good = explanation so far. You can try adding a delay of

>>> a few= miroseconds between the following to commands in fsl_espi_bufs().

&= gt;>>

>>>=C2=A0 =C2=A0 =C2=A0fsl_espi_write_reg(espi,= ESPI_SPIM, mask);

>>>

>>>=C2=A0 =C2=A0 =C2= =A0/* Prevent filling the fifo from getting interrupted */

>>&= gt;=C2=A0 =C2=A0 =C2=A0spin_lock_irq(&espi->lock);

>>&g= t;

>>> Maybe enabling interrupts and seeing the SPIE_DON in= terrupt are too close.

>> I think this might be heading in the= right direction. Playing about with

>> a delay does seem to m= ake the two symptoms less likely. Although I have

>> to set it= quite high (i.e. msleep(100)) to completely avoid any

>> poss= ibility of seeing either message.

> The patch might replay the in= terrupt a little bit faster, but it would

> be a few microseconds= at most I think (just from improved code).

>

> Would y= ou be able to ftrace the interrupt handler function and see if you

&= gt; can see a difference in number or timing of interrupts? I'm at a bi= t of

> a loss.



I tried ftrace but I really wasn= 9;t sure what I was looking for.

Capturing a "bad" case w= as pretty tricky. But I think I've identified a

fix (I'll s= end it as a proper patch shortly). The gist is



diff --git a/= drivers/spi/spi-fsl-espi.c b/drivers/spi/spi-fsl-espi.c

index 7e7c92= cafdbb..cb120b68c0e2 100644

--- a/drivers/spi/spi-fsl-espi.c

= +++ b/drivers/spi/spi-fsl-espi.c

@@ -574,13 +574,14 @@ static void f= sl_espi_cpu_irq(struct fsl_espi

*espi, u32 events)

=C2=A0=C2= =A0static irqreturn_t fsl_espi_irq(s32 irq, void *context_data)

=C2= =A0=C2=A0{

=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 struct f= sl_espi *espi =3D context_data;

-=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0 u32 events;

+=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 u32 events, ma= sk;



=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 spin_lo= ck(&espi->lock);



=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0 /* Get interrupt events(tx/rx) */

=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0 events =3D fsl_espi_read_reg(espi, ESPI_SPIE);<= br>
-=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (!events) {

+=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 mask =3D fsl_espi_read_reg(espi, ESPI_SPIM);=

+=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (!(events & mask)) {
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0 spin_unlock(&espi->lock);

=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0 return IRQ_NONE;

=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0 }



The SPIE register contains the TXCNT so ev= ents is pretty much always

going to have something set. By checking= events against what we've

actually requested interrupts for we= don't see any spurious events.


<= br>
I've tested this on the T2080RDB and on our custom hardware and = it seems

to resolve the problem.



=
--000000000000d0723905ae801f03--