From: Bough Chen <haibo.chen@nxp.com>
To: "tharvey@gateworks.com" <tharvey@gateworks.com>
Cc: Linux MMC List <linux-mmc@vger.kernel.org>,
	Marcel Ziswiler <marcel@ziswiler.com>,
	Fabio Estevam <festevam@gmail.com>,
	Schrempf Frieder <frieder.schrempf@kontron.de>,
	Adam Ford <aford173@gmail.com>,
	Lucas Stach <l.stach@pengutronix.de>, Peng Fan <peng.fan@nxp.com>,
	Frank Li <frank.li@nxp.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Shawn Guo <shawnguo@kernel.org>,
	Ulf Hansson <ulf.hansson@linaro.org>,
	Sascha Hauer <s.hauer@pengutronix.de>,
	Pengutronix Kernel Team <kernel@pengutronix.de>,
	dl-linux-imx <linux-imx@nxp.com>,
	Cale Collins <ccollins@gateworks.com>
Subject: RE: IMX8MM eMMC CQHCI timeout
Date: Thu, 4 Nov 2021 02:13:18 +0000	[thread overview]
Message-ID: <DB7PR04MB401000F24AAD8C5310AAB7D1908D9@DB7PR04MB4010.eurprd04.prod.outlook.com> (raw)
In-Reply-To: <CAJ+vNU3zKEVz=fHu2hLmEpsQKzinUFW-28Lm=2wSEghjMvQtmw@mail.gmail.com>

> -----Original Message-----
> From: Tim Harvey [mailto:tharvey@gateworks.com]
> Sent: November 4, 2021 0:50
> To: Bough Chen <haibo.chen@nxp.com>
> Cc: Linux MMC List <linux-mmc@vger.kernel.org>; Marcel Ziswiler
> <marcel@ziswiler.com>; Fabio Estevam <festevam@gmail.com>; Schrempf
> Frieder <frieder.schrempf@kontron.de>; Adam Ford <aford173@gmail.com>;
> Lucas Stach <l.stach@pengutronix.de>; Peng Fan <peng.fan@nxp.com>; Frank
> Li <frank.li@nxp.com>; Adrian Hunter <adrian.hunter@intel.com>; Shawn Guo
> <shawnguo@kernel.org>; Ulf Hansson <ulf.hansson@linaro.org>; Sascha
> Hauer <s.hauer@pengutronix.de>; Pengutronix Kernel Team
> <kernel@pengutronix.de>; dl-linux-imx <linux-imx@nxp.com>; Cale Collins
> <ccollins@gateworks.com>
> Subject: Re: IMX8MM eMMC CQHCI timeout
> 
> On Sun, Oct 31, 2021 at 6:57 PM Bough Chen <haibo.chen@nxp.com> wrote:
> >
> > > -----Original Message-----
> > > From: Tim Harvey [mailto:tharvey@gateworks.com]
> > > Sent: October 30, 2021 4:47
> > > To: Linux MMC List <linux-mmc@vger.kernel.org>; Marcel Ziswiler
> > > <marcel@ziswiler.com>; Fabio Estevam <festevam@gmail.com>; Schrempf
> > > Frieder <frieder.schrempf@kontron.de>; Adam Ford
> > > <aford173@gmail.com>; Bough Chen <haibo.chen@nxp.com>; Lucas Stach
> > > <l.stach@pengutronix.de>; Peng Fan <peng.fan@nxp.com>; Frank Li
> > > <frank.li@nxp.com>
> > > Cc: Adrian Hunter <adrian.hunter@intel.com>; Shawn Guo
> > > <shawnguo@kernel.org>; Ulf Hansson <ulf.hansson@linaro.org>; Sascha
> > > Hauer <s.hauer@pengutronix.de>; Pengutronix Kernel Team
> > > <kernel@pengutronix.de>; dl-linux-imx <linux-imx@nxp.com>; Cale
> > > Collins <ccollins@gateworks.com>
> > > Subject: IMX8MM eMMC CQHCI timeout
> > >
> > > Greetings,
> > >
> > > I've encountered the following MMC CQHCI timeout message a couple of
> > > times now on IMX8MM boards with eMMC with a 5.10 based kernel:
> > >
> > > [  224.356283] mmc2: cqhci: ============ CQHCI REGISTER DUMP ===========
> > > [  224.362764] mmc2: cqhci: Caps:      0x0000310a | Version:  0x00000510
> > > [  224.369250] mmc2: cqhci: Config:    0x00001001 | Control:  0x00000000
> > > [  224.375726] mmc2: cqhci: Int stat:  0x00000000 | Int enab: 0x00000006
> > > [  224.382197] mmc2: cqhci: Int sig:   0x00000006 | Int Coal: 0x00000000
> > > [  224.388665] mmc2: cqhci: TDL base:  0x8003f000 | TDL up32: 0x00000000
> > > [  224.395129] mmc2: cqhci: Doorbell:  0xbf01dfff | TCN:      0x00000000
> > > [  224.401598] mmc2: cqhci: Dev queue: 0x00000000 | Dev Pend: 0x08000000
> > > [  224.408064] mmc2: cqhci: Task clr:  0x00000000 | SSC1:     0x00011000
> > > [  224.414532] mmc2: cqhci: SSC2:      0x00000001 | DCMD rsp: 0x00000800
> > > [  224.420997] mmc2: cqhci: RED mask:  0xfdf9a080 | TERRI:    0x00000000
> > > [  224.427467] mmc2: cqhci: Resp idx:  0x0000000d | Resp arg: 0x00000000
> > > [  224.433934] mmc2: sdhci: ============ SDHCI REGISTER DUMP ===========
> > > [  224.440404] mmc2: sdhci: Sys addr:  0x7c722000 | Version:  0x00000002
> > > [  224.446877] mmc2: sdhci: Blk size:  0x00000200 | Blk cnt:  0x00000020
> > > [  224.453346] mmc2: sdhci: Argument:  0x00018000 | Trn mode: 0x00000023
> > > [  224.459811] mmc2: sdhci: Present:   0x01f88008 | Host ctl: 0x00000030
> > > [  224.466281] mmc2: sdhci: Power:     0x00000002 | Blk gap:  0x00000080
> > > [  224.472752] mmc2: sdhci: Wake-up:   0x00000008 | Clock:    0x0000000f
> > > [  224.479225] mmc2: sdhci: Timeout:   0x0000008f | Int stat: 0x00000000
> > > [  224.485690] mmc2: sdhci: Int enab:  0x107f4000 | Sig enab: 0x107f4000
> > > [  224.492161] mmc2: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000502
> > > [  224.498628] mmc2: sdhci: Caps:      0x07eb0000 | Caps_1:   0x8000b407
> > > [  224.505097] mmc2: sdhci: Cmd:       0x00000d1a | Max curr: 0x00ffffff
> > > [  224.511575] mmc2: sdhci: Resp[0]:   0x00000000 | Resp[1]:  0xffc003ff
> > > [  224.518043] mmc2: sdhci: Resp[2]:   0x328f5903 | Resp[3]:  0x00d07f01
> > > [  224.524512] mmc2: sdhci: Host ctl2: 0x00000088
> > > [  224.528986] mmc2: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0xfe179020
> > > [  224.535451] mmc2: sdhci-esdhc-imx: ========= ESDHC IMX DEBUG STATUS DUMP ====
> > > [  224.543052] mmc2: sdhci-esdhc-imx: cmd debug status:  0x2120
> > > [  224.548740] mmc2: sdhci-esdhc-imx: data debug status:  0x2200
> > > [  224.554510] mmc2: sdhci-esdhc-imx: trans debug status:  0x2300
> > > [  224.560368] mmc2: sdhci-esdhc-imx: dma debug status:  0x2400
> > > [  224.566054] mmc2: sdhci-esdhc-imx: adma debug status:  0x2510
> > > [  224.571826] mmc2: sdhci-esdhc-imx: fifo debug status:  0x2680
> > > [  224.577608] mmc2: sdhci-esdhc-imx: async fifo debug status:  0x2750
> > > [  224.583900] mmc2: sdhci: ============================================
> > >
> > > I don't know how to make the issue occur; both times it occurred while
> > > simply reading a file in the rootfs ext4 fs on the eMMC.
> > >
> > > Some research shows:
> > > - https://community.nxp.com/t5/i-MX-Processors/The-issues-on-quot-mmc0-cqhci-timeout-for-tag-0-quot/m-p/993779
> > > - http://git.toradex.com/cgit/linux-toradex.git/commit/?h=toradex_5.4-2.3.x-imx&id=fd33531be843566c59a5fc655f204bbd36d7f3c6
> > >
> > > I'm not clear if this info is up-to-date. The NXP 5.4 kernel did not
> > > enable this feature but if I'm not mistaken CQHCI support itself didn't
> > > land in mainline until a later kernel so it would make sense it was not
> > > enabled at that time. I do see the NXP 5.10 kernels have this enabled so
> > > I'm curious if it is an issue there.
> > >
> > > Any other IMX8MM or other SoC users know what this could be about or
> > > what I could do for a test to try to reproduce it so I can see if it
> > > occurs in other kernel versions?
> >
> > Hi Tim,
> >
> > I have been debugging this issue these days, but unfortunately I have
> > not found the root cause yet.
> > The Doorbell, Dev queue and Dev Pend register values look abnormal.
> > This issue happens on all i.MX SoCs that support the cmdq feature when
> > the CPU load is high. I currently lack an MMC logic analyzer, which makes
> > this issue hard to debug, so I still need some time. Sorry about that.
> > If you want MMC to work stably, you can disable cmdq as a workaround.
> >
> > Best Regards
> > Haibo Chen
> 
> Haibo,
> 
> Thanks for the information. Do you know how to easily reproduce it
> reliably for testing?

Not yet. I can only hit this issue randomly, after a few hours of stress
testing under high CPU load.

My next steps are:
1. Find a way to reproduce this issue easily.
2. Get an eMMC logic analyzer.
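
Since the failure only shows up randomly after hours of stress, an unattended run needs some way to notice it in the kernel log. A minimal sketch, assuming the log has been captured as text (for example from `dmesg`) and that the dump lines keep the `mmc2: cqhci: Name: value | Name: value` layout quoted above. The function name `parse_cqhci_dump` is hypothetical and not part of any kernel tooling:

```python
import re

# One "Name: 0x........" pair as printed in the CQHCI register dump.
PAIR_RE = re.compile(r"([A-Za-z][A-Za-z0-9 _\[\]-]*?):\s+(0x[0-9a-fA-F]+)")


def parse_cqhci_dump(log_text):
    """Return {register name: int} for every 'cqhci:' dump line in log_text."""
    regs = {}
    marker = "cqhci: "
    for line in log_text.splitlines():
        pos = line.find(marker)
        if pos < 0:
            continue  # not a CQHCI dump line (e.g. an sdhci: line)
        payload = line[pos + len(marker):]
        for name, value in PAIR_RE.findall(payload):
            regs[name.strip()] = int(value, 16)
    return regs


# Two lines copied from the dump in this thread.
sample = """\
[  224.395129] mmc2: cqhci: Doorbell:  0xbf01dfff | TCN:      0x00000000
[  224.401598] mmc2: cqhci: Dev queue: 0x00000000 | Dev Pend: 0x08000000
"""
regs = parse_cqhci_dump(sample)
print(hex(regs["Doorbell"]), hex(regs["Dev Pend"]))
```

An empty result means no CQHCI dump was found in the captured log, so a stress loop could keep running until the dictionary is non-empty.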


> 
> I have tried the following on an eMMC filesystem:
> stress --cpu 32 --io 32 &
> dd if=/dev/zero of=foo bs=1M count=1000 &
> dd if=/dev/zero of=foo bs=1M count=1000 &
> rm foo
> 
> I'm unable to reproduce the issue that way, and it has only happened
> randomly once or twice.
> 
> Perhaps we should disable CMDQ for now until you can sort this out? I can
> submit a patch for that.

Yes, please.

Best Regards
Haibo Chen
> 
> Best regards,
> 
> Tim
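
The Doorbell and Dev Pend values in the register dump are easier to compare once they are expanded into bit positions. A minimal sketch, assuming the usual CQHCI convention of one bit per task slot in these 32-bit registers. `set_bits` is a hypothetical helper name, and the two constants are copied from the dump in this thread:

```python
def set_bits(mask, width=32):
    """Return the bit positions that are set in mask, lowest first."""
    return [bit for bit in range(width) if mask & (1 << bit)]


doorbell = 0xBF01DFFF  # "Doorbell" value from the dump
dev_pend = 0x08000000  # "Dev Pend" value from the dump

print("doorbell slots:   ", set_bits(doorbell))
print("dev pending slots:", set_bits(dev_pend))
```

With these values, Dev Pend has only bit 27 set while Doorbell has 23 bits set, which is one way to see the mismatch described as abnormal earlier in the thread.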


Thread overview: 11+ messages
2021-10-29 20:47 IMX8MM eMMC CQHCI timeout Tim Harvey
2021-11-01  1:57 ` Bough Chen
2021-11-03 16:49   ` Tim Harvey
2021-11-03 16:54     ` [PATCH] mmc: sdhci-esdhc-imx: disable CMDQ support Tim Harvey
2021-11-03 17:12       ` Fabio Estevam
2021-11-04  2:06       ` Bough Chen
2021-11-04  2:06         ` Bough Chen
2021-11-05  7:56       ` Adrian Hunter
2021-11-15 14:54       ` Ulf Hansson
2021-11-03 17:21     ` IMX8MM eMMC CQHCI timeout Marcel Ziswiler
2021-11-04  2:13     ` Bough Chen [this message]
