From: Alexander Dahl <ada@thorsis.com>
To: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: Alexander Dahl <ada@thorsis.com>,
	linux-mtd@lists.infradead.org,
	Richard Weinberger <richard@nod.at>,
	Vignesh Raghavendra <vigneshr@ti.com>,
	linux-kernel@vger.kernel.org
Subject: Re: mtd: nand: raw: Possible bug in nand_onfi_detect()?
Date: Thu, 7 Mar 2024 17:02:16 +0100
Message-ID: <20240307-pantry-deceit-78ce20f47899@thorsis.com>
In-Reply-To: <20240306164831.29eed907@xps-13>

Hello Miquel,

thanks for looking into this, see my remarks below.

On Wed, Mar 06, 2024 at 04:48:31PM +0100, Miquel Raynal wrote:
> Hi Alexander,
> 
> ada@thorsis.com wrote on Wed, 6 Mar 2024 15:36:04 +0100:
> 
> > Hello everyone,
> > 
> > I think I found a bug in nand_onfi_detect() which was introduced with
> > commit c27842e7e11f ("mtd: rawnand: onfi: Adapt the parameter page
> > read to constraint controllers") back in 2020.
> 
> Interesting. I don't think this patch broke anything, as
> constrained controllers would just not support the read_data_op() call
> anyway.
> 
> That being said, I don't see why the atmel controller would
> refuse this operation, as it is supposed to support all
> operations without limitation. This is one of the three issues
> you have, that probably needs fixing.

I found a flaw in my debug messages hiding the underlying issue for
this.  I'm afraid this is another bug introduced by you with commit
9f820fc0651c ("mtd: rawnand: Check the data only read pattern only
once").  See this line in rawnand_check_data_only_read_support():

    if (!nand_read_data_op(chip, NULL, SZ_512, true, true))

This makes nand_read_data_op() return -EINVAL, because it rejects the
call when its second argument (the buffer) is NULL.

I guess not only the atmel nand controller is affected here, but _all_
nand controllers?  The flag can never be set, and so use_datain is
false here?
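
To make the suspected failure concrete, here is a minimal standalone
sketch, not the kernel code itself.  The function names are
hypothetical stand-ins, and the "relaxed" variant is only one possible
way to handle the check-only case, assuming a buffer is needed only
when data is actually transferred:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the argument check described above:
 * a NULL buffer is rejected even for a check-only call.
 */
int read_data_op_strict(void *buf, unsigned int len, bool check_only)
{
	(void)check_only;	/* not considered by the strict check */

	if (!len || !buf)
		return -EINVAL;

	return 0;
}

/*
 * Assumption: one possible relaxation, where the buffer is only
 * required when the operation really moves data.
 */
int read_data_op_relaxed(void *buf, unsigned int len, bool check_only)
{
	if (!len || (!check_only && !buf))
		return -EINVAL;

	return 0;
}

/*
 * Mirrors the shape of the probe: support is assumed when the
 * check-only call with a NULL buffer returns 0.
 */
bool probe_data_only_read(int (*op)(void *, unsigned int, bool))
{
	return !op(NULL, 512, true);
}
```

With the strict variant the probe can never report support, whatever
the controller is capable of; with the relaxed one the result depends
only on the remaining checks.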

> > Background on how I found this: I'm currently struggling getting raw
> > nand flash access to fly with an at91 sam9x60 SoC and a S34ML02G1
> > Spansion SLC raw NAND flash on a custom board.  The setup is
> > comparable to the sam9x60 curiosity board and can be reproduced with
> > that one.
> > 
> > NAND flash on sam9x60 curiosity board works fine with what is in
> > mainline Linux kernel.  However after removing the line 'rb-gpios =
> > <&pioD 5 GPIO_ACTIVE_HIGH>;' from at91-sam9x60_curiosity.dts all data
> > read from the flash appears to be zeros only.  (I did not add that
> > line to the dts of my custom board first, this is how I stumbled over
> > this.)
> > 
> > I have no explanation for that behaviour, it should work without R/B#
> > by reading the status register, maybe we can investigate that
> > in depth later.
> 
> I don't see why at a first look. The default is "no RB" if no property
> is given in the DT so it should work.

Correct, nand_soft_waitrdy() is used in that case.

> Tracing the wait ready function calls might help.

Did that already.  On each call here the status register read contains
E0h and nand_soft_waitrdy() returns without error, because the
NAND_STATUS_READY flag is set.  Everything looks fine at that point,
although the data read afterwards is not.
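
For reference, a small standalone sketch of how a status byte of E0h
decodes.  The bit values are the ones I know from
include/linux/mtd/rawnand.h; the helper names are made up for this
example:

```c
#include <stdbool.h>
#include <stdint.h>

/* Status register bits, values as in include/linux/mtd/rawnand.h */
#define NAND_STATUS_FAIL	0x01
#define NAND_STATUS_TRUE_READY	0x20
#define NAND_STATUS_READY	0x40
#define NAND_STATUS_WP		0x80

/* Hypothetical helper: true if the chip reports ready */
bool status_is_ready(uint8_t status)
{
	return (status & NAND_STATUS_READY) != 0;
}

/* Hypothetical helper: true if the last operation is flagged as failed */
bool status_has_failed(uint8_t status)
{
	return (status & NAND_STATUS_FAIL) != 0;
}
```

E0h is 80h | 40h | 20h, so the ready bit is set and the fail bit is
clear.  From the status register alone a soft wait has no reason to
complain.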

> >  However those all zeros data reads happen when
> > reading the ONFI param page as well as data read from OOB/spare area
> > later and I bet it's the same with usual data.
> 
> Reading data without observing tWB + tR may lead to this.

I already suspected some timing issue.  Deeper investigation will have
to wait, however, until we have soldered some wires to the chip and
connected a logic analyzer.  At least that's the plan, but it will
take some days until I have finished some other tasks.

> > This read error reveals a bug in nand_onfi_detect().  After setting
> > up some things there's this for loop:
> > 
> >     for (i = 0; i < ONFI_PARAM_PAGES; i++) {
> > 
> > For i = 0 nand_read_param_page_op() is called and in my case all zeros
> > are returned and thus the CRC calculated does not match the all zeros
> > CRC read.  So the usual break on successful reading the first page is
> > skipped and for reading the second page nand_change_read_column_op()
> > is called.  I think that one always fails on this line:
> > 
> >     if (offset_in_page + len > mtd->writesize + mtd->oobsize) {
> > 
> > Those variables contain the following values:
> > 
> >     offset_in_page: 256
> >     len: 256
> >     mtd->writesize: 0
> >     mtd->oobsize: 0
> 
> Indeed. We probably need some kind of extra check that does not perform
> the if clause above if !mtd->writesize.
> 
> > The condition is true and nand_change_read_column_op() returns with
> > -EINVAL, because mtd->writesize and mtd->oobsize are not set yet in
> > that code path.  Those are probably initialized later, maybe with
> > parameters read from that ONFI param page?
> > 
> > Returning with error from nand_change_read_column_op() leads to
> > jumping out of nand_onfi_detect() early, and no ONFI param page is
> > evaluated at all, although the second or third page could be intact.
> > 
> > I guess this would also fail with any other reason for not matching
> > CRCs in the first page, but I have no faulty NAND flash chip to
> > confirm that.
> 
> Thanks for the whole report, it is interesting and should lead to fixes:
> - why does the controller refuse the datain op?

See above.

> - why nand_soft_waitrdy is not enough?

I don't know.  That's one reason I asked here.

> - changing the condition in nand_change_read_column_op()
> 
> Can you take care of these?

The last one probably, after another in-depth reading of the code.
I'm unsure about the other two.
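
To illustrate the last point, a minimal standalone sketch of the range
check with the values from my report.  The function names are
hypothetical, and the guarded variant only follows the idea of
skipping the check while mtd->writesize is still zero; it is an
assumption, not a tested fix:

```c
#include <errno.h>

/*
 * Hypothetical model of the existing condition: the requested column
 * range must fit into page plus OOB.
 */
int check_column_range(unsigned int writesize, unsigned int oobsize,
		       unsigned int offset_in_page, unsigned int len)
{
	if (offset_in_page + len > writesize + oobsize)
		return -EINVAL;

	return 0;
}

/*
 * Assumption: skip the range check as long as the geometry is not
 * known yet (writesize == 0), as is the case during ONFI detection.
 */
int check_column_range_guarded(unsigned int writesize, unsigned int oobsize,
			       unsigned int offset_in_page, unsigned int len)
{
	if (writesize && offset_in_page + len > writesize + oobsize)
		return -EINVAL;

	return 0;
}
```

With writesize and oobsize both 0, offset 256 and len 256, the first
variant returns -EINVAL while the guarded one lets the request
through.  Once the geometry is known (for example 2048 + 64 bytes),
both behave the same.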

Greets
Alex


______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

Thread overview: 7+ messages
2024-03-06 14:36 mtd: nand: raw: Possible bug in nand_onfi_detect()? Alexander Dahl
2024-03-06 15:48 ` Miquel Raynal
2024-03-07 16:02   ` Alexander Dahl [this message]
2024-03-07 17:19     ` Miquel Raynal
2024-03-25  9:09       ` Miquel Raynal
2024-03-25  9:59         ` Alexander Dahl
2024-05-07 16:08 ` Miquel Raynal
