* NAND BBT corruption on MPC83xx
@ 2011-06-17 20:54 Matthew L. Creech
  2011-06-17 21:34 ` Scott Wood
  0 siblings, 1 reply; 19+ messages in thread
From: Matthew L. Creech @ 2011-06-17 20:54 UTC (permalink / raw)
  To: linuxppc-dev

Hi, I posted this on the Linux-MTD list but haven't gotten any hits.
Since it looks like it could be MPC83xx-specific, I'm reposting here.
Rick Johnson noted a problem in fsl_elbc_nand.c back in May which
might be related:

http://lists.infradead.org/pipermail/linux-mtd/2011-May/035372.html


We've gotten some devices back from the field which all suffer from
this same problem on bootup when attaching UBI (these messages are
from U-Boot):

...
Bad block table found at page 524224, version 0x01
Bad block table found at page 524160, version 0x01
nand_bbt: ECC error while reading bad block table
...(long stream of bogus bad blocks)...
UBI: attaching mtd1 to ubi0
UBI: physical eraseblock size:   131072 bytes (128 KiB)
UBI: logical eraseblock size:    129024 bytes
UBI: smallest flash I/O unit:    2048
UBI: sub-page size:              512
UBI: VID header offset:          512 (aligned 512)
UBI: data offset:                2048
UBI error: vtbl_check: volume table check failed: record 0, error 9
UBI error: ubi_init: cannot attach mtd1
UBI error: ubi_init: UBI error: cannot initialize UBI, error -22
UBI init error -22

Full console dumps from 2 devices are here:

http://mcreech.com/work/bbt-ecc-error.txt
http://mcreech.com/work/bbt-ecc-error2.txt

Another device encountered a slightly different error, which I
assume is due to the same underlying problem:

UBI error: init_volumes: not enough PEBs, required 8061, available 8059
UBI error: ubi_wl_init_scan: no enough physical eraseblocks (-2, need 1)

A full dump from that one is here:

http://mcreech.com/work/bbt-ecc-error3.txt

Are there any known issues that could cause the BBT to
become corrupt like this?

I noticed that the reported bad blocks were all aligned at multiples
of 0x80000 (with one exception).  Dump #1 shows:
 - one BBT with lots of bytes that have their lowest 1 or 2 bits
cleared (e.g. 0xfe instead of 0xff).  Each BBT byte covers four
128 KiB blocks, so this explains the every-4th-block alignment.
 - the other BBT shows only one factory-marked bad block at
0x062e0000, which is presumably correct.  This is preserved in the
bogus BBT, and is the only non-0x80000-aligned bad block in the table.
 - only the first 1024 bytes of the BBT contain bogus info; the
latter half of the BBT is all correct.
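
To make the alignment argument concrete, here is a minimal sketch of how
such a table decodes.  It assumes the usual Linux on-flash layout (two
bits per block, lowest bits of a byte describing the lowest-numbered
block, 11b meaning "good") and the 128 KiB eraseblock size from the UBI
log above.  The function names are made up for illustration and are not
taken from nand_bbt.c.

```c
#include <stdint.h>
#include <stdio.h>

#define BLOCK_SIZE      0x20000u  /* 128 KiB eraseblock, from the UBI log */
#define BITS_PER_BLOCK  2u        /* assumed: two BBT bits per block */
#define BLOCKS_PER_BYTE (8u / BITS_PER_BLOCK)

/* Return the 2-bit BBT entry for a block; 0x3 is assumed to mean "good". */
static unsigned bbt_entry(const uint8_t *bbt, unsigned block)
{
	unsigned shift = (block % BLOCKS_PER_BYTE) * BITS_PER_BLOCK;

	return (bbt[block / BLOCKS_PER_BYTE] >> shift) & 0x3u;
}

/* Print the flash offset of every block whose entry is not "good"
 * and return how many such blocks were found. */
static unsigned list_bad_blocks(const uint8_t *bbt, unsigned nblocks)
{
	unsigned block, bad = 0;

	for (block = 0; block < nblocks; block++) {
		if (bbt_entry(bbt, block) != 0x3u) {
			printf("bad block at 0x%08lx\n",
			       (unsigned long)block * BLOCK_SIZE);
			bad++;
		}
	}
	return bad;
}
```

Under these assumptions a byte of 0xfe marks only the first of its four
blocks as bad, and the first block of byte N starts at N * 0x80000.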

It seems like the original BBT somehow had 0-2 bits corrupted at the
low end of each of its bytes, either while in memory or when the BBT
was written to NAND.  Any ideas on what I can do to isolate the
problem?  Thanks in advance!

More info on this board:
- MPC 8313 SoC
- 1GB Samsung NAND flash (K9K8G08U0B)
- Linux 2.6.31
- U-Boot 2009.06

-- 
Matthew L. Creech

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: NAND BBT corruption on MPC83xx
  2011-06-17 20:54 NAND BBT corruption on MPC83xx Matthew L. Creech
@ 2011-06-17 21:34 ` Scott Wood
  2011-06-18 17:55     ` Mike Hench
                     ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Scott Wood @ 2011-06-17 21:34 UTC (permalink / raw)
  To: Matthew L. Creech; +Cc: linux-mtd, linuxppc-dev

On Fri, 17 Jun 2011 16:54:27 -0400
"Matthew L. Creech" <mlcreech@gmail.com> wrote:

> Hi, I posted this on the Linux-MTD list but haven't gotten any hits.
> Since it looks like it could be MPC83xx-specific, I'm reposting here.
> Rick Johnson noted a problem in fsl_elbc_nand.c back in May which
> might be related:
> 
> http://lists.infradead.org/pipermail/linux-mtd/2011-May/035372.html

It seems that the generic code always passes -1 with PAGEPROG, and only
provides the actual page address on SEQIN.

I don't think the ECC readback is needed, and the fact that it looks like
it has always been broken would seem to confirm that.  It's broken in
other ways, too -- it assumes a particular ECC layout.  Let's get rid of it.

As for the corruption, could it be degradation from repeated reads of that
one page?

> More info on this board:
> - MPC 8313 SoC
> - 1GB Samsung NAND flash (K9K8G08U0B)
> - Linux 2.6.31
> - U-Boot 2009.06

Hmm, 2.6.31... it's probably not related to this problem, but you
should cherry pick b3a70f0bc32d1b70584bcaa6019fa4260b0da92e and
476459a6cf46d20ec73d9b211f3894ced5f9871e.

-Scott

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: NAND BBT corruption on MPC83xx
  2011-06-17 21:34 ` Scott Wood
@ 2011-06-18 17:55     ` Mike Hench
  2011-06-20 15:20   ` Matthew L. Creech
  2011-07-05 19:58     ` Matthew L. Creech
  2 siblings, 0 replies; 19+ messages in thread
From: Mike Hench @ 2011-06-18 17:55 UTC (permalink / raw)
  To: Scott Wood, Matthew L. Creech; +Cc: linuxppc-dev, linux-mtd

Scott Wood wrote:
> As for the corruption, could it be degradation from repeated reads of
> that one page?

Read disturb.  I did not know SLC did that.
It just takes 10x as long as MLC, on the order of a million reads.
Supposedly erasing the block fixes it.
It is not a permanent damage thing.
I was seeing ~9 hours before failure with heavy writes.
~4GByte/hour = 2M pages, total ~18 million reads before errors in that
last block showed up.
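
The figures above can be checked with a short calculation.  This is only
a sketch: it assumes 2 KiB pages, binary gigabytes, and one read of the
last block per programmed page, none of which is stated in this message.
NAND_PAGE_SIZE and pages_per_hour() are names invented for the example.

```c
#include <stdint.h>

#define NAND_PAGE_SIZE 2048u  /* assumed 2 KiB page */

/* Pages programmed per hour at a given write rate in GiB per hour. */
static uint64_t pages_per_hour(uint64_t gib_per_hour)
{
	return gib_per_hour * 1024u * 1024u * 1024u / NAND_PAGE_SIZE;
}
```

With these assumptions, 4 GiB/hour is 2,097,152 pages per hour, and nine
hours of that is about 18.9 million reads.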

Cool. Now we know.
Thanks.

Mike Hench

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: NAND BBT corruption on MPC83xx
  2011-06-18 17:55     ` Mike Hench
@ 2011-06-20 11:22       ` Atlant Schmidt
  -1 siblings, 0 replies; 19+ messages in thread
From: Atlant Schmidt @ 2011-06-20 11:22 UTC (permalink / raw)
  To: 'Mike Hench', Scott Wood, Matthew L. Creech
  Cc: linux-mtd, linuxppc-dev

Mike:

> It is not a permanent damage thing.

  A "read disturb" does no permanent damage to the chip
  but if the read disturb event involves more bits than
  can be corrected by your ECC code, it can do permanent
  damage to the *DATA* you've stored in that block.

  For this reason, a good flash management system manages
  to at least occasionally read through *ALL* of the in-use
  blocks in the device so that single-bit errors can be
  scrubbed out (read and successfully corrected) before
  an adjacent bit in the block also fails (which would
  eventually lead to a multi-bit error that might be
  beyond the ability to be corrected by the ECC).

  As far as I know (and I'm sure the list will correct
  me if I'm wrong! ;-) ), neither UBI nor UBIFS nor any
  Linux layer provides this routine scrubbing; you have
  to code it up yourself, probably by accessing the
  device at the UBI (underlying block device/LEB) layer.

                        Atlant

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: NAND BBT corruption on MPC83xx
  2011-06-17 21:34 ` Scott Wood
  2011-06-18 17:55     ` Mike Hench
@ 2011-06-20 15:20   ` Matthew L. Creech
  2011-07-05 19:58     ` Matthew L. Creech
  2 siblings, 0 replies; 19+ messages in thread
From: Matthew L. Creech @ 2011-06-20 15:20 UTC (permalink / raw)
  To: Scott Wood; +Cc: linux-mtd, linuxppc-dev, mhench

On Fri, Jun 17, 2011 at 5:34 PM, Scott Wood <scottwood@freescale.com> wrote:
>
> As for the corruption, could it be degradation from repeated reads of that
> one page?
>

Could be.  I think Mike's theory was that the -1 page_addr sort of
"wrapped around", and caused us to read in the last block on flash
each time NAND_CMD_PAGEPROG was performed.  So with a lot of writes
happening, we could end up with a BBT that looks like this.

That makes sense I guess, since set_addr() in fsl_elbc_nand.c uses
page_addr to set FBAR.  I don't see anything about it in the manual,
but if FBAR wraps beyond the end of the chip, maybe the bits that
don't make sense are simply ignored.  (In which case we should
probably add a check in set_addr() to prevent anything like this in
the future.)
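
A minimal sketch of that hypothesis and of the kind of check meant here.
This is not the driver's set_addr() and not documented controller
behaviour: it simply models "out-of-range address bits are ignored" for
a 1 GiB chip with 2 KiB pages and 128 KiB blocks.  wrapped_block() and
page_addr_valid() are hypothetical helpers.

```c
#include <stdint.h>

#define PAGES_PER_BLOCK 64u    /* 128 KiB block / 2 KiB page */
#define NUM_BLOCKS      8192u  /* 1 GiB chip / 128 KiB block */

/* Hypothetical model: the block index the chip would see if address
 * bits beyond the end of the device were silently dropped. */
static uint32_t wrapped_block(int page_addr)
{
	uint32_t block = (uint32_t)page_addr / PAGES_PER_BLOCK;

	return block & (NUM_BLOCKS - 1u);
}

/* Hypothetical guard: nonzero only for pages that exist on the chip. */
static int page_addr_valid(int page_addr)
{
	return page_addr >= 0 &&
	       (uint32_t)page_addr < NUM_BLOCKS * PAGES_PER_BLOCK;
}
```

In this model a page_addr of -1 lands in block 8191, the same block as
page 524224 where U-Boot reports the first BBT, and the guard rejects it
before any address register would be written.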

In theory I should be able to prove it out by running 2 devices in
parallel - one with that block of code still there, and one with it
removed.  If the former device sees bit-flips in the BBT and the
latter one doesn't, we'll be sure of the culprit.  I'll try this and
come back with the results.

Thanks!

-- 
Matthew L. Creech

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: NAND BBT corruption on MPC83xx
  2011-06-20 11:22       ` Atlant Schmidt
  (?)
@ 2011-06-23  8:31       ` Artem Bityutskiy
  -1 siblings, 0 replies; 19+ messages in thread
From: Artem Bityutskiy @ 2011-06-23  8:31 UTC (permalink / raw)
  To: Atlant Schmidt
  Cc: Scott Wood, linuxppc-dev, linux-mtd, Matthew L. Creech,
	'Mike Hench'

On Mon, 2011-06-20 at 07:22 -0400, Atlant Schmidt wrote:
> 
>   As far as I know (and I'm sure the list will correct
>   me if I'm wrong! ;-) ), neither UBI nor UBIFS nor any
>   Linux layer provides this routine scrubbing; you have
>   to code it up yourself, probably by accessing the
>   device at the UBI (underlying block device/LEB) layer. 

UBI will scrub all LEBs with bit-flips once they are read.
But if you have bit-flips in an LEB and it is never read, it will never
be scrubbed. And erasures of the neighboring PEBs may turn bit-flips
into hard errors.

To force scrubbing, the easiest way is to just read all volumes, like

dd if=/dev/ubi0_i of=/dev/null bs=4096

for each i.
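
If the read has to be done from a program rather than from the shell,
the same idea can be sketched in C.  This is only an illustration of
"read the whole volume and throw the data away".  read_through() is a
made-up helper, and the caller is assumed to pass a volume node such as
/dev/ubi0_0.

```c
#include <fcntl.h>
#include <unistd.h>

/* Read a file or device node to the end, discarding the data.
 * Returns the number of bytes read, or -1 on error. */
static long long read_through(const char *path)
{
	char buf[4096];
	long long total = 0;
	ssize_t n;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	while ((n = read(fd, buf, sizeof(buf))) > 0)
		total += n;
	close(fd);
	return n < 0 ? -1 : total;
}
```

The sketch does not retry interrupted reads, so a real tool would want
more careful error handling.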

-- 
Best Regards,
Artem Bityutskiy

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: NAND BBT corruption on MPC83xx
  2011-06-17 21:34 ` Scott Wood
@ 2011-07-05 19:58     ` Matthew L. Creech
  2011-06-20 15:20   ` Matthew L. Creech
  2011-07-05 19:58     ` Matthew L. Creech
  2 siblings, 0 replies; 19+ messages in thread
From: Matthew L. Creech @ 2011-07-05 19:58 UTC (permalink / raw)
  To: Scott Wood; +Cc: linux-mtd, linuxppc-dev

On Fri, Jun 17, 2011 at 5:34 PM, Scott Wood <scottwood@freescale.com> wrote:
>
> It seems that the generic code always passes -1 with PAGEPROG, and only
> provides the actual page address on SEQIN.
>
> I don't think the ECC readback is needed, and the fact that it looks like
> it has always been broken would seem to confirm that.  It's broken in
> other ways, too -- it assumes a particular ECC layout.  Let's get rid of it.
>
> As for the corruption, could it be degradation from repeated reads of that
> one page?
>

I modified nanddump to do repeated reads, and compare the data
obtained from the first iteration with that obtained later (to detect
bit-flips).  I tried 3 different variations:

- one which reads the first page (2k) of the last block
- one which reads the second page (2k) of the last block
- one which reads the entire last block (128k), just for comparison

As I understand it, read-disturb would primarily come into play when
the second page is read, since it's adjacent to the first page (please
correct me if I'm wrong there).  Anyway, all 3 of these tests were run
for at least 50 million read cycles, with no bit-flips detected.  So
I'm somewhat doubtful that this is the cause of the BBT corruption
I've been seeing.
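
The comparison step of such a test can be sketched as follows.  This is
not the modified nanddump itself, just a minimal illustration of
counting differing bits between the first read of a page and a later
one.  count_bit_flips() is a hypothetical helper, and whether the
buffers hold raw or ECC-corrected data depends on how the pages are
read.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the bits that differ between a reference copy of a page and
 * a later read of the same page. */
static unsigned count_bit_flips(const uint8_t *ref, const uint8_t *cur,
				size_t len)
{
	unsigned flips = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		uint8_t diff = ref[i] ^ cur[i];

		while (diff) {
			diff &= (uint8_t)(diff - 1); /* clear lowest set bit */
			flips++;
		}
	}
	return flips;
}
```

A result of zero on every iteration corresponds to the "no bit-flips
detected" outcome described above.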

====

Separately, I set up 2 test devices to run while I was away last week.
 One of them contained 2 patches:

- Mike Hench's patch which eliminates this block of code in fsl_elbc_nand.c
- Adam Thomson's patch
(http://lists.infradead.org/pipermail/linux-mtd/2011-June/036427.html)
which initializes oob_poi correctly

Upon my return, the device with these patches saw no problems at all,
and had no additional bad blocks.  The device without these patches
had some 200+ blocks which had been newly marked as bad in the BBT
over the course of 10 days.  After rebooting, this latter device then
failed to boot, as shown here:

http://mcreech.com/work/bbt-ecc-error4.txt

I'm currently running another test to verify which of the two patches
actually fixed this problem (which might take a few days), but it
seems like removing that block of code in fsl_elbc_nand.c is a good
idea.

-- 
Matthew L. Creech

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH] mtd: eLBC NAND: remove bogus ECC read-back
  2011-07-05 19:58     ` Matthew L. Creech
  (?)
@ 2011-07-05 19:59     ` Matthew L. Creech
  2011-07-05 20:15       ` Scott Wood
  -1 siblings, 1 reply; 19+ messages in thread
From: Matthew L. Creech @ 2011-07-05 19:59 UTC (permalink / raw)
  To: linux-mtd; +Cc: scottwood, linuxppc-dev, rick22, mhench

From: Mike Hench <mhench@elutions.com>

The eLBC NAND driver currently follows up each program/write operation with a
read-back of the page, in order to [ostensibly] fill in ECC data for the
caller. However, the page address used for this read is always -1, so the read
will never work correctly.  Remove this useless (and potentially problematic)
block of code.

Signed-off-by: Matthew L. Creech <mlcreech@gmail.com>
---
 drivers/mtd/nand/fsl_elbc_nand.c |   17 -----------------
 1 files changed, 0 insertions(+), 17 deletions(-)

diff --git a/drivers/mtd/nand/fsl_elbc_nand.c b/drivers/mtd/nand/fsl_elbc_nand.c
index 0bb254c..050a2fc 100644
--- a/drivers/mtd/nand/fsl_elbc_nand.c
+++ b/drivers/mtd/nand/fsl_elbc_nand.c
@@ -455,23 +455,6 @@ static void fsl_elbc_cmdfunc(struct mtd_info *mtd, unsigned int command,
 
 		fsl_elbc_run_command(mtd);
 
-		/* Read back the page in order to fill in the ECC for the
-		 * caller.  Is this really needed?
-		 */
-		if (full_page && elbc_fcm_ctrl->oob_poi) {
-			out_be32(&lbc->fbcr, 3);
-			set_addr(mtd, 6, page_addr, 1);
-
-			elbc_fcm_ctrl->read_bytes = mtd->writesize + 9;
-
-			fsl_elbc_do_read(chip, 1);
-			fsl_elbc_run_command(mtd);
-
-			memcpy_fromio(elbc_fcm_ctrl->oob_poi + 6,
-				&elbc_fcm_ctrl->addr[elbc_fcm_ctrl->index], 3);
-			elbc_fcm_ctrl->index += 3;
-		}
-
 		elbc_fcm_ctrl->oob_poi = NULL;
 		return;
 	}
-- 
1.6.3.3

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH] mtd: eLBC NAND: remove bogus ECC read-back
  2011-07-05 19:59     ` [PATCH] mtd: eLBC NAND: remove bogus ECC read-back Matthew L. Creech
@ 2011-07-05 20:15       ` Scott Wood
  2011-07-05 22:35         ` [PATCH v2] mtd: eLBC NAND: remove elbc_fcm_ctrl->oob_poi Matthew L. Creech
  0 siblings, 1 reply; 19+ messages in thread
From: Scott Wood @ 2011-07-05 20:15 UTC (permalink / raw)
  To: Matthew L. Creech; +Cc: linuxppc-dev, linux-mtd, mhench, rick22

On Tue, 5 Jul 2011 15:59:57 -0400
"Matthew L. Creech" <mlcreech@gmail.com> wrote:

> From: Mike Hench <mhench@elutions.com>
> 
> The eLBC NAND driver currently follows up each program/write operation with a
> read-back of the page, in order to [ostensibly] fill in ECC data for the
> caller. However, the page address used for this read is always -1, so the read
> will never work correctly.  Remove this useless (and potentially problematic)
> block of code.
> 
> Signed-off-by: Matthew L. Creech <mlcreech@gmail.com>
> ---
>  drivers/mtd/nand/fsl_elbc_nand.c |   17 -----------------
>  1 files changed, 0 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/mtd/nand/fsl_elbc_nand.c b/drivers/mtd/nand/fsl_elbc_nand.c
> index 0bb254c..050a2fc 100644
> --- a/drivers/mtd/nand/fsl_elbc_nand.c
> +++ b/drivers/mtd/nand/fsl_elbc_nand.c
> @@ -455,23 +455,6 @@ static void fsl_elbc_cmdfunc(struct mtd_info *mtd, unsigned int command,
>  
>  		fsl_elbc_run_command(mtd);
>  
> -		/* Read back the page in order to fill in the ECC for the
> -		 * caller.  Is this really needed?
> -		 */
> -		if (full_page && elbc_fcm_ctrl->oob_poi) {
> -			out_be32(&lbc->fbcr, 3);
> -			set_addr(mtd, 6, page_addr, 1);
> -
> -			elbc_fcm_ctrl->read_bytes = mtd->writesize + 9;
> -
> -			fsl_elbc_do_read(chip, 1);
> -			fsl_elbc_run_command(mtd);
> -
> -			memcpy_fromio(elbc_fcm_ctrl->oob_poi + 6,
> -				&elbc_fcm_ctrl->addr[elbc_fcm_ctrl->index], 3);
> -			elbc_fcm_ctrl->index += 3;
> -		}
> -
>  		elbc_fcm_ctrl->oob_poi = NULL;
>  		return;
>  	}

All references to elbc_fcm_ctrl->oob_poi (not chip->oob_poi) can be removed
now.

-Scott

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v2] mtd: eLBC NAND: remove elbc_fcm_ctrl->oob_poi
  2011-07-05 20:15       ` Scott Wood
@ 2011-07-05 22:35         ` Matthew L. Creech
  2011-07-05 23:01           ` Scott Wood
  2011-07-05 23:14           ` [PATCH v3] " Matthew L. Creech
  0 siblings, 2 replies; 19+ messages in thread
From: Matthew L. Creech @ 2011-07-05 22:35 UTC (permalink / raw)
  To: linux-mtd; +Cc: scottwood, linuxppc-dev, rick22, mhench

From: Mike Hench <mhench@elutions.com>

The eLBC NAND driver currently follows up each program/write operation with a
read-back of the page, in order to [ostensibly] fill in ECC data for the
caller. However, the page address used for this read is always -1, so the read
will never work correctly.  Remove this useless (and potentially problematic)
block of code.

v2: elbc_fcm_ctrl->oob_poi is removed entirely, since this code block was the
only place it was actually used.

Signed-off-by: Matthew L. Creech <mlcreech@gmail.com>
---
 drivers/mtd/nand/fsl_elbc_nand.c |   25 -------------------------
 1 files changed, 0 insertions(+), 25 deletions(-)

diff --git a/drivers/mtd/nand/fsl_elbc_nand.c b/drivers/mtd/nand/fsl_elbc_nand.c
index 0bb254c..5e4fbf5 100644
--- a/drivers/mtd/nand/fsl_elbc_nand.c
+++ b/drivers/mtd/nand/fsl_elbc_nand.c
@@ -75,7 +75,6 @@ struct fsl_elbc_fcm_ctrl {
 	unsigned int use_mdr;    /* Non zero if the MDR is to be set      */
 	unsigned int oob;        /* Non zero if operating on OOB data     */
 	unsigned int counter;	 /* counter for the initializations	  */
-	char *oob_poi;           /* Place to write ECC after read back    */
 };
 
 /* These map to the positions used by the FCM hardware ECC generator */
@@ -454,25 +453,6 @@ static void fsl_elbc_cmdfunc(struct mtd_info *mtd, unsigned int command,
 		}
 
 		fsl_elbc_run_command(mtd);
-
-		/* Read back the page in order to fill in the ECC for the
-		 * caller.  Is this really needed?
-		 */
-		if (full_page && elbc_fcm_ctrl->oob_poi) {
-			out_be32(&lbc->fbcr, 3);
-			set_addr(mtd, 6, page_addr, 1);
-
-			elbc_fcm_ctrl->read_bytes = mtd->writesize + 9;
-
-			fsl_elbc_do_read(chip, 1);
-			fsl_elbc_run_command(mtd);
-
-			memcpy_fromio(elbc_fcm_ctrl->oob_poi + 6,
-				&elbc_fcm_ctrl->addr[elbc_fcm_ctrl->index], 3);
-			elbc_fcm_ctrl->index += 3;
-		}
-
-		elbc_fcm_ctrl->oob_poi = NULL;
 		return;
 	}
 
@@ -752,13 +732,8 @@ static void fsl_elbc_write_page(struct mtd_info *mtd,
                                 struct nand_chip *chip,
                                 const uint8_t *buf)
 {
-	struct fsl_elbc_mtd *priv = chip->priv;
-	struct fsl_elbc_fcm_ctrl *elbc_fcm_ctrl = priv->ctrl->nand;
-
 	fsl_elbc_write_buf(mtd, buf, mtd->writesize);
 	fsl_elbc_write_buf(mtd, chip->oob_poi, mtd->oobsize);
-
-	elbc_fcm_ctrl->oob_poi = chip->oob_poi;
 }
 
 static int fsl_elbc_chip_init(struct fsl_elbc_mtd *priv)
-- 
1.6.3.3

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v2] mtd: eLBC NAND: remove elbc_fcm_ctrl->oob_poi
  2011-07-05 22:35         ` [PATCH v2] mtd: eLBC NAND: remove elbc_fcm_ctrl->oob_poi Matthew L. Creech
@ 2011-07-05 23:01           ` Scott Wood
  2011-07-05 23:14             ` Matthew L. Creech
  2011-07-05 23:14           ` [PATCH v3] " Matthew L. Creech
  1 sibling, 1 reply; 19+ messages in thread
From: Scott Wood @ 2011-07-05 23:01 UTC (permalink / raw)
  To: Matthew L. Creech; +Cc: linuxppc-dev, linux-mtd, rick22, mhench

On Tue, 5 Jul 2011 18:35:02 -0400
"Matthew L. Creech" <mlcreech@gmail.com> wrote:

> From: Mike Hench <mhench@elutions.com>
> 
> The eLBC NAND driver currently follows up each program/write operation with a
> read-back of the page, in order to [ostensibly] fill in ECC data for the
> caller. However, the page address used for this read is always -1, so the read
> will never work correctly.  Remove this useless (and potentially problematic)
> block of code.
> 
> v2: elbc_fcm_ctrl->oob_poi is removed entirely, since this code block was the
> only place it was actually used.

Just noticed, full_page can come out as well.

Otherwise, ACK

-Scott

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2] mtd: eLBC NAND: remove elbc_fcm_ctrl->oob_poi
  2011-07-05 23:01           ` Scott Wood
@ 2011-07-05 23:14             ` Matthew L. Creech
  0 siblings, 0 replies; 19+ messages in thread
From: Matthew L. Creech @ 2011-07-05 23:14 UTC (permalink / raw)
  To: Scott Wood; +Cc: linux-mtd, linuxppc-dev, mhench, rick22

On Tue, Jul 5, 2011 at 7:01 PM, Scott Wood <scottwood@freescale.com> wrote:
>
> Just noticed, full_page can come out as well.
>
> Otherwise, ACK
>

Oh right, didn't notice that - thanks.

-- 
Matthew L. Creech

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v3] mtd: eLBC NAND: remove elbc_fcm_ctrl->oob_poi
  2011-07-05 22:35         ` [PATCH v2] mtd: eLBC NAND: remove elbc_fcm_ctrl->oob_poi Matthew L. Creech
  2011-07-05 23:01           ` Scott Wood
@ 2011-07-05 23:14           ` Matthew L. Creech
  2011-07-06  7:23             ` Artem Bityutskiy
  1 sibling, 1 reply; 19+ messages in thread
From: Matthew L. Creech @ 2011-07-05 23:14 UTC (permalink / raw)
  To: linux-mtd; +Cc: scottwood, linuxppc-dev, rick22, mhench

From: Mike Hench <mhench@elutions.com>

The eLBC NAND driver currently follows up each program/write operation with a
read-back of the page, in order to [ostensibly] fill in ECC data for the
caller. However, the page address used for this read is always -1, so the read
will never work correctly.  Remove this useless (and potentially problematic)
block of code.

v2: elbc_fcm_ctrl->oob_poi is removed entirely, since this code block was the
only place it was actually used.

v3: local 'full_page' variable is no longer used either.

Signed-off-by: Matthew L. Creech <mlcreech@gmail.com>
---
 drivers/mtd/nand/fsl_elbc_nand.c |   33 ++-------------------------------
 1 files changed, 2 insertions(+), 31 deletions(-)

diff --git a/drivers/mtd/nand/fsl_elbc_nand.c b/drivers/mtd/nand/fsl_elbc_nand.c
index 0bb254c..b4d310f 100644
--- a/drivers/mtd/nand/fsl_elbc_nand.c
+++ b/drivers/mtd/nand/fsl_elbc_nand.c
@@ -75,7 +75,6 @@ struct fsl_elbc_fcm_ctrl {
 	unsigned int use_mdr;    /* Non zero if the MDR is to be set      */
 	unsigned int oob;        /* Non zero if operating on OOB data     */
 	unsigned int counter;	 /* counter for the initializations	  */
-	char *oob_poi;           /* Place to write ECC after read back    */
 };
 
 /* These map to the positions used by the FCM hardware ECC generator */
@@ -435,7 +434,6 @@ static void fsl_elbc_cmdfunc(struct mtd_info *mtd, unsigned int command,
 
 	/* PAGEPROG reuses all of the setup from SEQIN and adds the length */
 	case NAND_CMD_PAGEPROG: {
-		int full_page;
 		dev_vdbg(priv->dev,
 		         "fsl_elbc_cmdfunc: NAND_CMD_PAGEPROG "
 			 "writing %d bytes.\n", elbc_fcm_ctrl->index);
@@ -445,34 +443,12 @@ static void fsl_elbc_cmdfunc(struct mtd_info *mtd, unsigned int command,
 		 * write so the HW generates the ECC.
 		 */
 		if (elbc_fcm_ctrl->oob || elbc_fcm_ctrl->column != 0 ||
-		    elbc_fcm_ctrl->index != mtd->writesize + mtd->oobsize) {
+		    elbc_fcm_ctrl->index != mtd->writesize + mtd->oobsize)
 			out_be32(&lbc->fbcr, elbc_fcm_ctrl->index);
-			full_page = 0;
-		} else {
+		else
 			out_be32(&lbc->fbcr, 0);
-			full_page = 1;
-		}
 
 		fsl_elbc_run_command(mtd);
-
-		/* Read back the page in order to fill in the ECC for the
-		 * caller.  Is this really needed?
-		 */
-		if (full_page && elbc_fcm_ctrl->oob_poi) {
-			out_be32(&lbc->fbcr, 3);
-			set_addr(mtd, 6, page_addr, 1);
-
-			elbc_fcm_ctrl->read_bytes = mtd->writesize + 9;
-
-			fsl_elbc_do_read(chip, 1);
-			fsl_elbc_run_command(mtd);
-
-			memcpy_fromio(elbc_fcm_ctrl->oob_poi + 6,
-				&elbc_fcm_ctrl->addr[elbc_fcm_ctrl->index], 3);
-			elbc_fcm_ctrl->index += 3;
-		}
-
-		elbc_fcm_ctrl->oob_poi = NULL;
 		return;
 	}
 
@@ -752,13 +728,8 @@ static void fsl_elbc_write_page(struct mtd_info *mtd,
                                 struct nand_chip *chip,
                                 const uint8_t *buf)
 {
-	struct fsl_elbc_mtd *priv = chip->priv;
-	struct fsl_elbc_fcm_ctrl *elbc_fcm_ctrl = priv->ctrl->nand;
-
 	fsl_elbc_write_buf(mtd, buf, mtd->writesize);
 	fsl_elbc_write_buf(mtd, chip->oob_poi, mtd->oobsize);
-
-	elbc_fcm_ctrl->oob_poi = chip->oob_poi;
 }
 
 static int fsl_elbc_chip_init(struct fsl_elbc_mtd *priv)
-- 
1.6.3.3

* Re: [PATCH v3] mtd: eLBC NAND: remove elbc_fcm_ctrl->oob_poi
  2011-07-05 23:14           ` [PATCH v3] " Matthew L. Creech
@ 2011-07-06  7:23             ` Artem Bityutskiy
  0 siblings, 0 replies; 19+ messages in thread
From: Artem Bityutskiy @ 2011-07-06  7:23 UTC (permalink / raw)
  To: Matthew L. Creech; +Cc: scottwood, linuxppc-dev, linux-mtd, mhench, rick22

On Tue, 2011-07-05 at 19:14 -0400, Matthew L. Creech wrote:
> From: Mike Hench <mhench@elutions.com>
> 
> The eLBC NAND driver currently follows up each program/write operation with a
> read-back of the page, in order to [ostensibly] fill in ECC data for the
> caller. However, the page address used for this read is always -1, so the read
> will never work correctly.  Remove this useless (and potentially problematic)
> block of code.

Pushed to l2-mtd-2.6.git, thanks!

-- 
Best Regards,
Artem Bityutskiy

* Re: NAND BBT corruption on MPC83xx
  2011-07-05 19:58     ` Matthew L. Creech
@ 2011-07-11 15:30       ` Matthew L. Creech
  -1 siblings, 0 replies; 19+ messages in thread
From: Matthew L. Creech @ 2011-07-11 15:30 UTC (permalink / raw)
  To: Scott Wood; +Cc: linux-mtd, linuxppc-dev, rick22, Mike Hench

On Tue, Jul 5, 2011 at 3:58 PM, Matthew L. Creech <mlcreech@gmail.com> wrote:
>
> Separately, I set up 2 test devices to run while I was away last week.
> One of them contained 2 patches:
>
> - Mike Hench's patch which eliminates this block of code in fsl_elbc_nand.c
> - Adam Thomson's patch
> (http://lists.infradead.org/pipermail/linux-mtd/2011-June/036427.html)
> which initializes oob_poi correctly
>
> Upon my return, the device with these patches saw no problems at all,
> and had no additional bad blocks.  The device without these patches
> had some 200+ blocks which had been newly marked as bad in the BBT
> over the course of 10 days.  After rebooting, this latter device then
> failed to boot, as shown here:
>
> http://mcreech.com/work/bbt-ecc-error4.txt
>
> I'm currently running another test to verify which of the two patches
> actually fixed this problem (which might take a few days), but it
> seems like removing that block of code in fsl_elbc_nand.c is a good
> idea.
>

Just an update: my tests confirmed that the patch to fsl_elbc_nand.c
(http://lists.infradead.org/pipermail/linux-mtd/2011-July/036893.html)
seems to have fixed these BBT corruption problems.

I ran a torture test on 2 devices for several days: the one which had
only that patch had no further issues, while the one which didn't have
it (but did have the other oob_poi patch from Adam) experienced BBT
corruption.

Thanks everyone

-- 
Matthew L. Creech


end of thread, other threads:[~2011-07-11 15:30 UTC | newest]

Thread overview: 19+ messages
2011-06-17 20:54 NAND BBT corruption on MPC83xx Matthew L. Creech
2011-06-17 21:34 ` Scott Wood
2011-06-18 17:55   ` Mike Hench
2011-06-18 17:55     ` Mike Hench
2011-06-20 11:22     ` Atlant Schmidt
2011-06-20 11:22       ` Atlant Schmidt
2011-06-23  8:31       ` Artem Bityutskiy
2011-06-20 15:20   ` Matthew L. Creech
2011-07-05 19:58   ` Matthew L. Creech
2011-07-05 19:58     ` Matthew L. Creech
2011-07-05 19:59     ` [PATCH] mtd: eLBC NAND: remove bogus ECC read-back Matthew L. Creech
2011-07-05 20:15       ` Scott Wood
2011-07-05 22:35         ` [PATCH v2] mtd: eLBC NAND: remove elbc_fcm_ctrl->oob_poi Matthew L. Creech
2011-07-05 23:01           ` Scott Wood
2011-07-05 23:14             ` Matthew L. Creech
2011-07-05 23:14           ` [PATCH v3] " Matthew L. Creech
2011-07-06  7:23             ` Artem Bityutskiy
2011-07-11 15:30     ` NAND BBT corruption on MPC83xx Matthew L. Creech
2011-07-11 15:30       ` Matthew L. Creech
