linux-mtd.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* VF610+ColdFireM54418 controller.
@ 2013-11-21 17:01 Bill Pringlemeir
  2013-11-21 21:52 ` Bill Pringlemeir
       [not found] ` <1389222441-4322-1-git-send-email-bpringlemeir@nbsps.com>
  0 siblings, 2 replies; 16+ messages in thread
From: Bill Pringlemeir @ 2013-11-21 17:01 UTC (permalink / raw)
  To: linux-mtd


There are some mtd drivers for this NAND flash controller on the web.

Eg,
 https://dev.openwrt.org/browser/trunk/target/linux/coldfire/patches/016-Add-nand-driver-support-for-M54418TWR-board.patch?rev=31546
 https://github.com/Timesys/linux-timesys/blob/3.0-mvf/drivers/mtd/nand/fsl_nfc.c

The device has 9K SRAM for main and spare areas.  The register layout
is,

  off name      desc
   00 NFC_CMD1    Flash command 1
   04 NFC_CMD2    Flash command 2
   08 NFC_CAR     Column address
   0C NFC_RAR     Row address 
   10 NFC_RPT     Flash command repeat
   14 NFC_RAI     Row address increment
   18 NFC_SR1     Flash status 1 *read only*
   1C NFC_SR2     Flash status 2 *read only*
   20 NFC_DMA_CH1 DMA channel 1 address
   24 NFC_DMACFG  DMA configuration
   28 NFC_SWAP    Cach swap
   2C NFC_SECSZ   Sector size
   30 NFC_CFG     Flash configuration
   34 NFC_DMA_CH2 DMA channel 2 address
   38 NFC_ISR     Interrupt status

All registers are 32bit R/W unless noted, from section 31.3 of the
Vybrid NAND chapter.

Is anyone working on support for this chip set?  
Is there an existing driver that can be adapted?  
Is the 'fsl_nfc' name appropriate?  If not, what name?
Is there any reason an updated driver won't be considered for the
mainline?

Thanks,
Bill Pringlemeir.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: VF610+ColdFireM54418 controller.
  2013-11-21 17:01 VF610+ColdFireM54418 controller Bill Pringlemeir
@ 2013-11-21 21:52 ` Bill Pringlemeir
       [not found] ` <1389222441-4322-1-git-send-email-bpringlemeir@nbsps.com>
  1 sibling, 0 replies; 16+ messages in thread
From: Bill Pringlemeir @ 2013-11-21 21:52 UTC (permalink / raw)
  To: linux-mtd

On 21 Nov 2013, bpringlemeir@nbsps.com wrote:

> All registers are 32bit R/W unless noted, from section 31.3 of the
> Vybrid NAND chapter.

> Is there an existing driver that can be adapted?  

It looks like the mpc5125 SOC has the same NAND controller; the CPU/SOC
has support in the tree, but no NAND controller in mpc5125twr.dts.  The
registers and buffers differ from the mpc5121 NAND controller which is
in the tree; the difference between the mpc5125+vf610+ColdFireM54418
NAND controller and the mpc5121 NAND controller look significant to me.

Fwiw,
Bill Pringlemeir.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 1/5] mtd:fsl_nfc: Nand flash controller for VF610, MPC5125, etc.
       [not found]   ` <1389222441-4322-2-git-send-email-bpringlemeir@nbsps.com>
@ 2014-04-28 14:41     ` Stefan Agner
  2014-04-28 16:51       ` Bill Pringlemeir
  2014-04-29 16:36       ` Bill Pringlemeir
  0 siblings, 2 replies; 16+ messages in thread
From: Stefan Agner @ 2014-04-28 14:41 UTC (permalink / raw)
  To: Bill Pringlemeir; +Cc: b21989, linux-mtd, Jason.jin, linux-arm-kernel

Hi Bill,

The driver works fine for me using 3.14 on Colibri VF61 (8-Bit bus
width, Samsung NAND, 2k page size). Also tested with the Hardware ECC.
Do you plan to send an update patch of the driver?

FYI, I ported the driver to U-Boot and will send a patch to the U-Boot
mailing list soon.

Some minor comments below:

Am 2014-01-09 00:07, schrieb Bill Pringlemeir:
<snip>
> +static u8 bbt_pattern[] = {'B', 'b', 't', '0' };
> +static u8 mirror_pattern[] = {'1', 't', 'b', 'B' };
> +
> +static struct nand_bbt_descr bbt_main_descr = {
> +	.options = NAND_BBT_LASTBLOCK | NAND_BBT_CREATE | NAND_BBT_WRITE |
> +		   NAND_BBT_2BIT | NAND_BBT_VERSION,
> +	.offs =	11,
> +	.len = 4,
> +	.veroffs = 15,
> +	.maxblocks = 4,
> +	.pattern = bbt_pattern,
> +};
> +
> +static struct nand_bbt_descr bbt_mirror_descr = {
> +	.options = NAND_BBT_LASTBLOCK | NAND_BBT_CREATE | NAND_BBT_WRITE |
> +		   NAND_BBT_2BIT | NAND_BBT_VERSION,
> +	.offs =	11,
> +	.len = 4,
> +	.veroffs = 15,
> +	.maxblocks = 4,
> +	.pattern = mirror_pattern,
> +};

This is the same BBT descriptor as used on Timesys Vybrid BSP. Other
than this "backward compatibility", is there another reason to use non
default BBT descriptor? As far as I can tell, the ECC does not conflict
with the default BBT description.

> +/* Write data to NFC buffers */
> +static void nfc_write_buf(struct mtd_info *mtd, const u_char *buf, int len)
> +{
...
IMHO this type of commands are not really required, the function name is
descriptive enough.

> +/* Vybrid only.  MPC5125 has full RB and four CS. Assume boot loader
> + * has set this register for now.
> + */

Multi-line comment style (there are some malformed multi-line comments
in the ECC patch as well)

--
Stefan

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 1/5] mtd:fsl_nfc: Nand flash controller for VF610, MPC5125, etc.
  2014-04-28 14:41     ` [RFC 1/5] mtd:fsl_nfc: Nand flash controller for VF610, MPC5125, etc Stefan Agner
@ 2014-04-28 16:51       ` Bill Pringlemeir
  2014-04-29  7:50         ` Stefan Agner
  2014-04-29 16:36       ` Bill Pringlemeir
  1 sibling, 1 reply; 16+ messages in thread
From: Bill Pringlemeir @ 2014-04-28 16:51 UTC (permalink / raw)
  To: Stefan Agner; +Cc: b21989, linux-mtd, Jason.jin, linux-arm-kernel

On 28 Apr 2014, stefan@agner.ch wrote:

> The driver works fine for me using 3.14 on Colibri VF61 (8-Bit bus
> width, Samsung NAND, 2k page size). Also tested with the Hardware ECC.
> Do you plan to send an update patch of the driver?

> FYI, I ported the driver to U-Boot and will send a patch to the U-Boot
> mailing list soon.

> Some minor comments below:

Well, I would love it if there are only 'minor comments'.  I don't think
people will like the 'nfc' name.  I wanted a better name.  Also, the
'linux-mtd' list bounced my post because I used some 'Ref:' to refer to
another message.  It also bounce on the ARM list, but some kind
moderator put it through.

 http://lists.infradead.org/pipermail/linux-arm-kernel/2014-January/226623.html
 http://lists.infradead.org/pipermail/linux-arm-kernel/2014-January/226627.html
 etc.

Besides the Vybrid, the controller can support several other SOCs (some
ARM, some note), Such as the MPC5125 (PowerPC), MCF54418 (ColdFire) and
the Kinetis K70 (ARM Cortex-M).

I also have some tickets open on the Hardware ECC with the Vybrid.

 https://community.freescale.com/message/358284 - booting
 https://community.freescale.com/message/368216 - ECC value
 https://community.freescale.com/message/384556 - clocking

[There are also non-public freescale PR tickets]

Especially, the ECC layout is important.  I think that an HW ECC layout
with sub-page support is best.  The Linux-MTD community will want this
to be right.  

The email "reference" was a previous email I sent some time ago to the
MTD mailing list.  I wondered if anyone was interested and I knew that
people would not like the name 'fsl_nfc'.  But I don't know what to call
it; it is a bike shed issue to me (specifics of what to call it), but I
see how people will want to avoid a generic ambigious name like
'fsl_nfc'.

I was waiting to see about the clocking with HW-ECC; it seems above
33MHz, the HW-ECC module doesn't seem to work (at least for me).

Fwiw,
Bill Pringlemeir.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 1/5] mtd:fsl_nfc: Nand flash controller for VF610, MPC5125, etc.
  2014-04-28 16:51       ` Bill Pringlemeir
@ 2014-04-29  7:50         ` Stefan Agner
  0 siblings, 0 replies; 16+ messages in thread
From: Stefan Agner @ 2014-04-29  7:50 UTC (permalink / raw)
  To: Bill Pringlemeir; +Cc: b21989, linux-mtd, Jason.jin, linux-arm-kernel

Hi Bill,

Am 2014-04-28 18:51, schrieb Bill Pringlemeir:
> The email "reference" was a previous email I sent some time ago to the
> MTD mailing list.  I wondered if anyone was interested and I knew that
> people would not like the name 'fsl_nfc'.  But I don't know what to call
> it; it is a bike shed issue to me (specifics of what to call it), but I
> see how people will want to avoid a generic ambigious name like
> 'fsl_nfc'.

In all documentation and block diagram I've read Freescale itself calls
that IP "NAND Flash Controller", even with the abbreviation NFC. IMHO,
its a valid name then. The only alternative would probably be MCF5441x,
since this is the first device that controller appeared, but since this
is ColdFire microprocessor which isn't suited for Linux, a driver named
after it in the kernel tree would be somewhat weird.


> I was waiting to see about the clocking with HW-ECC; it seems above
> 33MHz, the HW-ECC module doesn't seem to work (at least for me).

Yes, I could also verify this (see my comment on that Freescale
Community post).

Appreciate your work, count me as interested!

This is my U-Boot port of the driver, I'm going to cleanup and test it
on the Tower:
http://git.toradex.com/gitweb/u-boot-toradex.git/shortlog/refs/heads/2014.04-colibri_vf

--
Stefan

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 1/5] mtd:fsl_nfc: Nand flash controller for VF610, MPC5125, etc.
  2014-04-28 14:41     ` [RFC 1/5] mtd:fsl_nfc: Nand flash controller for VF610, MPC5125, etc Stefan Agner
  2014-04-28 16:51       ` Bill Pringlemeir
@ 2014-04-29 16:36       ` Bill Pringlemeir
  1 sibling, 0 replies; 16+ messages in thread
From: Bill Pringlemeir @ 2014-04-29 16:36 UTC (permalink / raw)
  To: Stefan Agner; +Cc: b21989, linux-mtd, Jason.jin, linux-arm-kernel

On 28 Apr 2014, stefan@agner.ch wrote:

> The driver works fine for me using 3.14 on Colibri VF61 (8-Bit bus
> width, Samsung NAND, 2k page size). Also tested with the Hardware ECC.
> Do you plan to send an update patch of the driver?

> FYI, I ported the driver to U-Boot and will send a patch to the U-Boot
> mailing list soon.

> Some minor comments below:

> Am 2014-01-09 00:07, schrieb Bill Pringlemeir:
> <snip>
>> +static u8 bbt_pattern[] = {'B', 'b', 't', '0' }; +static u8
>> mirror_pattern[] = {'1', 't', 'b', 'B' }; + +static struct
>> nand_bbt_descr bbt_main_descr = { + .options = NAND_BBT_LASTBLOCK |
>> NAND_BBT_CREATE | NAND_BBT_WRITE | + NAND_BBT_2BIT |
>> NAND_BBT_VERSION, + .offs = 11, + .len = 4, + .veroffs = 15, +
>> .maxblocks = 4, + .pattern = bbt_pattern, +}; + +static struct
>> nand_bbt_descr bbt_mirror_descr = { + .options = NAND_BBT_LASTBLOCK |
>> NAND_BBT_CREATE | NAND_BBT_WRITE | + NAND_BBT_2BIT |
>> NAND_BBT_VERSION, + .offs = 11, + .len = 4, + .veroffs = 15, +
>> .maxblocks = 4, + .pattern = mirror_pattern, +};

> This is the same BBT descriptor as used on Timesys Vybrid BSP. Other
> than this "backward compatibility", is there another reason to use non
> default BBT descriptor? As far as I can tell, the ECC does not
> conflict with the default BBT description.

Beyond the "TimeSys", there are OpenWRT projects and others that seem to
use this BBT structure.  Myself, I don't care.  The question would be
will someone ever get to use this driver with some other system?  It is
simple enough to patch the driver; although a device tree binding
supported by the driver might be more flexible to allow both from
multi-machine builds.  This particular chip is not really a candidate as
it always seems to be used with a different architecture; PowerPC,
ColdFire/68k, ARM Cortex-A or Cortex-M.

>> +/* Write data to NFC buffers */ 
>> +static void nfc_write_buf(struct mtd_info *mtd, const u_char *buf,
>> int len)
>> +{
> ...  IMHO this type of commands are not really required, the function
> name is descriptive enough.

The comments are from the original.

https://github.com/Timesys/linux-timesys/blob/3.0-mvf/drivers/mtd/nand/fsl_nfc.c#L170

I agree they are not needed.

>> +/* Vybrid only.  MPC5125 has full RB and four CS. Assume boot loader
>> + * has set this register for now.
>> + */

> Multi-line comment style (there are some malformed multi-line comments
> in the ECC patch as well)

This is my comment.  I must of missed the advice on multi-line comments.
I did all sorts of strange things with test emails, building with
different git trees, running different scripts, re-ordering the patches
to try and make them understandable, testing different variants,
benchmarking, etc.  White space is not a riveting issue for me.  I just
cribbed some emacs code and hope it works.

   (defun linux-c-mode ()
     "C mode with adjusted defaults for use with the Linux kernel."
     (interactive)
     (c-mode)
     (c-set-style "K&R")
     (setq tab-width 8)
     (setq indent-tabs-mode t)
     (setq truncate-lines t)
     (setq show-trailing-whitespace t)
     (setq c-basic-offset 8))

I looked again, you mean that I should have the first "star" line blank
or is there more?

There are other issues, like Shawn Guo's IMX tree has a better structure
for the IOMUX bindings.  I am also concerned that people don't like the
order of the patches and I have to fsck with git to re-arrange them
(again).  However, I think that having the MTD people willing to accept
is my major hurdle on working on it further.  I don't know if they want
yet another controller?  I am glad that you can use it too and have
tested it with 8-bit flash.

Thanks,
Bill Pringlemeir.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 2/5] mtd:fsl_nfc: Add hardware 45 byte BHC-ECC support for 24 bit corrections.
       [not found]   ` <1389222441-4322-3-git-send-email-bpringlemeir@nbsps.com>
@ 2014-09-17 17:02     ` Stefan Agner
  2014-09-17 18:06       ` Bill Pringlemeir
  0 siblings, 1 reply; 16+ messages in thread
From: Stefan Agner @ 2014-09-17 17:02 UTC (permalink / raw)
  To: Bill Pringlemeir; +Cc: b21989, linux-mtd, Jason.jin, linux-arm-kernel

Hi Bill,

On one of our Colibri VF61 modules I discovered an issue using this
driver:

[    0.758327] ECC failed to correct all errors (ebd9fd80)
[    0.767005] ECC failed to correct all errors (ebd9fd80)
[    0.775525] ECC failed to correct all errors (ebd9fd80)
[    0.784004] ECC failed to correct all errors (ebd9fd80)
[    0.791938] UBI error: ubi_io_read: error -74 (ECC error) while
reading 2048 bytes from PEB 28:2048, 
read 2048 bytes

That page supposed to be empty, and I got several of this messages.
Hence I did not believe that they have really ECC errors, so I digged a
bit deeper:

@@ -673,11 +673,13 @@ static int fsl_nfc_check_ecc_status(struct
mtd_info *mtd, u_char *dat)
                return ecc_count;
 
        /* If 'ecc_count' zero or less then buffer is all 0xff or
erased. */
-       flip = count_written_bits(dat, chip->ecc.size, ecc_count);
+       flip = count_written_bits(dat, chip->ecc.size, 100);
 
 
        if (flip > ecc_count) {
-               printk("ECC failed to correct all errors (%08x)\n",
ecc_status);
+               printk("ECC failed to correct all errors (%08x, flips
%d)\n",
+                               ecc_status, flip);
                return -1;
        }


[    0.758624] ECC failed to correct all errors (ebdded80, flips 1)
[    0.768123] ECC failed to correct all errors (ebdded80, flips 1)
[    0.777468] ECC failed to correct all errors (ebdded80, flips 1)
[    0.786773] ECC failed to correct all errors (ebdded80, flips 1)
[    0.795535] UBI error: ubi_io_read: error -74 (ECC error) while
reading 2048 bytes from PEB 28:2048, read 2048 bytes
[    1.462183] ECC failed to correct all errors (ebdded82, flips 3)
[    1.471623] ECC failed to correct all errors (ebdded82, flips 3)
[    1.481025] ECC failed to correct all errors (ebdded82, flips 3)
[    1.490336] ECC failed to correct all errors (ebdded82, flips 3)
[    1.498953] UBI error: ubi_io_read: error -74 (ECC error) while
reading 2048 bytes from PEB 664:2048, read 2048 bytes
[    1.551421] ECC failed to correct all errors (ebdded80, flips 1)
[    1.560616] ECC failed to correct all errors (ebdded80, flips 1)
[    1.569695] ECC failed to correct all errors (ebdded80, flips 1)
[    1.578711] ECC failed to correct all errors (ebdded80, flips 1)
[    1.587192] UBI error: ubi_io_read: error -74 (ECC error) while
reading 2048 bytes from PEB 666:2048, read 2048 bytes
[    1.744612] ECC failed to correct all errors (ebdded81, flips 2)
[    1.753943] ECC failed to correct all errors (ebdded81, flips 2)
[    1.763146] ECC failed to correct all errors (ebdded81, flips 2)
[    1.772247] ECC failed to correct all errors (ebdded81, flips 2)
[    1.780722] UBI error: ubi_io_read: error -74 (ECC error) while
reading 2048 bytes from PEB 776:2048, read 2048 bytes


Interesting thought, the returned ecc_count is exactly one below the
actual flipped bytes counted...

One comment below regarding this...

Am 2014-01-09 00:07, schrieb Bill Pringlemeir:
> Signed-off-by: Bill Pringlemeir <bpringlemeir@nbsps.com>
> ---
>  drivers/mtd/nand/Kconfig   |   5 +-
>  drivers/mtd/nand/fsl_nfc.c | 114 +++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 117 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/mtd/nand/Kconfig b/drivers/mtd/nand/Kconfig
> index 7e0c695..8ac9923 100644
> --- a/drivers/mtd/nand/Kconfig
> +++ b/drivers/mtd/nand/Kconfig
> @@ -458,8 +458,9 @@ config MTD_NAND_FSL_NFC
>  	help
>  	  Enables support for NAND Flash controller on some Freescale
>  	  processors like the MPC5125, VF610, MCF54418, and Kinetis K70.
> -	  The driver supports a maximum 2k page size.
> -	  The driver currently does not support hardware ECC.
> +	  The driver supports a maximum 2k page size.  With 2k pages and
> +	  64 bytes or more of OOB, hardware ECC with 24bit correction is
> +	  supported.  Only MTD_OF (device tree) can enable the hardware ECC.
>  
>  config MTD_NAND_MXC
>  	tristate "MXC NAND support"
> diff --git a/drivers/mtd/nand/fsl_nfc.c b/drivers/mtd/nand/fsl_nfc.c
> index eb4e353..703511d 100644
> --- a/drivers/mtd/nand/fsl_nfc.c
> +++ b/drivers/mtd/nand/fsl_nfc.c
> @@ -17,6 +17,7 @@
>   * - Untested on MPC5125 and M54418.
>   * - DMA not used.
>   * - 2K pages or less.
> + * - Only 2K page w. 64+OOB and hardware ECC.
>   */
>  
>  #include <linux/module.h>
> @@ -68,6 +69,7 @@
>  
>  /* NFC ECC mode define */
>  #define ECC_BYPASS			0
> +#define ECC_45_BYTE			6
>  
>  /*** Register Mask and bit definitions */
>  
> @@ -100,6 +102,8 @@
>  #define STATUS_BYTE1_MASK			0x000000FF
>  
>  /* NFC_FLASH_CONFIG Field */
> +#define CONFIG_ECC_SRAM_ADDR_MASK		0x7FC00000
> +#define CONFIG_ECC_SRAM_ADDR_SHIFT		22
>  #define CONFIG_ECC_SRAM_REQ_BIT			(1<<21)
>  #define CONFIG_DMA_REQ_BIT			(1<<20)
>  #define CONFIG_ECC_MODE_MASK			0x000E0000
> @@ -120,6 +124,11 @@
>  
>  #define NFC_TIMEOUT		(HZ)
>  
> +/* ECC status placed at end of buffers. */
> +#define ECC_SRAM_ADDR	((PAGE_2K+256-8) >> 3)
> +#define ECC_STATUS_MASK	0x80
> +#define ECC_ERR_COUNT	0x3F
> +
>  struct fsl_nfc {
>  	struct mtd_info	   mtd;
>  	struct nand_chip   chip;
> @@ -160,6 +169,19 @@ static struct nand_bbt_descr bbt_mirror_descr = {
>  	.pattern = mirror_pattern,
>  };
>  
> +static struct nand_ecclayout nfc_ecc45 = {
> +	.eccbytes = 45,
> +	.eccpos = {19, 20, 21, 22, 23,
> +		   24, 25, 26, 27, 28, 29, 30, 31,
> +		   32, 33, 34, 35, 36, 37, 38, 39,
> +		   40, 41, 42, 43, 44, 45, 46, 47,
> +		   48, 49, 50, 51, 52, 53, 54, 55,
> +		   56, 57, 58, 59, 60, 61, 62, 63},
> +	.oobfree = {
> +		{.offset = 8,
> +		 .length = 11} }
> +};
> +
>  static u32 nfc_read(struct mtd_info *mtd, uint reg)
>  {
>  	struct fsl_nfc *nfc = mtd_to_nfc(mtd);
> @@ -458,7 +480,61 @@ nfc_select_chip(struct mtd_info *mtd, int chip)
>  #endif
>  }
>  
> +/* Count the number of 0's in buff upto max_bits */
> +static int count_written_bits(uint8_t *buff, int size, int max_bits)
> +{
> +	int k, written_bits = 0;
> +
> +	for (k = 0; k < size; k++) {
> +		written_bits += hweight8(~buff[k]);
> +		if (written_bits > max_bits)
> +			break;
> +	}
> +
> +	return written_bits;
> +}
> +
> +static int nfc_correct_data(struct mtd_info *mtd, u_char *dat,
> +				 u_char *read_ecc, u_char *calc_ecc)
> +{
> +	struct fsl_nfc *nfc = mtd_to_nfc(mtd);
> +	u32 ecc_status;
> +	u8 ecc_count;
> +	int flip;
> +
> +	/* Errata: ECC status is stored at NFC_CFG[ECCADD] +4 for
> +	   little-endian and +7 for big-endian SOC.  Access as 32 bits
> +	   and use low byte.
> +	*/
> +	ecc_status = __raw_readl(nfc->regs + ECC_SRAM_ADDR * 8 + 4);
> +	ecc_count = ecc_status & ECC_ERR_COUNT;
> +	if (!(ecc_status & ECC_STATUS_MASK))
> +		return ecc_count;
> +
> +	/* If 'ecc_count' zero or less then buffer is all 0xff or erased. */
> +	flip = count_written_bits(dat, nfc->chip.ecc.size, ecc_count);

Also, I could not find that the reference manual states that ecc_count
represents the amount of flipped byte in a empty page. Is this given by
the ECC algorithm?



> +
> +	/* ECC failed. */
> +	if (flip > ecc_count)
> +		return -1;
> +
> +	/* Erased page. */
> +	memset(dat, 0xff, nfc->chip.ecc.size);
> +	return 0;
> +}
> +
> +static int nfc_calculate_ecc(struct mtd_info *mtd, const u_char *dat,
> +				  u_char *ecc_code)
> +{
> +	return 0;
> +}
> +
> +static void nfc_enable_hwecc(struct mtd_info *mtd, int mode)
> +{
> +}
> +
>  struct nfc_config {
> +	int hardware_ecc;
>  	int width;
>  	int flash_bbt;
>  	u32 clkrate;
> @@ -483,6 +559,9 @@ static int __init nfc_probe_dt(struct device *dev,
> struct nfc_config *cfg)
>  	if (!np)
>  		return 1;
>  
> +	if (of_get_nand_ecc_mode(np) >= NAND_ECC_HW)
> +		cfg->hardware_ecc = 1;
> +
>  	cfg->flash_bbt = of_get_nand_on_flash_bbt(np);
>  
>  	if (!of_property_read_u32(np, "clock-frequency", &clkrate))
> @@ -608,6 +687,11 @@ static int nfc_probe(struct platform_device *pdev)
>  	nfc_set_field(mtd, NFC_FLASH_CONFIG, CONFIG_PAGE_CNT_MASK,
>  			CONFIG_PAGE_CNT_SHIFT, 1);
>  
> +	/* Set ECC_STATUS offset */
> +	nfc_set_field(mtd, NFC_FLASH_CONFIG,
> +		      CONFIG_ECC_SRAM_ADDR_MASK,
> +		      CONFIG_ECC_SRAM_ADDR_SHIFT, ECC_SRAM_ADDR);
> +
>  	/* first scan to find the device and get the page size */
>  	if (nand_scan_ident(mtd, 1, NULL)) {
>  		err = -ENXIO;
> @@ -627,6 +711,36 @@ static int nfc_probe(struct platform_device *pdev)
>  	page_sz += cfg.width == 16 ? 1 : 0;
>  	nfc_write(mtd, NFC_SECTOR_SIZE, page_sz);
>  
> +	if (cfg.hardware_ecc) {
> +		if (mtd->writesize != PAGE_2K && mtd->oobsize < 64) {
> +			dev_err(nfc->dev, "Unsupported flash with hwecc\n");
> +			err = -ENXIO;
> +			goto error;
> +		}
> +
> +		chip->ecc.layout = &nfc_ecc45;
> +
> +		/* propagate ecc.layout to mtd_info */
> +		mtd->ecclayout = chip->ecc.layout;
> +		chip->ecc.calculate = nfc_calculate_ecc;
> +		chip->ecc.hwctl = nfc_enable_hwecc;
> +		chip->ecc.correct = nfc_correct_data;
> +		chip->ecc.mode = NAND_ECC_HW;
> +
> +		chip->ecc.bytes = 45;
> +		chip->ecc.size = PAGE_2K;
> +		chip->ecc.strength = 24;
> +
> +		/* set ECC mode to 45 bytes OOB with 24 bits correction */
> +		nfc_set_field(mtd, NFC_FLASH_CONFIG,
> +				CONFIG_ECC_MODE_MASK,
> +				CONFIG_ECC_MODE_SHIFT, ECC_45_BYTE);
> +
> +		/* Enable ECC_STATUS */
> +		nfc_set(mtd, NFC_FLASH_CONFIG, CONFIG_ECC_SRAM_REQ_BIT);
> +
> +	}
> +
>  	/* second phase scan */
>  	if (nand_scan_tail(mtd)) {
>  		err = -ENXIO;


--
Stefan

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 2/5] mtd:fsl_nfc: Add hardware 45 byte BHC-ECC support for 24 bit corrections.
  2014-09-17 17:02     ` [RFC 2/5] mtd:fsl_nfc: Add hardware 45 byte BHC-ECC support for 24 bit corrections Stefan Agner
@ 2014-09-17 18:06       ` Bill Pringlemeir
  2014-09-17 20:08         ` Stefan Agner
  0 siblings, 1 reply; 16+ messages in thread
From: Bill Pringlemeir @ 2014-09-17 18:06 UTC (permalink / raw)
  To: Stefan Agner; +Cc: b21989, linux-mtd, Jason.jin, linux-arm-kernel

On 17 Sep 2014, stefan@agner.ch wrote:

> On one of our Colibri VF61 modules I discovered an issue using this
> driver:

> [    0.758327] ECC failed to correct all errors (ebd9fd80)
> [    0.767005] ECC failed to correct all errors (ebd9fd80)
> [    0.775525] ECC failed to correct all errors (ebd9fd80)
> [    0.784004] ECC failed to correct all errors (ebd9fd80)
> [    0.791938] UBI error: ubi_io_read: error -74 (ECC error) while
> reading 2048 bytes from PEB 28:2048, 
> read 2048 bytes

> That page supposed to be empty, and I got several of this messages.
> Hence I did not believe that they have really ECC errors, so I digged
> a bit deeper:

>>> -673,11 +673,13 @@ static int fsl_nfc_check_ecc_status(struct
> mtd_info *mtd, u_char *dat)
> return ecc_count;

> /* If 'ecc_count' zero or less then buffer is all 0xff or
> erased. */
> -       flip = count_written_bits(dat, chip->ecc.size, ecc_count);
> +       flip = count_written_bits(dat, chip->ecc.size, 100);

> if (flip > ecc_count) {
> -               printk("ECC failed to correct all errors (%08x)\n",
> ecc_status);
> +               printk("ECC failed to correct all errors (%08x, flips
> %d)\n",
> +                               ecc_status, flip);
> return -1;
> }

> [    0.758624] ECC failed to correct all errors (ebdded80, flips 1)
> [    0.768123] ECC failed to correct all errors (ebdded80, flips 1)
> [    0.777468] ECC failed to correct all errors (ebdded80, flips 1)
> [    0.786773] ECC failed to correct all errors (ebdded80, flips 1)
> [    0.795535] UBI error: ubi_io_read: error -74 (ECC error) while
> reading 2048 bytes from PEB 28:2048, read 2048 bytes

[snip]

> Interesting thought, the returned ecc_count is exactly one below the
> actual flipped bytes counted...
>
> One comment below regarding this...

[snip]

>> +
>> +	/* If 'ecc_count' zero or less then buffer is all 0xff or erased. */
>> +	flip = count_written_bits(dat, nfc->chip.ecc.size, ecc_count);

> Also, I could not find that the reference manual states that ecc_count
> represents the amount of flipped byte in a empty page. Is this given
> by the ECC algorithm?

I was using this information.

   Table 31-18. ECC Status Word Field Definition

   7            0 Page has been successfully corrected
   CORFAIL      1 Page is uncorrectable

   5–0          Number of errors that have been corrected in this page
   ERROR_COUNT

This is from the Vybrid RM, but the MPC5125RM has the same information.

I have definitely tested the detection of 'erased pages'.  However, I
don't know that I ever had actual bit flips.

The ECC controller has no idea whether the page is empty or not.  Are
you saying you have an erased page with bit flips?  I have definitely
not tested this.

>> +	/* If 'ecc_count' zero or less then buffer is all 0xff or erased. */
>> +	flip = count_written_bits(dat, nfc->chip.ecc.size, ecc_count);

There are two issues.

 1) erased page with some physical flipped bits (zero).
 2) erased page were controller flipped some bits.

Currently, the code is only handling case 2 (that is what the controller
counts).  I believe that your physical page has actual 'stuck at zero'
bits.  The current driver gives up.  If you want to handle this then we
should replace the lines,

> +	/* ECC failed. */
> +	if (flip > ecc_count)
> +		return -1;

With something like,

 +	/* Not completely empty. */
 +	if (flip > ecc_count) {
 +           re_read_page_w_ecc_off();
 +           if (count_written_bits() > strength) /* strength/2 ? */
 +		return -1;
 +      }

There is also a discussion on this here,

http://lists.infradead.org/pipermail/linux-mtd/2014-March/052507.html
http://lists.infradead.org/pipermail/linux-mtd/2014-March/052508.html
http://lists.infradead.org/pipermail/linux-mtd/2014-March/052509.html
http://lists.infradead.org/pipermail/linux-mtd/2014-March/052526.html
http://lists.infradead.org/pipermail/linux-mtd/2014-March/052527.html
http://lists.infradead.org/pipermail/linux-mtd/2014-March/052619.html
... etc.  Use the view thread.

Especially,
http://lists.infradead.org/pipermail/linux-mtd/2014-March/052647.html

Here Brian says that 3b applies to SLC NAND.  I guess this is you :).

It is fairly common to read an erased page.  Doing 'raw reads' all the
time on an erased page will slow the file system.  However, doing
re-reads for an erased page with bit flips is probably fairly uncommon.
A flash in this state maybe near EOL or the sector was bad from the
factory but not marked so.

I am chasing an DDR2 issue on another CPU and haven't had much more time
to the look at the Vybrid nor follow the MTD mailing list.  I don't know
if the 'NAND_ECC_NEED_CHECK_FF' feature has been merged?  It may also
solve the issue.

Regards,
Bill Pringlemeir.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 2/5] mtd:fsl_nfc: Add hardware 45 byte BHC-ECC support for 24 bit corrections.
  2014-09-17 18:06       ` Bill Pringlemeir
@ 2014-09-17 20:08         ` Stefan Agner
  2014-09-17 22:21           ` Bill Pringlemeir
  0 siblings, 1 reply; 16+ messages in thread
From: Stefan Agner @ 2014-09-17 20:08 UTC (permalink / raw)
  To: Bill Pringlemeir; +Cc: b21989, linux-mtd, Jason.jin, linux-arm-kernel

Am 2014-09-17 20:06, schrieb Bill Pringlemeir:
> On 17 Sep 2014, stefan@agner.ch wrote:
> 
>> On one of our Colibri VF61 modules I discovered an issue using this
>> driver:
> 
>> [    0.758327] ECC failed to correct all errors (ebd9fd80)
>> [    0.767005] ECC failed to correct all errors (ebd9fd80)
>> [    0.775525] ECC failed to correct all errors (ebd9fd80)
>> [    0.784004] ECC failed to correct all errors (ebd9fd80)
>> [    0.791938] UBI error: ubi_io_read: error -74 (ECC error) while
>> reading 2048 bytes from PEB 28:2048,
>> read 2048 bytes
> 
>> That page supposed to be empty, and I got several of this messages.
>> Hence I did not believe that they have really ECC errors, so I digged
>> a bit deeper:
> 
>>>> -673,11 +673,13 @@ static int fsl_nfc_check_ecc_status(struct
>> mtd_info *mtd, u_char *dat)
>> return ecc_count;
> 
>> /* If 'ecc_count' zero or less then buffer is all 0xff or
>> erased. */
>> -       flip = count_written_bits(dat, chip->ecc.size, ecc_count);
>> +       flip = count_written_bits(dat, chip->ecc.size, 100);
> 
>> if (flip > ecc_count) {
>> -               printk("ECC failed to correct all errors (%08x)\n",
>> ecc_status);
>> +               printk("ECC failed to correct all errors (%08x, flips
>> %d)\n",
>> +                               ecc_status, flip);
>> return -1;
>> }
> 
>> [    0.758624] ECC failed to correct all errors (ebdded80, flips 1)
>> [    0.768123] ECC failed to correct all errors (ebdded80, flips 1)
>> [    0.777468] ECC failed to correct all errors (ebdded80, flips 1)
>> [    0.786773] ECC failed to correct all errors (ebdded80, flips 1)
>> [    0.795535] UBI error: ubi_io_read: error -74 (ECC error) while
>> reading 2048 bytes from PEB 28:2048, read 2048 bytes
> 
> [snip]
> 
>> Interesting thought, the returned ecc_count is exactly one below the
>> actual flipped bytes counted...
>>
>> One comment below regarding this...
> 
> [snip]
> 
>>> +
>>> +	/* If 'ecc_count' zero or less then buffer is all 0xff or erased. */
>>> +	flip = count_written_bits(dat, nfc->chip.ecc.size, ecc_count);
> 
>> Also, I could not find that the reference manual states that ecc_count
>> represents the amount of flipped byte in a empty page. Is this given
>> by the ECC algorithm?
> 
> I was using this information.
> 
>    Table 31-18. ECC Status Word Field Definition
> 
>    7            0 Page has been successfully corrected
>    CORFAIL      1 Page is uncorrectable
> 
>    5–0          Number of errors that have been corrected in this page
>    ERROR_COUNT
> 
> This is from the Vybrid RM, but the MPC5125RM has the same information.
> 
> I have definitely tested the detection of 'erased pages'.  However, I
> don't know that I ever had actual bit flips.
> 
> The ECC controller has no idea whether the page is empty or not.  Are
> you saying you have an erased page with bit flips?  I have definitely
> not tested this.

Yes, that's what it looks like. Note that this output is on first boot
after flashing the device (with the UBI auto-grow option). I guess UBI
is reading all pages once to verify that they are empty.

>>> +	/* If 'ecc_count' zero or less then buffer is all 0xff or erased. */
>>> +	flip = count_written_bits(dat, nfc->chip.ecc.size, ecc_count);
> 
> There are two issues.
> 
>  1) erased page with some physical flipped bits (zero).
>  2) erased page were controller flipped some bits.

I did not know that issue 2 exists. How can 2 happen on a erased page?
With a 'stuck at zero' in OOB?

> Currently, the code is only handling case 2 (that is what the controller
> counts).  I believe that your physical page has actual 'stuck at zero'
> bits.  The current driver gives up.  If you want to handle this then we
> should replace the lines,
> 

I assumed that your code assumes that the controller returns the flipped
bit count when reading an erase page (case 1), which I did not found in
the documentation. 

>> +	/* ECC failed. */
>> +	if (flip > ecc_count)
>> +		return -1;
> 
> With something like,
> 
>  +	/* Not completely empty. */
>  +	if (flip > ecc_count) {
>  +           re_read_page_w_ecc_off();
>  +           if (count_written_bits() > strength) /* strength/2 ? */
>  +		return -1;
>  +      }
> 
> There is also a discussion on this here,
> 
> http://lists.infradead.org/pipermail/linux-mtd/2014-March/052507.html
> http://lists.infradead.org/pipermail/linux-mtd/2014-March/052508.html
> http://lists.infradead.org/pipermail/linux-mtd/2014-March/052509.html
> http://lists.infradead.org/pipermail/linux-mtd/2014-March/052526.html
> http://lists.infradead.org/pipermail/linux-mtd/2014-March/052527.html
> http://lists.infradead.org/pipermail/linux-mtd/2014-March/052619.html
> ... etc.  Use the view thread.
> 
> Especially,
> http://lists.infradead.org/pipermail/linux-mtd/2014-March/052647.html
> 
> Here Brian says that 3b applies to SLC NAND.  I guess this is you :).

Yes, we are using Macronix SLC NAND.

> 
> It is fairly common to read an erased page.  Doing 'raw reads' all the
> time on an erased page will slow the file system.  However, doing
> re-reads for an erased page with bit flips is probably fairly uncommon.
> A flash in this state maybe near EOL or the sector was bad from the
> factory but not marked so.

This is a new device, but its one out of several dozens. The device had
two factory marked bad page. This four page would then be 6 bad pages. I
would say that your guess is probably the case at hand (should be
considered bad, but were marked by factory).  

> 
> I am chasing an DDR2 issue on another CPU and haven't had much more time
> to the look at the Vybrid nor follow the MTD mailing list.  I don't know
> if the 'NAND_ECC_NEED_CHECK_FF' feature has been merged?  It may also
> solve the issue.

A quick search through 3.17-rc3 didn't found that string.

I will read the pages using raw again and check how many bit flips it
shows. But I guess regarding Brian's comment to 3b, the driver actually
behaves correct and return -1, because we have bit flips on erased page
which is not good...

UBI starts to scrub & torture those 4 pages then, but starts doing this
with other pages too. All the pages start to fail then! I'm still
investigating that issue.

[   30.901345] UBI: scrubbed PEB 1436 (LEB 0:356), data moved to PEB
3965
[   31.100247] UBI: run torture test for PEB 1436
[   31.280845] UBI error: torture_peb: read problems on freshly erased
PEB 1436, must be bad
[   31.300656] UBI error: erase_worker: failed to erase PEB 1436, error
-5
[   31.312626] UBI: mark PEB 1436 as bad
[   31.338216] UBI: 12 PEBs left in the reserve
[   31.519044] UBI: scrubbed PEB 1390 (LEB 0:313), data moved to PEB
3963
[   31.697812] UBI: run torture test for PEB 1390
[   31.880470] UBI error: torture_peb: read problems on freshly erased
PEB 1390, must be bad
[   31.898220] UBI error: erase_worker: failed to erase PEB 1390, error
-5
[   31.909687] UBI: mark PEB 1390 as bad
[   31.931620] UBI: 11 PEBs left in the reserve
[   32.170599] UBI: scrubbed PEB 1470 (LEB 0:389), data moved to PEB
3961
[   32.344564] UBI: run torture test for PEB 1470
[   32.522842] UBI error: torture_peb: read problems on freshly erased
PEB 1470, must be bad
[   32.540587] UBI error: erase_worker: failed to erase PEB 1470, error
-5
[   32.552060] UBI: mark PEB 1470 as bad

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 2/5] mtd:fsl_nfc: Add hardware 45 byte BHC-ECC support for 24 bit corrections.
  2014-09-17 20:08         ` Stefan Agner
@ 2014-09-17 22:21           ` Bill Pringlemeir
  2014-12-10 14:56             ` Stefan Agner
  0 siblings, 1 reply; 16+ messages in thread
From: Bill Pringlemeir @ 2014-09-17 22:21 UTC (permalink / raw)
  To: Stefan Agner; +Cc: b21989, linux-mtd, Jason.jin, linux-arm-kernel

On 17 Sep 2014, stefan@agner.ch wrote:

> Am 2014-09-17 20:06, schrieb Bill Pringlemeir:
>> On 17 Sep 2014, stefan@agner.ch wrote:

>>> On one of our Colibri VF61 modules I discovered an issue using this
>>> driver:

>>> [    0.758327] ECC failed to correct all errors (ebd9fd80)
>>> [    0.767005] ECC failed to correct all errors (ebd9fd80)
>>> [    0.775525] ECC failed to correct all errors (ebd9fd80)
>>> [    0.784004] ECC failed to correct all errors (ebd9fd80)
>>> [    0.791938] UBI error: ubi_io_read: error -74 (ECC error) while
>>> reading 2048 bytes from PEB 28:2048,
>>> read 2048 bytes

[snip]

>>> Also, I could not find that the reference manual states that
>>> ecc_count represents the amount of flipped byte in a empty page. Is
>>> this given by the ECC algorithm?

>> I was using this information.
>>
>> Table 31-18. ECC Status Word Field Definition
>>
>> 7            0 Page has been successfully corrected
>> CORFAIL      1 Page is uncorrectable
>>
>> 5–0          Number of errors that have been corrected in this page
>> ERROR_COUNT
>>
>> This is from the Vybrid RM, but the MPC5125RM has the same
>> information.
>>
>> I have definitely tested the detection of 'erased pages'.  However, I
>> don't know that I ever had actual bit flips.

>> The ECC controller has no idea whether the page is empty or not.  Are
>> you saying you have an erased page with bit flips?  I have definitely
>> not tested this.

> Yes, that's what it looks like. Note that this output is on first boot
> after flashing the device (with the UBI auto-grow option). I guess UBI
> is reading all pages once to verify that they are empty.

>>>> + /* If 'ecc_count' zero or less then buffer is all 0xff or
>>>> erased. */ + flip = count_written_bits(dat, nfc->chip.ecc.size,
>>>> ecc_count);
>>
>> There are two issues.
>>
>> 1) erased page with some physical flipped bits (zero).
>> 2) erased page were controller flipped some bits.

> I did not know that issue 2 exists. How can 2 happen on a erased page?
> With a 'stuck at zero' in OOB?

Issue 2 is always an issue when reading an erased page with hardware
ECC.  So we have,

                    Main         Spare
    Good program:   Data    OOB-data+HW-ECC
    Good erased:    FF      FF  FF
    Stuck erased1:  FF      FF  xx
    Stuck erased2:  xx      xx  FF

For the 'good erased' the controller goes about like it is case 'good
program'.  It reads a sector (raw read or software ECC case) and a the
same time, the error corrector is running.  This examines the 'OOB
data', which is 0xff for the erased sector and it blindly begins to
apply the error correction to the main data section.  The 'correction'
for an erased block will always flip '1' to '0'.  So for 'good erased',
the 'error_count' will equal the flipped bits.

Now, your sectors aren't that bad.  I believe you only have one bit
error on an erase.  However, it is really hairy to know what the ECC
logic will do when you have bit flips in the ECC code.  It is also next
to impossible to examine the buffer and know if zero bits are due to the
ECC engine flipping the bits and/or the actual physical flash having the
stuck bit.  It appears that the ECC engine runs from start to finish.
So most of the ECC corrections should be at the start; use this info for
diagnostics, but not coding!

The best course IMHO is to do what Brian (and other) did and do a 'raw'
read, so you take the ECC engine out of the picture and stop it from
flipping bits.  You can experiment with always doing the raw read, but I
believe you will see a decrease in performance (as others mentioned and
I saw).

>> Currently, the code is only handling case 2 (that is what the
>> controller counts).  I believe that your physical page has actual
>> 'stuck at zero' bits.  The current driver gives up.  If you want to
>> handle this then we should replace the lines,

> I assumed that your code assumes that the controller returns the
> flipped bit count when reading an erase page (case 1), which I did not
> found in the documentation.

>>> +	/* ECC failed. */
>>> +	if (flip > ecc_count)
>>> +		return -1;

>> With something like,
>>
>> +	/* Not completely empty. */
>> +	if (flip > ecc_count) {
>> +           re_read_page_w_ecc_off();
>> +           if (count_written_bits() > strength) /* strength/2 ? */
>> +		return -1;
>> +      }
>>
>> There is also a discussion on this here,
>>
>> http://lists.infradead.org/pipermail/linux-mtd/2014-March/052507.html
>> http://lists.infradead.org/pipermail/linux-mtd/2014-March/052508.html
>> http://lists.infradead.org/pipermail/linux-mtd/2014-March/052509.html
>> http://lists.infradead.org/pipermail/linux-mtd/2014-March/052526.html
>> http://lists.infradead.org/pipermail/linux-mtd/2014-March/052527.html
>> http://lists.infradead.org/pipermail/linux-mtd/2014-March/052619.html
>> ... etc.  Use the view thread.
>>
>> Especially,
>> http://lists.infradead.org/pipermail/linux-mtd/2014-March/052647.html
>>
>> Here Brian says that 3b applies to SLC NAND.  I guess this is you :).

> Yes, we are using Macronix SLC NAND.

Well, it is everyone.  Sorry, I made the assumption this wouldn't
happen, but apparently it does.

>> It is fairly common to read an erased page.  Doing 'raw reads' all
>> the time on an erased page will slow the file system.  However, doing
>> re-reads for an erased page with bit flips is probably fairly
>> uncommon.  A flash in this state maybe near EOL or the sector was bad
>> from the factory but not marked so.

> This is a new device, but its one out of several dozens. The device
> had two factory marked bad page. This four page would then be 6 bad
> pages. I would say that your guess is probably the case at hand
> (should be considered bad, but were marked by factory).

>> I am chasing an DDR2 issue on another CPU and haven't had much more
>> time to the look at the Vybrid nor follow the MTD mailing list.  I
>> don't know if the 'NAND_ECC_NEED_CHECK_FF' feature has been merged?
>> It may also solve the issue.
>
> A quick search through 3.17-rc3 didn't found that string.

> I will read the pages using raw again and check how many bit flips it
> shows. But I guess regarding Brian's comment to 3b, the driver
> actually behaves correct and return -1, because we have bit flips on
> erased page which is not good...

Hmm.  No, it should be ok; I think Brian meant we should accept and
expect this.  We had this conversation after the RFC.  However, we
should probably only accept strength/2 zero values at most; I believe
what to accept was kind of hanging in that conversation.  If the zeros
are below our threshold with a raw read, then we can report the page is
all '\xff' and return a 'bit flip' count as the number of zeros found in
a raw read.  Also, in the 'all FFs' case we should probably memset the
OOB data.  I see that other patches mentioned in the MTD threads above
do this.

 +	/* Not completely empty. */
 +	if (flip > ecc_count) {
 +           re_read_page_w_ecc_off();
 +           ecc_count = count_zero_bits();
 +           if (ecc_count > strength/2)
 +		return -1;
 +      } else
 +          ecc_count = 0;  /* 'normal' case all ff */
 +
 +	/* Erased page. */
 +	memset(dat, 0xff, nfc->chip.ecc.size);
 +	memset(oob, 0xff, ???);  /* ??? */
 +	return ecc_count;

I don't think that UBI/UbiFs use the OOB data at all, but that is
something that I don't think is completely correct in this path?

> UBI starts to scrub & torture those 4 pages then, but starts doing
> this with other pages too. All the pages start to fail then! I'm still
> investigating that issue.

Arg.  I am not quite sure.  It seems ok to have a few 'stuck at one'
errors.  As you move closer to the ECC strength this seems a little
crazy.  At some point the sector/erase block is becoming useless.  I am
not sure if this is an issue with this device; it seems a little
suspect.  Maybe you could code something that emulates the stuck bits on
a good device and see if it still behaves the same; triggers a software
condition.  You might also reduce the NFC bus clock just to see on the
'stuck zero' device.

Hth,
Bill Pringlemeir.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 2/5] mtd:fsl_nfc: Add hardware 45 byte BHC-ECC support for 24 bit corrections.
  2014-09-17 22:21           ` Bill Pringlemeir
@ 2014-12-10 14:56             ` Stefan Agner
  2014-12-11 16:44               ` Bill Pringlemeir
  0 siblings, 1 reply; 16+ messages in thread
From: Stefan Agner @ 2014-12-10 14:56 UTC (permalink / raw)
  To: Bill Pringlemeir; +Cc: b21989, linux-mtd, Jason.jin, linux-arm-kernel

Hi Bill,

On 2014-09-18 00:21, Bill Pringlemeir wrote:
> On 17 Sep 2014, stefan@agner.ch wrote:
> 
>> Am 2014-09-17 20:06, schrieb Bill Pringlemeir:
>>> On 17 Sep 2014, stefan@agner.ch wrote:
> 
>>>> On one of our Colibri VF61 modules I discovered an issue using this
>>>> driver:
> 
>>>> [    0.758327] ECC failed to correct all errors (ebd9fd80)
>>>> [    0.767005] ECC failed to correct all errors (ebd9fd80)
>>>> [    0.775525] ECC failed to correct all errors (ebd9fd80)
>>>> [    0.784004] ECC failed to correct all errors (ebd9fd80)
>>>> [    0.791938] UBI error: ubi_io_read: error -74 (ECC error) while
>>>> reading 2048 bytes from PEB 28:2048,
>>>> read 2048 bytes
> 
> [snip]
> 
>>>> Also, I could not find that the reference manual states that
>>>> ecc_count represents the amount of flipped byte in a empty page. Is
>>>> this given by the ECC algorithm?
> 
>>> I was using this information.
>>>
>>> Table 31-18. ECC Status Word Field Definition
>>>
>>> 7            0 Page has been successfully corrected
>>> CORFAIL      1 Page is uncorrectable
>>>
>>> 5–0          Number of errors that have been corrected in this page
>>> ERROR_COUNT
>>>
>>> This is from the Vybrid RM, but the MPC5125RM has the same
>>> information.
>>>
>>> I have definitely tested the detection of 'erased pages'.  However, I
>>> don't know that I ever had actual bit flips.
> 
>>> The ECC controller has no idea whether the page is empty or not.  Are
>>> you saying you have an erased page with bit flips?  I have definitely
>>> not tested this.
> 
>> Yes, that's what it looks like. Note that this output is on first boot
>> after flashing the device (with the UBI auto-grow option). I guess UBI
>> is reading all pages once to verify that they are empty.
> 
>>>>> + /* If 'ecc_count' zero or less then buffer is all 0xff or
>>>>> erased. */ + flip = count_written_bits(dat, nfc->chip.ecc.size,
>>>>> ecc_count);
>>>
>>> There are two issues.
>>>
>>> 1) erased page with some physical flipped bits (zero).
>>> 2) erased page were controller flipped some bits.
> 
>> I did not know that issue 2 exists. How can 2 happen on a erased page?
>> With a 'stuck at zero' in OOB?
> 
> Issue 2 is always an issue when reading an erased page with hardware
> ECC.  So we have,
> 
>                     Main         Spare
>     Good program:   Data    OOB-data+HW-ECC
>     Good erased:    FF      FF  FF
>     Stuck erased1:  FF      FF  xx
>     Stuck erased2:  xx      xx  FF
> 
> For the 'good erased' the controller goes about like it is case 'good
> program'.  It reads a sector (raw read or software ECC case) and a the
> same time, the error corrector is running.  This examines the 'OOB
> data', which is 0xff for the erased sector and it blindly begins to
> apply the error correction to the main data section.  The 'correction'
> for an erased block will always flip '1' to '0'.  So for 'good erased',
> the 'error_count' will equal the flipped bits.
> 
> Now, your sectors aren't that bad.  I believe you only have one bit
> error on an erase.  However, it is really hairy to know what the ECC
> logic will do when you have bit flips in the ECC code.  It is also next
> to impossible to examine the buffer and know if zero bits are due to the
> ECC engine flipping the bits and/or the actual physical flash having the
> stuck bit.  It appears that the ECC engine runs from start to finish.
> So most of the ECC corrections should be at the start; use this info for
> diagnostics, but not coding!

Two errors are at the beginning, one is almost at the end.

> The best course IMHO is to do what Brian (and other) did and do a 'raw'
> read, so you take the ECC engine out of the picture and stop it from
> flipping bits.  You can experiment with always doing the raw read, but I
> believe you will see a decrease in performance (as others mentioned and
> I saw).
> 

Hm, we currently don't have a non ECC read function as we enable ECC
unconditionally. I guess this would need a read_oob_raw implementation,
where we disable ECC, read the page, and reenable ECC...

>>> Currently, the code is only handling case 2 (that is what the
>>> controller counts).  I believe that your physical page has actual
>>> 'stuck at zero' bits.  The current driver gives up.  If you want to
>>> handle this then we should replace the lines,
> 
>> I assumed that your code assumes that the controller returns the
>> flipped bit count when reading an erase page (case 1), which I did not
>> found in the documentation.
> 
>>>> +	/* ECC failed. */
>>>> +	if (flip > ecc_count)
>>>> +		return -1;
> 
>>> With something like,
>>>
>>> +	/* Not completely empty. */
>>> +	if (flip > ecc_count) {
>>> +           re_read_page_w_ecc_off();
>>> +           if (count_written_bits() > strength) /* strength/2 ? */
>>> +		return -1;
>>> +      }
>>>
>>> There is also a discussion on this here,
>>>
>>> http://lists.infradead.org/pipermail/linux-mtd/2014-March/052507.html
>>> http://lists.infradead.org/pipermail/linux-mtd/2014-March/052508.html
>>> http://lists.infradead.org/pipermail/linux-mtd/2014-March/052509.html
>>> http://lists.infradead.org/pipermail/linux-mtd/2014-March/052526.html
>>> http://lists.infradead.org/pipermail/linux-mtd/2014-March/052527.html
>>> http://lists.infradead.org/pipermail/linux-mtd/2014-March/052619.html
>>> ... etc.  Use the view thread.
>>>
>>> Especially,
>>> http://lists.infradead.org/pipermail/linux-mtd/2014-March/052647.html
>>>
>>> Here Brian says that 3b applies to SLC NAND.  I guess this is you :).
> 
>> Yes, we are using Macronix SLC NAND.
> 
> Well, it is everyone.  Sorry, I made the assumption this wouldn't
> happen, but apparently it does.
> 
>>> It is fairly common to read an erased page.  Doing 'raw reads' all
>>> the time on an erased page will slow the file system.  However, doing
>>> re-reads for an erased page with bit flips is probably fairly
>>> uncommon.  A flash in this state maybe near EOL or the sector was bad
>>> from the factory but not marked so.
> 
>> This is a new device, but its one out of several dozens. The device
>> had two factory marked bad page. This four page would then be 6 bad
>> pages. I would say that your guess is probably the case at hand
>> (should be considered bad, but were marked by factory).
> 
>>> I am chasing an DDR2 issue on another CPU and haven't had much more
>>> time to the look at the Vybrid nor follow the MTD mailing list.  I
>>> don't know if the 'NAND_ECC_NEED_CHECK_FF' feature has been merged?
>>> It may also solve the issue.
>>
>> A quick search through 3.17-rc3 didn't found that string.
> 
>> I will read the pages using raw again and check how many bit flips it
>> shows. But I guess regarding Brian's comment to 3b, the driver
>> actually behaves correct and return -1, because we have bit flips on
>> erased page which is not good...
> 
> Hmm.  No, it should be ok; I think Brian meant we should accept and
> expect this.  We had this conversation after the RFC.  However, we
> should probably only accept strength/2 zero values at most; I believe
> what to accept was kind of hanging in that conversation.  If the zeros
> are below our threshold with a raw read, then we can report the page is
> all '\xff' and return a 'bit flip' count as the number of zeros found in
> a raw read.  Also, in the 'all FFs' case we should probably memset the
> OOB data.  I see that other patches mentioned in the MTD threads above
> do this.
> 
>  +	/* Not completely empty. */
>  +	if (flip > ecc_count) {
>  +           re_read_page_w_ecc_off();
>  +           ecc_count = count_zero_bits();
>  +           if (ecc_count > strength/2)
>  +		return -1;
>  +      } else
>  +          ecc_count = 0;  /* 'normal' case all ff */
>  +
>  +	/* Erased page. */
>  +	memset(dat, 0xff, nfc->chip.ecc.size);
>  +	memset(oob, 0xff, ???);  /* ??? */
>  +	return ecc_count;
> 

What I currently did, is just accept strength / 2 bits. This is not a
clean solution since it will also count the ECC bits, but it works for
now:
--- a/drivers/mtd/nand/fsl_nfc.c
+++ b/drivers/mtd/nand/fsl_nfc.c
@@ -524,7 +524,7 @@ static int nfc_correct_data(struct mtd_info *mtd,
u_char *dat,
        flip = count_written_bits(dat, nfc->chip.ecc.size, ecc_count);
 
        /* ECC failed. */
-       if (flip > ecc_count)
+       if (flip > ecc_count && flip > (nfc->chip.ecc.strength / 2))
                return -1;
 
        /* Erased page. */



> I don't think that UBI/UbiFs use the OOB data at all, but that is
> something that I don't think is completely correct in this path?
> 
>> UBI starts to scrub & torture those 4 pages then, but starts doing
>> this with other pages too. All the pages start to fail then! I'm still
>> investigating that issue.
> 
> Arg.  I am not quite sure.  It seems ok to have a few 'stuck at one'
> errors.  As you move closer to the ECC strength this seems a little
> crazy.  At some point the sector/erase block is becoming useless.  I am
> not sure if this is an issue with this device; it seems a little
> suspect.  Maybe you could code something that emulates the stuck bits on
> a good device and see if it still behaves the same; triggers a software
> condition.  You might also reduce the NFC bus clock just to see on the
> 'stuck zero' device.
> 

I think we are facing multiple issues here. One might contain general
software/hardware issues (non bit-flip related). I had this issue again
on a different module with 3.18-rc5 (without the "fix" above). The
kernel output looks like this:

[    1.744672] UBI: default fastmap pool size: 200
[    1.752573] UBI: default fastmap WL pool size: 25
[    1.760577] UBI: attaching mtd3 to ubi0
[    1.769093] UBI warning: ubi_io_read: error -74 (ECC error) while
reading 2048 bytes from PEB 0:2048, read only 2048 bytes, retry
[    1.787579] UBI warning: ubi_io_read: error -74 (ECC error) while
reading 2048 bytes from PEB 0:2048, read only 2048 bytes, retry
[    1.806383] UBI warning: ubi_io_read: error -74 (ECC error) while
reading 2048 bytes from PEB 0:2048, read only 2048 bytes, retry
[    1.825649] UBI error: ubi_io_read: error -74 (ECC error) while
reading 2048 bytes from PEB 0:2048, read 2048 bytes
[    1.843804] CPU: 0 PID: 1 Comm: swapper Not tainted
3.18.0-rc5-00136-g0d91606-dirty #1607
[    1.860148] Backtrace:
[    1.866779] [<80011a04>] (dump_backtrace) from [<80011ce0>]
(show_stack+0x18/0x1c)
[    1.883061]  r6:00000000 r5:00000800 r4:ffffffb6 r3:00000000
[    1.893395] [<80011cc8>] (show_stack) from [<805a5018>]
(dump_stack+0x24/0x28)
[    1.909684] [<805a4ff4>] (dump_stack) from [<8034e9ac>]
(ubi_io_read+0x130/0x300)
[    1.926615] [<8034e87c>] (ubi_io_read) from [<8034efac>]
(ubi_io_read_vid_hdr+0x50/0x220)
[    1.944255]  r10:00000000 r9:00000000 r8:8ea4c000 r7:00000000
r6:8eaa3000 r5:00000000
[    1.961788]  r4:8ea4c000
[    1.969163] [<8034ef5c>] (ubi_io_read_vid_hdr) from [<803539ec>]
(scan_peb.part.9+0x128/0x6a4)
[    1.987614]  r10:00000000 r9:8e843dec r8:8ea4c000 r7:00000000
r6:8ead2900 r5:00000000
[    2.005493]  r4:80847a4c
[    2.012847] [<803538c4>] (scan_peb.part.9) from [<80354ed4>]
(ubi_attach+0x13c/0x398)
[    2.030220]  r10:ffffffff r9:80847a4c r8:ffffffff r7:7ffff000
r6:8ead2900 r5:8ea4c000
[    2.047761]  r4:00000000
[    2.054949] [<80354d98>] (ubi_attach) from [<80349504>]
(ubi_attach_mtd_dev+0x6a8/0xd00)
[    2.072405]  r10:00000050 r9:8ea13c00 r8:000007ff r7:8ea4c000
r6:00000000 r5:8ea13c00
[    2.089965]  r4:00000990
[    2.097179] [<80348e5c>] (ubi_attach_mtd_dev) from [<807933f0>]
(ubi_init+0x214/0x2b8)
[    2.114458]  r10:00000000 r9:8ead9200 r8:807785e8 r7:8ea13c00
r6:807a9700 r5:00000000
[    2.131856]  r4:807a9704
[    2.138962] [<807931dc>] (ubi_init) from [<80008770>]
(do_one_initcall+0x94/0x1d4)
[    2.155573]  r8:807785e8 r7:8e842038 r6:807931dc r5:807b9460
r4:807b9460
[    2.167084] [<800086dc>] (do_one_initcall) from [<80778e38>]
(kernel_init_freeable+0x130/0x1d0)
[    2.185090]  r10:807a44b0 r9:807a44a8 r8:807785e8 r7:807f4b80
r6:807f4b80 r5:00000007
[    2.202418]  r4:807ad4d8
[    2.209488] [<80778d08>] (kernel_init_freeable) from [<805a1b64>]
(kernel_init+0x10/0xf0)
[    2.226946]  r10:00000000 r9:00000000 r8:00000000 r7:00000000
r6:00000000 r5:805a1b54
[    2.244265]  r4:00000000
[    2.251337] [<805a1b54>] (kernel_init) from [<8000e638>]
(ret_from_fork+0x14/0x3c)
[    2.268179]  r4:00000000 r3:8e842000
[    2.297506] UBI warning: ubi_io_read: error -74 (ECC error) while
reading 2048 bytes from PEB 2:2048, read only 2048 bytes, retry
[    2.318557] UBI warning: ubi_io_read: error -74 (ECC error) while
reading 2048 bytes from PEB 2:2048, read only 2048 bytes, retry
[    2.340077] UBI warning: ubi_io_read: error -74 (ECC error) while
reading 2048 bytes from PEB 2:2048, read only 2048 bytes, retry
[    2.361438] UBI error: ubi_io_read: error -74 (ECC error) while
reading 2048 bytes from PEB 2:2048, read 2048 bytes
[    2.381479] CPU: 0 PID: 1 Comm: swapper Not tainted
3.18.0-rc5-00136-g0d91606-dirty #1607

Interesting is that this error happens every second PEB (every 128 page,
but erase block size is 64) and it is always the second page. On that
device, this is completely reproduceable, e.g. I can erase everything
and flash it again, the same happens. 


I dumped the block in question:

Page 00240800 dump:
        ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff
        ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff
        ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff
        ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff
        ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff
        ff ff ff ff ff ff ff ff  f7 ff ff ff ff ff ff ff
        ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff
        ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff
        ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff
        ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff
        ff ff ff ff ff ff ff ff  ff ff fb ff ff ff ff ff
        ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff
....
        ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff f7
....

I also printed flip count and ecc_count the values for all those pages
are: flip 3, ecc_count 2

Now the interesting part: When I erase the block, and dump that page
again, it is completely empty! No flips, no ecc_count anymore! UBI
attach writes something into the first page, hence it looks like this
write into the first page influences the values of the second page... I
verified this behavior this using U-Boot and the Linux kernel.

I digged a bit deeper, and wrote just zeros into the first page. In the
second page some bits are flipped. However, writing into the second page
does not influence the third page. But a bit in the first page is
flipped. And the third page influences the forth page. It looks like the
pages behave in pairs.... Any idea what kind of issue we are facing
here?

--
Stefan

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 2/5] mtd:fsl_nfc: Add hardware 45 byte BHC-ECC support for 24 bit corrections.
  2014-12-10 14:56             ` Stefan Agner
@ 2014-12-11 16:44               ` Bill Pringlemeir
  2015-03-01  0:38                 ` Stefan Agner
  0 siblings, 1 reply; 16+ messages in thread
From: Bill Pringlemeir @ 2014-12-11 16:44 UTC (permalink / raw)
  To: Stefan Agner; +Cc: b21989, linux-mtd, Jason.jin, linux-arm-kernel


>>>> On 17 Sep 2014, stefan@agner.ch wrote:

>>> Yes, we are using Macronix SLC NAND.

>>>> On 17 Sep 2014, stefan@agner.ch wrote:

>>> This is a new device, but its one out of several dozens. The device
>>> had two factory marked bad page. This four page would then be 6 bad
>>> pages. I would say that your guess is probably the case at hand
>>> (should be considered bad, but were marked by factory).

On 10 Dec 2014, stefan@agner.ch wrote:

> What I currently did, is just accept strength / 2 bits. This is not a
> clean solution since it will also count the ECC bits, but it works for
> now:
> --- a/drivers/mtd/nand/fsl_nfc.c
> +++ b/drivers/mtd/nand/fsl_nfc.c
> @@ -524,7 +524,7 @@ static int nfc_correct_data(struct mtd_info *mtd,
> u_char *dat,
> flip = count_written_bits(dat, nfc->chip.ecc.size, ecc_count);
>
> /* ECC failed. */
> -       if (flip > ecc_count)
> +       if (flip > ecc_count && flip > (nfc->chip.ecc.strength / 2))
> return -1;
>
> /* Erased page. */

> I think we are facing multiple issues here. One might contain general
> software/hardware issues (non bit-flip related). I had this issue
> again on a different module with 3.18-rc5 (without the "fix"
> above). The kernel output looks like this:

[snip]

> Interesting is that this error happens every second PEB (every 128
> page, but erase block size is 64) and it is always the second page. On
> that device, this is completely reproduceable, e.g. I can erase
> everything and flash it again, the same happens.

> I dumped the block in question:

> Page 00240800 dump:
> ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff
> ....
> ff ff ff ff ff ff ff ff  f7 ff ff ff ff ff ff ff
> ....
> ff ff ff ff ff ff ff ff  ff ff fb ff ff ff ff ff
> ....
> ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff f7
> ....

> I also printed flip count and ecc_count the values for all those pages
> are: flip 3, ecc_count 2

> Now the interesting part: When I erase the block, and dump that page
> again, it is completely empty! No flips, no ecc_count anymore! UBI
> attach writes something into the first page, hence it looks like this
> write into the first page influences the values of the second
> page... I verified this behavior this using U-Boot and the Linux
> kernel.

> I digged a bit deeper, and wrote just zeros into the first page. In
> the second page some bits are flipped. However, writing into the
> second page does not influence the third page. But a bit in the first
> page is flipped. And the third page influences the forth page. It
> looks like the pages behave in pairs.... Any idea what kind of issue
> we are facing here?

Hmm.  It sounds like MLC flash, but you say you have SLC.  It could be
that some bus signalling is marginal?  Could you reduce the clocks a bit
on this device and see if the behaviour changes?  I am pretty sure that
stuck-at-zero errors will stay that way.

I would love to get back to this controller code to fix some issues you
noted and bring in the changes to the u-boot review.  Unfortunately, I
keep getting stuck with legacy hw issues.

fwiw,
Bill.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 2/5] mtd:fsl_nfc: Add hardware 45 byte BHC-ECC support for 24 bit corrections.
  2014-12-11 16:44               ` Bill Pringlemeir
@ 2015-03-01  0:38                 ` Stefan Agner
  2015-03-02 15:05                   ` Bill Pringlemeir
  0 siblings, 1 reply; 16+ messages in thread
From: Stefan Agner @ 2015-03-01  0:38 UTC (permalink / raw)
  To: Bill Pringlemeir; +Cc: b21989, linux-mtd, Jason.jin, linux-arm-kernel

On 2014-12-11 17:44, Bill Pringlemeir wrote:
>>>>> On 17 Sep 2014, stefan@agner.ch wrote:
> 
>>>> Yes, we are using Macronix SLC NAND.
> 
>>>>> On 17 Sep 2014, stefan@agner.ch wrote:
> 
>>>> This is a new device, but its one out of several dozens. The device
>>>> had two factory marked bad page. This four page would then be 6 bad
>>>> pages. I would say that your guess is probably the case at hand
>>>> (should be considered bad, but were marked by factory).
> 
> On 10 Dec 2014, stefan@agner.ch wrote:
> 
>> What I currently did, is just accept strength / 2 bits. This is not a
>> clean solution since it will also count the ECC bits, but it works for
>> now:
>> --- a/drivers/mtd/nand/fsl_nfc.c
>> +++ b/drivers/mtd/nand/fsl_nfc.c
>> @@ -524,7 +524,7 @@ static int nfc_correct_data(struct mtd_info *mtd,
>> u_char *dat,
>> flip = count_written_bits(dat, nfc->chip.ecc.size, ecc_count);
>>
>> /* ECC failed. */
>> -       if (flip > ecc_count)
>> +       if (flip > ecc_count && flip > (nfc->chip.ecc.strength / 2))
>> return -1;
>>
>> /* Erased page. */
> 
>> I think we are facing multiple issues here. One might contain general
>> software/hardware issues (non bit-flip related). I had this issue
>> again on a different module with 3.18-rc5 (without the "fix"
>> above). The kernel output looks like this:
> 
> [snip]
> 
>> Interesting is that this error happens every second PEB (every 128
>> page, but erase block size is 64) and it is always the second page. On
>> that device, this is completely reproduceable, e.g. I can erase
>> everything and flash it again, the same happens.
> 
>> I dumped the block in question:
> 
>> Page 00240800 dump:
>> ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff
>> ....
>> ff ff ff ff ff ff ff ff  f7 ff ff ff ff ff ff ff
>> ....
>> ff ff ff ff ff ff ff ff  ff ff fb ff ff ff ff ff
>> ....
>> ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff f7
>> ....
> 
>> I also printed flip count and ecc_count the values for all those pages
>> are: flip 3, ecc_count 2
> 
>> Now the interesting part: When I erase the block, and dump that page
>> again, it is completely empty! No flips, no ecc_count anymore! UBI
>> attach writes something into the first page, hence it looks like this
>> write into the first page influences the values of the second
>> page... I verified this behavior this using U-Boot and the Linux
>> kernel.
> 
>> I digged a bit deeper, and wrote just zeros into the first page. In
>> the second page some bits are flipped. However, writing into the
>> second page does not influence the third page. But a bit in the first
>> page is flipped. And the third page influences the forth page. It
>> looks like the pages behave in pairs.... Any idea what kind of issue
>> we are facing here?
> 
> Hmm.  It sounds like MLC flash, but you say you have SLC.  It could be
> that some bus signalling is marginal?  Could you reduce the clocks a bit
> on this device and see if the behaviour changes?  I am pretty sure that
> stuck-at-zero errors will stay that way.
> 
> I would love to get back to this controller code to fix some issues you
> noted and bring in the changes to the u-boot review.  Unfortunately, I
> keep getting stuck with legacy hw issues.
> 
> fwiw,
> Bill.

Hi Bill,

The flash chip mentioned above requires 8-bit error correction per 512
byte block, hence I increased the ECC to the maximum available level
(60-byte ECC, see page below). One thing which is not very nice, in
order to fit the 60-byte ECC into the 64-byte OOB, I had to shorten the
BBT pattern and set it at the very beginning of the page. This works
fine, however this basically sets the page also to factory bad, I'm not
sure if this is ok? Otherwise, we also could use a BBT pattern of length
1 (used by cafe_nand.c too).

What do you think? I would like to respin the NFC patch, with my U-Boot
changes and this change included...

--
Stefan

diff --git a/drivers/mtd/nand/fsl_nfc.c b/drivers/mtd/nand/fsl_nfc.c
index bfc7b7b..3d6da80 100644
--- a/drivers/mtd/nand/fsl_nfc.c
+++ b/drivers/mtd/nand/fsl_nfc.c
@@ -70,6 +70,7 @@
 /* NFC ECC mode define */
 #define ECC_BYPASS                     0
 #define ECC_45_BYTE                    6
+#define ECC_60_BYTE                    7
 
 /*** Register Mask and bit definitions */
 
@@ -155,15 +156,15 @@ struct fsl_nfc {
 };
 #define mtd_to_nfc(_mtd) container_of(_mtd, struct fsl_nfc, mtd)
 
-static u8 bbt_pattern[] = {'B', 'b', 't', '0' };
-static u8 mirror_pattern[] = {'1', 't', 'b', 'B' };
+static u8 bbt_pattern[] = {'B', 'b', 't' };
+static u8 mirror_pattern[] = {'t', 'b', 'B' };
 
 static struct nand_bbt_descr bbt_main_descr = {
        .options = NAND_BBT_LASTBLOCK | NAND_BBT_CREATE | NAND_BBT_WRITE
|
                   NAND_BBT_2BIT | NAND_BBT_VERSION,
-       .offs = 11,
-       .len = 4,
-       .veroffs = 15,
+       .offs = 0,
+       .len = 3,
+       .veroffs = 3,
        .maxblocks = 4,
        .pattern = bbt_pattern,
 };
@@ -171,9 +172,9 @@ static struct nand_bbt_descr bbt_main_descr = {
 static struct nand_bbt_descr bbt_mirror_descr = {
        .options = NAND_BBT_LASTBLOCK | NAND_BBT_CREATE | NAND_BBT_WRITE
|
                   NAND_BBT_2BIT | NAND_BBT_VERSION,
-       .offs = 11,
-       .len = 4,
-       .veroffs = 15,
+       .offs = 0,
+       .len = 3,
+       .veroffs = 3,
        .maxblocks = 4,
        .pattern = mirror_pattern,
 };
@@ -191,6 +192,21 @@ static struct nand_ecclayout nfc_ecc45 = {
                 .length = 11} }
 };
 
+static struct nand_ecclayout nfc_ecc60 = {
+       .eccbytes = 60,
+       .eccpos = { 4,  5,  6,  7,  8,  9, 10, 11,
+                  12, 13, 14, 15, 16, 17, 18, 19,
+                  20, 21, 22, 23, 24, 25, 26, 27,
+                  28, 29, 30, 31, 32, 33, 34, 35,
+                  36, 37, 38, 39, 40, 41, 42, 43,
+                  44, 45, 46, 47, 48, 49, 50, 51,
diff --git a/drivers/mtd/nand/fsl_nfc.c b/drivers/mtd/nand/fsl_nfc.c
index bfc7b7b..3d6da80 100644
--- a/drivers/mtd/nand/fsl_nfc.c
+++ b/drivers/mtd/nand/fsl_nfc.c
@@ -70,6 +70,7 @@
 /* NFC ECC mode define */
 #define ECC_BYPASS                     0
 #define ECC_45_BYTE                    6
+#define ECC_60_BYTE                    7
 
 /*** Register Mask and bit definitions */
 
@@ -155,15 +156,15 @@ struct fsl_nfc {
 };
 #define mtd_to_nfc(_mtd) container_of(_mtd, struct fsl_nfc, mtd)
 
-static u8 bbt_pattern[] = {'B', 'b', 't', '0' };
-static u8 mirror_pattern[] = {'1', 't', 'b', 'B' };
+static u8 bbt_pattern[] = {'B', 'b', 't' };
+static u8 mirror_pattern[] = {'t', 'b', 'B' };
 
 static struct nand_bbt_descr bbt_main_descr = {
        .options = NAND_BBT_LASTBLOCK | NAND_BBT_CREATE | NAND_BBT_WRITE
|
                   NAND_BBT_2BIT | NAND_BBT_VERSION,
-       .offs = 11,
-       .len = 4,
-       .veroffs = 15,
+       .offs = 0,
+       .len = 3,
+       .veroffs = 3,
        .maxblocks = 4,
        .pattern = bbt_pattern,
 };
@@ -171,9 +172,9 @@ static struct nand_bbt_descr bbt_main_descr = {
 static struct nand_bbt_descr bbt_mirror_descr = {
        .options = NAND_BBT_LASTBLOCK | NAND_BBT_CREATE | NAND_BBT_WRITE
|
                   NAND_BBT_2BIT | NAND_BBT_VERSION,
-       .offs = 11,
-       .len = 4,
-       .veroffs = 15,
+       .offs = 0,
+       .len = 3,
+       .veroffs = 3,
        .maxblocks = 4,
        .pattern = mirror_pattern,
 };
@@ -191,6 +192,21 @@ static struct nand_ecclayout nfc_ecc45 = {
                 .length = 11} }
 };
 
+static struct nand_ecclayout nfc_ecc60 = {
+       .eccbytes = 60,
+       .eccpos = { 4,  5,  6,  7,  8,  9, 10, 11,
+                  12, 13, 14, 15, 16, 17, 18, 19,
+                  20, 21, 22, 23, 24, 25, 26, 27,
+                  28, 29, 30, 31, 32, 33, 34, 35,
+                  36, 37, 38, 39, 40, 41, 42, 43,
+                  44, 45, 46, 47, 48, 49, 50, 51,
+                  52, 53, 54, 55, 56, 57, 58, 59,
+                  60, 61, 62, 63 },
+       .oobfree = {
+               {.offset = 2,
+                .length = 2} }
+};
+
 static u32 nfc_read(struct mtd_info *mtd, uint reg)
 {
        struct fsl_nfc *nfc = mtd_to_nfc(mtd);
@@ -608,10 +624,10 @@ static int nfc_init_controller(struct mtd_info
*mtd, struct nfc_config *cfg, int
        nfc_write(mtd, NFC_SECTOR_SIZE, page_sz);
 
        if (cfg->hardware_ecc) {
-               /* set ECC mode to 45 bytes OOB with 24 bits correction
*/
+               /* set ECC mode to 60 bytes OOB with 32 bits correction
*/
                nfc_set_field(mtd, NFC_FLASH_CONFIG,
                                CONFIG_ECC_MODE_MASK,
-                               CONFIG_ECC_MODE_SHIFT, ECC_45_BYTE);
+                               CONFIG_ECC_MODE_SHIFT, ECC_60_BYTE);
 
                /* Enable ECC_STATUS */
                nfc_set(mtd, NFC_FLASH_CONFIG, CONFIG_ECC_SRAM_REQ_BIT);
@@ -742,7 +758,7 @@ static int nfc_probe(struct platform_device *pdev)
                        goto error;
                }
 
-               chip->ecc.layout = &nfc_ecc45;
+               chip->ecc.layout = &nfc_ecc60;
 
                /* propagate ecc.layout to mtd_info */
                mtd->ecclayout = chip->ecc.layout;
@@ -751,14 +767,14 @@ static int nfc_probe(struct platform_device *pdev)
                chip->ecc.correct = nfc_correct_data;
                chip->ecc.mode = NAND_ECC_HW;
 
-               chip->ecc.bytes = 45;
+               chip->ecc.bytes = 60;
                chip->ecc.size = PAGE_2K;
-               chip->ecc.strength = 24;
+               chip->ecc.strength = 32;
 
-               /* set ECC mode to 45 bytes OOB with 24 bits correction
*/
+               /* set ECC mode to 60 bytes OOB with 32 bits correction
*/
                nfc_set_field(mtd, NFC_FLASH_CONFIG,
                                CONFIG_ECC_MODE_MASK,
-                               CONFIG_ECC_MODE_SHIFT, ECC_45_BYTE);
+                               CONFIG_ECC_MODE_SHIFT, ECC_60_BYTE);
        }
 
        /* second phase scan */

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [RFC 2/5] mtd:fsl_nfc: Add hardware 45 byte BHC-ECC support for 24 bit corrections.
  2015-03-01  0:38                 ` Stefan Agner
@ 2015-03-02 15:05                   ` Bill Pringlemeir
  2015-03-02 21:39                     ` Aaron Brice
  0 siblings, 1 reply; 16+ messages in thread
From: Bill Pringlemeir @ 2015-03-02 15:05 UTC (permalink / raw)
  To: Stefan Agner; +Cc: b21989, linux-mtd, Jason.jin, linux-arm-kernel


On 28 Feb 2015, stefan@agner.ch wrote:

> The flash chip mentioned above requires 8-bit error correction per 512
> byte block, hence I increased the ECC to the maximum available level
> (60-byte ECC, see page below). One thing which is not very nice, in
> order to fit the 60-byte ECC into the 64-byte OOB, I had to shorten
> the BBT pattern and set it at the very beginning of the page. This
> works fine, however this basically sets the page also to factory bad,
> I'm not sure if this is ok? Otherwise, we also could use a BBT pattern
> of length 1 (used by cafe_nand.c too).

I guess that is a DT option?  I wouldn't be an expert on this.  So
submitting it to the linux-mtd is good.  

I am also not sure if the HW ECC will work with 'sub-pages'.  I think a
college of your at Toradex submitted a patch to the u-boot.  I am pretty
sure that it could work with software ECC, but maybe disabling it is
easiest.

> What do you think? I would like to respin the NFC patch, with my
> U-Boot changes and this change included...

Please go ahead.  Markus M <marb@ixxat.de> is also using the fsl_nfc
driver on a Freescale MPC5125 board, so it is probably good to copy your
patches to him.  At least, he can test on a BE platform.

People also complained about JFFS and this version of the driver.  I
didn't investigate that.

Thanks,
Bill Pringlemeir.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 2/5] mtd:fsl_nfc: Add hardware 45 byte BHC-ECC support for 24 bit corrections.
  2015-03-02 15:05                   ` Bill Pringlemeir
@ 2015-03-02 21:39                     ` Aaron Brice
  2015-03-02 21:44                       ` Stefan Agner
  0 siblings, 1 reply; 16+ messages in thread
From: Aaron Brice @ 2015-03-02 21:39 UTC (permalink / raw)
  To: Bill Pringlemeir, Stefan Agner
  Cc: linux-arm-kernel, linux-mtd, Jason.jin, b21989

On 03/02/2015 08:05 AM, Bill Pringlemeir wrote:
> On 28 Feb 2015, stefan@agner.ch wrote:
>
>> The flash chip mentioned above requires 8-bit error correction per 512
>> byte block, hence I increased the ECC to the maximum available level
>> (60-byte ECC, see page below). One thing which is not very nice, in
>> order to fit the 60-byte ECC into the 64-byte OOB, I had to shorten
>> the BBT pattern and set it at the very beginning of the page. This
>> works fine, however this basically sets the page also to factory bad,
>> I'm not sure if this is ok? Otherwise, we also could use a BBT pattern
>> of length 1 (used by cafe_nand.c too).
> I guess that is a DT option?  I wouldn't be an expert on this.  So
> submitting it to the linux-mtd is good.
>
> I am also not sure if the HW ECC will work with 'sub-pages'.  I think a
> college of your at Toradex submitted a patch to the u-boot.  I am pretty
> sure that it could work with software ECC, but maybe disabling it is
> easiest.
>
>> What do you think? I would like to respin the NFC patch, with my
>> U-Boot changes and this change included...
> Please go ahead.  Markus M <marb@ixxat.de> is also using the fsl_nfc
> driver on a Freescale MPC5125 board, so it is probably good to copy your
> patches to him.  At least, he can test on a BE platform.
>
> People also complained about JFFS and this version of the driver.  I
> didn't investigate that.

I also noticed a problem with JFFS2 and the driver, where fs changes 
were lost after reboot.  Didn't investigate, switched to UBIFS..

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 2/5] mtd:fsl_nfc: Add hardware 45 byte BHC-ECC support for 24 bit corrections.
  2015-03-02 21:39                     ` Aaron Brice
@ 2015-03-02 21:44                       ` Stefan Agner
  0 siblings, 0 replies; 16+ messages in thread
From: Stefan Agner @ 2015-03-02 21:44 UTC (permalink / raw)
  To: Aaron Brice
  Cc: Jason.jin, linux-mtd, Bill Pringlemeir, b21989, linux-arm-kernel

On 2015-03-02 22:39, Aaron Brice wrote:
> On 03/02/2015 08:05 AM, Bill Pringlemeir wrote:
>> On 28 Feb 2015, stefan@agner.ch wrote:
>>
>>> The flash chip mentioned above requires 8-bit error correction per 512
>>> byte block, hence I increased the ECC to the maximum available level
>>> (60-byte ECC, see page below). One thing which is not very nice, in
>>> order to fit the 60-byte ECC into the 64-byte OOB, I had to shorten
>>> the BBT pattern and set it at the very beginning of the page. This
>>> works fine, however this basically sets the page also to factory bad,
>>> I'm not sure if this is ok? Otherwise, we also could use a BBT pattern
>>> of length 1 (used by cafe_nand.c too).
>> I guess that is a DT option?  I wouldn't be an expert on this.  So
>> submitting it to the linux-mtd is good.
>>
>> I am also not sure if the HW ECC will work with 'sub-pages'.  I think a
>> college of your at Toradex submitted a patch to the u-boot.  I am pretty
>> sure that it could work with software ECC, but maybe disabling it is
>> easiest.
>>
>>> What do you think? I would like to respin the NFC patch, with my
>>> U-Boot changes and this change included...
>> Please go ahead.  Markus M <marb@ixxat.de> is also using the fsl_nfc
>> driver on a Freescale MPC5125 board, so it is probably good to copy your
>> patches to him.  At least, he can test on a BE platform.
>>
>> People also complained about JFFS and this version of the driver.  I
>> didn't investigate that.
> 
> I also noticed a problem with JFFS2 and the driver, where fs changes
> were lost after reboot.  Didn't investigate, switched to UBIFS..

Did you happen to use HW ECC? The controller seems to use byte 19
onwards to store the ECC bytes, where also JFFS2 stores it's meta data
when using a NAND chip with 64-byte OOB (see
http://www.linux-mtd.infradead.org/doc/nand.html). However, I think
UBI/UBIFS is a good choice anyway.

--
Stefan

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2015-03-02 21:44 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-11-21 17:01 VF610+ColdFireM54418 controller Bill Pringlemeir
2013-11-21 21:52 ` Bill Pringlemeir
     [not found] ` <1389222441-4322-1-git-send-email-bpringlemeir@nbsps.com>
     [not found]   ` <1389222441-4322-2-git-send-email-bpringlemeir@nbsps.com>
2014-04-28 14:41     ` [RFC 1/5] mtd:fsl_nfc: Nand flash controller for VF610, MPC5125, etc Stefan Agner
2014-04-28 16:51       ` Bill Pringlemeir
2014-04-29  7:50         ` Stefan Agner
2014-04-29 16:36       ` Bill Pringlemeir
     [not found]   ` <1389222441-4322-3-git-send-email-bpringlemeir@nbsps.com>
2014-09-17 17:02     ` [RFC 2/5] mtd:fsl_nfc: Add hardware 45 byte BHC-ECC support for 24 bit corrections Stefan Agner
2014-09-17 18:06       ` Bill Pringlemeir
2014-09-17 20:08         ` Stefan Agner
2014-09-17 22:21           ` Bill Pringlemeir
2014-12-10 14:56             ` Stefan Agner
2014-12-11 16:44               ` Bill Pringlemeir
2015-03-01  0:38                 ` Stefan Agner
2015-03-02 15:05                   ` Bill Pringlemeir
2015-03-02 21:39                     ` Aaron Brice
2015-03-02 21:44                       ` Stefan Agner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).