* imx27: No space left to write bad block table
From: Fabio Estevam @ 2021-04-17 15:59 UTC (permalink / raw)
  To: Miquel Raynal, Sascha Hauer, linux-mtd

Hi,

I noticed this error recently on an imx27-phytec-phycard-s-rdk reported
on kernelci:

nand: device found, Manufacturer ID: 0x20, Chip ID: 0xa1
nand: ST Micro NAND01GR3B2CZA6
nand: 128 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
Bad block table not found for chip 0
Bad block table not found for chip 0
Scanning device for bad blocks
random: fast init done
Bad eraseblock 329 at 0x000002920000
Bad eraseblock 330 at 0x000002940000
Bad eraseblock 331 at 0x000002960000
Bad eraseblock 332 at 0x000002980000
Bad eraseblock 333 at 0x0000029a0000
Bad eraseblock 334 at 0x0000029c0000
Bad eraseblock 335 at 0x0000029e0000
Bad eraseblock 336 at 0x000002a00000
Bad eraseblock 337 at 0x000002a20000
Bad eraseblock 338 at 0x000002a40000
Bad eraseblock 339 at 0x000002a60000
Bad eraseblock 340 at 0x000002a80000
Bad eraseblock 341 at 0x000002aa0000
Bad eraseblock 342 at 0x000002ac0000
Bad eraseblock 343 at 0x000002ae0000
Bad eraseblock 344 at 0x000002b00000
Bad eraseblock 345 at 0x000002b20000
Bad eraseblock 1020 at 0x000007f80000
Bad eraseblock 1021 at 0x000007fa0000
Bad eraseblock 1022 at 0x000007fc0000
Bad eraseblock 1023 at 0x000007fe0000
No space left to write bad block table
nand_bbt: error while writing bad block table -28
mxc_nand: probe of d8000000.nand-controller failed with error -28
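
The numbers in this log follow from the geometry printed above: each
eraseblock offset is the block number times the 128 KiB erase size, and
-28 is the negated errno ENOSPC ("No space left on device"). A minimal
userspace sketch, not kernel code; the helper names are made up for
illustration:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Geometry taken from the log: 128 MiB chip, 128 KiB erase blocks. */
#define ERASE_SIZE  (128u * 1024u)
#define NUM_BLOCKS  1024u   /* 128 MiB / 128 KiB */

/* Byte offset of an erase block, as printed in "Bad eraseblock N at 0x...". */
static uint64_t eraseblock_offset(uint32_t block)
{
	assert(block < NUM_BLOCKS);
	return (uint64_t)block * ERASE_SIZE;
}

/* Kernel functions return negative errno values; on Linux ENOSPC is 28. */
static int is_enospc(int kernel_err)
{
	return -kernel_err == ENOSPC;
}
```

With these helpers, eraseblock_offset(329) is 0x2920000 and
eraseblock_offset(1023) is 0x7fe0000, matching the first and last
"Bad eraseblock" lines, and is_enospc(-28) is true on Linux.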

Full log:
https://storage.kernelci.org/next/master/next-20210416/arm/imx_v4_v5_defconfig/gcc-8/lab-pengutronix/baseline-imx27-phytec-phycard-s-rdk.html

I don't have access to this board but just wanted to report it.

Regards,

Fabio Estevam

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/


* Re: imx27: No space left to write bad block table
From: Miquel Raynal @ 2021-04-19  6:37 UTC (permalink / raw)
  To: Fabio Estevam; +Cc: Sascha Hauer, linux-mtd

Hi Fabio,

Thanks for the report!

Indeed, that's a misbehavior. This happens when *something* is not
working correctly and the board boots over and over, each time
decrementing the block supposed to contain the BBT until there are none
available anymore. However, I'm not sure this has been caused by a
recent issue, as there have not been major changes in the core or in
this driver since your last fix. Maybe this is a leftover of the
previous situation. Would this be possible? Do you have a way to find
out the day/kernel version which started failing?

Thanks,
Miquèl


* Re: imx27: No space left to write bad block table
From: Fabio Estevam @ 2021-04-19 11:47 UTC (permalink / raw)
  To: Miquel Raynal, Guillaume Tucker; +Cc: Sascha Hauer, linux-mtd

Hi Miquel,

On Mon, Apr 19, 2021 at 3:37 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
>
> Hi Fabio,
>
> Thanks for the report!
>
> Indeed, that's a misbehavior. This happens when *something* is not
> working correctly and the board boots over and over, each time
> decrementing the block supposed to contain the BBT until there are none
> available anymore. However, I'm not sure this has been caused by a
> recent issue, as there have not been major changes in the core or in
> this driver since your last fix. Maybe this is a leftover of the
> previous situation. Would this be possible? Do you have a way to find
> out the day/kernel version which started failing?

I know it does not happen on master, only on linux-next.

The oldest linux-next log listed on kernelci for the
imx27-phytec-phycard-s-rdk board is 20210401, which is also
affected.

Adding Guillaume in case kernelci could help to find the commit that
causes the "No space left to write bad block table" message to appear.

Thanks


* Re: imx27: No space left to write bad block table
From: Miquel Raynal @ 2021-04-19 12:27 UTC (permalink / raw)
  To: Fabio Estevam
  Cc: Guillaume Tucker, Sascha Hauer, linux-mtd, Stefan Riedmueller

Hi Fabio, Guillaume,

+Stefan

Fabio Estevam <festevam@gmail.com> wrote on Mon, 19 Apr 2021 08:47:56
-0300:

> Hi Miquel,
> 
> I know it does not happen on master, only on linux-next.
> 
> The oldest linux-next log listed on kernelci for the
> imx27-phytec-phycard-s-rdk board is 20210401, which is also
> affected.
> 
> Adding Guillaume in case kernelci could help to find the commit that
> causes the "No space left to write bad block table" message to appear.

Interesting. Maybe I overlooked the commit below when applying it.
Indeed, blocks holding the BBT may be considered bad blocks, so I now
wonder whether the change below is valid...

Guillaume, would you have a way to revert this patch on top of
linux-next? Stefan, would you mind giving more details on the testing
procedure?

---8<---

commit bd9c9fe2ad04546940f4a9979d679e62cae6aa51
Author: Stefan Riedmueller <s.riedmueller@phytec.de>
Date:   Thu Mar 25 11:23:37 2021 +0100

    mtd: rawnand: bbt: Skip bad blocks when searching for the BBT in NAND
    
    The blocks containing the bad block table can become bad as well. So
    make sure to skip any blocks that are marked bad when searching for the
    bad block table.
    
    Otherwise in very rare cases where two BBT blocks wear out it might
    happen that an obsolete BBT is used instead of a newer available
    version.
    
    Signed-off-by: Stefan Riedmueller <s.riedmueller@phytec.de>
    Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
    Link: https://lore.kernel.org/linux-mtd/20210325102337.481172-1-s.riedmueller@phytec.de

diff --git a/drivers/mtd/nand/raw/nand_bbt.c b/drivers/mtd/nand/raw/nand_bbt.c
index dced32a126d9..6e25a5ce5ba9 100644
--- a/drivers/mtd/nand/raw/nand_bbt.c
+++ b/drivers/mtd/nand/raw/nand_bbt.c
@@ -525,6 +525,7 @@ static int search_bbt(struct nand_chip *this, uint8_t *buf,
 {
        u64 targetsize = nanddev_target_size(&this->base);
        struct mtd_info *mtd = nand_to_mtd(this);
+       struct nand_bbt_descr *bd = this->badblock_pattern;
        int i, chips;
        int startblock, block, dir;
        int scanlen = mtd->writesize + mtd->oobsize;
@@ -560,6 +561,10 @@ static int search_bbt(struct nand_chip *this, uint8_t *buf,
                        int actblock = startblock + dir * block;
                        loff_t offs = (loff_t)actblock << this->bbt_erase_shift;
 
+                       /* Check if block is marked bad */
+                       if (scan_block_fast(this, bd, offs, buf))
+                               continue;
+
                        /* Read first page */
                        scan_read(this, buf, offs, mtd->writesize, td);
                        if (!check_pattern(buf, scanlen, mtd->writesize, td)) {


Thanks,
Miquèl


* Re: imx27: No space left to write bad block table
From: Fabio Estevam @ 2021-04-19 12:41 UTC (permalink / raw)
  To: Miquel Raynal
  Cc: Guillaume Tucker, Sascha Hauer, linux-mtd, Stefan Riedmueller

Hi Miquel,

On Mon, Apr 19, 2021 at 9:27 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:

> commit bd9c9fe2ad04546940f4a9979d679e62cae6aa51
> Author: Stefan Riedmueller <s.riedmueller@phytec.de>
> Date:   Thu Mar 25 11:23:37 2021 +0100
>
>     mtd: rawnand: bbt: Skip bad blocks when searching for the BBT in NAND
>
>     The blocks containing the bad block table can become bad as well. So
>     make sure to skip any blocks that are marked bad when searching for the
>     bad block table.
>
>     Otherwise in very rare cases where two BBT blocks wear out it might
>     happen that an obsolete BBT is used instead of a newer available
>     version.
>
>     Signed-off-by: Stefan Riedmueller <s.riedmueller@phytec.de>
>     Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
>     Link: https://lore.kernel.org/linux-mtd/20210325102337.481172-1-s.riedmueller@phytec.de

This commit landed in linux-next 20210329. I was able to find the
kernelci log for this version and NAND is correctly probed:
https://storage.kernelci.org/next/master/next-20210329/arm/imx_v4_v5_defconfig/gcc-8/lab-pengutronix/baseline-imx27-phytec-phycard-s-rdk.html

The first NAND error starts with 20210330:
https://storage.kernelci.org/next/master/next-20210330/arm/imx_v4_v5_defconfig/gcc-8/lab-pengutronix/baseline-imx27-phytec-phycard-s-rdk.html

Regards,

Fabio Estevam


* Re: imx27: No space left to write bad block table
From: Fabio Estevam @ 2021-04-19 12:48 UTC (permalink / raw)
  To: Miquel Raynal
  Cc: Guillaume Tucker, Sascha Hauer, linux-mtd, Stefan Riedmueller

On Mon, Apr 19, 2021 at 9:41 AM Fabio Estevam <festevam@gmail.com> wrote:

> This commit landed in linux-next 20210329. I was able to find the
> kernelci log for this version and NAND is correctly probed:
> https://storage.kernelci.org/next/master/next-20210329/arm/imx_v4_v5_defconfig/gcc-8/lab-pengutronix/baseline-imx27-phytec-phycard-s-rdk.html
>
> The first NAND error starts with 20210330:
> https://storage.kernelci.org/next/master/next-20210330/arm/imx_v4_v5_defconfig/gcc-8/lab-pengutronix/baseline-imx27-phytec-phycard-s-rdk.html

linux-next 20210329 introduced the following logs that were not
present previously:

Bad block table written to 0x000007fa0000, version 0x01
Bad block table written to 0x000007f80000, version 0x01

Maybe these two newly written bad block tables confuse the subsequent boots?


* Re: imx27: No space left to write bad block table
From: Fabio Estevam @ 2021-04-19 13:01 UTC (permalink / raw)
  To: Miquel Raynal
  Cc: Guillaume Tucker, Sascha Hauer, linux-mtd, Stefan Riedmueller

On Mon, Apr 19, 2021 at 9:48 AM Fabio Estevam <festevam@gmail.com> wrote:
>
> linux-next 20210329 introduced the following logs that were not
> present previously:
>
> Bad block table written to 0x000007fa0000, version 0x01
> Bad block table written to 0x000007f80000, version 0x01
>
> Maybe these two newly written bad block tables confuse the subsequent boots?

Also, prior to linux-next 20210329 the Bad Block table could be
correctly located:
https://storage.kernelci.org/next/master/next-20210324/arm/imx_v4_v5_defconfig/gcc-8/lab-pengutronix/baseline-imx27-phytec-phycard-s-rdk.html

Bad block table found at page 65472, version 0x01
Bad block table found at page 65408, version 0x01

which matches the Bad block table reported by Barebox.
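
To relate the page numbers in these messages to the block numbers and
offsets seen elsewhere in the logs, the conversion can be sketched as
follows. This is plain userspace arithmetic based on the geometry the
kernel printed (2048-byte pages, 128 KiB erase blocks); the helper names
are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

#define NAND_PAGE_BYTES   2048u                 /* "page size: 2048" */
#define NAND_ERASE_BYTES  (128u * 1024u)        /* "erase size: 128 KiB" */
#define PAGES_PER_BLOCK   (NAND_ERASE_BYTES / NAND_PAGE_BYTES)  /* 64 */

/* Erase block that contains a given page ("found at page N"). */
static uint32_t page_to_block(uint32_t page)
{
	return page / PAGES_PER_BLOCK;
}

/* Erase block that contains a given byte offset ("written to 0x..."). */
static uint32_t offset_to_block(uint64_t offs)
{
	assert(offs % NAND_ERASE_BYTES == 0);   /* logs print block-aligned offsets */
	return (uint32_t)(offs / NAND_ERASE_BYTES);
}
```

Pages 65472 and 65408 are blocks 1023 and 1022, the last two blocks of
the chip, while the tables written by next-20210329 at 0x7fa0000 and
0x7f80000 sit in blocks 1021 and 1020, i.e. the next two blocks down.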

However, in linux-next 20210329 the bad block table cannot be found anymore:

Bad block table not found for chip 0
Bad block table not found for chip 0

So in fact there is a regression starting with linux-next 20210329.

Could it be caused by bd9c9fe2ad04 ("mtd: rawnand: bbt: Skip bad
blocks when searching for the BBT in NAND")?

Thanks


* Re: imx27: No space left to write bad block table
From: Stefan Riedmüller @ 2021-04-19 13:04 UTC (permalink / raw)
  To: festevam, miquel.raynal; +Cc: guillaume.tucker, kernel, linux-mtd

Hi Miquel, Fabio,

On Mon, 2021-04-19 at 14:27 +0200, Miquel Raynal wrote:
> Interesting. Maybe I overlooked the commit below when applying it.
> Indeed, blocks holding the BBT may be considered bad blocks, so I now
> wonder whether the change below is valid...
> 
> Guillaume, would you have a way to revert this patch on top of
> linux-next? Stefan, would you mind giving more details on the testing
> procedure?

I tested this on an i.MX 6 by simulating two bad BBT blocks, simply
returning -EIO in nand_erase_nand when the block to be erased is one of the
first two BBT blocks.

I saw this once on a customer board but was not able to reproduce it
again, hence the simulation of the two bad blocks.

Without the patch below, new versions of the BBT can no longer be written to
the first two blocks reserved for the BBT, but these blocks are still read
during boot because nothing tests whether they are bad. So changes to the
BBT made after these two blocks turn bad are only kept and used until the
next reboot, when the old version stored in the two worn blocks is again
used as the basis.

I tried to use the same mechanism that is used to identify bad blocks during a
scan for bad blocks. But maybe I missed something there? Or were my
assumptions wrong in the first place?
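
The test approach described above (forcing the erase of chosen blocks to
fail) can be pictured with a small userspace stand-in. This is only a
sketch: it is not the nand_erase_nand code, the function names are
hypothetical, and the block numbers are placeholders, since the thread
does not say which blocks were the first two BBT blocks on the i.MX 6
board.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder block numbers: blocks whose erase should be made to fail. */
static const unsigned int injected_bad_blocks[] = { 1023, 1022 };

/* True if an erase failure has been injected for this block. */
static bool erase_fault_injected(unsigned int block)
{
	size_t n = sizeof(injected_bad_blocks) / sizeof(injected_bad_blocks[0]);

	for (size_t i = 0; i < n; i++)
		if (injected_bad_blocks[i] == block)
			return true;
	return false;
}

/* Stand-in for an erase routine: returns -EIO for injected blocks, 0 otherwise. */
static int model_erase_block(unsigned int block)
{
	if (erase_fault_injected(block))
		return -EIO;
	return 0;
}
```

Any caller that treats an erase failure as a worn-out block will then
see the two chosen blocks as unusable, without real flash wear.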

Regards,
Stefan


* Re: imx27: No space left to write bad block table
From: Miquel Raynal @ 2021-04-19 13:40 UTC (permalink / raw)
  To: Fabio Estevam
  Cc: Guillaume Tucker, Sascha Hauer, linux-mtd, Stefan Riedmueller

Hi Fabio,

Fabio Estevam <festevam@gmail.com> wrote on Mon, 19 Apr 2021 09:48:20
-0300:

> linux-next 20210329 introduced the following logs that were not
> present previously:
> 
> Bad block table written to 0x000007fa0000, version 0x01
> Bad block table written to 0x000007f80000, version 0x01
> 
> Maybe these two newly written bad block tables confuse the subsequent boots?

I am now pretty sure the commit I pointed to earlier today is the root
cause (but I don't know why yet). Somehow it skips the bad block table,
which is more or less declared like a bad block from a 'low level'
point of view (so that the user cannot erase/overwrite it). Here,
the kernel does not find the valid BBT. It then creates a new pair of
BBTs. But at the next boot, the recently created BBTs won't be
detected either... until there are no more free blocks reserved for
them, and that's where the probe fails.

So yes, the NAND controller driver probes correctly with next-20210329
but in fact the real bad block table is not found and this is the root
cause.
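
The sequence described above can be replayed with a toy model. This is a
simplified userspace sketch, not the nand_bbt.c logic: it assumes four
reserved blocks at the end of the chip (blocks 1020..1023, as in the
log), and it takes as given the hypothesis that a block holding a BBT
looks bad to the bad block check. All names are made up for
illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NBLOCKS        1024  /* 128 MiB / 128 KiB */
#define BBT_MAXBLOCKS  4     /* assumption: blocks 1020..1023 are reserved */

struct model_chip {
	bool holds_bbt[NBLOCKS];  /* a table has been written to this block */
	bool skip_bad_in_search;  /* models the check added by bd9c9fe2ad04 */
};

/* Model assumption: a block that stores a BBT also looks bad to the scanner. */
static bool looks_bad(const struct model_chip *c, int block)
{
	assert(block >= 0 && block < NBLOCKS);
	return c->holds_bbt[block];
}

/* Search the reserved blocks from the end of the chip downwards. */
static bool search_bbt_model(const struct model_chip *c)
{
	for (int i = 0; i < BBT_MAXBLOCKS; i++) {
		int block = NBLOCKS - 1 - i;

		if (c->skip_bad_in_search && looks_bad(c, block))
			continue;
		if (c->holds_bbt[block])
			return true;
	}
	return false;
}

/* Write one table to the highest reserved block that does not look bad. */
static int write_bbt_model(struct model_chip *c)
{
	for (int i = 0; i < BBT_MAXBLOCKS; i++) {
		int block = NBLOCKS - 1 - i;

		if (!looks_bad(c, block)) {
			c->holds_bbt[block] = true;
			return 0;
		}
	}
	return -ENOSPC;  /* "No space left to write bad block table" */
}

/* One boot: search for a table; if none is found, write table and mirror. */
static int boot_model(struct model_chip *c)
{
	if (search_bbt_model(c))
		return 0;
	for (int copy = 0; copy < 2; copy++) {
		int ret = write_bbt_model(c);

		if (ret)
			return ret;
	}
	return 0;
}
```

Starting with tables in blocks 1023 and 1022 and skip_bad_in_search set,
the first boot_model() call writes new tables to blocks 1021 and 1020
and succeeds, and the second call returns -ENOSPC. With the flag
cleared, every boot finds the existing table and nothing is rewritten.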

Thanks,
Miquèl


* Re: imx27: No space left to write bad block table
From: Fabio Estevam @ 2021-04-19 13:56 UTC (permalink / raw)
  To: Miquel Raynal
  Cc: Guillaume Tucker, Sascha Hauer, linux-mtd, Stefan Riedmueller

Hi Miquel,

On Mon, Apr 19, 2021 at 10:40 AM Miquel Raynal
<miquel.raynal@bootlin.com> wrote:

> I am now pretty sure the commit I pointed to earlier today is the root
> cause (but I don't know why yet). Somehow it skips the bad block table,
> which is more or less declared like a bad block from a 'low level'
> point of view (so that the user cannot erase/overwrite it). Here,
> the kernel does not find the valid BBT. It then creates a new pair of
> BBTs. But at the next boot, the recently created BBTs won't be
> detected either... until there are no more free blocks reserved for
> them, and that's where the probe fails.
>
> So yes, the NAND controller driver probes correctly with next-20210329
> but in fact the real bad block table is not found and this is the root
> cause.

Ok, good. I will submit a revert then.

Thanks


* Re: imx27: No space left to write bad block table
From: Miquel Raynal @ 2021-04-19 15:36 UTC (permalink / raw)
  To: Stefan Riedmüller; +Cc: festevam, guillaume.tucker, kernel, linux-mtd

Hi Stefan,

> > Interesting. Maybe I overlooked the commit below when applying it.
> > Indeed, blocks holding the BBT may be considered bad blocks, so I now
> > wonder whether the change below is valid...
> > 
> > Guillaume, would you have a way to revert this patch on top of
> > linux-next? Stefan, would you mind giving more details on the testing
> > procedure?  
> 
> I tested this on an i.MX 6 by simulating two bad BBT blocks, simply
> returning -EIO in nand_erase_nand when the block to be erased is one of the
> first two BBT blocks.
> 
> I saw this once on a customer board but was not able to reproduce it
> again, hence the simulation of the two bad blocks.
> 
> Without the patch below, new versions of the BBT can no longer be written to
> the first two blocks reserved for the BBT, but these blocks are still read
> during boot because nothing tests whether they are bad. So changes to the
> BBT made after these two blocks turn bad are only kept and used until the
> next reboot, when the old version stored in the two worn blocks is again
> used as the basis.
> 
> I tried to use the same mechanism that is used to identify bad blocks during a
> scan for bad blocks. But maybe I missed something there? Or were my
> assumptions wrong in the first place?

Honestly I don't know what is wrong exactly in this patch.

We will revert the commit as it clearly breaks something fundamental
and the merge window is too close to adopt a hackish attitude.

I would propose the following tests with your board:
- Hack the core to allow yourself to access bad blocks from userspace
  for testing purposes.
- With the below commit, you should have the same behavior as
  reported by Fabio.
- Revert the commit.
- Manually change the bad block markers (nanddump, flash_erase,
  nandwrite) to declare the two tables bad. Reboot and observe if there
  are any issues. You can try to work from there.

> > ---8<---
> > 
> > commit bd9c9fe2ad04546940f4a9979d679e62cae6aa51
> > Author: Stefan Riedmueller <s.riedmueller@phytec.de>
> > Date:   Thu Mar 25 11:23:37 2021 +0100
> > 
> >     mtd: rawnand: bbt: Skip bad blocks when searching for the BBT in NAND
> >     
> >     The blocks containing the bad block table can become bad as well. So
> >     make sure to skip any blocks that are marked bad when searching for the
> >     bad block table.
> >     
> >     Otherwise in very rare cases where two BBT blocks wear out it might
> >     happen that an obsolete BBT is used instead of a newer available
> >     version.
> >     
> >     Signed-off-by: Stefan Riedmueller <s.riedmueller@phytec.de>
> >     Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> >     Link: https://lore.kernel.org/linux-mtd/20210325102337.481172-1-s.riedmueller@phytec.de
> > 
> > diff --git a/drivers/mtd/nand/raw/nand_bbt.c b/drivers/mtd/nand/raw/nand_bbt.c
> > index dced32a126d9..6e25a5ce5ba9 100644
> > --- a/drivers/mtd/nand/raw/nand_bbt.c
> > +++ b/drivers/mtd/nand/raw/nand_bbt.c
> > @@ -525,6 +525,7 @@ static int search_bbt(struct nand_chip *this, uint8_t *buf,
> >  {
> >         u64 targetsize = nanddev_target_size(&this->base);
> >         struct mtd_info *mtd = nand_to_mtd(this);
> > +       struct nand_bbt_descr *bd = this->badblock_pattern;
> >         int i, chips;
> >         int startblock, block, dir;
> >         int scanlen = mtd->writesize + mtd->oobsize;
> > @@ -560,6 +561,10 @@ static int search_bbt(struct nand_chip *this, uint8_t *buf,
> >                         int actblock = startblock + dir * block;
> >                         loff_t offs = (loff_t)actblock << this->bbt_erase_shift;
> >  
> > +                       /* Check if block is marked bad */
> > +                       if (scan_block_fast(this, bd, offs, buf))
> > +                               continue;
> > +
> >                         /* Read first page */
> >                         scan_read(this, buf, offs, mtd->writesize, td);
> >                         if (!check_pattern(buf, scanlen, mtd->writesize, td)) {
> > 
> > 
> > Thanks,
> > Miquèl  

Thanks,
Miquèl


* Re: imx27: No space left to write bad block table
  2021-04-19 15:36         ` Miquel Raynal
@ 2021-04-20  6:26           ` Stefan Riedmüller
  2021-04-21 20:44             ` Guillaume Tucker
  2021-04-26 15:53           ` Stefan Riedmüller
  1 sibling, 1 reply; 20+ messages in thread
From: Stefan Riedmüller @ 2021-04-20  6:26 UTC (permalink / raw)
  To: miquel.raynal; +Cc: festevam, guillaume.tucker, kernel, linux-mtd

Hi Miquel,

On Mon, 2021-04-19 at 17:36 +0200, Miquel Raynal wrote:
> Hi Stefan,
> 
> > > Interesting. Maybe I overlooked the below commit when applying. Indeed,
> > > BBT may be considered as bad blocks, so I wonder if the below change is
> > > valid now...
> > > 
> > > Guillaume, would you have a way to revert this patch on top of
> > > linux-next? Stefan, would you mind giving more details on the testing
> > > procedure?  
> > 
> > I have tested this on an i.MX 6 by simulating two bad BBT blocks by simply
> > returning -EIO in nand_erase_nand when the block to be erased is one of the
> > first two BBT blocks.
> > 
> > I have seen this once on a customer board but was not able to reproduce it
> > anymore, thus the simulation of the two bad blocks.
> > 
> > Without the patch below new versions of the BBT can no longer be written to
> > the first two blocks reserved for the BBT but they are still evaluated to read
> > the BBT from during boot due to the lack of a test if these blocks are bad. So
> > changes to the BBT after these two blocks turn bad are only kept and used
> > until the next reboot where again the old version of the two worn blocks is
> > used as a basis.
> > 
> > I tried to use the same mechanism that is used to identify bad blocks during a
> > scan for bad blocks. But maybe I missed something there? Or were my
> > assumptions wrong in the first place?
> 
> Honestly I don't know what is wrong exactly in this patch.
> 
> We will revert the commit as it clearly breaks something fundamental
> and the merge window is too close to adopt a hackish attitude.
> 
> I would propose the following tests with your board:
> - Hack the core to allow yourself to access bad blocks from userspace
>   for testing purposes.
> - With the below commit, you should have the same behavior as
>   reported by Fabio.
> - Revert the commit.
> - Manually change the bad block markers (nanddump, flash_erase,
>   nandwrite) to declare the two tables bad. Reboot and observe if there
>   are any issues. You can try to work from there.

Thanks for the input! I will follow your suggestions and let you guys know my
findings.

Regards,
Stefan


* Re: imx27: No space left to write bad block table
  2021-04-20  6:26           ` Stefan Riedmüller
@ 2021-04-21 20:44             ` Guillaume Tucker
  2021-04-21 23:29               ` Fabio Estevam
  0 siblings, 1 reply; 20+ messages in thread
From: Guillaume Tucker @ 2021-04-21 20:44 UTC (permalink / raw)
  To: Stefan Riedmüller, miquel.raynal; +Cc: festevam, kernel, linux-mtd

On 20/04/2021 07:26, Stefan Riedmüller wrote:
> Hi Miquel,
> 
> On Mon, 2021-04-19 at 17:36 +0200, Miquel Raynal wrote:
>> Hi Stefan,
>>
>>>> Interesting. Maybe I overlooked the below commit when applying. Indeed,
>>>> BBT may be considered as bad blocks, so I wonder if the below change is
>>>> valid now...
>>>>
>>>> Guillaume, would you have a way to revert this patch on top of
>>>> linux-next? Stefan, would you mind giving more details on the testing
>>>> procedure?  

Sorry I'm late to the party, was busy with some other kernelci
issues.  I gather this is being reverted anyway now, but please
let me know if you still need to check anything.  As far as I can
tell, there hasn't been any automated bisection landing on this
commit.

It's generally possible to re-run anything, i.e. make a kernel
build with a custom patchset and run one given test on any of the
platforms in KernelCI.  There just isn't any public self-service
for doing that (yet).

Best wishes,
Guillaume




* Re: imx27: No space left to write bad block table
  2021-04-21 20:44             ` Guillaume Tucker
@ 2021-04-21 23:29               ` Fabio Estevam
  2021-04-22 13:16                 ` Guillaume Tucker
  0 siblings, 1 reply; 20+ messages in thread
From: Fabio Estevam @ 2021-04-21 23:29 UTC (permalink / raw)
  To: Guillaume Tucker; +Cc: Stefan Riedmüller, miquel.raynal, kernel, linux-mtd

Hi Guillaume,

On Wed, Apr 21, 2021 at 5:44 PM Guillaume Tucker
<guillaume.tucker@collabora.com> wrote:

> Sorry I'm late to the party, was busy with some other kernelci
> issues.  I gather this is being reverted anyway now, but please
> let me know if you still need to check anything.  As far as I can
> tell, there hasn't been any automated bisection landing on this
> commit.

Thanks. Yes, we did the revert in linux-next, but I could not see the
next-20210421 boot log for the imx27-phytec-phycard-s-rdk board to
confirm that the NAND bad block table can be found again.

Thanks for your help






* Re: imx27: No space left to write bad block table
  2021-04-21 23:29               ` Fabio Estevam
@ 2021-04-22 13:16                 ` Guillaume Tucker
  2021-04-22 13:28                   ` Fabio Estevam
  0 siblings, 1 reply; 20+ messages in thread
From: Guillaume Tucker @ 2021-04-22 13:16 UTC (permalink / raw)
  To: Fabio Estevam
  Cc: Stefan Riedmüller, miquel.raynal, kernel, linux-mtd,
	Jan Lübbe, kernelci-results

+Jan, +kernelci-results

On 22/04/2021 00:29, Fabio Estevam wrote:
> Hi Guillaume,
> 
> On Wed, Apr 21, 2021 at 5:44 PM Guillaume Tucker
> <guillaume.tucker@collabora.com> wrote:
> 
>> Sorry I'm late to the party, was busy with some other kernelci
>> issues.  I gather this is being reverted anyway now, but please
>> let me know if you still need to check anything.  As far as I can
>> tell, there hasn't been any automated bisection landing on this
>> commit.
> 
> Thanks. Yes, we did the revert in linux-next, but I could not see the
> next-20210421 boot log for the imx27-phytec-phycard-s-rdk board to
> confirm that the NAND bad block table can be found again.

This device is only available in Pengutronix's lab which is
currently being moved to a new location, so I'm being told.  I
guess we'll check again when it's back online.

Are you aware of any other platform in KernelCI showing the same
issue?  I could take a look but there's been more boot failure
regressions than usual on linux-next recently...

Best wishes,
Guillaume




* Re: imx27: No space left to write bad block table
  2021-04-22 13:16                 ` Guillaume Tucker
@ 2021-04-22 13:28                   ` Fabio Estevam
  2021-04-23 21:04                     ` Fabio Estevam
  0 siblings, 1 reply; 20+ messages in thread
From: Fabio Estevam @ 2021-04-22 13:28 UTC (permalink / raw)
  To: Guillaume Tucker
  Cc: Stefan Riedmüller, miquel.raynal, kernel, linux-mtd,
	Jan Lübbe, kernelci-results

Hi Guillaume,

On Thu, Apr 22, 2021 at 10:16 AM Guillaume Tucker
<guillaume.tucker@collabora.com> wrote:

> This device is only available in Pengutronix's lab which is
> currently being moved to a new location, so I'm being told.  I
> guess we'll check again when it's back online.

Ok, no problem. Yes, we can check again when it is back online.

> Are you aware of any other platform in KernelCI showing the same
> issue?  I could take a look but there's been more boot failure
> regressions than usual on linux-next recently...

Other platforms are probably affected as well, but I am not aware of any
platform other than imx27-phytec-phycard-s-rdk in KernelCI showing this at
the moment.

Thanks


* Re: imx27: No space left to write bad block table
  2021-04-22 13:28                   ` Fabio Estevam
@ 2021-04-23 21:04                     ` Fabio Estevam
  0 siblings, 0 replies; 20+ messages in thread
From: Fabio Estevam @ 2021-04-23 21:04 UTC (permalink / raw)
  To: Guillaume Tucker
  Cc: Stefan Riedmüller, miquel.raynal, kernel, linux-mtd,
	Jan Lübbe, kernelci-results

Hi Guillaume,

On Thu, Apr 22, 2021 at 10:28 AM Fabio Estevam <festevam@gmail.com> wrote:
>
> Hi Guillaume,
>
> On Thu, Apr 22, 2021 at 10:16 AM Guillaume Tucker
> <guillaume.tucker@collabora.com> wrote:
>
> > This device is only available in Pengutronix's lab which is
> > currently being moved to a new location, so I'm being told.  I
> > guess we'll check again when it's back online.
>
> Ok, no problem. Yes, we can check again when it is back online.

I see it is back:
https://storage.kernelci.org/next/master/next-20210423/arm/imx_v4_v5_defconfig/gcc-8/lab-pengutronix/baseline-imx27-phytec-phycard-s-rdk.html

NAND is correctly probed now, so we are all good!

Thanks for the good work with kernelci! It is super helpful :-)

Cheers


* Re: imx27: No space left to write bad block table
  2021-04-19 15:36         ` Miquel Raynal
  2021-04-20  6:26           ` Stefan Riedmüller
@ 2021-04-26 15:53           ` Stefan Riedmüller
  2021-05-04  8:34             ` Miquel Raynal
  1 sibling, 1 reply; 20+ messages in thread
From: Stefan Riedmüller @ 2021-04-26 15:53 UTC (permalink / raw)
  To: miquel.raynal; +Cc: festevam, guillaume.tucker, kernel, linux-mtd

Hi Miquel,

On Mon, 2021-04-19 at 17:36 +0200, Miquel Raynal wrote:
> Hi Stefan,
> 
> > > Interesting. Maybe I overlooked the below commit when applying. Indeed,
> > > BBT may be considered as bad blocks, so I wonder if the below change is
> > > valid now...
> > > 
> > > Guillaume, would you have a way to revert this patch on top of
> > > linux-next? Stefan, would you mind giving more details on the testing
> > > procedure?  
> > 
> > I have tested this on an i.MX 6 by simulating two bad BBT blocks by simply
> > returning -EIO in nand_erase_nand when the block to be erased is one of the
> > first two BBT blocks.
> > 
> > I have seen this once on a customer board but was not able to reproduce it
> > anymore, thus the simulation of the two bad blocks.
> > 
> > Without the patch below new versions of the BBT can no longer be written to
> > the first two blocks reserved for the BBT but they are still evaluated to read
> > the BBT from during boot due to the lack of a test if these blocks are bad. So
> > changes to the BBT after these two blocks turn bad are only kept and used
> > until the next reboot where again the old version of the two worn blocks is
> > used as a basis.
> > 
> > I tried to use the same mechanism that is used to identify bad blocks during a
> > scan for bad blocks. But maybe I missed something there? Or were my
> > assumptions wrong in the first place?
> 
> Honestly I don't know what is wrong exactly in this patch.
> 
> We will revert the commit as it clearly breaks something fundamental
> and the merge window is too close to adopt a hackish attitude.
> 
> I would propose the following tests with your board:
> - Hack the core to allow yourself to access bad blocks from userspace
>   for testing purposes.
> - With the below commit, you should have the same behavior as
>   reported by Fabio.

On my imx6 board the patch does not lead to the behavior reported by Fabio.
The BBT is found and can be read:

[    1.520501] nand: device found, Manufacturer ID: 0xc2, Chip ID: 0xd3
[    1.526944] nand: Macronix MX60LF8G18AC
[    1.530803] nand: 1024 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
[    1.539412] Bad block table found at page 524224, version 0x01
[    1.545790] Bad block table found at page 524160, version 0x01
[    1.551796] nand_read_bbt: bad block at 0x000001b60000
[    1.557032] nand_read_bbt: bad block at 0x000008cc0000
[    1.562204] nand_read_bbt: bad block at 0x00000f480000
[    1.567395] nand_read_bbt: bad block at 0x0000111c0000
[    1.572588] nand_read_bbt: bad block at 0x0000205c0000
[    1.577802] nand_read_bbt: bad block at 0x00002dfc0000
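
As a sanity check on where those two tables sit, the page numbers in the log
can be converted to eraseblock numbers using the geometry printed above. The
following is only a minimal userspace sketch: the helper names are made up for
illustration and are not kernel functions.

```c
#include <stdint.h>

/* Geometry taken from the i.MX 6 log: 1024 MiB chip, 128 KiB eraseblocks,
 * 2048-byte pages. The macro names are illustrative only. */
#define NAND_PAGE_BYTES   2048u
#define NAND_ERASE_BYTES  (128u * 1024u)
#define NAND_CHIP_BYTES   (1024ull * 1024ull * 1024ull)

/* Number of pages in one eraseblock (64 for this geometry). */
static uint32_t pages_per_block(void)
{
	return NAND_ERASE_BYTES / NAND_PAGE_BYTES;
}

/* Eraseblock that contains the given absolute page number. */
static uint32_t page_to_block(uint32_t page)
{
	return page / pages_per_block();
}

/* Eraseblock that contains the given byte offset. */
static uint32_t offset_to_block(uint64_t offs)
{
	return (uint32_t)(offs / NAND_ERASE_BYTES);
}

/* Total number of eraseblocks on the chip. */
static uint32_t total_blocks(void)
{
	return (uint32_t)(NAND_CHIP_BYTES / NAND_ERASE_BYTES);
}
```

With these numbers, pages 524224 and 524160 fall in blocks 8191 and 8190, i.e.
the last two of the chip's 8192 eraseblocks, and the first reported bad block
at 0x000001b60000 is block 219.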

I dug a little deeper and I think I found the cause for the failure on the
imx27 board.

The mxc_nand driver (used by the imx27) uses its own nand_bbt_descr with an
offset of 0 in the OOB area. This is the same place the bad block marker is
located on worn or factory bad blocks.

This explains why the BBT is no longer found with my patch. scan_block_fast
checks if there is anything other than 0xff in the bad block marker and finds
the 'B' from 'Bbt0'. The same occurs for the mirrored version where it finds
the '1' from '1tbB'.

This also explains why the original BBT is detected as bad blocks in the scan
after the BBT was not found, which results in the BBT being written to the
remaining two blocks reserved for the BBT.

19:38:23.001385  nand: device found, Manufacturer ID: 0x20, Chip ID: 0xa1
19:38:23.002635  nand: ST Micro NAND01GR3B2CZA6
19:38:23.006666  nand: 128 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
19:38:23.028413  Bad block table not found for chip 0
19:38:23.035625  random: fast init done
19:38:23.049144  Bad block table not found for chip 0
19:38:23.050024  Scanning device for bad blocks
19:38:23.330999  Bad eraseblock 329 at 0x000002920000
19:38:23.345958  Bad eraseblock 330 at 0x000002940000
19:38:23.356024  Bad eraseblock 331 at 0x000002960000
19:38:23.365738  Bad eraseblock 332 at 0x000002980000
19:38:23.375590  Bad eraseblock 333 at 0x0000029a0000
19:38:23.385505  Bad eraseblock 334 at 0x0000029c0000
19:38:23.395548  Bad eraseblock 335 at 0x0000029e0000
19:38:23.405501  Bad eraseblock 336 at 0x000002a00000
19:38:23.415551  Bad eraseblock 337 at 0x000002a20000
19:38:23.425937  Bad eraseblock 338 at 0x000002a40000
19:38:23.436028  Bad eraseblock 339 at 0x000002a60000
19:38:23.445959  Bad eraseblock 340 at 0x000002a80000
19:38:23.456008  Bad eraseblock 341 at 0x000002aa0000
19:38:23.466006  Bad eraseblock 342 at 0x000002ac0000
19:38:23.475912  Bad eraseblock 343 at 0x000002ae0000
19:38:23.486064  Bad eraseblock 344 at 0x000002b00000
19:38:23.495925  Bad eraseblock 345 at 0x000002b20000
19:38:24.048053  Bad eraseblock 1022 at 0x000007fc0000
19:38:24.056117  Bad eraseblock 1023 at 0x000007fe0000
19:38:24.067953  Bad block table written to 0x000007fa0000, version 0x01
19:38:24.087637  Bad block table written to 0x000007f80000, version 0x01


On the next boot all four BBT versions in flash are skipped for the same reason
as before and the two blocks containing the latest BBT are also detected as
bad blocks. As a result, no blocks remain to write the BBT to.


21:22:55.032595  nand: device found, Manufacturer ID: 0x20, Chip ID: 0xa1
21:22:55.033333  nand: ST Micro NAND01GR3B2CZA6
21:22:55.037804  nand: 128 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
21:22:55.088475  Bad block table not found for chip 0
21:22:55.093807  Bad block table not found for chip 0
21:22:55.105995  Scanning device for bad blocks
21:22:55.109049  random: fast init done
21:22:55.395488  Bad eraseblock 329 at 0x000002920000
21:22:55.406832  Bad eraseblock 330 at 0x000002940000
21:22:55.416885  Bad eraseblock 331 at 0x000002960000
21:22:55.426736  Bad eraseblock 332 at 0x000002980000
21:22:55.436732  Bad eraseblock 333 at 0x0000029a0000
21:22:55.446864  Bad eraseblock 334 at 0x0000029c0000
21:22:55.456662  Bad eraseblock 335 at 0x0000029e0000
21:22:55.466785  Bad eraseblock 336 at 0x000002a00000
21:22:55.476801  Bad eraseblock 337 at 0x000002a20000
21:22:55.486772  Bad eraseblock 338 at 0x000002a40000
21:22:55.496768  Bad eraseblock 339 at 0x000002a60000
21:22:55.506607  Bad eraseblock 340 at 0x000002a80000
21:22:55.516965  Bad eraseblock 341 at 0x000002aa0000
21:22:55.526621  Bad eraseblock 342 at 0x000002ac0000
21:22:55.536702  Bad eraseblock 343 at 0x000002ae0000
21:22:55.546660  Bad eraseblock 344 at 0x000002b00000
21:22:55.556745  Bad eraseblock 345 at 0x000002b20000
21:22:56.172928  Bad eraseblock 1020 at 0x000007f80000
21:22:56.187043  Bad eraseblock 1021 at 0x000007fa0000
21:22:56.197437  Bad eraseblock 1022 at 0x000007fc0000
21:22:56.212665  Bad eraseblock 1023 at 0x000007fe0000
21:22:56.213356  No space left to write bad block table
21:22:56.215012  nand_bbt: error while writing bad block table -28
21:22:56.239353  mxc_nand: probe of d8000000.nand-controller failed with error -28

I'm not sure of the best way to address this issue. A few ideas came to
mind:

- Shift the offset of the nand_bbt_descr of mxc_nand to make room for the bad
block marker. I'm not sure whether this would conflict with the ECC hardware,
but the ooblayout functions suggest that it could work.

---8<---
static int mxc_v1_ooblayout_free(struct mtd_info *mtd, int section,
                                 struct mtd_oob_region *oobregion)
{   
        struct nand_chip *nand_chip = mtd_to_nand(mtd);
   
        if (section > nand_chip->ecc.steps)
                return -ERANGE;
   
        if (!section) {
                if (mtd->writesize <= 512) {
                        oobregion->offset = 0;
                        oobregion->length = 5;
                } else {
                        oobregion->offset = 2;
                        oobregion->length = 4;
                }
        } else {
                oobregion->offset = ((section - 1) * 16) + MXC_V1_ECCBYTES + 6;
                if (section < nand_chip->ecc.steps)
                        oobregion->length = (section * 16) + 6 -
                                            oobregion->offset;
                else
                        oobregion->length = mtd->oobsize - oobregion->offset;
        }   
   
        return 0;
}
---8<---

Unfortunately I don't have any hardware at hand at the moment to test it. I
think the distinction between small and large pagesizes needs to be reflected
on the bbt_descr as well.

- Use NAND_BBT_NO_OOB with the mxc_nand driver since there is a comment saying
there is an overlap between the generic bbt descriptors and the ECC hardware.
I'm not sure what other effects it might have to set NAND_BBT_NO_OOB.

- Explicitly check for the bad block marker during a search for the BBT
instead of using scan_block_fast

Any suggestions?

Regards,
Stefan


> - Revert the commit.
> - Manually change the bad block markers (nanddump, flash_erase,
>   nandwrite) to declare the two tables bad. Reboot and observe if there
>   are any issues. You can try to work from there.
> 
> > > ---8<---
> > > 
> > > commit bd9c9fe2ad04546940f4a9979d679e62cae6aa51
> > > Author: Stefan Riedmueller <s.riedmueller@phytec.de>
> > > Date:   Thu Mar 25 11:23:37 2021 +0100
> > > 
> > >     mtd: rawnand: bbt: Skip bad blocks when searching for the BBT in NAND
> > >     
> > >     The blocks containing the bad block table can become bad as well. So
> > >     make sure to skip any blocks that are marked bad when searching for the
> > >     bad block table.
> > >     
> > >     Otherwise in very rare cases where two BBT blocks wear out it might
> > >     happen that an obsolete BBT is used instead of a newer available
> > >     version.
> > >     
> > >     Signed-off-by: Stefan Riedmueller <s.riedmueller@phytec.de>
> > >     Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> > >     Link: https://lore.kernel.org/linux-mtd/20210325102337.481172-1-s.riedmueller@phytec.de
> > > 
> > > diff --git a/drivers/mtd/nand/raw/nand_bbt.c b/drivers/mtd/nand/raw/nand_bbt.c
> > > index dced32a126d9..6e25a5ce5ba9 100644
> > > --- a/drivers/mtd/nand/raw/nand_bbt.c
> > > +++ b/drivers/mtd/nand/raw/nand_bbt.c
> > > @@ -525,6 +525,7 @@ static int search_bbt(struct nand_chip *this, uint8_t *buf,
> > >  {
> > >         u64 targetsize = nanddev_target_size(&this->base);
> > >         struct mtd_info *mtd = nand_to_mtd(this);
> > > +       struct nand_bbt_descr *bd = this->badblock_pattern;
> > >         int i, chips;
> > >         int startblock, block, dir;
> > >         int scanlen = mtd->writesize + mtd->oobsize;
> > > @@ -560,6 +561,10 @@ static int search_bbt(struct nand_chip *this, uint8_t *buf,
> > >                         int actblock = startblock + dir * block;
> > >                         loff_t offs = (loff_t)actblock << this->bbt_erase_shift;
> > >  
> > > +                       /* Check if block is marked bad */
> > > +                       if (scan_block_fast(this, bd, offs, buf))
> > > +                               continue;
> > > +
> > >                         /* Read first page */
> > >                         scan_read(this, buf, offs, mtd->writesize, td);
> > >                         if (!check_pattern(buf, scanlen, mtd->writesize, td)) {
> > > 
> > > 
> > > Thanks,
> > > Miquèl  
> 
> Thanks,
> Miquèl

* Re: imx27: No space left to write bad block table
  2021-04-26 15:53           ` Stefan Riedmüller
@ 2021-05-04  8:34             ` Miquel Raynal
  2021-05-10  8:38               ` Stefan Riedmüller
  0 siblings, 1 reply; 20+ messages in thread
From: Miquel Raynal @ 2021-05-04  8:34 UTC (permalink / raw)
  To: Stefan Riedmüller; +Cc: festevam, guillaume.tucker, kernel, linux-mtd

Hi Stefan,

Stefan Riedmüller <S.Riedmueller@phytec.de> wrote on Mon, 26 Apr 2021
15:53:39 +0000:

> Hi Miquel,
> 
> On Mon, 2021-04-19 at 17:36 +0200, Miquel Raynal wrote:
> > Hi Stefan,
> >   
> > > > Interesting. Maybe I overlooked the below commit when applying. Indeed,
> > > > BBT may be considered as bad blocks, so I wonder if the below change is
> > > > valid now...
> > > > 
> > > > Guillaume, would you have a way to revert this patch on top of
> > > > linux-next? Stefan, would you mind giving more details on the testing
> > > > procedure?    
> > > 
> > > I have tested this on an i.MX 6 by simulating two bad BBT blocks by simply
> > > returning -EIO in nand_erase_nand when the block to be erased is one of the
> > > first two BBT blocks.
> > > 
> > > I have seen this once on a customer board but were not able to reproduce it
> > > anymore, thus the simulation of the two bad blocks.
> > > 
> > > Without the patch below new versions of the BBT can no longer be written to
> > > the first two blocks reserved for the BBT but they are still evaluated to read
> > > the BBT from during boot due the lack of a test if these blocks are bad. So
> > > changes to the BBT after these two blocks turn bad are only kept and used
> > > until the next reboot where again the old version of the two worn blocks is
> > > used as a basis.
> > > 
> > > I tried to use the same mechanism that is used to identify bad blocks during a
> > > scan for bad blocks. But maybe I missed something there? Or were my
> > > assumptions wrong in the first place?
> > 
> > Honestly I don't know what is wrong exactly in this patch.
> > 
> > We will revert the commit as it clearly breaks something fundamental
> > and the merge window is too close to adopt a hackish attitude.
> > 
> > I would propose the following tests with your board:
> > - Hack the core to allow yourself to access bad blocks from userspace
> >   for testing purposes.
> > - With the below commit, you should have the same behavior than
> >   reported by Fabio.  
> 
> On my imx6 board the patch does not lead to the behavior reported by Fabio.
> The BBT is found and can be read:
> 
> [    1.520501] nand: device found, Manufacturer ID: 0xc2, Chip ID: 0xd3
> [    1.526944] nand: Macronix MX60LF8G18AC
> [    1.530803] nand: 1024 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
> [    1.539412] Bad block table found at page 524224, version 0x01
> [    1.545790] Bad block table found at page 524160, version 0x01
> [    1.551796] nand_read_bbt: bad block at 0x000001b60000
> [    1.557032] nand_read_bbt: bad block at 0x000008cc0000
> [    1.562204] nand_read_bbt: bad block at 0x00000f480000
> [    1.567395] nand_read_bbt: bad block at 0x0000111c0000
> [    1.572588] nand_read_bbt: bad block at 0x0000205c0000
> [    1.577802] nand_read_bbt: bad block at 0x00002dfc0000
> 
> I dug a little deeper and I think I found the cause for the failure on the
> imx27 board.
> 
> The mxc_nand driver (used by the imx27) uses its own nand_bbt_descr with an
> offset of 0 in the OOB area. This is the same place the bad block marker is
> located on worn or factory bad blocks.
> 
> This explains why the BBT is no longer found with my patch. scan_block_fast
> checks if there is anything other than 0xff in the bad block marker and finds
> the 'B' from 'Bbt0'. The same occurs for the mirrored version where it finds
> the '1' from '1tbB'. 

Ok, that's the reason why the original logic failed, thanks for looking
for it.

> This also explains why the original BBT is detected as bad blocks in the scan
> after the BBT was not found, which results in the BBT being written to the
> remaining two blocks reserved for the BBT.
> 
> 19:38:23.001385  nand: device found, Manufacturer ID: 0x20, Chip ID: 0xa1
> 19:38:23.002635  nand: ST Micro NAND01GR3B2CZA6
> 19:38:23.006666  nand: 128 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
> 19:38:23.028413  Bad block table not found for chip 0
> 19:38:23.035625  random: fast init done
> 19:38:23.049144  Bad block table not found for chip 0
> 19:38:23.050024  Scanning device for bad blocks
> 19:38:23.330999  Bad eraseblock 329 at 0x000002920000
> 19:38:23.345958  Bad eraseblock 330 at 0x000002940000
> 19:38:23.356024  Bad eraseblock 331 at 0x000002960000
> 19:38:23.365738  Bad eraseblock 332 at 0x000002980000
> 19:38:23.375590  Bad eraseblock 333 at 0x0000029a0000
> 19:38:23.385505  Bad eraseblock 334 at 0x0000029c0000
> 19:38:23.395548  Bad eraseblock 335 at 0x0000029e0000
> 19:38:23.405501  Bad eraseblock 336 at 0x000002a00000
> 19:38:23.415551  Bad eraseblock 337 at 0x000002a20000
> 19:38:23.425937  Bad eraseblock 338 at 0x000002a40000
> 19:38:23.436028  Bad eraseblock 339 at 0x000002a60000
> 19:38:23.445959  Bad eraseblock 340 at 0x000002a80000
> 19:38:23.456008  Bad eraseblock 341 at 0x000002aa0000
> 19:38:23.466006  Bad eraseblock 342 at 0x000002ac0000
> 19:38:23.475912  Bad eraseblock 343 at 0x000002ae0000
> 19:38:23.486064  Bad eraseblock 344 at 0x000002b00000
> 19:38:23.495925  Bad eraseblock 345 at 0x000002b20000
> 19:38:24.048053  Bad eraseblock 1022 at 0x000007fc0000
> 19:38:24.056117  Bad eraseblock 1023 at 0x000007fe0000
> 19:38:24.067953  Bad block table written to 0x000007fa0000, version 0x01
> 19:38:24.087637  Bad block table written to 0x000007f80000, version 0x01
> 
> 
> On the next boot all four BBT versions in flash are skipped for the same reason
> as before and the two blocks containing the latest BBT are also detected as
> bad blocks. As a result, no blocks remain to write the BBT to.
> 
> 
> 21:22:55.032595  nand: device found, Manufacturer ID: 0x20, Chip ID: 0xa1
> 21:22:55.033333  nand: ST Micro NAND01GR3B2CZA6
> 21:22:55.037804  nand: 128 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
> 21:22:55.088475  Bad block table not found for chip 0
> 21:22:55.093807  Bad block table not found for chip 0
> 21:22:55.105995  Scanning device for bad blocks
> 21:22:55.109049  random: fast init done
> 21:22:55.395488  Bad eraseblock 329 at 0x000002920000
> 21:22:55.406832  Bad eraseblock 330 at 0x000002940000
> 21:22:55.416885  Bad eraseblock 331 at 0x000002960000
> 21:22:55.426736  Bad eraseblock 332 at 0x000002980000
> 21:22:55.436732  Bad eraseblock 333 at 0x0000029a0000
> 21:22:55.446864  Bad eraseblock 334 at 0x0000029c0000
> 21:22:55.456662  Bad eraseblock 335 at 0x0000029e0000
> 21:22:55.466785  Bad eraseblock 336 at 0x000002a00000
> 21:22:55.476801  Bad eraseblock 337 at 0x000002a20000
> 21:22:55.486772  Bad eraseblock 338 at 0x000002a40000
> 21:22:55.496768  Bad eraseblock 339 at 0x000002a60000
> 21:22:55.506607  Bad eraseblock 340 at 0x000002a80000
> 21:22:55.516965  Bad eraseblock 341 at 0x000002aa0000
> 21:22:55.526621  Bad eraseblock 342 at 0x000002ac0000
> 21:22:55.536702  Bad eraseblock 343 at 0x000002ae0000
> 21:22:55.546660  Bad eraseblock 344 at 0x000002b00000
> 21:22:55.556745  Bad eraseblock 345 at 0x000002b20000
> 21:22:56.172928  Bad eraseblock 1020 at 0x000007f80000
> 21:22:56.187043  Bad eraseblock 1021 at 0x000007fa0000
> 21:22:56.197437  Bad eraseblock 1022 at 0x000007fc0000
> 21:22:56.212665  Bad eraseblock 1023 at 0x000007fe0000
> 21:22:56.213356  No space left to write bad block table
> 21:22:56.215012  nand_bbt: error while writing bad block table -28
> 21:22:56.239353  mxc_nand: probe of d8000000.nand-controller failed with error -28
> 
> I'm not sure of the best way to address this issue. A few ideas came to
> mind:
> 
> - Shift the offset of the nand_bbt_descr of mxc_nand to make room for the bad
> block marker. I'm not sure whether this would conflict with the ECC hardware,
> but the ooblayout functions suggest that it could work.

There are thousands of boards out there that would be broken by such a
change: it's too late to make changes in this driver, unfortunately.

> Unfortunately I don't have any hardware at hand at the moment to test it. I
> think the distinction between small and large pagesizes needs to be reflected
> on the bbt_descr as well.
> 
> - Use NAND_BBT_NO_OOB with the mxc_nand driver since there is a comment saying
> there is an overlap between the generic bbt descriptors and the ECC hardware.
> I'm not sure what other effects it might have to set NAND_BBT_NO_OOB.

Same here: that's not an option.

> - Explicitly check for the bad block marker during a search for the BBT
> instead of using scan_block_fast

This looks more reasonable. You can create a helper which does the
scan_block_fast(), then eventually checks the beginning of the OOB
buffer and tries to match with the ->td and ->md descriptors. This
should work with all the legacy drivers implementing their own
descriptors - hopefully.

Other drivers are impacted as well, so maybe you'll find a board for
testing (or someone kind enough to test it for you).

Thanks,
Miquèl

* Re: imx27: No space left to write bad block table
  2021-05-04  8:34             ` Miquel Raynal
@ 2021-05-10  8:38               ` Stefan Riedmüller
  0 siblings, 0 replies; 20+ messages in thread
From: Stefan Riedmüller @ 2021-05-10  8:38 UTC (permalink / raw)
  To: miquel.raynal; +Cc: festevam, guillaume.tucker, kernel, linux-mtd

Hi Miquel,

On Tue, 2021-05-04 at 10:34 +0200, Miquel Raynal wrote:
> Hi Stefan,
> 
> Stefan Riedmüller <S.Riedmueller@phytec.de> wrote on Mon, 26 Apr 2021
> 15:53:39 +0000:
> 
> > Hi Miquel,
> > 
> > On Mon, 2021-04-19 at 17:36 +0200, Miquel Raynal wrote:
> > > Hi Stefan,
> > >   
> > > > > Interesting. Maybe I overlooked the below commit when applying. Indeed,
> > > > > BBT may be considered as bad blocks, so I wonder if the below change is
> > > > > valid now...
> > > > > 
> > > > > Guillaume, would you have a way to revert this patch on top of
> > > > > linux-next? Stefan, would you mind giving more details on the testing
> > > > > procedure?
> > > > 
> > > > I have tested this on an i.MX 6 by simulating two bad BBT blocks by simply
> > > > returning -EIO in nand_erase_nand when the block to be erased is one of the
> > > > first two BBT blocks.
> > > > 
> > > > I have seen this once on a customer board but were not able to reproduce it
> > > > anymore, thus the simulation of the two bad blocks.
> > > > 
> > > > Without the patch below new versions of the BBT can no longer be written to
> > > > the first two blocks reserved for the BBT but they are still evaluated to read
> > > > the BBT from during boot due the lack of a test if these blocks are bad. So
> > > > changes to the BBT after these two blocks turn bad are only kept and used
> > > > until the next reboot where again the old version of the two worn blocks is
> > > > used as a basis.
> > > > 
> > > > I tried to use the same mechanism that is used to identify bad blocks during a
> > > > scan for bad blocks. But maybe I missed something there? Or were my
> > > > assumptions wrong in the first place?
> > > 
> > > Honestly I don't know what is wrong exactly in this patch.
> > > 
> > > We will revert the commit as it clearly breaks something fundamental
> > > and the merge window is too close to adopt a hackish attitude.
> > > 
> > > I would propose the following tests with your board:
> > > - Hack the core to allow yourself to access bad blocks from userspace
> > >   for testing purposes.
> > > - With the below commit, you should have the same behavior than
> > >   reported by Fabio.  
> > 
> > On my imx6 board the patch does not lead to the behavior reported by Fabio.
> > The BBT is found and can be read:
> > 
> > [    1.520501] nand: device found, Manufacturer ID: 0xc2, Chip ID: 0xd3
> > [    1.526944] nand: Macronix MX60LF8G18AC
> > [    1.530803] nand: 1024 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
> > [    1.539412] Bad block table found at page 524224, version 0x01
> > [    1.545790] Bad block table found at page 524160, version 0x01
> > [    1.551796] nand_read_bbt: bad block at 0x000001b60000
> > [    1.557032] nand_read_bbt: bad block at 0x000008cc0000
> > [    1.562204] nand_read_bbt: bad block at 0x00000f480000
> > [    1.567395] nand_read_bbt: bad block at 0x0000111c0000
> > [    1.572588] nand_read_bbt: bad block at 0x0000205c0000
> > [    1.577802] nand_read_bbt: bad block at 0x00002dfc0000
> > 
> > I dug a little deeper and I think I found the cause for the failure on the
> > imx27 board.
> > 
> > The mxc_nand driver (used by the imx27) uses its own nand_bbt_descr with an
> > offset of 0 in the OOB area. This is the same place the bad block marker is
> > located on worn or factory bad blocks.
> > 
> > This explains why the BBT is no longer found with my patch. scan_block_fast
> > checks if there is anything other than 0xff in the bad block marker and finds
> > the 'B' from 'Bbt0'. The same occurs for the mirrored version where it finds
> > the '1' from '1tbB'.
> 
> Ok, that's the reason why the original logic failed, thanks for looking
> for it.
> 
> > This also explains why the original BBT is detected as bad blocks in the scan
> > after the BBT was not found, which results in the BBT being written to the
> > remaining two blocks reserved for the BBT.
> > 
> > 19:38:23.001385  nand: device found, Manufacturer ID: 0x20, Chip ID: 0xa1
> > 19:38:23.002635  nand: ST Micro NAND01GR3B2CZA6
> > 19:38:23.006666  nand: 128 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
> > 19:38:23.028413  Bad block table not found for chip 0
> > 19:38:23.035625  random: fast init done
> > 19:38:23.049144  Bad block table not found for chip 0
> > 19:38:23.050024  Scanning device for bad blocks
> > 19:38:23.330999  Bad eraseblock 329 at 0x000002920000
> > 19:38:23.345958  Bad eraseblock 330 at 0x000002940000
> > 19:38:23.356024  Bad eraseblock 331 at 0x000002960000
> > 19:38:23.365738  Bad eraseblock 332 at 0x000002980000
> > 19:38:23.375590  Bad eraseblock 333 at 0x0000029a0000
> > 19:38:23.385505  Bad eraseblock 334 at 0x0000029c0000
> > 19:38:23.395548  Bad eraseblock 335 at 0x0000029e0000
> > 19:38:23.405501  Bad eraseblock 336 at 0x000002a00000
> > 19:38:23.415551  Bad eraseblock 337 at 0x000002a20000
> > 19:38:23.425937  Bad eraseblock 338 at 0x000002a40000
> > 19:38:23.436028  Bad eraseblock 339 at 0x000002a60000
> > 19:38:23.445959  Bad eraseblock 340 at 0x000002a80000
> > 19:38:23.456008  Bad eraseblock 341 at 0x000002aa0000
> > 19:38:23.466006  Bad eraseblock 342 at 0x000002ac0000
> > 19:38:23.475912  Bad eraseblock 343 at 0x000002ae0000
> > 19:38:23.486064  Bad eraseblock 344 at 0x000002b00000
> > 19:38:23.495925  Bad eraseblock 345 at 0x000002b20000
> > 19:38:24.048053  Bad eraseblock 1022 at 0x000007fc0000
> > 19:38:24.056117  Bad eraseblock 1023 at 0x000007fe0000
> > 19:38:24.067953  Bad block table written to 0x000007fa0000, version 0x01
> > 19:38:24.087637  Bad block table written to 0x000007f80000, version 0x01
> > 
> > 
> > On the next boot all four BBT versions in flash are skipped for the same reason
> > as before and the two blocks containing the latest BBT are also detected as
> > bad blocks. As a result, no blocks remain to write the BBT to.
> > 
> > 
> > 21:22:55.032595  nand: device found, Manufacturer ID: 0x20, Chip ID: 0xa1
> > 21:22:55.033333  nand: ST Micro NAND01GR3B2CZA6
> > 21:22:55.037804  nand: 128 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
> > 21:22:55.088475  Bad block table not found for chip 0
> > 21:22:55.093807  Bad block table not found for chip 0
> > 21:22:55.105995  Scanning device for bad blocks
> > 21:22:55.109049  random: fast init done
> > 21:22:55.395488  Bad eraseblock 329 at 0x000002920000
> > 21:22:55.406832  Bad eraseblock 330 at 0x000002940000
> > 21:22:55.416885  Bad eraseblock 331 at 0x000002960000
> > 21:22:55.426736  Bad eraseblock 332 at 0x000002980000
> > 21:22:55.436732  Bad eraseblock 333 at 0x0000029a0000
> > 21:22:55.446864  Bad eraseblock 334 at 0x0000029c0000
> > 21:22:55.456662  Bad eraseblock 335 at 0x0000029e0000
> > 21:22:55.466785  Bad eraseblock 336 at 0x000002a00000
> > 21:22:55.476801  Bad eraseblock 337 at 0x000002a20000
> > 21:22:55.486772  Bad eraseblock 338 at 0x000002a40000
> > 21:22:55.496768  Bad eraseblock 339 at 0x000002a60000
> > 21:22:55.506607  Bad eraseblock 340 at 0x000002a80000
> > 21:22:55.516965  Bad eraseblock 341 at 0x000002aa0000
> > 21:22:55.526621  Bad eraseblock 342 at 0x000002ac0000
> > 21:22:55.536702  Bad eraseblock 343 at 0x000002ae0000
> > 21:22:55.546660  Bad eraseblock 344 at 0x000002b00000
> > 21:22:55.556745  Bad eraseblock 345 at 0x000002b20000
> > 21:22:56.172928  Bad eraseblock 1020 at 0x000007f80000
> > 21:22:56.187043  Bad eraseblock 1021 at 0x000007fa0000
> > 21:22:56.197437  Bad eraseblock 1022 at 0x000007fc0000
> > 21:22:56.212665  Bad eraseblock 1023 at 0x000007fe0000
> > 21:22:56.213356  No space left to write bad block table
> > 21:22:56.215012  nand_bbt: error while writing bad block table -28
> > 21:22:56.239353  mxc_nand: probe of d8000000.nand-controller failed with error -28
> > 
> > I'm not sure of the best way to address this issue. A few ideas came to
> > mind:
> > 
> > - Shift the offset of the nand_bbt_descr of mxc_nand to make room for the bad
> > block marker. I'm not sure whether this would conflict with the ECC hardware,
> > but the ooblayout functions suggest that it could work.
> 
> There are thousands of boards out there that would be broken by such a
> change: it's too late to make changes in this driver, unfortunately.
> 
> > Unfortunately I don't have any hardware at hand at the moment to test it. I
> > think the distinction between small and large pagesizes needs to be reflected
> > on the bbt_descr as well.
> > 
> > - Use NAND_BBT_NO_OOB with the mxc_nand driver since there is a comment saying
> > there is an overlap between the generic bbt descriptors and the ECC hardware.
> > I'm not sure what other effects it might have to set NAND_BBT_NO_OOB.
> 
> Same here: that's not an option.
> 
> > - Explicitly check for the bad block marker during a search for the BBT
> > instead of using scan_block_fast
> 
> This looks more reasonable. You can create a helper which does the
> scan_block_fast(), then eventually checks the beginning of the OOB
> buffer and tries to match with the ->td and ->md descriptors. This
> should work with all the legacy drivers implementing their own
> descriptors - hopefully.

Thanks for your input. I will take another stab at it.

> 
> Other drivers are impacted as well, so maybe you'll find a board for
> testing (or someone kind enough to test it for you).

I hope I'll get my hands on at least one of the imx27 boards.

Thanks,
Stefan

> 
> Thanks,
> Miquèl

Thread overview: 20+ messages
2021-04-17 15:59 imx27: No space left to write bad block table Fabio Estevam
2021-04-19  6:37 ` Miquel Raynal
2021-04-19 11:47   ` Fabio Estevam
2021-04-19 12:27     ` Miquel Raynal
2021-04-19 12:41       ` Fabio Estevam
2021-04-19 12:48         ` Fabio Estevam
2021-04-19 13:01           ` Fabio Estevam
2021-04-19 13:40           ` Miquel Raynal
2021-04-19 13:56             ` Fabio Estevam
2021-04-19 13:04       ` Stefan Riedmüller
2021-04-19 15:36         ` Miquel Raynal
2021-04-20  6:26           ` Stefan Riedmüller
2021-04-21 20:44             ` Guillaume Tucker
2021-04-21 23:29               ` Fabio Estevam
2021-04-22 13:16                 ` Guillaume Tucker
2021-04-22 13:28                   ` Fabio Estevam
2021-04-23 21:04                     ` Fabio Estevam
2021-04-26 15:53           ` Stefan Riedmüller
2021-05-04  8:34             ` Miquel Raynal
2021-05-10  8:38               ` Stefan Riedmüller
