* imx27: No space left to write bad block table
From: Fabio Estevam @ 2021-04-17 15:59 UTC
To: Miquel Raynal, Sascha Hauer, linux-mtd

Hi,

I noticed this error recently on an imx27-phytec-phycard-s-rdk reported
on kernelci:

nand: device found, Manufacturer ID: 0x20, Chip ID: 0xa1
nand: ST Micro NAND01GR3B2CZA6
nand: 128 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
Bad block table not found for chip 0
Bad block table not found for chip 0
Scanning device for bad blocks
random: fast init done
Bad eraseblock 329 at 0x000002920000
Bad eraseblock 330 at 0x000002940000
Bad eraseblock 331 at 0x000002960000
Bad eraseblock 332 at 0x000002980000
Bad eraseblock 333 at 0x0000029a0000
Bad eraseblock 334 at 0x0000029c0000
Bad eraseblock 335 at 0x0000029e0000
Bad eraseblock 336 at 0x000002a00000
Bad eraseblock 337 at 0x000002a20000
Bad eraseblock 338 at 0x000002a40000
Bad eraseblock 339 at 0x000002a60000
Bad eraseblock 340 at 0x000002a80000
Bad eraseblock 341 at 0x000002aa0000
Bad eraseblock 342 at 0x000002ac0000
Bad eraseblock 343 at 0x000002ae0000
Bad eraseblock 344 at 0x000002b00000
Bad eraseblock 345 at 0x000002b20000
Bad eraseblock 1020 at 0x000007f80000
Bad eraseblock 1021 at 0x000007fa0000
Bad eraseblock 1022 at 0x000007fc0000
Bad eraseblock 1023 at 0x000007fe0000
No space left to write bad block table
nand_bbt: error while writing bad block table -28
mxc_nand: probe of d8000000.nand-controller failed with error -28

Full log:
https://storage.kernelci.org/next/master/next-20210416/arm/imx_v4_v5_defconfig/gcc-8/lab-pengutronix/baseline-imx27-phytec-phycard-s-rdk.html

I don't have access to this board but just wanted to report it.

Regards,
Fabio Estevam

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/
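Every "Bad eraseblock N at 0x..." line above is N times the 128 KiB erase size, and 128 MiB / 128 KiB gives 1024 erase blocks, so blocks 1020 to 1023 are the last four on the chip. The C sketch below just replays that arithmetic. The four-block BBT window is an assumption (it is nand_bbt's usual search window, but the log itself does not state it).

```c
#include <assert.h>
#include <stdint.h>

/* Geometry from the probe messages above: 128 MiB chip, 128 KiB erase blocks. */
#define CHIP_SIZE  (128ULL * 1024 * 1024)
#define ERASE_SIZE (128ULL * 1024)

/* Assumption: the BBT is searched for in the last 4 erase blocks. */
#define BBT_MAX_BLOCKS 4U

/* Number of erase blocks on the chip (1024 here). */
uint32_t total_blocks(void)
{
	return (uint32_t)(CHIP_SIZE / ERASE_SIZE);
}

/* Convert the offset from a "Bad eraseblock N at 0x..." line back to N. */
uint32_t offset_to_block(uint64_t offs)
{
	assert(offs % ERASE_SIZE == 0);	/* logged offsets are block aligned */
	return (uint32_t)(offs / ERASE_SIZE);
}

/* Non-zero if the block lies inside the assumed BBT window. */
int in_bbt_window(uint32_t block)
{
	uint32_t n = total_blocks();

	return block >= n - BBT_MAX_BLOCKS && block < n;
}
```

For example, offset_to_block(0x000007f80000) gives 1020, and in_bbt_window() is true for exactly 1020 to 1023, i.e. the trailing group of "bad" blocks in the log is the whole assumed BBT window, while 329 to 345 sit well outside it.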
* Re: imx27: No space left to write bad block table
From: Miquel Raynal @ 2021-04-19 6:37 UTC
To: Fabio Estevam
Cc: Sascha Hauer, linux-mtd

Hi Fabio,

Fabio Estevam <festevam@gmail.com> wrote on Sat, 17 Apr 2021 12:59:22 -0300:

> Hi,
>
> I noticed this error recently on a imx27-phytec-phycard-s-rdk reported
> on kernelci:
>
> nand: device found, Manufacturer ID: 0x20, Chip ID: 0xa1
> [...]
> No space left to write bad block table
> nand_bbt: error while writing bad block table -28
> mxc_nand: probe of d8000000.nand-controller failed with error -28
>
> Full log:
> https://storage.kernelci.org/next/master/next-20210416/arm/imx_v4_v5_defconfig/gcc-8/lab-pengutronix/baseline-imx27-phytec-phycard-s-rdk.html
>
> I don't have access to this board but just wanted to report it.

Thanks for the report!

Indeed, that's a misbehavior: it happens when *something* goes wrong and the board boots over and over, each time decrementing the block supposed to contain the BBT until none are available anymore. However, I'm not sure this was caused by a recent change, as there have not been major changes in the core nor in this driver since your last fix. Maybe this is a leftover of the previous situation. Would that be possible? Do you have a means to find out the day/kernel version that started failing?

Thanks,
Miquèl
* Re: imx27: No space left to write bad block table
From: Fabio Estevam @ 2021-04-19 11:47 UTC
To: Miquel Raynal, Guillaume Tucker
Cc: Sascha Hauer, linux-mtd

Hi Miquel,

On Mon, Apr 19, 2021 at 3:37 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
>
> Hi Fabio,
>
> Fabio Estevam <festevam@gmail.com> wrote on Sat, 17 Apr 2021 12:59:22
> -0300:
>
> > Hi,
> >
> > I noticed this error recently on a imx27-phytec-phycard-s-rdk reported
> > on kernelci:
> >
> > [...]
>
> Thanks for the report!
>
> Indeed that's a misbehavior, this happens when *something* is not
> happening correctly and the board boots over and over, each time
> decrementing the block supposed to contain the BBT until there are none
> available anymore. However I'm not sure this has been caused by a
> recent issue as there have not been major changes in the core nor in
> this driver since your last fix. Maybe this is a leftover of the
> previous situation. Would this be possible? Do you have a mean to find
> out the day/kernel version which started failing?

I know it does not happen on master, only on linux-next.

The oldest linux-next log listed on kernelci for the imx27-phytec-phycard-s-rdk board is 20210401, which is also affected.

Adding Guillaume in case kernelci could help find the commit that causes the "No space left to write bad block table" message to appear.

Thanks
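Finding the first failing linux-next snapshot is a bisection: with snapshots sorted by date, test the middle one and keep the half that still contains the transition (git bisect applies the same idea to commits). Below is a minimal C sketch, assuming the failure is monotonic (once it appears, every later snapshot fails) and treating "does snapshot N fail?" as a caller-supplied check. bad_from_threshold() is a hypothetical stand-in for testing, not something kernelci provides.

```c
#include <assert.h>
#include <stddef.h>

/* Callback: non-zero if snapshot idx shows the failure. */
typedef int (*snapshot_is_bad_fn)(size_t idx, const void *ctx);

/*
 * Return the index of the first bad snapshot in [0, n), or n if none is bad.
 * Assumes snapshots are sorted by date and the failure persists once it appears.
 */
size_t first_bad_snapshot(size_t n, snapshot_is_bad_fn is_bad, const void *ctx)
{
	size_t lo = 0, hi = n;	/* the answer always lies in [lo, hi] */

	assert(is_bad != NULL);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (is_bad(mid, ctx))
			hi = mid;	/* mid fails: first failure is mid or earlier */
		else
			lo = mid + 1;	/* mid passes: first failure is later */
	}
	return lo;
}

/* Hypothetical stand-in predicate: every snapshot from *ctx onwards fails. */
int bad_from_threshold(size_t idx, const void *ctx)
{
	return idx >= *(const size_t *)ctx;
}
```

One caveat specific to this board: per Miquel's explanation, each failing boot also changes what is stored in flash, so a real bisection would presumably need the BBT area restored between runs for the results to stay monotonic.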
* Re: imx27: No space left to write bad block table
From: Miquel Raynal @ 2021-04-19 12:27 UTC
To: Fabio Estevam
Cc: Guillaume Tucker, Sascha Hauer, linux-mtd, Stefan Riedmueller

Hi Fabio, Guillaume,

+Stefan

Fabio Estevam <festevam@gmail.com> wrote on Mon, 19 Apr 2021 08:47:56 -0300:

> Hi Miquel,
>
> On Mon, Apr 19, 2021 at 3:37 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> >
> > [...]
> > Do you have a mean to find
> > out the day/kernel version which started failing?
>
> I know it does not happen on master, only on linux-next.
>
> The oldest linux-next log I see listed for the
> imx27-phytec-phycard-s-rdk board that I see on kernelci is 20210401,
> which is also affected.
>
> Adding Guillaume in case kernelci could help to find the commit that
> causes the "No space left to write bad block table" message to appear.

Interesting. Maybe I overlooked the commit below when applying it. Indeed, BBT blocks may be considered bad blocks, so I wonder whether the change below is valid at all...

Guillaume, would you have a way to revert this patch on top of linux-next? Stefan, would you mind giving more details on the testing procedure?
---8<---

commit bd9c9fe2ad04546940f4a9979d679e62cae6aa51
Author: Stefan Riedmueller <s.riedmueller@phytec.de>
Date:   Thu Mar 25 11:23:37 2021 +0100

    mtd: rawnand: bbt: Skip bad blocks when searching for the BBT in NAND

    The blocks containing the bad block table can become bad as well. So
    make sure to skip any blocks that are marked bad when searching for the
    bad block table.

    Otherwise in very rare cases where two BBT blocks wear out it might
    happen that an obsolete BBT is used instead of a newer available
    version.

    Signed-off-by: Stefan Riedmueller <s.riedmueller@phytec.de>
    Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
    Link: https://lore.kernel.org/linux-mtd/20210325102337.481172-1-s.riedmueller@phytec.de

diff --git a/drivers/mtd/nand/raw/nand_bbt.c b/drivers/mtd/nand/raw/nand_bbt.c
index dced32a126d9..6e25a5ce5ba9 100644
--- a/drivers/mtd/nand/raw/nand_bbt.c
+++ b/drivers/mtd/nand/raw/nand_bbt.c
@@ -525,6 +525,7 @@ static int search_bbt(struct nand_chip *this, uint8_t *buf,
 {
 	u64 targetsize = nanddev_target_size(&this->base);
 	struct mtd_info *mtd = nand_to_mtd(this);
+	struct nand_bbt_descr *bd = this->badblock_pattern;
 	int i, chips;
 	int startblock, block, dir;
 	int scanlen = mtd->writesize + mtd->oobsize;
@@ -560,6 +561,10 @@ static int search_bbt(struct nand_chip *this, uint8_t *buf,
 			int actblock = startblock + dir * block;
 			loff_t offs = (loff_t)actblock << this->bbt_erase_shift;
 
+			/* Check if block is marked bad */
+			if (scan_block_fast(this, bd, offs, buf))
+				continue;
+
 			/* Read first page */
 			scan_read(this, buf, offs, mtd->writesize, td);
 			if (!check_pattern(buf, scanlen, mtd->writesize, td)) {

Thanks,
Miquèl
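The new hunk runs scan_block_fast() on each candidate block before looking for the table signature, and skips the block if it reports it bad. A toy model shows how that could hide a perfectly healthy table if the table's own signature overlaps the bad block marker. This is a hypothetical sketch, not nand_bbt code: it assumes a one-byte marker at OOB offset 0 (0xff meaning good) and uses an illustrative 4-byte "Bbt0" signature; whether the i.MX27 driver really stores its signature at offset 0 is not established in this message.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define OOB_SIZE 64	/* OOB size from the probe log */

/* Toy model of the spare (OOB) area of a block's first page. */
struct fake_block {
	uint8_t oob[OOB_SIZE];
};

/* Marker check in the spirit of scan_block_fast(): assume a one-byte
 * marker at OOB offset 0, where 0xff means "good". */
int looks_bad(const struct fake_block *b)
{
	return b->oob[0] != 0xff;
}

/* Erase the model block, then store a 4-byte table signature at offs. */
void write_signature(struct fake_block *b, size_t offs)
{
	assert(offs + 4 <= OOB_SIZE);
	memset(b->oob, 0xff, OOB_SIZE);
	memcpy(b->oob + offs, "Bbt0", 4);
}

/* Mimics the patched loop: skip blocks that look bad, then match the signature. */
int patched_search_finds(const struct fake_block *b, size_t offs)
{
	assert(offs + 4 <= OOB_SIZE);
	if (looks_bad(b))
		return 0;	/* the new "continue" path */
	return memcmp(b->oob + offs, "Bbt0", 4) == 0;
}
```

With the signature at offset 0, 'B' (0x42) sits where the marker is expected, so the block "looks bad" and the patched search never finds the table; with the signature at offset 8, the same block is found normally.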
* Re: imx27: No space left to write bad block table
From: Fabio Estevam @ 2021-04-19 12:41 UTC
To: Miquel Raynal
Cc: Guillaume Tucker, Sascha Hauer, linux-mtd, Stefan Riedmueller

Hi Miquel,

On Mon, Apr 19, 2021 at 9:27 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:

> commit bd9c9fe2ad04546940f4a9979d679e62cae6aa51
> Author: Stefan Riedmueller <s.riedmueller@phytec.de>
> Date:   Thu Mar 25 11:23:37 2021 +0100
>
>     mtd: rawnand: bbt: Skip bad blocks when searching for the BBT in NAND
>
>     The blocks containing the bad block table can become bad as well. So
>     make sure to skip any blocks that are marked bad when searching for the
>     bad block table.
>
>     Otherwise in very rare cases where two BBT blocks wear out it might
>     happen that an obsolete BBT is used instead of a newer available
>     version.
>
>     Signed-off-by: Stefan Riedmueller <s.riedmueller@phytec.de>
>     Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
>     Link: https://lore.kernel.org/linux-mtd/20210325102337.481172-1-s.riedmueller@phytec.de

This commit landed in linux-next 20210329. I was able to find the kernelci log for this version, and NAND is probed correctly:
https://storage.kernelci.org/next/master/next-20210329/arm/imx_v4_v5_defconfig/gcc-8/lab-pengutronix/baseline-imx27-phytec-phycard-s-rdk.html

The first NAND error appears in 20210330:
https://storage.kernelci.org/next/master/next-20210330/arm/imx_v4_v5_defconfig/gcc-8/lab-pengutronix/baseline-imx27-phytec-phycard-s-rdk.html

Regards,
Fabio Estevam
* Re: imx27: No space left to write bad block table
From: Fabio Estevam @ 2021-04-19 12:48 UTC
To: Miquel Raynal
Cc: Guillaume Tucker, Sascha Hauer, linux-mtd, Stefan Riedmueller

On Mon, Apr 19, 2021 at 9:41 AM Fabio Estevam <festevam@gmail.com> wrote:

> This commit landed in linux-next 20210329. I was able to find the
> kernelci log for this version and NAND is correctly probed:
> https://storage.kernelci.org/next/master/next-20210329/arm/imx_v4_v5_defconfig/gcc-8/lab-pengutronix/baseline-imx27-phytec-phycard-s-rdk.html
>
> The first NAND error starts with 20210330:
> https://storage.kernelci.org/next/master/next-20210330/arm/imx_v4_v5_defconfig/gcc-8/lab-pengutronix/baseline-imx27-phytec-phycard-s-rdk.html

linux-next 20210329 introduced the following log messages, which were not present previously:

Bad block table written to 0x000007fa0000, version 0x01
Bad block table written to 0x000007f80000, version 0x01

Could these two newly written bad block tables confuse subsequent boots?
* Re: imx27: No space left to write bad block table
From: Fabio Estevam @ 2021-04-19 13:01 UTC
To: Miquel Raynal
Cc: Guillaume Tucker, Sascha Hauer, linux-mtd, Stefan Riedmueller

On Mon, Apr 19, 2021 at 9:48 AM Fabio Estevam <festevam@gmail.com> wrote:
>
> [...]
>
> linux-next 20210329 introduced the following logs that were not
> present previously:
>
> Bad block table written to 0x000007fa0000, version 0x01
> Bad block table written to 0x000007f80000, version 0x01
>
> Maybe this new 'two Bad block tables' will confuse the subsequent boots?

Also, prior to linux-next 20210329 the bad block table could be located correctly:
https://storage.kernelci.org/next/master/next-20210324/arm/imx_v4_v5_defconfig/gcc-8/lab-pengutronix/baseline-imx27-phytec-phycard-s-rdk.html

Bad block table found at page 65472, version 0x01
Bad block table found at page 65408, version 0x01

This matches the bad block table reported by Barebox.

However, in linux-next 20210329 the bad block table can no longer be found:

Bad block table not found for chip 0
Bad block table not found for chip 0

So there is in fact a regression starting with linux-next 20210329. Could it be caused by bd9c9fe2ad04 ("mtd: rawnand: bbt: Skip bad blocks when searching for the BBT in NAND")?

Thanks
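Converting these page numbers and offsets into erase-block indices shows where each table lives. With 2048-byte pages and 128 KiB erase blocks (64 pages per block, both from the probe log), pages 65472 and 65408 are blocks 1023 and 1022, while the tables written by next-20210329 landed in blocks 1021 and 1020. Those four blocks are exactly the ones later reported as bad in the failing log. A small C sketch of the conversion:

```c
#include <assert.h>
#include <stdint.h>

#define ERASE_SIZE      (128U * 1024)	/* erase size from the probe log */
#define NAND_PAGE_SIZE  2048U		/* page size from the probe log */
#define PAGES_PER_BLOCK (ERASE_SIZE / NAND_PAGE_SIZE)	/* 64 */

/* "Bad block table found at page P" -> erase block index. */
uint32_t page_to_block(uint32_t page)
{
	return page / PAGES_PER_BLOCK;
}

/* "Bad block table written to 0x..." -> erase block index. */
uint32_t offset_to_block(uint64_t offs)
{
	assert(offs % ERASE_SIZE == 0);	/* table offsets are block aligned */
	return (uint32_t)(offs / ERASE_SIZE);
}
```

So the old tables (blocks 1023 and 1022) were still on flash, yet the patched kernel wrote fresh ones into the next two blocks down, which is consistent with the old tables no longer being recognized.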
* Re: imx27: No space left to write bad block table
From: Miquel Raynal @ 2021-04-19 13:40 UTC
To: Fabio Estevam
Cc: Guillaume Tucker, Sascha Hauer, linux-mtd, Stefan Riedmueller

Hi Fabio,

Fabio Estevam <festevam@gmail.com> wrote on Mon, 19 Apr 2021 09:48:20 -0300:

> [...]
>
> linux-next 20210329 introduced the following logs that were not
> present previously:
>
> Bad block table written to 0x000007fa0000, version 0x01
> Bad block table written to 0x000007f80000, version 0x01
>
> Maybe this new 'two Bad block tables' will confuse the subsequent boots?

I am now pretty sure the commit I pointed to earlier today is the root cause (but I don't know why yet). Somehow it skips the bad block table, which, from a 'low level' point of view, is more or less declared as a bad block (so that the user cannot erase/overwrite it). Here, the kernel does not find the valid BBT, so it creates a new pair of BBTs. But at the next boot, the newly created BBTs are not detected either... and so on, until there are no more free blocks reserved for that purpose, which is where the probe fails.

So yes, the NAND controller driver probes correctly with next-20210329, but in fact the real bad block table is not found, and this is the root cause.

Thanks,
Miquèl
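Miquel's explanation can be replayed with a toy model of the reserved area. This is a hypothetical sketch, not nand_bbt code: it assumes a four-block BBT area, collapses the main and mirror tables into a single "holds a table" state, and assumes, as Miquel suggests, that any block holding a table looks bad to the low-level marker check. Starting from the state before next-20210329 (tables in the last two blocks), the patched search writes two new tables on the first boot and runs out of room on the second, returning -ENOSPC, which is the -28 in Fabio's original log.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Assumed size of the reserved BBT area: the last four erase blocks. */
#define BBT_AREA 4

/* What a reserved block holds; index 0 is the last block (1023 here). */
enum slot { SLOT_FREE, SLOT_BBT };

struct chip_model {
	enum slot area[BBT_AREA];
	int skip_bad_on_search;	/* 1 models search_bbt() with the new check */
};

/* Assumption from the thread: a block holding a BBT looks bad to the
 * low-level marker check. */
static int looks_bad(const struct chip_model *c, int i)
{
	return c->area[i] == SLOT_BBT;
}

/*
 * One boot: return 0 if a table is found or freshly written, -ENOSPC if
 * there is no room left for both the main and the mirror table.
 */
int boot(struct chip_model *c)
{
	int i, written = 0;

	assert(c != NULL);
	for (i = 0; i < BBT_AREA; i++) {
		if (c->skip_bad_on_search && looks_bad(c, i))
			continue;	/* the new "skip bad blocks" path */
		if (c->area[i] == SLOT_BBT)
			return 0;	/* table found, nothing to write */
	}

	/* Not found: the rescan reports old table blocks as bad, so only
	 * free blocks can receive the new main and mirror tables. */
	for (i = 0; i < BBT_AREA && written < 2; i++) {
		if (c->area[i] == SLOT_FREE) {
			c->area[i] = SLOT_BBT;
			written++;
		}
	}
	return written == 2 ? 0 : -ENOSPC;
}
```

With skip_bad_on_search set to 0 (the behaviour before the patch), the first boot finds the existing table in area[0] and writes nothing, which matches the "Bad block table found" messages from next-20210324.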
* Re: imx27: No space left to write bad block table
From: Fabio Estevam @ 2021-04-19 13:56 UTC
To: Miquel Raynal
Cc: Guillaume Tucker, Sascha Hauer, linux-mtd, Stefan Riedmueller

Hi Miquel,

On Mon, Apr 19, 2021 at 10:40 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:

> I am pretty sure now the commit I pointed earlier today is the root
> cause (but I don't know why, yet). Somehow it skips the bad block table
> which is more or less declared like a bad block from a 'low level'
> point of view (so that the user cannot erase/overwrite it). Here,
> the kernel does not find the valid BBT. It then creates a new couple of
> BBT. But doing so at the next boot, the recently created BBT won't be
> detected anymore... until there are no more free blocks reserved for
> that and that's where the probe fails.
>
> So yes, the NAND controller driver probes correctly with next-20210329
> but in fact the real bad block table is not found and this is the root
> cause.

OK, good. I will submit a revert then.

Thanks
* Re: imx27: No space left to write bad block table
From: Stefan Riedmüller @ 2021-04-19 13:04 UTC
To: festevam, miquel.raynal
Cc: guillaume.tucker, kernel, linux-mtd

Hi Miquel, Fabio,

On Mon, 2021-04-19 at 14:27 +0200, Miquel Raynal wrote:
> Hi Fabio, Guillaume,
>
> [...]
>
> Interesting. Maybe I overlooked the below commit when applying. Indeed,
> BBT may be considered as bad blocks, so I wonder if the below change is
> valid now...
>
> Guillaume, would you have a way to revert this patch on top of
> linux-next? Stefan, would you mind giving more details on the testing
> procedure?

I tested this on an i.MX 6 by simulating two bad BBT blocks: nand_erase_nand simply returned -EIO whenever the block to be erased was one of the first two BBT blocks.

I have seen this once on a customer board but was not able to reproduce it anymore, hence the simulation of the two bad blocks.

Without the patch below, new versions of the BBT can no longer be written to the first two blocks reserved for the BBT, but these blocks are still read during boot to load the BBT, because there is no check whether they are bad. So changes made to the BBT after these two blocks turn bad are only kept and used until the next reboot, where the old version from the two worn blocks is used as the basis again.

I tried to use the same mechanism that is used to identify bad blocks during a scan for bad blocks. But maybe I missed something there? Or were my assumptions wrong in the first place?

Regards,
Stefan

> ---8<---
>
> commit bd9c9fe2ad04546940f4a9979d679e62cae6aa51
> Author: Stefan Riedmueller <s.riedmueller@phytec.de>
> Date:   Thu Mar 25 11:23:37 2021 +0100
>
>     mtd: rawnand: bbt: Skip bad blocks when searching for the BBT in NAND
>
> [...]
>
> Thanks,
> Miquèl
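Stefan's scenario can be sketched the same way: the two worn blocks keep an old table because erasing them fails, newer tables are written further down, and a search that stops at the first signature it meets returns the stale copy. The C model below is hypothetical: it merges main and mirror tables into one signature, the version numbers are made up for illustration, and "stops at the first match" is taken from Stefan's description rather than shown in the quoted hunk.

```c
#include <assert.h>
#include <stddef.h>

#define BBT_AREA 4	/* assumed reserved area, index 0 = last block */

struct bbt_slot {
	int has_table;	/* a BBT signature is present */
	int version;	/* version stored with that table */
	int worn;	/* block has been marked bad (e.g. erase failed) */
};

/*
 * Walk the reserved area from the last block downwards and return the
 * version of the first table found, or -1 if there is none.
 */
int search_version(const struct bbt_slot area[BBT_AREA], int skip_worn)
{
	assert(area != NULL);
	for (int i = 0; i < BBT_AREA; i++) {
		if (skip_worn && area[i].worn)
			continue;	/* the check the patch added */
		if (area[i].has_table)
			return area[i].version;
	}
	return -1;
}
```

With two worn blocks holding version 1 above two healthy blocks holding version 3, search_version(area, 0) returns the stale 1 and search_version(area, 1) returns 3. The difficulty, judging by the rest of the thread, is telling a genuinely worn table block apart from a healthy one that merely looks bad to the marker check.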
* Re: imx27: No space left to write bad block table 2021-04-19 13:04 ` Stefan Riedmüller @ 2021-04-19 15:36 ` Miquel Raynal 2021-04-20 6:26 ` Stefan Riedmüller 2021-04-26 15:53 ` Stefan Riedmüller 0 siblings, 2 replies; 20+ messages in thread From: Miquel Raynal @ 2021-04-19 15:36 UTC (permalink / raw) To: Stefan Riedmüller; +Cc: festevam, guillaume.tucker, kernel, linux-mtd Hi Stefan, > > Interesting. Maybe I overlooked the below commit when applying. Indeed, > > BBT may be considered as bad blocks, so I wonder if the below change is > > valid now... > > > > Guillaume, would you have a way to revert this patch on top of > > linux-next? Stefan, would you mind giving more details on the testing > > procedure? > > I have tested this on an i.MX 6 by simulating two bad BBT blocks by simply > returning -EIO in nand_erase_nand when the block to be erased is one of the > first two BBT blocks. > > I have seen this once on a customer board but were not able to reproduce it > anymore, thus the simulation of the two bad blocks. > > Without the patch below new versions of the BBT can no longer be written to > the first two blocks reserved for the BBT but they are still evaluated to read > the BBT from during boot due the lack of a test if these blocks are bad. So > changes to the BBT after these two blocks turn bad are only kept and used > until the next reboot where again the old version of the two worn blocks is > used as a basis. > > I tried to use the same mechanism that is used to identify bad blocks during a > scan for bad blocks. But maybe I missed something there? Or were my > assumptions wrong in the first place? Honestly I don't know what is wrong exactly in this patch. We will revert the commit as it clearly breaks something fundamental and the merge window is too close to adopt a hackish attitude. I would propose the following tests with your board: - Hack the core to allow yourself to access bad blocks from userspace for testing purposes. 
- With the below commit, you should have the same behavior than reported by Fabio. - Revert the commit. - Manually change the bad block markers (nanddump, flash_erase, nandwrite) to declare the two tables bad. Reboot and observe if there are any issues. You can try to work from there. > > ---8<--- > > > > commit bd9c9fe2ad04546940f4a9979d679e62cae6aa51 > > Author: Stefan Riedmueller <s.riedmueller@phytec.de> > > Date: Thu Mar 25 11:23:37 2021 +0100 > > > > mtd: rawnand: bbt: Skip bad blocks when searching for the BBT in NAND > > > > The blocks containing the bad block table can become bad as well. So > > make sure to skip any blocks that are marked bad when searching for the > > bad block table. > > > > Otherwise in very rare cases where two BBT blocks wear out it might > > happen that an obsolete BBT is used instead of a newer available > > version. > > > > Signed-off-by: Stefan Riedmueller <s.riedmueller@phytec.de> > > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> > > Link: > > https://lore.kernel.org/linux-mtd/20210325102337.481172-1-s.riedmueller@phytec.de > > > > diff --git a/drivers/mtd/nand/raw/nand_bbt.c > > b/drivers/mtd/nand/raw/nand_bbt.c > > index dced32a126d9..6e25a5ce5ba9 100644 > > --- a/drivers/mtd/nand/raw/nand_bbt.c > > +++ b/drivers/mtd/nand/raw/nand_bbt.c > > @@ -525,6 +525,7 @@ static int search_bbt(struct nand_chip *this, uint8_t > > *buf, > > { > > u64 targetsize = nanddev_target_size(&this->base); > > struct mtd_info *mtd = nand_to_mtd(this); > > + struct nand_bbt_descr *bd = this->badblock_pattern; > > int i, chips; > > int startblock, block, dir; > > int scanlen = mtd->writesize + mtd->oobsize; > > @@ -560,6 +561,10 @@ static int search_bbt(struct nand_chip *this, uint8_t > > *buf, > > int actblock = startblock + dir * block; > > loff_t offs = (loff_t)actblock << this- > > >bbt_erase_shift; > > > > + /* Check if block is marked bad */ > > + if (scan_block_fast(this, bd, offs, buf)) > > + continue; > > + > > /* Read first page */ 
> > scan_read(this, buf, offs, mtd->writesize, td); > > if (!check_pattern(buf, scanlen, mtd->writesize, > > td)) { > > > > > > Thanks, > > Miquèl Thanks, Miquèl
* Re: imx27: No space left to write bad block table 2021-04-19 15:36 ` Miquel Raynal @ 2021-04-20 6:26 ` Stefan Riedmüller 2021-04-21 20:44 ` Guillaume Tucker 2021-04-26 15:53 ` Stefan Riedmüller 1 sibling, 1 reply; 20+ messages in thread From: Stefan Riedmüller @ 2021-04-20 6:26 UTC (permalink / raw) To: miquel.raynal; +Cc: festevam, guillaume.tucker, kernel, linux-mtd Hi Miquel, On Mon, 2021-04-19 at 17:36 +0200, Miquel Raynal wrote: > Hi Stefan, > > > > Interesting. Maybe I overlooked the below commit when applying. Indeed, > > > BBT may be considered as bad blocks, so I wonder if the below change is > > > valid now... > > > > > > Guillaume, would you have a way to revert this patch on top of > > > linux-next? Stefan, would you mind giving more details on the testing > > > procedure? > > > > I have tested this on an i.MX 6 by simulating two bad BBT blocks by simply > > returning -EIO in nand_erase_nand when the block to be erased is one of > > the > > first two BBT blocks. > > > > I have seen this once on a customer board but were not able to reproduce > > it > > anymore, thus the simulation of the two bad blocks. > > > > Without the patch below new versions of the BBT can no longer be written > > to > > the first two blocks reserved for the BBT but they are still evaluated to > > read > > the BBT from during boot due the lack of a test if these blocks are bad. > > So > > changes to the BBT after these two blocks turn bad are only kept and used > > until the next reboot where again the old version of the two worn blocks > > is > > used as a basis. > > > > I tried to use the same mechanism that is used to identify bad blocks > > during a > > scan for bad blocks. But maybe I missed something there? Or were my > > assumptions wrong in the first place? > > Honestly I don't know what is wrong exactly in this patch. > > We will revert the commit as it clearly breaks something fundamental > and the merge window is too close to adopt a hackish attitude. 
> > I would propose the following tests with your board: > - Hack the core to allow yourself to access bad blocks from userspace > for testing purposes. > - With the below commit, you should have the same behavior than > reported by Fabio. > - Revert the commit. > - Manually change the bad block markers (nanddump, flash_erase, > nandwrite) to declare the two tables bad. Reboot and observe if there > are any issues. You can try to work from there. Thanks for the input! I will follow your suggestions and let you guys know my findings. Regards, Stefan > > > > ---8<--- > > > > > > commit bd9c9fe2ad04546940f4a9979d679e62cae6aa51 > > > Author: Stefan Riedmueller <s.riedmueller@phytec.de> > > > Date: Thu Mar 25 11:23:37 2021 +0100 > > > > > > mtd: rawnand: bbt: Skip bad blocks when searching for the BBT in > > > NAND > > > > > > The blocks containing the bad block table can become bad as well. So > > > make sure to skip any blocks that are marked bad when searching for > > > the > > > bad block table. > > > > > > Otherwise in very rare cases where two BBT blocks wear out it might > > > happen that an obsolete BBT is used instead of a newer available > > > version. 
> > > > > > Signed-off-by: Stefan Riedmueller <s.riedmueller@phytec.de> > > > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> > > > Link: > > > https://lore.kernel.org/linux-mtd/20210325102337.481172-1-s.riedmueller@phytec.de > > > > > > diff --git a/drivers/mtd/nand/raw/nand_bbt.c > > > b/drivers/mtd/nand/raw/nand_bbt.c > > > index dced32a126d9..6e25a5ce5ba9 100644 > > > --- a/drivers/mtd/nand/raw/nand_bbt.c > > > +++ b/drivers/mtd/nand/raw/nand_bbt.c > > > @@ -525,6 +525,7 @@ static int search_bbt(struct nand_chip *this, > > > uint8_t > > > *buf, > > > { > > > u64 targetsize = nanddev_target_size(&this->base); > > > struct mtd_info *mtd = nand_to_mtd(this); > > > + struct nand_bbt_descr *bd = this->badblock_pattern; > > > int i, chips; > > > int startblock, block, dir; > > > int scanlen = mtd->writesize + mtd->oobsize; > > > @@ -560,6 +561,10 @@ static int search_bbt(struct nand_chip *this, > > > uint8_t > > > *buf, > > > int actblock = startblock + dir * block; > > > loff_t offs = (loff_t)actblock << this- > > > > bbt_erase_shift; > > > > > > + /* Check if block is marked bad */ > > > + if (scan_block_fast(this, bd, offs, buf)) > > > + continue; > > > + > > > /* Read first page */ > > > scan_read(this, buf, offs, mtd->writesize, td); > > > if (!check_pattern(buf, scanlen, mtd->writesize, > > > td)) { > > > > > > > > > Thanks, > > > Miquèl > > Thanks, > Miquèl ______________________________________________________ Linux MTD discussion mailing list http://lists.infradead.org/mailman/listinfo/linux-mtd/ ^ permalink raw reply [flat|nested] 20+ messages in thread
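Here is a minimal C model of the search loop that the quoted patch touches, to make its intent concrete. Reserved blocks are scanned from the end of the chip, which is where the logs in this thread place the tables, and the first block carrying the BBT signature wins. A worn block that still holds a stale table can therefore shadow a newer one. The struct and function names (`bbt_block`, `find_bbt`) are hypothetical, and main and mirror tables are folded into a single search. This is a simplified sketch, not the kernel's search_bbt().

```c
#include <stdbool.h>

/* Hypothetical, simplified view of one block reserved for the BBT. */
struct bbt_block {
	bool marked_bad;   /* bad block marker is set in the OOB */
	bool has_pattern;  /* block still carries the BBT signature */
	int version;       /* version of the table stored in the block */
};

/*
 * Scan the reserved blocks from the last one downwards and return the
 * index of the first block carrying the BBT signature, or -1 if none is
 * found. skip_bad models the check added by the quoted patch.
 */
static int find_bbt(const struct bbt_block *blk, int nblocks, bool skip_bad)
{
	for (int i = nblocks - 1; i >= 0; i--) {
		if (skip_bad && blk[i].marked_bad)
			continue;      /* patched behaviour: ignore worn blocks */
		if (blk[i].has_pattern)
			return i;      /* first match wins, versions are not compared here */
	}
	return -1;
}
```

Suppose blocks 3 and 2 are worn but still hold version 1, and blocks 1 and 0 hold version 2. In that case `find_bbt(blk, 4, false)` returns 3, the stale table, while `find_bbt(blk, 4, true)` returns 1. The same model also shows the risk: if the bad block check fires on every block that holds a table, the skipping search returns -1 and no table is found.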
* Re: imx27: No space left to write bad block table 2021-04-20 6:26 ` Stefan Riedmüller @ 2021-04-21 20:44 ` Guillaume Tucker 2021-04-21 23:29 ` Fabio Estevam 0 siblings, 1 reply; 20+ messages in thread From: Guillaume Tucker @ 2021-04-21 20:44 UTC (permalink / raw) To: Stefan Riedmüller, miquel.raynal; +Cc: festevam, kernel, linux-mtd On 20/04/2021 07:26, Stefan Riedmüller wrote: > Hi Miquel, > > On Mon, 2021-04-19 at 17:36 +0200, Miquel Raynal wrote: >> Hi Stefan, >> >>>> Interesting. Maybe I overlooked the below commit when applying. Indeed, >>>> BBT may be considered as bad blocks, so I wonder if the below change is >>>> valid now... >>>> >>>> Guillaume, would you have a way to revert this patch on top of >>>> linux-next? Stefan, would you mind giving more details on the testing >>>> procedure? Sorry I'm late to the party, was busy with some other kernelci issues. I gather this is being reverted anyway now, but please let me know if you still need to check anything. As far as I can tell, there hasn't been any automated bisection landing on this commit. It's generally possible to re-run anything, i.e. make a kernel build with a custom patchset and run one given test on any of the platforms in KernelCI. There just isn't any public self-service for doing that (yet). Best wishes, Guillaume >>> I have tested this on an i.MX 6 by simulating two bad BBT blocks by simply >>> returning -EIO in nand_erase_nand when the block to be erased is one of >>> the >>> first two BBT blocks. >>> >>> I have seen this once on a customer board but were not able to reproduce >>> it >>> anymore, thus the simulation of the two bad blocks. >>> >>> Without the patch below new versions of the BBT can no longer be written >>> to >>> the first two blocks reserved for the BBT but they are still evaluated to >>> read >>> the BBT from during boot due the lack of a test if these blocks are bad. 
>>> So >>> changes to the BBT after these two blocks turn bad are only kept and used >>> until the next reboot where again the old version of the two worn blocks >>> is >>> used as a basis. >>> >>> I tried to use the same mechanism that is used to identify bad blocks >>> during a >>> scan for bad blocks. But maybe I missed something there? Or were my >>> assumptions wrong in the first place? >> >> Honestly I don't know what is wrong exactly in this patch. >> >> We will revert the commit as it clearly breaks something fundamental >> and the merge window is too close to adopt a hackish attitude. >> >> I would propose the following tests with your board: >> - Hack the core to allow yourself to access bad blocks from userspace >> for testing purposes. >> - With the below commit, you should have the same behavior than >> reported by Fabio. >> - Revert the commit. >> - Manually change the bad block markers (nanddump, flash_erase, >> nandwrite) to declare the two tables bad. Reboot and observe if there >> are any issues. You can try to work from there. > > Thanks for the input! I will follow your suggestions and let you guys know my > findings. > > Regards, > Stefan > >> >>>> ---8<--- >>>> >>>> commit bd9c9fe2ad04546940f4a9979d679e62cae6aa51 >>>> Author: Stefan Riedmueller <s.riedmueller@phytec.de> >>>> Date: Thu Mar 25 11:23:37 2021 +0100 >>>> >>>> mtd: rawnand: bbt: Skip bad blocks when searching for the BBT in >>>> NAND >>>> >>>> The blocks containing the bad block table can become bad as well. So >>>> make sure to skip any blocks that are marked bad when searching for >>>> the >>>> bad block table. >>>> >>>> Otherwise in very rare cases where two BBT blocks wear out it might >>>> happen that an obsolete BBT is used instead of a newer available >>>> version. 
>>>> >>>> Signed-off-by: Stefan Riedmueller <s.riedmueller@phytec.de> >>>> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> >>>> Link: >>>> https://lore.kernel.org/linux-mtd/20210325102337.481172-1-s.riedmueller@phytec.de >>>> >>>> diff --git a/drivers/mtd/nand/raw/nand_bbt.c >>>> b/drivers/mtd/nand/raw/nand_bbt.c >>>> index dced32a126d9..6e25a5ce5ba9 100644 >>>> --- a/drivers/mtd/nand/raw/nand_bbt.c >>>> +++ b/drivers/mtd/nand/raw/nand_bbt.c >>>> @@ -525,6 +525,7 @@ static int search_bbt(struct nand_chip *this, >>>> uint8_t >>>> *buf, >>>> { >>>> u64 targetsize = nanddev_target_size(&this->base); >>>> struct mtd_info *mtd = nand_to_mtd(this); >>>> + struct nand_bbt_descr *bd = this->badblock_pattern; >>>> int i, chips; >>>> int startblock, block, dir; >>>> int scanlen = mtd->writesize + mtd->oobsize; >>>> @@ -560,6 +561,10 @@ static int search_bbt(struct nand_chip *this, >>>> uint8_t >>>> *buf, >>>> int actblock = startblock + dir * block; >>>> loff_t offs = (loff_t)actblock << this- >>>>> bbt_erase_shift; >>>> >>>> + /* Check if block is marked bad */ >>>> + if (scan_block_fast(this, bd, offs, buf)) >>>> + continue; >>>> + >>>> /* Read first page */ >>>> scan_read(this, buf, offs, mtd->writesize, td); >>>> if (!check_pattern(buf, scanlen, mtd->writesize, >>>> td)) { >>>> >>>> >>>> Thanks, >>>> Miquèl >> >> Thanks, >> Miquèl ______________________________________________________ Linux MTD discussion mailing list http://lists.infradead.org/mailman/listinfo/linux-mtd/ ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: imx27: No space left to write bad block table 2021-04-21 20:44 ` Guillaume Tucker @ 2021-04-21 23:29 ` Fabio Estevam 2021-04-22 13:16 ` Guillaume Tucker 0 siblings, 1 reply; 20+ messages in thread From: Fabio Estevam @ 2021-04-21 23:29 UTC (permalink / raw) To: Guillaume Tucker; +Cc: Stefan Riedmüller, miquel.raynal, kernel, linux-mtd Hi Guillaume, On Wed, Apr 21, 2021 at 5:44 PM Guillaume Tucker <guillaume.tucker@collabora.com> wrote: > Sorry I'm late to the party, was busy with some other kernelci > issues. I gather this is being reverted anyway now, but please > let me know if you still need to check anything. As far as I can > tell, there hasn't been any automated bisection landing on this > commit. Thanks. Yes, we did the revert in linux-next, but I could not see the next-20210421 boot log for the imx27-phytec-phycard-s-rdk board to confirm that the NAND bad block table can be found again. Thanks for your help > > It's generally possible to re-run anything, i.e. make a kernel > build with a custom patchset and run one given test on any of the > platforms in KernelCI. There just isn't any public self-service > for doing that (yet). > > Best wishes, > Guillaume > > >>> I have tested this on an i.MX 6 by simulating two bad BBT blocks by simply > >>> returning -EIO in nand_erase_nand when the block to be erased is one of > >>> the > >>> first two BBT blocks. > >>> > >>> I have seen this once on a customer board but were not able to reproduce > >>> it > >>> anymore, thus the simulation of the two bad blocks. > >>> > >>> Without the patch below new versions of the BBT can no longer be written > >>> to > >>> the first two blocks reserved for the BBT but they are still evaluated to > >>> read > >>> the BBT from during boot due the lack of a test if these blocks are bad. 
> >>> So > >>> changes to the BBT after these two blocks turn bad are only kept and used > >>> until the next reboot where again the old version of the two worn blocks > >>> is > >>> used as a basis. > >>> > >>> I tried to use the same mechanism that is used to identify bad blocks > >>> during a > >>> scan for bad blocks. But maybe I missed something there? Or were my > >>> assumptions wrong in the first place? > >> > >> Honestly I don't know what is wrong exactly in this patch. > >> > >> We will revert the commit as it clearly breaks something fundamental > >> and the merge window is too close to adopt a hackish attitude. > >> > >> I would propose the following tests with your board: > >> - Hack the core to allow yourself to access bad blocks from userspace > >> for testing purposes. > >> - With the below commit, you should have the same behavior than > >> reported by Fabio. > >> - Revert the commit. > >> - Manually change the bad block markers (nanddump, flash_erase, > >> nandwrite) to declare the two tables bad. Reboot and observe if there > >> are any issues. You can try to work from there. > > > > Thanks for the input! I will follow your suggestions and let you guys know my > > findings. > > > > Regards, > > Stefan > > > >> > >>>> ---8<--- > >>>> > >>>> commit bd9c9fe2ad04546940f4a9979d679e62cae6aa51 > >>>> Author: Stefan Riedmueller <s.riedmueller@phytec.de> > >>>> Date: Thu Mar 25 11:23:37 2021 +0100 > >>>> > >>>> mtd: rawnand: bbt: Skip bad blocks when searching for the BBT in > >>>> NAND > >>>> > >>>> The blocks containing the bad block table can become bad as well. So > >>>> make sure to skip any blocks that are marked bad when searching for > >>>> the > >>>> bad block table. > >>>> > >>>> Otherwise in very rare cases where two BBT blocks wear out it might > >>>> happen that an obsolete BBT is used instead of a newer available > >>>> version. 
> >>>> > >>>> Signed-off-by: Stefan Riedmueller <s.riedmueller@phytec.de> > >>>> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> > >>>> Link: > >>>> https://lore.kernel.org/linux-mtd/20210325102337.481172-1-s.riedmueller@phytec.de > >>>> > >>>> diff --git a/drivers/mtd/nand/raw/nand_bbt.c > >>>> b/drivers/mtd/nand/raw/nand_bbt.c > >>>> index dced32a126d9..6e25a5ce5ba9 100644 > >>>> --- a/drivers/mtd/nand/raw/nand_bbt.c > >>>> +++ b/drivers/mtd/nand/raw/nand_bbt.c > >>>> @@ -525,6 +525,7 @@ static int search_bbt(struct nand_chip *this, > >>>> uint8_t > >>>> *buf, > >>>> { > >>>> u64 targetsize = nanddev_target_size(&this->base); > >>>> struct mtd_info *mtd = nand_to_mtd(this); > >>>> + struct nand_bbt_descr *bd = this->badblock_pattern; > >>>> int i, chips; > >>>> int startblock, block, dir; > >>>> int scanlen = mtd->writesize + mtd->oobsize; > >>>> @@ -560,6 +561,10 @@ static int search_bbt(struct nand_chip *this, > >>>> uint8_t > >>>> *buf, > >>>> int actblock = startblock + dir * block; > >>>> loff_t offs = (loff_t)actblock << this- > >>>>> bbt_erase_shift; > >>>> > >>>> + /* Check if block is marked bad */ > >>>> + if (scan_block_fast(this, bd, offs, buf)) > >>>> + continue; > >>>> + > >>>> /* Read first page */ > >>>> scan_read(this, buf, offs, mtd->writesize, td); > >>>> if (!check_pattern(buf, scanlen, mtd->writesize, > >>>> td)) { > >>>> > >>>> > >>>> Thanks, > >>>> Miquèl > >> > >> Thanks, > >> Miquèl > ______________________________________________________ Linux MTD discussion mailing list http://lists.infradead.org/mailman/listinfo/linux-mtd/ ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: imx27: No space left to write bad block table 2021-04-21 23:29 ` Fabio Estevam @ 2021-04-22 13:16 ` Guillaume Tucker 2021-04-22 13:28 ` Fabio Estevam 0 siblings, 1 reply; 20+ messages in thread From: Guillaume Tucker @ 2021-04-22 13:16 UTC (permalink / raw) To: Fabio Estevam Cc: Stefan Riedmüller, miquel.raynal, kernel, linux-mtd, Jan Lübbe, kernelci-results +Jan, +kernelci-results On 22/04/2021 00:29, Fabio Estevam wrote: > Hi Guillaume, > > On Wed, Apr 21, 2021 at 5:44 PM Guillaume Tucker > <guillaume.tucker@collabora.com> wrote: > >> Sorry I'm late to the party, was busy with some other kernelci >> issues. I gather this is being reverted anyway now, but please >> let me know if you still need to check anything. As far as I can >> tell, there hasn't been any automated bisection landing on this >> commit. > > Thanks. Yes, we did the revert in linux-next, but I could not see the > next-20210421 boot log for the imx27-phytec-phycard-s-rdk board to > confirm that the NAND bad block table can be found again. This device is only available in Pengutronix's lab which is currently being moved to a new location, so I'm being told. I guess we'll check again when it's back online. Are you aware of any other platform in KernelCI showing the same issue? I could take a look but there's been more boot failure regressions than usual on linux-next recently... Best wishes, Guillaume >> It's generally possible to re-run anything, i.e. make a kernel >> build with a custom patchset and run one given test on any of the >> platforms in KernelCI. There just isn't any public self-service >> for doing that (yet). >> >> Best wishes, >> Guillaume >> >>>>> I have tested this on an i.MX 6 by simulating two bad BBT blocks by simply >>>>> returning -EIO in nand_erase_nand when the block to be erased is one of >>>>> the >>>>> first two BBT blocks. 
>>>>> >>>>> I have seen this once on a customer board but were not able to reproduce >>>>> it >>>>> anymore, thus the simulation of the two bad blocks. >>>>> >>>>> Without the patch below new versions of the BBT can no longer be written >>>>> to >>>>> the first two blocks reserved for the BBT but they are still evaluated to >>>>> read >>>>> the BBT from during boot due the lack of a test if these blocks are bad. >>>>> So >>>>> changes to the BBT after these two blocks turn bad are only kept and used >>>>> until the next reboot where again the old version of the two worn blocks >>>>> is >>>>> used as a basis. >>>>> >>>>> I tried to use the same mechanism that is used to identify bad blocks >>>>> during a >>>>> scan for bad blocks. But maybe I missed something there? Or were my >>>>> assumptions wrong in the first place? >>>> >>>> Honestly I don't know what is wrong exactly in this patch. >>>> >>>> We will revert the commit as it clearly breaks something fundamental >>>> and the merge window is too close to adopt a hackish attitude. >>>> >>>> I would propose the following tests with your board: >>>> - Hack the core to allow yourself to access bad blocks from userspace >>>> for testing purposes. >>>> - With the below commit, you should have the same behavior than >>>> reported by Fabio. >>>> - Revert the commit. >>>> - Manually change the bad block markers (nanddump, flash_erase, >>>> nandwrite) to declare the two tables bad. Reboot and observe if there >>>> are any issues. You can try to work from there. >>> >>> Thanks for the input! I will follow your suggestions and let you guys know my >>> findings. 
>>> >>> Regards, >>> Stefan >>> >>>> >>>>>> ---8<--- >>>>>> >>>>>> commit bd9c9fe2ad04546940f4a9979d679e62cae6aa51 >>>>>> Author: Stefan Riedmueller <s.riedmueller@phytec.de> >>>>>> Date: Thu Mar 25 11:23:37 2021 +0100 >>>>>> >>>>>> mtd: rawnand: bbt: Skip bad blocks when searching for the BBT in >>>>>> NAND >>>>>> >>>>>> The blocks containing the bad block table can become bad as well. So >>>>>> make sure to skip any blocks that are marked bad when searching for >>>>>> the >>>>>> bad block table. >>>>>> >>>>>> Otherwise in very rare cases where two BBT blocks wear out it might >>>>>> happen that an obsolete BBT is used instead of a newer available >>>>>> version. >>>>>> >>>>>> Signed-off-by: Stefan Riedmueller <s.riedmueller@phytec.de> >>>>>> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> >>>>>> Link: >>>>>> https://lore.kernel.org/linux-mtd/20210325102337.481172-1-s.riedmueller@phytec.de >>>>>> >>>>>> diff --git a/drivers/mtd/nand/raw/nand_bbt.c >>>>>> b/drivers/mtd/nand/raw/nand_bbt.c >>>>>> index dced32a126d9..6e25a5ce5ba9 100644 >>>>>> --- a/drivers/mtd/nand/raw/nand_bbt.c >>>>>> +++ b/drivers/mtd/nand/raw/nand_bbt.c >>>>>> @@ -525,6 +525,7 @@ static int search_bbt(struct nand_chip *this, >>>>>> uint8_t >>>>>> *buf, >>>>>> { >>>>>> u64 targetsize = nanddev_target_size(&this->base); >>>>>> struct mtd_info *mtd = nand_to_mtd(this); >>>>>> + struct nand_bbt_descr *bd = this->badblock_pattern; >>>>>> int i, chips; >>>>>> int startblock, block, dir; >>>>>> int scanlen = mtd->writesize + mtd->oobsize; >>>>>> @@ -560,6 +561,10 @@ static int search_bbt(struct nand_chip *this, >>>>>> uint8_t >>>>>> *buf, >>>>>> int actblock = startblock + dir * block; >>>>>> loff_t offs = (loff_t)actblock << this- >>>>>>> bbt_erase_shift; >>>>>> >>>>>> + /* Check if block is marked bad */ >>>>>> + if (scan_block_fast(this, bd, offs, buf)) >>>>>> + continue; >>>>>> + >>>>>> /* Read first page */ >>>>>> scan_read(this, buf, offs, mtd->writesize, td); >>>>>> if 
(!check_pattern(buf, scanlen, mtd->writesize, >>>>>> td)) { >>>>>> >>>>>> >>>>>> Thanks, >>>>>> Miquèl >>>> >>>> Thanks, >>>> Miquèl >>
* Re: imx27: No space left to write bad block table 2021-04-22 13:16 ` Guillaume Tucker @ 2021-04-22 13:28 ` Fabio Estevam 2021-04-23 21:04 ` Fabio Estevam 0 siblings, 1 reply; 20+ messages in thread From: Fabio Estevam @ 2021-04-22 13:28 UTC (permalink / raw) To: Guillaume Tucker Cc: Stefan Riedmüller, miquel.raynal, kernel, linux-mtd, Jan Lübbe, kernelci-results Hi Guillaume, On Thu, Apr 22, 2021 at 10:16 AM Guillaume Tucker <guillaume.tucker@collabora.com> wrote: > This device is only available in Pengutronix's lab which is > currently being moved to a new location, so I'm being told. I > guess we'll check again when it's back online. Ok, no problem. Yes, we can check again when it is back online. > Are you aware of any other platform in KernelCI showing the same > issue? I could take a look but there's been more boot failure > regressions than usual on linux-next recently... There are probably other affected platforms, but at the moment I am not aware of any platform other than imx27-phytec-phycard-s-rdk in KernelCI that shows it. Thanks
* Re: imx27: No space left to write bad block table 2021-04-22 13:28 ` Fabio Estevam @ 2021-04-23 21:04 ` Fabio Estevam 0 siblings, 0 replies; 20+ messages in thread From: Fabio Estevam @ 2021-04-23 21:04 UTC (permalink / raw) To: Guillaume Tucker Cc: Stefan Riedmüller, miquel.raynal, kernel, linux-mtd, Jan Lübbe, kernelci-results Hi Guillaume, On Thu, Apr 22, 2021 at 10:28 AM Fabio Estevam <festevam@gmail.com> wrote: > > Hi Guillaume, > > On Thu, Apr 22, 2021 at 10:16 AM Guillaume Tucker > <guillaume.tucker@collabora.com> wrote: > > > This device is only available in Pengutronix's lab which is > > currently being moved to a new location, so I'm being told. I > > guess we'll check again when it's back online. > > Ok, no problem. Yes, we can check again when it is back online. I see it is back: https://storage.kernelci.org/next/master/next-20210423/arm/imx_v4_v5_defconfig/gcc-8/lab-pengutronix/baseline-imx27-phytec-phycard-s-rdk.html NAND is correctly probed now, so we are all good! Thanks for the good work with kernelci! It is super helpful :-) Cheers
* Re: imx27: No space left to write bad block table 2021-04-19 15:36 ` Miquel Raynal 2021-04-20 6:26 ` Stefan Riedmüller @ 2021-04-26 15:53 ` Stefan Riedmüller 2021-05-04 8:34 ` Miquel Raynal 1 sibling, 1 reply; 20+ messages in thread From: Stefan Riedmüller @ 2021-04-26 15:53 UTC (permalink / raw) To: miquel.raynal; +Cc: festevam, guillaume.tucker, kernel, linux-mtd Hi Miquel, On Mon, 2021-04-19 at 17:36 +0200, Miquel Raynal wrote: > Hi Stefan, > > > > Interesting. Maybe I overlooked the below commit when applying. Indeed, > > > BBT may be considered as bad blocks, so I wonder if the below change is > > > valid now... > > > > > > Guillaume, would you have a way to revert this patch on top of > > > linux-next? Stefan, would you mind giving more details on the testing > > > procedure? > > > > I have tested this on an i.MX 6 by simulating two bad BBT blocks by simply > > returning -EIO in nand_erase_nand when the block to be erased is one of > > the > > first two BBT blocks. > > > > I have seen this once on a customer board but were not able to reproduce > > it > > anymore, thus the simulation of the two bad blocks. > > > > Without the patch below new versions of the BBT can no longer be written > > to > > the first two blocks reserved for the BBT but they are still evaluated to > > read > > the BBT from during boot due the lack of a test if these blocks are bad. > > So > > changes to the BBT after these two blocks turn bad are only kept and used > > until the next reboot where again the old version of the two worn blocks > > is > > used as a basis. > > > > I tried to use the same mechanism that is used to identify bad blocks > > during a > > scan for bad blocks. But maybe I missed something there? Or were my > > assumptions wrong in the first place? > > Honestly I don't know what is wrong exactly in this patch. > > We will revert the commit as it clearly breaks something fundamental > and the merge window is too close to adopt a hackish attitude. 
> > I would propose the following tests with your board: > - Hack the core to allow yourself to access bad blocks from userspace > for testing purposes. > - With the below commit, you should have the same behavior than > reported by Fabio.

On my imx6 board the patch does not lead to the behavior reported by Fabio. The BBT is found and can be read:

[ 1.520501] nand: device found, Manufacturer ID: 0xc2, Chip ID: 0xd3
[ 1.526944] nand: Macronix MX60LF8G18AC
[ 1.530803] nand: 1024 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
[ 1.539412] Bad block table found at page 524224, version 0x01
[ 1.545790] Bad block table found at page 524160, version 0x01
[ 1.551796] nand_read_bbt: bad block at 0x000001b60000
[ 1.557032] nand_read_bbt: bad block at 0x000008cc0000
[ 1.562204] nand_read_bbt: bad block at 0x00000f480000
[ 1.567395] nand_read_bbt: bad block at 0x0000111c0000
[ 1.572588] nand_read_bbt: bad block at 0x0000205c0000
[ 1.577802] nand_read_bbt: bad block at 0x00002dfc0000

I dug a little deeper and I think I found the cause of the failure on the imx27 board. The mxc_nand driver (used by the imx27) uses its own nand_bbt_descr with an offset of 0 in the OOB area. This is the same place where the bad block marker is located on worn or factory-bad blocks.

This explains why the BBT is no longer found with my patch: scan_block_fast checks whether the bad block marker contains anything other than 0xff, and it finds the 'B' from 'Bbt0'. The same happens for the mirrored version, where it finds the '1' from '1tbB'.

This also explains why the blocks holding the original BBT are detected as bad in the scan that follows the failed search. As a result, the BBT is written to the remaining two blocks reserved for it.
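The overlap can be reproduced outside the kernel with a few bytes of OOB data. The sketch below assumes a large-page, 8-bit device whose bad block marker is the single OOB byte at offset 0. It uses a hypothetical helper `marker_says_bad()` that, like the check described above, reports a block as bad when the marker byte is anything other than 0xff. This is an illustration, not the kernel's scan_block_fast().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OOB_SIZE    64   /* OOB size of the imx27 NAND in the logs */
#define MARKER_OFFS 0    /* assumed large-page bad block marker position */
#define MARKER_LEN  1    /* assumed 8-bit bus: one marker byte */

/* Hypothetical stand-in for the marker check: any non-0xff byte means "bad". */
static bool marker_says_bad(const uint8_t *oob)
{
	for (size_t i = 0; i < MARKER_LEN; i++)
		if (oob[MARKER_OFFS + i] != 0xff)
			return true;
	return false;
}

/* Fill an erased (all 0xff) OOB area, then write a 4-byte BBT signature at sig_offs. */
static void make_oob(uint8_t *oob, const char *sig, size_t sig_offs)
{
	memset(oob, 0xff, OOB_SIZE);
	memcpy(oob + sig_offs, sig, 4);
}
```

A signature written at offset 0 trips the check for both the main ('Bbt0') and mirror ('1tbB') tables. The same signature moved to offset 2 leaves the marker byte erased, so the block no longer looks bad.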
19:38:23.001385 nand: device found, Manufacturer ID: 0x20, Chip ID: 0xa1
19:38:23.002635 nand: ST Micro NAND01GR3B2CZA6
19:38:23.006666 nand: 128 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
19:38:23.028413 Bad block table not found for chip 0
19:38:23.035625 random: fast init done
19:38:23.049144 Bad block table not found for chip 0
19:38:23.050024 Scanning device for bad blocks
19:38:23.330999 Bad eraseblock 329 at 0x000002920000
19:38:23.345958 Bad eraseblock 330 at 0x000002940000
19:38:23.356024 Bad eraseblock 331 at 0x000002960000
19:38:23.365738 Bad eraseblock 332 at 0x000002980000
19:38:23.375590 Bad eraseblock 333 at 0x0000029a0000
19:38:23.385505 Bad eraseblock 334 at 0x0000029c0000
19:38:23.395548 Bad eraseblock 335 at 0x0000029e0000
19:38:23.405501 Bad eraseblock 336 at 0x000002a00000
19:38:23.415551 Bad eraseblock 337 at 0x000002a20000
19:38:23.425937 Bad eraseblock 338 at 0x000002a40000
19:38:23.436028 Bad eraseblock 339 at 0x000002a60000
19:38:23.445959 Bad eraseblock 340 at 0x000002a80000
19:38:23.456008 Bad eraseblock 341 at 0x000002aa0000
19:38:23.466006 Bad eraseblock 342 at 0x000002ac0000
19:38:23.475912 Bad eraseblock 343 at 0x000002ae0000
19:38:23.486064 Bad eraseblock 344 at 0x000002b00000
19:38:23.495925 Bad eraseblock 345 at 0x000002b20000
19:38:24.048053 Bad eraseblock 1022 at 0x000007fc0000
19:38:24.056117 Bad eraseblock 1023 at 0x000007fe0000
19:38:24.067953 Bad block table written to 0x000007fa0000, version 0x01
19:38:24.087637 Bad block table written to 0x000007f80000, version 0x01

On the next boot, all four BBT versions in flash are skipped for the same reason as before, and the two blocks containing the latest BBT are also detected as bad. As a result, no blocks remain to write the BBT to.
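The two boots can be replayed in a small C model of the four BBT blocks reserved at the end of the 1024-block chip (blocks 1020 to 1023). The model assumes, following the analysis above, that every block holding a table looks bad. It writes the main and mirror tables to the highest reserved blocks that are not marked bad. `holds_bbt` and `boot_once()` are hypothetical names, and this is a simplification, not the kernel's write_bbt().

```c
#include <errno.h>
#include <stdbool.h>

#define RESERVED       4     /* BBT blocks reserved at the end of the chip */
#define FIRST_RESERVED 1020  /* first reserved block on the 1024-block chip */
#define ERASE_SHIFT    17    /* 128 KiB erase blocks */

/* Which reserved blocks currently contain a BBT signature (hypothetical state). */
static bool holds_bbt[RESERVED];

/*
 * One boot: the search skips every block holding a table because it looks
 * bad, the full scan then flags those blocks as bad, and main and mirror
 * are written to the highest reserved blocks left. Returns 0 and the two
 * blocks written, or -ENOSPC if fewer than two blocks remain.
 */
static int boot_once(int written[2])
{
	int n = 0;

	for (int i = RESERVED - 1; i >= 0 && n < 2; i--) {
		if (!holds_bbt[i])   /* a block holding a table counts as bad */
			written[n++] = FIRST_RESERVED + i;
	}
	if (n < 2)
		return -ENOSPC;

	for (int k = 0; k < 2; k++)
		holds_bbt[written[k] - FIRST_RESERVED] = true;
	return 0;
}
```

Start with tables in blocks 1022 and 1023. The first call then matches the first log, with tables written to 0x7fa0000 and 0x7f80000. The second call returns -ENOSPC, which on Linux is the -28 reported in the second log.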
21:22:55.032595 nand: device found, Manufacturer ID: 0x20, Chip ID: 0xa1
21:22:55.033333 nand: ST Micro NAND01GR3B2CZA6
21:22:55.037804 nand: 128 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
21:22:55.088475 Bad block table not found for chip 0
21:22:55.093807 Bad block table not found for chip 0
21:22:55.105995 Scanning device for bad blocks
21:22:55.109049 random: fast init done
21:22:55.395488 Bad eraseblock 329 at 0x000002920000
21:22:55.406832 Bad eraseblock 330 at 0x000002940000
21:22:55.416885 Bad eraseblock 331 at 0x000002960000
21:22:55.426736 Bad eraseblock 332 at 0x000002980000
21:22:55.436732 Bad eraseblock 333 at 0x0000029a0000
21:22:55.446864 Bad eraseblock 334 at 0x0000029c0000
21:22:55.456662 Bad eraseblock 335 at 0x0000029e0000
21:22:55.466785 Bad eraseblock 336 at 0x000002a00000
21:22:55.476801 Bad eraseblock 337 at 0x000002a20000
21:22:55.486772 Bad eraseblock 338 at 0x000002a40000
21:22:55.496768 Bad eraseblock 339 at 0x000002a60000
21:22:55.506607 Bad eraseblock 340 at 0x000002a80000
21:22:55.516965 Bad eraseblock 341 at 0x000002aa0000
21:22:55.526621 Bad eraseblock 342 at 0x000002ac0000
21:22:55.536702 Bad eraseblock 343 at 0x000002ae0000
21:22:55.546660 Bad eraseblock 344 at 0x000002b00000
21:22:55.556745 Bad eraseblock 345 at 0x000002b20000
21:22:56.172928 Bad eraseblock 1020 at 0x000007f80000
21:22:56.187043 Bad eraseblock 1021 at 0x000007fa0000
21:22:56.197437 Bad eraseblock 1022 at 0x000007fc0000
21:22:56.212665 Bad eraseblock 1023 at 0x000007fe0000
21:22:56.213356 No space left to write bad block table
21:22:56.215012 nand_bbt: error while writing bad block table -28
21:22:56.239353 mxc_nand: probe of d8000000.nand-controller failed with error -28

I'm not sure of the best way to address this issue. A few ideas came to mind:

- Shift the offset of the nand_bbt_descr of mxc_nand to make room for the bad block marker.
  I'm not sure whether this would conflict with the ECC hardware, but the ooblayout functions suggest that it could work:

---8<---
static int mxc_v1_ooblayout_free(struct mtd_info *mtd, int section,
				 struct mtd_oob_region *oobregion)
{
	struct nand_chip *nand_chip = mtd_to_nand(mtd);

	if (section > nand_chip->ecc.steps)
		return -ERANGE;

	if (!section) {
		if (mtd->writesize <= 512) {
			oobregion->offset = 0;
			oobregion->length = 5;
		} else {
			oobregion->offset = 2;
			oobregion->length = 4;
		}
	} else {
		oobregion->offset = ((section - 1) * 16) + MXC_V1_ECCBYTES + 6;
		if (section < nand_chip->ecc.steps)
			oobregion->length = (section * 16) + 6 - oobregion->offset;
		else
			oobregion->length = mtd->oobsize - oobregion->offset;
	}

	return 0;
}
---8<---

  Unfortunately I don't have any hardware at hand at the moment to test it. I think the distinction between small and large page sizes needs to be reflected in the bbt_descr as well.
- Use NAND_BBT_NO_OOB with the mxc_nand driver, since there is a comment saying that the generic BBT descriptors overlap with the ECC hardware. I'm not sure what other effects setting NAND_BBT_NO_OOB might have.
- Explicitly check for the bad block marker during a search for the BBT instead of using scan_block_fast.

Any suggestions?

Regards,
Stefan

> - Revert the commit. > - Manually change the bad block markers (nanddump, flash_erase, > nandwrite) to declare the two tables bad. Reboot and observe if there > are any issues. You can try to work from there. > > > > ---8<--- > > > > > > commit bd9c9fe2ad04546940f4a9979d679e62cae6aa51 > > > Author: Stefan Riedmueller <s.riedmueller@phytec.de> > > > Date: Thu Mar 25 11:23:37 2021 +0100 > > > > > > mtd: rawnand: bbt: Skip bad blocks when searching for the BBT in > > > NAND > > > > > > The blocks containing the bad block table can become bad as well. So > > > make sure to skip any blocks that are marked bad when searching for > > > the > > > bad block table.
> > > 
> > >     Otherwise in very rare cases where two BBT blocks wear out it might
> > >     happen that an obsolete BBT is used instead of a newer available
> > >     version.
> > > 
> > >     Signed-off-by: Stefan Riedmueller <s.riedmueller@phytec.de>
> > >     Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> > >     Link: https://lore.kernel.org/linux-mtd/20210325102337.481172-1-s.riedmueller@phytec.de
> > > 
> > > diff --git a/drivers/mtd/nand/raw/nand_bbt.c b/drivers/mtd/nand/raw/nand_bbt.c
> > > index dced32a126d9..6e25a5ce5ba9 100644
> > > --- a/drivers/mtd/nand/raw/nand_bbt.c
> > > +++ b/drivers/mtd/nand/raw/nand_bbt.c
> > > @@ -525,6 +525,7 @@ static int search_bbt(struct nand_chip *this, uint8_t *buf,
> > >  {
> > >  	u64 targetsize = nanddev_target_size(&this->base);
> > >  	struct mtd_info *mtd = nand_to_mtd(this);
> > > +	struct nand_bbt_descr *bd = this->badblock_pattern;
> > >  	int i, chips;
> > >  	int startblock, block, dir;
> > >  	int scanlen = mtd->writesize + mtd->oobsize;
> > > @@ -560,6 +561,10 @@ static int search_bbt(struct nand_chip *this, uint8_t *buf,
> > >  			int actblock = startblock + dir * block;
> > >  			loff_t offs = (loff_t)actblock << this->bbt_erase_shift;
> > >  
> > > +			/* Check if block is marked bad */
> > > +			if (scan_block_fast(this, bd, offs, buf))
> > > +				continue;
> > > +
> > >  			/* Read first page */
> > >  			scan_read(this, buf, offs, mtd->writesize, td);
> > >  			if (!check_pattern(buf, scanlen, mtd->writesize, td)) {
> > > 
> > > 
> > > Thanks,
> > > Miquèl
> 
> Thanks,
> Miquèl
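To see what the quoted mxc_v1_ooblayout_free() reports for this board's geometry, its arithmetic can be ported to a small user-space function. This is a sketch, not driver code: it assumes MXC_V1_ECCBYTES is 5 and that a 2048-byte page is handled in 4 ECC steps with a 64-byte OOB area, and the struct and function names are hypothetical. Under those assumptions, OOB bytes 0 and 1 belong to no free region, and byte 0 is exactly where the large-page bad block marker and, per the analysis later in this thread, mxc_nand's own BBT pattern both sit.

```c
#include <stdio.h>

/* Assumption: value of the driver's MXC_V1_ECCBYTES constant. */
#define MXC_V1_ECCBYTES 5

/* Hypothetical stand-in for struct mtd_oob_region. */
struct oob_region {
	int offset;
	int length;
};

/* Same arithmetic as mxc_v1_ooblayout_free(); returns -1 past the last section. */
static int v1_free_region(int writesize, int oobsize, int steps,
			  int section, struct oob_region *r)
{
	if (section > steps)
		return -1;

	if (!section) {
		if (writesize <= 512) {
			/* small page: bytes 0-4, byte 5 (marker position) excluded */
			r->offset = 0;
			r->length = 5;
		} else {
			/* large page: bytes 2-5, the marker at byte 0 is excluded */
			r->offset = 2;
			r->length = 4;
		}
	} else {
		r->offset = ((section - 1) * 16) + MXC_V1_ECCBYTES + 6;
		if (section < steps)
			r->length = (section * 16) + 6 - r->offset;
		else
			r->length = oobsize - r->offset;
	}
	return 0;
}

/* Prints every free region for the given page geometry. */
void print_free_layout(int writesize, int oobsize, int steps)
{
	struct oob_region r;

	for (int section = 0;
	     v1_free_region(writesize, oobsize, steps, section, &r) == 0;
	     section++)
		printf("section %d: offset %2d, length %2d\n",
		       section, r.offset, r.length);
}
```

For a 2048/64 page with 4 steps, the sketch yields free regions (offset, length) of (2, 4), (11, 11), (27, 11), (43, 11) and (59, 5).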
* Re: imx27: No space left to write bad block table
From: Miquel Raynal @ 2021-05-04 8:34 UTC
To: Stefan Riedmüller; +Cc: festevam, guillaume.tucker, kernel, linux-mtd

Hi Stefan,

Stefan Riedmüller <S.Riedmueller@phytec.de> wrote on Mon, 26 Apr 2021
15:53:39 +0000:

> Hi Miquel,
> 
> On Mon, 2021-04-19 at 17:36 +0200, Miquel Raynal wrote:
> > Hi Stefan,
> > 
> > > > Interesting. Maybe I overlooked the below commit when applying. Indeed,
> > > > BBT may be considered as bad blocks, so I wonder if the below change is
> > > > valid now...
> > > > 
> > > > Guillaume, would you have a way to revert this patch on top of
> > > > linux-next? Stefan, would you mind giving more details on the testing
> > > > procedure?
> > > 
> > > I have tested this on an i.MX 6 by simulating two bad BBT blocks, simply
> > > returning -EIO in nand_erase_nand when the block to be erased is one of the
> > > first two BBT blocks.
> > > 
> > > I have seen this once on a customer board but was not able to reproduce it
> > > anymore, thus the simulation of the two bad blocks.
> > > 
> > > Without the patch below, new versions of the BBT can no longer be written to
> > > the first two blocks reserved for the BBT, but these blocks are still used to
> > > read the BBT from during boot, due to the lack of a test whether they are bad.
> > > So changes to the BBT after these two blocks turn bad are only kept and used
> > > until the next reboot, where again the old version from the two worn blocks
> > > is used as a basis.
> > > 
> > > I tried to use the same mechanism that is used to identify bad blocks during
> > > a scan for bad blocks. But maybe I missed something there? Or were my
> > > assumptions wrong in the first place?
> > 
> > Honestly I don't know what is wrong exactly in this patch.
> > 
> > We will revert the commit as it clearly breaks something fundamental
> > and the merge window is too close to adopt a hackish attitude.
> > 
> > I would propose the following tests with your board:
> > - Hack the core to allow yourself to access bad blocks from userspace
> >   for testing purposes.
> > - With the below commit, you should have the same behavior as
> >   reported by Fabio.
> 
> On my imx6 board the patch does not lead to the behavior reported by Fabio.
> The BBT is found and can be read:
> 
> [ 1.520501] nand: device found, Manufacturer ID: 0xc2, Chip ID: 0xd3
> [ 1.526944] nand: Macronix MX60LF8G18AC
> [ 1.530803] nand: 1024 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
> [ 1.539412] Bad block table found at page 524224, version 0x01
> [ 1.545790] Bad block table found at page 524160, version 0x01
> [ 1.551796] nand_read_bbt: bad block at 0x000001b60000
> [ 1.557032] nand_read_bbt: bad block at 0x000008cc0000
> [ 1.562204] nand_read_bbt: bad block at 0x00000f480000
> [ 1.567395] nand_read_bbt: bad block at 0x0000111c0000
> [ 1.572588] nand_read_bbt: bad block at 0x0000205c0000
> [ 1.577802] nand_read_bbt: bad block at 0x00002dfc0000
> 
> I dug a little deeper and I think I found the cause of the failure on the
> imx27 board.
> 
> The mxc_nand driver (used by the imx27) uses its own nand_bbt_descr with an
> offset of 0 in the OOB area. This is the same place where the bad block marker
> is located on worn or factory bad blocks.
> 
> This explains why the BBT is no longer found with my patch: scan_block_fast
> checks whether there is anything other than 0xff in the bad block marker and
> finds the 'B' from 'Bbt0'. The same occurs for the mirrored version, where it
> finds the '1' from '1tbB'.

Ok, that's the reason why the original logic failed, thanks for looking
for it.
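The chain of events described here (tables read as bad, rewritten to the two remaining reserved blocks, then no room left on the next boot) can be reproduced with a toy model of the four reserved blocks 1020 to 1023. This is a simplification built on assumptions: only the first four OOB bytes of each block are modelled, the fast scan is reduced to "byte 0 is not 0xff", the factory bad blocks 329 to 345 are ignored, and the starting state (main table in block 1023, mirror in block 1022) is assumed. All names are hypothetical; they only mimic search_bbt() with the reverted skip, the rescan, and the table write.

```c
#include <errno.h>
#include <string.h>

#define NBLOCKS  1024
#define RESERVED 4	/* blocks 1020..1023 hold the BBT and its mirror */

/* First four OOB bytes of each block's first page; 0xff means erased. */
static unsigned char oob0[NBLOCKS][4];
/* In-memory result of the bad block scan, recomputed on every boot. */
static int marked_bad[NBLOCKS];

/* Simplified fast scan: a non-0xff byte at the marker offset (0) means bad. */
static int looks_bad(int block)
{
	return oob0[block][0] != 0xff;
}

/* Search the reserved area from the top, skipping blocks that look bad
 * (the behavior added by the reverted commit). Returns the block or -1. */
static int search(const char *pattern)
{
	for (int b = NBLOCKS - 1; b >= NBLOCKS - RESERVED; b--) {
		if (looks_bad(b))
			continue;
		if (memcmp(oob0[b], pattern, 4) == 0)
			return b;
	}
	return -1;
}

/* Write a table into the highest reserved block that is neither marked bad
 * nor used by the other table. Returns the block or -ENOSPC. */
static int write_table(const char *pattern, int avoid)
{
	for (int b = NBLOCKS - 1; b >= NBLOCKS - RESERVED; b--) {
		if (marked_bad[b] || b == avoid)
			continue;
		memcpy(oob0[b], pattern, 4);
		return b;
	}
	return -ENOSPC;
}

/* Assumed starting state: tables written by a kernel without the commit. */
void reset_flash(void)
{
	memset(oob0, 0xff, sizeof(oob0));
	memset(marked_bad, 0, sizeof(marked_bad));
	memcpy(oob0[1023], "Bbt0", 4);
	memcpy(oob0[1022], "1tbB", 4);
}

/* One simulated boot. Returns 0 on success or -ENOSPC. */
int boot(void)
{
	if (search("Bbt0") >= 0 && search("1tbB") >= 0)
		return 0;	/* both tables found, nothing to rewrite */

	/* Tables not found: rescan. Blocks still holding an old table carry
	 * 'B' or '1' at offset 0, so they are classified as bad. */
	for (int b = NBLOCKS - RESERVED; b < NBLOCKS; b++)
		marked_bad[b] = looks_bad(b);

	int main_blk = write_table("Bbt0", -1);
	if (main_blk < 0)
		return main_blk;
	int mirror_blk = write_table("1tbB", main_blk);
	return mirror_blk < 0 ? mirror_blk : 0;
}
```

After reset_flash(), the first boot() succeeds and places the new tables in blocks 1021 (0x7fa0000) and 1020 (0x7f80000), as in the 19:38 log; the second boot() finds every reserved block "bad" and returns -ENOSPC (-28), as in the 21:22 log.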

> This also explains why the blocks holding the original BBT are detected as bad
> blocks in the scan after the BBT was not found, which results in the BBT being
> written to the remaining two blocks reserved for the BBT.
> 
> 19:38:23.001385 nand: device found, Manufacturer ID: 0x20, Chip ID: 0xa1
> 19:38:23.002635 nand: ST Micro NAND01GR3B2CZA6
> 19:38:23.006666 nand: 128 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
> 19:38:23.028413 Bad block table not found for chip 0
> 19:38:23.035625 random: fast init done
> 19:38:23.049144 Bad block table not found for chip 0
> 19:38:23.050024 Scanning device for bad blocks
> 19:38:23.330999 Bad eraseblock 329 at 0x000002920000
> 19:38:23.345958 Bad eraseblock 330 at 0x000002940000
> 19:38:23.356024 Bad eraseblock 331 at 0x000002960000
> 19:38:23.365738 Bad eraseblock 332 at 0x000002980000
> 19:38:23.375590 Bad eraseblock 333 at 0x0000029a0000
> 19:38:23.385505 Bad eraseblock 334 at 0x0000029c0000
> 19:38:23.395548 Bad eraseblock 335 at 0x0000029e0000
> 19:38:23.405501 Bad eraseblock 336 at 0x000002a00000
> 19:38:23.415551 Bad eraseblock 337 at 0x000002a20000
> 19:38:23.425937 Bad eraseblock 338 at 0x000002a40000
> 19:38:23.436028 Bad eraseblock 339 at 0x000002a60000
> 19:38:23.445959 Bad eraseblock 340 at 0x000002a80000
> 19:38:23.456008 Bad eraseblock 341 at 0x000002aa0000
> 19:38:23.466006 Bad eraseblock 342 at 0x000002ac0000
> 19:38:23.475912 Bad eraseblock 343 at 0x000002ae0000
> 19:38:23.486064 Bad eraseblock 344 at 0x000002b00000
> 19:38:23.495925 Bad eraseblock 345 at 0x000002b20000
> 19:38:24.048053 Bad eraseblock 1022 at 0x000007fc0000
> 19:38:24.056117 Bad eraseblock 1023 at 0x000007fe0000
> 19:38:24.067953 Bad block table written to 0x000007fa0000, version 0x01
> 19:38:24.087637 Bad block table written to 0x000007f80000, version 0x01
> 
> 
> On the next boot all four BBT versions in flash are skipped for the same reason
> as before, and the two blocks containing the latest BBT are also detected as
> bad blocks.
> As a result, no blocks remain to write the BBT to.
> 
> 
> 21:22:55.032595 nand: device found, Manufacturer ID: 0x20, Chip ID: 0xa1
> 21:22:55.033333 nand: ST Micro NAND01GR3B2CZA6
> 21:22:55.037804 nand: 128 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
> 21:22:55.088475 Bad block table not found for chip 0
> 21:22:55.093807 Bad block table not found for chip 0
> 21:22:55.105995 Scanning device for bad blocks
> 21:22:55.109049 random: fast init done
> 21:22:55.395488 Bad eraseblock 329 at 0x000002920000
> 21:22:55.406832 Bad eraseblock 330 at 0x000002940000
> 21:22:55.416885 Bad eraseblock 331 at 0x000002960000
> 21:22:55.426736 Bad eraseblock 332 at 0x000002980000
> 21:22:55.436732 Bad eraseblock 333 at 0x0000029a0000
> 21:22:55.446864 Bad eraseblock 334 at 0x0000029c0000
> 21:22:55.456662 Bad eraseblock 335 at 0x0000029e0000
> 21:22:55.466785 Bad eraseblock 336 at 0x000002a00000
> 21:22:55.476801 Bad eraseblock 337 at 0x000002a20000
> 21:22:55.486772 Bad eraseblock 338 at 0x000002a40000
> 21:22:55.496768 Bad eraseblock 339 at 0x000002a60000
> 21:22:55.506607 Bad eraseblock 340 at 0x000002a80000
> 21:22:55.516965 Bad eraseblock 341 at 0x000002aa0000
> 21:22:55.526621 Bad eraseblock 342 at 0x000002ac0000
> 21:22:55.536702 Bad eraseblock 343 at 0x000002ae0000
> 21:22:55.546660 Bad eraseblock 344 at 0x000002b00000
> 21:22:55.556745 Bad eraseblock 345 at 0x000002b20000
> 21:22:56.172928 Bad eraseblock 1020 at 0x000007f80000
> 21:22:56.187043 Bad eraseblock 1021 at 0x000007fa0000
> 21:22:56.197437 Bad eraseblock 1022 at 0x000007fc0000
> 21:22:56.212665 Bad eraseblock 1023 at 0x000007fe0000
> 21:22:56.213356 No space left to write bad block table
> 21:22:56.215012 nand_bbt: error while writing bad block table -28
> 21:22:56.239353 mxc_nand: probe of d8000000.nand-controller failed with error -28
> 
> I'm not sure of the best way to address this issue.
> A few ideas came to mind:
> 
> - Shift the offset of the nand_bbt_descr of mxc_nand to make room for the bad
>   block marker. I'm not sure whether this would conflict with the ECC
>   hardware, but the ooblayout functions suggest that it could work.

There are thousands of boards out there that would be broken by such a
change: it's too late to make changes in this driver, unfortunately.

> Unfortunately I don't have any hardware at hand at the moment to test it. I
> think the distinction between small and large page sizes needs to be reflected
> in the bbt_descr as well.
> 
> - Use NAND_BBT_NO_OOB with the mxc_nand driver, since there is a comment saying
>   there is an overlap between the generic BBT descriptors and the ECC hardware.
>   I'm not sure what other effects setting NAND_BBT_NO_OOB might have.

Same here: that's not an option.

> - Explicitly check for the bad block marker during a search for the BBT
>   instead of using scan_block_fast

This looks more reasonable. You can create a helper which calls
scan_block_fast(), then, if needed, checks the beginning of the OOB
buffer and tries to match it against the ->td and ->md descriptors. This
should work with all the legacy drivers implementing their own
descriptors, hopefully.

Other drivers are impacted as well, so maybe you'll find a board for
testing (or someone kind enough to test it for you).

Thanks,
Miquèl
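The helper suggested above can be sketched in user-space C. The struct is a hypothetical stand-in for the offs, len and pattern fields of struct nand_bbt_descr, and bbt_block_is_bad() is a made-up name; a real helper would read the OOB data through the NAND core, which is not modelled here. The idea is to trust a set marker byte only when the OOB area does not carry the main (td) or mirror (md) table pattern at the descriptor's offset.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the offs/len/pattern fields of nand_bbt_descr. */
struct bbt_pattern {
	size_t offs;			/* offset of the pattern in the OOB area */
	size_t len;			/* pattern length in bytes */
	const unsigned char *pattern;
};

/* True if the OOB buffer carries the descriptor's pattern at its offset. */
static bool oob_has_pattern(const unsigned char *oob, size_t oobsize,
			    const struct bbt_pattern *p)
{
	if (!p || p->offs + p->len > oobsize)
		return false;
	return memcmp(oob + p->offs, p->pattern, p->len) == 0;
}

/* Simplified fast check: the marker byte is set (not 0xff). */
static bool marker_is_set(const unsigned char *oob, size_t bbm_offs)
{
	return oob[bbm_offs] != 0xff;
}

/*
 * Hypothetical helper: report a block as bad only when its marker byte is
 * set AND the OOB area does not hold the main (td) or mirror (md) pattern.
 */
bool bbt_block_is_bad(const unsigned char *oob, size_t oobsize,
		      size_t bbm_offs, const struct bbt_pattern *td,
		      const struct bbt_pattern *md)
{
	if (!marker_is_set(oob, bbm_offs))
		return false;		/* erased marker: block is good */
	if (oob_has_pattern(oob, oobsize, td) ||
	    oob_has_pattern(oob, oobsize, md))
		return false;		/* the "marker" is part of a BBT pattern */
	return true;
}
```

Checking the patterns only when the marker looks set keeps the common case (an erased marker) as cheap as before. One limitation of this sketch: a genuinely worn block that still holds an intact pattern is treated as good by this check alone.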
* Re: imx27: No space left to write bad block table
From: Stefan Riedmüller @ 2021-05-10 8:38 UTC
To: miquel.raynal; +Cc: festevam, guillaume.tucker, kernel, linux-mtd

Hi Miquel,

On Tue, 2021-05-04 at 10:34 +0200, Miquel Raynal wrote:
> Hi Stefan,
> 
> [...]
> 
> > - Explicitly check for the bad block marker during a search for the BBT
> >   instead of using scan_block_fast
> 
> This looks more reasonable. You can create a helper which calls
> scan_block_fast(), then, if needed, checks the beginning of the OOB
> buffer and tries to match it against the ->td and ->md descriptors. This
> should work with all the legacy drivers implementing their own
> descriptors, hopefully.

Thanks for your input. I will take another spin at it.

> 
> Other drivers are impacted as well, so maybe you'll find a board for
> testing (or someone kind enough to test it for you).

I hope I'll get my hands on at least one of the imx27 boards.

Thanks,
Stefan

> 
> Thanks,
> Miquèl
end of thread

Thread overview: 20+ messages
2021-04-17 15:59 imx27: No space left to write bad block table Fabio Estevam
2021-04-19 6:37 ` Miquel Raynal
2021-04-19 11:47 ` Fabio Estevam
2021-04-19 12:27 ` Miquel Raynal
2021-04-19 12:41 ` Fabio Estevam
2021-04-19 12:48 ` Fabio Estevam
2021-04-19 13:01 ` Fabio Estevam
2021-04-19 13:40 ` Miquel Raynal
2021-04-19 13:56 ` Fabio Estevam
2021-04-19 13:04 ` Stefan Riedmüller
2021-04-19 15:36 ` Miquel Raynal
2021-04-20 6:26 ` Stefan Riedmüller
2021-04-21 20:44 ` Guillaume Tucker
2021-04-21 23:29 ` Fabio Estevam
2021-04-22 13:16 ` Guillaume Tucker
2021-04-22 13:28 ` Fabio Estevam
2021-04-23 21:04 ` Fabio Estevam
2021-04-26 15:53 ` Stefan Riedmüller
2021-05-04 8:34 ` Miquel Raynal
2021-05-10 8:38 ` Stefan Riedmüller