* mtd: raw: nand: gpmi-nand data corruption @ v5.10.184
@ 2023-06-21 14:27 Kegl Rohit
  2023-06-21 15:26 ` han.xu
  0 siblings, 1 reply; 8+ messages in thread
From: Kegl Rohit @ 2023-06-21 14:27 UTC (permalink / raw)
  To: linux-mtd, han.xu

Hello!

We are using an i.MX7D with the stable-rt kernel tree.

After upgrading to v5.10.184-rt90, the UBIFS rootfs MTD partition got corrupted.
https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/tag/?h=v5.10.184-rt90

After reverting the latest patch
(e4e4b24b42e710db058cc2a79a7cf16bf02b4915), the rootfs partition no
longer got corrupted.
https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/commit/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c?h=v5.10.184-rt90&id=e4e4b24b42e710db058cc2a79a7cf16bf02b4915

The commit message states that the timeout calculation was changed.
Here are the computed `busy_timeout_cycles` values before (_old) and
after (_new) the patch:

[    0.491534] busy_timeout_cycles_old 4353
[    0.491604] busy_timeout_cycles_new 1424705
[    0.492300] nand: device found, Manufacturer ID: 0xc2, Chip ID: 0xdc
[    0.492310] nand: Macronix MX30LF4G28AC
[    0.492316] nand: 512 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 112
[    0.492488] busy_timeout_cycles_old 4353
[    0.492493] busy_timeout_cycles_new 1424705
[    0.492863] busy_timeout_cycles_old 2510
[    0.492872] busy_timeout_cycles_new 350000

The new timeouts are much higher. Higher timeouts should not be a
problem (lower ones could be), yet with these high timeouts gpmi-nand
is broken for us.

For now we simply reverted the change.
The new calculation seems fragile: a previous "fix" backport was
already reverted because of data corruption:
https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/commit/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c?h=v5.10.184-rt90&id=cc5ee0e0eed0bec2b7cc1d0feb9405e884eace7d

Any idea why the higher timeout causes problems?


Thanks in advance!

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

* Re: mtd: raw: nand: gpmi-nand data corruption @ v5.10.184
  2023-06-21 14:27 mtd: raw: nand: gpmi-nand data corruption @ v5.10.184 Kegl Rohit
@ 2023-06-21 15:26 ` han.xu
  2023-06-21 17:55   ` Kegl Rohit
  0 siblings, 1 reply; 8+ messages in thread
From: han.xu @ 2023-06-21 15:26 UTC (permalink / raw)
  To: Kegl Rohit; +Cc: linux-mtd, han.xu

On 23/06/21 04:27PM, Kegl Rohit wrote:
> Any guesses why the high timeout causes issues?

A high timeout combined with the wrong calculation may overflow and
cause the DEVICE_BUSY_TIMEOUT register to become 0.

* Re: mtd: raw: nand: gpmi-nand data corruption @ v5.10.184
  2023-06-21 15:26 ` han.xu
@ 2023-06-21 17:55   ` Kegl Rohit
  2023-06-22  4:46     ` Kegl Rohit
  0 siblings, 1 reply; 8+ messages in thread
From: Kegl Rohit @ 2023-06-21 17:55 UTC (permalink / raw)
  To: han.xu; +Cc: linux-mtd

OK, looking at gpmi-nand.c in 5.10.184:

#define BF_GPMI_TIMING1_BUSY_TIMEOUT(v) \
	(((v) << BP_GPMI_TIMING1_BUSY_TIMEOUT) & BM_GPMI_TIMING1_BUSY_TIMEOUT)

hw->timing1 = BF_GPMI_TIMING1_BUSY_TIMEOUT(busy_timeout_cycles * 4096);

and comparing it with 5.19 (upstream commit 0fddf9ad06fd9f439f137139861556671673e31c):
https://github.com/gregkh/linux/commit/0fddf9ad06fd9f439f137139861556671673e31c#diff-0dec2fa8640ea2067789c406ab1e42c9805d0d0fc9f70a3a29d17f9311e23ca2L893

hw->timing1 = BF_GPMI_TIMING1_BUSY_TIMEOUT(DIV_ROUND_UP(busy_timeout_cycles, 4096));

This could be the cause: DIV_ROUND_UP() divides by 4096 (rounding up),
whereas busy_timeout_cycles * 4096 multiplies!

The backport is wrong: in the 5.10 tree the earlier fix was reverted
(commit cc5ee0e0eed0bec2b7cc1d0feb9405e884eace7d), whereas in mainline
it was not.
https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/commit/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c?h=v5.10.184-rt90&id=cc5ee0e0eed0bec2b7cc1d0feb9405e884eace7d

=> So in 5.10.184 the "hw->timing1 = ..." line is now wrong!

I will test this tomorrow.

* Re: mtd: raw: nand: gpmi-nand data corruption @ v5.10.184
  2023-06-21 17:55   ` Kegl Rohit
@ 2023-06-22  4:46     ` Kegl Rohit
  2023-06-25  9:11         ` Kegl Rohit
  0 siblings, 1 reply; 8+ messages in thread
From: Kegl Rohit @ 2023-06-22  4:46 UTC (permalink / raw)
  To: han.xu; +Cc: linux-mtd

After reverting the revert :), the data corruption no longer happens!

https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/commit/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c?h=v5.10.184-rt90&id=cc5ee0e0eed0bec2b7cc1d0feb9405e884eace7d

* Re: mtd: raw: nand: gpmi-nand data corruption @ v5.10.184
  2023-06-22  4:46     ` Kegl Rohit
@ 2023-06-25  9:11         ` Kegl Rohit
  0 siblings, 0 replies; 8+ messages in thread
From: Kegl Rohit @ 2023-06-25  9:11 UTC (permalink / raw)
  To: han.xu; +Cc: linux-mtd, stable, s.hauer, miquel.raynal, tomasz.mon, gregkh

Hello!

Following up on the initial discussion
https://lore.kernel.org/all/20220701110341.3094023-1-s.hauer@pengutronix.de
that led to the revert commit: are there any plans to fix this issue in
5.10.y (and maybe other stable branches)?

Thanks in advance!

* Re: mtd: raw: nand: gpmi-nand data corruption @ v5.10.184
  2023-06-25  9:11         ` Kegl Rohit
@ 2023-06-26 10:56           ` Miquel Raynal
  -1 siblings, 0 replies; 8+ messages in thread
From: Miquel Raynal @ 2023-06-26 10:56 UTC (permalink / raw)
  To: Kegl Rohit; +Cc: han.xu, linux-mtd, stable, s.hauer, tomasz.mon, gregkh

Hi Kegl,

keglrohit@gmail.com wrote on Sun, 25 Jun 2023 11:11:52 +0200:

> Hello!
> 
> Following to the initial discussion
> https://lore.kernel.org/all/20220701110341.3094023-1-s.hauer@pengutronix.de
> which caused the revert commit:
> Are there any plans to fix this issue for 5.10.y (and maybe other
> stable branches)?

If the Fixes: tags are right, all relevant branches that are still
maintained should get the final fix applied. If that is not the case,
it means the stable maintainers could not apply the patch as-is and
left it aside. In that case, please adapt the official patch to the
branch(es) of interest and send it to the stable team, mentioning the
upstream commit (see the documentation on how to request backports to
stable branches).

Thanks,
Miquèl

Thread overview: 8+ messages
2023-06-21 14:27 mtd: raw: nand: gpmi-nand data corruption @ v5.10.184 Kegl Rohit
2023-06-21 15:26 ` han.xu
2023-06-21 17:55   ` Kegl Rohit
2023-06-22  4:46     ` Kegl Rohit
2023-06-25  9:11       ` Kegl Rohit
2023-06-25  9:11         ` Kegl Rohit
2023-06-26 10:56         ` Miquel Raynal
2023-06-26 10:56           ` Miquel Raynal
