All of lore.kernel.org
 help / color / mirror / Atom feed
* marvell_nand driver fails to suspend
@ 2018-07-01 10:18 Daniel Mack
  2018-07-01 19:04 ` Daniel Mack
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel Mack @ 2018-07-01 10:18 UTC (permalink / raw)
  To: Miquel Raynal; +Cc: Robert Jarzmik, Boris Brezillon, David Woodhouse, linux-mtd

Hi Miquel,

I'm seeing the below error when trying to suspend and resume a PXA3xx 
machine booted from devicetree with the new nand driver. This used to 
work fine with the old driver, but admittedly, the only kernel I 
currently have for reference testing is very old (3.0.4), and many other 
things regarding nand/mtd have also changed since then.

The suspend/resume implementation in the old driver used to call into 
the ->suspend() and ->resume() functions of its mtd_info children 
directly, but looking at other drivers, it seems this is no longer 
needed or wanted. It also cleared all interrupts during resume, but that 
alone doesn't fix it in my tests.

I haven't followed the development in that area, so I'd appreciate any 
hint on how to fix this regression. I'm happy to test patches.


Thanks,
Daniel


$ [   43.576363] PM: suspend entry (deep)
[   43.579960] PM: Syncing filesystems ...
[   44.660162] marvell-nfc 43100000.nand-controller: Timeout waiting for 
RB signal
[   44.671492] ubi0 error: ubi_io_write: error -110 while writing 2048 
bytes to PEB 102:38912, written 0 bytes
[   44.682887] CPU: 0 PID: 1417 Comm: remote-control Not tainted 
4.18.0-rc2+ #344
[   44.691197] Hardware name: Marvell PXA3xx (Device Tree Support)
[   44.697111] Backtrace:
[   44.699593] [<c0106458>] (dump_backtrace) from [<c0106718>] 
(show_stack+0x18/0x1c)
[   44.708931]  r7:00000800 r6:00009800 r5:00000066 r4:c6139000
[   44.715833] [<c0106700>] (show_stack) from [<c0678a60>] 
(dump_stack+0x20/0x28)
[   44.724206] [<c0678a40>] (dump_stack) from [<c0456cbc>] 
(ubi_io_write+0x3d4/0x630)
[   44.732925] [<c04568e8>] (ubi_io_write) from [<c0454428>] 
(ubi_eba_write_leb+0x690/0x6fc)
[   44.742295]  r10:c67b1044 r9:c6b75dd8 r8:00000800 r7:c614bc00 
r6:00000080 r5:00008800
[   44.751220]  r4:c6139000
[   44.753817] [<c0453d98>] (ubi_eba_write_leb) from [<c04526d8>] 
(ubi_leb_write+0xc4/0xdc)
[   44.763406]  r10:c67b1044 r9:c6b75dd8 r8:00000800 r7:00008800 
r6:00000080 r5:00008800
[   44.772329]  r4:00000001
[   44.774921] [<c0452614>] (ubi_leb_write) from [<c02d2ad4>] 
(ubifs_leb_write+0xa0/0x110)
[   44.784405]  r6:00000218 r5:c66a6800 r4:c602a000
[   44.789224] [<c02d2a34>] (ubifs_leb_write) from [<c02d3680>] 
(ubifs_wbuf_sync_nolock+0x244/0x334)
[   44.799451]  r8:c01ee478 r7:00000760 r6:00000800 r5:c602a000 r4:c67a1880
[   44.807311] [<c02d343c>] (ubifs_wbuf_sync_nolock) from [<c02ce54c>] 
(ubifs_sync_fs+0x40/0x84)
[   44.817024]  r7:c67a18a4 r6:00000001 r5:c602a000 r4:c67a1880
[   44.828758] [<c02ce50c>] (ubifs_sync_fs) from [<c01ee4a8>] 
(sync_fs_one_sb+0x30/0x34)
[   44.837861]  r7:c0a179f0 r6:ffffe000 r5:c61fc000 r4:c67b1000
[   44.844927] [<c01ee478>] (sync_fs_one_sb) from [<c01c29d8>] 
(iterate_supers+0xf8/0x138)
[   44.854161] [<c01c28e0>] (iterate_supers) from [<c01ee5b4>] 
(ksys_sync+0x58/0xb8)
[   44.862839]  r10:00000051 r9:c0a3f108 r8:c07bcdac r7:00000003 
r6:c0a03008 r5:00000000
[   44.871763]  r4:c0a03008 r3:00000000
[   44.875404] [<c01ee55c>] (ksys_sync) from [<c014480c>] 
(pm_suspend+0xa0/0x2a4)
[   44.884096]  r5:c0a8f30c r4:00000003
[   44.887729] [<c014476c>] (pm_suspend) from [<c0143260>] 
(state_store+0xa4/0xd4)
[   44.896435]  r8:c07ba962 r7:c69ebf00 r6:00000004 r5:00000003 r4:00000003
[   44.904304] [<c01431bc>] (state_store) from [<c067d14c>] 
(kobj_attr_store+0x1c/0x28)
[   44.913242]  r9:00146be8 r8:c6b75f60 r7:c6bd1a90 r6:c6bd1a80 
r5:c69ebf00 r4:c01431bc
[   44.922127] [<c067d130>] (kobj_attr_store) from [<c022c12c>] 
(sysfs_kf_write+0x40/0x4c)
[   44.931227]  r5:c69ebf00 r4:c067d130
[   44.934848] [<c022c0ec>] (sysfs_kf_write) from [<c022b504>] 
(kernfs_fop_write+0x140/0x1b0)
[   44.944487]  r5:c69ebf00 r4:00000004
[   44.948109] [<c022b3c4>] (kernfs_fop_write) from [<c01bf47c>] 
(__vfs_write+0x40/0x154)
[   44.957419]  r10:00020000 r9:00000004 r8:c6b75f60 r7:c0a03008 
r6:00146be8 r5:c022b3c4
[   44.966352]  r4:c6945500
[   44.968935] [<c01bf43c>] (__vfs_write) from [<c01bf750>] 
(vfs_write+0xc4/0x150)
[   44.977656]  r9:00000004 r8:00000000 r7:c6b75f60 r6:00146be8 
r5:00000004 r4:c6945500
[   44.986529] [<c01bf68c>] (vfs_write) from [<c01bf954>] 
(ksys_write+0x54/0xa4)
[   44.994776]  r9:00000004 r8:00146be8 r7:c0a03008 r6:c6945503 
r5:001978b8 r4:c6945500
[   45.003650] [<c01bf900>] (ksys_write) from [<c01bf9b4>] 
(sys_write+0x10/0x14)
[   45.011893]  r9:c6b74000 r8:c01011e4 r7:00000004 r6:00000000 
r5:001978b8 r4:0000000e
[   45.019647] [<c01bf9a4>] (sys_write) from [<c0101000>] 
(ret_fast_syscall+0x0/0x50)
[   45.028608] Exception stack(0xc6b75fa8 to 0xc6b75ff0)
[   45.034782] 5fa0:                   0000000e 001978b8 0000000e 
00146be8 00000004 0000000e
[   45.044155] 5fc0: 0000000e 001978b8 00000000 00000004 001910d8 
b6594794 00000000 b64cfd14
[   45.053431] 5fe0: 00000000 befccb60 b62ed95c b62ee064
[   45.058924] ubi0: dumping 2048 bytes of data from PEB 102, offset 38912

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: marvell_nand driver fails to suspend
  2018-07-01 10:18 marvell_nand driver fails to suspend Daniel Mack
@ 2018-07-01 19:04 ` Daniel Mack
  2018-07-02  7:20   ` Miquel Raynal
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel Mack @ 2018-07-01 19:04 UTC (permalink / raw)
  To: Miquel Raynal; +Cc: Robert Jarzmik, Boris Brezillon, David Woodhouse, linux-mtd

Hi,

On Sunday, July 01, 2018 12:18 PM, Daniel Mack wrote:
> I'm seeing the below error when trying to suspend and resume a PXA3xx
> machine booted from devicetree with the new nand driver. This used to
> work fine with the old driver, but admittedly, the only kernel I
> currently have for reference testing is very old (3.0.4), and many other
> things regarding nand/mtd have also changed since then.
> 
> The suspend/resume implementation in the old driver used to call into
> the ->suspend() and ->resume() functions of its mtd_info children
> directly, but looking at other drivers, it seems this is no longer
> needed or wanted. It also cleared all interrupts during resume, but that
> alone doesn't fix it in my tests.
> 
> I haven't followed the development in that area, so I'd appreciate any
> hint on how to fix this regression. I'm happy to test patches.

I think I figured it out. Will send a patch.


Thanks,
Daniel



> 
> $ [   43.576363] PM: suspend entry (deep)
> [   43.579960] PM: Syncing filesystems ...
> [   44.660162] marvell-nfc 43100000.nand-controller: Timeout waiting for
> RB signal
> [   44.671492] ubi0 error: ubi_io_write: error -110 while writing 2048
> bytes to PEB 102:38912, written 0 bytes
> [   44.682887] CPU: 0 PID: 1417 Comm: remote-control Not tainted
> 4.18.0-rc2+ #344
> [   44.691197] Hardware name: Marvell PXA3xx (Device Tree Support)
> [   44.697111] Backtrace:
> [   44.699593] [<c0106458>] (dump_backtrace) from [<c0106718>]
> (show_stack+0x18/0x1c)
> [   44.708931]  r7:00000800 r6:00009800 r5:00000066 r4:c6139000
> [   44.715833] [<c0106700>] (show_stack) from [<c0678a60>]
> (dump_stack+0x20/0x28)
> [   44.724206] [<c0678a40>] (dump_stack) from [<c0456cbc>]
> (ubi_io_write+0x3d4/0x630)
> [   44.732925] [<c04568e8>] (ubi_io_write) from [<c0454428>]
> (ubi_eba_write_leb+0x690/0x6fc)
> [   44.742295]  r10:c67b1044 r9:c6b75dd8 r8:00000800 r7:c614bc00
> r6:00000080 r5:00008800
> [   44.751220]  r4:c6139000
> [   44.753817] [<c0453d98>] (ubi_eba_write_leb) from [<c04526d8>]
> (ubi_leb_write+0xc4/0xdc)
> [   44.763406]  r10:c67b1044 r9:c6b75dd8 r8:00000800 r7:00008800
> r6:00000080 r5:00008800
> [   44.772329]  r4:00000001
> [   44.774921] [<c0452614>] (ubi_leb_write) from [<c02d2ad4>]
> (ubifs_leb_write+0xa0/0x110)
> [   44.784405]  r6:00000218 r5:c66a6800 r4:c602a000
> [   44.789224] [<c02d2a34>] (ubifs_leb_write) from [<c02d3680>]
> (ubifs_wbuf_sync_nolock+0x244/0x334)
> [   44.799451]  r8:c01ee478 r7:00000760 r6:00000800 r5:c602a000 r4:c67a1880
> [   44.807311] [<c02d343c>] (ubifs_wbuf_sync_nolock) from [<c02ce54c>]
> (ubifs_sync_fs+0x40/0x84)
> [   44.817024]  r7:c67a18a4 r6:00000001 r5:c602a000 r4:c67a1880
> [   44.828758] [<c02ce50c>] (ubifs_sync_fs) from [<c01ee4a8>]
> (sync_fs_one_sb+0x30/0x34)
> [   44.837861]  r7:c0a179f0 r6:ffffe000 r5:c61fc000 r4:c67b1000
> [   44.844927] [<c01ee478>] (sync_fs_one_sb) from [<c01c29d8>]
> (iterate_supers+0xf8/0x138)
> [   44.854161] [<c01c28e0>] (iterate_supers) from [<c01ee5b4>]
> (ksys_sync+0x58/0xb8)
> [   44.862839]  r10:00000051 r9:c0a3f108 r8:c07bcdac r7:00000003
> r6:c0a03008 r5:00000000
> [   44.871763]  r4:c0a03008 r3:00000000
> [   44.875404] [<c01ee55c>] (ksys_sync) from [<c014480c>]
> (pm_suspend+0xa0/0x2a4)
> [   44.884096]  r5:c0a8f30c r4:00000003
> [   44.887729] [<c014476c>] (pm_suspend) from [<c0143260>]
> (state_store+0xa4/0xd4)
> [   44.896435]  r8:c07ba962 r7:c69ebf00 r6:00000004 r5:00000003 r4:00000003
> [   44.904304] [<c01431bc>] (state_store) from [<c067d14c>]
> (kobj_attr_store+0x1c/0x28)
> [   44.913242]  r9:00146be8 r8:c6b75f60 r7:c6bd1a90 r6:c6bd1a80
> r5:c69ebf00 r4:c01431bc
> [   44.922127] [<c067d130>] (kobj_attr_store) from [<c022c12c>]
> (sysfs_kf_write+0x40/0x4c)
> [   44.931227]  r5:c69ebf00 r4:c067d130
> [   44.934848] [<c022c0ec>] (sysfs_kf_write) from [<c022b504>]
> (kernfs_fop_write+0x140/0x1b0)
> [   44.944487]  r5:c69ebf00 r4:00000004
> [   44.948109] [<c022b3c4>] (kernfs_fop_write) from [<c01bf47c>]
> (__vfs_write+0x40/0x154)
> [   44.957419]  r10:00020000 r9:00000004 r8:c6b75f60 r7:c0a03008
> r6:00146be8 r5:c022b3c4
> [   44.966352]  r4:c6945500
> [   44.968935] [<c01bf43c>] (__vfs_write) from [<c01bf750>]
> (vfs_write+0xc4/0x150)
> [   44.977656]  r9:00000004 r8:00000000 r7:c6b75f60 r6:00146be8
> r5:00000004 r4:c6945500
> [   44.986529] [<c01bf68c>] (vfs_write) from [<c01bf954>]
> (ksys_write+0x54/0xa4)
> [   44.994776]  r9:00000004 r8:00146be8 r7:c0a03008 r6:c6945503
> r5:001978b8 r4:c6945500
> [   45.003650] [<c01bf900>] (ksys_write) from [<c01bf9b4>]
> (sys_write+0x10/0x14)
> [   45.011893]  r9:c6b74000 r8:c01011e4 r7:00000004 r6:00000000
> r5:001978b8 r4:0000000e
> [   45.019647] [<c01bf9a4>] (sys_write) from [<c0101000>]
> (ret_fast_syscall+0x0/0x50)
> [   45.028608] Exception stack(0xc6b75fa8 to 0xc6b75ff0)
> [   45.034782] 5fa0:                   0000000e 001978b8 0000000e
> 00146be8 00000004 0000000e
> [   45.044155] 5fc0: 0000000e 001978b8 00000000 00000004 001910d8
> b6594794 00000000 b64cfd14
> [   45.053431] 5fe0: 00000000 befccb60 b62ed95c b62ee064
> [   45.058924] ubi0: dumping 2048 bytes of data from PEB 102, offset 38912
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: marvell_nand driver fails to suspend
  2018-07-01 19:04 ` Daniel Mack
@ 2018-07-02  7:20   ` Miquel Raynal
  2018-07-02  7:48     ` Daniel Mack
  0 siblings, 1 reply; 10+ messages in thread
From: Miquel Raynal @ 2018-07-02  7:20 UTC (permalink / raw)
  To: Daniel Mack; +Cc: Robert Jarzmik, Boris Brezillon, David Woodhouse, linux-mtd

Hi Daniel,

Daniel Mack <daniel@zonque.org> wrote on Sun, 1 Jul 2018 21:04:58 +0200:

> Hi,
> 
> On Sunday, July 01, 2018 12:18 PM, Daniel Mack wrote:
> > I'm seeing the below error when trying to suspend and resume a PXA3xx
> > machine booted from devicetree with the new nand driver. This used to
> > work fine with the old driver, but admittedly, the only kernel I
> > currently have for reference testing is very old (3.0.4), and many other
> > things regarding nand/mtd have also changed since then.  
> > > The suspend/resume implementation in the old driver used to call into  
> > the ->suspend() and ->resume() functions of its mtd_info children
> > directly, but looking at other drivers, it seems this is no longer
> > needed or wanted. It also cleared all interrupts during resume, but that
> > alone doesn't fix it in my tests.  
> > > I haven't followed the development in that area, so I'd appreciate any  
> > hint on how to fix this regression. I'm happy to test patches.  
> 
> I think I figured it out. Will send a patch.

Good to see you figured it out. Indeed there are no more suspend/resume
callbacks, waiting for your patches to fix this.

Thanks,
Miquèl

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: marvell_nand driver fails to suspend
  2018-07-02  7:20   ` Miquel Raynal
@ 2018-07-02  7:48     ` Daniel Mack
  2018-07-03  7:05       ` Miquel Raynal
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel Mack @ 2018-07-02  7:48 UTC (permalink / raw)
  To: Miquel Raynal; +Cc: Robert Jarzmik, Boris Brezillon, David Woodhouse, linux-mtd

Hi Miquel,

On Monday, July 02, 2018 09:20 AM, Miquel Raynal wrote:
> Daniel Mack <daniel@zonque.org> wrote on Sun, 1 Jul 2018 21:04:58 +0200:
>> On Sunday, July 01, 2018 12:18 PM, Daniel Mack wrote:
>>> I'm seeing the below error when trying to suspend and resume a PXA3xx
>>> machine booted from devicetree with the new nand driver. This used to
>>> work fine with the old driver, but admittedly, the only kernel I
>>> currently have for reference testing is very old (3.0.4), and many other
>>> things regarding nand/mtd have also changed since then.
>>>> The suspend/resume implementation in the old driver used to call into
>>> the ->suspend() and ->resume() functions of its mtd_info children
>>> directly, but looking at other drivers, it seems this is no longer
>>> needed or wanted. It also cleared all interrupts during resume, but that
>>> alone doesn't fix it in my tests.
>>>> I haven't followed the development in that area, so I'd appreciate any
>>> hint on how to fix this regression. I'm happy to test patches.
>>
>> I think I figured it out. Will send a patch.
> 
> Good to see you figured it out. Indeed there are no more suspend/resume
> callbacks, waiting for your patches to fix this.

Hmm, I got confused in my test setup last night, so no, I haven't 
figured it out yet. I've pasted the current version of the callbacks 
below, but that doesn't cut it.

The issue is easy to reproduce:

# echo mem >/sys/power/state
[wake up the device]
# sync

I'll give it a spin in a couple of days again, but if anything comes to 
your mind, please let me know.

And FTR, the device node has keep-config set for this board.


Thanks,
Daniel




static int __maybe_unused marvell_nfc_suspend(struct device *dev)
{
         struct marvell_nfc *nfc = dev_get_drvdata(dev);

         clk_disable_unprepare(nfc->core_clk);

         return 0;
}

static int __maybe_unused marvell_nfc_resume(struct device *dev)
{
         struct marvell_nfc *nfc = dev_get_drvdata(dev);
         int ret;

         ret = clk_prepare_enable(nfc->core_clk);
         if (ret < 0)
                 return ret;

         marvell_nfc_disable_int(nfc, NDCR_ALL_INT);
         marvell_nfc_clear_int(nfc, NDCR_ALL_INT);

         /*
          * Reset nfc->selected_chip so the next command will cause the 
timing
          * registers to be restored in marvell_nfc_select_chip().
          */
         nfc->selected_chip = NULL;

         return 0;
}

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: marvell_nand driver fails to suspend
  2018-07-02  7:48     ` Daniel Mack
@ 2018-07-03  7:05       ` Miquel Raynal
  2018-07-05  8:22         ` Daniel Mack
  0 siblings, 1 reply; 10+ messages in thread
From: Miquel Raynal @ 2018-07-03  7:05 UTC (permalink / raw)
  To: Daniel Mack; +Cc: Robert Jarzmik, Boris Brezillon, David Woodhouse, linux-mtd

Hi Daniel,

Daniel Mack <daniel@zonque.org> wrote on Mon, 2 Jul 2018 09:48:01 +0200:

> Hi Miquel,
> 
> On Monday, July 02, 2018 09:20 AM, Miquel Raynal wrote:
> > Daniel Mack <daniel@zonque.org> wrote on Sun, 1 Jul 2018 21:04:58 +0200:  
> >> On Sunday, July 01, 2018 12:18 PM, Daniel Mack wrote:  
> >>> I'm seeing the below error when trying to suspend and resume a PXA3xx
> >>> machine booted from devicetree with the new nand driver. This used to
> >>> work fine with the old driver, but admittedly, the only kernel I
> >>> currently have for reference testing is very old (3.0.4), and many other
> >>> things regarding nand/mtd have also changed since then.  
> >>>> The suspend/resume implementation in the old driver used to call into  
> >>> the ->suspend() and ->resume() functions of its mtd_info children
> >>> directly, but looking at other drivers, it seems this is no longer
> >>> needed or wanted. It also cleared all interrupts during resume, but that
> >>> alone doesn't fix it in my tests.  
> >>>> I haven't followed the development in that area, so I'd appreciate any  
> >>> hint on how to fix this regression. I'm happy to test patches.  
> >>
> >> I think I figured it out. Will send a patch.
> > > Good to see you figured it out. Indeed there are no more suspend/resume  
> > callbacks, waiting for your patches to fix this.  
> 
> Hmm, I got confused in my test setup last night, so no, I haven't figured it out yet. I've pasted the current version of the callbacks below, but that doesn't cut it.
> 
> The issue is easy to reproduce:
> 
> # echo mem >/sys/power/state
> [wake up the device]
> # sync
> 
> I'll give it a spin in a couple of days again, but if anything comes to your mind, please let me know.
> 
> And FTR, the device node has keep-config set for this board.

Can you try without? It should work anyway, but it would allow us to
see if it's a timing issue.

Also, can you try with the mtd-utils directly (without UBI/UBIFS) with
nanddump flash_erase and nandwrite to see what are the errors raised by
the controller driver.

Thanks,
Miquèl

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: marvell_nand driver fails to suspend
  2018-07-03  7:05       ` Miquel Raynal
@ 2018-07-05  8:22         ` Daniel Mack
  2018-07-06  8:27           ` Miquel Raynal
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel Mack @ 2018-07-05  8:22 UTC (permalink / raw)
  To: Miquel Raynal; +Cc: Robert Jarzmik, Boris Brezillon, David Woodhouse, linux-mtd

Hi Miquel,

On Tuesday, July 03, 2018 09:05 AM, Miquel Raynal wrote:
> Daniel Mack <daniel@zonque.org> wrote on Mon, 2 Jul 2018 09:48:01 +0200:

>> I'll give it a spin in a couple of days again, but if anything comes to your mind, please let me know.
>>
>> And FTR, the device node has keep-config set for this board.
> 
> Can you try without? It should work anyway, but it would allow us to
> see if it's a timing issue.

I can't easily, because I don't have the individual timing parameters at 
hand. I could of course re-engineer them from the NDTR0/NDTR1 values the 
bootloader sets, but that would only lead to these registers being 
restored to the same value as they currently have. And as they're read 
during probe() anyway and re-used in marvell_nfc_select_chip(), I don't 
think that's the problem.

Btw - the comment in that function doesn't quite make sense; the 
registers are written unconditionally and regardless of keep-config:

	/*
	 * Do not change the timing registers when using the DT property
	 * marvell,nand-keep-config; in that case ->ndtr0 and ->ndtr1 from the
	 * marvell_nand structure are supposedly empty.
	 */
	writel_relaxed(marvell_nand->ndtr0, nfc->regs + NDTR0);
	writel_relaxed(marvell_nand->ndtr1, nfc->regs + NDTR1);

I have a patch to fix that.

> Also, can you try with the mtd-utils directly (without UBI/UBIFS) with
> nanddump flash_erase and nandwrite to see what are the errors raised by
> the controller driver.

The effect is that the controller does not get any interrupts anymore 
after the resume. My test setup now executes the following command sequence:

   # nandtest /dev/mtd2
   # echo mem >/sys/power/state
   [make the hardware resume]
   # echo "RESUMED!"
   # nandtest /dev/mtd2

The first nandtest works, the second fails. I've added some printks in 
select_chip() and a WARN_ON(!ret) in marvell_nfc_wait_op() to get a 
stack trace, and this is the output after resume:

> [   43.358823] marvell_nfc_select_chip(): 0016271c 0f3d00f2
> [   44.417980] ------------[ cut here ]------------
> [   44.422678] WARNING: CPU: 0 PID: 1297 at drivers/mtd/nand/raw/marvell_nand.c:629 marvell_nfc_wait_op+0x88/0xb8 [marvell_nand]
> [   44.433994] Modules linked in: marvell_nand pxamci
> [   44.438853] CPU: 0 PID: 1297 Comm: nandtest Not tainted 4.18.0-rc3+ #392
> [   44.445510] Hardware name: Marvell PXA3xx (Device Tree Support)
> [   44.451450] Backtrace: 
> [   44.453938] [<c0106458>] (dump_backtrace) from [<c0106718>] (show_stack+0x18/0x1c)
> [   44.461533]  r7:00000009 r6:00000000 r5:bf008e87 r4:00000000
> [   44.467185] [<c0106700>] (show_stack) from [<c066f5e0>] (dump_stack+0x20/0x28)
> [   44.474468] [<c066f5c0>] (dump_stack) from [<c0111f9c>] (__warn+0xe8/0x104)
> [   44.481479] [<c0111eb4>] (__warn) from [<c01120d4>] (warn_slowpath_null+0x44/0x50)
> [   44.489077]  r9:00000004 r8:bf0094e4 r7:000003e8 r6:bf0075cc r5:00000275 r4:bf008e87
> [   44.496810] [<c0112090>] (warn_slowpath_null) from [<bf0075cc>] (marvell_nfc_wait_op+0x88/0xb8 [marvell_nand])
> [   44.506814]  r6:c66ba80c r5:c66ba7f0 r4:00000000
> [   44.511503] [<bf007544>] (marvell_nfc_wait_op [marvell_nand]) from [<bf0079e0>] (marvell_nfc_erase_cmd_type_exec+0x74/0xa8 [marvell_nand])
> [   44.523919]  r7:0000000c r6:00000000 r5:c66bc810 r4:c0a03008
> [   44.529647] [<bf00796c>] (marvell_nfc_erase_cmd_type_exec [marvell_nand]) from [<c04382c8>] (nand_op_parser_exec_op+0x148/0x25c)
> [   44.541214]  r6:00000001 r5:bf008760 r4:c67c3c30
> [   44.545839] [<c0438180>] (nand_op_parser_exec_op) from [<bf007500>] (marvell_nfc_exec_op+0x30/0x3c [marvell_nand])
> [   44.556195]  r10:00000001 r9:ffffffff r8:c0a03008 r7:89705f41 r6:36b4a597 r5:c67c3c38
> [   44.564023]  r4:c66bc810
> [   44.566577] [<bf0074d0>] (marvell_nfc_exec_op [marvell_nand]) from [<c0437dd4>] (nand_erase_op+0x148/0x1f4)
> [   44.576330] [<c0437c8c>] (nand_erase_op) from [<c04383fc>] (single_erase+0x20/0x24)
> [   44.584000]  r8:00000180 r7:00000040 r6:c6815900 r5:00000000 r4:c66bc810
> [   44.590745] [<c04383dc>] (single_erase) from [<c043f544>] (nand_erase_nand+0x1b0/0x23c)
> [   44.598781] [<c043f394>] (nand_erase_nand) from [<c043f6e8>] (nand_erase+0x14/0x18)
> [   44.606399]  r10:c0a03008 r9:00000000 r8:00000000 r7:00000000 r6:000c0000 r5:c67b5670
> [   44.614237]  r4:c6815900
> [   44.616794] [<c043f6d4>] (nand_erase) from [<c04300ac>] (part_erase+0x34/0x74)
> [   44.624051] [<c0430078>] (part_erase) from [<c042c794>] (mtd_erase+0x74/0x9c)
> [   44.631215]  r7:00000000 r6:00020000 r5:00000000 r4:00060000
> [   44.636854] [<c042c720>] (mtd_erase) from [<c0432c4c>] (mtdchar_ioctl+0x1188/0x11b0)
> [   44.644618]  r9:c67c2000 r8:00000051 r7:c6815900 r6:00000000 r5:c67b5400 r4:bee7dd08
> [   44.652396] [<c0431ac4>] (mtdchar_ioctl) from [<c0432ca8>] (mtdchar_unlocked_ioctl+0x34/0x4c)
> [   44.660947]  r10:c0a03008 r9:c67c2000 r8:bee7dd08 r7:bee7dd08 r6:40084d02 r5:c6650280
> [   44.668784]  r4:c0a227c0
> [   44.671339] [<c0432c74>] (mtdchar_unlocked_ioctl) from [<c01d17d4>] (vfs_ioctl+0x28/0x3c)
> [   44.679533]  r7:40084d02 r6:c6650280 r5:c6679c28 r4:bee7dd08
> [   44.685174] [<c01d17ac>] (vfs_ioctl) from [<c01d19c4>] (do_vfs_ioctl+0x98/0x904)
> [   44.692604] [<c01d192c>] (do_vfs_ioctl) from [<c01d2270>] (ksys_ioctl+0x40/0x5c)
> [   44.700033]  r10:00020000 r9:c67c2000 r8:bee7dd08 r7:40084d02 r6:c6650280 r5:00000003
> [   44.707805]  r4:c6650280
> [   44.710398] [<c01d2230>] (ksys_ioctl) from [<c01d229c>] (sys_ioctl+0x10/0x14)
> [   44.717494]  r9:c67c2000 r8:c01011e4 r7:00000036 r6:00000000 r5:00022098 r4:000220c0
> [   44.725265] [<c01d228c>] (sys_ioctl) from [<c0101000>] (ret_fast_syscall+0x0/0x50)
> [   44.732855] Exception stack(0xc67c3fa8 to 0xc67c3ff0)
> [   44.737954] 3fa0:                   000220c0 00022098 00000003 40084d02 bee7dd08 00020000
> [   44.746087] 3fc0: 000220c0 00022098 00000000 00000036 00022084 b6d5a008 b6d5a008 b6d5a008
> [   44.754273] 3fe0: 00022024 bee7dcec 00010e34 b6e67dbc
> [   44.759363] ---[ end trace b29035065663519d ]---
> [   44.763970] marvell-nfc 43100000.nand-controller: Timeout waiting for RB signal
> MEMERASE: Input/output error
> [   44.771960] marvell_nfc_select_chip(): 0016271c 0f3d00f2
> 00020000: erasing... [   44.812608] marvell_nfc_select_chip(): 0016271c 0f3d00f2
> [   45.857970] ------------[ cut here ]------------

FWIW, my suspend/resume routines gained some more flesh now, but 
something is still missing I suppose:

> static int __maybe_unused marvell_nfc_suspend(struct device *dev)
> {
> 	struct marvell_nfc *nfc = dev_get_drvdata(dev);
> 	struct marvell_nand_chip *chip;
> 
> 	list_for_each_entry(chip, &nfc->chips, node)
> 		marvell_nfc_wait_ndrun(&chip->chip);
> 
> 	clk_disable_unprepare(nfc->core_clk);
> 
> 	return 0;
> }
> 
> static int __maybe_unused marvell_nfc_resume(struct device *dev)
> {
> 	struct marvell_nfc *nfc = dev_get_drvdata(dev);
> 	int ret;
> 
> 	ret = clk_prepare_enable(nfc->core_clk);
> 	if (ret < 0)
> 		return ret;
> 
> 	/*
> 	 * Reset nfc->selected_chip so the next command will cause the timing
> 	 * registers to be restored in marvell_nfc_select_chip().
> 	 */
> 	nfc->selected_chip = NULL;
> 
> 	writel_relaxed(NDCR_ALL_INT | NDCR_ND_ARB_EN | NDCR_SPARE_EN |
> 		       NDCR_RD_ID_CNT(NFCV1_READID_LEN), nfc->regs + NDCR);
> 	writel_relaxed(0xFFFFFFFF, nfc->regs + NDSR);
> 	writel_relaxed(0, nfc->regs + NDECCCTRL);
> 
> 	return 0;
> }

The controller loses all its state after resume, but I don't see which 
register is not re-initialized after that, but I might miss something.

I'll get back to this in a couple of days, because I have to do this in 
my spare time. Miquel, do you have any platform with such a controller 
that is suspend-able? It might still be an issue outside of this driver, 
after all.



Thanks,
Daniel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: marvell_nand driver fails to suspend
  2018-07-05  8:22         ` Daniel Mack
@ 2018-07-06  8:27           ` Miquel Raynal
  2018-07-06  8:41             ` Daniel Mack
  0 siblings, 1 reply; 10+ messages in thread
From: Miquel Raynal @ 2018-07-06  8:27 UTC (permalink / raw)
  To: Daniel Mack; +Cc: Robert Jarzmik, Boris Brezillon, David Woodhouse, linux-mtd

Hi Daniel,

Daniel Mack <daniel@zonque.org> wrote on Thu, 5 Jul 2018 10:22:29 +0200:

> Hi Miquel,
> 
> On Tuesday, July 03, 2018 09:05 AM, Miquel Raynal wrote:
> > Daniel Mack <daniel@zonque.org> wrote on Mon, 2 Jul 2018 09:48:01 +0200:  
> 
> >> I'll give it a spin in a couple of days again, but if anything comes to your mind, please let me know.
> >>
> >> And FTR, the device node has keep-config set for this board.
> > > Can you try without? It should work anyway, but it would allow us to  
> > see if it's a timing issue.  
> 
> I can't easily, because I don't have the individual timing parameters at hand. I could of course re-engineer them from the NDTR0/NDTR1 values the bootloader sets, but that would only lead to these registers being restored to the same value as they currently have. And as they're read during probe() anyway and re-used in marvell_nfc_select_chip(), I don't think that's the problem.

I do agree with the second part of your statement. However I don't
understand why you talk about re-engineering the timings. This
"keep-config" DT property was just an ugly hack to avoid implementing
->setup_data_interface() and rely on the Bootloader's setup. While this
work at the slowest speed, it is clearly not as efficient as tuning the
timings at probe time depending on the NAND chip. I recently changed
the DT NAND nodes to 1/ separate the controller and the NAND chip and
2/ remove superfluous properties like this one. If you are against NAND
throughput, I suggest you to test without it :)

> 
> Btw - the comment in that function doesn't quite make sense; the registers are written unconditionally and regardless of keep-config:
> 
> 	/*
> 	 * Do not change the timing registers when using the DT property
> 	 * marvell,nand-keep-config; in that case ->ndtr0 and ->ndtr1 from the
> 	 * marvell_nand structure are supposedly empty.
> 	 */
> 	writel_relaxed(marvell_nand->ndtr0, nfc->regs + NDTR0);
> 	writel_relaxed(marvell_nand->ndtr1, nfc->regs + NDTR1);
> 
> I have a patch to fix that.

This comment should not be there anymore. The logic described in the
first sentence is true and already handled in marvell_nand_chip_init().
But at any moment ->ndtr[0|1] are clearly not supposed to be empty
(anymore).

> 
> > Also, can you try with the mtd-utils directly (without UBI/UBIFS) with
> > nanddump flash_erase and nandwrite to see what are the errors raised by
> > the controller driver.  
> 
> The effect is that the controller does not get any interrupts anymore after the resume. My test setup now executes the following command sequence:
> 
>    # nandtest /dev/mtd2
>    # echo mem >/sys/power/state
>    [make the hardware resume]
>    # echo "RESUMED!"
>    # nandtest /dev/mtd2
> 
> The first nandtest works, the second fails. I've added some printks in select_chip() and a WARN_ON(!ret) in marvell_nfc_wait_op() to get a stack trace, and this is the output after resume:
> 
> > [   43.358823] marvell_nfc_select_chip(): 0016271c 0f3d00f2
> > [   44.417980] ------------[ cut here ]------------
> > [   44.422678] WARNING: CPU: 0 PID: 1297 at drivers/mtd/nand/raw/marvell_nand.c:629 marvell_nfc_wait_op+0x88/0xb8 [marvell_nand]
> > [   44.433994] Modules linked in: marvell_nand pxamci
> > [   44.438853] CPU: 0 PID: 1297 Comm: nandtest Not tainted 4.18.0-rc3+ #392
> > [   44.445510] Hardware name: Marvell PXA3xx (Device Tree Support)
> > [   44.451450] Backtrace: > [   44.453938] [<c0106458>] (dump_backtrace) from [<c0106718>] (show_stack+0x18/0x1c)
> > [   44.461533]  r7:00000009 r6:00000000 r5:bf008e87 r4:00000000
> > [   44.467185] [<c0106700>] (show_stack) from [<c066f5e0>] (dump_stack+0x20/0x28)
> > [   44.474468] [<c066f5c0>] (dump_stack) from [<c0111f9c>] (__warn+0xe8/0x104)
> > [   44.481479] [<c0111eb4>] (__warn) from [<c01120d4>] (warn_slowpath_null+0x44/0x50)
> > [   44.489077]  r9:00000004 r8:bf0094e4 r7:000003e8 r6:bf0075cc r5:00000275 r4:bf008e87
> > [   44.496810] [<c0112090>] (warn_slowpath_null) from [<bf0075cc>] (marvell_nfc_wait_op+0x88/0xb8 [marvell_nand])
> > [   44.506814]  r6:c66ba80c r5:c66ba7f0 r4:00000000
> > [   44.511503] [<bf007544>] (marvell_nfc_wait_op [marvell_nand]) from [<bf0079e0>] (marvell_nfc_erase_cmd_type_exec+0x74/0xa8 [marvell_nand])
> > [   44.523919]  r7:0000000c r6:00000000 r5:c66bc810 r4:c0a03008
> > [   44.529647] [<bf00796c>] (marvell_nfc_erase_cmd_type_exec [marvell_nand]) from [<c04382c8>] (nand_op_parser_exec_op+0x148/0x25c)
> > [   44.541214]  r6:00000001 r5:bf008760 r4:c67c3c30
> > [   44.545839] [<c0438180>] (nand_op_parser_exec_op) from [<bf007500>] (marvell_nfc_exec_op+0x30/0x3c [marvell_nand])
> > [   44.556195]  r10:00000001 r9:ffffffff r8:c0a03008 r7:89705f41 r6:36b4a597 r5:c67c3c38
> > [   44.564023]  r4:c66bc810
> > [   44.566577] [<bf0074d0>] (marvell_nfc_exec_op [marvell_nand]) from [<c0437dd4>] (nand_erase_op+0x148/0x1f4)
> > [   44.576330] [<c0437c8c>] (nand_erase_op) from [<c04383fc>] (single_erase+0x20/0x24)
> > [   44.584000]  r8:00000180 r7:00000040 r6:c6815900 r5:00000000 r4:c66bc810
> > [   44.590745] [<c04383dc>] (single_erase) from [<c043f544>] (nand_erase_nand+0x1b0/0x23c)
> > [   44.598781] [<c043f394>] (nand_erase_nand) from [<c043f6e8>] (nand_erase+0x14/0x18)
> > [   44.606399]  r10:c0a03008 r9:00000000 r8:00000000 r7:00000000 r6:000c0000 r5:c67b5670
> > [   44.614237]  r4:c6815900
> > [   44.616794] [<c043f6d4>] (nand_erase) from [<c04300ac>] (part_erase+0x34/0x74)
> > [   44.624051] [<c0430078>] (part_erase) from [<c042c794>] (mtd_erase+0x74/0x9c)
> > [   44.631215]  r7:00000000 r6:00020000 r5:00000000 r4:00060000
> > [   44.636854] [<c042c720>] (mtd_erase) from [<c0432c4c>] (mtdchar_ioctl+0x1188/0x11b0)
> > [   44.644618]  r9:c67c2000 r8:00000051 r7:c6815900 r6:00000000 r5:c67b5400 r4:bee7dd08
> > [   44.652396] [<c0431ac4>] (mtdchar_ioctl) from [<c0432ca8>] (mtdchar_unlocked_ioctl+0x34/0x4c)
> > [   44.660947]  r10:c0a03008 r9:c67c2000 r8:bee7dd08 r7:bee7dd08 r6:40084d02 r5:c6650280
> > [   44.668784]  r4:c0a227c0
> > [   44.671339] [<c0432c74>] (mtdchar_unlocked_ioctl) from [<c01d17d4>] (vfs_ioctl+0x28/0x3c)
> > [   44.679533]  r7:40084d02 r6:c6650280 r5:c6679c28 r4:bee7dd08
> > [   44.685174] [<c01d17ac>] (vfs_ioctl) from [<c01d19c4>] (do_vfs_ioctl+0x98/0x904)
> > [   44.692604] [<c01d192c>] (do_vfs_ioctl) from [<c01d2270>] (ksys_ioctl+0x40/0x5c)
> > [   44.700033]  r10:00020000 r9:c67c2000 r8:bee7dd08 r7:40084d02 r6:c6650280 r5:00000003
> > [   44.707805]  r4:c6650280
> > [   44.710398] [<c01d2230>] (ksys_ioctl) from [<c01d229c>] (sys_ioctl+0x10/0x14)
> > [   44.717494]  r9:c67c2000 r8:c01011e4 r7:00000036 r6:00000000 r5:00022098 r4:000220c0
> > [   44.725265] [<c01d228c>] (sys_ioctl) from [<c0101000>] (ret_fast_syscall+0x0/0x50)
> > [   44.732855] Exception stack(0xc67c3fa8 to 0xc67c3ff0)
> > [   44.737954] 3fa0:                   000220c0 00022098 00000003 40084d02 bee7dd08 00020000
> > [   44.746087] 3fc0: 000220c0 00022098 00000000 00000036 00022084 b6d5a008 b6d5a008 b6d5a008
> > [   44.754273] 3fe0: 00022024 bee7dcec 00010e34 b6e67dbc
> > [   44.759363] ---[ end trace b29035065663519d ]---
> > [   44.763970] marvell-nfc 43100000.nand-controller: Timeout waiting for RB signal
> > MEMERASE: Input/output error
> > [   44.771960] marvell_nfc_select_chip(): 0016271c 0f3d00f2
> > 00020000: erasing... [   44.812608] marvell_nfc_select_chip(): 0016271c 0f3d00f2
> > [   45.857970] ------------[ cut here ]------------  
> 
> FWIW, my suspend/resume routines gained some more flesh now, but something is still missing I suppose:
> 
> > static int __maybe_unused marvell_nfc_suspend(struct device *dev)
> > {
> > 	struct marvell_nfc *nfc = dev_get_drvdata(dev);
> > 	struct marvell_nand_chip *chip;  
> > > 	list_for_each_entry(chip, &nfc->chips, node)  
> > 		marvell_nfc_wait_ndrun(&chip->chip);  
> > > 	clk_disable_unprepare(nfc->core_clk);
> > > 	return 0;  
> > }  
> > > static int __maybe_unused marvell_nfc_resume(struct device *dev)  
> > {
> > 	struct marvell_nfc *nfc = dev_get_drvdata(dev);
> > 	int ret;  
> > > 	ret = clk_prepare_enable(nfc->core_clk);  
> > 	if (ret < 0)
> > 		return ret;  
> > > 	/*  
> > 	 * Reset nfc->selected_chip so the next command will cause the timing
> > 	 * registers to be restored in marvell_nfc_select_chip().
> > 	 */
> > 	nfc->selected_chip = NULL;  
> > > 	writel_relaxed(NDCR_ALL_INT | NDCR_ND_ARB_EN | NDCR_SPARE_EN |  
> > 		       NDCR_RD_ID_CNT(NFCV1_READID_LEN), nfc->regs + NDCR);
> > 	writel_relaxed(0xFFFFFFFF, nfc->regs + NDSR);
> > 	writel_relaxed(0, nfc->regs + NDECCCTRL);  
> > > 	return 0;  
> > }  
> 
> The controller loses all its state after resume, but I don't see which register is not re-initialized after that, but I might miss something.

I suppose you are good with the above hooks.

Maybe you can add traces in the IRQ handler and also dump NDSR on
timeout. Let's check:
- If the IRQ is fired or not
- If the right bit is set or not

I don't have more fresh ideas right now...

> 
> I'll get back to this in a couple of days, because I have to do this in my spare time. Miquel, do you have any platform with such a controller that is suspend-able? It might still be an issue outside of this driver, after all.

If it worked with the pxa3xx_nand.c driver, then it's probably a driver
issue... Can you test with the pxa3xx_nand.c driver with the same
kernel version? You'll have to revert at least:
7576594c8e69 mtd: nand: remove useless fields from pxa3xx NAND platform data
cc396436c24d mtd: nand: remove deprecated pxa3xx_nand driver
925d5e426861 ARM: dts: armada-38x: update NAND node with new bindings

I have some platforms with such controller but I never used
suspend/resume on them, I'll have to check.


Thanks,
Miquèl

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: marvell_nand driver fails to suspend
  2018-07-06  8:27           ` Miquel Raynal
@ 2018-07-06  8:41             ` Daniel Mack
  2018-07-06  9:02               ` Miquel Raynal
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel Mack @ 2018-07-06  8:41 UTC (permalink / raw)
  To: Miquel Raynal; +Cc: Robert Jarzmik, Boris Brezillon, David Woodhouse, linux-mtd

Hi Miquel,

On Friday, July 06, 2018 10:27 AM, Miquel Raynal wrote:
> Daniel Mack <daniel@zonque.org> wrote on Thu, 5 Jul 2018 10:22:29
>> I can't easily, because I don't have the individual timing
>> parameters at hand. I could of course re-engineer them from the
>> NDTR0/NDTR1 values the bootloader sets, but that would only lead to
>> these registers being restored to the same value as they currently
>> have. And as they're read during probe() anyway and re-used in
>> marvell_nfc_select_chip(), I don't think that's the problem.
> 
> I do agree with the second part of your statement. However I don't 
> understand why you talk about re-engineering the timings.

The timings were calculated when I wrote the bootloader some years back, 
and all I'm saying is that at least for now, the kernel doesn't need to 
do that again. It should instead just take whatever has been provided by 
the bootloader.

> This 
> "keep-config" DT property was just an ugly hack to avoid
> implementing ->setup_data_interface() and rely on the Bootloader's
> setup. While this work at the slowest speed, it is clearly not as
> efficient as tuning the timings at probe time depending on the NAND
> chip. I recently changed the DT NAND nodes to 1/ separate the
> controller and the NAND chip and 2/ remove superfluous properties
> like this one.

Do you have patches ready for that?

> If you are against NAND throughput, I suggest you to
> test without it :)

I have no problem adding the timing properties to DT, but I don't expect 
that to improve the throughput. The bootloader is specifically written 
for this board, and it has optimized timing values already.

>> /* * Do not change the timing registers when using the DT property 
>> * marvell,nand-keep-config; in that case ->ndtr0 and ->ndtr1 from
>> the * marvell_nand structure are supposedly empty. */ 
>> writel_relaxed(marvell_nand->ndtr0, nfc->regs + NDTR0); 
>> writel_relaxed(marvell_nand->ndtr1, nfc->regs + NDTR1);
> 
> This comment should not be there anymore. The logic described in the 
> first sentence is true and already handled in
> marvell_nand_chip_init(). But at any moment ->ndtr[0|1] are clearly
> not supposed to be empty (anymore).

But it's okay to write them here, that's what you're saying? For the 
resume case, this is actually exactly what's needed, as the NDTR 
registers contents are lost in low-power mode.

[...]

>> The controller loses all its state after resume, but I don't see
>> which register is not re-initialized after that, but I might miss
>> something.
> 
> I suppose you are good with the above hooks.
> 
> Maybe you can add traces in the IRQ handler and also dump NDSR on 
> timeout. Let's check: - If the IRQ is fired or not - If the right bit
> is set or not

Yes, I did that, and the ISR is never entered after resume.

> If it worked with the pxa3xx_nand.c driver, then it's probably a
> driver issue... Can you test with the pxa3xx_nand.c driver with the
> same kernel version?

That's be my next attempt, yes. Thanks for pointing me to the right 
commits :)

> I have some platforms with such controller but I never used 
> suspend/resume on them, I'll have to check.

It would be interesting to see if this works on other platforms, yes.


Thanks,
Daniel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: marvell_nand driver fails to suspend
  2018-07-06  8:41             ` Daniel Mack
@ 2018-07-06  9:02               ` Miquel Raynal
  2018-07-06 20:07                 ` Daniel Mack
  0 siblings, 1 reply; 10+ messages in thread
From: Miquel Raynal @ 2018-07-06  9:02 UTC (permalink / raw)
  To: Daniel Mack; +Cc: Robert Jarzmik, Boris Brezillon, David Woodhouse, linux-mtd

Hi Daniel,

Daniel Mack <daniel@zonque.org> wrote on Fri, 6 Jul 2018 10:41:35 +0200:

> Hi Miquel,
> 
> On Friday, July 06, 2018 10:27 AM, Miquel Raynal wrote:
> > Daniel Mack <daniel@zonque.org> wrote on Thu, 5 Jul 2018 10:22:29  
> >> I can't easily, because I don't have the individual timing
> >> parameters at hand. I could of course re-engineer them from the
> >> NDTR0/NDTR1 values the bootloader sets, but that would only lead to
> >> these registers being restored to the same value as they currently
> >> have. And as they're read during probe() anyway and re-used in
> >> marvell_nfc_select_chip(), I don't think that's the problem.
> > > I do agree with the second part of your statement. However I don't > understand why you talk about re-engineering the timings.  
> 
> The timings were calculated when I wrote the bootloader some years back, and all I'm saying is that at least for now, the kernel doesn't need to do that again. It should instead just take whatever has been provided by the bootloader.
> 
> > This > "keep-config" DT property was just an ugly hack to avoid
> > implementing ->setup_data_interface() and rely on the Bootloader's
> > setup. While this work at the slowest speed, it is clearly not as
> > efficient as tuning the timings at probe time depending on the NAND
> > chip. I recently changed the DT NAND nodes to 1/ separate the
> > controller and the NAND chip and 2/ remove superfluous properties
> > like this one.  
> 
> Do you have patches ready for that?

They are in the official tree now, the one I pointed below does this
for armada-38x which is I think the SoC you use?

925d5e426861 ARM: dts: armada-38x: update NAND node with new bindings

> 
> > If you are against NAND throughput, I suggest you to
> > test without it :)  
> 
> I have no problem adding the timing properties to DT,

I do :)

> but I don't expect that to improve the throughput. The bootloader is specifically written for this board, and it has optimized timing values already.

Ok, that was just an FYI.

> 
> >> /* * Do not change the timing registers when using the DT property >> * marvell,nand-keep-config; in that case ->ndtr0 and ->ndtr1 from
> >> the * marvell_nand structure are supposedly empty. */ >> writel_relaxed(marvell_nand->ndtr0, nfc->regs + NDTR0); >> writel_relaxed(marvell_nand->ndtr1, nfc->regs + NDTR1);
> > > This comment should not be there anymore. The logic described in the > first sentence is true and already handled in  
> > marvell_nand_chip_init(). But at any moment ->ndtr[0|1] are clearly
> > not supposed to be empty (anymore).  
> 
> But it's okay to write them here, that's what you're saying?

It's not just "ok", it's needed!

BTW, when I say "->ndtr[0|1]" I mean "the ndtr fields in the driver
structure", not "the physical registers".

> For the resume case, this is actually exactly what's needed, as the NDTR registers contents are lost in low-power mode.

The way you handle the timings looks good. Having selected_chip to
NULL will force the next call of ->select_chip() (done by the core,
please check this happens with a printk) to rewrite the NDTR registers
with valid saved values.

> 
> [...]
> 
> >> The controller loses all its state after resume, but I don't see
> >> which register is not re-initialized after that, but I might miss
> >> something.
> > > I suppose you are good with the above hooks.
> > > Maybe you can add traces in the IRQ handler and also dump NDSR on > timeout. Let's check: - If the IRQ is fired or not - If the right bit  
> > is set or not  
> 
> Yes, I did that, and the ISR is never entered after resume.

Is the interrupt bit really enabled and the RB bit set or not on
timeout? Can you dump NDCR and NDSR for that purpose?

> 
> > If it worked with the pxa3xx_nand.c driver, then it's probably a
> > driver issue... Can you test with the pxa3xx_nand.c driver with the
> > same kernel version?  
> 
> That's be my next attempt, yes. Thanks for pointing me to the right commits :)
> 
> > I have some platforms with such controller but I never used > suspend/resume on them, I'll have to check.  
> 
> It would be interesting to see if this works on other platforms, yes.
> 
> 
> Thanks,
> Daniel
> 

Thanks,
Miquèl

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: marvell_nand driver fails to suspend
  2018-07-06  9:02               ` Miquel Raynal
@ 2018-07-06 20:07                 ` Daniel Mack
  0 siblings, 0 replies; 10+ messages in thread
From: Daniel Mack @ 2018-07-06 20:07 UTC (permalink / raw)
  To: Miquel Raynal; +Cc: Robert Jarzmik, Boris Brezillon, David Woodhouse, linux-mtd

On Friday, July 06, 2018 11:02 AM, Miquel Raynal wrote:
> Daniel Mack <daniel@zonque.org> wrote on Fri, 6 Jul 2018 10:41:35 +0200:
>> On Friday, July 06, 2018 10:27 AM, Miquel Raynal wrote:

>>> This > "keep-config" DT property was just an ugly hack to avoid
>>> implementing ->setup_data_interface() and rely on the Bootloader's
>>> setup. While this work at the slowest speed, it is clearly not as
>>> efficient as tuning the timings at probe time depending on the NAND
>>> chip. I recently changed the DT NAND nodes to 1/ separate the
>>> controller and the NAND chip and 2/ remove superfluous properties
>>> like this one.
>>
>> Do you have patches ready for that?
> 
> They are in the official tree now, the one I pointed below does this
> for armada-38x which is I think the SoC you use?

Nope, it's a PXA300 based board.

> The way you handle the timings looks good. Having selected_chip to
> NULL will force the next call of ->select_chip() (done by the core,
> please check this happens with a printk) to rewrite the NDTR registers
> with valid saved values.

Okay, so we're on the same page. printk()s indicate that the right thing 
happens, too.

And I now figured out what was going on. The IRQ code for the PXA 
platform failed to restore the masks correctly, so the NAND IRQ was dead 
after resume. I'll wrap up my changes and send some patches for the 
marvell_nand driver.


Thanks for bearing with me :)

Daniel

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2018-07-06 20:07 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-01 10:18 marvell_nand driver fails to suspend Daniel Mack
2018-07-01 19:04 ` Daniel Mack
2018-07-02  7:20   ` Miquel Raynal
2018-07-02  7:48     ` Daniel Mack
2018-07-03  7:05       ` Miquel Raynal
2018-07-05  8:22         ` Daniel Mack
2018-07-06  8:27           ` Miquel Raynal
2018-07-06  8:41             ` Daniel Mack
2018-07-06  9:02               ` Miquel Raynal
2018-07-06 20:07                 ` Daniel Mack

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.