linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Lockdep complain for zram
@ 2012-11-21  8:37 Minchan Kim
  2012-11-21 18:06 ` Nitin Gupta
  2012-11-22 11:13 ` Jerome Marchand
  0 siblings, 2 replies; 8+ messages in thread
From: Minchan Kim @ 2012-11-21  8:37 UTC (permalink / raw)
  To: Nitin Gupta
  Cc: Seth Jennings, linux-mm, linux-kernel, Pekka Enberg,
	Konrad Rzeszutek Wilk, Dan Magenheimer, Andrew Morton,
	Greg Kroah-Hartman

Hi alls,

Today, I saw below complain of lockdep.
As a matter of fact, I knew it long time ago but forgot that.
The reason lockdep complains is that now zram uses GFP_KERNEL
in reclaim path(ex, __zram_make_request) :(
I can fix it via replacing GFP_KERNEL with GFP_NOIO.
But more big problem is vzalloc in zram_init_device which calls GFP_KERNEL.
Of course, I can change it with __vmalloc which can receive gfp_t.
But still we have a problem. Althoug __vmalloc can handle gfp_t, it calls
allocation of GFP_KERNEL. That's why I sent the patch.
https://lkml.org/lkml/2012/4/23/77
Since then, I forgot it, saw the bug today and poped the question again.

Yes. Fundamental problem is utter crap API vmalloc.
If we can fix it, everyone would be happy. But life isn't simple like seeing
my thread of the patch.

So next option is to move zram_init_device into setting disksize time.
But it makes unnecessary metadata waste until zram is used really(That's why
Nitin move zram_init_device from disksize setting time to make_request) and
it makes user should set the disksize before using, which are behavior change.

I would like to clean up this issue before promoting because it might change
usage behavior.

Do you have any idea?

============ 8< ==============


[  335.772277] =================================
[  335.772615] [ INFO: inconsistent lock state ]
[  335.772955] 3.7.0-rc6 #162 Tainted: G         C  
[  335.773320] ---------------------------------
[  335.773663] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage.
[  335.774170] kswapd0/23 [HC0[0]:SC0[0]:HE1:SE1] takes:
[  335.774564]  (&zram->init_lock){+++++-}, at: [<ffffffffa0009d0a>] zram_make_request+0x4a/0x260 [zram]
[  335.775321] {RECLAIM_FS-ON-W} state was registered at:
[  335.775716]   [<ffffffff81099532>] mark_held_locks+0x82/0x130
[  335.776009]   [<ffffffff81099c47>] lockdep_trace_alloc+0x67/0xc0
[  335.776009]   [<ffffffff811189d4>] __alloc_pages_nodemask+0x94/0xa00
[  335.776009]   [<ffffffff81152806>] alloc_pages_current+0xb6/0x120
[  335.776009]   [<ffffffff811143d4>] __get_free_pages+0x14/0x50
[  335.776009]   [<ffffffff81157dff>] kmalloc_order_trace+0x3f/0xf0
[  335.776009]   [<ffffffffa0009b1b>] zram_init_device+0x7b/0x220 [zram]
[  335.776009]   [<ffffffffa0009f0a>] zram_make_request+0x24a/0x260 [zram]
[  335.776009]   [<ffffffff81299d8a>] generic_make_request+0xca/0x100
[  335.776009]   [<ffffffff81299e3b>] submit_bio+0x7b/0x160
[  335.776009]   [<ffffffff81197b92>] submit_bh+0xf2/0x120
[  335.776009]   [<ffffffff8119b855>] block_read_full_page+0x235/0x3a0
[  335.776009]   [<ffffffff8119efc8>] blkdev_readpage+0x18/0x20
[  335.776009]   [<ffffffff8111c107>] __do_page_cache_readahead+0x2c7/0x2d0
[  335.776009]   [<ffffffff8111c249>] force_page_cache_readahead+0x79/0xb0
[  335.776009]   [<ffffffff8111c683>] page_cache_sync_readahead+0x43/0x50
[  335.776009]   [<ffffffff81111490>] generic_file_aio_read+0x4f0/0x760
[  335.776009]   [<ffffffff8119ffbb>] blkdev_aio_read+0xbb/0xf0
[  335.776009]   [<ffffffff811661f3>] do_sync_read+0xa3/0xe0
[  335.776009]   [<ffffffff81166910>] vfs_read+0xb0/0x180
[  335.776009]   [<ffffffff81166a32>] sys_read+0x52/0xa0
[  335.776009]   [<ffffffff8155a982>] system_call_fastpath+0x16/0x1b
[  335.776009] irq event stamp: 97589
[  335.776009] hardirqs last  enabled at (97589): [<ffffffff812b3374>] throtl_update_dispatch_stats+0x94/0xf0
[  335.776009] hardirqs last disabled at (97588): [<ffffffff812b332d>] throtl_update_dispatch_stats+0x4d/0xf0
[  335.776009] softirqs last  enabled at (67416): [<ffffffff81046c59>] __do_softirq+0x139/0x280
[  335.776009] softirqs last disabled at (67395): [<ffffffff8155bb0c>] call_softirq+0x1c/0x30
[  335.776009] 
[  335.776009] other info that might help us debug this:
[  335.776009]  Possible unsafe locking scenario:
[  335.776009] 
[  335.776009]        CPU0
[  335.776009]        ----
[  335.776009]   lock(&zram->init_lock);
[  335.776009]   <Interrupt>
[  335.776009]     lock(&zram->init_lock);
[  335.776009] 
[  335.776009]  *** DEADLOCK ***
[  335.776009] 
[  335.776009] no locks held by kswapd0/23.
[  335.776009] 
[  335.776009] stack backtrace:
[  335.776009] Pid: 23, comm: kswapd0 Tainted: G         C   3.7.0-rc6 #162
[  335.776009] Call Trace:
[  335.776009]  [<ffffffff81547ff7>] print_usage_bug+0x1f5/0x206
[  335.776009]  [<ffffffff8100f90f>] ? save_stack_trace+0x2f/0x50
[  335.776009]  [<ffffffff81096a65>] mark_lock+0x295/0x2f0
[  335.776009]  [<ffffffff81095e90>] ? print_irq_inversion_bug.part.37+0x1f0/0x1f0
[  335.776009]  [<ffffffff812b4af8>] ? blk_throtl_bio+0x88/0x630
[  335.776009]  [<ffffffff81097024>] __lock_acquire+0x564/0x1c00
[  335.776009]  [<ffffffff810996e5>] ? trace_hardirqs_on_caller+0x105/0x190
[  335.776009]  [<ffffffff812b4e32>] ? blk_throtl_bio+0x3c2/0x630
[  335.776009]  [<ffffffff812b4af8>] ? blk_throtl_bio+0x88/0x630
[  335.776009]  [<ffffffff812a101c>] ? create_task_io_context+0xdc/0x150
[  335.776009]  [<ffffffff812a101c>] ? create_task_io_context+0xdc/0x150
[  335.776009]  [<ffffffffa0009d0a>] ? zram_make_request+0x4a/0x260 [zram]
[  335.776009]  [<ffffffff81098c85>] lock_acquire+0x85/0x130
[  335.776009]  [<ffffffffa0009d0a>] ? zram_make_request+0x4a/0x260 [zram]
[  335.776009]  [<ffffffff8154f87c>] down_read+0x4c/0x61
[  335.776009]  [<ffffffffa0009d0a>] ? zram_make_request+0x4a/0x260 [zram]
[  335.776009]  [<ffffffff81299ac2>] ? generic_make_request_checks+0x222/0x420
[  335.776009]  [<ffffffff8111a4fe>] ? test_set_page_writeback+0x6e/0x1a0
[  335.776009]  [<ffffffffa0009d0a>] zram_make_request+0x4a/0x260 [zram]
[  335.776009]  [<ffffffff81299d8a>] generic_make_request+0xca/0x100
[  335.776009]  [<ffffffff81299e3b>] submit_bio+0x7b/0x160
[  335.776009]  [<ffffffff81119423>] ? account_page_writeback+0x13/0x20
[  335.776009]  [<ffffffff8111a585>] ? test_set_page_writeback+0xf5/0x1a0
[  335.776009]  [<ffffffff81148aa9>] swap_writepage+0x1b9/0x240
[  335.776009]  [<ffffffff81149bf5>] ? __swap_duplicate+0x65/0x170
[  335.776009]  [<ffffffff8112715a>] ? shmem_writepage+0x17a/0x2f0
[  335.776009]  [<ffffffff8112715a>] ? shmem_writepage+0x17a/0x2f0
[  335.776009]  [<ffffffff8154f411>] ? __mutex_unlock_slowpath+0xd1/0x160
[  335.776009]  [<ffffffff810996e5>] ? trace_hardirqs_on_caller+0x105/0x190
[  335.776009]  [<ffffffff8109977d>] ? trace_hardirqs_on+0xd/0x10
[  335.776009]  [<ffffffff81127195>] shmem_writepage+0x1b5/0x2f0
[  335.776009]  [<ffffffff81122b96>] shrink_page_list+0x516/0x9a0
[  335.776009]  [<ffffffff8111ca20>] ? __activate_page+0x150/0x150
[  335.776009]  [<ffffffff811235c7>] shrink_inactive_list+0x1f7/0x3f0
[  335.776009]  [<ffffffff81123f05>] shrink_lruvec+0x435/0x540
[  335.776009]  [<ffffffff81065050>] ? __init_waitqueue_head+0x60/0x60
[  335.776009]  [<ffffffff81125093>] kswapd+0x703/0xc80
[  335.776009]  [<ffffffff81551dd0>] ? _raw_spin_unlock_irq+0x30/0x50
[  335.776009]  [<ffffffff81065050>] ? __init_waitqueue_head+0x60/0x60
[  335.776009]  [<ffffffff81124990>] ? try_to_free_pages+0x6f0/0x6f0
[  335.776009]  [<ffffffff810646ea>] kthread+0xea/0xf0
[  335.776009]  [<ffffffff81064600>] ? flush_kthread_work+0x1a0/0x1a0
[  335.776009]  [<ffffffff8155a8dc>] ret_from_fork+0x7c/0xb0
[  335.776009]  [<ffffffff81064600>] ? flush_kthread_work+0x1a0/0x1a0


-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Lockdep complain for zram
  2012-11-21  8:37 Lockdep complain for zram Minchan Kim
@ 2012-11-21 18:06 ` Nitin Gupta
       [not found]   ` <20121122083110.GC5121@bbox>
  2012-11-22 11:13 ` Jerome Marchand
  1 sibling, 1 reply; 8+ messages in thread
From: Nitin Gupta @ 2012-11-21 18:06 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Seth Jennings, linux-mm, linux-kernel, Pekka Enberg,
	Konrad Rzeszutek Wilk, Dan Magenheimer, Andrew Morton,
	Greg Kroah-Hartman

On 11/21/2012 12:37 AM, Minchan Kim wrote:
> Hi alls,
>
> Today, I saw below complain of lockdep.
> As a matter of fact, I knew it long time ago but forgot that.
> The reason lockdep complains is that now zram uses GFP_KERNEL
> in reclaim path(ex, __zram_make_request) :(
> I can fix it via replacing GFP_KERNEL with GFP_NOIO.
> But more big problem is vzalloc in zram_init_device which calls GFP_KERNEL.
> Of course, I can change it with __vmalloc which can receive gfp_t.
> But still we have a problem. Althoug __vmalloc can handle gfp_t, it calls
> allocation of GFP_KERNEL. That's why I sent the patch.
> https://lkml.org/lkml/2012/4/23/77
> Since then, I forgot it, saw the bug today and poped the question again.
>
> Yes. Fundamental problem is utter crap API vmalloc.
> If we can fix it, everyone would be happy. But life isn't simple like seeing
> my thread of the patch.
>
> So next option is to move zram_init_device into setting disksize time.
> But it makes unnecessary metadata waste until zram is used really(That's why
> Nitin move zram_init_device from disksize setting time to make_request) and
> it makes user should set the disksize before using, which are behavior change.
>
> I would like to clean up this issue before promoting because it might change
> usage behavior.
>
> Do you have any idea?
>

Maybe we can alloc_vm_area() right on device creation in create_device() 
assuming the default disksize. If user explicitly sets the disksize, 
this vm area is deallocated and a new one is allocated based on the new 
disksize.  When the device is reset, we should only free physical pages 
allocated for the table and the virtual area should be set back as if 
disksize is set to the default.

At the device init time, all the pages can be allocated with GFP_NOIO | 
__GPF_HIGHMEM and since the VM area is preallocated, map_vm_area() will 
not hit any of those page-table allocations with GFP_KERNEL.

Other allocations made directly from zram, for instance in the partial 
I/O case, should also be changed to GFP_NOIO | __GFP_HIGHMEM.

Thanks,
Nitin

> ============ 8< ==============
>
>
> [  335.772277] =================================
> [  335.772615] [ INFO: inconsistent lock state ]
> [  335.772955] 3.7.0-rc6 #162 Tainted: G         C
> [  335.773320] ---------------------------------
> [  335.773663] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage.
> [  335.774170] kswapd0/23 [HC0[0]:SC0[0]:HE1:SE1] takes:
> [  335.774564]  (&zram->init_lock){+++++-}, at: [<ffffffffa0009d0a>] zram_make_request+0x4a/0x260 [zram]
> [  335.775321] {RECLAIM_FS-ON-W} state was registered at:
> [  335.775716]   [<ffffffff81099532>] mark_held_locks+0x82/0x130
> [  335.776009]   [<ffffffff81099c47>] lockdep_trace_alloc+0x67/0xc0
> [  335.776009]   [<ffffffff811189d4>] __alloc_pages_nodemask+0x94/0xa00
> [  335.776009]   [<ffffffff81152806>] alloc_pages_current+0xb6/0x120
> [  335.776009]   [<ffffffff811143d4>] __get_free_pages+0x14/0x50
> [  335.776009]   [<ffffffff81157dff>] kmalloc_order_trace+0x3f/0xf0
> [  335.776009]   [<ffffffffa0009b1b>] zram_init_device+0x7b/0x220 [zram]
> [  335.776009]   [<ffffffffa0009f0a>] zram_make_request+0x24a/0x260 [zram]
> [  335.776009]   [<ffffffff81299d8a>] generic_make_request+0xca/0x100
> [  335.776009]   [<ffffffff81299e3b>] submit_bio+0x7b/0x160
> [  335.776009]   [<ffffffff81197b92>] submit_bh+0xf2/0x120
> [  335.776009]   [<ffffffff8119b855>] block_read_full_page+0x235/0x3a0
> [  335.776009]   [<ffffffff8119efc8>] blkdev_readpage+0x18/0x20
> [  335.776009]   [<ffffffff8111c107>] __do_page_cache_readahead+0x2c7/0x2d0
> [  335.776009]   [<ffffffff8111c249>] force_page_cache_readahead+0x79/0xb0
> [  335.776009]   [<ffffffff8111c683>] page_cache_sync_readahead+0x43/0x50
> [  335.776009]   [<ffffffff81111490>] generic_file_aio_read+0x4f0/0x760
> [  335.776009]   [<ffffffff8119ffbb>] blkdev_aio_read+0xbb/0xf0
> [  335.776009]   [<ffffffff811661f3>] do_sync_read+0xa3/0xe0
> [  335.776009]   [<ffffffff81166910>] vfs_read+0xb0/0x180
> [  335.776009]   [<ffffffff81166a32>] sys_read+0x52/0xa0
> [  335.776009]   [<ffffffff8155a982>] system_call_fastpath+0x16/0x1b
> [  335.776009] irq event stamp: 97589
> [  335.776009] hardirqs last  enabled at (97589): [<ffffffff812b3374>] throtl_update_dispatch_stats+0x94/0xf0
> [  335.776009] hardirqs last disabled at (97588): [<ffffffff812b332d>] throtl_update_dispatch_stats+0x4d/0xf0
> [  335.776009] softirqs last  enabled at (67416): [<ffffffff81046c59>] __do_softirq+0x139/0x280
> [  335.776009] softirqs last disabled at (67395): [<ffffffff8155bb0c>] call_softirq+0x1c/0x30
> [  335.776009]
> [  335.776009] other info that might help us debug this:
> [  335.776009]  Possible unsafe locking scenario:
> [  335.776009]
> [  335.776009]        CPU0
> [  335.776009]        ----
> [  335.776009]   lock(&zram->init_lock);
> [  335.776009]   <Interrupt>
> [  335.776009]     lock(&zram->init_lock);
> [  335.776009]
> [  335.776009]  *** DEADLOCK ***
> [  335.776009]
> [  335.776009] no locks held by kswapd0/23.
> [  335.776009]
> [  335.776009] stack backtrace:
> [  335.776009] Pid: 23, comm: kswapd0 Tainted: G         C   3.7.0-rc6 #162
> [  335.776009] Call Trace:
> [  335.776009]  [<ffffffff81547ff7>] print_usage_bug+0x1f5/0x206
> [  335.776009]  [<ffffffff8100f90f>] ? save_stack_trace+0x2f/0x50
> [  335.776009]  [<ffffffff81096a65>] mark_lock+0x295/0x2f0
> [  335.776009]  [<ffffffff81095e90>] ? print_irq_inversion_bug.part.37+0x1f0/0x1f0
> [  335.776009]  [<ffffffff812b4af8>] ? blk_throtl_bio+0x88/0x630
> [  335.776009]  [<ffffffff81097024>] __lock_acquire+0x564/0x1c00
> [  335.776009]  [<ffffffff810996e5>] ? trace_hardirqs_on_caller+0x105/0x190
> [  335.776009]  [<ffffffff812b4e32>] ? blk_throtl_bio+0x3c2/0x630
> [  335.776009]  [<ffffffff812b4af8>] ? blk_throtl_bio+0x88/0x630
> [  335.776009]  [<ffffffff812a101c>] ? create_task_io_context+0xdc/0x150
> [  335.776009]  [<ffffffff812a101c>] ? create_task_io_context+0xdc/0x150
> [  335.776009]  [<ffffffffa0009d0a>] ? zram_make_request+0x4a/0x260 [zram]
> [  335.776009]  [<ffffffff81098c85>] lock_acquire+0x85/0x130
> [  335.776009]  [<ffffffffa0009d0a>] ? zram_make_request+0x4a/0x260 [zram]
> [  335.776009]  [<ffffffff8154f87c>] down_read+0x4c/0x61
> [  335.776009]  [<ffffffffa0009d0a>] ? zram_make_request+0x4a/0x260 [zram]
> [  335.776009]  [<ffffffff81299ac2>] ? generic_make_request_checks+0x222/0x420
> [  335.776009]  [<ffffffff8111a4fe>] ? test_set_page_writeback+0x6e/0x1a0
> [  335.776009]  [<ffffffffa0009d0a>] zram_make_request+0x4a/0x260 [zram]
> [  335.776009]  [<ffffffff81299d8a>] generic_make_request+0xca/0x100
> [  335.776009]  [<ffffffff81299e3b>] submit_bio+0x7b/0x160
> [  335.776009]  [<ffffffff81119423>] ? account_page_writeback+0x13/0x20
> [  335.776009]  [<ffffffff8111a585>] ? test_set_page_writeback+0xf5/0x1a0
> [  335.776009]  [<ffffffff81148aa9>] swap_writepage+0x1b9/0x240
> [  335.776009]  [<ffffffff81149bf5>] ? __swap_duplicate+0x65/0x170
> [  335.776009]  [<ffffffff8112715a>] ? shmem_writepage+0x17a/0x2f0
> [  335.776009]  [<ffffffff8112715a>] ? shmem_writepage+0x17a/0x2f0
> [  335.776009]  [<ffffffff8154f411>] ? __mutex_unlock_slowpath+0xd1/0x160
> [  335.776009]  [<ffffffff810996e5>] ? trace_hardirqs_on_caller+0x105/0x190
> [  335.776009]  [<ffffffff8109977d>] ? trace_hardirqs_on+0xd/0x10
> [  335.776009]  [<ffffffff81127195>] shmem_writepage+0x1b5/0x2f0
> [  335.776009]  [<ffffffff81122b96>] shrink_page_list+0x516/0x9a0
> [  335.776009]  [<ffffffff8111ca20>] ? __activate_page+0x150/0x150
> [  335.776009]  [<ffffffff811235c7>] shrink_inactive_list+0x1f7/0x3f0
> [  335.776009]  [<ffffffff81123f05>] shrink_lruvec+0x435/0x540
> [  335.776009]  [<ffffffff81065050>] ? __init_waitqueue_head+0x60/0x60
> [  335.776009]  [<ffffffff81125093>] kswapd+0x703/0xc80
> [  335.776009]  [<ffffffff81551dd0>] ? _raw_spin_unlock_irq+0x30/0x50
> [  335.776009]  [<ffffffff81065050>] ? __init_waitqueue_head+0x60/0x60
> [  335.776009]  [<ffffffff81124990>] ? try_to_free_pages+0x6f0/0x6f0
> [  335.776009]  [<ffffffff810646ea>] kthread+0xea/0xf0
> [  335.776009]  [<ffffffff81064600>] ? flush_kthread_work+0x1a0/0x1a0
> [  335.776009]  [<ffffffff8155a8dc>] ret_from_fork+0x7c/0xb0
> [  335.776009]  [<ffffffff81064600>] ? flush_kthread_work+0x1a0/0x1a0
>
>


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Lockdep complain for zram
       [not found]   ` <20121122083110.GC5121@bbox>
@ 2012-11-22 10:08     ` Nitin Gupta
  2012-11-22 23:20       ` Minchan Kim
  0 siblings, 1 reply; 8+ messages in thread
From: Nitin Gupta @ 2012-11-22 10:08 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Seth Jennings, linux-mm, linux-kernel, Pekka Enberg,
	Konrad Rzeszutek Wilk, Dan Magenheimer, Andrew Morton,
	Greg Kroah-Hartman

On 11/22/2012 12:31 AM, Minchan Kim wrote:
> Hi Nitin,
>
> On Wed, Nov 21, 2012 at 10:06:33AM -0800, Nitin Gupta wrote:
>> On 11/21/2012 12:37 AM, Minchan Kim wrote:
>>> Hi alls,
>>>
>>> Today, I saw below complain of lockdep.
>>> As a matter of fact, I knew it long time ago but forgot that.
>>> The reason lockdep complains is that now zram uses GFP_KERNEL
>>> in reclaim path(ex, __zram_make_request) :(
>>> I can fix it via replacing GFP_KERNEL with GFP_NOIO.
>>> But more big problem is vzalloc in zram_init_device which calls GFP_KERNEL.
>>> Of course, I can change it with __vmalloc which can receive gfp_t.
>>> But still we have a problem. Althoug __vmalloc can handle gfp_t, it calls
>>> allocation of GFP_KERNEL. That's why I sent the patch.
>>> https://lkml.org/lkml/2012/4/23/77
>>> Since then, I forgot it, saw the bug today and poped the question again.
>>>
>>> Yes. Fundamental problem is utter crap API vmalloc.
>>> If we can fix it, everyone would be happy. But life isn't simple like seeing
>>> my thread of the patch.
>>>
>>> So next option is to move zram_init_device into setting disksize time.
>>> But it makes unnecessary metadata waste until zram is used really(That's why
>>> Nitin move zram_init_device from disksize setting time to make_request) and
>>> it makes user should set the disksize before using, which are behavior change.
>>>
>>> I would like to clean up this issue before promoting because it might change
>>> usage behavior.
>>>
>>> Do you have any idea?
>>>
>>
>> Maybe we can alloc_vm_area() right on device creation in
>> create_device() assuming the default disksize. If user explicitly
>> sets the disksize, this vm area is deallocated and a new one is
>> allocated based on the new disksize.  When the device is reset, we
>> should only free physical pages allocated for the table and the
>> virtual area should be set back as if disksize is set to the
>> default.
>>
>> At the device init time, all the pages can be allocated with
>> GFP_NOIO | __GPF_HIGHMEM and since the VM area is preallocated,
>> map_vm_area() will not hit any of those page-table allocations with
>> GFP_KERNEL.
>>
>> Other allocations made directly from zram, for instance in the
>> partial I/O case, should also be changed to GFP_NOIO |
>> __GFP_HIGHMEM.
>>
>
> Yes. It's a good idea and actually I thought about it.
> My concern about that approach is following as.
>
> 1) User of zram normally do mkfs.xxx or mkswap before using
>     the zram block device(ex, normally, do it at booting time)
>     It ends up allocating such metadata of zram before real usage so
>     benefit of lazy initialzation would be mitigated.
>
> 2) Some user want to use zram when memory pressure is high.(ie, load zram
>     dynamically, NOT booting time). It does make sense because people don't
>     want to waste memory until memory pressure is high(ie, where zram is really
>     helpful time). In this case, lazy initialzation could be failed easily
>     because we will use GFP_NOIO instead of GFP_KERNEL due to swap use-case.
>     So the benefit of lazy initialzation would be mitigated, too.
>
> 3) Current zram's documenation is wrong.
>     Set Disksize isn't optional when we use zram firstly.
>     Once user set disksize, it could be optional, but NOT optional
>     at first usage time. It's very odd behavior. So I think user set to disksizes
>     before using is more safe and clear.
>
> So my suggestion is following as.
>
>   * Let's change disksize setting to MUST before using for consistent behavior.
>   * When user set to disksize, let's allocate metadata all at once.
>     4K : 12 byte(64bit) -> 64G : 192M so 0.3% isn't big overhead.
>     If insane user use such big zram device up to 20, it could consume 6% of ram
>     but efficieny of zram will cover the waste.
>   * If someone has a concern about this, let's guide for him set to disksize
>     right before zram using.
>
> What do you think about it?
>

I agree. This lazy initialization has been a problem since the 
beginning. So, lets just force user to first set the disksize and 
document this behavior as such.

I plan to reduce struct table to just an array of handles (so getting 
rid of size, flag, count fields), but for now we could just use existing 
struct table and do this change later.

Thanks,
Nitin




^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Lockdep complain for zram
  2012-11-21  8:37 Lockdep complain for zram Minchan Kim
  2012-11-21 18:06 ` Nitin Gupta
@ 2012-11-22 11:13 ` Jerome Marchand
  2012-11-22 23:34   ` Minchan Kim
  1 sibling, 1 reply; 8+ messages in thread
From: Jerome Marchand @ 2012-11-22 11:13 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Nitin Gupta, Seth Jennings, linux-mm, linux-kernel, Pekka Enberg,
	Konrad Rzeszutek Wilk, Dan Magenheimer, Andrew Morton,
	Greg Kroah-Hartman

On 11/21/2012 09:37 AM, Minchan Kim wrote:
> Hi alls,
> 
> Today, I saw below complain of lockdep.
> As a matter of fact, I knew it long time ago but forgot that.
> The reason lockdep complains is that now zram uses GFP_KERNEL
> in reclaim path(ex, __zram_make_request) :(
> I can fix it via replacing GFP_KERNEL with GFP_NOIO.
> But more big problem is vzalloc in zram_init_device which calls GFP_KERNEL.
> Of course, I can change it with __vmalloc which can receive gfp_t.
> But still we have a problem. Althoug __vmalloc can handle gfp_t, it calls
> allocation of GFP_KERNEL. That's why I sent the patch.
> https://lkml.org/lkml/2012/4/23/77
> Since then, I forgot it, saw the bug today and poped the question again.
> 
> Yes. Fundamental problem is utter crap API vmalloc.
> If we can fix it, everyone would be happy. But life isn't simple like seeing
> my thread of the patch.
> 
> So next option is to move zram_init_device into setting disksize time.
> But it makes unnecessary metadata waste until zram is used really(That's why
> Nitin move zram_init_device from disksize setting time to make_request) and
> it makes user should set the disksize before using, which are behavior change.
> 
> I would like to clean up this issue before promoting because it might change
> usage behavior.
> 
> Do you have any idea?

This is a false positive due to the memory allocation in
zram_init_device() called from zram_make_request(). It appears to
lockdep that the allocation might trigger a request on the device that
would try to take init_lock again, but in fact it doesn't. The device
is not initialized yet, even less swapped on.

The following (quickly tested) patch should prevent lockdep complain.  

Jerome

---
>From ebb3514c4ee18276da7c5ca08025991b493ac204 Mon Sep 17 00:00:00 2001
From: Jerome Marchand <jmarchan@redhat.com>
Date: Thu, 22 Nov 2012 09:07:40 +0100
Subject: [PATCH] staging: zram: Avoid lockdep warning

zram triggers a lockdep warning. The cause of it is the call to
zram_init_device() from zram_make_request(). The memory allocation in
zram_init_device() could start a memory reclaim which in turn could
cause swapout and (as it appears to lockdep) a call to
zram_make_request(). However this is a false positive: an
unititialized device can't be used as swap.
A solution is to split init_lock in two lock. One mutex that protects
init, reset and size setting and a rw_semaphore that protects requests
and reset. Thus init and request would be protected by different locks
and lockdep will be happy.

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
---
 drivers/staging/zram/zram_drv.c   |   41 +++++++++++++++++++-----------------
 drivers/staging/zram/zram_drv.h   |   16 ++++++++++---
 drivers/staging/zram/zram_sysfs.c |   20 +++++++++---------
 3 files changed, 44 insertions(+), 33 deletions(-)

diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index fb4a7c9..b3bc3c4 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -470,11 +470,11 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio)
 {
 	struct zram *zram = queue->queuedata;
 
-	if (unlikely(!zram->init_done) && zram_init_device(zram))
+	if (unlikely(!is_initialized(zram)) && zram_init_device(zram))
 		goto error;
 
-	down_read(&zram->init_lock);
-	if (unlikely(!zram->init_done))
+	down_read(&zram->req_lock);
+	if (unlikely(!is_initialized(zram)))
 		goto error_unlock;
 
 	if (!valid_io_request(zram, bio)) {
@@ -483,12 +483,12 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio)
 	}
 
 	__zram_make_request(zram, bio, bio_data_dir(bio));
-	up_read(&zram->init_lock);
+	up_read(&zram->req_lock);
 
 	return;
 
 error_unlock:
-	up_read(&zram->init_lock);
+	up_read(&zram->req_lock);
 error:
 	bio_io_error(bio);
 }
@@ -497,7 +497,7 @@ void __zram_reset_device(struct zram *zram)
 {
 	size_t index;
 
-	zram->init_done = 0;
+	atomic_set(&zram->init_done, 0);
 
 	/* Free various per-device buffers */
 	kfree(zram->compress_workmem);
@@ -529,9 +529,12 @@ void __zram_reset_device(struct zram *zram)
 
 void zram_reset_device(struct zram *zram)
 {
-	down_write(&zram->init_lock);
-	__zram_reset_device(zram);
-	up_write(&zram->init_lock);
+	mutex_lock(&zram->init_lock);
+	down_write(&zram->req_lock);
+	if (is_initialized(zram))
+		__zram_reset_device(zram);
+	up_write(&zram->req_lock);
+	mutex_unlock(&zram->init_lock);
 }
 
 int zram_init_device(struct zram *zram)
@@ -539,10 +542,10 @@ int zram_init_device(struct zram *zram)
 	int ret;
 	size_t num_pages;
 
-	down_write(&zram->init_lock);
+	mutex_lock(&zram->init_lock);
 
-	if (zram->init_done) {
-		up_write(&zram->init_lock);
+	if (is_initialized(zram)) {
+		mutex_unlock(&zram->init_lock);
 		return 0;
 	}
 
@@ -583,8 +586,8 @@ int zram_init_device(struct zram *zram)
 		goto fail;
 	}
 
-	zram->init_done = 1;
-	up_write(&zram->init_lock);
+	atomic_set(&zram->init_done, 1);
+	mutex_unlock(&zram->init_lock);
 
 	pr_debug("Initialization done!\n");
 	return 0;
@@ -594,7 +597,7 @@ fail_no_table:
 	zram->disksize = 0;
 fail:
 	__zram_reset_device(zram);
-	up_write(&zram->init_lock);
+	mutex_unlock(&zram->init_lock);
 	pr_err("Initialization failed: err=%d\n", ret);
 	return ret;
 }
@@ -619,7 +622,8 @@ static int create_device(struct zram *zram, int device_id)
 	int ret = 0;
 
 	init_rwsem(&zram->lock);
-	init_rwsem(&zram->init_lock);
+	mutex_init(&zram->init_lock);
+	init_rwsem(&zram->req_lock);
 	spin_lock_init(&zram->stat64_lock);
 
 	zram->queue = blk_alloc_queue(GFP_KERNEL);
@@ -672,7 +676,7 @@ static int create_device(struct zram *zram, int device_id)
 		goto out;
 	}
 
-	zram->init_done = 0;
+	atomic_set(&zram->init_done, 0);
 
 out:
 	return ret;
@@ -755,8 +759,7 @@ static void __exit zram_exit(void)
 		zram = &zram_devices[i];
 
 		destroy_device(zram);
-		if (zram->init_done)
-			zram_reset_device(zram);
+		zram_reset_device(zram);
 	}
 
 	unregister_blkdev(zram_major, "zram");
diff --git a/drivers/staging/zram/zram_drv.h b/drivers/staging/zram/zram_drv.h
index df2eec4..f6bcead 100644
--- a/drivers/staging/zram/zram_drv.h
+++ b/drivers/staging/zram/zram_drv.h
@@ -96,9 +96,12 @@ struct zram {
 				   * against concurrent read and writes */
 	struct request_queue *queue;
 	struct gendisk *disk;
-	int init_done;
-	/* Prevent concurrent execution of device init, reset and R/W request */
-	struct rw_semaphore init_lock;
+	atomic_t init_done;
+	/* Prevent concurrent execution of device init, reset and
+	 * disksize_store */
+	struct mutex init_lock;
+	/* Prevent concurent execution device reset and R/W requests */
+	struct rw_semaphore req_lock;
 	/*
 	 * This is the limit on amount of *uncompressed* worth of data
 	 * we can store in a disk.
@@ -108,6 +111,11 @@ struct zram {
 	struct zram_stats stats;
 };
 
+static inline int is_initialized(struct zram *zram)
+{
+	return atomic_read(&zram->init_done);
+}
+
 extern struct zram *zram_devices;
 unsigned int zram_get_num_devices(void);
 #ifdef CONFIG_SYSFS
@@ -115,6 +123,6 @@ extern struct attribute_group zram_disk_attr_group;
 #endif
 
 extern int zram_init_device(struct zram *zram);
-extern void __zram_reset_device(struct zram *zram);
+extern void zram_reset_device(struct zram *zram);
 
 #endif
diff --git a/drivers/staging/zram/zram_sysfs.c b/drivers/staging/zram/zram_sysfs.c
index de1eacf..b300881 100644
--- a/drivers/staging/zram/zram_sysfs.c
+++ b/drivers/staging/zram/zram_sysfs.c
@@ -62,16 +62,19 @@ static ssize_t disksize_store(struct device *dev,
 	if (!disksize)
 		return -EINVAL;
 
-	down_write(&zram->init_lock);
-	if (zram->init_done) {
-		up_write(&zram->init_lock);
+	mutex_lock(&zram->init_lock);
+	down_write(&zram->req_lock);
+	if (is_initialized(zram)) {
+		up_write(&zram->req_lock);
+		mutex_unlock(&zram->init_lock);
 		pr_info("Cannot change disksize for initialized device\n");
 		return -EBUSY;
 	}
 
 	zram->disksize = PAGE_ALIGN(disksize);
 	set_capacity(zram->disk, zram->disksize >> SECTOR_SHIFT);
-	up_write(&zram->init_lock);
+	up_write(&zram->req_lock);
+	mutex_unlock(&zram->init_lock);
 
 	return len;
 }
@@ -81,7 +84,7 @@ static ssize_t initstate_show(struct device *dev,
 {
 	struct zram *zram = dev_to_zram(dev);
 
-	return sprintf(buf, "%u\n", zram->init_done);
+	return sprintf(buf, "%u\n", atomic_read(&zram->init_done));
 }
 
 static ssize_t reset_store(struct device *dev,
@@ -110,10 +113,7 @@ static ssize_t reset_store(struct device *dev,
 	if (bdev)
 		fsync_bdev(bdev);
 
-	down_write(&zram->init_lock);
-	if (zram->init_done)
-		__zram_reset_device(zram);
-	up_write(&zram->init_lock);
+	zram_reset_device(zram);
 
 	return len;
 }
@@ -186,7 +186,7 @@ static ssize_t mem_used_total_show(struct device *dev,
 	u64 val = 0;
 	struct zram *zram = dev_to_zram(dev);
 
-	if (zram->init_done)
+	if (is_initialized(zram))
 		val = zs_get_total_size_bytes(zram->mem_pool);
 
 	return sprintf(buf, "%llu\n", val);
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: Lockdep complain for zram
  2012-11-22 10:08     ` Nitin Gupta
@ 2012-11-22 23:20       ` Minchan Kim
  0 siblings, 0 replies; 8+ messages in thread
From: Minchan Kim @ 2012-11-22 23:20 UTC (permalink / raw)
  To: Nitin Gupta
  Cc: Seth Jennings, linux-mm, linux-kernel, Pekka Enberg,
	Konrad Rzeszutek Wilk, Dan Magenheimer, Andrew Morton,
	Greg Kroah-Hartman

On Thu, Nov 22, 2012 at 02:08:43AM -0800, Nitin Gupta wrote:
> On 11/22/2012 12:31 AM, Minchan Kim wrote:
> >Hi Nitin,
> >
> >On Wed, Nov 21, 2012 at 10:06:33AM -0800, Nitin Gupta wrote:
> >>On 11/21/2012 12:37 AM, Minchan Kim wrote:
> >>>Hi alls,
> >>>
> >>>Today, I saw below complain of lockdep.
> >>>As a matter of fact, I knew it long time ago but forgot that.
> >>>The reason lockdep complains is that now zram uses GFP_KERNEL
> >>>in reclaim path(ex, __zram_make_request) :(
> >>>I can fix it via replacing GFP_KERNEL with GFP_NOIO.
> >>>But more big problem is vzalloc in zram_init_device which calls GFP_KERNEL.
> >>>Of course, I can change it with __vmalloc which can receive gfp_t.
> >>>But still we have a problem. Althoug __vmalloc can handle gfp_t, it calls
> >>>allocation of GFP_KERNEL. That's why I sent the patch.
> >>>https://lkml.org/lkml/2012/4/23/77
> >>>Since then, I forgot it, saw the bug today and poped the question again.
> >>>
> >>>Yes. Fundamental problem is utter crap API vmalloc.
> >>>If we can fix it, everyone would be happy. But life isn't simple like seeing
> >>>my thread of the patch.
> >>>
> >>>So next option is to move zram_init_device into setting disksize time.
> >>>But it makes unnecessary metadata waste until zram is used really(That's why
> >>>Nitin move zram_init_device from disksize setting time to make_request) and
> >>>it makes user should set the disksize before using, which are behavior change.
> >>>
> >>>I would like to clean up this issue before promoting because it might change
> >>>usage behavior.
> >>>
> >>>Do you have any idea?
> >>>
> >>
> >>Maybe we can alloc_vm_area() right on device creation in
> >>create_device() assuming the default disksize. If user explicitly
> >>sets the disksize, this vm area is deallocated and a new one is
> >>allocated based on the new disksize.  When the device is reset, we
> >>should only free physical pages allocated for the table and the
> >>virtual area should be set back as if disksize is set to the
> >>default.
> >>
> >>At the device init time, all the pages can be allocated with
> >>GFP_NOIO | __GPF_HIGHMEM and since the VM area is preallocated,
> >>map_vm_area() will not hit any of those page-table allocations with
> >>GFP_KERNEL.
> >>
> >>Other allocations made directly from zram, for instance in the
> >>partial I/O case, should also be changed to GFP_NOIO |
> >>__GFP_HIGHMEM.
> >>
> >
> >Yes. It's a good idea and actually I thought about it.
> >My concern about that approach is following as.
> >
> >1) User of zram normally do mkfs.xxx or mkswap before using
> >    the zram block device(ex, normally, do it at booting time)
> >    It ends up allocating such metadata of zram before real usage so
> >    benefit of lazy initialzation would be mitigated.
> >
> >2) Some user want to use zram when memory pressure is high.(ie, load zram
> >    dynamically, NOT booting time). It does make sense because people don't
> >    want to waste memory until memory pressure is high(ie, where zram is really
> >    helpful time). In this case, lazy initialzation could be failed easily
> >    because we will use GFP_NOIO instead of GFP_KERNEL due to swap use-case.
> >    So the benefit of lazy initialzation would be mitigated, too.
> >
> >3) Current zram's documenation is wrong.
> >    Set Disksize isn't optional when we use zram firstly.
> >    Once user set disksize, it could be optional, but NOT optional
> >    at first usage time. It's very odd behavior. So I think user set to disksizes
> >    before using is more safe and clear.
> >
> >So my suggestion is following as.
> >
> >  * Let's change disksize setting to MUST before using for consistent behavior.
> >  * When user set to disksize, let's allocate metadata all at once.
> >    4K : 12 byte(64bit) -> 64G : 192M so 0.3% isn't big overhead.
> >    If insane user use such big zram device up to 20, it could consume 6% of ram
> >    but efficieny of zram will cover the waste.
> >  * If someone has a concern about this, let's guide for him set to disksize
> >    right before zram using.
> >
> >What do you think about it?
> >
> 
> I agree. This lazy initialization has been a problem since the
> beginning. So, lets just force user to first set the disksize and
> document this behavior as such.

Sure.

> 
> I plan to reduce struct table to just an array of handles (so
> getting rid of size, flag, count fields), but for now we could just
> use existing struct table and do this change later.

Good to hear.

Thanks for the quick feedback.

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Lockdep complain for zram
  2012-11-22 11:13 ` Jerome Marchand
@ 2012-11-22 23:34   ` Minchan Kim
  2012-11-23 14:46     ` Jerome Marchand
  0 siblings, 1 reply; 8+ messages in thread
From: Minchan Kim @ 2012-11-22 23:34 UTC (permalink / raw)
  To: Jerome Marchand
  Cc: Nitin Gupta, Seth Jennings, linux-mm, linux-kernel, Pekka Enberg,
	Konrad Rzeszutek Wilk, Dan Magenheimer, Andrew Morton,
	Greg Kroah-Hartman

On Thu, Nov 22, 2012 at 12:13:24PM +0100, Jerome Marchand wrote:
> On 11/21/2012 09:37 AM, Minchan Kim wrote:
> > Hi alls,
> > 
> > Today, I saw below complain of lockdep.
> > As a matter of fact, I knew it long time ago but forgot that.
> > The reason lockdep complains is that now zram uses GFP_KERNEL
> > in reclaim path(ex, __zram_make_request) :(
> > I can fix it via replacing GFP_KERNEL with GFP_NOIO.
> > But more big problem is vzalloc in zram_init_device which calls GFP_KERNEL.
> > Of course, I can change it with __vmalloc which can receive gfp_t.
> > But still we have a problem. Althoug __vmalloc can handle gfp_t, it calls
> > allocation of GFP_KERNEL. That's why I sent the patch.
> > https://lkml.org/lkml/2012/4/23/77
> > Since then, I forgot it, saw the bug today and poped the question again.
> > 
> > Yes. Fundamental problem is utter crap API vmalloc.
> > If we can fix it, everyone would be happy. But life isn't simple like seeing
> > my thread of the patch.
> > 
> > So next option is to move zram_init_device into setting disksize time.
> > But it makes unnecessary metadata waste until zram is used really(That's why
> > Nitin move zram_init_device from disksize setting time to make_request) and
> > it makes user should set the disksize before using, which are behavior change.
> > 
> > I would like to clean up this issue before promoting because it might change
> > usage behavior.
> > 
> > Do you have any idea?
> 
> This is a false positive due to the memory allocation in
> zram_init_device() called from zram_make_request(). It appears to
> lockdep that the allocation might trigger a request on the device that
> would try to take init_lock again, but in fact it doesn't. The device
> is not initialized yet, even less swapped on.

That's not a only swap case.
Let's think following usecase.

1) Booting
2) echo $((DISKSIZE)) > /sys/block/zram0/disksize
3) dd if=/dev/zero of=/dev/zram0 bs=4K count=1
4) Written 4K page(page-A) is still page cache and isn't submitted
   to zram block device.
5) Memory pressure happen by some memory hogger.
6) VM start to reclaim and write page-A to zram0.
7) zram_init_device is called at last.
8) allocate GFP_KERNEL in zram_init_device
9) goto reclaim path again.
10) deadlock.

So I think it's not false positive.
Even if it is, I think lock split isn't a good idea to just avoid
lockdep warn. It makes code unnecessary complicated and it would be more
error-prone. Let's not add another lock without performance trouble report
by the lock.

As I discussed with Nitin in this thread, lazy initialization don't have
much point and disksize setting option isn't consistent for user behavior.
And I expect Nitin will send patch "diet of table" soonish.

So just moving the initialzation part from reclaim context to process's one
is simple and clear solution, I believe.

> 
> The following (quickly tested) patch should prevent lockdep complain.  
> 
> Jerome
> 
> ---
> >From ebb3514c4ee18276da7c5ca08025991b493ac204 Mon Sep 17 00:00:00 2001
> From: Jerome Marchand <jmarchan@redhat.com>
> Date: Thu, 22 Nov 2012 09:07:40 +0100
> Subject: [PATCH] staging: zram: Avoid lockdep warning
> 
> zram triggers a lockdep warning. The cause of it is the call to
> zram_init_device() from zram_make_request(). The memory allocation in
> zram_init_device() could start a memory reclaim which in turn could
> cause swapout and (as it appears to lockdep) a call to
> zram_make_request(). However this is a false positive: an
> unititialized device can't be used as swap.
> A solution is to split init_lock in two lock. One mutex that protects
> init, reset and size setting and a rw_semaphore that protects requests
> and reset. Thus init and request would be protected by different locks
> and lockdep will be happy.
> 
> Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
> ---
>  drivers/staging/zram/zram_drv.c   |   41 +++++++++++++++++++-----------------
>  drivers/staging/zram/zram_drv.h   |   16 ++++++++++---
>  drivers/staging/zram/zram_sysfs.c |   20 +++++++++---------
>  3 files changed, 44 insertions(+), 33 deletions(-)
> 
> diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
> index fb4a7c9..b3bc3c4 100644
> --- a/drivers/staging/zram/zram_drv.c
> +++ b/drivers/staging/zram/zram_drv.c
> @@ -470,11 +470,11 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio)
>  {
>  	struct zram *zram = queue->queuedata;
>  
> -	if (unlikely(!zram->init_done) && zram_init_device(zram))
> +	if (unlikely(!is_initialized(zram)) && zram_init_device(zram))
>  		goto error;
>  
> -	down_read(&zram->init_lock);
> -	if (unlikely(!zram->init_done))
> +	down_read(&zram->req_lock);
> +	if (unlikely(!is_initialized(zram)))
>  		goto error_unlock;
>  
>  	if (!valid_io_request(zram, bio)) {
> @@ -483,12 +483,12 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio)
>  	}
>  
>  	__zram_make_request(zram, bio, bio_data_dir(bio));
> -	up_read(&zram->init_lock);
> +	up_read(&zram->req_lock);
>  
>  	return;
>  
>  error_unlock:
> -	up_read(&zram->init_lock);
> +	up_read(&zram->req_lock);
>  error:
>  	bio_io_error(bio);
>  }
> @@ -497,7 +497,7 @@ void __zram_reset_device(struct zram *zram)
>  {
>  	size_t index;
>  
> -	zram->init_done = 0;
> +	atomic_set(&zram->init_done, 0);
>  
>  	/* Free various per-device buffers */
>  	kfree(zram->compress_workmem);
> @@ -529,9 +529,12 @@ void __zram_reset_device(struct zram *zram)
>  
>  void zram_reset_device(struct zram *zram)
>  {
> -	down_write(&zram->init_lock);
> -	__zram_reset_device(zram);
> -	up_write(&zram->init_lock);
> +	mutex_lock(&zram->init_lock);
> +	down_write(&zram->req_lock);
> +	if (is_initialized(zram))
> +		__zram_reset_device(zram);
> +	up_write(&zram->req_lock);
> +	mutex_unlock(&zram->init_lock);
>  }
>  
>  int zram_init_device(struct zram *zram)
> @@ -539,10 +542,10 @@ int zram_init_device(struct zram *zram)
>  	int ret;
>  	size_t num_pages;
>  
> -	down_write(&zram->init_lock);
> +	mutex_lock(&zram->init_lock);
>  
> -	if (zram->init_done) {
> -		up_write(&zram->init_lock);
> +	if (is_initialized(zram)) {
> +		mutex_unlock(&zram->init_lock);
>  		return 0;
>  	}
>  
> @@ -583,8 +586,8 @@ int zram_init_device(struct zram *zram)
>  		goto fail;
>  	}
>  
> -	zram->init_done = 1;
> -	up_write(&zram->init_lock);
> +	atomic_set(&zram->init_done, 1);
> +	mutex_unlock(&zram->init_lock);
>  
>  	pr_debug("Initialization done!\n");
>  	return 0;
> @@ -594,7 +597,7 @@ fail_no_table:
>  	zram->disksize = 0;
>  fail:
>  	__zram_reset_device(zram);
> -	up_write(&zram->init_lock);
> +	mutex_unlock(&zram->init_lock);
>  	pr_err("Initialization failed: err=%d\n", ret);
>  	return ret;
>  }
> @@ -619,7 +622,8 @@ static int create_device(struct zram *zram, int device_id)
>  	int ret = 0;
>  
>  	init_rwsem(&zram->lock);
> -	init_rwsem(&zram->init_lock);
> +	mutex_init(&zram->init_lock);
> +	init_rwsem(&zram->req_lock);
>  	spin_lock_init(&zram->stat64_lock);
>  
>  	zram->queue = blk_alloc_queue(GFP_KERNEL);
> @@ -672,7 +676,7 @@ static int create_device(struct zram *zram, int device_id)
>  		goto out;
>  	}
>  
> -	zram->init_done = 0;
> +	atomic_set(&zram->init_done, 0);
>  
>  out:
>  	return ret;
> @@ -755,8 +759,7 @@ static void __exit zram_exit(void)
>  		zram = &zram_devices[i];
>  
>  		destroy_device(zram);
> -		if (zram->init_done)
> -			zram_reset_device(zram);
> +		zram_reset_device(zram);
>  	}
>  
>  	unregister_blkdev(zram_major, "zram");
> diff --git a/drivers/staging/zram/zram_drv.h b/drivers/staging/zram/zram_drv.h
> index df2eec4..f6bcead 100644
> --- a/drivers/staging/zram/zram_drv.h
> +++ b/drivers/staging/zram/zram_drv.h
> @@ -96,9 +96,12 @@ struct zram {
>  				   * against concurrent read and writes */
>  	struct request_queue *queue;
>  	struct gendisk *disk;
> -	int init_done;
> -	/* Prevent concurrent execution of device init, reset and R/W request */
> -	struct rw_semaphore init_lock;
> +	atomic_t init_done;
> +	/* Prevent concurrent execution of device init, reset and
> +	 * disksize_store */
> +	struct mutex init_lock;
> +	/* Prevent concurent execution device reset and R/W requests */
> +	struct rw_semaphore req_lock;
>  	/*
>  	 * This is the limit on amount of *uncompressed* worth of data
>  	 * we can store in a disk.
> @@ -108,6 +111,11 @@ struct zram {
>  	struct zram_stats stats;
>  };
>  
> +static inline int is_initialized(struct zram *zram)
> +{
> +	return atomic_read(&zram->init_done);
> +}
> +
>  extern struct zram *zram_devices;
>  unsigned int zram_get_num_devices(void);
>  #ifdef CONFIG_SYSFS
> @@ -115,6 +123,6 @@ extern struct attribute_group zram_disk_attr_group;
>  #endif
>  
>  extern int zram_init_device(struct zram *zram);
> -extern void __zram_reset_device(struct zram *zram);
> +extern void zram_reset_device(struct zram *zram);
>  
>  #endif
> diff --git a/drivers/staging/zram/zram_sysfs.c b/drivers/staging/zram/zram_sysfs.c
> index de1eacf..b300881 100644
> --- a/drivers/staging/zram/zram_sysfs.c
> +++ b/drivers/staging/zram/zram_sysfs.c
> @@ -62,16 +62,19 @@ static ssize_t disksize_store(struct device *dev,
>  	if (!disksize)
>  		return -EINVAL;
>  
> -	down_write(&zram->init_lock);
> -	if (zram->init_done) {
> -		up_write(&zram->init_lock);
> +	mutex_lock(&zram->init_lock);
> +	down_write(&zram->req_lock);
> +	if (is_initialized(zram)) {
> +		up_write(&zram->req_lock);
> +		mutex_unlock(&zram->init_lock);
>  		pr_info("Cannot change disksize for initialized device\n");
>  		return -EBUSY;
>  	}
>  
>  	zram->disksize = PAGE_ALIGN(disksize);
>  	set_capacity(zram->disk, zram->disksize >> SECTOR_SHIFT);
> -	up_write(&zram->init_lock);
> +	up_write(&zram->req_lock);
> +	mutex_unlock(&zram->init_lock);
>  
>  	return len;
>  }
> @@ -81,7 +84,7 @@ static ssize_t initstate_show(struct device *dev,
>  {
>  	struct zram *zram = dev_to_zram(dev);
>  
> -	return sprintf(buf, "%u\n", zram->init_done);
> +	return sprintf(buf, "%u\n", atomic_read(&zram->init_done));
>  }
>  
>  static ssize_t reset_store(struct device *dev,
> @@ -110,10 +113,7 @@ static ssize_t reset_store(struct device *dev,
>  	if (bdev)
>  		fsync_bdev(bdev);
>  
> -	down_write(&zram->init_lock);
> -	if (zram->init_done)
> -		__zram_reset_device(zram);
> -	up_write(&zram->init_lock);
> +	zram_reset_device(zram);
>  
>  	return len;
>  }
> @@ -186,7 +186,7 @@ static ssize_t mem_used_total_show(struct device *dev,
>  	u64 val = 0;
>  	struct zram *zram = dev_to_zram(dev);
>  
> -	if (zram->init_done)
> +	if (is_initialized(zram))
>  		val = zs_get_total_size_bytes(zram->mem_pool);
>  
>  	return sprintf(buf, "%llu\n", val);
> -- 
> 1.7.7.6
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Lockdep complain for zram
  2012-11-22 23:34   ` Minchan Kim
@ 2012-11-23 14:46     ` Jerome Marchand
  2012-11-28  5:20       ` Minchan Kim
  0 siblings, 1 reply; 8+ messages in thread
From: Jerome Marchand @ 2012-11-23 14:46 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Nitin Gupta, Seth Jennings, linux-mm, linux-kernel, Pekka Enberg,
	Konrad Rzeszutek Wilk, Dan Magenheimer, Andrew Morton,
	Greg Kroah-Hartman

On 11/23/2012 12:34 AM, Minchan Kim wrote:
> On Thu, Nov 22, 2012 at 12:13:24PM +0100, Jerome Marchand wrote:
>> On 11/21/2012 09:37 AM, Minchan Kim wrote:
>>> Hi alls,
>>>
>>> Today, I saw below complain of lockdep.
>>> As a matter of fact, I knew it long time ago but forgot that.
>>> The reason lockdep complains is that now zram uses GFP_KERNEL
>>> in reclaim path(ex, __zram_make_request) :(
>>> I can fix it via replacing GFP_KERNEL with GFP_NOIO.
>>> But more big problem is vzalloc in zram_init_device which calls GFP_KERNEL.
>>> Of course, I can change it with __vmalloc which can receive gfp_t.
>>> But still we have a problem. Althoug __vmalloc can handle gfp_t, it calls
>>> allocation of GFP_KERNEL. That's why I sent the patch.
>>> https://lkml.org/lkml/2012/4/23/77
>>> Since then, I forgot it, saw the bug today and poped the question again.
>>>
>>> Yes. Fundamental problem is utter crap API vmalloc.
>>> If we can fix it, everyone would be happy. But life isn't simple like seeing
>>> my thread of the patch.
>>>
>>> So next option is to move zram_init_device into setting disksize time.
>>> But it makes unnecessary metadata waste until zram is used really(That's why
>>> Nitin move zram_init_device from disksize setting time to make_request) and
>>> it makes user should set the disksize before using, which are behavior change.
>>>
>>> I would like to clean up this issue before promoting because it might change
>>> usage behavior.
>>>
>>> Do you have any idea?
>>
>> This is a false positive due to the memory allocation in
>> zram_init_device() called from zram_make_request(). It appears to
>> lockdep that the allocation might trigger a request on the device that
>> would try to take init_lock again, but in fact it doesn't. The device
>> is not initialized yet, even less swapped on.
> 
> That's not a only swap case.
> Let's think following usecase.
> 
> 1) Booting
> 2) echo $((DISKSIZE)) > /sys/block/zram0/disksize
> 3) dd if=/dev/zero of=/dev/zram0 bs=4K count=1
> 4) Written 4K page(page-A) is still page cache and isn't submitted
>    to zram block device.
> 5) Memory pressure happen by some memory hogger.
> 6) VM start to reclaim and write page-A to zram0.
> 7) zram_init_device is called at last.
> 8) allocate GFP_KERNEL in zram_init_device
> 9) goto reclaim path again.
> 10) deadlock.
> 
> So I think it's not false positive.

I guess you're right. That's a scenario I haven't imagined. At any rate, my
patch fixes that.

> Even if it is, I think lock split isn't a good idea to just avoid
> lockdep warn. It makes code unnecessary complicated and it would be more
> error-prone. Let's not add another lock without performance trouble report
> by the lock.
> 
> As I discussed with Nitin in this thread, lazy initialization don't have
> much point and disksize setting option isn't consistent for user behavior.
> And I expect Nitin will send patch "diet of table" soonish.
> 
> So just moving the initialzation part from reclaim context to process's one
> is simple and clear solution, I believe.

Although that would avoid deadlocks (I guess, I'm not sure anymore...), it
won't stop lockdep from complaining. It still makes an allocation while
holding a lock that is also taken in a reclaim context.
Anyway, I like the idea to removes the lazy initialization. It makes things
more complicated without any actual advantage.

Jerome

> 
>>
>> The following (quickly tested) patch should prevent lockdep complain.  
>>
>> Jerome
>>
>> ---
>> >From ebb3514c4ee18276da7c5ca08025991b493ac204 Mon Sep 17 00:00:00 2001
>> From: Jerome Marchand <jmarchan@redhat.com>
>> Date: Thu, 22 Nov 2012 09:07:40 +0100
>> Subject: [PATCH] staging: zram: Avoid lockdep warning
>>
>> zram triggers a lockdep warning. The cause of it is the call to
>> zram_init_device() from zram_make_request(). The memory allocation in
>> zram_init_device() could start a memory reclaim which in turn could
>> cause swapout and (as it appears to lockdep) a call to
>> zram_make_request(). However this is a false positive: an
>> unititialized device can't be used as swap.
>> A solution is to split init_lock in two lock. One mutex that protects
>> init, reset and size setting and a rw_semaphore that protects requests
>> and reset. Thus init and request would be protected by different locks
>> and lockdep will be happy.
>>
>> Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
>> ---
>>  drivers/staging/zram/zram_drv.c   |   41 +++++++++++++++++++-----------------
>>  drivers/staging/zram/zram_drv.h   |   16 ++++++++++---
>>  drivers/staging/zram/zram_sysfs.c |   20 +++++++++---------
>>  3 files changed, 44 insertions(+), 33 deletions(-)
>>
>> diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
>> index fb4a7c9..b3bc3c4 100644
>> --- a/drivers/staging/zram/zram_drv.c
>> +++ b/drivers/staging/zram/zram_drv.c
>> @@ -470,11 +470,11 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio)
>>  {
>>  	struct zram *zram = queue->queuedata;
>>  
>> -	if (unlikely(!zram->init_done) && zram_init_device(zram))
>> +	if (unlikely(!is_initialized(zram)) && zram_init_device(zram))
>>  		goto error;
>>  
>> -	down_read(&zram->init_lock);
>> -	if (unlikely(!zram->init_done))
>> +	down_read(&zram->req_lock);
>> +	if (unlikely(!is_initialized(zram)))
>>  		goto error_unlock;
>>  
>>  	if (!valid_io_request(zram, bio)) {
>> @@ -483,12 +483,12 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio)
>>  	}
>>  
>>  	__zram_make_request(zram, bio, bio_data_dir(bio));
>> -	up_read(&zram->init_lock);
>> +	up_read(&zram->req_lock);
>>  
>>  	return;
>>  
>>  error_unlock:
>> -	up_read(&zram->init_lock);
>> +	up_read(&zram->req_lock);
>>  error:
>>  	bio_io_error(bio);
>>  }
>> @@ -497,7 +497,7 @@ void __zram_reset_device(struct zram *zram)
>>  {
>>  	size_t index;
>>  
>> -	zram->init_done = 0;
>> +	atomic_set(&zram->init_done, 0);
>>  
>>  	/* Free various per-device buffers */
>>  	kfree(zram->compress_workmem);
>> @@ -529,9 +529,12 @@ void __zram_reset_device(struct zram *zram)
>>  
>>  void zram_reset_device(struct zram *zram)
>>  {
>> -	down_write(&zram->init_lock);
>> -	__zram_reset_device(zram);
>> -	up_write(&zram->init_lock);
>> +	mutex_lock(&zram->init_lock);
>> +	down_write(&zram->req_lock);
>> +	if (is_initialized(zram))
>> +		__zram_reset_device(zram);
>> +	up_write(&zram->req_lock);
>> +	mutex_unlock(&zram->init_lock);
>>  }
>>  
>>  int zram_init_device(struct zram *zram)
>> @@ -539,10 +542,10 @@ int zram_init_device(struct zram *zram)
>>  	int ret;
>>  	size_t num_pages;
>>  
>> -	down_write(&zram->init_lock);
>> +	mutex_lock(&zram->init_lock);
>>  
>> -	if (zram->init_done) {
>> -		up_write(&zram->init_lock);
>> +	if (is_initialized(zram)) {
>> +		mutex_unlock(&zram->init_lock);
>>  		return 0;
>>  	}
>>  
>> @@ -583,8 +586,8 @@ int zram_init_device(struct zram *zram)
>>  		goto fail;
>>  	}
>>  
>> -	zram->init_done = 1;
>> -	up_write(&zram->init_lock);
>> +	atomic_set(&zram->init_done, 1);
>> +	mutex_unlock(&zram->init_lock);
>>  
>>  	pr_debug("Initialization done!\n");
>>  	return 0;
>> @@ -594,7 +597,7 @@ fail_no_table:
>>  	zram->disksize = 0;
>>  fail:
>>  	__zram_reset_device(zram);
>> -	up_write(&zram->init_lock);
>> +	mutex_unlock(&zram->init_lock);
>>  	pr_err("Initialization failed: err=%d\n", ret);
>>  	return ret;
>>  }
>> @@ -619,7 +622,8 @@ static int create_device(struct zram *zram, int device_id)
>>  	int ret = 0;
>>  
>>  	init_rwsem(&zram->lock);
>> -	init_rwsem(&zram->init_lock);
>> +	mutex_init(&zram->init_lock);
>> +	init_rwsem(&zram->req_lock);
>>  	spin_lock_init(&zram->stat64_lock);
>>  
>>  	zram->queue = blk_alloc_queue(GFP_KERNEL);
>> @@ -672,7 +676,7 @@ static int create_device(struct zram *zram, int device_id)
>>  		goto out;
>>  	}
>>  
>> -	zram->init_done = 0;
>> +	atomic_set(&zram->init_done, 0);
>>  
>>  out:
>>  	return ret;
>> @@ -755,8 +759,7 @@ static void __exit zram_exit(void)
>>  		zram = &zram_devices[i];
>>  
>>  		destroy_device(zram);
>> -		if (zram->init_done)
>> -			zram_reset_device(zram);
>> +		zram_reset_device(zram);
>>  	}
>>  
>>  	unregister_blkdev(zram_major, "zram");
>> diff --git a/drivers/staging/zram/zram_drv.h b/drivers/staging/zram/zram_drv.h
>> index df2eec4..f6bcead 100644
>> --- a/drivers/staging/zram/zram_drv.h
>> +++ b/drivers/staging/zram/zram_drv.h
>> @@ -96,9 +96,12 @@ struct zram {
>>  				   * against concurrent read and writes */
>>  	struct request_queue *queue;
>>  	struct gendisk *disk;
>> -	int init_done;
>> -	/* Prevent concurrent execution of device init, reset and R/W request */
>> -	struct rw_semaphore init_lock;
>> +	atomic_t init_done;
>> +	/* Prevent concurrent execution of device init, reset and
>> +	 * disksize_store */
>> +	struct mutex init_lock;
>> +	/* Prevent concurent execution device reset and R/W requests */
>> +	struct rw_semaphore req_lock;
>>  	/*
>>  	 * This is the limit on amount of *uncompressed* worth of data
>>  	 * we can store in a disk.
>> @@ -108,6 +111,11 @@ struct zram {
>>  	struct zram_stats stats;
>>  };
>>  
>> +static inline int is_initialized(struct zram *zram)
>> +{
>> +	return atomic_read(&zram->init_done);
>> +}
>> +
>>  extern struct zram *zram_devices;
>>  unsigned int zram_get_num_devices(void);
>>  #ifdef CONFIG_SYSFS
>> @@ -115,6 +123,6 @@ extern struct attribute_group zram_disk_attr_group;
>>  #endif
>>  
>>  extern int zram_init_device(struct zram *zram);
>> -extern void __zram_reset_device(struct zram *zram);
>> +extern void zram_reset_device(struct zram *zram);
>>  
>>  #endif
>> diff --git a/drivers/staging/zram/zram_sysfs.c b/drivers/staging/zram/zram_sysfs.c
>> index de1eacf..b300881 100644
>> --- a/drivers/staging/zram/zram_sysfs.c
>> +++ b/drivers/staging/zram/zram_sysfs.c
>> @@ -62,16 +62,19 @@ static ssize_t disksize_store(struct device *dev,
>>  	if (!disksize)
>>  		return -EINVAL;
>>  
>> -	down_write(&zram->init_lock);
>> -	if (zram->init_done) {
>> -		up_write(&zram->init_lock);
>> +	mutex_lock(&zram->init_lock);
>> +	down_write(&zram->req_lock);
>> +	if (is_initialized(zram)) {
>> +		up_write(&zram->req_lock);
>> +		mutex_unlock(&zram->init_lock);
>>  		pr_info("Cannot change disksize for initialized device\n");
>>  		return -EBUSY;
>>  	}
>>  
>>  	zram->disksize = PAGE_ALIGN(disksize);
>>  	set_capacity(zram->disk, zram->disksize >> SECTOR_SHIFT);
>> -	up_write(&zram->init_lock);
>> +	up_write(&zram->req_lock);
>> +	mutex_unlock(&zram->init_lock);
>>  
>>  	return len;
>>  }
>> @@ -81,7 +84,7 @@ static ssize_t initstate_show(struct device *dev,
>>  {
>>  	struct zram *zram = dev_to_zram(dev);
>>  
>> -	return sprintf(buf, "%u\n", zram->init_done);
>> +	return sprintf(buf, "%u\n", atomic_read(&zram->init_done));
>>  }
>>  
>>  static ssize_t reset_store(struct device *dev,
>> @@ -110,10 +113,7 @@ static ssize_t reset_store(struct device *dev,
>>  	if (bdev)
>>  		fsync_bdev(bdev);
>>  
>> -	down_write(&zram->init_lock);
>> -	if (zram->init_done)
>> -		__zram_reset_device(zram);
>> -	up_write(&zram->init_lock);
>> +	zram_reset_device(zram);
>>  
>>  	return len;
>>  }
>> @@ -186,7 +186,7 @@ static ssize_t mem_used_total_show(struct device *dev,
>>  	u64 val = 0;
>>  	struct zram *zram = dev_to_zram(dev);
>>  
>> -	if (zram->init_done)
>> +	if (is_initialized(zram))
>>  		val = zs_get_total_size_bytes(zram->mem_pool);
>>  
>>  	return sprintf(buf, "%llu\n", val);
>> -- 
>> 1.7.7.6
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Lockdep complain for zram
  2012-11-23 14:46     ` Jerome Marchand
@ 2012-11-28  5:20       ` Minchan Kim
  0 siblings, 0 replies; 8+ messages in thread
From: Minchan Kim @ 2012-11-28  5:20 UTC (permalink / raw)
  To: Jerome Marchand
  Cc: Nitin Gupta, Seth Jennings, linux-mm, linux-kernel, Pekka Enberg,
	Konrad Rzeszutek Wilk, Dan Magenheimer, Andrew Morton,
	Greg Kroah-Hartman

On Fri, Nov 23, 2012 at 03:46:31PM +0100, Jerome Marchand wrote:
> On 11/23/2012 12:34 AM, Minchan Kim wrote:
> > On Thu, Nov 22, 2012 at 12:13:24PM +0100, Jerome Marchand wrote:
> >> On 11/21/2012 09:37 AM, Minchan Kim wrote:
> >>> Hi alls,
> >>>
> >>> Today, I saw below complain of lockdep.
> >>> As a matter of fact, I knew it long time ago but forgot that.
> >>> The reason lockdep complains is that now zram uses GFP_KERNEL
> >>> in reclaim path(ex, __zram_make_request) :(
> >>> I can fix it via replacing GFP_KERNEL with GFP_NOIO.
> >>> But more big problem is vzalloc in zram_init_device which calls GFP_KERNEL.
> >>> Of course, I can change it with __vmalloc which can receive gfp_t.
> >>> But still we have a problem. Althoug __vmalloc can handle gfp_t, it calls
> >>> allocation of GFP_KERNEL. That's why I sent the patch.
> >>> https://lkml.org/lkml/2012/4/23/77
> >>> Since then, I forgot it, saw the bug today and poped the question again.
> >>>
> >>> Yes. Fundamental problem is utter crap API vmalloc.
> >>> If we can fix it, everyone would be happy. But life isn't simple like seeing
> >>> my thread of the patch.
> >>>
> >>> So next option is to move zram_init_device into setting disksize time.
> >>> But it makes unnecessary metadata waste until zram is used really(That's why
> >>> Nitin move zram_init_device from disksize setting time to make_request) and
> >>> it makes user should set the disksize before using, which are behavior change.
> >>>
> >>> I would like to clean up this issue before promoting because it might change
> >>> usage behavior.
> >>>
> >>> Do you have any idea?
> >>
> >> This is a false positive due to the memory allocation in
> >> zram_init_device() called from zram_make_request(). It appears to
> >> lockdep that the allocation might trigger a request on the device that
> >> would try to take init_lock again, but in fact it doesn't. The device
> >> is not initialized yet, even less swapped on.
> > 
> > That's not a only swap case.
> > Let's think following usecase.
> > 
> > 1) Booting
> > 2) echo $((DISKSIZE)) > /sys/block/zram0/disksize
> > 3) dd if=/dev/zero of=/dev/zram0 bs=4K count=1
> > 4) Written 4K page(page-A) is still page cache and isn't submitted
> >    to zram block device.
> > 5) Memory pressure happen by some memory hogger.
> > 6) VM start to reclaim and write page-A to zram0.
> > 7) zram_init_device is called at last.
> > 8) allocate GFP_KERNEL in zram_init_device
> > 9) goto reclaim path again.
> > 10) deadlock.
> > 
> > So I think it's not false positive.
> 
> I guess you're right. That's a scenario I haven't imagined. At any rate, my
> patch fixes that.
> 
> > Even if it is, I think lock split isn't a good idea to just avoid
> > lockdep warn. It makes code unnecessary complicated and it would be more
> > error-prone. Let's not add another lock without performance trouble report
> > by the lock.
> > 
> > As I discussed with Nitin in this thread, lazy initialization don't have
> > much point and disksize setting option isn't consistent for user behavior.
> > And I expect Nitin will send patch "diet of table" soonish.
> > 
> > So just moving the initialzation part from reclaim context to process's one
> > is simple and clear solution, I believe.
> 
> Although that would avoid deadlocks (I guess, I'm not sure anymore...), it
> won't stop lockdep from complaining. It still makes an allocation while

Argh, I sent it by mistake anyway, It's false-positive by this patch now.
Anyway we need more patch to shut lockdep up. I just sent patchset.

> holding a lock that is also taken in a reclaim context.
> Anyway, I like the idea to removes the lazy initialization. It makes things
> more complicated without any actual advantage.

Thanks for the review, Jerome.

> 
> Jerome
> 
> > 
> >>
> >> The following (quickly tested) patch should prevent lockdep complain.  
> >>
> >> Jerome
> >>
> >> ---
> >> >From ebb3514c4ee18276da7c5ca08025991b493ac204 Mon Sep 17 00:00:00 2001
> >> From: Jerome Marchand <jmarchan@redhat.com>
> >> Date: Thu, 22 Nov 2012 09:07:40 +0100
> >> Subject: [PATCH] staging: zram: Avoid lockdep warning
> >>
> >> zram triggers a lockdep warning. The cause of it is the call to
> >> zram_init_device() from zram_make_request(). The memory allocation in
> >> zram_init_device() could start a memory reclaim which in turn could
> >> cause swapout and (as it appears to lockdep) a call to
> >> zram_make_request(). However this is a false positive: an
> >> unititialized device can't be used as swap.
> >> A solution is to split init_lock in two lock. One mutex that protects
> >> init, reset and size setting and a rw_semaphore that protects requests
> >> and reset. Thus init and request would be protected by different locks
> >> and lockdep will be happy.
> >>
> >> Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
> >> ---
> >>  drivers/staging/zram/zram_drv.c   |   41 +++++++++++++++++++-----------------
> >>  drivers/staging/zram/zram_drv.h   |   16 ++++++++++---
> >>  drivers/staging/zram/zram_sysfs.c |   20 +++++++++---------
> >>  3 files changed, 44 insertions(+), 33 deletions(-)
> >>
> >> diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
> >> index fb4a7c9..b3bc3c4 100644
> >> --- a/drivers/staging/zram/zram_drv.c
> >> +++ b/drivers/staging/zram/zram_drv.c
> >> @@ -470,11 +470,11 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio)
> >>  {
> >>  	struct zram *zram = queue->queuedata;
> >>  
> >> -	if (unlikely(!zram->init_done) && zram_init_device(zram))
> >> +	if (unlikely(!is_initialized(zram)) && zram_init_device(zram))
> >>  		goto error;
> >>  
> >> -	down_read(&zram->init_lock);
> >> -	if (unlikely(!zram->init_done))
> >> +	down_read(&zram->req_lock);
> >> +	if (unlikely(!is_initialized(zram)))
> >>  		goto error_unlock;
> >>  
> >>  	if (!valid_io_request(zram, bio)) {
> >> @@ -483,12 +483,12 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio)
> >>  	}
> >>  
> >>  	__zram_make_request(zram, bio, bio_data_dir(bio));
> >> -	up_read(&zram->init_lock);
> >> +	up_read(&zram->req_lock);
> >>  
> >>  	return;
> >>  
> >>  error_unlock:
> >> -	up_read(&zram->init_lock);
> >> +	up_read(&zram->req_lock);
> >>  error:
> >>  	bio_io_error(bio);
> >>  }
> >> @@ -497,7 +497,7 @@ void __zram_reset_device(struct zram *zram)
> >>  {
> >>  	size_t index;
> >>  
> >> -	zram->init_done = 0;
> >> +	atomic_set(&zram->init_done, 0);
> >>  
> >>  	/* Free various per-device buffers */
> >>  	kfree(zram->compress_workmem);
> >> @@ -529,9 +529,12 @@ void __zram_reset_device(struct zram *zram)
> >>  
> >>  void zram_reset_device(struct zram *zram)
> >>  {
> >> -	down_write(&zram->init_lock);
> >> -	__zram_reset_device(zram);
> >> -	up_write(&zram->init_lock);
> >> +	mutex_lock(&zram->init_lock);
> >> +	down_write(&zram->req_lock);
> >> +	if (is_initialized(zram))
> >> +		__zram_reset_device(zram);
> >> +	up_write(&zram->req_lock);
> >> +	mutex_unlock(&zram->init_lock);
> >>  }
> >>  
> >>  int zram_init_device(struct zram *zram)
> >> @@ -539,10 +542,10 @@ int zram_init_device(struct zram *zram)
> >>  	int ret;
> >>  	size_t num_pages;
> >>  
> >> -	down_write(&zram->init_lock);
> >> +	mutex_lock(&zram->init_lock);
> >>  
> >> -	if (zram->init_done) {
> >> -		up_write(&zram->init_lock);
> >> +	if (is_initialized(zram)) {
> >> +		mutex_unlock(&zram->init_lock);
> >>  		return 0;
> >>  	}
> >>  
> >> @@ -583,8 +586,8 @@ int zram_init_device(struct zram *zram)
> >>  		goto fail;
> >>  	}
> >>  
> >> -	zram->init_done = 1;
> >> -	up_write(&zram->init_lock);
> >> +	atomic_set(&zram->init_done, 1);
> >> +	mutex_unlock(&zram->init_lock);
> >>  
> >>  	pr_debug("Initialization done!\n");
> >>  	return 0;
> >> @@ -594,7 +597,7 @@ fail_no_table:
> >>  	zram->disksize = 0;
> >>  fail:
> >>  	__zram_reset_device(zram);
> >> -	up_write(&zram->init_lock);
> >> +	mutex_unlock(&zram->init_lock);
> >>  	pr_err("Initialization failed: err=%d\n", ret);
> >>  	return ret;
> >>  }
> >> @@ -619,7 +622,8 @@ static int create_device(struct zram *zram, int device_id)
> >>  	int ret = 0;
> >>  
> >>  	init_rwsem(&zram->lock);
> >> -	init_rwsem(&zram->init_lock);
> >> +	mutex_init(&zram->init_lock);
> >> +	init_rwsem(&zram->req_lock);
> >>  	spin_lock_init(&zram->stat64_lock);
> >>  
> >>  	zram->queue = blk_alloc_queue(GFP_KERNEL);
> >> @@ -672,7 +676,7 @@ static int create_device(struct zram *zram, int device_id)
> >>  		goto out;
> >>  	}
> >>  
> >> -	zram->init_done = 0;
> >> +	atomic_set(&zram->init_done, 0);
> >>  
> >>  out:
> >>  	return ret;
> >> @@ -755,8 +759,7 @@ static void __exit zram_exit(void)
> >>  		zram = &zram_devices[i];
> >>  
> >>  		destroy_device(zram);
> >> -		if (zram->init_done)
> >> -			zram_reset_device(zram);
> >> +		zram_reset_device(zram);
> >>  	}
> >>  
> >>  	unregister_blkdev(zram_major, "zram");
> >> diff --git a/drivers/staging/zram/zram_drv.h b/drivers/staging/zram/zram_drv.h
> >> index df2eec4..f6bcead 100644
> >> --- a/drivers/staging/zram/zram_drv.h
> >> +++ b/drivers/staging/zram/zram_drv.h
> >> @@ -96,9 +96,12 @@ struct zram {
> >>  				   * against concurrent read and writes */
> >>  	struct request_queue *queue;
> >>  	struct gendisk *disk;
> >> -	int init_done;
> >> -	/* Prevent concurrent execution of device init, reset and R/W request */
> >> -	struct rw_semaphore init_lock;
> >> +	atomic_t init_done;
> >> +	/* Prevent concurrent execution of device init, reset and
> >> +	 * disksize_store */
> >> +	struct mutex init_lock;
> >> +	/* Prevent concurent execution device reset and R/W requests */
> >> +	struct rw_semaphore req_lock;
> >>  	/*
> >>  	 * This is the limit on amount of *uncompressed* worth of data
> >>  	 * we can store in a disk.
> >> @@ -108,6 +111,11 @@ struct zram {
> >>  	struct zram_stats stats;
> >>  };
> >>  
> >> +static inline int is_initialized(struct zram *zram)
> >> +{
> >> +	return atomic_read(&zram->init_done);
> >> +}
> >> +
> >>  extern struct zram *zram_devices;
> >>  unsigned int zram_get_num_devices(void);
> >>  #ifdef CONFIG_SYSFS
> >> @@ -115,6 +123,6 @@ extern struct attribute_group zram_disk_attr_group;
> >>  #endif
> >>  
> >>  extern int zram_init_device(struct zram *zram);
> >> -extern void __zram_reset_device(struct zram *zram);
> >> +extern void zram_reset_device(struct zram *zram);
> >>  
> >>  #endif
> >> diff --git a/drivers/staging/zram/zram_sysfs.c b/drivers/staging/zram/zram_sysfs.c
> >> index de1eacf..b300881 100644
> >> --- a/drivers/staging/zram/zram_sysfs.c
> >> +++ b/drivers/staging/zram/zram_sysfs.c
> >> @@ -62,16 +62,19 @@ static ssize_t disksize_store(struct device *dev,
> >>  	if (!disksize)
> >>  		return -EINVAL;
> >>  
> >> -	down_write(&zram->init_lock);
> >> -	if (zram->init_done) {
> >> -		up_write(&zram->init_lock);
> >> +	mutex_lock(&zram->init_lock);
> >> +	down_write(&zram->req_lock);
> >> +	if (is_initialized(zram)) {
> >> +		up_write(&zram->req_lock);
> >> +		mutex_unlock(&zram->init_lock);
> >>  		pr_info("Cannot change disksize for initialized device\n");
> >>  		return -EBUSY;
> >>  	}
> >>  
> >>  	zram->disksize = PAGE_ALIGN(disksize);
> >>  	set_capacity(zram->disk, zram->disksize >> SECTOR_SHIFT);
> >> -	up_write(&zram->init_lock);
> >> +	up_write(&zram->req_lock);
> >> +	mutex_unlock(&zram->init_lock);
> >>  
> >>  	return len;
> >>  }
> >> @@ -81,7 +84,7 @@ static ssize_t initstate_show(struct device *dev,
> >>  {
> >>  	struct zram *zram = dev_to_zram(dev);
> >>  
> >> -	return sprintf(buf, "%u\n", zram->init_done);
> >> +	return sprintf(buf, "%u\n", atomic_read(&zram->init_done));
> >>  }
> >>  
> >>  static ssize_t reset_store(struct device *dev,
> >> @@ -110,10 +113,7 @@ static ssize_t reset_store(struct device *dev,
> >>  	if (bdev)
> >>  		fsync_bdev(bdev);
> >>  
> >> -	down_write(&zram->init_lock);
> >> -	if (zram->init_done)
> >> -		__zram_reset_device(zram);
> >> -	up_write(&zram->init_lock);
> >> +	zram_reset_device(zram);
> >>  
> >>  	return len;
> >>  }
> >> @@ -186,7 +186,7 @@ static ssize_t mem_used_total_show(struct device *dev,
> >>  	u64 val = 0;
> >>  	struct zram *zram = dev_to_zram(dev);
> >>  
> >> -	if (zram->init_done)
> >> +	if (is_initialized(zram))
> >>  		val = zs_get_total_size_bytes(zram->mem_pool);
> >>  
> >>  	return sprintf(buf, "%llu\n", val);
> >> -- 
> >> 1.7.7.6
> >>
> >> --
> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> >> the body to majordomo@kvack.org.  For more info on Linux MM,
> >> see: http://www.linux-mm.org/ .
> >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> > 
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2012-11-28  5:20 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-11-21  8:37 Lockdep complain for zram Minchan Kim
2012-11-21 18:06 ` Nitin Gupta
     [not found]   ` <20121122083110.GC5121@bbox>
2012-11-22 10:08     ` Nitin Gupta
2012-11-22 23:20       ` Minchan Kim
2012-11-22 11:13 ` Jerome Marchand
2012-11-22 23:34   ` Minchan Kim
2012-11-23 14:46     ` Jerome Marchand
2012-11-28  5:20       ` Minchan Kim

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).