* [PATCH] dm bufio: fix deadlock issue with loop device
@ 2019-07-02 23:14 Junxiao Bi
  2019-07-03 16:21 ` Mike Snitzer
  2019-08-08  9:13 ` [PATCH] " Mikulas Patocka
  0 siblings, 2 replies; 8+ messages in thread
From: Junxiao Bi @ 2019-07-02 23:14 UTC (permalink / raw)
  To: dm-devel; +Cc: honglei.wang, mpatocka, agk, snitzer

The following deadlock was captured on a 4.1 kernel, where dm_bufio_shrink_count
still acquired the bufio lock; that part has already been fixed in mainline.
However, the shrinker also invokes dm_bufio_shrink_scan through ->scan_objects,
so mainline looks susceptible to the same deadlock.

The deadlock happened when both kswapd0 and loop1 were shrinking memory:
kswapd0 held the bufio lock and was waiting for an in-flight io to complete,
but that io could never complete because loop1, which was issuing it, was
blocked on the same lock held by kswapd0. This is an ABBA deadlock.

The gfp flags used in direct IO are GFP_KERNEL, so checking __GFP_FS/__GFP_IO
does not help; instead, just stop shrinking whenever the lock is held by
someone else.
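
To make the flag reasoning concrete, here is a minimal sketch (not part of
the patch; the helper below is hypothetical) of why the existing __GFP_FS
test takes the blocking path in the loop1 trace below:

#include <linux/gfp.h>

/*
 * Sketch only, not part of the patch.  GFP_KERNEL includes __GFP_FS and
 * __GFP_IO, and direct reclaim hands the allocation's gfp_mask to the
 * shrinker through struct shrink_control, so the pre-patch check below
 * ends up taking the blocking dm_bufio_lock() from the loop worker.
 */
static bool bufio_scan_would_block(gfp_t alloc_gfp)	/* hypothetical helper */
{
	gfp_t sc_gfp = alloc_gfp;	/* what shrink_slab() places in sc->gfp_mask */

	if (sc_gfp & __GFP_FS)
		return true;	/* old code: dm_bufio_lock(c), may sleep */
	return false;		/* old code: dm_bufio_trylock(c), bails out if held */
}

/* bufio_scan_would_block(GFP_KERNEL) == true, hence the loop1 hang below. */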

PID: 474    TASK: ffff8813e11f4600  CPU: 10  COMMAND: "kswapd0"
   #0 [ffff8813dedfb938] __schedule at ffffffff8173f405
   #1 [ffff8813dedfb990] schedule at ffffffff8173fa27
   #2 [ffff8813dedfb9b0] schedule_timeout at ffffffff81742fec
   #3 [ffff8813dedfba60] io_schedule_timeout at ffffffff8173f186
   #4 [ffff8813dedfbaa0] bit_wait_io at ffffffff8174034f
   #5 [ffff8813dedfbac0] __wait_on_bit at ffffffff8173fec8
   #6 [ffff8813dedfbb10] out_of_line_wait_on_bit at ffffffff8173ff81
   #7 [ffff8813dedfbb90] __make_buffer_clean at ffffffffa038736f [dm_bufio]
   #8 [ffff8813dedfbbb0] __try_evict_buffer at ffffffffa0387bb8 [dm_bufio]
   #9 [ffff8813dedfbbd0] dm_bufio_shrink_scan at ffffffffa0387cc3 [dm_bufio]
  #10 [ffff8813dedfbc40] shrink_slab at ffffffff811a87ce
  #11 [ffff8813dedfbd30] shrink_zone at ffffffff811ad778
  #12 [ffff8813dedfbdc0] kswapd at ffffffff811ae92f
  #13 [ffff8813dedfbec0] kthread at ffffffff810a8428
  #14 [ffff8813dedfbf50] ret_from_fork at ffffffff81745242

  PID: 14127  TASK: ffff881455749c00  CPU: 11  COMMAND: "loop1"
   #0 [ffff88272f5af228] __schedule at ffffffff8173f405
   #1 [ffff88272f5af280] schedule at ffffffff8173fa27
   #2 [ffff88272f5af2a0] schedule_preempt_disabled at ffffffff8173fd5e
   #3 [ffff88272f5af2b0] __mutex_lock_slowpath at ffffffff81741fb5
   #4 [ffff88272f5af330] mutex_lock at ffffffff81742133
   #5 [ffff88272f5af350] dm_bufio_shrink_count at ffffffffa03865f9 [dm_bufio]
   #6 [ffff88272f5af380] shrink_slab at ffffffff811a86bd
   #7 [ffff88272f5af470] shrink_zone at ffffffff811ad778
   #8 [ffff88272f5af500] do_try_to_free_pages at ffffffff811adb34
   #9 [ffff88272f5af590] try_to_free_pages at ffffffff811adef8
  #10 [ffff88272f5af610] __alloc_pages_nodemask at ffffffff811a09c3
  #11 [ffff88272f5af710] alloc_pages_current at ffffffff811e8b71
  #12 [ffff88272f5af760] new_slab at ffffffff811f4523
  #13 [ffff88272f5af7b0] __slab_alloc at ffffffff8173a1b5
  #14 [ffff88272f5af880] kmem_cache_alloc at ffffffff811f484b
  #15 [ffff88272f5af8d0] do_blockdev_direct_IO at ffffffff812535b3
  #16 [ffff88272f5afb00] __blockdev_direct_IO at ffffffff81255dc3
  #17 [ffff88272f5afb30] xfs_vm_direct_IO at ffffffffa01fe3fc [xfs]
  #18 [ffff88272f5afb90] generic_file_read_iter at ffffffff81198994
  #19 [ffff88272f5afc50] __dta_xfs_file_read_iter_2398 at ffffffffa020c970 [xfs]
  #20 [ffff88272f5afcc0] lo_rw_aio at ffffffffa0377042 [loop]
  #21 [ffff88272f5afd70] loop_queue_work at ffffffffa0377c3b [loop]
  #22 [ffff88272f5afe60] kthread_worker_fn at ffffffff810a8a0c
  #23 [ffff88272f5afec0] kthread at ffffffff810a8428
  #24 [ffff88272f5aff50] ret_from_fork at ffffffff81745242

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
---
 drivers/md/dm-bufio.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
index 2a48ea3f1b30..b6b5acc92ca2 100644
--- a/drivers/md/dm-bufio.c
+++ b/drivers/md/dm-bufio.c
@@ -1599,9 +1599,7 @@ dm_bufio_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 	unsigned long freed;
 
 	c = container_of(shrink, struct dm_bufio_client, shrinker);
-	if (sc->gfp_mask & __GFP_FS)
-		dm_bufio_lock(c);
-	else if (!dm_bufio_trylock(c))
+	if (!dm_bufio_trylock(c))
 		return SHRINK_STOP;
 
 	freed  = __scan(c, sc->nr_to_scan, sc->gfp_mask);
-- 
2.17.1


* Re: dm bufio: fix deadlock issue with loop device
  2019-07-02 23:14 [PATCH] dm bufio: fix deadlock issue with loop device Junxiao Bi
@ 2019-07-03 16:21 ` Mike Snitzer
  2019-07-03 17:19   ` Junxiao Bi
  2019-08-08  9:13 ` [PATCH] " Mikulas Patocka
  1 sibling, 1 reply; 8+ messages in thread
From: Mike Snitzer @ 2019-07-03 16:21 UTC (permalink / raw)
  To: Junxiao Bi; +Cc: honglei.wang, dm-devel, mpatocka, agk

On Tue, Jul 02 2019 at  7:14pm -0400,
Junxiao Bi <junxiao.bi@oracle.com> wrote:

> The following deadlock was captured on a 4.1 kernel, where dm_bufio_shrink_count
> still acquired the bufio lock; that part has already been fixed in mainline.
> However, the shrinker also invokes dm_bufio_shrink_scan through ->scan_objects,
> so mainline looks susceptible to the same deadlock.
> 
> The deadlock happened when both kswapd0 and loop1 were shrinking memory:
> kswapd0 held the bufio lock and was waiting for an in-flight io to complete,
> but that io could never complete because loop1, which was issuing it, was
> blocked on the same lock held by kswapd0. This is an ABBA deadlock.
> 
> The gfp flags used in direct IO are GFP_KERNEL, so checking __GFP_FS/__GFP_IO
> does not help; instead, just stop shrinking whenever the lock is held by
> someone else.
> 
> PID: 474    TASK: ffff8813e11f4600  CPU: 10  COMMAND: "kswapd0"
>    #0 [ffff8813dedfb938] __schedule at ffffffff8173f405
>    #1 [ffff8813dedfb990] schedule at ffffffff8173fa27
>    #2 [ffff8813dedfb9b0] schedule_timeout at ffffffff81742fec
>    #3 [ffff8813dedfba60] io_schedule_timeout at ffffffff8173f186
>    #4 [ffff8813dedfbaa0] bit_wait_io at ffffffff8174034f
>    #5 [ffff8813dedfbac0] __wait_on_bit at ffffffff8173fec8
>    #6 [ffff8813dedfbb10] out_of_line_wait_on_bit at ffffffff8173ff81
>    #7 [ffff8813dedfbb90] __make_buffer_clean at ffffffffa038736f [dm_bufio]
>    #8 [ffff8813dedfbbb0] __try_evict_buffer at ffffffffa0387bb8 [dm_bufio]
>    #9 [ffff8813dedfbbd0] dm_bufio_shrink_scan at ffffffffa0387cc3 [dm_bufio]
>   #10 [ffff8813dedfbc40] shrink_slab at ffffffff811a87ce
>   #11 [ffff8813dedfbd30] shrink_zone at ffffffff811ad778
>   #12 [ffff8813dedfbdc0] kswapd at ffffffff811ae92f
>   #13 [ffff8813dedfbec0] kthread at ffffffff810a8428
>   #14 [ffff8813dedfbf50] ret_from_fork at ffffffff81745242
> 
>   PID: 14127  TASK: ffff881455749c00  CPU: 11  COMMAND: "loop1"
>    #0 [ffff88272f5af228] __schedule at ffffffff8173f405
>    #1 [ffff88272f5af280] schedule at ffffffff8173fa27
>    #2 [ffff88272f5af2a0] schedule_preempt_disabled at ffffffff8173fd5e
>    #3 [ffff88272f5af2b0] __mutex_lock_slowpath at ffffffff81741fb5
>    #4 [ffff88272f5af330] mutex_lock at ffffffff81742133
>    #5 [ffff88272f5af350] dm_bufio_shrink_count at ffffffffa03865f9 [dm_bufio]
>    #6 [ffff88272f5af380] shrink_slab at ffffffff811a86bd
>    #7 [ffff88272f5af470] shrink_zone at ffffffff811ad778
>    #8 [ffff88272f5af500] do_try_to_free_pages at ffffffff811adb34
>    #9 [ffff88272f5af590] try_to_free_pages at ffffffff811adef8
>   #10 [ffff88272f5af610] __alloc_pages_nodemask at ffffffff811a09c3
>   #11 [ffff88272f5af710] alloc_pages_current at ffffffff811e8b71
>   #12 [ffff88272f5af760] new_slab at ffffffff811f4523
>   #13 [ffff88272f5af7b0] __slab_alloc at ffffffff8173a1b5
>   #14 [ffff88272f5af880] kmem_cache_alloc at ffffffff811f484b
>   #15 [ffff88272f5af8d0] do_blockdev_direct_IO at ffffffff812535b3
>   #16 [ffff88272f5afb00] __blockdev_direct_IO at ffffffff81255dc3
>   #17 [ffff88272f5afb30] xfs_vm_direct_IO at ffffffffa01fe3fc [xfs]
>   #18 [ffff88272f5afb90] generic_file_read_iter at ffffffff81198994
>   #19 [ffff88272f5afc50] __dta_xfs_file_read_iter_2398 at ffffffffa020c970 [xfs]
>   #20 [ffff88272f5afcc0] lo_rw_aio at ffffffffa0377042 [loop]
>   #21 [ffff88272f5afd70] loop_queue_work at ffffffffa0377c3b [loop]
>   #22 [ffff88272f5afe60] kthread_worker_fn at ffffffff810a8a0c
>   #23 [ffff88272f5afec0] kthread at ffffffff810a8428
>   #24 [ffff88272f5aff50] ret_from_fork at ffffffff81745242
> 
> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
> ---
>  drivers/md/dm-bufio.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
> index 2a48ea3f1b30..b6b5acc92ca2 100644
> --- a/drivers/md/dm-bufio.c
> +++ b/drivers/md/dm-bufio.c
> @@ -1599,9 +1599,7 @@ dm_bufio_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
>  	unsigned long freed;
>  
>  	c = container_of(shrink, struct dm_bufio_client, shrinker);
> -	if (sc->gfp_mask & __GFP_FS)
> -		dm_bufio_lock(c);
> -	else if (!dm_bufio_trylock(c))
> +	if (!dm_bufio_trylock(c))
>  		return SHRINK_STOP;
>  
>  	freed  = __scan(c, sc->nr_to_scan, sc->gfp_mask);
> -- 
> 2.17.1
> 

I don't follow how this fixes the direct IO to DM device on top of loop
case given that you're saying __GFP_FS will not have been set by the
direct IO path.  In that case it should resort to the trylock anyway,
no?

We need a reproducer in the context of the latest upstream kernel code,
not some 4.1 branch point for an Oracle kernel.

Please submit with a less conflated patch header that has a description
of the bufio issue that the upstream kernel has.

Thanks,
Mike


* Re: dm bufio: fix deadlock issue with loop device
  2019-07-03 16:21 ` Mike Snitzer
@ 2019-07-03 17:19   ` Junxiao Bi
  2019-07-05 20:24     ` Junxiao Bi
  0 siblings, 1 reply; 8+ messages in thread
From: Junxiao Bi @ 2019-07-03 17:19 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: honglei.wang, dm-devel, mpatocka, agk

Hi Mike,

Thanks for reviewing, see comments inlined.

On 7/3/19 9:21 AM, Mike Snitzer wrote:
> On Tue, Jul 02 2019 at  7:14pm -0400,
> Junxiao Bi <junxiao.bi@oracle.com> wrote:
>
>> The following deadlock was captured on a 4.1 kernel, where dm_bufio_shrink_count
>> still acquired the bufio lock; that part has already been fixed in mainline.
>> However, the shrinker also invokes dm_bufio_shrink_scan through ->scan_objects,
>> so mainline looks susceptible to the same deadlock.
>>
>> The deadlock happened when both kswapd0 and loop1 were shrinking memory:
>> kswapd0 held the bufio lock and was waiting for an in-flight io to complete,
>> but that io could never complete because loop1, which was issuing it, was
>> blocked on the same lock held by kswapd0. This is an ABBA deadlock.
>>
>> The gfp flags used in direct IO are GFP_KERNEL, so checking __GFP_FS/__GFP_IO
>> does not help; instead, just stop shrinking whenever the lock is held by
>> someone else.
>>
>> PID: 474    TASK: ffff8813e11f4600  CPU: 10  COMMAND: "kswapd0"
>>     #0 [ffff8813dedfb938] __schedule at ffffffff8173f405
>>     #1 [ffff8813dedfb990] schedule at ffffffff8173fa27
>>     #2 [ffff8813dedfb9b0] schedule_timeout at ffffffff81742fec
>>     #3 [ffff8813dedfba60] io_schedule_timeout at ffffffff8173f186
>>     #4 [ffff8813dedfbaa0] bit_wait_io at ffffffff8174034f
>>     #5 [ffff8813dedfbac0] __wait_on_bit at ffffffff8173fec8
>>     #6 [ffff8813dedfbb10] out_of_line_wait_on_bit at ffffffff8173ff81
>>     #7 [ffff8813dedfbb90] __make_buffer_clean at ffffffffa038736f [dm_bufio]
>>     #8 [ffff8813dedfbbb0] __try_evict_buffer at ffffffffa0387bb8 [dm_bufio]
>>     #9 [ffff8813dedfbbd0] dm_bufio_shrink_scan at ffffffffa0387cc3 [dm_bufio]
>>    #10 [ffff8813dedfbc40] shrink_slab at ffffffff811a87ce
>>    #11 [ffff8813dedfbd30] shrink_zone at ffffffff811ad778
>>    #12 [ffff8813dedfbdc0] kswapd at ffffffff811ae92f
>>    #13 [ffff8813dedfbec0] kthread at ffffffff810a8428
>>    #14 [ffff8813dedfbf50] ret_from_fork at ffffffff81745242
>>
>>    PID: 14127  TASK: ffff881455749c00  CPU: 11  COMMAND: "loop1"
>>     #0 [ffff88272f5af228] __schedule at ffffffff8173f405
>>     #1 [ffff88272f5af280] schedule at ffffffff8173fa27
>>     #2 [ffff88272f5af2a0] schedule_preempt_disabled at ffffffff8173fd5e
>>     #3 [ffff88272f5af2b0] __mutex_lock_slowpath at ffffffff81741fb5
>>     #4 [ffff88272f5af330] mutex_lock at ffffffff81742133
>>     #5 [ffff88272f5af350] dm_bufio_shrink_count at ffffffffa03865f9 [dm_bufio]
>>     #6 [ffff88272f5af380] shrink_slab at ffffffff811a86bd
>>     #7 [ffff88272f5af470] shrink_zone at ffffffff811ad778
>>     #8 [ffff88272f5af500] do_try_to_free_pages at ffffffff811adb34
>>     #9 [ffff88272f5af590] try_to_free_pages at ffffffff811adef8
>>    #10 [ffff88272f5af610] __alloc_pages_nodemask at ffffffff811a09c3
>>    #11 [ffff88272f5af710] alloc_pages_current at ffffffff811e8b71
>>    #12 [ffff88272f5af760] new_slab at ffffffff811f4523
>>    #13 [ffff88272f5af7b0] __slab_alloc at ffffffff8173a1b5
>>    #14 [ffff88272f5af880] kmem_cache_alloc at ffffffff811f484b
>>    #15 [ffff88272f5af8d0] do_blockdev_direct_IO at ffffffff812535b3
>>    #16 [ffff88272f5afb00] __blockdev_direct_IO at ffffffff81255dc3
>>    #17 [ffff88272f5afb30] xfs_vm_direct_IO at ffffffffa01fe3fc [xfs]
>>    #18 [ffff88272f5afb90] generic_file_read_iter at ffffffff81198994
>>    #19 [ffff88272f5afc50] __dta_xfs_file_read_iter_2398 at ffffffffa020c970 [xfs]
>>    #20 [ffff88272f5afcc0] lo_rw_aio at ffffffffa0377042 [loop]
>>    #21 [ffff88272f5afd70] loop_queue_work at ffffffffa0377c3b [loop]
>>    #22 [ffff88272f5afe60] kthread_worker_fn at ffffffff810a8a0c
>>    #23 [ffff88272f5afec0] kthread at ffffffff810a8428
>>    #24 [ffff88272f5aff50] ret_from_fork at ffffffff81745242
>>
>> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
>> ---
>>   drivers/md/dm-bufio.c | 4 +---
>>   1 file changed, 1 insertion(+), 3 deletions(-)
>>
>> diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
>> index 2a48ea3f1b30..b6b5acc92ca2 100644
>> --- a/drivers/md/dm-bufio.c
>> +++ b/drivers/md/dm-bufio.c
>> @@ -1599,9 +1599,7 @@ dm_bufio_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
>>   	unsigned long freed;
>>   
>>   	c = container_of(shrink, struct dm_bufio_client, shrinker);
>> -	if (sc->gfp_mask & __GFP_FS)
>> -		dm_bufio_lock(c);
>> -	else if (!dm_bufio_trylock(c))
>> +	if (!dm_bufio_trylock(c))
>>   		return SHRINK_STOP;
>>   
>>   	freed  = __scan(c, sc->nr_to_scan, sc->gfp_mask);
>> -- 
>> 2.17.1
>>
> I don't follow how this fixes the direct IO to DM device on top of loop
> case given that you're saying __GFP_FS will not have been set by the
> direct IO path.  In that case it should resort to the trylock anyway,
> no?
See the loop call trace: in do_blockdev_direct_IO(), the dio is allocated
with GFP_KERNEL, which implies __GFP_FS. So it hung on the bufio lock.
>
> We need a reproducer in the context of the latest upstream kernel code,
> not some 4.1 branch point for an Oracle kernel.
>
> Please submit with a less conflated patch header that has a description
> of the bufio issue that the upstream kernel has.
The call trace is just an example of the deadlock. Mainline no longer takes
the lock in dm_bufio_shrink_count, but it still does in dm_bufio_shrink_scan,
which the memory shrinker invokes in the same context from loop, so it can
hit the same deadlock. This is hard to reproduce; a customer helped us
reproduce it, and we have not reproduced it ourselves.
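
For reference, a minimal sketch of the registration under discussion
(reconstructed, so treat the exact fields as an assumption); the point is
which callback still takes the mutex:

#include <linux/shrinker.h>

/*
 * ->count_objects (dm_bufio_shrink_count) is lockless in mainline, but
 * ->scan_objects (dm_bufio_shrink_scan) still takes the bufio mutex, and
 * scan_objects is called from both kswapd and direct reclaim -- including
 * reclaim triggered by an allocation on the loop worker thread, as in the
 * trace above.
 */
static void dm_bufio_register_shrinker_sketch(struct dm_bufio_client *c)
{
	c->shrinker.count_objects = dm_bufio_shrink_count;
	c->shrinker.scan_objects  = dm_bufio_shrink_scan;
	c->shrinker.seeks = 1;
	register_shrinker(&c->shrinker);
}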

Thanks,
Junxiao

>
> Thanks,
> Mike


* Re: dm bufio: fix deadlock issue with loop device
  2019-07-03 17:19   ` Junxiao Bi
@ 2019-07-05 20:24     ` Junxiao Bi
  2019-07-08 14:14       ` Mike Snitzer
  0 siblings, 1 reply; 8+ messages in thread
From: Junxiao Bi @ 2019-07-05 20:24 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: honglei.wang, dm-devel, mpatocka, agk

Hi Mike,

Do I make sense on this?

Thanks,

Junxiao.

On 7/3/19 10:19 AM, Junxiao Bi wrote:
> Hi Mike,
>
> Thanks for reviewing, see comments inlined.
>
> On 7/3/19 9:21 AM, Mike Snitzer wrote:
>> On Tue, Jul 02 2019 at  7:14pm -0400,
>> Junxiao Bi <junxiao.bi@oracle.com> wrote:
>>
>>> The following deadlock was captured on a 4.1 kernel, where dm_bufio_shrink_count
>>> still acquired the bufio lock; that part has already been fixed in mainline.
>>> However, the shrinker also invokes dm_bufio_shrink_scan through ->scan_objects,
>>> so mainline looks susceptible to the same deadlock.
>>>
>>> The deadlock happened when both kswapd0 and loop1 were shrinking memory:
>>> kswapd0 held the bufio lock and was waiting for an in-flight io to complete,
>>> but that io could never complete because loop1, which was issuing it, was
>>> blocked on the same lock held by kswapd0. This is an ABBA deadlock.
>>>
>>> The gfp flags used in direct IO are GFP_KERNEL, so checking __GFP_FS/__GFP_IO
>>> does not help; instead, just stop shrinking whenever the lock is held by
>>> someone else.
>>>
>>> PID: 474    TASK: ffff8813e11f4600  CPU: 10  COMMAND: "kswapd0"
>>>     #0 [ffff8813dedfb938] __schedule at ffffffff8173f405
>>>     #1 [ffff8813dedfb990] schedule at ffffffff8173fa27
>>>     #2 [ffff8813dedfb9b0] schedule_timeout at ffffffff81742fec
>>>     #3 [ffff8813dedfba60] io_schedule_timeout at ffffffff8173f186
>>>     #4 [ffff8813dedfbaa0] bit_wait_io at ffffffff8174034f
>>>     #5 [ffff8813dedfbac0] __wait_on_bit at ffffffff8173fec8
>>>     #6 [ffff8813dedfbb10] out_of_line_wait_on_bit at ffffffff8173ff81
>>>     #7 [ffff8813dedfbb90] __make_buffer_clean at ffffffffa038736f 
>>> [dm_bufio]
>>>     #8 [ffff8813dedfbbb0] __try_evict_buffer at ffffffffa0387bb8 
>>> [dm_bufio]
>>>     #9 [ffff8813dedfbbd0] dm_bufio_shrink_scan at ffffffffa0387cc3 
>>> [dm_bufio]
>>>    #10 [ffff8813dedfbc40] shrink_slab at ffffffff811a87ce
>>>    #11 [ffff8813dedfbd30] shrink_zone at ffffffff811ad778
>>>    #12 [ffff8813dedfbdc0] kswapd at ffffffff811ae92f
>>>    #13 [ffff8813dedfbec0] kthread at ffffffff810a8428
>>>    #14 [ffff8813dedfbf50] ret_from_fork at ffffffff81745242
>>>
>>>    PID: 14127  TASK: ffff881455749c00  CPU: 11  COMMAND: "loop1"
>>>     #0 [ffff88272f5af228] __schedule at ffffffff8173f405
>>>     #1 [ffff88272f5af280] schedule at ffffffff8173fa27
>>>     #2 [ffff88272f5af2a0] schedule_preempt_disabled at ffffffff8173fd5e
>>>     #3 [ffff88272f5af2b0] __mutex_lock_slowpath at ffffffff81741fb5
>>>     #4 [ffff88272f5af330] mutex_lock at ffffffff81742133
>>>     #5 [ffff88272f5af350] dm_bufio_shrink_count at ffffffffa03865f9 
>>> [dm_bufio]
>>>     #6 [ffff88272f5af380] shrink_slab at ffffffff811a86bd
>>>     #7 [ffff88272f5af470] shrink_zone at ffffffff811ad778
>>>     #8 [ffff88272f5af500] do_try_to_free_pages at ffffffff811adb34
>>>     #9 [ffff88272f5af590] try_to_free_pages at ffffffff811adef8
>>>    #10 [ffff88272f5af610] __alloc_pages_nodemask at ffffffff811a09c3
>>>    #11 [ffff88272f5af710] alloc_pages_current at ffffffff811e8b71
>>>    #12 [ffff88272f5af760] new_slab at ffffffff811f4523
>>>    #13 [ffff88272f5af7b0] __slab_alloc at ffffffff8173a1b5
>>>    #14 [ffff88272f5af880] kmem_cache_alloc at ffffffff811f484b
>>>    #15 [ffff88272f5af8d0] do_blockdev_direct_IO at ffffffff812535b3
>>>    #16 [ffff88272f5afb00] __blockdev_direct_IO at ffffffff81255dc3
>>>    #17 [ffff88272f5afb30] xfs_vm_direct_IO at ffffffffa01fe3fc [xfs]
>>>    #18 [ffff88272f5afb90] generic_file_read_iter at ffffffff81198994
>>>    #19 [ffff88272f5afc50] __dta_xfs_file_read_iter_2398 at 
>>> ffffffffa020c970 [xfs]
>>>    #20 [ffff88272f5afcc0] lo_rw_aio at ffffffffa0377042 [loop]
>>>    #21 [ffff88272f5afd70] loop_queue_work at ffffffffa0377c3b [loop]
>>>    #22 [ffff88272f5afe60] kthread_worker_fn at ffffffff810a8a0c
>>>    #23 [ffff88272f5afec0] kthread at ffffffff810a8428
>>>    #24 [ffff88272f5aff50] ret_from_fork at ffffffff81745242
>>>
>>> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
>>> ---
>>>   drivers/md/dm-bufio.c | 4 +---
>>>   1 file changed, 1 insertion(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
>>> index 2a48ea3f1b30..b6b5acc92ca2 100644
>>> --- a/drivers/md/dm-bufio.c
>>> +++ b/drivers/md/dm-bufio.c
>>> @@ -1599,9 +1599,7 @@ dm_bufio_shrink_scan(struct shrinker *shrink, 
>>> struct shrink_control *sc)
>>>       unsigned long freed;
>>>         c = container_of(shrink, struct dm_bufio_client, shrinker);
>>> -    if (sc->gfp_mask & __GFP_FS)
>>> -        dm_bufio_lock(c);
>>> -    else if (!dm_bufio_trylock(c))
>>> +    if (!dm_bufio_trylock(c))
>>>           return SHRINK_STOP;
>>>         freed  = __scan(c, sc->nr_to_scan, sc->gfp_mask);
>>> -- 
>>> 2.17.1
>>>
>> I don't follow how this fixes the direct IO to DM device on top of loop
>> case given that you're saying __GFP_FS will not have been set by the
>> direct IO path.  In that case it should resort to the trylock anyway,
>> no?
> See the loop call trace: in do_blockdev_direct_IO(), the dio is allocated
> with GFP_KERNEL, which implies __GFP_FS. So it hung on the bufio lock.
>>
>> We need a reproducer in the context of the latest upstream kernel code,
>> not some 4.1 branch point for an Oracle kernel.
>>
>> Please submit with a less conflated patch header that has a description
>> of the bufio issue that the upstream kernel has.
> The call trace is just an example of the deadlock. Mainline no longer takes
> the lock in dm_bufio_shrink_count, but it still does in dm_bufio_shrink_scan,
> which the memory shrinker invokes in the same context from loop, so it can
> hit the same deadlock. This is hard to reproduce; a customer helped us
> reproduce it, and we have not reproduced it ourselves.
>
> Thanks,
> Junxiao
>
>>
>> Thanks,
>> Mike



* Re: dm bufio: fix deadlock issue with loop device
  2019-07-05 20:24     ` Junxiao Bi
@ 2019-07-08 14:14       ` Mike Snitzer
  2019-07-08 23:54         ` Junxiao Bi
  0 siblings, 1 reply; 8+ messages in thread
From: Mike Snitzer @ 2019-07-08 14:14 UTC (permalink / raw)
  To: Junxiao Bi; +Cc: honglei.wang, dm-devel, mpatocka, agk

On Fri, Jul 05 2019 at  4:24pm -0400,
Junxiao Bi <junxiao.bi@oracle.com> wrote:

> Hi Mike,
> 
> Do I make sense on this?

No, you haven't made your case for this change.  Sorry.

Please refine the patch header to _not_ get into context you have from
a vendor kernel.  I know you say this is hard to reproduce, etc.  But
you don't even get into the use case where the issue was seen.  Was this
DM thinp?  DM cache?  Something else?

Please be as concise and precise as possible.  Saying that shrinker is
the same context as loop doesn't imply a whole lot to me (relative to
why this is prone to deadlock).

To restate my concern: if __GFP_FS isn't set then why does your patch
help at all?  If __GFP_FS is set, then that changes things..

Mike


* Re: dm bufio: fix deadlock issue with loop device
  2019-07-08 14:14       ` Mike Snitzer
@ 2019-07-08 23:54         ` Junxiao Bi
  2019-07-09  0:15           ` Mike Snitzer
  0 siblings, 1 reply; 8+ messages in thread
From: Junxiao Bi @ 2019-07-08 23:54 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: honglei.wang, dm-devel, mpatocka, agk

On 7/8/19 7:14 AM, Mike Snitzer wrote:

> On Fri, Jul 05 2019 at  4:24pm -0400,
> Junxiao Bi <junxiao.bi@oracle.com> wrote:
>
>> Hi Mike,
>>
>> Do I make sense on this?
> No, you haven't made your case for this change.  Sorry.
>
> Please refine the patch header to _not_ get into context you have from
> a vendor kernel.  I know you say this is hard to reproduce, etc.
Thanks, I will refine it in v2.
> But
> you don't even get into the use case where the issue was seen.  Was this
> DM thinp?  DM cache?  Something else?
It's the thin-provisioning target. The customer is using Docker.
>
> Please be as concise and precise as possible.  Saying that shrinker is
> the same context as loop doesn't imply a whole lot to me (relative to
> why this is prone to deadlock).
>
> To restate my concern: if __GFP_FS isn't set then why does your patch
> help at all?  If __GFP_FS is set, then that changes things..

If __GFP_FS isn't set, the behavior is the same with or without this patch.
If it is set and the mutex is already held by someone else, the shrinker
stops, and the deadlock is avoided.

Thanks,

Junxiao.

>
> Mike


* Re: dm bufio: fix deadlock issue with loop device
  2019-07-08 23:54         ` Junxiao Bi
@ 2019-07-09  0:15           ` Mike Snitzer
  0 siblings, 0 replies; 8+ messages in thread
From: Mike Snitzer @ 2019-07-09  0:15 UTC (permalink / raw)
  To: Junxiao Bi; +Cc: honglei.wang, dm-devel, mpatocka, agk

On Mon, Jul 08 2019 at  7:54pm -0400,
Junxiao Bi <junxiao.bi@oracle.com> wrote:

> On 7/8/19 7:14 AM, Mike Snitzer wrote:
> 
> >On Fri, Jul 05 2019 at  4:24pm -0400,
> >Junxiao Bi <junxiao.bi@oracle.com> wrote:
> >
> >>Hi Mike,
> >>
> >>Do I make sense on this?
> >No, you haven't made your case for this change.  Sorry.
> >
> >Please refine the patch header to _not_ get into context you have from
> >a vendor kernel.  I know you say this is hard to reproduce, etc.
> Thanks, I will refine it in v2.
> >But
> >you don't even get into the use case where the issue was seen.  Was this
> >DM thinp?  DM cache?  Something else?
> It's the thin-provisioning target. The customer is using Docker.

OK, with loop files? (Really hackish and poor-performing, but loopback's
ability to avoid reinstalling or planning ahead led a lot of people to use
it... that is, until overlayfs arrived.)

> >Please be as concise and precise as possible.  Saying that shrinker is
> >the same context as loop doesn't imply a whole lot to me (relative to
> >why this is prone to deadlock).
> >
> >To restate my concern: if __GFP_FS isn't set then why does your patch
> >help at all?  If __GFP_FS is set, then that changes things..
> 
> If __GFP_FS isn't set, the behavior is the same with or without this patch.

Yes.

> If it is set and the mutex is already held by someone else, the shrinker
> stops, and the deadlock is avoided.

Fine, please explain how that happens in the context of existing
upstream code.  Please make the case for fixing upstream.

Thanks,
Mike


* Re: [PATCH] dm bufio: fix deadlock issue with loop device
  2019-07-02 23:14 [PATCH] dm bufio: fix deadlock issue with loop device Junxiao Bi
  2019-07-03 16:21 ` Mike Snitzer
@ 2019-08-08  9:13 ` Mikulas Patocka
  1 sibling, 0 replies; 8+ messages in thread
From: Mikulas Patocka @ 2019-08-08  9:13 UTC (permalink / raw)
  To: Junxiao Bi; +Cc: honglei.wang, dm-devel, agk, snitzer

Hi

This is not a bug in dm-bufio.

>   #14 [ffff88272f5af880] kmem_cache_alloc at ffffffff811f484b
>   #15 [ffff88272f5af8d0] do_blockdev_direct_IO at ffffffff812535b3
>   #16 [ffff88272f5afb00] __blockdev_direct_IO at ffffffff81255dc3
>   #17 [ffff88272f5afb30] xfs_vm_direct_IO at ffffffffa01fe3fc [xfs]
>   #18 [ffff88272f5afb90] generic_file_read_iter at ffffffff81198994
>   #19 [ffff88272f5afc50] __dta_xfs_file_read_iter_2398 at ffffffffa020c970 [xfs]
>   #20 [ffff88272f5afcc0] lo_rw_aio at ffffffffa0377042 [loop]
>   #21 [ffff88272f5afd70] loop_queue_work at ffffffffa0377c3b [loop]
>   #22 [ffff88272f5afe60] kthread_worker_fn at ffffffff810a8a0c
>   #23 [ffff88272f5afec0] kthread at ffffffff810a8428
>   #24 [ffff88272f5aff50] ret_from_fork at ffffffff81745242

The problem is that in this stack trace the kernel calls kmem_cache_alloc
with GFP_KERNEL even though we are inside a filesystem driver and a block
driver.

The proper fix is to change do_blockdev_direct_IO to use the GFP_NOIO 
flag.
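
A minimal sketch of that idea (an assumption, not an actual patch: the
wrapper and submit_backing_file_io() are hypothetical stand-ins for the real
submission helper in drivers/block/loop.c; memalloc_noio_save()/restore() is
the standard way to force NOIO semantics on nested allocations):

#include <linux/sched/mm.h>

/*
 * An equivalent way to get GFP_NOIO semantics without editing
 * fs/direct-io.c is to enter a NOIO allocation scope around the loop
 * worker's I/O submission.  Every allocation nested under it (including
 * the dio allocated with GFP_KERNEL in do_blockdev_direct_IO()) then
 * behaves as if __GFP_IO/__GFP_FS were clear, so reclaim from that
 * context cannot re-enter a blocking shrinker.
 */
static int loop_submit_with_noio(struct loop_device *lo, struct request *rq)
{
	unsigned int noio_flags = memalloc_noio_save();
	int ret;

	ret = submit_backing_file_io(lo, rq);	/* hypothetical helper */

	memalloc_noio_restore(noio_flags);
	return ret;
}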

Mikulas



On Tue, 2 Jul 2019, Junxiao Bi wrote:

> The following deadlock was captured on a 4.1 kernel, where dm_bufio_shrink_count
> still acquired the bufio lock; that part has already been fixed in mainline.
> However, the shrinker also invokes dm_bufio_shrink_scan through ->scan_objects,
> so mainline looks susceptible to the same deadlock.
> 
> The deadlock happened when both kswapd0 and loop1 were shrinking memory:
> kswapd0 held the bufio lock and was waiting for an in-flight io to complete,
> but that io could never complete because loop1, which was issuing it, was
> blocked on the same lock held by kswapd0. This is an ABBA deadlock.
> 
> The gfp flags used in direct IO are GFP_KERNEL, so checking __GFP_FS/__GFP_IO
> does not help; instead, just stop shrinking whenever the lock is held by
> someone else.
> 
> PID: 474    TASK: ffff8813e11f4600  CPU: 10  COMMAND: "kswapd0"
>    #0 [ffff8813dedfb938] __schedule at ffffffff8173f405
>    #1 [ffff8813dedfb990] schedule at ffffffff8173fa27
>    #2 [ffff8813dedfb9b0] schedule_timeout at ffffffff81742fec
>    #3 [ffff8813dedfba60] io_schedule_timeout at ffffffff8173f186
>    #4 [ffff8813dedfbaa0] bit_wait_io at ffffffff8174034f
>    #5 [ffff8813dedfbac0] __wait_on_bit at ffffffff8173fec8
>    #6 [ffff8813dedfbb10] out_of_line_wait_on_bit at ffffffff8173ff81
>    #7 [ffff8813dedfbb90] __make_buffer_clean at ffffffffa038736f [dm_bufio]
>    #8 [ffff8813dedfbbb0] __try_evict_buffer at ffffffffa0387bb8 [dm_bufio]
>    #9 [ffff8813dedfbbd0] dm_bufio_shrink_scan at ffffffffa0387cc3 [dm_bufio]
>   #10 [ffff8813dedfbc40] shrink_slab at ffffffff811a87ce
>   #11 [ffff8813dedfbd30] shrink_zone at ffffffff811ad778
>   #12 [ffff8813dedfbdc0] kswapd at ffffffff811ae92f
>   #13 [ffff8813dedfbec0] kthread at ffffffff810a8428
>   #14 [ffff8813dedfbf50] ret_from_fork at ffffffff81745242
> 
>   PID: 14127  TASK: ffff881455749c00  CPU: 11  COMMAND: "loop1"
>    #0 [ffff88272f5af228] __schedule at ffffffff8173f405
>    #1 [ffff88272f5af280] schedule at ffffffff8173fa27
>    #2 [ffff88272f5af2a0] schedule_preempt_disabled at ffffffff8173fd5e
>    #3 [ffff88272f5af2b0] __mutex_lock_slowpath at ffffffff81741fb5
>    #4 [ffff88272f5af330] mutex_lock at ffffffff81742133
>    #5 [ffff88272f5af350] dm_bufio_shrink_count at ffffffffa03865f9 [dm_bufio]
>    #6 [ffff88272f5af380] shrink_slab at ffffffff811a86bd
>    #7 [ffff88272f5af470] shrink_zone at ffffffff811ad778
>    #8 [ffff88272f5af500] do_try_to_free_pages at ffffffff811adb34
>    #9 [ffff88272f5af590] try_to_free_pages at ffffffff811adef8
>   #10 [ffff88272f5af610] __alloc_pages_nodemask at ffffffff811a09c3
>   #11 [ffff88272f5af710] alloc_pages_current at ffffffff811e8b71
>   #12 [ffff88272f5af760] new_slab at ffffffff811f4523
>   #13 [ffff88272f5af7b0] __slab_alloc at ffffffff8173a1b5
>   #14 [ffff88272f5af880] kmem_cache_alloc at ffffffff811f484b
>   #15 [ffff88272f5af8d0] do_blockdev_direct_IO at ffffffff812535b3
>   #16 [ffff88272f5afb00] __blockdev_direct_IO at ffffffff81255dc3
>   #17 [ffff88272f5afb30] xfs_vm_direct_IO at ffffffffa01fe3fc [xfs]
>   #18 [ffff88272f5afb90] generic_file_read_iter at ffffffff81198994
>   #19 [ffff88272f5afc50] __dta_xfs_file_read_iter_2398 at ffffffffa020c970 [xfs]
>   #20 [ffff88272f5afcc0] lo_rw_aio at ffffffffa0377042 [loop]
>   #21 [ffff88272f5afd70] loop_queue_work at ffffffffa0377c3b [loop]
>   #22 [ffff88272f5afe60] kthread_worker_fn at ffffffff810a8a0c
>   #23 [ffff88272f5afec0] kthread at ffffffff810a8428
>   #24 [ffff88272f5aff50] ret_from_fork at ffffffff81745242
> 
> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
> ---
>  drivers/md/dm-bufio.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
> index 2a48ea3f1b30..b6b5acc92ca2 100644
> --- a/drivers/md/dm-bufio.c
> +++ b/drivers/md/dm-bufio.c
> @@ -1599,9 +1599,7 @@ dm_bufio_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
>  	unsigned long freed;
>  
>  	c = container_of(shrink, struct dm_bufio_client, shrinker);
> -	if (sc->gfp_mask & __GFP_FS)
> -		dm_bufio_lock(c);
> -	else if (!dm_bufio_trylock(c))
> +	if (!dm_bufio_trylock(c))
>  		return SHRINK_STOP;
>  
>  	freed  = __scan(c, sc->nr_to_scan, sc->gfp_mask);
> -- 
> 2.17.1
> 

