linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* blk_mq_freeze_queue hang and possible race in percpu-refcount
@ 2018-03-13 22:50 David Chen
  2018-03-14 15:43 ` Tejun Heo
  0 siblings, 1 reply; 3+ messages in thread
From: David Chen @ 2018-03-13 22:50 UTC (permalink / raw)
  To: Tejun Heo, Shaohua Li, lkml

Hi Tejun,

We recently hit an issue where several tasks hung in blk_mq_freeze_queue_wait.
All the task have the same stack trace as the one below, which is
doing fput on loop device.
Which is strange because the task should be no more than a open(2), a
couple pread(2),
and close(2) on a loop device.

====
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
zpool           D    0 32416  32156 0x00000080
ffff967e4b0a1740 ffff967e4b0a1740 ffff967dcdd50000 ffff967dcd8825c0
ffff967eafad9780 ffffb7ab84c5fd18 ffffffff938789dc 0000000000000286
0000000000000286 ffff967dcdd50000 ffff967ea7d691e8 ffff967ea6df2528
Call Trace:
[<ffffffff938789dc>] ? __schedule+0x21c/0x680
[<ffffffff93878e76>] schedule+0x36/0x80
[<ffffffff933d1cfc>] blk_mq_freeze_queue_wait+0x3c/0x90
[<ffffffff930ecd50>] ? prepare_to_wait_event+0x110/0x110
[<ffffffff933d3bda>] blk_mq_freeze_queue+0x1a/0x20
[<ffffffff935b9631>] lo_release+0x61/0x70
[<ffffffff9328c095>] __blkdev_put+0x225/0x280
[<ffffffff9328c8fc>] blkdev_put+0x4c/0x110
[<ffffffff9328ca75>] blkdev_close+0x25/0x30
[<ffffffff93253137>] __fput+0xe7/0x220
[<ffffffff932532ae>] ____fput+0xe/0x10
[<ffffffff930c53f3>] task_work_run+0x83/0xb0
[<ffffffff930a21e7>] exit_to_usermode_loop+0x66/0x92
[<ffffffff93003a95>] do_syscall_64+0x165/0x180
[<ffffffff9387d5ab>] entry_SYSCALL64_slow_path+0x25/0x25
====

I look into the percpu-refcount code, and found a place that might
have a race that might cause this.
Consider a concurrent percpu_ref_kill and percpu_ref_tryget_live:

====
CPU A                           CPU B
-----                           -----
percpu_ref_kill()               percpu_ref_tryget_live()
{
                                if (__ref_is_percpu())
  set __PERCPU_REF_DEAD;
  __percpu_ref_switch_mode();
   ^ sum up current percpu_count
                                this_cpu_inc(*percpu_count); <- this
increment got leaked.

====

So if later CPU B later does percpu_ref_put, it will cause ref->count
to drop to -1.
And thus causing the above hung task issue.

Do you think this theory is correct, or am I missing something?
Please tell me what do you think.

Thanks,
David

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: blk_mq_freeze_queue hang and possible race in percpu-refcount
  2018-03-13 22:50 blk_mq_freeze_queue hang and possible race in percpu-refcount David Chen
@ 2018-03-14 15:43 ` Tejun Heo
  2018-03-14 17:42   ` David Chen
  0 siblings, 1 reply; 3+ messages in thread
From: Tejun Heo @ 2018-03-14 15:43 UTC (permalink / raw)
  To: David Chen; +Cc: Shaohua Li, lkml

Hello, David.

On Tue, Mar 13, 2018 at 03:50:47PM -0700, David Chen wrote:
> ====
> CPU A                           CPU B
> -----                           -----
> percpu_ref_kill()               percpu_ref_tryget_live()
> {
>                                 if (__ref_is_percpu())
>   set __PERCPU_REF_DEAD;
>   __percpu_ref_switch_mode();
>    ^ sum up current percpu_count
>                                 this_cpu_inc(*percpu_count); <- this
> increment got leaked.
> 
> ====
> 
> So if later CPU B later does percpu_ref_put, it will cause ref->count
> to drop to -1.
> And thus causing the above hung task issue.
> 
> Do you think this theory is correct, or am I missing something?
> Please tell me what do you think.

The switching to atomic mode does something like the following.

1. Mark the refcnt so that __ref_is_percpu() is false.

2. Wait for RCU grace period so that everyone including
   percpu_ref_tryget_live() which has seen true __ref_is_percpu() is
   done with its operation.

3. Now that it knows nobody is operating on the assumption that the
   counter is in percpu mode, it adds up all the percpu counters.

So, provided there aren't some silly bugs, what you described
shouldn't happen.  Can you force the refcnt into atomic mode w/
PERCPU_REF_INIT_ATOMIC and see whether the problem persists?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: blk_mq_freeze_queue hang and possible race in percpu-refcount
  2018-03-14 15:43 ` Tejun Heo
@ 2018-03-14 17:42   ` David Chen
  0 siblings, 0 replies; 3+ messages in thread
From: David Chen @ 2018-03-14 17:42 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Shaohua Li, lkml

Hi Tejun,

Thanks, I see I missed the RCU part.
I'll try the force atomic thing.
Though so far I haven't been able to reproduce it yet.

Thanks,
David


2018-03-14 8:43 GMT-07:00 Tejun Heo <tj@kernel.org>:
> Hello, David.
>
> On Tue, Mar 13, 2018 at 03:50:47PM -0700, David Chen wrote:
>> ====
>> CPU A                           CPU B
>> -----                           -----
>> percpu_ref_kill()               percpu_ref_tryget_live()
>> {
>>                                 if (__ref_is_percpu())
>>   set __PERCPU_REF_DEAD;
>>   __percpu_ref_switch_mode();
>>    ^ sum up current percpu_count
>>                                 this_cpu_inc(*percpu_count); <- this
>> increment got leaked.
>>
>> ====
>>
>> So if later CPU B later does percpu_ref_put, it will cause ref->count
>> to drop to -1.
>> And thus causing the above hung task issue.
>>
>> Do you think this theory is correct, or am I missing something?
>> Please tell me what do you think.
>
> The switching to atomic mode does something like the following.
>
> 1. Mark the refcnt so that __ref_is_percpu() is false.
>
> 2. Wait for RCU grace period so that everyone including
>    percpu_ref_tryget_live() which has seen true __ref_is_percpu() is
>    done with its operation.
>
> 3. Now that it knows nobody is operating on the assumption that the
>    counter is in percpu mode, it adds up all the percpu counters.
>
> So, provided there aren't some silly bugs, what you described
> shouldn't happen.  Can you force the refcnt into atomic mode w/
> PERCPU_REF_INIT_ATOMIC and see whether the problem persists?
>
> Thanks.
>
> --
> tejun

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2018-03-14 17:42 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-13 22:50 blk_mq_freeze_queue hang and possible race in percpu-refcount David Chen
2018-03-14 15:43 ` Tejun Heo
2018-03-14 17:42   ` David Chen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).