* [PATCH] block: null_blk: fix race condition for null_del_dev
@ 2019-05-31 6:05 Bob Liu
2019-06-02 23:43 ` Chaitanya Kulkarni
2019-06-13 8:45 ` Jens Axboe
0 siblings, 2 replies; 6+ messages in thread
From: Bob Liu @ 2019-05-31 6:05 UTC (permalink / raw)
To: linux-block
Cc: axboe, hare, hch, martin.petersen, bart.vanassche, ming.lei, Bob Liu
Duplicate calls of null_del_dev() trigger a NULL pointer dereference like the one below.
The cause is a race condition between nullb_device_power_store() and
nullb_group_drop_item().

CPU#0                                   CPU#1
----------------                        -----------------
do_rmdir()
 >configfs_rmdir()
  >client_drop_item()
   >nullb_group_drop_item()
                                        nullb_device_power_store()
                                         >null_del_dev()

    >test_and_clear_bit(NULLB_DEV_FL_UP)
     >null_del_dev()
     ^^^^^
     Duplicate null_del_dev() triggers the NULL pointer dereference

                                         >clear_bit(NULLB_DEV_FL_UP)

The fix is to keep the same order in both paths: clear NULLB_DEV_FL_UP first,
then call null_del_dev().

[ 698.613600] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[ 698.613608] #PF error: [normal kernel read fault]
[ 698.613611] PGD 0 P4D 0
[ 698.613619] Oops: 0000 [#1] SMP PTI
[ 698.613627] CPU: 3 PID: 6382 Comm: rmdir Not tainted 5.0.0+ #35
[ 698.613631] Hardware name: LENOVO 20LJS2EV08/20LJS2EV08, BIOS R0SET33W (1.17 ) 07/18/2018
[ 698.613644] RIP: 0010:null_del_dev+0xc/0x110 [null_blk]
[ 698.613649] Code: 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 0b eb 97 e8 47 bb 2a e8 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 54 53 <8b> 77 18 48 89 fb 4c 8b 27 48 c7 c7 40 57 1e c1 e8 bf c7 cb e8 48
[ 698.613654] RSP: 0018:ffffb887888bfde0 EFLAGS: 00010286
[ 698.613659] RAX: 0000000000000000 RBX: ffff9d436d92bc00 RCX: ffff9d43a9184681
[ 698.613663] RDX: ffffffffc11e5c30 RSI: 0000000068be6540 RDI: 0000000000000000
[ 698.613667] RBP: ffffb887888bfdf0 R08: 0000000000000001 R09: 0000000000000000
[ 698.613671] R10: ffffb887888bfdd8 R11: 0000000000000f16 R12: ffff9d436d92bc08
[ 698.613675] R13: ffff9d436d94e630 R14: ffffffffc11e5088 R15: ffffffffc11e5000
[ 698.613680] FS: 00007faa68be6540(0000) GS:ffff9d43d14c0000(0000) knlGS:0000000000000000
[ 698.613685] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 698.613689] CR2: 0000000000000018 CR3: 000000042f70c002 CR4: 00000000003606e0
[ 698.613693] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 698.613697] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 698.613700] Call Trace:
[ 698.613712] nullb_group_drop_item+0x50/0x70 [null_blk]
[ 698.613722] client_drop_item+0x29/0x40
[ 698.613728] configfs_rmdir+0x1ed/0x300
[ 698.613738] vfs_rmdir+0xb2/0x130
[ 698.613743] do_rmdir+0x1c7/0x1e0
[ 698.613750] __x64_sys_rmdir+0x17/0x20
[ 698.613759] do_syscall_64+0x5a/0x110
[ 698.613768] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Signed-off-by: Bob Liu <bob.liu@oracle.com>
---
drivers/block/null_blk_main.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/drivers/block/null_blk_main.c b/drivers/block/null_blk_main.c
index 62c9654..99dd0ab 100644
--- a/drivers/block/null_blk_main.c
+++ b/drivers/block/null_blk_main.c
@@ -326,11 +326,12 @@ static ssize_t nullb_device_power_store(struct config_item *item,
 		set_bit(NULLB_DEV_FL_CONFIGURED, &dev->flags);
 		dev->power = newp;
 	} else if (dev->power && !newp) {
-		mutex_lock(&lock);
-		dev->power = newp;
-		null_del_dev(dev->nullb);
-		mutex_unlock(&lock);
-		clear_bit(NULLB_DEV_FL_UP, &dev->flags);
+		if (test_and_clear_bit(NULLB_DEV_FL_UP, &dev->flags)) {
+			mutex_lock(&lock);
+			dev->power = newp;
+			null_del_dev(dev->nullb);
+			mutex_unlock(&lock);
+		}
 		clear_bit(NULLB_DEV_FL_CONFIGURED, &dev->flags);
 	}
--
2.9.5
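
The ordering problem described in the commit message can be replayed in user space. The following is only a rough sketch under stated assumptions, not the driver code: every `demo_*` name is hypothetical, `atomic_fetch_and()` from C11 stands in for the kernel's test_and_clear_bit(), and the two CPUs are modelled by running their steps sequentially in the order shown in the diagram.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the real null_blk structures. */
enum { DEMO_FL_UP = 1u << 0 };

struct demo_dev {
	atomic_uint flags;  /* models dev->flags */
	int *nullb;         /* models dev->nullb, NULL once torn down */
	int null_derefs;    /* teardown calls that were handed NULL */
};

/* Model of test_and_clear_bit(): clear atomically, report the old state. */
static bool demo_test_and_clear(atomic_uint *flags, unsigned int bit)
{
	return (atomic_fetch_and(flags, ~bit) & bit) != 0;
}

/* Model of null_del_dev(): the real function dereferences its argument. */
static void demo_del_dev(struct demo_dev *dev)
{
	if (dev->nullb == NULL) {
		dev->null_derefs++;  /* the kernel would oops here */
		return;
	}
	dev->nullb = NULL;
}

/* Model of the rmdir path: tear down only if this caller cleared the flag. */
static void demo_drop_item(struct demo_dev *dev)
{
	if (demo_test_and_clear(&dev->flags, DEMO_FL_UP))
		demo_del_dev(dev);
}

/* Replays the interleaving; returns how many teardowns received NULL. */
static int demo_replay(bool patched)
{
	static int backing;
	struct demo_dev dev = { .nullb = &backing, .null_derefs = 0 };

	atomic_init(&dev.flags, DEMO_FL_UP);

	if (patched) {
		/* CPU#1, patched: claim the flag first, then tear down. */
		if (demo_test_and_clear(&dev.flags, DEMO_FL_UP))
			demo_del_dev(&dev);
		demo_drop_item(&dev);  /* CPU#0 finds the flag clear and skips */
	} else {
		demo_del_dev(&dev);    /* CPU#1, old code: teardown comes first */
		demo_drop_item(&dev);  /* CPU#0 still sees the flag set */
		atomic_fetch_and(&dev.flags, ~(unsigned int)DEMO_FL_UP);  /* too late */
	}
	return dev.null_derefs;
}
```

Calling `demo_replay(false)` follows the pre-patch order and records one teardown on a NULL pointer, while `demo_replay(true)` follows the patched order and records none.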
* Re: [PATCH] block: null_blk: fix race condition for null_del_dev
2019-05-31 6:05 [PATCH] block: null_blk: fix race condition for null_del_dev Bob Liu
@ 2019-06-02 23:43 ` Chaitanya Kulkarni
2019-06-12 9:11 ` Bob Liu
2019-06-13 8:45 ` Jens Axboe
1 sibling, 1 reply; 6+ messages in thread
From: Chaitanya Kulkarni @ 2019-06-02 23:43 UTC (permalink / raw)
To: Bob Liu, linux-block
Cc: axboe, hare, hch, martin.petersen, bart.vanassche, ming.lei
Thanks for your patch Bob.
Looks good to me.
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
* Re: [PATCH] block: null_blk: fix race condition for null_del_dev
2019-06-02 23:43 ` Chaitanya Kulkarni
@ 2019-06-12 9:11 ` Bob Liu
0 siblings, 0 replies; 6+ messages in thread
From: Bob Liu @ 2019-06-12 9:11 UTC (permalink / raw)
To: Chaitanya Kulkarni, linux-block
Cc: axboe, hare, hch, martin.petersen, bart.vanassche, ming.lei
Ping.
* Re: [PATCH] block: null_blk: fix race condition for null_del_dev
2019-05-31 6:05 [PATCH] block: null_blk: fix race condition for null_del_dev Bob Liu
2019-06-02 23:43 ` Chaitanya Kulkarni
@ 2019-06-13 8:45 ` Jens Axboe
2019-06-13 12:40 ` Bob Liu
1 sibling, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2019-06-13 8:45 UTC (permalink / raw)
To: Bob Liu, linux-block; +Cc: hare, hch, martin.petersen, bart.vanassche, ming.lei
On 5/31/19 12:05 AM, Bob Liu wrote:
> @@ -326,11 +326,12 @@ static ssize_t nullb_device_power_store(struct config_item *item,
>  		set_bit(NULLB_DEV_FL_CONFIGURED, &dev->flags);
>  		dev->power = newp;
>  	} else if (dev->power && !newp) {
> -		mutex_lock(&lock);
> -		dev->power = newp;
> -		null_del_dev(dev->nullb);
> -		mutex_unlock(&lock);
> -		clear_bit(NULLB_DEV_FL_UP, &dev->flags);
> +		if (test_and_clear_bit(NULLB_DEV_FL_UP, &dev->flags)) {
> +			mutex_lock(&lock);
> +			dev->power = newp;
> +			null_del_dev(dev->nullb);
> +			mutex_unlock(&lock);
> +		}
>  		clear_bit(NULLB_DEV_FL_CONFIGURED, &dev->flags);

Is the ->power check safe? Should that be under the lock as well?
--
Jens Axboe
* Re: [PATCH] block: null_blk: fix race condition for null_del_dev
2019-06-13 8:45 ` Jens Axboe
@ 2019-06-13 12:40 ` Bob Liu
2019-06-15 7:44 ` Jens Axboe
0 siblings, 1 reply; 6+ messages in thread
From: Bob Liu @ 2019-06-13 12:40 UTC (permalink / raw)
To: Jens Axboe, linux-block
Cc: hare, hch, martin.petersen, bart.vanassche, ming.lei
On 6/13/19 4:45 PM, Jens Axboe wrote:
> On 5/31/19 12:05 AM, Bob Liu wrote:
>> @@ -326,11 +326,12 @@ static ssize_t nullb_device_power_store(struct config_item *item,
>>  		set_bit(NULLB_DEV_FL_CONFIGURED, &dev->flags);
>>  		dev->power = newp;
>>  	} else if (dev->power && !newp) {
>> -		mutex_lock(&lock);
>> -		dev->power = newp;
>> -		null_del_dev(dev->nullb);
>> -		mutex_unlock(&lock);
>> -		clear_bit(NULLB_DEV_FL_UP, &dev->flags);
>> +		if (test_and_clear_bit(NULLB_DEV_FL_UP, &dev->flags)) {
>> +			mutex_lock(&lock);
>> +			dev->power = newp;
>> +			null_del_dev(dev->nullb);
>> +			mutex_unlock(&lock);
>> +		}
>>  		clear_bit(NULLB_DEV_FL_CONFIGURED, &dev->flags);
>
> Is the ->power check safe? Should that be under the lock as well?
>
I think it's unnecessary.
Even if dev->power is modified after the check, test_and_clear_bit() still
ensures that null_del_dev() won't be wrongly called.

CPU#0                                   CPU#1
----------------                        -----------------
do_rmdir()
 >configfs_rmdir()
  >client_drop_item()
   >nullb_group_drop_item()
                                        nullb_device_power_store()
                                         > if dev->power

    >if test_and_clear_bit(NULLB_DEV_FL_UP)
     > dev->power=false
     ^^^ Even if dev->power is modified after CPU#1's check

                                         > if test_and_clear_bit(NULLB_DEV_FL_UP)
                                         ^^^^
                                         This test_and_clear_bit() still prevents
                                         null_del_dev() from being called twice
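
The argument above can be checked with a small user-space replay of that exact interleaving. This is only a sketch under assumptions: the `sketch_*` names are hypothetical, C11 `atomic_fetch_and()` stands in for test_and_clear_bit(), and the two CPUs are simulated by executing their steps one after another in the order drawn in the diagram.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins; these are not the real null_blk structures. */
enum { SKETCH_FL_UP = 1u << 0 };

struct sketch_dev {
	atomic_uint flags;  /* models dev->flags */
	bool power;         /* models dev->power, read without the lock */
	int teardowns;      /* how many times the teardown body ran */
};

/* Model of test_and_clear_bit(): clear atomically, report the old state. */
static bool sketch_test_and_clear(atomic_uint *flags, unsigned int bit)
{
	return (atomic_fetch_and(flags, ~bit) & bit) != 0;
}

/* Shared guarded body used by both paths after the patch. */
static void sketch_guarded_teardown(struct sketch_dev *dev)
{
	if (sketch_test_and_clear(&dev->flags, SKETCH_FL_UP)) {
		dev->power = false;
		dev->teardowns++;  /* stands in for null_del_dev() */
	}
}

/* Replays the stale ->power check; returns the number of teardowns. */
static int sketch_replay_stale_power_check(void)
{
	struct sketch_dev dev = { .power = true, .teardowns = 0 };
	bool cpu1_saw_power;

	atomic_init(&dev.flags, SKETCH_FL_UP);

	/* CPU#1: reads dev->power outside the lock and sees true. */
	cpu1_saw_power = dev.power;

	/* CPU#0: rmdir path runs to completion and clears dev->power. */
	sketch_guarded_teardown(&dev);

	/* CPU#1: continues on its stale result, but the flag is already clear. */
	if (cpu1_saw_power)
		sketch_guarded_teardown(&dev);

	return dev.teardowns;
}
```

In this replay `sketch_replay_stale_power_check()` returns 1: the stale read lets CPU#1 enter the branch, but only the caller that actually clears the flag performs the teardown.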
* Re: [PATCH] block: null_blk: fix race condition for null_del_dev
2019-06-13 12:40 ` Bob Liu
@ 2019-06-15 7:44 ` Jens Axboe
0 siblings, 0 replies; 6+ messages in thread
From: Jens Axboe @ 2019-06-15 7:44 UTC (permalink / raw)
To: Bob Liu, linux-block; +Cc: hare, hch, martin.petersen, bart.vanassche, ming.lei
On 6/13/19 6:40 AM, Bob Liu wrote:
> On 6/13/19 4:45 PM, Jens Axboe wrote:
>> On 5/31/19 12:05 AM, Bob Liu wrote:
>>> @@ -326,11 +326,12 @@ static ssize_t nullb_device_power_store(struct config_item *item,
>>>  		set_bit(NULLB_DEV_FL_CONFIGURED, &dev->flags);
>>>  		dev->power = newp;
>>>  	} else if (dev->power && !newp) {
>>> -		mutex_lock(&lock);
>>> -		dev->power = newp;
>>> -		null_del_dev(dev->nullb);
>>> -		mutex_unlock(&lock);
>>> -		clear_bit(NULLB_DEV_FL_UP, &dev->flags);
>>> +		if (test_and_clear_bit(NULLB_DEV_FL_UP, &dev->flags)) {
>>> +			mutex_lock(&lock);
>>> +			dev->power = newp;
>>> +			null_del_dev(dev->nullb);
>>> +			mutex_unlock(&lock);
>>> +		}
>>>  		clear_bit(NULLB_DEV_FL_CONFIGURED, &dev->flags);
>>
>> Is the ->power check safe? Should that be under the lock as well?
>>
>
> I think it's unnecessary. Even if dev->power is modified after the
> check, test_and_clear_bit() still ensures that null_del_dev() won't
> be wrongly called.

Fair enough - applied, thanks.
--
Jens Axboe