Linux-BTRFS Archive on lore.kernel.org
* [PATCH v4 0/3] btrfs: scrub: fix scrub_lock
@ 2019-01-30  6:44 Anand Jain
  2019-01-30  6:45 ` [PATCH v4 1/3] btrfs: scrub: fix circular locking dependency warning Anand Jain
                   ` (6 more replies)
  0 siblings, 7 replies; 15+ messages in thread
From: Anand Jain @ 2019-01-30  6:44 UTC (permalink / raw)
  To: linux-btrfs

Patch 1/3 fixes the circular locking dependency warning, patch 2/3 adds a
lockdep_assert_held() check to scrub_workers_get(), and patch 3/3 converts
scrub_workers_refcnt into a refcount_t.

Anand Jain (3):
  btrfs: scrub: fix circular locking dependency warning
  btrfs: scrub: add scrub_lock lockdep check in scrub_workers_get
  btrfs: scrub: convert scrub_workers_refcnt to refcount_t

 fs/btrfs/ctree.h   |  2 +-
 fs/btrfs/disk-io.c |  2 +-
 fs/btrfs/scrub.c   | 28 +++++++++++++++-------------
 3 files changed, 17 insertions(+), 15 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v4 1/3] btrfs: scrub: fix circular locking dependency warning
  2019-01-30  6:44 [PATCH v4 0/3] btrfs: scrub: fix scrub_lock Anand Jain
@ 2019-01-30  6:45 ` Anand Jain
  2019-01-30 14:07   ` David Sterba
  2019-02-12 16:45   ` David Sterba
  2019-01-30  6:45 ` [PATCH v4 2/3] btrfs: scrub: add scrub_lock lockdep check in scrub_workers_get Anand Jain
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 15+ messages in thread
From: Anand Jain @ 2019-01-30  6:45 UTC (permalink / raw)
  To: linux-btrfs

The circular locking dependency check reports the warning [1] because
btrfs_scrub_dev() enters the stack #0 path below with fs_info::scrub_lock
held. The test case leading to this warning:

  mkfs.btrfs -fq /dev/sdb && mount /dev/sdb /btrfs
  btrfs scrub start -B /btrfs

We already have fs_info::scrub_workers_refcnt to track whether the scrub
workers need to be initialized or destroyed. So once the thread has
incremented and decremented fs_info::scrub_workers_refcnt, it is safe to
drop the scrub_lock and only then do the btrfs_destroy_workqueue() part.
So this patch drops the scrub_lock before calling
btrfs_destroy_workqueue().

[1]
[   76.146826] ======================================================
[   76.147086] WARNING: possible circular locking dependency detected
[   76.147316] 4.20.0-rc3+ #41 Not tainted
[   76.147489] ------------------------------------------------------
[   76.147722] btrfs/4065 is trying to acquire lock:
[   76.147984] 0000000038593bc0 ((wq_completion)"%s-%s""btrfs", name){+.+.}, at: flush_workqueue+0x70/0x4d0
[   76.148337]
but task is already holding lock:
[   76.148594] 0000000062392ab7 (&fs_info->scrub_lock){+.+.}, at: btrfs_scrub_dev+0x316/0x5d0 [btrfs]
[   76.148909]
which lock already depends on the new lock.

[   76.149191]
the existing dependency chain (in reverse order) is:
[   76.149446]
-> #3 (&fs_info->scrub_lock){+.+.}:
[   76.149707]        btrfs_scrub_dev+0x11f/0x5d0 [btrfs]
[   76.149924]        btrfs_ioctl+0x1ac3/0x2d80 [btrfs]
[   76.150216]        do_vfs_ioctl+0xa9/0x6d0
[   76.150468]        ksys_ioctl+0x60/0x90
[   76.150716]        __x64_sys_ioctl+0x16/0x20
[   76.150911]        do_syscall_64+0x50/0x180
[   76.151182]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   76.151469]
-> #2 (&fs_devs->device_list_mutex){+.+.}:
[   76.151851]        reada_start_machine_worker+0xca/0x3f0 [btrfs]
[   76.152195]        normal_work_helper+0xf0/0x4c0 [btrfs]
[   76.152489]        process_one_work+0x1f4/0x520
[   76.152751]        worker_thread+0x46/0x3d0
[   76.153715]        kthread+0xf8/0x130
[   76.153912]        ret_from_fork+0x3a/0x50
[   76.154178]
-> #1 ((work_completion)(&work->normal_work)){+.+.}:
[   76.154575]        worker_thread+0x46/0x3d0
[   76.154828]        kthread+0xf8/0x130
[   76.155108]        ret_from_fork+0x3a/0x50
[   76.155357]
-> #0 ((wq_completion)"%s-%s""btrfs", name){+.+.}:
[   76.155751]        flush_workqueue+0x9a/0x4d0
[   76.155911]        drain_workqueue+0xca/0x1a0
[   76.156182]        destroy_workqueue+0x17/0x230
[   76.156455]        btrfs_destroy_workqueue+0x5d/0x1c0 [btrfs]
[   76.156756]        scrub_workers_put+0x2e/0x60 [btrfs]
[   76.156931]        btrfs_scrub_dev+0x329/0x5d0 [btrfs]
[   76.157219]        btrfs_ioctl+0x1ac3/0x2d80 [btrfs]
[   76.157491]        do_vfs_ioctl+0xa9/0x6d0
[   76.157742]        ksys_ioctl+0x60/0x90
[   76.157910]        __x64_sys_ioctl+0x16/0x20
[   76.158177]        do_syscall_64+0x50/0x180
[   76.158429]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   76.158716]
other info that might help us debug this:

[   76.158908] Chain exists of:
  (wq_completion)"%s-%s""btrfs", name --> &fs_devs->device_list_mutex --> &fs_info->scrub_lock

[   76.159629]  Possible unsafe locking scenario:

[   76.160607]        CPU0                    CPU1
[   76.160934]        ----                    ----
[   76.161210]   lock(&fs_info->scrub_lock);
[   76.161458]                                lock(&fs_devs->device_list_mutex);
[   76.161805]                                lock(&fs_info->scrub_lock);
[   76.161909]   lock((wq_completion)"%s-%s""btrfs", name);
[   76.162201]
 *** DEADLOCK ***

[   76.162627] 2 locks held by btrfs/4065:
[   76.162897]  #0: 00000000bef2775b (sb_writers#12){.+.+}, at: mnt_want_write_file+0x24/0x50
[   76.163335]  #1: 0000000062392ab7 (&fs_info->scrub_lock){+.+.}, at: btrfs_scrub_dev+0x316/0x5d0 [btrfs]
[   76.163796]
stack backtrace:
[   76.163911] CPU: 1 PID: 4065 Comm: btrfs Not tainted 4.20.0-rc3+ #41
[   76.164228] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[   76.164646] Call Trace:
[   76.164872]  dump_stack+0x5e/0x8b
[   76.165128]  print_circular_bug.isra.37+0x1f1/0x1fe
[   76.165398]  __lock_acquire+0x14aa/0x1620
[   76.165652]  lock_acquire+0xb0/0x190
[   76.165910]  ? flush_workqueue+0x70/0x4d0
[   76.166175]  flush_workqueue+0x9a/0x4d0
[   76.166420]  ? flush_workqueue+0x70/0x4d0
[   76.166671]  ? drain_workqueue+0x52/0x1a0
[   76.166911]  drain_workqueue+0xca/0x1a0
[   76.167167]  destroy_workqueue+0x17/0x230
[   76.167428]  btrfs_destroy_workqueue+0x5d/0x1c0 [btrfs]
[   76.167720]  scrub_workers_put+0x2e/0x60 [btrfs]
[   76.168233]  btrfs_scrub_dev+0x329/0x5d0 [btrfs]
[   76.168504]  ? __sb_start_write+0x121/0x1b0
[   76.168759]  ? mnt_want_write_file+0x24/0x50
[   76.169654]  btrfs_ioctl+0x1ac3/0x2d80 [btrfs]
[   76.169934]  ? find_held_lock+0x2d/0x90
[   76.170204]  ? find_held_lock+0x2d/0x90
[   76.170450]  do_vfs_ioctl+0xa9/0x6d0
[   76.170690]  ? __fget+0x101/0x1f0
[   76.170910]  ? __fget+0x5/0x1f0
[   76.171157]  ksys_ioctl+0x60/0x90
[   76.171391]  __x64_sys_ioctl+0x16/0x20
[   76.171634]  do_syscall_64+0x50/0x180
[   76.171892]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   76.172186] RIP: 0033:0x7f61d422e567
[   76.172425] Code: 44 00 00 48 8b 05 29 09 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d f9 08 2d 00 f7 d8 64 89 01 48
[   76.172911] RSP: 002b:00007f61d3936d68 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[   76.173328] RAX: ffffffffffffffda RBX: 00000000019026b0 RCX: 00007f61d422e567
[   76.173649] RDX: 00000000019026b0 RSI: 00000000c400941b RDI: 0000000000000003
[   76.173909] RBP: 0000000000000000 R08: 00007f61d3937700 R09: 0000000000000000
[   76.174244] R10: 00007f61d3937700 R11: 0000000000000246 R12: 0000000000000000
[   76.174566] R13: 0000000000801000 R14: 0000000000000000 R15: 00007f61d3937700
[   76.175217] btrfs (4065) used greatest stack depth: 11424 bytes left

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
v3->v4: Fix the list corruption in btrfs/073 as reported by David.
           [1]
           https://patchwork.kernel.org/patch/10705741/
	I was able to reproduce it with an instrumented kernel but not with
	btrfs/073.
	The v3 patch released fs_info::scrub_lock to destroy the work queues,
	which raced with new scrub requests, overwriting the scrub worker
	pointers. So v4 kills the function scrub_workers_put() and performs
	the destroy_workqueue in two stages, with the worker pointers copied
	locally.
v2->v3: none
v1->v2: none

 fs/btrfs/scrub.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 33f2793bdee0..f2f0be7864b8 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3770,16 +3770,6 @@ static noinline_for_stack int scrub_workers_get(struct btrfs_fs_info *fs_info,
 	return -ENOMEM;
 }
 
-static noinline_for_stack void scrub_workers_put(struct btrfs_fs_info *fs_info)
-{
-	if (--fs_info->scrub_workers_refcnt == 0) {
-		btrfs_destroy_workqueue(fs_info->scrub_workers);
-		btrfs_destroy_workqueue(fs_info->scrub_wr_completion_workers);
-		btrfs_destroy_workqueue(fs_info->scrub_parity_workers);
-	}
-	WARN_ON(fs_info->scrub_workers_refcnt < 0);
-}
-
 int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
 		    u64 end, struct btrfs_scrub_progress *progress,
 		    int readonly, int is_dev_replace)
@@ -3788,6 +3778,9 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
 	int ret;
 	struct btrfs_device *dev;
 	unsigned int nofs_flag;
+	struct btrfs_workqueue *scrub_workers = NULL;
+	struct btrfs_workqueue *scrub_wr_comp = NULL;
+	struct btrfs_workqueue *scrub_parity = NULL;
 
 	if (btrfs_fs_closing(fs_info))
 		return -EINVAL;
@@ -3932,9 +3925,16 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
 
 	mutex_lock(&fs_info->scrub_lock);
 	dev->scrub_ctx = NULL;
-	scrub_workers_put(fs_info);
+	if (--fs_info->scrub_workers_refcnt == 0) {
+		scrub_workers = fs_info->scrub_workers;
+		scrub_wr_comp = fs_info->scrub_wr_completion_workers;
+		scrub_parity = fs_info->scrub_parity_workers;
+	}
 	mutex_unlock(&fs_info->scrub_lock);
 
+	btrfs_destroy_workqueue(scrub_workers);
+	btrfs_destroy_workqueue(scrub_wr_comp);
+	btrfs_destroy_workqueue(scrub_parity);
 	scrub_put_ctx(sctx);
 
 	return ret;
-- 
1.8.3.1



* [PATCH v4 2/3] btrfs: scrub: add scrub_lock lockdep check in scrub_workers_get
  2019-01-30  6:44 [PATCH v4 0/3] btrfs: scrub: fix scrub_lock Anand Jain
  2019-01-30  6:45 ` [PATCH v4 1/3] btrfs: scrub: fix circular locking dependency warning Anand Jain
@ 2019-01-30  6:45 ` Anand Jain
  2019-01-30  6:45 ` [PATCH v4 3/3] btrfs: scrub: convert scrub_workers_refcnt to refcount_t Anand Jain
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Anand Jain @ 2019-01-30  6:45 UTC (permalink / raw)
  To: linux-btrfs

scrub_workers_refcnt is protected by scrub_lock, so add a
lockdep_assert_held() check in scrub_workers_get().

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Suggested-by: Nikolay Borisov <nborisov@suse.com>
---
v4: none
v3: none
v2: born
 fs/btrfs/scrub.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index f2f0be7864b8..17925af759ae 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3741,6 +3741,8 @@ static noinline_for_stack int scrub_workers_get(struct btrfs_fs_info *fs_info,
 	unsigned int flags = WQ_FREEZABLE | WQ_UNBOUND;
 	int max_active = fs_info->thread_pool_size;
 
+	lockdep_assert_held(&fs_info->scrub_lock);
+
 	if (fs_info->scrub_workers_refcnt == 0) {
 		fs_info->scrub_workers = btrfs_alloc_workqueue(fs_info, "scrub",
 				flags, is_dev_replace ? 1 : max_active, 4);
-- 
1.8.3.1



* [PATCH v4 3/3] btrfs: scrub: convert scrub_workers_refcnt to refcount_t
  2019-01-30  6:44 [PATCH v4 0/3] btrfs: scrub: fix scrub_lock Anand Jain
  2019-01-30  6:45 ` [PATCH v4 1/3] btrfs: scrub: fix circular locking dependency warning Anand Jain
  2019-01-30  6:45 ` [PATCH v4 2/3] btrfs: scrub: add scrub_lock lockdep check in scrub_workers_get Anand Jain
@ 2019-01-30  6:45 ` Anand Jain
  2019-02-08 17:02 ` [PATCH v4 0/3] btrfs: scrub: fix scrub_lock David Sterba
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Anand Jain @ 2019-01-30  6:45 UTC (permalink / raw)
  To: linux-btrfs

Use refcount_t for fs_info::scrub_workers_refcnt instead of a plain int.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
v4: born
 fs/btrfs/ctree.h   | 2 +-
 fs/btrfs/disk-io.c | 2 +-
 fs/btrfs/scrub.c   | 6 +++---
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index fecc64d8e285..66712610d143 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1074,7 +1074,7 @@ struct btrfs_fs_info {
 	atomic_t scrubs_paused;
 	atomic_t scrub_cancel_req;
 	wait_queue_head_t scrub_pause_wait;
-	int scrub_workers_refcnt;
+	refcount_t scrub_workers_refcnt;
 	struct btrfs_workqueue *scrub_workers;
 	struct btrfs_workqueue *scrub_wr_completion_workers;
 	struct btrfs_workqueue *scrub_nocow_workers;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index de4b7ed02da1..5801e798895f 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2109,7 +2109,7 @@ static void btrfs_init_scrub(struct btrfs_fs_info *fs_info)
 	atomic_set(&fs_info->scrubs_paused, 0);
 	atomic_set(&fs_info->scrub_cancel_req, 0);
 	init_waitqueue_head(&fs_info->scrub_pause_wait);
-	fs_info->scrub_workers_refcnt = 0;
+	refcount_set(&fs_info->scrub_workers_refcnt, 0);
 }
 
 static void btrfs_init_balance(struct btrfs_fs_info *fs_info)
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 17925af759ae..ccec3b4bc40b 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3743,7 +3743,7 @@ static noinline_for_stack int scrub_workers_get(struct btrfs_fs_info *fs_info,
 
 	lockdep_assert_held(&fs_info->scrub_lock);
 
-	if (fs_info->scrub_workers_refcnt == 0) {
+	if (refcount_read(&fs_info->scrub_workers_refcnt) == 0) {
 		fs_info->scrub_workers = btrfs_alloc_workqueue(fs_info, "scrub",
 				flags, is_dev_replace ? 1 : max_active, 4);
 		if (!fs_info->scrub_workers)
@@ -3761,7 +3761,7 @@ static noinline_for_stack int scrub_workers_get(struct btrfs_fs_info *fs_info,
 		if (!fs_info->scrub_parity_workers)
 			goto fail_scrub_parity_workers;
 	}
-	++fs_info->scrub_workers_refcnt;
+	refcount_inc(&fs_info->scrub_workers_refcnt);
 	return 0;
 
 fail_scrub_parity_workers:
@@ -3927,7 +3927,7 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
 
 	mutex_lock(&fs_info->scrub_lock);
 	dev->scrub_ctx = NULL;
-	if (--fs_info->scrub_workers_refcnt == 0) {
+	if (refcount_dec_and_test(&fs_info->scrub_workers_refcnt)) {
 		scrub_workers = fs_info->scrub_workers;
 		scrub_wr_comp = fs_info->scrub_wr_completion_workers;
 		scrub_parity = fs_info->scrub_parity_workers;
-- 
1.8.3.1



* Re: [PATCH v4 1/3] btrfs: scrub: fix circular locking dependency warning
  2019-01-30  6:45 ` [PATCH v4 1/3] btrfs: scrub: fix circular locking dependency warning Anand Jain
@ 2019-01-30 14:07   ` David Sterba
  2019-01-31  6:34     ` Anand Jain
  2019-02-12 16:45   ` David Sterba
  1 sibling, 1 reply; 15+ messages in thread
From: David Sterba @ 2019-01-30 14:07 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs

On Wed, Jan 30, 2019 at 02:45:00PM +0800, Anand Jain wrote:
> v3->v4: Fix list corruption as reported by btrfs/073 by David.
>            [1]
>            https://patchwork.kernel.org/patch/10705741/
> 	Which I was able to reproduce with an instrumented kernel but not with
> 	btrfs/073.
>         In v3 patch, it releases the fs_info::scrub_lock to destroy the work queue
> 	which raced with new scrub requests, overwriting the scrub workers
> 	pointers. So in v4, it kills the function scrub_workers_put(), and
> 	performs the destroy_workqueue in two stages, with worker pointers
> 	copied locally.

> @@ -3932,9 +3925,16 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
>  
>  	mutex_lock(&fs_info->scrub_lock);
>  	dev->scrub_ctx = NULL;
> -	scrub_workers_put(fs_info);
> +	if (--fs_info->scrub_workers_refcnt == 0) {
> +		scrub_workers = fs_info->scrub_workers;
> +		scrub_wr_comp = fs_info->scrub_wr_completion_workers;
> +		scrub_parity = fs_info->scrub_parity_workers;
> +	}
>  	mutex_unlock(&fs_info->scrub_lock);
>  
> +	btrfs_destroy_workqueue(scrub_workers);
> +	btrfs_destroy_workqueue(scrub_wr_comp);
> +	btrfs_destroy_workqueue(scrub_parity);

https://lore.kernel.org/linux-btrfs/1543554924-17397-2-git-send-email-anand.jain@oracle.com/

Compared to the previous version it's almost the same, I think. If
scrub_workers_get() races into the window between the unlock and the
destruction of the workers, anything that uses fs_info->scrub_workers
will soon use freed memory.

The difference is that the worker pointers are read from fs_info under a
lock but are still used outside. I haven't tested this version but from
the analysis of previous crash, I don't see how v4 is supposed to be
better.


* Re: [PATCH v4 1/3] btrfs: scrub: fix circular locking dependency warning
  2019-01-30 14:07   ` David Sterba
@ 2019-01-31  6:34     ` Anand Jain
  2019-02-12 15:49       ` David Sterba
  0 siblings, 1 reply; 15+ messages in thread
From: Anand Jain @ 2019-01-31  6:34 UTC (permalink / raw)
  To: dsterba, linux-btrfs



On 1/30/19 10:07 PM, David Sterba wrote:
> On Wed, Jan 30, 2019 at 02:45:00PM +0800, Anand Jain wrote:
>> v3->v4: Fix list corruption as reported by btrfs/073 by David.
>>             [1]
>>             https://patchwork.kernel.org/patch/10705741/
>> 	Which I was able to reproduce with an instrumented kernel but not with
>> 	btrfs/073.
>>          In v3 patch, it releases the fs_info::scrub_lock to destroy the work queue
>> 	which raced with new scrub requests, overwriting the scrub workers
>> 	pointers. So in v4, it kills the function scrub_workers_put(), and
>> 	performs the destroy_workqueue in two stages, with worker pointers
>> 	copied locally.
> 
>> @@ -3932,9 +3925,16 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
>>   
>>   	mutex_lock(&fs_info->scrub_lock);
>>   	dev->scrub_ctx = NULL;
>> -	scrub_workers_put(fs_info);
>> +	if (--fs_info->scrub_workers_refcnt == 0) {
>> +		scrub_workers = fs_info->scrub_workers;
>> +		scrub_wr_comp = fs_info->scrub_wr_completion_workers;
>> +		scrub_parity = fs_info->scrub_parity_workers;
>> +	}
>>   	mutex_unlock(&fs_info->scrub_lock);
>>   
>> +	btrfs_destroy_workqueue(scrub_workers);
>> +	btrfs_destroy_workqueue(scrub_wr_comp);
>> +	btrfs_destroy_workqueue(scrub_parity);
> 
> https://lore.kernel.org/linux-btrfs/1543554924-17397-2-git-send-email-anand.jain@oracle.com/
> 
> Comparing to the previous version, it's almost the same I think. If
> scrub_workers_get races between the unlock and destroy_workers, anything
> that uses fs_info->scrub_workers will soon use freed memory.
> 
> The difference is that the worker pointers are read from fs_info under a
> lock but are still used outside. I haven't tested this version but from
> the analysis of previous crash, I don't see how v4 is supposed to be
> better.
>

Consider the v3 code below.

When process-A is at [1] (below), start another btrfs scrub; let's call
it process-B. At [1] process-A has unlocked fs_info::scrub_lock, so
process-B can overwrite fs_info::scrub_workers,
fs_info::scrub_wr_completion_workers and fs_info::scrub_parity_workers,
which process-A at [1] has not yet destroyed.

Process-A
---------

btrfs scrub start /mnt

::
         mutex_lock(&fs_info->scrub_lock);
::
         if (dev->scrub_ctx ||
             (!is_dev_replace &&
              btrfs_dev_replace_is_ongoing(&fs_info->dev_replace))) {
                 up_read(&fs_info->dev_replace.rwsem);
                 mutex_unlock(&fs_info->scrub_lock);
                 mutex_unlock(&fs_info->fs_devices->device_list_mutex);
                 ret = -EINPROGRESS;
                 goto out_free_ctx;
         }
::
         ret = scrub_workers_get(fs_info, is_dev_replace);  <-- [2]
::
         dev->scrub_ctx = sctx;
         mutex_unlock(&fs_info->scrub_lock);

::
                 ret = scrub_enumerate_chunks(sctx, dev, start, end);
::
         atomic_dec(&fs_info->scrubs_running);
::

         mutex_lock(&fs_info->scrub_lock);
         dev->scrub_ctx = NULL;
         scrub_workers_put(fs_info);
         mutex_unlock(&fs_info->scrub_lock);



static noinline_for_stack void scrub_workers_put(struct btrfs_fs_info *fs_info)
{
        lockdep_assert_held(&fs_info->scrub_lock);
        if (--fs_info->scrub_workers_refcnt == 0) {
                mutex_unlock(&fs_info->scrub_lock);

		<wait for process-B>  [1]

                btrfs_destroy_workqueue(fs_info->scrub_workers);
                btrfs_destroy_workqueue(fs_info->scrub_wr_completion_workers);
                btrfs_destroy_workqueue(fs_info->scrub_parity_workers);
                mutex_lock(&fs_info->scrub_lock);
        }
        WARN_ON(fs_info->scrub_workers_refcnt < 0);
}




Process-B
---------
Start when process-A is at [1] (above)
btrfs scrub start /mnt

::
  At [2] (above), the fs_info::scrub_workers,
fs_info::scrub_wr_completion_workers and fs_info::scrub_parity_workers
of process-A are overwritten.


So in v4:
--------

Similar to dev->scrub_ctx, the fs_info::scrub_workers,
fs_info::scrub_wr_completion_workers and fs_info::scrub_parity_workers
pointers are copied to local variables before fs_info::scrub_lock is
released, so the list pointers aren't corrupted.

Hope this clarifies.

Thanks, Anand


* Re: [PATCH v4 0/3] btrfs: scrub: fix scrub_lock
  2019-01-30  6:44 [PATCH v4 0/3] btrfs: scrub: fix scrub_lock Anand Jain
                   ` (2 preceding siblings ...)
  2019-01-30  6:45 ` [PATCH v4 3/3] btrfs: scrub: convert scrub_workers_refcnt to refcount_t Anand Jain
@ 2019-02-08 17:02 ` David Sterba
  2019-02-11  8:09   ` Anand Jain
  2019-02-11  8:38 ` [PATCH v5 3/3] btrfs: scrub: convert scrub_workers_refcnt to refcount_t Anand Jain
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: David Sterba @ 2019-02-08 17:02 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs

On Wed, Jan 30, 2019 at 02:44:59PM +0800, Anand Jain wrote:
> Fixes the circular locking dependency warning as in patch 1/3,
> and patch 2/3 adds lockdep_assert_held() to scrub_workers_get().
> Patch 3/3 converts scrub_workers_refcnt into refcount_t.
> 
> Anand Jain (3):
>   btrfs: scrub: fix circular locking dependency warning
>   btrfs: scrub: add scrub_lock lockdep check in scrub_workers_get
>   btrfs: scrub: convert scrub_workers_refcnt to refcount_t

The test btrfs/011 still complains; now a use-after-free is detected due
to the refcount_t underflow checks.

btrfs/011               [16:50:09][  334.134707] run fstests btrfs/011 at 2019-02-08 16:50:09
[  334.722472] BTRFS: device fsid e6112183-884c-473f-af91-217c504332b2 devid 1 transid 5 /dev/vdb
[  334.740057] BTRFS info (device vdb): disk space caching is enabled
[  334.742457] BTRFS info (device vdb): has skinny extents
[  334.744286] BTRFS info (device vdb): flagging fs with big metadata feature
[  334.766487] BTRFS info (device vdb): checking UUID tree
[  340.960787] BTRFS info (device vdb): dev_replace from /dev/vdb (devid 1) to /dev/vdc started
[  340.966691] ------------[ cut here ]------------
[  340.967918] refcount_t: increment on 0; use-after-free.
[  340.969290] WARNING: CPU: 3 PID: 24675 at lib/refcount.c:153 refcount_inc_checked+0x26/0x30
[  340.970962] Modules linked in: btrfs libcrc32c xor zstd_decompress zstd_compress xxhash raid6_pq loop
[  340.973321] CPU: 3 PID: 24675 Comm: btrfs Not tainted 5.0.0-rc5-default+ #459
[  340.975172] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626cc-prebuilt.qemu-project.org 04/01/2014
[  340.977686] RIP: 0010:refcount_inc_checked+0x26/0x30
[  340.981291] RSP: 0018:ffff976e4459bc78 EFLAGS: 00010286
[  340.982080] RAX: 0000000000000000 RBX: ffff8b1d65fb4000 RCX: 0000000000000006
[  340.983826] RDX: 0000000000000006 RSI: ffff8b1d74be58f8 RDI: ffff8b1d74be5000
[  340.985504] RBP: 0000000000000006 R08: 0000008abc17cca3 R09: 0000000000000000
[  340.987235] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[  340.988823] R13: ffff8b1d65fb4000 R14: 0000000000000000 R15: ffff8b1d65fb7620
[  340.990405] FS:  00007f68924908c0(0000) GS:ffff8b1d7da00000(0000) knlGS:0000000000000000
[  340.992198] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  340.993689] CR2: 00007fffcc01d830 CR3: 00000000719e8000 CR4: 00000000000006e0
[  340.995453] Call Trace:
[  340.996198]  scrub_workers_get+0xc2/0x110 [btrfs]
[  340.997462]  btrfs_scrub_dev+0x16f/0x5c0 [btrfs]
[  340.998814]  ? start_transaction+0xa1/0x500 [btrfs]
[  341.000155]  btrfs_dev_replace_by_ioctl.cold.19+0x179/0x1bb [btrfs]
[  341.001500]  btrfs_ioctl+0x262d/0x2e10 [btrfs]
[  341.002186]  ? __lock_acquire+0x263/0xf10
[  341.002798]  ? kvm_clock_read+0x14/0x30
[  341.003386]  ? kvm_sched_clock_read+0x5/0x10
[  341.004023]  ? sched_clock+0x5/0x10
[  341.004574]  ? sched_clock_cpu+0xc/0xc0
[  341.005171]  ? do_vfs_ioctl+0xa2/0x6d0
[  341.005750]  do_vfs_ioctl+0xa2/0x6d0
[  341.006580]  ? do_sigaction+0x1a7/0x210
[  341.007521]  ksys_ioctl+0x3a/0x70
[  341.008273]  __x64_sys_ioctl+0x16/0x20
[  341.009112]  do_syscall_64+0x54/0x180
[  341.009939]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  341.011166] RIP: 0033:0x7f6892585a97

[  341.015696] RSP: 002b:00007fffcc022d68 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  341.017384] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f6892585a97
[  341.018344] RDX: 00007fffcc0231a0 RSI: 00000000ca289435 RDI: 0000000000000003
[  341.019964] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[  341.021533] R10: 0000000000000008 R11: 0000000000000246 R12: 00007fffcc02524a
[  341.023195] R13: 0000000000000001 R14: 0000000000000004 R15: 000056333a1b82a0
[  341.025017] irq event stamp: 32590
[  341.025812] hardirqs last  enabled at (32589): [<ffffffff960c5345>] console_unlock+0x435/0x5d0
[  341.027987] hardirqs last disabled at (32590): [<ffffffff96001ac9>] trace_hardirqs_off_thunk+0x1a/0x1c
[  341.030440] softirqs last  enabled at (31906): [<ffffffff96a00364>] __do_softirq+0x364/0x47b
[  341.032599] softirqs last disabled at (31631): [<ffffffff96065781>] irq_exit+0xd1/0xe0
[  341.033772] ---[ end trace 01d834fdbc0fe47a ]---
[  341.129440] ------------[ cut here ]------------
[  341.130750] refcount_t: underflow; use-after-free.
[  341.131746] WARNING: CPU: 0 PID: 24675 at lib/refcount.c:187 refcount_sub_and_test_checked+0x46/0x50
[  341.133045] Modules linked in: btrfs libcrc32c xor zstd_decompress zstd_compress xxhash raid6_pq loop
[  341.134504] CPU: 0 PID: 24675 Comm: btrfs Tainted: G        W         5.0.0-rc5-default+ #459
[  341.136356] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626cc-prebuilt.qemu-project.org 04/01/2014
[  341.137931] RIP: 0010:refcount_sub_and_test_checked+0x46/0x50
[  341.142184] RSP: 0018:ffff976e4459bc98 EFLAGS: 00010282
[  341.143343] RAX: 0000000000000000 RBX: ffff8b1d64ea6800 RCX: 0000000000000006
[  341.144886] RDX: 0000000000000006 RSI: ffff8b1d74be58c0 RDI: ffff8b1d74be5000
[  341.146340] RBP: ffff8b1d64329c00 R08: 0000008accffd043 R09: 0000000000000000
[  341.147652] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8b1d65fb76c0
[  341.149023] R13: ffff8b1d65fb4000 R14: 0000000000404100 R15: 0000000000000000
[  341.150876] FS:  00007f68924908c0(0000) GS:ffff8b1d7d400000(0000) knlGS:0000000000000000
[  341.152551] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  341.153367] CR2: 00007f535aa68d40 CR3: 00000000719e8000 CR4: 00000000000006f0
[  341.154523] Call Trace:
[  341.155559]  btrfs_scrub_dev+0x35c/0x5c0 [btrfs]
[  341.156563]  ? start_transaction+0xa1/0x500 [btrfs]
[  341.157520]  btrfs_dev_replace_by_ioctl.cold.19+0x179/0x1bb [btrfs]
[  341.158951]  btrfs_ioctl+0x262d/0x2e10 [btrfs]
[  341.160373]  ? __lock_acquire+0x263/0xf10
[  341.161821]  ? kvm_clock_read+0x14/0x30
[  341.163052]  ? kvm_sched_clock_read+0x5/0x10
[  341.164568]  ? sched_clock+0x5/0x10
[  341.165809]  ? sched_clock_cpu+0xc/0xc0
[  341.166941]  ? do_vfs_ioctl+0xa2/0x6d0
[  341.167609]  do_vfs_ioctl+0xa2/0x6d0
[  341.168163]  ? do_sigaction+0x1a7/0x210
[  341.168743]  ksys_ioctl+0x3a/0x70
[  341.169266]  __x64_sys_ioctl+0x16/0x20
[  341.169836]  do_syscall_64+0x54/0x180
[  341.170758]  entry_SYSCALL_64_after_hwframe+0x49/0xbe


* Re: [PATCH v4 0/3] btrfs: scrub: fix scrub_lock
  2019-02-08 17:02 ` [PATCH v4 0/3] btrfs: scrub: fix scrub_lock David Sterba
@ 2019-02-11  8:09   ` Anand Jain
  0 siblings, 0 replies; 15+ messages in thread
From: Anand Jain @ 2019-02-11  8:09 UTC (permalink / raw)
  To: dsterba, linux-btrfs



On 2/9/19 1:02 AM, David Sterba wrote:
> On Wed, Jan 30, 2019 at 02:44:59PM +0800, Anand Jain wrote:
>> Fixes the circular locking dependency warning as in patch 1/3,
>> and patch 2/3 adds lockdep_assert_held() to scrub_workers_get().
>> Patch 3/3 converts scrub_workers_refcnt into refcount_t.
>>
>> Anand Jain (3):
>>    btrfs: scrub: fix circular locking dependency warning
>>    btrfs: scrub: add scrub_lock lockdep check in scrub_workers_get
>>    btrfs: scrub: convert scrub_workers_refcnt to refcount_t
> 
> The test btrfs/011 still complains; now a use-after-free is detected due
> to the refcount_t underflow checks.
> 
> btrfs/011               [16:50:09][  334.134707] run fstests btrfs/011 at 2019-02-08 16:50:09
> [  334.722472] BTRFS: device fsid e6112183-884c-473f-af91-217c504332b2 devid 1 transid 5 /dev/vdb
> [  334.740057] BTRFS info (device vdb): disk space caching is enabled
> [  334.742457] BTRFS info (device vdb): has skinny extents
> [  334.744286] BTRFS info (device vdb): flagging fs with big metadata feature
> [  334.766487] BTRFS info (device vdb): checking UUID tree
> [  340.960787] BTRFS info (device vdb): dev_replace from /dev/vdb (devid 1) to /dev/vdc started
> [  340.966691] ------------[ cut here ]------------
> [  340.967918] refcount_t: increment on 0; use-after-free.
> [  340.969290] WARNING: CPU: 3 PID: 24675 at lib/refcount.c:153 refcount_inc_checked+0x26/0x30
> [  340.970962] Modules linked in: btrfs libcrc32c xor zstd_decompress zstd_compress xxhash raid6_pq loop
> [  340.973321] CPU: 3 PID: 24675 Comm: btrfs Not tainted 5.0.0-rc5-default+ #459
> [  340.975172] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626cc-prebuilt.qemu-project.org 04/01/2014
> [  340.977686] RIP: 0010:refcount_inc_checked+0x26/0x30


Ah. I didn't have CONFIG_REFCOUNT_FULL enabled, with which refcount_inc()
becomes refcount_inc_checked():

#define refcount_inc	refcount_inc_checked

With that, the increment from 0 to 1 must be done by refcount_set()
instead. There isn't a real use-after-free as such. The fix [1] is under
test. Will send v5.
Thanks.

[1]
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index ccec3b4bc40b..bf2294798b6c 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3760,8 +3760,12 @@ static noinline_for_stack int scrub_workers_get(struct btrfs_fs_info *fs_info,
                                               max_active, 2);
                 if (!fs_info->scrub_parity_workers)
                         goto fail_scrub_parity_workers;
+
+               refcount_set(&fs_info->scrub_workers_refcnt, 1);
+       } else {
+               refcount_inc(&fs_info->scrub_workers_refcnt);
         }
-       refcount_inc(&fs_info->scrub_workers_refcnt);
+
         return 0;

  fail_scrub_parity_workers:



> [  340.981291] RSP: 0018:ffff976e4459bc78 EFLAGS: 00010286
> [  340.982080] RAX: 0000000000000000 RBX: ffff8b1d65fb4000 RCX: 0000000000000006
> [  340.983826] RDX: 0000000000000006 RSI: ffff8b1d74be58f8 RDI: ffff8b1d74be5000
> [  340.985504] RBP: 0000000000000006 R08: 0000008abc17cca3 R09: 0000000000000000
> [  340.987235] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
> [  340.988823] R13: ffff8b1d65fb4000 R14: 0000000000000000 R15: ffff8b1d65fb7620
> [  340.990405] FS:  00007f68924908c0(0000) GS:ffff8b1d7da00000(0000) knlGS:0000000000000000
> [  340.992198] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  340.993689] CR2: 00007fffcc01d830 CR3: 00000000719e8000 CR4: 00000000000006e0
> [  340.995453] Call Trace:
> [  340.996198]  scrub_workers_get+0xc2/0x110 [btrfs]
> [  340.997462]  btrfs_scrub_dev+0x16f/0x5c0 [btrfs]
> [  340.998814]  ? start_transaction+0xa1/0x500 [btrfs]
> [  341.000155]  btrfs_dev_replace_by_ioctl.cold.19+0x179/0x1bb [btrfs]
> [  341.001500]  btrfs_ioctl+0x262d/0x2e10 [btrfs]
> [  341.002186]  ? __lock_acquire+0x263/0xf10
> [  341.002798]  ? kvm_clock_read+0x14/0x30
> [  341.003386]  ? kvm_sched_clock_read+0x5/0x10
> [  341.004023]  ? sched_clock+0x5/0x10
> [  341.004574]  ? sched_clock_cpu+0xc/0xc0
> [  341.005171]  ? do_vfs_ioctl+0xa2/0x6d0
> [  341.005750]  do_vfs_ioctl+0xa2/0x6d0
> [  341.006580]  ? do_sigaction+0x1a7/0x210
> [  341.007521]  ksys_ioctl+0x3a/0x70
> [  341.008273]  __x64_sys_ioctl+0x16/0x20
> [  341.009112]  do_syscall_64+0x54/0x180
> [  341.009939]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [  341.011166] RIP: 0033:0x7f6892585a97
> 
> [  341.015696] RSP: 002b:00007fffcc022d68 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [  341.017384] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f6892585a97
> [  341.018344] RDX: 00007fffcc0231a0 RSI: 00000000ca289435 RDI: 0000000000000003
> [  341.019964] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> [  341.021533] R10: 0000000000000008 R11: 0000000000000246 R12: 00007fffcc02524a
> [  341.023195] R13: 0000000000000001 R14: 0000000000000004 R15: 000056333a1b82a0
> [  341.025017] irq event stamp: 32590
> [  341.025812] hardirqs last  enabled at (32589): [<ffffffff960c5345>] console_unlock+0x435/0x5d0
> [  341.027987] hardirqs last disabled at (32590): [<ffffffff96001ac9>] trace_hardirqs_off_thunk+0x1a/0x1c
> [  341.030440] softirqs last  enabled at (31906): [<ffffffff96a00364>] __do_softirq+0x364/0x47b
> [  341.032599] softirqs last disabled at (31631): [<ffffffff96065781>] irq_exit+0xd1/0xe0
> [  341.033772] ---[ end trace 01d834fdbc0fe47a ]---
> [  341.129440] ------------[ cut here ]------------
> [  341.130750] refcount_t: underflow; use-after-free.
> [  341.131746] WARNING: CPU: 0 PID: 24675 at lib/refcount.c:187 refcount_sub_and_test_checked+0x46/0x50
> [  341.133045] Modules linked in: btrfs libcrc32c xor zstd_decompress zstd_compress xxhash raid6_pq loop
> [  341.134504] CPU: 0 PID: 24675 Comm: btrfs Tainted: G        W         5.0.0-rc5-default+ #459
> [  341.136356] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626cc-prebuilt.qemu-project.org 04/01/2014
> [  341.137931] RIP: 0010:refcount_sub_and_test_checked+0x46/0x50
> [  341.142184] RSP: 0018:ffff976e4459bc98 EFLAGS: 00010282
> [  341.143343] RAX: 0000000000000000 RBX: ffff8b1d64ea6800 RCX: 0000000000000006
> [  341.144886] RDX: 0000000000000006 RSI: ffff8b1d74be58c0 RDI: ffff8b1d74be5000
> [  341.146340] RBP: ffff8b1d64329c00 R08: 0000008accffd043 R09: 0000000000000000
> [  341.147652] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8b1d65fb76c0
> [  341.149023] R13: ffff8b1d65fb4000 R14: 0000000000404100 R15: 0000000000000000
> [  341.150876] FS:  00007f68924908c0(0000) GS:ffff8b1d7d400000(0000) knlGS:0000000000000000
> [  341.152551] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  341.153367] CR2: 00007f535aa68d40 CR3: 00000000719e8000 CR4: 00000000000006f0
> [  341.154523] Call Trace:
> [  341.155559]  btrfs_scrub_dev+0x35c/0x5c0 [btrfs]
> [  341.156563]  ? start_transaction+0xa1/0x500 [btrfs]
> [  341.157520]  btrfs_dev_replace_by_ioctl.cold.19+0x179/0x1bb [btrfs]
> [  341.158951]  btrfs_ioctl+0x262d/0x2e10 [btrfs]
> [  341.160373]  ? __lock_acquire+0x263/0xf10
> [  341.161821]  ? kvm_clock_read+0x14/0x30
> [  341.163052]  ? kvm_sched_clock_read+0x5/0x10
> [  341.164568]  ? sched_clock+0x5/0x10
> [  341.165809]  ? sched_clock_cpu+0xc/0xc0
> [  341.166941]  ? do_vfs_ioctl+0xa2/0x6d0
> [  341.167609]  do_vfs_ioctl+0xa2/0x6d0
> [  341.168163]  ? do_sigaction+0x1a7/0x210
> [  341.168743]  ksys_ioctl+0x3a/0x70
> [  341.169266]  __x64_sys_ioctl+0x16/0x20
> [  341.169836]  do_syscall_64+0x54/0x180
> [  341.170758]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v5 3/3] btrfs: scrub: convert scrub_workers_refcnt to refcount_t
  2019-01-30  6:44 [PATCH v4 0/3] btrfs: scrub: fix scrub_lock Anand Jain
                   ` (3 preceding siblings ...)
  2019-02-08 17:02 ` [PATCH v4 0/3] btrfs: scrub: fix scrub_lock David Sterba
@ 2019-02-11  8:38 ` Anand Jain
  2019-02-12 17:15 ` [PATCH] btrfs: scrub: add assertions for worker pointers David Sterba
  2019-02-13 17:35 ` [PATCH v4 0/3] btrfs: scrub: fix scrub_lock David Sterba
  6 siblings, 0 replies; 15+ messages in thread
From: Anand Jain @ 2019-02-11  8:38 UTC (permalink / raw)
  To: linux-btrfs

Use refcount_t for fs_info::scrub_workers_refcnt instead of int.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
v5: Fix refcount validation warning.
    Use refcount_set() instead of refcount_inc() when count is 0.
v4: born
 fs/btrfs/ctree.h   |  2 +-
 fs/btrfs/disk-io.c |  2 +-
 fs/btrfs/scrub.c   | 10 +++++++---
 3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 2399f56d8a54..d53fb37aaed0 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1074,7 +1074,7 @@ struct btrfs_fs_info {
 	atomic_t scrubs_paused;
 	atomic_t scrub_cancel_req;
 	wait_queue_head_t scrub_pause_wait;
-	int scrub_workers_refcnt;
+	refcount_t scrub_workers_refcnt;
 	struct btrfs_workqueue *scrub_workers;
 	struct btrfs_workqueue *scrub_wr_completion_workers;
 	struct btrfs_workqueue *scrub_nocow_workers;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 4047867473e1..76701b21ab77 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2111,7 +2111,7 @@ static void btrfs_init_scrub(struct btrfs_fs_info *fs_info)
 	atomic_set(&fs_info->scrubs_paused, 0);
 	atomic_set(&fs_info->scrub_cancel_req, 0);
 	init_waitqueue_head(&fs_info->scrub_pause_wait);
-	fs_info->scrub_workers_refcnt = 0;
+	refcount_set(&fs_info->scrub_workers_refcnt, 0);
 }
 
 static void btrfs_init_balance(struct btrfs_fs_info *fs_info)
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 17925af759ae..bf2294798b6c 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3743,7 +3743,7 @@ static noinline_for_stack int scrub_workers_get(struct btrfs_fs_info *fs_info,
 
 	lockdep_assert_held(&fs_info->scrub_lock);
 
-	if (fs_info->scrub_workers_refcnt == 0) {
+	if (refcount_read(&fs_info->scrub_workers_refcnt) == 0) {
 		fs_info->scrub_workers = btrfs_alloc_workqueue(fs_info, "scrub",
 				flags, is_dev_replace ? 1 : max_active, 4);
 		if (!fs_info->scrub_workers)
@@ -3760,8 +3760,12 @@ static noinline_for_stack int scrub_workers_get(struct btrfs_fs_info *fs_info,
 					      max_active, 2);
 		if (!fs_info->scrub_parity_workers)
 			goto fail_scrub_parity_workers;
+
+		refcount_set(&fs_info->scrub_workers_refcnt, 1);
+	} else {
+		refcount_inc(&fs_info->scrub_workers_refcnt);
 	}
-	++fs_info->scrub_workers_refcnt;
+
 	return 0;
 
 fail_scrub_parity_workers:
@@ -3927,7 +3931,7 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
 
 	mutex_lock(&fs_info->scrub_lock);
 	dev->scrub_ctx = NULL;
-	if (--fs_info->scrub_workers_refcnt == 0) {
+	if (refcount_dec_and_test(&fs_info->scrub_workers_refcnt)) {
 		scrub_workers = fs_info->scrub_workers;
 		scrub_wr_comp = fs_info->scrub_wr_completion_workers;
 		scrub_parity = fs_info->scrub_parity_workers;
-- 
1.8.3.1
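The lifecycle this patch ends up with -- allocate on the first get, go 0 -> 1 via refcount_set(), tear down on the last put -- can be sketched in plain userspace C (illustrative names; a plain counter and malloc() stand in for refcount_t and btrfs_alloc_workqueue()):

```c
#include <assert.h>
#include <stdlib.h>

/* Hedged sketch of the v5 get/put pairing, not the btrfs code itself. */
struct scrub_state {
	unsigned int refcnt;	/* stands in for fs_info->scrub_workers_refcnt */
	void *workers;		/* stands in for the three workqueues */
};

static int scrub_workers_get(struct scrub_state *s)
{
	if (s->refcnt == 0) {
		s->workers = malloc(1);		/* btrfs_alloc_workqueue() */
		if (!s->workers)
			return -1;
		s->refcnt = 1;			/* refcount_set(), never inc on 0 */
	} else {
		s->refcnt++;			/* refcount_inc() */
	}
	return 0;
}

static void scrub_workers_put(struct scrub_state *s)
{
	if (--s->refcnt == 0) {			/* refcount_dec_and_test() */
		free(s->workers);		/* btrfs_destroy_workqueue() */
		s->workers = NULL;
	}
}
```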


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 1/3] btrfs: scrub: fix circular locking dependency warning
  2019-01-31  6:34     ` Anand Jain
@ 2019-02-12 15:49       ` David Sterba
  0 siblings, 0 replies; 15+ messages in thread
From: David Sterba @ 2019-02-12 15:49 UTC (permalink / raw)
  To: Anand Jain; +Cc: dsterba, linux-btrfs

On Thu, Jan 31, 2019 at 02:34:54PM +0800, Anand Jain wrote:
> 
> 
> On 1/30/19 10:07 PM, David Sterba wrote:
> > On Wed, Jan 30, 2019 at 02:45:00PM +0800, Anand Jain wrote:
> >> v3->v4: Fix list corruption as reported by btrfs/073 by David.
> >>             [1]
> >>             https://patchwork.kernel.org/patch/10705741/
> >> 	Which I was able to reproduce with an instrumented kernel but not with
> >> 	btrfs/073.
> >>          In v3 patch, it releases the fs_info::scrub_lock to destroy the work queue
> >> 	which raced with new scrub requests, overwriting the scrub workers
> >> 	pointers. So in v4, it kills the function scrub_workers_put(), and
> >> 	performs the destroy_workqueue in two stages, with worker pointers
> >> 	copied locally.
> > 
> >> @@ -3932,9 +3925,16 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
> >>   
> >>   	mutex_lock(&fs_info->scrub_lock);
> >>   	dev->scrub_ctx = NULL;
> >> -	scrub_workers_put(fs_info);
> >> +	if (--fs_info->scrub_workers_refcnt == 0) {
> >> +		scrub_workers = fs_info->scrub_workers;
> >> +		scrub_wr_comp = fs_info->scrub_wr_completion_workers;
> >> +		scrub_parity = fs_info->scrub_parity_workers;
> >> +	}
> >>   	mutex_unlock(&fs_info->scrub_lock);
> >>   
> >> +	btrfs_destroy_workqueue(scrub_workers);
> >> +	btrfs_destroy_workqueue(scrub_wr_comp);
> >> +	btrfs_destroy_workqueue(scrub_parity);
> > 
> > https://lore.kernel.org/linux-btrfs/1543554924-17397-2-git-send-email-anand.jain@oracle.com/
> > 
> > Comparing to the previous version, it's almost the same I think. If
> > scrub_workers_get races between the unlock and destroy_workers, anything
> > that uses fs_info->scrub_wokers will soon use freed memory.
> > 
> > The difference is that the worker pointers are read from fs_info under a
> > lock but are still used outside. I haven't tested this version but from
> > the analysis of previous crash, I don't see how v4 is supposed to be
> > better.
> >
> 
> Consider v3 code as below:
> 
> When process-A is at [1] (below), start another
> btrfs scrub start; let's call it process-B.
> When process-A is at [1] it unlocks the fs_info::scrub_lock so the
> process-B can overwrite fs_info::scrub_workers,
> fs_info::scrub_wr_completion_workers, fs_info::scrub_parity_workers
> which the process-A at [1] has not yet called destroyed.
> 
> Process-A
> ---------
> 
> btrfs scrub start /mnt
> 
> ::
>          mutex_lock(&fs_info->scrub_lock);
> ::
>          if (dev->scrub_ctx ||
>              (!is_dev_replace &&
>               btrfs_dev_replace_is_ongoing(&fs_info->dev_replace))) {
>                  up_read(&fs_info->dev_replace.rwsem);
>                  mutex_unlock(&fs_info->scrub_lock);
>                  mutex_unlock(&fs_info->fs_devices->device_list_mutex);
>                  ret = -EINPROGRESS;
>                  goto out_free_ctx;
>          }
> ::
>          ret = scrub_workers_get(fs_info, is_dev_replace);  <-- [2]
> ::
>          dev->scrub_ctx = sctx;
>          mutex_unlock(&fs_info->scrub_lock);
> 
> ::
>                  ret = scrub_enumerate_chunks(sctx, dev, start, end);
> ::
>          atomic_dec(&fs_info->scrubs_running);
> ::
> 
>          mutex_lock(&fs_info->scrub_lock);
>          dev->scrub_ctx = NULL;
>          scrub_workers_put(fs_info);
>          mutex_unlock(&fs_info->scrub_lock);
> 
> 
> 
> static noinline_for_stack void scrub_workers_put(struct btrfs_fs_info 
> *fs_info)
> {
>          lockdep_assert_held(&fs_info->scrub_lock);
>          if (--fs_info->scrub_workers_refcnt == 0) {
>                  mutex_unlock(&fs_info->scrub_lock);
> 
> 		<wait for process-B>  [1]
> 
>                  btrfs_destroy_workqueue(fs_info->scrub_workers);
>  
> btrfs_destroy_workqueue(fs_info->scrub_wr_completion_workers);
>                  btrfs_destroy_workqueue(fs_info->scrub_parity_workers);
>                  mutex_lock(&fs_info->scrub_lock);
>          }
>          WARN_ON(fs_info->scrub_workers_refcnt < 0);
> }
> 
> 
> 
> 
> Process-B
> ---------
> Start when process-A is at [1] (above)
> btrfs scrub start /mnt
> 
> ::
>   at [2] (above) the fs_info::scrub_workers,
> fs_info::scrub_wr_completion_workers, fs_info::scrub_parity_workers
> of process-A are overwritten.
> 
> 
> So in v4.
> --------
> 
> Similar to dev::scrub_ctx the fs_info::scrub_workers,
> fs_info::scrub_wr_completion_workers, fs_info::scrub_parity_workers
> are stored locally before fs_info::scrub_lock is released, so the
> list pointers aren't corrupted.
> 
> Hope this clarifies.

Yes, thanks. I'd like to add some assertions to make the worker
pointers always NULL when the refcount is 0. I.e. initially they're NULL
due to the kzalloc of fs_info, so the additional part would go into the block.

--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3744,17 +3744,20 @@ static noinline_for_stack int scrub_workers_get(struct btrfs_fs_info *fs_info,
        lockdep_assert_held(&fs_info->scrub_lock);
 
        if (refcount_read(&fs_info->scrub_workers_refcnt) == 0) {
+               ASSERT(fs_info->scrub_workers == NULL);
                fs_info->scrub_workers = btrfs_alloc_workqueue(fs_info, "scrub",
                                flags, is_dev_replace ? 1 : max_active, 4);
                if (!fs_info->scrub_workers)
                        goto fail_scrub_workers;
 
+               ASSERT(fs_info->scrub_wr_completion_workers == NULL);
                fs_info->scrub_wr_completion_workers =
                        btrfs_alloc_workqueue(fs_info, "scrubwrc", flags,
                                              max_active, 2);
                if (!fs_info->scrub_wr_completion_workers)
                        goto fail_scrub_wr_completion_workers;
 
+               ASSERT(fs_info->scrub_parity_workers == NULL);
                fs_info->scrub_parity_workers =
                        btrfs_alloc_workqueue(fs_info, "scrubparity", flags,
                                              max_active, 2);
@@ -3934,6 +3937,10 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
                scrub_workers = fs_info->scrub_workers;
                scrub_wr_comp = fs_info->scrub_wr_completion_workers;
                scrub_parity = fs_info->scrub_parity_workers;
+
+               fs_info->scrub_workers = NULL;
+               fs_info->scrub_wr_completion_workers = NULL;
+               fs_info->scrub_parity_workers = NULL;
        }
        mutex_unlock(&fs_info->scrub_lock);
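The copy-out-under-lock shape of the fix can be sketched in userspace C (a hypothetical reduction: a pthread mutex stands in for fs_info->scrub_lock and free() for btrfs_destroy_workqueue(); names are illustrative):

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct state {
	pthread_mutex_t lock;
	int refcnt;
	void *workers;		/* e.g. fs_info->scrub_workers */
};

static void put_workers(struct state *s)
{
	void *workers = NULL;

	pthread_mutex_lock(&s->lock);
	if (--s->refcnt == 0) {
		/* Copy out and reset the shared pointer under the lock ... */
		workers = s->workers;
		s->workers = NULL;
	}
	pthread_mutex_unlock(&s->lock);

	/* ... destroy outside it, so the workqueue flush done by the
	 * destroy cannot deadlock against anything taking the lock. */
	free(workers);
}
```

Because the shared pointer is reset to NULL while still under the lock, a concurrent get observes either the old live state or NULL with a zero refcount, never a half-destroyed workqueue.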

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 1/3] btrfs: scrub: fix circular locking dependency warning
  2019-01-30  6:45 ` [PATCH v4 1/3] btrfs: scrub: fix circular locking dependency warning Anand Jain
  2019-01-30 14:07   ` David Sterba
@ 2019-02-12 16:45   ` David Sterba
  2019-02-13 16:02     ` Anand Jain
  1 sibling, 1 reply; 15+ messages in thread
From: David Sterba @ 2019-02-12 16:45 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs

On Wed, Jan 30, 2019 at 02:45:00PM +0800, Anand Jain wrote:
> Circular locking dependency check reports a warning[1]; that's because
> btrfs_scrub_dev() calls the stack #0 below with the
> fs_info::scrub_lock held. The test case leading to this warning:
> 
>   mkfs.btrfs -fq /dev/sdb && mount /dev/sdb /btrfs
>   btrfs scrub start -B /btrfs
> 
> In fact we have fs_info::scrub_workers_refcnt to track if the init and
> destroy of the scrub workers are needed. So once we have incremented
> and decremented the fs_info::scrub_workers_refcnt value in the thread,
> it's OK to drop the scrub_lock, and then actually do the
> btrfs_destroy_workqueue() part. So this patch drops the scrub_lock
> before calling btrfs_destroy_workqueue().
> 
> [1]
> [   76.146826] ======================================================
> [   76.147086] WARNING: possible circular locking dependency detected
> [   76.147316] 4.20.0-rc3+ #41 Not tainted
> [   76.147489] ------------------------------------------------------
> [   76.147722] btrfs/4065 is trying to acquire lock:
> [   76.147984] 0000000038593bc0 ((wq_completion)"%s-%s""btrfs",
> name){+.+.}, at: flush_workqueue+0x70/0x4d0
> [   76.148337]
> but task is already holding lock:
> [   76.148594] 0000000062392ab7 (&fs_info->scrub_lock){+.+.}, at:
> btrfs_scrub_dev+0x316/0x5d0 [btrfs]
> [   76.148909]
> which lock already depends on the new lock.
> 
> [   76.149191]
> the existing dependency chain (in reverse order) is:
> [   76.149446]
> -> #3 (&fs_info->scrub_lock){+.+.}:
> [   76.149707]        btrfs_scrub_dev+0x11f/0x5d0 [btrfs]
> [   76.149924]        btrfs_ioctl+0x1ac3/0x2d80 [btrfs]
> [   76.150216]        do_vfs_ioctl+0xa9/0x6d0
> [   76.150468]        ksys_ioctl+0x60/0x90
> [   76.150716]        __x64_sys_ioctl+0x16/0x20
> [   76.150911]        do_syscall_64+0x50/0x180
> [   76.151182]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [   76.151469]
> -> #2 (&fs_devs->device_list_mutex){+.+.}:
> [   76.151851]        reada_start_machine_worker+0xca/0x3f0 [btrfs]
> [   76.152195]        normal_work_helper+0xf0/0x4c0 [btrfs]
> [   76.152489]        process_one_work+0x1f4/0x520
> [   76.152751]        worker_thread+0x46/0x3d0
> [   76.153715]        kthread+0xf8/0x130
> [   76.153912]        ret_from_fork+0x3a/0x50
> [   76.154178]
> -> #1 ((work_completion)(&work->normal_work)){+.+.}:
> [   76.154575]        worker_thread+0x46/0x3d0
> [   76.154828]        kthread+0xf8/0x130
> [   76.155108]        ret_from_fork+0x3a/0x50
> [   76.155357]
> -> #0 ((wq_completion)"%s-%s""btrfs", name){+.+.}:
> [   76.155751]        flush_workqueue+0x9a/0x4d0
> [   76.155911]        drain_workqueue+0xca/0x1a0
> [   76.156182]        destroy_workqueue+0x17/0x230
> [   76.156455]        btrfs_destroy_workqueue+0x5d/0x1c0 [btrfs]
> [   76.156756]        scrub_workers_put+0x2e/0x60 [btrfs]
> [   76.156931]        btrfs_scrub_dev+0x329/0x5d0 [btrfs]
> [   76.157219]        btrfs_ioctl+0x1ac3/0x2d80 [btrfs]
> [   76.157491]        do_vfs_ioctl+0xa9/0x6d0
> [   76.157742]        ksys_ioctl+0x60/0x90
> [   76.157910]        __x64_sys_ioctl+0x16/0x20
> [   76.158177]        do_syscall_64+0x50/0x180
> [   76.158429]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [   76.158716]
> other info that might help us debug this:
> 
> [   76.158908] Chain exists of:
>   (wq_completion)"%s-%s""btrfs", name --> &fs_devs->device_list_mutex
> --> &fs_info->scrub_lock
> 
> [   76.159629]  Possible unsafe locking scenario:
> 
> [   76.160607]        CPU0                    CPU1
> [   76.160934]        ----                    ----
> [   76.161210]   lock(&fs_info->scrub_lock);
> [   76.161458]
> lock(&fs_devs->device_list_mutex);
> [   76.161805]
> lock(&fs_info->scrub_lock);
> [   76.161909]   lock((wq_completion)"%s-%s""btrfs", name);
> [   76.162201]
>  *** DEADLOCK ***

Please next time make sure the log messages are copied without
whitespace damage. In many cases it's easy to glue the lines together,
but especially the above locking scenario needs to be exact, as the
indentation matters.

I'll replace the whole report with the one in my logs as it does not
have the device_list_mutex in the locking chain. This would be confusing
as the problem with device_list_mutex was fixed already.

> 
> [   76.162627] 2 locks held by btrfs/4065:
> [   76.162897]  #0: 00000000bef2775b (sb_writers#12){.+.+}, at:
> mnt_want_write_file+0x24/0x50
> [   76.163335]  #1: 0000000062392ab7 (&fs_info->scrub_lock){+.+.}, at:
> btrfs_scrub_dev+0x316/0x5d0 [btrfs]
> [   76.163796]
> stack backtrace:
> [   76.163911] CPU: 1 PID: 4065 Comm: btrfs Not tainted 4.20.0-rc3+ #41
> [   76.164228] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
> VirtualBox 12/01/2006
> [   76.164646] Call Trace:
> [   76.164872]  dump_stack+0x5e/0x8b
> [   76.165128]  print_circular_bug.isra.37+0x1f1/0x1fe
> [   76.165398]  __lock_acquire+0x14aa/0x1620
> [   76.165652]  lock_acquire+0xb0/0x190
> [   76.165910]  ? flush_workqueue+0x70/0x4d0
> [   76.166175]  flush_workqueue+0x9a/0x4d0
> [   76.166420]  ? flush_workqueue+0x70/0x4d0
> [   76.166671]  ? drain_workqueue+0x52/0x1a0
> [   76.166911]  drain_workqueue+0xca/0x1a0
> [   76.167167]  destroy_workqueue+0x17/0x230
> [   76.167428]  btrfs_destroy_workqueue+0x5d/0x1c0 [btrfs]
> [   76.167720]  scrub_workers_put+0x2e/0x60 [btrfs]
> [   76.168233]  btrfs_scrub_dev+0x329/0x5d0 [btrfs]
> [   76.168504]  ? __sb_start_write+0x121/0x1b0
> [   76.168759]  ? mnt_want_write_file+0x24/0x50
> [   76.169654]  btrfs_ioctl+0x1ac3/0x2d80 [btrfs]
> [   76.169934]  ? find_held_lock+0x2d/0x90
> [   76.170204]  ? find_held_lock+0x2d/0x90
> [   76.170450]  do_vfs_ioctl+0xa9/0x6d0
> [   76.170690]  ? __fget+0x101/0x1f0
> [   76.170910]  ? __fget+0x5/0x1f0
> [   76.171157]  ksys_ioctl+0x60/0x90
> [   76.171391]  __x64_sys_ioctl+0x16/0x20
> [   76.171634]  do_syscall_64+0x50/0x180
> [   76.171892]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [   76.172186] RIP: 0033:0x7f61d422e567
> [   76.172425] Code: 44 00 00 48 8b 05 29 09 2d 00 64 c7 00 26 00 00 00
> 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f
> 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d f9 08 2d 00 f7 d8 64 89 01 48

This line can be omitted from the log unless we really want to decode
the instructions.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH] btrfs: scrub: add assertions for worker pointers
  2019-01-30  6:44 [PATCH v4 0/3] btrfs: scrub: fix scrub_lock Anand Jain
                   ` (4 preceding siblings ...)
  2019-02-11  8:38 ` [PATCH v5 3/3] btrfs: scrub: convert scrub_workers_refcnt to refcount_t Anand Jain
@ 2019-02-12 17:15 ` David Sterba
  2019-02-13 16:02   ` Anand Jain
  2019-02-13 17:35 ` [PATCH v4 0/3] btrfs: scrub: fix scrub_lock David Sterba
  6 siblings, 1 reply; 15+ messages in thread
From: David Sterba @ 2019-02-12 17:15 UTC (permalink / raw)
  To: linux-btrfs; +Cc: anand.jain, David Sterba

The scrub worker pointers are not NULL iff the scrub is running, so
reset them back once the last reference is dropped. Add assertions to
the initial phase of scrub to verify that.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/ctree.h | 4 ++++
 fs/btrfs/scrub.c | 7 +++++++
 2 files changed, 11 insertions(+)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 7efa1edb30cd..f92f97304e69 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1075,6 +1075,10 @@ struct btrfs_fs_info {
 	atomic_t scrubs_paused;
 	atomic_t scrub_cancel_req;
 	wait_queue_head_t scrub_pause_wait;
+	/*
+	 * The worker pointers are NULL iff the refcount is 0, ie. scrub is not
+	 * running.
+	 */
 	refcount_t scrub_workers_refcnt;
 	struct btrfs_workqueue *scrub_workers;
 	struct btrfs_workqueue *scrub_wr_completion_workers;
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index d20150d68a90..669bedfec4a9 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3744,17 +3744,20 @@ static noinline_for_stack int scrub_workers_get(struct btrfs_fs_info *fs_info,
 	lockdep_assert_held(&fs_info->scrub_lock);
 
 	if (refcount_read(&fs_info->scrub_workers_refcnt) == 0) {
+		ASSERT(fs_info->scrub_workers == NULL);
 		fs_info->scrub_workers = btrfs_alloc_workqueue(fs_info, "scrub",
 				flags, is_dev_replace ? 1 : max_active, 4);
 		if (!fs_info->scrub_workers)
 			goto fail_scrub_workers;
 
+		ASSERT(fs_info->scrub_wr_completion_workers == NULL);
 		fs_info->scrub_wr_completion_workers =
 			btrfs_alloc_workqueue(fs_info, "scrubwrc", flags,
 					      max_active, 2);
 		if (!fs_info->scrub_wr_completion_workers)
 			goto fail_scrub_wr_completion_workers;
 
+		ASSERT(fs_info->scrub_parity_workers == NULL);
 		fs_info->scrub_parity_workers =
 			btrfs_alloc_workqueue(fs_info, "scrubparity", flags,
 					      max_active, 2);
@@ -3934,6 +3937,10 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
 		scrub_workers = fs_info->scrub_workers;
 		scrub_wr_comp = fs_info->scrub_wr_completion_workers;
 		scrub_parity = fs_info->scrub_parity_workers;
+
+		fs_info->scrub_workers = NULL;
+		fs_info->scrub_wr_completion_workers = NULL;
+		fs_info->scrub_parity_workers = NULL;
 	}
 	mutex_unlock(&fs_info->scrub_lock);
 
-- 
2.20.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 1/3] btrfs: scrub: fix circular locking dependency warning
  2019-02-12 16:45   ` David Sterba
@ 2019-02-13 16:02     ` Anand Jain
  0 siblings, 0 replies; 15+ messages in thread
From: Anand Jain @ 2019-02-13 16:02 UTC (permalink / raw)
  To: dsterba, linux-btrfs



On 2/13/19 12:45 AM, David Sterba wrote:
> On Wed, Jan 30, 2019 at 02:45:00PM +0800, Anand Jain wrote:
>> Circular locking dependency check reports a warning[1]; that's because
>> btrfs_scrub_dev() calls the stack #0 below with the
>> fs_info::scrub_lock held. The test case leading to this warning:
>>
>>    mkfs.btrfs -fq /dev/sdb && mount /dev/sdb /btrfs
>>    btrfs scrub start -B /btrfs
>>
>> In fact we have fs_info::scrub_workers_refcnt to track if the init and
>> destroy of the scrub workers are needed. So once we have incremented
>> and decremented the fs_info::scrub_workers_refcnt value in the thread,
>> it's OK to drop the scrub_lock, and then actually do the
>> btrfs_destroy_workqueue() part. So this patch drops the scrub_lock
>> before calling btrfs_destroy_workqueue().
>>
>> [1]
>> [   76.146826] ======================================================
>> [   76.147086] WARNING: possible circular locking dependency detected
>> [   76.147316] 4.20.0-rc3+ #41 Not tainted
>> [   76.147489] ------------------------------------------------------
>> [   76.147722] btrfs/4065 is trying to acquire lock:
>> [   76.147984] 0000000038593bc0 ((wq_completion)"%s-%s""btrfs",
>> name){+.+.}, at: flush_workqueue+0x70/0x4d0
>> [   76.148337]
>> but task is already holding lock:
>> [   76.148594] 0000000062392ab7 (&fs_info->scrub_lock){+.+.}, at:
>> btrfs_scrub_dev+0x316/0x5d0 [btrfs]
>> [   76.148909]
>> which lock already depends on the new lock.
>>
>> [   76.149191]
>> the existing dependency chain (in reverse order) is:
>> [   76.149446]
>> -> #3 (&fs_info->scrub_lock){+.+.}:
>> [   76.149707]        btrfs_scrub_dev+0x11f/0x5d0 [btrfs]
>> [   76.149924]        btrfs_ioctl+0x1ac3/0x2d80 [btrfs]
>> [   76.150216]        do_vfs_ioctl+0xa9/0x6d0
>> [   76.150468]        ksys_ioctl+0x60/0x90
>> [   76.150716]        __x64_sys_ioctl+0x16/0x20
>> [   76.150911]        do_syscall_64+0x50/0x180
>> [   76.151182]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
>> [   76.151469]
>> -> #2 (&fs_devs->device_list_mutex){+.+.}:
>> [   76.151851]        reada_start_machine_worker+0xca/0x3f0 [btrfs]
>> [   76.152195]        normal_work_helper+0xf0/0x4c0 [btrfs]
>> [   76.152489]        process_one_work+0x1f4/0x520
>> [   76.152751]        worker_thread+0x46/0x3d0
>> [   76.153715]        kthread+0xf8/0x130
>> [   76.153912]        ret_from_fork+0x3a/0x50
>> [   76.154178]
>> -> #1 ((work_completion)(&work->normal_work)){+.+.}:
>> [   76.154575]        worker_thread+0x46/0x3d0
>> [   76.154828]        kthread+0xf8/0x130
>> [   76.155108]        ret_from_fork+0x3a/0x50
>> [   76.155357]
>> -> #0 ((wq_completion)"%s-%s""btrfs", name){+.+.}:
>> [   76.155751]        flush_workqueue+0x9a/0x4d0
>> [   76.155911]        drain_workqueue+0xca/0x1a0
>> [   76.156182]        destroy_workqueue+0x17/0x230
>> [   76.156455]        btrfs_destroy_workqueue+0x5d/0x1c0 [btrfs]
>> [   76.156756]        scrub_workers_put+0x2e/0x60 [btrfs]
>> [   76.156931]        btrfs_scrub_dev+0x329/0x5d0 [btrfs]
>> [   76.157219]        btrfs_ioctl+0x1ac3/0x2d80 [btrfs]
>> [   76.157491]        do_vfs_ioctl+0xa9/0x6d0
>> [   76.157742]        ksys_ioctl+0x60/0x90
>> [   76.157910]        __x64_sys_ioctl+0x16/0x20
>> [   76.158177]        do_syscall_64+0x50/0x180
>> [   76.158429]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
>> [   76.158716]
>> other info that might help us debug this:
>>
>> [   76.158908] Chain exists of:
>>    (wq_completion)"%s-%s""btrfs", name --> &fs_devs->device_list_mutex
>> --> &fs_info->scrub_lock
>>
>> [   76.159629]  Possible unsafe locking scenario:
>>
>> [   76.160607]        CPU0                    CPU1
>> [   76.160934]        ----                    ----
>> [   76.161210]   lock(&fs_info->scrub_lock);
>> [   76.161458]
>> lock(&fs_devs->device_list_mutex);
>> [   76.161805]
>> lock(&fs_info->scrub_lock);
>> [   76.161909]   lock((wq_completion)"%s-%s""btrfs", name);
>> [   76.162201]
>>   *** DEADLOCK ***
> 
> Please next time make sure the log messages are copied without
> whitespace damage. In many cases it's easy to glue the lines together,
> but especially the above locking scenario needs to be exact, as the
> indentation matters.
> 
> I'll replace the whole report with the one in my logs as it does not
> have the device_list_mutex in the locking chain. This would be confusing
> as the problem with device_list_mutex was fixed already.

  ok. Thanks.

-Anand

>>
>> [   76.162627] 2 locks held by btrfs/4065:
>> [   76.162897]  #0: 00000000bef2775b (sb_writers#12){.+.+}, at:
>> mnt_want_write_file+0x24/0x50
>> [   76.163335]  #1: 0000000062392ab7 (&fs_info->scrub_lock){+.+.}, at:
>> btrfs_scrub_dev+0x316/0x5d0 [btrfs]
>> [   76.163796]
>> stack backtrace:
>> [   76.163911] CPU: 1 PID: 4065 Comm: btrfs Not tainted 4.20.0-rc3+ #41
>> [   76.164228] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
>> VirtualBox 12/01/2006
>> [   76.164646] Call Trace:
>> [   76.164872]  dump_stack+0x5e/0x8b
>> [   76.165128]  print_circular_bug.isra.37+0x1f1/0x1fe
>> [   76.165398]  __lock_acquire+0x14aa/0x1620
>> [   76.165652]  lock_acquire+0xb0/0x190
>> [   76.165910]  ? flush_workqueue+0x70/0x4d0
>> [   76.166175]  flush_workqueue+0x9a/0x4d0
>> [   76.166420]  ? flush_workqueue+0x70/0x4d0
>> [   76.166671]  ? drain_workqueue+0x52/0x1a0
>> [   76.166911]  drain_workqueue+0xca/0x1a0
>> [   76.167167]  destroy_workqueue+0x17/0x230
>> [   76.167428]  btrfs_destroy_workqueue+0x5d/0x1c0 [btrfs]
>> [   76.167720]  scrub_workers_put+0x2e/0x60 [btrfs]
>> [   76.168233]  btrfs_scrub_dev+0x329/0x5d0 [btrfs]
>> [   76.168504]  ? __sb_start_write+0x121/0x1b0
>> [   76.168759]  ? mnt_want_write_file+0x24/0x50
>> [   76.169654]  btrfs_ioctl+0x1ac3/0x2d80 [btrfs]
>> [   76.169934]  ? find_held_lock+0x2d/0x90
>> [   76.170204]  ? find_held_lock+0x2d/0x90
>> [   76.170450]  do_vfs_ioctl+0xa9/0x6d0
>> [   76.170690]  ? __fget+0x101/0x1f0
>> [   76.170910]  ? __fget+0x5/0x1f0
>> [   76.171157]  ksys_ioctl+0x60/0x90
>> [   76.171391]  __x64_sys_ioctl+0x16/0x20
>> [   76.171634]  do_syscall_64+0x50/0x180
>> [   76.171892]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>> [   76.172186] RIP: 0033:0x7f61d422e567
>> [   76.172425] Code: 44 00 00 48 8b 05 29 09 2d 00 64 c7 00 26 00 00 00
>> 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f
>> 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d f9 08 2d 00 f7 d8 64 89 01 48
> 
> This line can be omitted from the log unless we really want to decode
> the instructions.
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] btrfs: scrub: add assertions for worker pointers
  2019-02-12 17:15 ` [PATCH] btrfs: scrub: add assertions for worker pointers David Sterba
@ 2019-02-13 16:02   ` Anand Jain
  0 siblings, 0 replies; 15+ messages in thread
From: Anand Jain @ 2019-02-13 16:02 UTC (permalink / raw)
  To: David Sterba, linux-btrfs



On 2/13/19 1:15 AM, David Sterba wrote:
> The scrub worker pointers are not NULL iff the scrub is running, so
> reset them back once the last reference is dropped. Add assertions to
> the initial phase of scrub to verify that.

makes sense.

> Signed-off-by: David Sterba <dsterba@suse.com>


Reviewed-by: Anand Jain <anand.jain@oracle.com>



> ---
>   fs/btrfs/ctree.h | 4 ++++
>   fs/btrfs/scrub.c | 7 +++++++
>   2 files changed, 11 insertions(+)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 7efa1edb30cd..f92f97304e69 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1075,6 +1075,10 @@ struct btrfs_fs_info {
>   	atomic_t scrubs_paused;
>   	atomic_t scrub_cancel_req;
>   	wait_queue_head_t scrub_pause_wait;
> +	/*
> +	 * The worker pointers are NULL iff the refcount is 0, ie. scrub is not
> +	 * running.
> +	 */
>   	refcount_t scrub_workers_refcnt;
>   	struct btrfs_workqueue *scrub_workers;
>   	struct btrfs_workqueue *scrub_wr_completion_workers;
> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index d20150d68a90..669bedfec4a9 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -3744,17 +3744,20 @@ static noinline_for_stack int scrub_workers_get(struct btrfs_fs_info *fs_info,
>   	lockdep_assert_held(&fs_info->scrub_lock);
>   
>   	if (refcount_read(&fs_info->scrub_workers_refcnt) == 0) {
> +		ASSERT(fs_info->scrub_workers == NULL);
>   		fs_info->scrub_workers = btrfs_alloc_workqueue(fs_info, "scrub",
>   				flags, is_dev_replace ? 1 : max_active, 4);
>   		if (!fs_info->scrub_workers)
>   			goto fail_scrub_workers;
>   
> +		ASSERT(fs_info->scrub_wr_completion_workers == NULL);
>   		fs_info->scrub_wr_completion_workers =
>   			btrfs_alloc_workqueue(fs_info, "scrubwrc", flags,
>   					      max_active, 2);
>   		if (!fs_info->scrub_wr_completion_workers)
>   			goto fail_scrub_wr_completion_workers;
>   
> +		ASSERT(fs_info->scrub_parity_workers == NULL);
>   		fs_info->scrub_parity_workers =
>   			btrfs_alloc_workqueue(fs_info, "scrubparity", flags,
>   					      max_active, 2);
> @@ -3934,6 +3937,10 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
>   		scrub_workers = fs_info->scrub_workers;
>   		scrub_wr_comp = fs_info->scrub_wr_completion_workers;
>   		scrub_parity = fs_info->scrub_parity_workers;
> +
> +		fs_info->scrub_workers = NULL;
> +		fs_info->scrub_wr_completion_workers = NULL;
> +		fs_info->scrub_parity_workers = NULL;
>   	}
>   	mutex_unlock(&fs_info->scrub_lock);
>   
> 


* Re: [PATCH v4 0/3] btrfs: scrub: fix scrub_lock
  2019-01-30  6:44 [PATCH v4 0/3] btrfs: scrub: fix scrub_lock Anand Jain
                   ` (5 preceding siblings ...)
  2019-02-12 17:15 ` [PATCH] btrfs: scrub: add assertions for worker pointers David Sterba
@ 2019-02-13 17:35 ` David Sterba
  6 siblings, 0 replies; 15+ messages in thread
From: David Sterba @ 2019-02-13 17:35 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs

On Wed, Jan 30, 2019 at 02:44:59PM +0800, Anand Jain wrote:
> Fixes the circular locking dependency warning as in patch 1/3,
> and patch 2/3 adds lockdep_assert_held() to scrub_workers_get().
> Patch 3/3 converts scrub_workers_refcnt into refcount_t.
> 
> Anand Jain (3):
>   btrfs: scrub: fix circular locking dependency warning
>   btrfs: scrub: add scrub_lock lockdep check in scrub_workers_get
>   btrfs: scrub: convert scrub_workers_refcnt to refcount_t

1-3 + my patch adding the assertions is now in misc-next, will be in
5.1. Thanks.



Thread overview: 15+ messages
2019-01-30  6:44 [PATCH v4 0/3] btrfs: scrub: fix scrub_lock Anand Jain
2019-01-30  6:45 ` [PATCH v4 1/3] btrfs: scrub: fix circular locking dependency warning Anand Jain
2019-01-30 14:07   ` David Sterba
2019-01-31  6:34     ` Anand Jain
2019-02-12 15:49       ` David Sterba
2019-02-12 16:45   ` David Sterba
2019-02-13 16:02     ` Anand Jain
2019-01-30  6:45 ` [PATCH v4 2/3] btrfs: scrub: add scrub_lock lockdep check in scrub_workers_get Anand Jain
2019-01-30  6:45 ` [PATCH v4 3/3] btrfs: scrub: convert scrub_workers_refcnt to refcount_t Anand Jain
2019-02-08 17:02 ` [PATCH v4 0/3] btrfs: scrub: fix scrub_lock David Sterba
2019-02-11  8:09   ` Anand Jain
2019-02-11  8:38 ` [PATCH v5 3/3] btrfs: scrub: convert scrub_workers_refcnt to refcount_t Anand Jain
2019-02-12 17:15 ` [PATCH] btrfs: scrub: add assertions for worker pointers David Sterba
2019-02-13 16:02   ` Anand Jain
2019-02-13 17:35 ` [PATCH v4 0/3] btrfs: scrub: fix scrub_lock David Sterba
