All of lore.kernel.org
 help / color / mirror / Atom feed
From: Anand Jain <anand.jain@oracle.com>
To: linux-btrfs@vger.kernel.org
Cc: dsterba@suse.com
Subject: [PATCH] btrfs: fix lockdep warning chunk_mutex vs device_list_mutex
Date: Thu, 14 May 2020 03:46:59 +0800	[thread overview]
Message-ID: <20200513194659.34493-1-anand.jain@oracle.com> (raw)
In-Reply-To: <52b6ff4c2da5838393f5bd754310cfa6abcfcc7b3efb3c63c8d95824cb163d6d.dsterba@suse.com>

ABBA locking order lockdep warning reported during fstests btrfs/161 as
below.

[ 5174.262652] WARNING: possible circular locking dependency detected
[ 5174.264662] 5.7.0-rc3-default+ #1094 Not tainted
[ 5174.266245] ------------------------------------------------------
[ 5174.268215] mount/30761 is trying to acquire lock:
[ 5174.269838] ffff8d950e4164e8 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: clone_fs_devices+0x3f/0x170 [btrfs]
[ 5174.272880]
[ 5174.272880] but task is already holding lock:
[ 5174.275081] ffff8d952ae80980 (&fs_info->chunk_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0x5a/0x2a0 [btrfs]
[ 5174.278232]
[ 5174.278232] which lock already depends on the new lock.
[ 5174.278232]
[ 5174.281372]
[ 5174.281372] the existing dependency chain (in reverse order) is:
[ 5174.283784]
[ 5174.283784] -> #1 (&fs_info->chunk_mutex){+.+.}-{3:3}:
[ 5174.286134]        __lock_acquire+0x581/0xae0
[ 5174.287563]        lock_acquire+0xa3/0x400
[ 5174.289033]        __mutex_lock+0xa0/0xaf0
[ 5174.290488]        btrfs_init_new_device+0x316/0x12f0 [btrfs]
[ 5174.292209]        btrfs_ioctl+0xc3c/0x2590 [btrfs]
[ 5174.293673]        ksys_ioctl+0x68/0xa0
[ 5174.294883]        __x64_sys_ioctl+0x16/0x20
[ 5174.296231]        do_syscall_64+0x50/0x210
[ 5174.297548]        entry_SYSCALL_64_after_hwframe+0x49/0xb3
[ 5174.299278]
[ 5174.299278] -> #0 (&fs_devs->device_list_mutex){+.+.}-{3:3}:
[ 5174.301760]        check_prev_add+0x98/0xa20
[ 5174.303219]        validate_chain+0xa6c/0x29e0
[ 5174.304770]        __lock_acquire+0x581/0xae0
[ 5174.306274]        lock_acquire+0xa3/0x400
[ 5174.307716]        __mutex_lock+0xa0/0xaf0
[ 5174.309145]        clone_fs_devices+0x3f/0x170 [btrfs]
[ 5174.310757]        read_one_dev+0xc4/0x500 [btrfs]
[ 5174.312293]        btrfs_read_chunk_tree+0x202/0x2a0 [btrfs]
[ 5174.313946]        open_ctree+0x7a3/0x10db [btrfs]
[ 5174.315411]        btrfs_mount_root.cold+0xe/0xcc [btrfs]
[ 5174.317122]        legacy_get_tree+0x2d/0x60
[ 5174.318543]        vfs_get_tree+0x1d/0xb0
[ 5174.319844]        fc_mount+0xe/0x40
[ 5174.321122]        vfs_kern_mount.part.0+0x71/0x90
[ 5174.322688]        btrfs_mount+0x147/0x3e0 [btrfs]
[ 5174.324250]        legacy_get_tree+0x2d/0x60
[ 5174.325644]        vfs_get_tree+0x1d/0xb0
[ 5174.326978]        do_mount+0x7d5/0xa40
[ 5174.328294]        __x64_sys_mount+0x8e/0xd0
[ 5174.329829]        do_syscall_64+0x50/0x210
[ 5174.331260]        entry_SYSCALL_64_after_hwframe+0x49/0xb3
[ 5174.333102]
[ 5174.333102] other info that might help us debug this:
[ 5174.333102]
[ 5174.335988]  Possible unsafe locking scenario:
[ 5174.335988]
[ 5174.338051]        CPU0                    CPU1
[ 5174.339490]        ----                    ----
[ 5174.340810]   lock(&fs_info->chunk_mutex);
[ 5174.342203]                                lock(&fs_devs->device_list_mutex);
[ 5174.344228]                                lock(&fs_info->chunk_mutex);
[ 5174.346161]   lock(&fs_devs->device_list_mutex);
[ 5174.347666]
[ 5174.347666]  *** DEADLOCK ***

The test case btrfs/161 creates seed device and adds sprout device to
it using ioctl.

The ioctl thread which adds the sprout device to the mounted seed device
calls btrfs_prepare_sprout() and holds the lock in the following order.

 mutex_lock(&fs_devices->device_list_mutex);
 mutex_lock(&fs_info->chunk_mutex);

The add thread still in the CPU#1 however as the test case would the mount
the sprout device this time is on CPU#0. As it is mounting the sprout
device btrfs_read_chunk_tree() it calls clone_fs_devices() establishing
the lock order.

 mutex_lock(&fs_info->chunk_mutex);
 mutex_lock(&fs_devices->device_list_mutex);

But the address of these two fs_devices and fs_info are different
so this ABBA warning won't be true in real.

However as per our design the chunk_mutex and device_list_mutex must
follow the locking order of the former thread so there is something
to fix.

volume.c:
 279  * Lock nesting
 280  * ============
 281  *
 282  * uuid_mutex
 283  *     device_list_mutex
 284  *       chunk_mutex

To fix the idea is to move out the lock from clone_fs_devices() to
its call chain up at btrfs_read_chunk_tree().

btrfs_read_chunk_tree()
+  mutex_lock(&orig->device_list_mutex);
   mutex_lock(&fs_info->chunk_mutex);
   read_one_dev(leaf, dev_item); (single parent fn)
     open_seed_devices(fs_info, fs_uuid); (single parent fn)
       clone_fs_devices(fs_devices);  (also called by btrfs_prepare_sprout only)
-        mutex_lock(&orig->device_list_mutex);

As clone_fs_devices() is also called by btrfs_prepare_sprout() add the
required device_list_mutex there.

btrfs_prepare_sprout()
+  mutex_lock(&fs_devices->device_list_mutex);
   clone_fs_devices()
+  mutex_unlock(&fs_devices->device_list_mutex);

Next, in btrfs_read_chunk_tree() there are several helper functions
which are now under device_list_mutex which are fine to be unless
there is already device_list_mutex lock some other incompatible lock.
So each of those helper functions were checked as below, and they
don't hold any locks.

open_seed_devices() ok
find_fsid() ok
clone_fs_devices() ok with fix
open_fs_devices() ok
free_fs_devices() ok
btrfs_find_device() ok
add_missing_dev() ok

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---

Testing:
fstests groups volume and seed ran fine.
A full list of tests just started.

 fs/btrfs/volumes.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 60ab41c12e50..ebc8565d0f73 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -984,7 +984,6 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig)
 	if (IS_ERR(fs_devices))
 		return fs_devices;
 
-	mutex_lock(&orig->device_list_mutex);
 	fs_devices->total_devices = orig->total_devices;
 
 	list_for_each_entry(orig_dev, &orig->devices, dev_list) {
@@ -1016,10 +1015,8 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig)
 		device->fs_devices = fs_devices;
 		fs_devices->num_devices++;
 	}
-	mutex_unlock(&orig->device_list_mutex);
 	return fs_devices;
 error:
-	mutex_unlock(&orig->device_list_mutex);
 	free_fs_devices(fs_devices);
 	return ERR_PTR(ret);
 }
@@ -2363,11 +2360,14 @@ static int btrfs_prepare_sprout(struct btrfs_fs_info *fs_info)
 	if (IS_ERR(seed_devices))
 		return PTR_ERR(seed_devices);
 
+	mutex_lock(&fs_devices->device_list_mutex);
 	old_devices = clone_fs_devices(fs_devices);
 	if (IS_ERR(old_devices)) {
+		mutex_unlock(&fs_devices->device_list_mutex);
 		kfree(seed_devices);
 		return PTR_ERR(old_devices);
 	}
+	mutex_unlock(&fs_devices->device_list_mutex);
 
 	list_add(&old_devices->fs_list, &fs_uuids);
 
@@ -7049,6 +7049,7 @@ int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info)
 	 * otherwise we don't need it.
 	 */
 	mutex_lock(&uuid_mutex);
+	mutex_lock(&fs_info->fs_devices->device_list_mutex);
 	mutex_lock(&fs_info->chunk_mutex);
 
 	/*
@@ -7117,6 +7118,7 @@ int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info)
 	ret = 0;
 error:
 	mutex_unlock(&fs_info->chunk_mutex);
+	mutex_unlock(&fs_info->fs_devices->device_list_mutex);
 	mutex_unlock(&uuid_mutex);
 
 	btrfs_free_path(path);
-- 
2.25.1


  parent reply	other threads:[~2020-05-13 19:47 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-12 14:12 Pending bugs for 5.7 David Sterba
2020-05-12 14:14 ` Bug 5.7-rc: root leak, eb leak David Sterba
2020-05-12 23:03   ` David Sterba
2020-05-13 11:54     ` Johannes Thumshirn
2020-05-13 11:57       ` Qu Wenruo
2020-05-13 12:06         ` Johannes Thumshirn
2020-05-13 12:11           ` Qu Wenruo
2020-05-13 12:17             ` Johannes Thumshirn
2020-05-13 12:29               ` Johannes Thumshirn
2020-05-12 14:15 ` Bug 5.7-rc: write-time leaf corruption detected David Sterba
2020-05-12 14:26   ` Filipe Manana
2020-05-13  3:10   ` Qu Wenruo
2020-05-13  3:17     ` Qu Wenruo
2020-05-13  9:25     ` Filipe Manana
2020-05-19 14:26   ` Bug 5.7-rc: write-time leaf corruption detected (fixed) David Sterba
2020-05-12 14:15 ` Bug 5.7-rc: lockdep warning, chunk_mutex/device_list_mutex David Sterba
2020-05-12 23:28   ` David Sterba
2020-05-12 19:25     ` Anand Jain
2020-05-13 19:46   ` Anand Jain [this message]
2020-05-15 17:40     ` [PATCH] btrfs: fix lockdep warning chunk_mutex vs device_list_mutex David Sterba
2020-05-16  3:43       ` Anand Jain
2020-05-18 11:07         ` Anand Jain
2020-05-18 15:28           ` David Sterba
2020-05-12 14:15 ` Bug 5.7-rc: lockdep warning, fs_reclaim David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200513194659.34493-1-anand.jain@oracle.com \
    --to=anand.jain@oracle.com \
    --cc=dsterba@suse.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.