From: Anand Jain <anand.jain@oracle.com>
To: linux-btrfs@vger.kernel.org
Cc: dsterba@suse.com
Subject: [PATCH] btrfs: fix lockdep warning chunk_mutex vs device_list_mutex
Date: Thu, 14 May 2020 03:46:59 +0800 [thread overview]
Message-ID: <20200513194659.34493-1-anand.jain@oracle.com> (raw)
In-Reply-To: <52b6ff4c2da5838393f5bd754310cfa6abcfcc7b3efb3c63c8d95824cb163d6d.dsterba@suse.com>
ABBA locking order lockdep warning reported during fstests btrfs/161 as
below.
[ 5174.262652] WARNING: possible circular locking dependency detected
[ 5174.264662] 5.7.0-rc3-default+ #1094 Not tainted
[ 5174.266245] ------------------------------------------------------
[ 5174.268215] mount/30761 is trying to acquire lock:
[ 5174.269838] ffff8d950e4164e8 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: clone_fs_devices+0x3f/0x170 [btrfs]
[ 5174.272880]
[ 5174.272880] but task is already holding lock:
[ 5174.275081] ffff8d952ae80980 (&fs_info->chunk_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0x5a/0x2a0 [btrfs]
[ 5174.278232]
[ 5174.278232] which lock already depends on the new lock.
[ 5174.278232]
[ 5174.281372]
[ 5174.281372] the existing dependency chain (in reverse order) is:
[ 5174.283784]
[ 5174.283784] -> #1 (&fs_info->chunk_mutex){+.+.}-{3:3}:
[ 5174.286134] __lock_acquire+0x581/0xae0
[ 5174.287563] lock_acquire+0xa3/0x400
[ 5174.289033] __mutex_lock+0xa0/0xaf0
[ 5174.290488] btrfs_init_new_device+0x316/0x12f0 [btrfs]
[ 5174.292209] btrfs_ioctl+0xc3c/0x2590 [btrfs]
[ 5174.293673] ksys_ioctl+0x68/0xa0
[ 5174.294883] __x64_sys_ioctl+0x16/0x20
[ 5174.296231] do_syscall_64+0x50/0x210
[ 5174.297548] entry_SYSCALL_64_after_hwframe+0x49/0xb3
[ 5174.299278]
[ 5174.299278] -> #0 (&fs_devs->device_list_mutex){+.+.}-{3:3}:
[ 5174.301760] check_prev_add+0x98/0xa20
[ 5174.303219] validate_chain+0xa6c/0x29e0
[ 5174.304770] __lock_acquire+0x581/0xae0
[ 5174.306274] lock_acquire+0xa3/0x400
[ 5174.307716] __mutex_lock+0xa0/0xaf0
[ 5174.309145] clone_fs_devices+0x3f/0x170 [btrfs]
[ 5174.310757] read_one_dev+0xc4/0x500 [btrfs]
[ 5174.312293] btrfs_read_chunk_tree+0x202/0x2a0 [btrfs]
[ 5174.313946] open_ctree+0x7a3/0x10db [btrfs]
[ 5174.315411] btrfs_mount_root.cold+0xe/0xcc [btrfs]
[ 5174.317122] legacy_get_tree+0x2d/0x60
[ 5174.318543] vfs_get_tree+0x1d/0xb0
[ 5174.319844] fc_mount+0xe/0x40
[ 5174.321122] vfs_kern_mount.part.0+0x71/0x90
[ 5174.322688] btrfs_mount+0x147/0x3e0 [btrfs]
[ 5174.324250] legacy_get_tree+0x2d/0x60
[ 5174.325644] vfs_get_tree+0x1d/0xb0
[ 5174.326978] do_mount+0x7d5/0xa40
[ 5174.328294] __x64_sys_mount+0x8e/0xd0
[ 5174.329829] do_syscall_64+0x50/0x210
[ 5174.331260] entry_SYSCALL_64_after_hwframe+0x49/0xb3
[ 5174.333102]
[ 5174.333102] other info that might help us debug this:
[ 5174.333102]
[ 5174.335988] Possible unsafe locking scenario:
[ 5174.335988]
[ 5174.338051] CPU0 CPU1
[ 5174.339490] ---- ----
[ 5174.340810] lock(&fs_info->chunk_mutex);
[ 5174.342203] lock(&fs_devs->device_list_mutex);
[ 5174.344228] lock(&fs_info->chunk_mutex);
[ 5174.346161] lock(&fs_devs->device_list_mutex);
[ 5174.347666]
[ 5174.347666] *** DEADLOCK ***
The test case btrfs/161 creates seed device and adds sprout device to
it using ioctl.
The ioctl thread which adds the sprout device to the mounted seed device
calls btrfs_prepare_sprout() and holds the lock in the following order.
mutex_lock(&fs_devices->device_list_mutex);
mutex_lock(&fs_info->chunk_mutex);
The add thread still in the CPU#1 however as the test case would the mount
the sprout device this time is on CPU#0. As it is mounting the sprout
device btrfs_read_chunk_tree() it calls clone_fs_devices() establishing
the lock order.
mutex_lock(&fs_info->chunk_mutex);
mutex_lock(&fs_devices->device_list_mutex);
But the address of these two fs_devices and fs_info are different
so this ABBA warning won't be true in real.
However as per our design the chunk_mutex and device_list_mutex must
follow the locking order of the former thread so there is something
to fix.
volume.c:
279 * Lock nesting
280 * ============
281 *
282 * uuid_mutex
283 * device_list_mutex
284 * chunk_mutex
To fix the idea is to move out the lock from clone_fs_devices() to
its call chain up at btrfs_read_chunk_tree().
btrfs_read_chunk_tree()
+ mutex_lock(&orig->device_list_mutex);
mutex_lock(&fs_info->chunk_mutex);
read_one_dev(leaf, dev_item); (single parent fn)
open_seed_devices(fs_info, fs_uuid); (single parent fn)
clone_fs_devices(fs_devices); (also called by btrfs_prepare_sprout only)
- mutex_lock(&orig->device_list_mutex);
As clone_fs_devices() is also called by btrfs_prepare_sprout() add the
required device_list_mutex there.
btrfs_prepare_sprout()
+ mutex_lock(&fs_devices->device_list_mutex);
clone_fs_devices()
+ mutex_unlock(&fs_devices->device_list_mutex);
Next, in btrfs_read_chunk_tree() there are several helper functions
which are now under device_list_mutex which are fine to be unless
there is already device_list_mutex lock some other incompatible lock.
So each of those helper functions were checked as below, and they
don't hold any locks.
open_seed_devices() ok
find_fsid() ok
clone_fs_devices() ok with fix
open_fs_devices() ok
free_fs_devices() ok
btrfs_find_device() ok
add_missing_dev() ok
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
Testing:
fstests groups volume and seed ran fine.
A full list of tests just started.
fs/btrfs/volumes.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 60ab41c12e50..ebc8565d0f73 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -984,7 +984,6 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig)
if (IS_ERR(fs_devices))
return fs_devices;
- mutex_lock(&orig->device_list_mutex);
fs_devices->total_devices = orig->total_devices;
list_for_each_entry(orig_dev, &orig->devices, dev_list) {
@@ -1016,10 +1015,8 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig)
device->fs_devices = fs_devices;
fs_devices->num_devices++;
}
- mutex_unlock(&orig->device_list_mutex);
return fs_devices;
error:
- mutex_unlock(&orig->device_list_mutex);
free_fs_devices(fs_devices);
return ERR_PTR(ret);
}
@@ -2363,11 +2360,14 @@ static int btrfs_prepare_sprout(struct btrfs_fs_info *fs_info)
if (IS_ERR(seed_devices))
return PTR_ERR(seed_devices);
+ mutex_lock(&fs_devices->device_list_mutex);
old_devices = clone_fs_devices(fs_devices);
if (IS_ERR(old_devices)) {
+ mutex_unlock(&fs_devices->device_list_mutex);
kfree(seed_devices);
return PTR_ERR(old_devices);
}
+ mutex_unlock(&fs_devices->device_list_mutex);
list_add(&old_devices->fs_list, &fs_uuids);
@@ -7049,6 +7049,7 @@ int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info)
* otherwise we don't need it.
*/
mutex_lock(&uuid_mutex);
+ mutex_lock(&fs_info->fs_devices->device_list_mutex);
mutex_lock(&fs_info->chunk_mutex);
/*
@@ -7117,6 +7118,7 @@ int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info)
ret = 0;
error:
mutex_unlock(&fs_info->chunk_mutex);
+ mutex_unlock(&fs_info->fs_devices->device_list_mutex);
mutex_unlock(&uuid_mutex);
btrfs_free_path(path);
--
2.25.1
next prev parent reply other threads:[~2020-05-13 19:47 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-05-12 14:12 Pending bugs for 5.7 David Sterba
2020-05-12 14:14 ` Bug 5.7-rc: root leak, eb leak David Sterba
2020-05-12 23:03 ` David Sterba
2020-05-13 11:54 ` Johannes Thumshirn
2020-05-13 11:57 ` Qu Wenruo
2020-05-13 12:06 ` Johannes Thumshirn
2020-05-13 12:11 ` Qu Wenruo
2020-05-13 12:17 ` Johannes Thumshirn
2020-05-13 12:29 ` Johannes Thumshirn
2020-05-12 14:15 ` Bug 5.7-rc: write-time leaf corruption detected David Sterba
2020-05-12 14:26 ` Filipe Manana
2020-05-13 3:10 ` Qu Wenruo
2020-05-13 3:17 ` Qu Wenruo
2020-05-13 9:25 ` Filipe Manana
2020-05-19 14:26 ` Bug 5.7-rc: write-time leaf corruption detected (fixed) David Sterba
2020-05-12 14:15 ` Bug 5.7-rc: lockdep warning, chunk_mutex/device_list_mutex David Sterba
2020-05-12 23:28 ` David Sterba
2020-05-12 19:25 ` Anand Jain
2020-05-13 19:46 ` Anand Jain [this message]
2020-05-15 17:40 ` [PATCH] btrfs: fix lockdep warning chunk_mutex vs device_list_mutex David Sterba
2020-05-16 3:43 ` Anand Jain
2020-05-18 11:07 ` Anand Jain
2020-05-18 15:28 ` David Sterba
2020-05-12 14:15 ` Bug 5.7-rc: lockdep warning, fs_reclaim David Sterba
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200513194659.34493-1-anand.jain@oracle.com \
--to=anand.jain@oracle.com \
--cc=dsterba@suse.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).