linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/11] Device fixes and cleanups
@ 2017-10-31 17:44 David Sterba
  2017-10-31 17:44 ` [PATCH 01/11] btrfs: add missing device::flush_bio puts David Sterba
                   ` (10 more replies)
  0 siblings, 11 replies; 27+ messages in thread
From: David Sterba @ 2017-10-31 17:44 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

Cleanups for device locking and documentation of the various mutexes.
Patch 1 is a bugfix that could be considered for 4.15 as we could leak bios
under some rare condition, but so far this hasn't been reported so I don't rate
is as critical.

David Sterba (11):
  btrfs: add missing device::flush_bio puts
  btrfs: rename device free rcu helper to free_device_rcu
  btrfs: introduce free_device helper
  btrfs: use free_device where opencoded
  btrfs: simplify exit paths in btrfs_init_new_device
  btrfs: document device locking
  btrfs: dev_alloc_list is not protected by RCU, use normal list_del
  btrfs: simplify btrfs_close_bdev
  btrfs: switch to RCU for device traversal in btrfs_ioctl_dev_info
  btrfs: switch to RCU for device traversal in btrfs_ioctl_fs_info
  btrfs: use non-RCU list traversal in write_all_supers callees

 fs/btrfs/disk-io.c |   9 ++--
 fs/btrfs/ioctl.c   |  15 +++---
 fs/btrfs/volumes.c | 134 ++++++++++++++++++++++++++++++++++++++++-------------
 3 files changed, 114 insertions(+), 44 deletions(-)

-- 
2.14.0


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 01/11] btrfs: add missing device::flush_bio puts
  2017-10-31 17:44 [PATCH 00/11] Device fixes and cleanups David Sterba
@ 2017-10-31 17:44 ` David Sterba
  2017-11-02  9:41   ` Nikolay Borisov
  2017-11-02 10:40   ` Anand Jain
  2017-10-31 17:44 ` [PATCH 02/11] btrfs: rename device free rcu helper to free_device_rcu David Sterba
                   ` (9 subsequent siblings)
  10 siblings, 2 replies; 27+ messages in thread
From: David Sterba @ 2017-10-31 17:44 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

This fixes potential bio leaks, in several error paths. Unfortunatelly
the device structure freeing is opencoded in many places and I missed
them when introducing the flush_bio.

Most of the time, devices get freed through call_rcu(..., free_device),
so it at least it's not that easy to hit the leak, but it's still
possible through the path that frees stale devices.

Fixes: e0ae99941423 ("btrfs: preallocate device flush bio")
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/volumes.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index ea8b20839ac0..08fb4b5609b7 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -189,6 +189,7 @@ static void free_fs_devices(struct btrfs_fs_devices *fs_devices)
 				    struct btrfs_device, dev_list);
 		list_del(&device->dev_list);
 		rcu_string_free(device->name);
+		bio_put(device->flush_bio);
 		kfree(device);
 	}
 	kfree(fs_devices);
@@ -578,6 +579,7 @@ static void btrfs_free_stale_device(struct btrfs_device *cur_dev)
 				fs_devs->num_devices--;
 				list_del(&dev->dev_list);
 				rcu_string_free(dev->name);
+				bio_put(dev->flush_bio);
 				kfree(dev);
 			}
 			break;
@@ -630,6 +632,7 @@ static noinline int device_list_add(const char *path,
 
 		name = rcu_string_strdup(path, GFP_NOFS);
 		if (!name) {
+			bio_put(device->flush_bio);
 			kfree(device);
 			return -ENOMEM;
 		}
@@ -742,6 +745,7 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig)
 			name = rcu_string_strdup(orig_dev->name->str,
 					GFP_KERNEL);
 			if (!name) {
+				bio_put(device->flush_bio);
 				kfree(device);
 				goto error;
 			}
@@ -807,6 +811,7 @@ void btrfs_close_extra_devices(struct btrfs_fs_devices *fs_devices, int step)
 		list_del_init(&device->dev_list);
 		fs_devices->num_devices--;
 		rcu_string_free(device->name);
+		bio_put(device->flush_bio);
 		kfree(device);
 	}
 
@@ -2337,6 +2342,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 
 	name = rcu_string_strdup(device_path, GFP_KERNEL);
 	if (!name) {
+		bio_put(device->flush_bio);
 		kfree(device);
 		ret = -ENOMEM;
 		goto error;
@@ -2346,6 +2352,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 	trans = btrfs_start_transaction(root, 0);
 	if (IS_ERR(trans)) {
 		rcu_string_free(device->name);
+		bio_put(device->flush_bio);
 		kfree(device);
 		ret = PTR_ERR(trans);
 		goto error;
@@ -2489,6 +2496,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 	if (trans)
 		btrfs_end_transaction(trans);
 	rcu_string_free(device->name);
+	bio_put(device->flush_bio);
 	kfree(device);
 error:
 	blkdev_put(bdev, FMODE_EXCL);
@@ -2555,6 +2563,7 @@ int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
 
 	name = rcu_string_strdup(device_path, GFP_KERNEL);
 	if (!name) {
+		bio_put(device->flush_bio);
 		kfree(device);
 		ret = -ENOMEM;
 		goto error;
@@ -6271,6 +6280,7 @@ struct btrfs_device *btrfs_alloc_device(struct btrfs_fs_info *fs_info,
 
 		ret = find_next_devid(fs_info, &tmp);
 		if (ret) {
+			bio_put(dev->flush_bio);
 			kfree(dev);
 			return ERR_PTR(ret);
 		}
-- 
2.14.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 02/11] btrfs: rename device free rcu helper to free_device_rcu
  2017-10-31 17:44 [PATCH 00/11] Device fixes and cleanups David Sterba
  2017-10-31 17:44 ` [PATCH 01/11] btrfs: add missing device::flush_bio puts David Sterba
@ 2017-10-31 17:44 ` David Sterba
  2017-10-31 17:44 ` [PATCH 03/11] btrfs: introduce free_device helper David Sterba
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: David Sterba @ 2017-10-31 17:44 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

Make it clear that it is an RCU helper, we want to use the name
free_device for a wrapper freeing all device members.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/volumes.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 08fb4b5609b7..ab03e712aa49 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -825,7 +825,7 @@ void btrfs_close_extra_devices(struct btrfs_fs_devices *fs_devices, int step)
 	mutex_unlock(&uuid_mutex);
 }
 
-static void free_device(struct rcu_head *head)
+static void free_device_rcu(struct rcu_head *head)
 {
 	struct btrfs_device *device;
 
@@ -907,7 +907,7 @@ static int __btrfs_close_devices(struct btrfs_fs_devices *fs_devices)
 				struct btrfs_device, dev_list);
 		list_del(&device->dev_list);
 		btrfs_close_bdev(device);
-		call_rcu(&device->rcu, free_device);
+		call_rcu(&device->rcu, free_device_rcu);
 	}
 
 	WARN_ON(fs_devices->open_devices);
@@ -1938,7 +1938,7 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
 		btrfs_scratch_superblocks(device->bdev, device->name->str);
 
 	btrfs_close_bdev(device);
-	call_rcu(&device->rcu, free_device);
+	call_rcu(&device->rcu, free_device_rcu);
 
 	if (cur_devices->open_devices == 0) {
 		struct btrfs_fs_devices *fs_devices;
@@ -2009,7 +2009,7 @@ void btrfs_rm_dev_replace_free_srcdev(struct btrfs_fs_info *fs_info,
 	}
 
 	btrfs_close_bdev(srcdev);
-	call_rcu(&srcdev->rcu, free_device);
+	call_rcu(&srcdev->rcu, free_device_rcu);
 
 	/* if this is no devs we rather delete the fs_devices */
 	if (!fs_devices->num_devices) {
@@ -2068,7 +2068,7 @@ void btrfs_destroy_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
 	btrfs_scratch_superblocks(tgtdev->bdev, tgtdev->name->str);
 
 	btrfs_close_bdev(tgtdev);
-	call_rcu(&tgtdev->rcu, free_device);
+	call_rcu(&tgtdev->rcu, free_device_rcu);
 }
 
 static int btrfs_find_device_by_path(struct btrfs_fs_info *fs_info,
-- 
2.14.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 03/11] btrfs: introduce free_device helper
  2017-10-31 17:44 [PATCH 00/11] Device fixes and cleanups David Sterba
  2017-10-31 17:44 ` [PATCH 01/11] btrfs: add missing device::flush_bio puts David Sterba
  2017-10-31 17:44 ` [PATCH 02/11] btrfs: rename device free rcu helper to free_device_rcu David Sterba
@ 2017-10-31 17:44 ` David Sterba
  2017-10-31 17:44 ` [PATCH 04/11] btrfs: use free_device where opencoded David Sterba
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: David Sterba @ 2017-10-31 17:44 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

A helper to free a device and all it's dynamically allocated members,
like the rcu_string name or flush_bio. This is going to replace all
open coded places.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/volumes.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index ab03e712aa49..35c546aade83 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -180,6 +180,13 @@ static struct btrfs_fs_devices *alloc_fs_devices(const u8 *fsid)
 	return fs_devs;
 }
 
+static void free_device(struct btrfs_device *device)
+{
+	rcu_string_free(device->name);
+	bio_put(device->flush_bio);
+	kfree(device);
+}
+
 static void free_fs_devices(struct btrfs_fs_devices *fs_devices)
 {
 	struct btrfs_device *device;
@@ -220,6 +227,11 @@ void btrfs_cleanup_fs_uuids(void)
 	}
 }
 
+/*
+ * Returns a pointer to a new btrfs_device on success; ERR_PTR() on error.
+ * Returned struct is not linked onto any lists and must be destroyed using
+ * free_device.
+ */
 static struct btrfs_device *__alloc_device(void)
 {
 	struct btrfs_device *dev;
@@ -6256,8 +6268,8 @@ static struct btrfs_device *add_missing_dev(struct btrfs_fs_devices *fs_devices,
  *		is generated.
  *
  * Return: a pointer to a new &struct btrfs_device on success; ERR_PTR()
- * on error.  Returned struct is not linked onto any lists and can be
- * destroyed with kfree() right away.
+ * on error.  Returned struct is not linked onto any lists and must be
+ * destroyed with free_device.
  */
 struct btrfs_device *btrfs_alloc_device(struct btrfs_fs_info *fs_info,
 					const u64 *devid,
-- 
2.14.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 04/11] btrfs: use free_device where opencoded
  2017-10-31 17:44 [PATCH 00/11] Device fixes and cleanups David Sterba
                   ` (2 preceding siblings ...)
  2017-10-31 17:44 ` [PATCH 03/11] btrfs: introduce free_device helper David Sterba
@ 2017-10-31 17:44 ` David Sterba
  2017-11-02 11:25   ` Anand Jain
  2017-10-31 17:44 ` [PATCH 05/11] btrfs: simplify exit paths in btrfs_init_new_device David Sterba
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 27+ messages in thread
From: David Sterba @ 2017-10-31 17:44 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/volumes.c | 39 +++++++++++----------------------------
 1 file changed, 11 insertions(+), 28 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 35c546aade83..f206e83d4f52 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -195,9 +195,7 @@ static void free_fs_devices(struct btrfs_fs_devices *fs_devices)
 		device = list_entry(fs_devices->devices.next,
 				    struct btrfs_device, dev_list);
 		list_del(&device->dev_list);
-		rcu_string_free(device->name);
-		bio_put(device->flush_bio);
-		kfree(device);
+		free_device(device);
 	}
 	kfree(fs_devices);
 }
@@ -590,9 +588,7 @@ static void btrfs_free_stale_device(struct btrfs_device *cur_dev)
 			} else {
 				fs_devs->num_devices--;
 				list_del(&dev->dev_list);
-				rcu_string_free(dev->name);
-				bio_put(dev->flush_bio);
-				kfree(dev);
+				free_device(dev);
 			}
 			break;
 		}
@@ -644,8 +640,7 @@ static noinline int device_list_add(const char *path,
 
 		name = rcu_string_strdup(path, GFP_NOFS);
 		if (!name) {
-			bio_put(device->flush_bio);
-			kfree(device);
+			free_device(device);
 			return -ENOMEM;
 		}
 		rcu_assign_pointer(device->name, name);
@@ -757,8 +752,7 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig)
 			name = rcu_string_strdup(orig_dev->name->str,
 					GFP_KERNEL);
 			if (!name) {
-				bio_put(device->flush_bio);
-				kfree(device);
+				free_device(device);
 				goto error;
 			}
 			rcu_assign_pointer(device->name, name);
@@ -822,9 +816,7 @@ void btrfs_close_extra_devices(struct btrfs_fs_devices *fs_devices, int step)
 		}
 		list_del_init(&device->dev_list);
 		fs_devices->num_devices--;
-		rcu_string_free(device->name);
-		bio_put(device->flush_bio);
-		kfree(device);
+		free_device(device);
 	}
 
 	if (fs_devices->seed) {
@@ -842,9 +834,7 @@ static void free_device_rcu(struct rcu_head *head)
 	struct btrfs_device *device;
 
 	device = container_of(head, struct btrfs_device, rcu);
-	rcu_string_free(device->name);
-	bio_put(device->flush_bio);
-	kfree(device);
+	free_device(device);
 }
 
 static void btrfs_close_bdev(struct btrfs_device *device)
@@ -2354,8 +2344,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 
 	name = rcu_string_strdup(device_path, GFP_KERNEL);
 	if (!name) {
-		bio_put(device->flush_bio);
-		kfree(device);
+		free_device(device);
 		ret = -ENOMEM;
 		goto error;
 	}
@@ -2363,9 +2352,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 
 	trans = btrfs_start_transaction(root, 0);
 	if (IS_ERR(trans)) {
-		rcu_string_free(device->name);
-		bio_put(device->flush_bio);
-		kfree(device);
+		free_device(device);
 		ret = PTR_ERR(trans);
 		goto error;
 	}
@@ -2507,9 +2494,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 		sb->s_flags |= MS_RDONLY;
 	if (trans)
 		btrfs_end_transaction(trans);
-	rcu_string_free(device->name);
-	bio_put(device->flush_bio);
-	kfree(device);
+	free_device(device);
 error:
 	blkdev_put(bdev, FMODE_EXCL);
 	if (seeding_dev && !unlocked) {
@@ -2575,8 +2560,7 @@ int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
 
 	name = rcu_string_strdup(device_path, GFP_KERNEL);
 	if (!name) {
-		bio_put(device->flush_bio);
-		kfree(device);
+		free_device(device);
 		ret = -ENOMEM;
 		goto error;
 	}
@@ -6292,8 +6276,7 @@ struct btrfs_device *btrfs_alloc_device(struct btrfs_fs_info *fs_info,
 
 		ret = find_next_devid(fs_info, &tmp);
 		if (ret) {
-			bio_put(dev->flush_bio);
-			kfree(dev);
+			free_device(dev);
 			return ERR_PTR(ret);
 		}
 	}
-- 
2.14.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 05/11] btrfs: simplify exit paths in btrfs_init_new_device
  2017-10-31 17:44 [PATCH 00/11] Device fixes and cleanups David Sterba
                   ` (3 preceding siblings ...)
  2017-10-31 17:44 ` [PATCH 04/11] btrfs: use free_device where opencoded David Sterba
@ 2017-10-31 17:44 ` David Sterba
  2017-11-06  1:53   ` Anand Jain
  2017-10-31 17:44 ` [PATCH 06/11] btrfs: document device locking David Sterba
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 27+ messages in thread
From: David Sterba @ 2017-10-31 17:44 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/volumes.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f206e83d4f52..75aed8ec64bd 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2344,17 +2344,15 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 
 	name = rcu_string_strdup(device_path, GFP_KERNEL);
 	if (!name) {
-		free_device(device);
 		ret = -ENOMEM;
-		goto error;
+		goto error_free_device;
 	}
 	rcu_assign_pointer(device->name, name);
 
 	trans = btrfs_start_transaction(root, 0);
 	if (IS_ERR(trans)) {
-		free_device(device);
 		ret = PTR_ERR(trans);
-		goto error;
+		goto error_free_device;
 	}
 
 	q = bdev_get_queue(bdev);
@@ -2494,6 +2492,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 		sb->s_flags |= MS_RDONLY;
 	if (trans)
 		btrfs_end_transaction(trans);
+error_free_device:
 	free_device(device);
 error:
 	blkdev_put(bdev, FMODE_EXCL);
-- 
2.14.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 06/11] btrfs: document device locking
  2017-10-31 17:44 [PATCH 00/11] Device fixes and cleanups David Sterba
                   ` (4 preceding siblings ...)
  2017-10-31 17:44 ` [PATCH 05/11] btrfs: simplify exit paths in btrfs_init_new_device David Sterba
@ 2017-10-31 17:44 ` David Sterba
  2017-11-02 10:29   ` Nikolay Borisov
  2017-11-03 11:13   ` Anand Jain
  2017-10-31 17:44 ` [PATCH 07/11] btrfs: dev_alloc_list is not protected by RCU, use normal list_del David Sterba
                   ` (4 subsequent siblings)
  10 siblings, 2 replies; 27+ messages in thread
From: David Sterba @ 2017-10-31 17:44 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

Overview of the main locks protecting various device-related structures.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/volumes.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 75aed8ec64bd..098affc58361 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -145,6 +145,72 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
 			     struct btrfs_bio **bbio_ret,
 			     int mirror_num, int need_raid_map);
 
+/*
+ * Device locking
+ * ==============
+ *
+ * There are several mutexes that protect manipulation of devices and low-level
+ * structures like chunks but not block groups, extents or files
+ *
+ * - uuid_mutex (global lock)
+ *
+ *   protects the fs_uuids list that tracks all per-fs fs_devices, resulting
+ *   from the SCAN_DEV ioctl registration or from mount either implicitly
+ *   (the first device) or requested by the device= mount option
+ *
+ *   the mutex can be very coarse and can cover long-running operations
+ *
+ *   protects: updates to fs_devices counters like missing devices, rw devices,
+ *   seeding, structure cloning, openning/closing devices at mount/umount time
+ *
+ *   global::fs_devs - add, remove, updates to the global list
+ *
+ *   does not protect: manipulation of the fs_devices::devices list!
+ *
+ *   btrfs_device::name - renames (write side), read is RCU
+ *
+ * - fs_devices::device_list_mutex (per-fs, with RCU)
+ *
+ *   protects updates to fs_devices::devices, ie. adding and deleting
+ *
+ *   simple list traversal with read-only actions can be done with RCU
+ *   protection
+ *
+ *   may be used to exclude some operations from running concurrently without
+ *   any modifications to the list (see write_all_supers)
+ *
+ * - volume_mutex
+ *
+ *   coarse lock owned by a mounted filesystem; used to exclude some operations
+ *   that cannot run in parallel and affect the higher-level properties of the
+ *   filesystem like: device add/deleting/resize/replace, or balance
+ *
+ * - balance_mutex
+ *
+ *   protects balance structures (status, state) and context accessed from
+ *   several places (internally, ioctl)
+ *
+ * - chunk_mutex
+ *
+ *   protects chunks, adding or removing during allocation, trim or when
+ *   a new device is added/removed
+ *
+ * - cleaner_mutex
+ *
+ *   a big lock that is held by the cleaner thread and prevents running
+ *   subvolume cleaning together with relocation or delayed iputs
+ *
+ *
+ * Lock nesting
+ * ------------
+ *
+ * uuid_mutex
+ *   volume_mutex
+ *     device_list_mutex
+ *       chunk_mutex
+ *     balance_mutex
+ *
+ */
 DEFINE_MUTEX(uuid_mutex);
 static LIST_HEAD(fs_uuids);
 struct list_head *btrfs_get_fs_uuids(void)
-- 
2.14.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 07/11] btrfs: dev_alloc_list is not protected by RCU, use normal list_del
  2017-10-31 17:44 [PATCH 00/11] Device fixes and cleanups David Sterba
                   ` (5 preceding siblings ...)
  2017-10-31 17:44 ` [PATCH 06/11] btrfs: document device locking David Sterba
@ 2017-10-31 17:44 ` David Sterba
  2017-10-31 17:44 ` [PATCH 08/11] btrfs: simplify btrfs_close_bdev David Sterba
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: David Sterba @ 2017-10-31 17:44 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

The dev_alloc_list list could be protected by various mutexes,
depending on the context. The list tracks devices that can take part of
allocating new chunks, so the closest mutex is chunk_mutex. Adding a new
device from inside the ADD_DEV ioctl will need device_list_mutex and
registering a new device from the ioctl needs uuid_mutex.

All mutexes naturally guarantee exclusivity against the same context.
The device ownership can move between the contexts and the exclusivity
is guaranteed by other means, eg. during the mount with the uuid_mutex.

There's no RCU involved for dev_alloc_list.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/volumes.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 098affc58361..6e45751ed957 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2054,7 +2054,7 @@ void btrfs_rm_dev_replace_remove_srcdev(struct btrfs_fs_info *fs_info,
 	fs_devices = srcdev->fs_devices;
 
 	list_del_rcu(&srcdev->dev_list);
-	list_del_rcu(&srcdev->dev_alloc_list);
+	list_del(&srcdev->dev_alloc_list);
 	fs_devices->num_devices--;
 	if (srcdev->missing)
 		fs_devices->missing_devices--;
-- 
2.14.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 08/11] btrfs: simplify btrfs_close_bdev
  2017-10-31 17:44 [PATCH 00/11] Device fixes and cleanups David Sterba
                   ` (6 preceding siblings ...)
  2017-10-31 17:44 ` [PATCH 07/11] btrfs: dev_alloc_list is not protected by RCU, use normal list_del David Sterba
@ 2017-10-31 17:44 ` David Sterba
  2017-10-31 17:44 ` [PATCH 09/11] btrfs: switch to RCU for device traversal in btrfs_ioctl_dev_info David Sterba
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: David Sterba @ 2017-10-31 17:44 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

Split the conditions a bit.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/volumes.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 6e45751ed957..3af2a7fd2d05 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -905,13 +905,15 @@ static void free_device_rcu(struct rcu_head *head)
 
 static void btrfs_close_bdev(struct btrfs_device *device)
 {
-	if (device->bdev && device->writeable) {
+	if (!device->bdev)
+		return;
+
+	if (device->writeable) {
 		sync_blockdev(device->bdev);
 		invalidate_bdev(device->bdev);
 	}
 
-	if (device->bdev)
-		blkdev_put(device->bdev, device->mode);
+	blkdev_put(device->bdev, device->mode);
 }
 
 static void btrfs_prepare_close_one_device(struct btrfs_device *device)
-- 
2.14.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 09/11] btrfs: switch to RCU for device traversal in btrfs_ioctl_dev_info
  2017-10-31 17:44 [PATCH 00/11] Device fixes and cleanups David Sterba
                   ` (7 preceding siblings ...)
  2017-10-31 17:44 ` [PATCH 08/11] btrfs: simplify btrfs_close_bdev David Sterba
@ 2017-10-31 17:44 ` David Sterba
  2017-10-31 17:44 ` [PATCH 10/11] btrfs: switch to RCU for device traversal in btrfs_ioctl_fs_info David Sterba
  2017-10-31 17:44 ` [PATCH 11/11] btrfs: use non-RCU list traversal in write_all_supers callees David Sterba
  10 siblings, 0 replies; 27+ messages in thread
From: David Sterba @ 2017-10-31 17:44 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

We don't need to use the mutex as we do not modify the devices nor the
list itself and just read some information:

does not change during device lifetime:
- devid
- uuid
- name (ie. the path)

may change in parallel to the ioctl call, but can lead only to reporting
inacurracy:
- bytes_used
- total_bytes

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/ioctl.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index fd172a93d11a..f0a3308a72c5 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2779,7 +2779,6 @@ static long btrfs_ioctl_dev_info(struct btrfs_fs_info *fs_info,
 {
 	struct btrfs_ioctl_dev_info_args *di_args;
 	struct btrfs_device *dev;
-	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
 	int ret = 0;
 	char *s_uuid = NULL;
 
@@ -2790,7 +2789,7 @@ static long btrfs_ioctl_dev_info(struct btrfs_fs_info *fs_info,
 	if (!btrfs_is_empty_uuid(di_args->uuid))
 		s_uuid = di_args->uuid;
 
-	mutex_lock(&fs_devices->device_list_mutex);
+	rcu_read_lock();
 	dev = btrfs_find_device(fs_info, di_args->devid, s_uuid, NULL);
 
 	if (!dev) {
@@ -2805,17 +2804,15 @@ static long btrfs_ioctl_dev_info(struct btrfs_fs_info *fs_info,
 	if (dev->name) {
 		struct rcu_string *name;
 
-		rcu_read_lock();
 		name = rcu_dereference(dev->name);
 		strncpy(di_args->path, name->str, sizeof(di_args->path));
-		rcu_read_unlock();
 		di_args->path[sizeof(di_args->path) - 1] = 0;
 	} else {
 		di_args->path[0] = '\0';
 	}
 
 out:
-	mutex_unlock(&fs_devices->device_list_mutex);
+	rcu_read_unlock();
 	if (ret == 0 && copy_to_user(arg, di_args, sizeof(*di_args)))
 		ret = -EFAULT;
 
-- 
2.14.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 10/11] btrfs: switch to RCU for device traversal in btrfs_ioctl_fs_info
  2017-10-31 17:44 [PATCH 00/11] Device fixes and cleanups David Sterba
                   ` (8 preceding siblings ...)
  2017-10-31 17:44 ` [PATCH 09/11] btrfs: switch to RCU for device traversal in btrfs_ioctl_dev_info David Sterba
@ 2017-10-31 17:44 ` David Sterba
  2017-10-31 17:44 ` [PATCH 11/11] btrfs: use non-RCU list traversal in write_all_supers callees David Sterba
  10 siblings, 0 replies; 27+ messages in thread
From: David Sterba @ 2017-10-31 17:44 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

We don't need to use the mutex as we do not modify the devices nor the
list itself and just read information about device counts.
Move copying fsid out of the protected section, not applicable to RCU
same as the rest of the retrieved information.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/ioctl.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index f0a3308a72c5..b15973cde0b6 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2753,16 +2753,16 @@ static long btrfs_ioctl_fs_info(struct btrfs_fs_info *fs_info,
 	if (!fi_args)
 		return -ENOMEM;
 
-	mutex_lock(&fs_devices->device_list_mutex);
+	rcu_read_lock();
 	fi_args->num_devices = fs_devices->num_devices;
-	memcpy(&fi_args->fsid, fs_info->fsid, sizeof(fi_args->fsid));
 
-	list_for_each_entry(device, &fs_devices->devices, dev_list) {
+	list_for_each_entry_rcu(device, &fs_devices->devices, dev_list) {
 		if (device->devid > fi_args->max_id)
 			fi_args->max_id = device->devid;
 	}
-	mutex_unlock(&fs_devices->device_list_mutex);
+	rcu_read_unlock();
 
+	memcpy(&fi_args->fsid, fs_info->fsid, sizeof(fi_args->fsid));
 	fi_args->nodesize = fs_info->nodesize;
 	fi_args->sectorsize = fs_info->sectorsize;
 	fi_args->clone_alignment = fs_info->sectorsize;
-- 
2.14.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 11/11] btrfs: use non-RCU list traversal in write_all_supers callees
  2017-10-31 17:44 [PATCH 00/11] Device fixes and cleanups David Sterba
                   ` (9 preceding siblings ...)
  2017-10-31 17:44 ` [PATCH 10/11] btrfs: switch to RCU for device traversal in btrfs_ioctl_fs_info David Sterba
@ 2017-10-31 17:44 ` David Sterba
  2017-11-02  9:49   ` Nikolay Borisov
  10 siblings, 1 reply; 27+ messages in thread
From: David Sterba @ 2017-10-31 17:44 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

We take the fs_devices::device_list_mutex mutex in write_all_supers
which will prevent any add/del changes to the device list. Therefore we
don't need to use the RCU variant list_for_each_entry_rcu in any of the
called functions.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/disk-io.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index efce9a2fa9be..042cf46e4cd0 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3396,9 +3396,10 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
 	int errors_wait = 0;
 	blk_status_t ret;
 
+	WARN_ON(!mutex_is_locked(&info->fs_devices->device_list_mutex));
 	/* send down all the barriers */
 	head = &info->fs_devices->devices;
-	list_for_each_entry_rcu(dev, head, dev_list) {
+	list_for_each_entry(dev, head, dev_list) {
 		if (dev->missing)
 			continue;
 		if (!dev->bdev)
@@ -3411,7 +3412,7 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
 	}
 
 	/* wait for all the barriers */
-	list_for_each_entry_rcu(dev, head, dev_list) {
+	list_for_each_entry(dev, head, dev_list) {
 		if (dev->missing)
 			continue;
 		if (!dev->bdev) {
@@ -3510,7 +3511,7 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors)
 		}
 	}
 
-	list_for_each_entry_rcu(dev, head, dev_list) {
+	list_for_each_entry(dev, head, dev_list) {
 		if (!dev->bdev) {
 			total_errors++;
 			continue;
@@ -3551,7 +3552,7 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors)
 	}
 
 	total_errors = 0;
-	list_for_each_entry_rcu(dev, head, dev_list) {
+	list_for_each_entry(dev, head, dev_list) {
 		if (!dev->bdev)
 			continue;
 		if (!dev->in_fs_metadata || !dev->writeable)
-- 
2.14.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 01/11] btrfs: add missing device::flush_bio puts
  2017-10-31 17:44 ` [PATCH 01/11] btrfs: add missing device::flush_bio puts David Sterba
@ 2017-11-02  9:41   ` Nikolay Borisov
  2017-11-06 13:24     ` David Sterba
  2017-11-02 10:40   ` Anand Jain
  1 sibling, 1 reply; 27+ messages in thread
From: Nikolay Borisov @ 2017-11-02  9:41 UTC (permalink / raw)
  To: David Sterba, linux-btrfs



On 31.10.2017 19:44, David Sterba wrote:
> This fixes potential bio leaks, in several error paths. Unfortunatelly
> the device structure freeing is opencoded in many places and I missed
> them when introducing the flush_bio.
> 
> Most of the time, devices get freed through call_rcu(..., free_device),
> so it at least it's not that easy to hit the leak, but it's still
> possible through the path that frees stale devices.
> 
> Fixes: e0ae99941423 ("btrfs: preallocate device flush bio")
> Signed-off-by: David Sterba <dsterba@suse.com>

Verified that every kfree(device) has a matching bio_put via:

grep -ir -B2 "kfree(dev.*)" fs/btrfs/volumes.c

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

> ---
>  fs/btrfs/volumes.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index ea8b20839ac0..08fb4b5609b7 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -189,6 +189,7 @@ static void free_fs_devices(struct btrfs_fs_devices *fs_devices)
>  				    struct btrfs_device, dev_list);
>  		list_del(&device->dev_list);
>  		rcu_string_free(device->name);
> +		bio_put(device->flush_bio);
>  		kfree(device);
>  	}
>  	kfree(fs_devices);
> @@ -578,6 +579,7 @@ static void btrfs_free_stale_device(struct btrfs_device *cur_dev)
>  				fs_devs->num_devices--;
>  				list_del(&dev->dev_list);
>  				rcu_string_free(dev->name);
> +				bio_put(dev->flush_bio);
>  				kfree(dev);
>  			}
>  			break;
> @@ -630,6 +632,7 @@ static noinline int device_list_add(const char *path,
>  
>  		name = rcu_string_strdup(path, GFP_NOFS);
>  		if (!name) {
> +			bio_put(device->flush_bio);
>  			kfree(device);
>  			return -ENOMEM;
>  		}
> @@ -742,6 +745,7 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig)
>  			name = rcu_string_strdup(orig_dev->name->str,
>  					GFP_KERNEL);
>  			if (!name) {
> +				bio_put(device->flush_bio);
>  				kfree(device);
>  				goto error;
>  			}
> @@ -807,6 +811,7 @@ void btrfs_close_extra_devices(struct btrfs_fs_devices *fs_devices, int step)
>  		list_del_init(&device->dev_list);
>  		fs_devices->num_devices--;
>  		rcu_string_free(device->name);
> +		bio_put(device->flush_bio);
>  		kfree(device);
>  	}
>  
> @@ -2337,6 +2342,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>  
>  	name = rcu_string_strdup(device_path, GFP_KERNEL);
>  	if (!name) {
> +		bio_put(device->flush_bio);
>  		kfree(device);
>  		ret = -ENOMEM;
>  		goto error;
> @@ -2346,6 +2352,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>  	trans = btrfs_start_transaction(root, 0);
>  	if (IS_ERR(trans)) {
>  		rcu_string_free(device->name);
> +		bio_put(device->flush_bio);
>  		kfree(device);
>  		ret = PTR_ERR(trans);
>  		goto error;
> @@ -2489,6 +2496,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>  	if (trans)
>  		btrfs_end_transaction(trans);
>  	rcu_string_free(device->name);
> +	bio_put(device->flush_bio);
>  	kfree(device);
>  error:
>  	blkdev_put(bdev, FMODE_EXCL);
> @@ -2555,6 +2563,7 @@ int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
>  
>  	name = rcu_string_strdup(device_path, GFP_KERNEL);
>  	if (!name) {
> +		bio_put(device->flush_bio);
>  		kfree(device);
>  		ret = -ENOMEM;
>  		goto error;
> @@ -6271,6 +6280,7 @@ struct btrfs_device *btrfs_alloc_device(struct btrfs_fs_info *fs_info,
>  
>  		ret = find_next_devid(fs_info, &tmp);
>  		if (ret) {
> +			bio_put(dev->flush_bio);
>  			kfree(dev);
>  			return ERR_PTR(ret);
>  		}
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 11/11] btrfs: use non-RCU list traversal in write_all_supers callees
  2017-10-31 17:44 ` [PATCH 11/11] btrfs: use non-RCU list traversal in write_all_supers callees David Sterba
@ 2017-11-02  9:49   ` Nikolay Borisov
  2017-11-06 13:59     ` David Sterba
  0 siblings, 1 reply; 27+ messages in thread
From: Nikolay Borisov @ 2017-11-02  9:49 UTC (permalink / raw)
  To: David Sterba, linux-btrfs



On 31.10.2017 19:44, David Sterba wrote:
> We take the fs_devices::device_list_mutex mutex in write_all_supers
> which will prevent any add/del changes to the device list. Therefore we
> don't need to use the RCU variant list_for_each_entry_rcu in any of the
> called functions.
> 
> Signed-off-by: David Sterba <dsterba@suse.com>
> ---
>  fs/btrfs/disk-io.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index efce9a2fa9be..042cf46e4cd0 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3396,9 +3396,10 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
>  	int errors_wait = 0;
>  	blk_status_t ret;
>  
> +	WARN_ON(!mutex_is_locked(&info->fs_devices->device_list_mutex));

Don't we want lockdep_assert_held ? I'm assuming enough testing with
lockdep on is performed so it will be caught in time if this invariant
is broken.

>  	/* send down all the barriers */
>  	head = &info->fs_devices->devices;
> -	list_for_each_entry_rcu(dev, head, dev_list) {
> +	list_for_each_entry(dev, head, dev_list) {
>  		if (dev->missing)
>  			continue;
>  		if (!dev->bdev)
> @@ -3411,7 +3412,7 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
>  	}
>  
>  	/* wait for all the barriers */
> -	list_for_each_entry_rcu(dev, head, dev_list) {
> +	list_for_each_entry(dev, head, dev_list) {
>  		if (dev->missing)
>  			continue;
>  		if (!dev->bdev) {
> @@ -3510,7 +3511,7 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors)
>  		}
>  	}
>  
> -	list_for_each_entry_rcu(dev, head, dev_list) {
> +	list_for_each_entry(dev, head, dev_list) {
>  		if (!dev->bdev) {
>  			total_errors++;
>  			continue;
> @@ -3551,7 +3552,7 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors)
>  	}
>  
>  	total_errors = 0;
> -	list_for_each_entry_rcu(dev, head, dev_list) {
> +	list_for_each_entry(dev, head, dev_list) {
>  		if (!dev->bdev)
>  			continue;
>  		if (!dev->in_fs_metadata || !dev->writeable)
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 06/11] btrfs: document device locking
  2017-10-31 17:44 ` [PATCH 06/11] btrfs: document device locking David Sterba
@ 2017-11-02 10:29   ` Nikolay Borisov
  2017-11-06 13:51     ` David Sterba
  2017-11-03 11:13   ` Anand Jain
  1 sibling, 1 reply; 27+ messages in thread
From: Nikolay Borisov @ 2017-11-02 10:29 UTC (permalink / raw)
  To: David Sterba, linux-btrfs



On 31.10.2017 19:44, David Sterba wrote:
> Overview of the main locks protecting various device-related structures.
> 
> Signed-off-by: David Sterba <dsterba@suse.com>
> ---
>  fs/btrfs/volumes.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 66 insertions(+)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 75aed8ec64bd..098affc58361 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -145,6 +145,72 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
>  			     struct btrfs_bio **bbio_ret,
>  			     int mirror_num, int need_raid_map);
>  
> +/*
> + * Device locking
> + * ==============
> + *
> + * There are several mutexes that protect manipulation of devices and low-level
> + * structures like chunks but not block groups, extents or files

This sentence suggests you are describing the various mutexes but it
seems you are also describing data structure below i.e.
global::fs_devs/btrfs_device::name

> + *
> + * - uuid_mutex (global lock)
> + *
> + *   protects the fs_uuids list that tracks all per-fs fs_devices, resulting
> + *   from the SCAN_DEV ioctl registration or from mount either implicitly
> + *   (the first device) or requested by the device= mount option
> + *
> + *   the mutex can be very coarse and can cover long-running operations
> + *
> + *   protects: updates to fs_devices counters like missing devices, rw devices,
> + *   seeding, structure cloning, openning/closing devices at mount/umount time

Perhaps move the uuid_mutex description after btrfs_device::name. That
way mutexes are grouped together and data structures are grouped as well.

> + *
> + *   global::fs_devs - add, remove, updates to the global list

You seem to be describing a data structure here rather than a mutex

> + *
> + *   does not protect: manipulation of the fs_devices::devices list!

but this sentence is written as if you've just described a mutex. The
end result is that it's not clear what mutex you are referring to here.

> + *
> + *   btrfs_device::name - renames (write side), read is RCU

It's not clear how the write side is protected.

> + *
> + * - fs_devices::device_list_mutex (per-fs, with RCU)
> + *
> + *   protects updates to fs_devices::devices, ie. adding and deleting
> + *
> + *   simple list traversal with read-only actions can be done with RCU
> + *   protection
> + *
> + *   may be used to exclude some operations from running concurrently without
> + *   any modifications to the list (see write_all_supers)
> + *
> + * - volume_mutex
> + *
> + *   coarse lock owned by a mounted filesystem; used to exclude some operations
> + *   that cannot run in parallel and affect the higher-level properties of the
> + *   filesystem like: device add/deleting/resize/replace, or balance
> + *
> + * - balance_mutex
> + *
> + *   protects balance structures (status, state) and context accessed from
> + *   several places (internally, ioctl)
> + *
> + * - chunk_mutex
> + *
> + *   protects chunks, adding or removing during allocation, trim or when
> + *   a new device is added/removed
> + *
> + * - cleaner_mutex
> + *
> + *   a big lock that is held by the cleaner thread and prevents running
> + *   subvolume cleaning together with relocation or delayed iputs
> + *
> + *
> + * Lock nesting
> + * ------------
> + *
> + * uuid_mutex
> + *   volume_mutex
> + *     device_list_mutex
> + *       chunk_mutex
> + *     balance_mutex
> + *
> + */
>  DEFINE_MUTEX(uuid_mutex);
>  static LIST_HEAD(fs_uuids);
>  struct list_head *btrfs_get_fs_uuids(void)
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 01/11] btrfs: add missing device::flush_bio puts
  2017-10-31 17:44 ` [PATCH 01/11] btrfs: add missing device::flush_bio puts David Sterba
  2017-11-02  9:41   ` Nikolay Borisov
@ 2017-11-02 10:40   ` Anand Jain
  1 sibling, 0 replies; 27+ messages in thread
From: Anand Jain @ 2017-11-02 10:40 UTC (permalink / raw)
  To: David Sterba, linux-btrfs



On 11/01/2017 01:44 AM, David Sterba wrote:
> This fixes potential bio leaks, in several error paths. Unfortunatelly
> the device structure freeing is opencoded in many places and I missed
> them when introducing the flush_bio.
> 
> Most of the time, devices get freed through call_rcu(..., free_device),
> so it at least it's not that easy to hit the leak, but it's still
> possible through the path that frees stale devices.
> 
> Fixes: e0ae99941423 ("btrfs: preallocate device flush bio")
> Signed-off-by: David Sterba <dsterba@suse.com>

  Reviewed-by: Anand Jain <anand.jain@oracle.com>

Thanks, Anand

> ---
>   fs/btrfs/volumes.c | 10 ++++++++++
>   1 file changed, 10 insertions(+)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index ea8b20839ac0..08fb4b5609b7 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -189,6 +189,7 @@ static void free_fs_devices(struct btrfs_fs_devices *fs_devices)
>   				    struct btrfs_device, dev_list);
>   		list_del(&device->dev_list);
>   		rcu_string_free(device->name);
> +		bio_put(device->flush_bio);
>   		kfree(device);
>   	}
>   	kfree(fs_devices);
> @@ -578,6 +579,7 @@ static void btrfs_free_stale_device(struct btrfs_device *cur_dev)
>   				fs_devs->num_devices--;
>   				list_del(&dev->dev_list);
>   				rcu_string_free(dev->name);
> +				bio_put(dev->flush_bio);
>   				kfree(dev);
>   			}
>   			break;
> @@ -630,6 +632,7 @@ static noinline int device_list_add(const char *path,
>   
>   		name = rcu_string_strdup(path, GFP_NOFS);
>   		if (!name) {
> +			bio_put(device->flush_bio);
>   			kfree(device);
>   			return -ENOMEM;
>   		}
> @@ -742,6 +745,7 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig)
>   			name = rcu_string_strdup(orig_dev->name->str,
>   					GFP_KERNEL);
>   			if (!name) {
> +				bio_put(device->flush_bio);
>   				kfree(device);
>   				goto error;
>   			}
> @@ -807,6 +811,7 @@ void btrfs_close_extra_devices(struct btrfs_fs_devices *fs_devices, int step)
>   		list_del_init(&device->dev_list);
>   		fs_devices->num_devices--;
>   		rcu_string_free(device->name);
> +		bio_put(device->flush_bio);
>   		kfree(device);
>   	}
>   
> @@ -2337,6 +2342,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>   
>   	name = rcu_string_strdup(device_path, GFP_KERNEL);
>   	if (!name) {
> +		bio_put(device->flush_bio);
>   		kfree(device);
>   		ret = -ENOMEM;
>   		goto error;
> @@ -2346,6 +2352,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>   	trans = btrfs_start_transaction(root, 0);
>   	if (IS_ERR(trans)) {
>   		rcu_string_free(device->name);
> +		bio_put(device->flush_bio);
>   		kfree(device);
>   		ret = PTR_ERR(trans);
>   		goto error;
> @@ -2489,6 +2496,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>   	if (trans)
>   		btrfs_end_transaction(trans);
>   	rcu_string_free(device->name);
> +	bio_put(device->flush_bio);
>   	kfree(device);
>   error:
>   	blkdev_put(bdev, FMODE_EXCL);
> @@ -2555,6 +2563,7 @@ int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
>   
>   	name = rcu_string_strdup(device_path, GFP_KERNEL);
>   	if (!name) {
> +		bio_put(device->flush_bio);
>   		kfree(device);
>   		ret = -ENOMEM;
>   		goto error;
> @@ -6271,6 +6280,7 @@ struct btrfs_device *btrfs_alloc_device(struct btrfs_fs_info *fs_info,
>   
>   		ret = find_next_devid(fs_info, &tmp);
>   		if (ret) {
> +			bio_put(dev->flush_bio);
>   			kfree(dev);
>   			return ERR_PTR(ret);
>   		}
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 04/11] btrfs: use free_device where opencoded
  2017-10-31 17:44 ` [PATCH 04/11] btrfs: use free_device where opencoded David Sterba
@ 2017-11-02 11:25   ` Anand Jain
  0 siblings, 0 replies; 27+ messages in thread
From: Anand Jain @ 2017-11-02 11:25 UTC (permalink / raw)
  To: David Sterba, linux-btrfs



Looks good.

Reviewed-by: Anand Jain <anand.jain@oracle.com>

Thanks, Anand


On 11/01/2017 01:44 AM, David Sterba wrote:
> Signed-off-by: David Sterba <dsterba@suse.com>
> ---
>   fs/btrfs/volumes.c | 39 +++++++++++----------------------------
>   1 file changed, 11 insertions(+), 28 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 35c546aade83..f206e83d4f52 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -195,9 +195,7 @@ static void free_fs_devices(struct btrfs_fs_devices *fs_devices)
>   		device = list_entry(fs_devices->devices.next,
>   				    struct btrfs_device, dev_list);
>   		list_del(&device->dev_list);
> -		rcu_string_free(device->name);
> -		bio_put(device->flush_bio);
> -		kfree(device);
> +		free_device(device);
>   	}
>   	kfree(fs_devices);
>   }
> @@ -590,9 +588,7 @@ static void btrfs_free_stale_device(struct btrfs_device *cur_dev)
>   			} else {
>   				fs_devs->num_devices--;
>   				list_del(&dev->dev_list);
> -				rcu_string_free(dev->name);
> -				bio_put(dev->flush_bio);
> -				kfree(dev);
> +				free_device(dev);
>   			}
>   			break;
>   		}
> @@ -644,8 +640,7 @@ static noinline int device_list_add(const char *path,
>   
>   		name = rcu_string_strdup(path, GFP_NOFS);
>   		if (!name) {
> -			bio_put(device->flush_bio);
> -			kfree(device);
> +			free_device(device);
>   			return -ENOMEM;
>   		}
>   		rcu_assign_pointer(device->name, name);
> @@ -757,8 +752,7 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig)
>   			name = rcu_string_strdup(orig_dev->name->str,
>   					GFP_KERNEL);
>   			if (!name) {
> -				bio_put(device->flush_bio);
> -				kfree(device);
> +				free_device(device);
>   				goto error;
>   			}
>   			rcu_assign_pointer(device->name, name);
> @@ -822,9 +816,7 @@ void btrfs_close_extra_devices(struct btrfs_fs_devices *fs_devices, int step)
>   		}
>   		list_del_init(&device->dev_list);
>   		fs_devices->num_devices--;
> -		rcu_string_free(device->name);
> -		bio_put(device->flush_bio);
> -		kfree(device);
> +		free_device(device);
>   	}
>   
>   	if (fs_devices->seed) {
> @@ -842,9 +834,7 @@ static void free_device_rcu(struct rcu_head *head)
>   	struct btrfs_device *device;
>   
>   	device = container_of(head, struct btrfs_device, rcu);
> -	rcu_string_free(device->name);
> -	bio_put(device->flush_bio);
> -	kfree(device);
> +	free_device(device);
>   }
>   
>   static void btrfs_close_bdev(struct btrfs_device *device)
> @@ -2354,8 +2344,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>   
>   	name = rcu_string_strdup(device_path, GFP_KERNEL);
>   	if (!name) {
> -		bio_put(device->flush_bio);
> -		kfree(device);
> +		free_device(device);
>   		ret = -ENOMEM;
>   		goto error;
>   	}
> @@ -2363,9 +2352,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>   
>   	trans = btrfs_start_transaction(root, 0);
>   	if (IS_ERR(trans)) {
> -		rcu_string_free(device->name);
> -		bio_put(device->flush_bio);
> -		kfree(device);
> +		free_device(device);
>   		ret = PTR_ERR(trans);
>   		goto error;
>   	}
> @@ -2507,9 +2494,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>   		sb->s_flags |= MS_RDONLY;
>   	if (trans)
>   		btrfs_end_transaction(trans);
> -	rcu_string_free(device->name);
> -	bio_put(device->flush_bio);
> -	kfree(device);
> +	free_device(device);
>   error:
>   	blkdev_put(bdev, FMODE_EXCL);
>   	if (seeding_dev && !unlocked) {
> @@ -2575,8 +2560,7 @@ int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
>   
>   	name = rcu_string_strdup(device_path, GFP_KERNEL);
>   	if (!name) {
> -		bio_put(device->flush_bio);
> -		kfree(device);
> +		free_device(device);
>   		ret = -ENOMEM;
>   		goto error;
>   	}
> @@ -6292,8 +6276,7 @@ struct btrfs_device *btrfs_alloc_device(struct btrfs_fs_info *fs_info,
>   
>   		ret = find_next_devid(fs_info, &tmp);
>   		if (ret) {
> -			bio_put(dev->flush_bio);
> -			kfree(dev);
> +			free_device(dev);
>   			return ERR_PTR(ret);
>   		}
>   	}
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 06/11] btrfs: document device locking
  2017-10-31 17:44 ` [PATCH 06/11] btrfs: document device locking David Sterba
  2017-11-02 10:29   ` Nikolay Borisov
@ 2017-11-03 11:13   ` Anand Jain
  2017-11-06  2:32     ` Anand Jain
  2017-11-06 13:36     ` David Sterba
  1 sibling, 2 replies; 27+ messages in thread
From: Anand Jain @ 2017-11-03 11:13 UTC (permalink / raw)
  To: David Sterba, linux-btrfs



Thanks for writing this.

> + * - fs_devices::device_list_mutex (per-fs, with RCU)
> + *
> + *   protects updates to fs_devices::devices, ie. adding and deleting
> + *
> + *   simple list traversal with read-only actions can be done with RCU
> + *   protection
> + *
> + *   may be used to exclude some operations from running concurrently without
> + *   any modifications to the list (see write_all_supers)

> + * - volume_mutex
> + *
> + *   coarse lock owned by a mounted filesystem; used to exclude some operations
> + *   that cannot run in parallel and affect the higher-level properties of the
> + *   filesystem like: device add/deleting/resize/replace, or balance

> + * - chunk_mutex
> + *
> + *   protects chunks, adding or removing during allocation, trim or when
> + *   a new device is added/removed

::

> + * Lock nesting
> + * ------------
> + *
> + * uuid_mutex
> + *   volume_mutex
> + *     device_list_mutex
> + *       chunk_mutex
> + *     balance_mutex


If we have a list of operations that would consume these locks then we 
can map it accordingly for better clarity.

To me it looks like we have too many locks.
  - we don't have to differentiate the mounted and unmounted context
    for device locks.
  - Two lock would be sufficient, one for the device list
    (add/rm,replace,..) and another for device property changes
    (resize, trim,..).

Thanks, Anand


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 05/11] btrfs: simplify exit paths in btrfs_init_new_device
  2017-10-31 17:44 ` [PATCH 05/11] btrfs: simplify exit paths in btrfs_init_new_device David Sterba
@ 2017-11-06  1:53   ` Anand Jain
  0 siblings, 0 replies; 27+ messages in thread
From: Anand Jain @ 2017-11-06  1:53 UTC (permalink / raw)
  To: David Sterba, linux-btrfs


looks good.

Reviewed-by: Anand Jain <anand.jain@oracle.com>

Thanks, Anand

On 11/01/2017 01:44 AM, David Sterba wrote:
> Signed-off-by: David Sterba <dsterba@suse.com>
> ---
>   fs/btrfs/volumes.c | 7 +++----
>   1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index f206e83d4f52..75aed8ec64bd 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2344,17 +2344,15 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>   
>   	name = rcu_string_strdup(device_path, GFP_KERNEL);
>   	if (!name) {
> -		free_device(device);
>   		ret = -ENOMEM;
> -		goto error;
> +		goto error_free_device;
>   	}
>   	rcu_assign_pointer(device->name, name);
>   
>   	trans = btrfs_start_transaction(root, 0);
>   	if (IS_ERR(trans)) {
> -		free_device(device);
>   		ret = PTR_ERR(trans);
> -		goto error;
> +		goto error_free_device;
>   	}
>   
>   	q = bdev_get_queue(bdev);
> @@ -2494,6 +2492,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>   		sb->s_flags |= MS_RDONLY;
>   	if (trans)
>   		btrfs_end_transaction(trans);
> +error_free_device:
>   	free_device(device);
>   error:
>   	blkdev_put(bdev, FMODE_EXCL);
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 06/11] btrfs: document device locking
  2017-11-03 11:13   ` Anand Jain
@ 2017-11-06  2:32     ` Anand Jain
  2017-11-06 13:40       ` David Sterba
  2017-11-06 13:36     ` David Sterba
  1 sibling, 1 reply; 27+ messages in thread
From: Anand Jain @ 2017-11-06  2:32 UTC (permalink / raw)
  To: David Sterba, linux-btrfs



On 11/03/2017 07:13 PM, Anand Jain wrote:
> 
> 
> Thanks for writing this.
> 
>> + * - fs_devices::device_list_mutex (per-fs, with RCU)
>> + *
>> + *   protects updates to fs_devices::devices, ie. adding and deleting
>> + *
>> + *   simple list traversal with read-only actions can be done with RCU
>> + *   protection
>> + *
>> + *   may be used to exclude some operations from running concurrently 
>> without
>> + *   any modifications to the list (see write_all_supers)
> 
>> + * - volume_mutex
>> + *
>> + *   coarse lock owned by a mounted filesystem; used to exclude some 
>> operations
>> + *   that cannot run in parallel and affect the higher-level 
>> properties of the
>> + *   filesystem like: device add/deleting/resize/replace, or balance
> 
>> + * - chunk_mutex
>> + *
>> + *   protects chunks, adding or removing during allocation, trim or when
>> + *   a new device is added/removed
> 
> ::
> 
>> + * Lock nesting
>> + * ------------
>> + *
>> + * uuid_mutex
>> + *   volume_mutex

I think it should be nested like this, but as of now its not. Ref [1]
  [1]
  btrfs: move volume_mutex into the btrfs_rm_device()

Thanks, Anand

>> + *     device_list_mutex
>> + *       chunk_mutex
>> + *     balance_mutex
> 
> 
> If we have a list of operations that would consume these locks then we 
> can map it accordingly for better clarity.
> 
> To me it looks like we have too many locks.
>   - we don't have to differentiate the mounted and unmounted context
>     for device locks.
>   - Two lock would be sufficient, one for the device list
>     (add/rm,replace,..) and another for device property changes
>     (resize, trim,..).
> 
> Thanks, Anand
> 
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 01/11] btrfs: add missing device::flush_bio puts
  2017-11-02  9:41   ` Nikolay Borisov
@ 2017-11-06 13:24     ` David Sterba
  0 siblings, 0 replies; 27+ messages in thread
From: David Sterba @ 2017-11-06 13:24 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: linux-btrfs

On Thu, Nov 02, 2017 at 11:41:44AM +0200, Nikolay Borisov wrote:
> 
> 
> On 31.10.2017 19:44, David Sterba wrote:
> > This fixes potential bio leaks, in several error paths. Unfortunatelly
> > the device structure freeing is opencoded in many places and I missed
> > them when introducing the flush_bio.
> > 
> > Most of the time, devices get freed through call_rcu(..., free_device),
> > so it at least it's not that easy to hit the leak, but it's still
> > possible through the path that frees stale devices.
> > 
> > Fixes: e0ae99941423 ("btrfs: preallocate device flush bio")
> > Signed-off-by: David Sterba <dsterba@suse.com>
> 
> Verified that every kfree(device) has a matching bio_put via:
> 
> grep -ir -B2 "kfree(dev.*)" fs/btrfs/volumes.c

I've used this coccinelle script to cross-check, in case the variable is
is not named 'dev*':

<smpl>
@@
struct btrfs_device *DEV;
@@
* kfree(DEV);
</smpl>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 06/11] btrfs: document device locking
  2017-11-03 11:13   ` Anand Jain
  2017-11-06  2:32     ` Anand Jain
@ 2017-11-06 13:36     ` David Sterba
  1 sibling, 0 replies; 27+ messages in thread
From: David Sterba @ 2017-11-06 13:36 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs

On Fri, Nov 03, 2017 at 07:13:01PM +0800, Anand Jain wrote:
> 
> 
> Thanks for writing this.
> 
> > + * - fs_devices::device_list_mutex (per-fs, with RCU)
> > + *
> > + *   protects updates to fs_devices::devices, ie. adding and deleting
> > + *
> > + *   simple list traversal with read-only actions can be done with RCU
> > + *   protection
> > + *
> > + *   may be used to exclude some operations from running concurrently without
> > + *   any modifications to the list (see write_all_supers)
> 
> > + * - volume_mutex
> > + *
> > + *   coarse lock owned by a mounted filesystem; used to exclude some operations
> > + *   that cannot run in parallel and affect the higher-level properties of the
> > + *   filesystem like: device add/deleting/resize/replace, or balance
> 
> > + * - chunk_mutex
> > + *
> > + *   protects chunks, adding or removing during allocation, trim or when
> > + *   a new device is added/removed
> 
> ::
> 
> > + * Lock nesting
> > + * ------------
> > + *
> > + * uuid_mutex
> > + *   volume_mutex
> > + *     device_list_mutex
> > + *       chunk_mutex
> > + *     balance_mutex
> 
> 
> If we have a list of operations that would consume these locks then we 
> can map it accordingly for better clarity.
> 
> To me it looks like we have too many locks.

I agree. Trivially we can remove the volume_mutex, because it copies
setting of BTRFS_FS_EXCL_OP, but this still needs some preparatory work
in the mount path. I have patches for that but I wanted to send out the
documentation first, that is supposed to reflect the current state.

>   - we don't have to differentiate the mounted and unmounted context
>     for device locks.

I'm not sure about that, the device registering happens outside of
mounted filesystems yet we touch the same structures on mounted
filesystem.

>   - Two lock would be sufficient, one for the device list
>     (add/rm,replace,..) and another for device property changes
>     (resize, trim,..).

I think this will be sorted once the volume_mutex is gone. The nature of
add/remove and trim is different, as trim does not change the high-level
status of devices, only locks the bloc groups. Resize and add/remove are
mutually exclusive so they need to be protected by a common lock.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 06/11] btrfs: document device locking
  2017-11-06  2:32     ` Anand Jain
@ 2017-11-06 13:40       ` David Sterba
  0 siblings, 0 replies; 27+ messages in thread
From: David Sterba @ 2017-11-06 13:40 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs

On Mon, Nov 06, 2017 at 10:32:48AM +0800, Anand Jain wrote:
> >> + * Lock nesting
> >> + * ------------
> >> + *
> >> + * uuid_mutex
> >> + *   volume_mutex
> 
> I think it should be nested like this, but as of now its not. Ref [1]
>   [1]
>   btrfs: move volume_mutex into the btrfs_rm_device()

Right, this violates the documented rules and I think that's because of
the ambiguous semantic of the uuid_mutex. Here we're using the "on a
mounted filesystem" part so we lock only in the context where this
applies. Again, once volume_mutex is gone, this will be fixed.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 06/11] btrfs: document device locking
  2017-11-02 10:29   ` Nikolay Borisov
@ 2017-11-06 13:51     ` David Sterba
  2017-11-06 15:02       ` Nikolay Borisov
  0 siblings, 1 reply; 27+ messages in thread
From: David Sterba @ 2017-11-06 13:51 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: linux-btrfs

On Thu, Nov 02, 2017 at 12:29:17PM +0200, Nikolay Borisov wrote:
> On 31.10.2017 19:44, David Sterba wrote:
> > Overview of the main locks protecting various device-related structures.
> > 
> > Signed-off-by: David Sterba <dsterba@suse.com>
> > ---
> >  fs/btrfs/volumes.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 66 insertions(+)
> > 
> > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> > index 75aed8ec64bd..098affc58361 100644
> > --- a/fs/btrfs/volumes.c
> > +++ b/fs/btrfs/volumes.c
> > @@ -145,6 +145,72 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
> >  			     struct btrfs_bio **bbio_ret,
> >  			     int mirror_num, int need_raid_map);
> >  
> > +/*
> > + * Device locking
> > + * ==============
> > + *
> > + * There are several mutexes that protect manipulation of devices and low-level
> > + * structures like chunks but not block groups, extents or files
> 
> This sentence suggests you are describing the various mutexes but it
> seems you are also describing data structure below i.e.
> global::fs_devs/btrfs_device::name

In my understanding, documenting a mutex means defining the context and
data that are protected in the context. So I can start with the mutex,
or the data itself, but the mutex makes IMO more sense to start.


> > + * - uuid_mutex (global lock)
> > + *
> > + *   protects the fs_uuids list that tracks all per-fs fs_devices, resulting
> > + *   from the SCAN_DEV ioctl registration or from mount either implicitly
> > + *   (the first device) or requested by the device= mount option
> > + *
> > + *   the mutex can be very coarse and can cover long-running operations
> > + *
> > + *   protects: updates to fs_devices counters like missing devices, rw devices,
> > + *   seeding, structure cloning, openning/closing devices at mount/umount time
> 
> Perhaps move the uuid_mutex description after btrfs_device::name. That
> way mutexes are grouped together and data structures are grouped as well.

The grouping is by mutex.

> > + *
> > + *   global::fs_devs - add, remove, updates to the global list
> 
> You seem to be describing a data structure here rather than a mutex

As I read it, this means "if this mutex is taken, it protects
global::fs_devs in this way".

> > + *   does not protect: manipulation of the fs_devices::devices list!
> 
> but this sentence is written as if you've just described a mutex. The
> end result is that it's not clear what mutex you are referring to here.

That's the mutex in the section above "- uuid_mutex (global lock)".

> > + *   btrfs_device::name - renames (write side), read is RCU
> 
> It's not clear how the write side is protected.

same answer, it's under the 'uuid_mutex', so the write side is protected
by that, with exception of reads protected by RCU.

I wonder what kind of documentation structure do you expect as it seems
to me you missed it completely. Which is a valuable feedback to me, but
I don't know how to fix it, other than make the sections more visible
with underlining or indenting or more whitespace.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 11/11] btrfs: use non-RCU list traversal in write_all_supers callees
  2017-11-02  9:49   ` Nikolay Borisov
@ 2017-11-06 13:59     ` David Sterba
  0 siblings, 0 replies; 27+ messages in thread
From: David Sterba @ 2017-11-06 13:59 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: linux-btrfs

On Thu, Nov 02, 2017 at 11:49:15AM +0200, Nikolay Borisov wrote:
> 
> 
> On 31.10.2017 19:44, David Sterba wrote:
> > We take the fs_devices::device_list_mutex mutex in write_all_supers
> > which will prevent any add/del changes to the device list. Therefore we
> > don't need to use the RCU variant list_for_each_entry_rcu in any of the
> > called functions.
> > 
> > Signed-off-by: David Sterba <dsterba@suse.com>
> > ---
> >  fs/btrfs/disk-io.c | 9 +++++----
> >  1 file changed, 5 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> > index efce9a2fa9be..042cf46e4cd0 100644
> > --- a/fs/btrfs/disk-io.c
> > +++ b/fs/btrfs/disk-io.c
> > @@ -3396,9 +3396,10 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
> >  	int errors_wait = 0;
> >  	blk_status_t ret;
> >  
> > +	WARN_ON(!mutex_is_locked(&info->fs_devices->device_list_mutex));
> 
> Don't we want lockdep_assert_held ? I'm assuming enough testing with
> lockdep on is performed so it will be caught in time if this invariant
> is broken.

Yes, lockdep_assert_held should enough, this is a development-time
warning. Will be fixed.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 06/11] btrfs: document device locking
  2017-11-06 13:51     ` David Sterba
@ 2017-11-06 15:02       ` Nikolay Borisov
  2017-11-06 15:09         ` David Sterba
  0 siblings, 1 reply; 27+ messages in thread
From: Nikolay Borisov @ 2017-11-06 15:02 UTC (permalink / raw)
  To: dsterba, linux-btrfs



On  6.11.2017 15:51, David Sterba wrote:
> On Thu, Nov 02, 2017 at 12:29:17PM +0200, Nikolay Borisov wrote:
>> On 31.10.2017 19:44, David Sterba wrote:
>>> Overview of the main locks protecting various device-related structures.
>>>
>>> Signed-off-by: David Sterba <dsterba@suse.com>
>>> ---
>>>  fs/btrfs/volumes.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>  1 file changed, 66 insertions(+)
>>>
>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>> index 75aed8ec64bd..098affc58361 100644
>>> --- a/fs/btrfs/volumes.c
>>> +++ b/fs/btrfs/volumes.c
>>> @@ -145,6 +145,72 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
>>>  			     struct btrfs_bio **bbio_ret,
>>>  			     int mirror_num, int need_raid_map);
>>>  
>>> +/*
>>> + * Device locking
>>> + * ==============
>>> + *
>>> + * There are several mutexes that protect manipulation of devices and low-level
>>> + * structures like chunks but not block groups, extents or files
>>
>> This sentence suggests you are describing the various mutexes but it
>> seems you are also describing data structure below i.e.
>> global::fs_devs/btrfs_device::name
> 
> In my understanding, documenting a mutex means defining the context and
> data that are protected in the context. So I can start with the mutex,
> or the data itself, but the mutex makes IMO more sense to start.
> 
> 
>>> + * - uuid_mutex (global lock)
>>> + *
>>> + *   protects the fs_uuids list that tracks all per-fs fs_devices, resulting
>>> + *   from the SCAN_DEV ioctl registration or from mount either implicitly
>>> + *   (the first device) or requested by the device= mount option
>>> + *
>>> + *   the mutex can be very coarse and can cover long-running operations
>>> + *
>>> + *   protects: updates to fs_devices counters like missing devices, rw devices,
>>> + *   seeding, structure cloning, openning/closing devices at mount/umount time
>>
>> Perhaps move the uuid_mutex description after btrfs_device::name. That
>> way mutexes are grouped together and data structures are grouped as well.
> 
> The grouping is by mutex.
> 
>>> + *
>>> + *   global::fs_devs - add, remove, updates to the global list
>>
>> You seem to be describing a data structure here rather than a mutex
> 
> As I read it, this means "if this mutex is taken, it protects
> global::fs_devs in this way".
> 
>>> + *   does not protect: manipulation of the fs_devices::devices list!
>>
>> but this sentence is written as if you've just described a mutex. The
>> end result is that it's not clear what mutex you are referring to here.
> 
> That's the mutex in the section above "- uuid_mutex (global lock)".
> 
>>> + *   btrfs_device::name - renames (write side), read is RCU
>>
>> It's not clear how the write side is protected.
> 
> same answer, it's under the 'uuid_mutex', so the write side is protected
> by that, with exception of reads protected by RCU.
> 
> I wonder what kind of documentation structure do you expect as it seems
> to me you missed it completely. Which is a valuable feedback to me, but
> I don't know how to fix it, other than make the sections more visible
> with underlining or indenting or more whitespace.

Right, I did a fresh read and indeed the structure now makes sense. I
have completely missed the nesting though. Maybe you can nest more
agressively, now every sentence is aligned to the header, perhaps we
want to have it one level deeper? Or use more than one - at the mutex
level ? I don't really have a clear-cut answer for you.


> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 06/11] btrfs: document device locking
  2017-11-06 15:02       ` Nikolay Borisov
@ 2017-11-06 15:09         ` David Sterba
  0 siblings, 0 replies; 27+ messages in thread
From: David Sterba @ 2017-11-06 15:09 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: dsterba, linux-btrfs

On Mon, Nov 06, 2017 at 05:02:21PM +0200, Nikolay Borisov wrote:
> Right, I did a fresh read and indeed the structure now makes sense. I
> have completely missed the nesting though. Maybe you can nest more
> agressively, now every sentence is aligned to the header, perhaps we
> want to have it one level deeper? Or use more than one - at the mutex
> level ? I don't really have a clear-cut answer for you.

How about that:

/*
 * Device locking
 * ==============
 *
 * There are several mutexes that protect manipulation of devices and low-level
 * structures like chunks but not block groups, extents or files
 *
 * uuid_mutex (global lock)
 * ------------------------
 * protects the fs_uuids list that tracks all per-fs fs_devices, resulting from
 * the SCAN_DEV ioctl registration or from mount either implicitly (the first
 * device) or requested by the device= mount option
 *
 * the mutex can be very coarse and can cover long-running operations
 *
 * protects: updates to fs_devices counters like missing devices, rw devices,
 * seeding, structure cloning, openning/closing devices at mount/umount time
 *
 * global::fs_devs - add, remove, updates to the global list
 *
 * does not protect: manipulation of the fs_devices::devices list!
 *
 * btrfs_device::name - renames (write side), read is RCU
 *
 * fs_devices::device_list_mutex (per-fs, with RCU)
 * ------------------------------------------------
 * protects updates to fs_devices::devices, ie. adding and deleting
 *
 * simple list traversal with read-only actions can be done with RCU protection
 *
 * may be used to exclude some operations from running concurrently without any
 * modifications to the list (see write_all_supers)
 *
 * volume_mutex
 * ------------
 * coarse lock owned by a mounted filesystem; used to exclude some operations
 * that cannot run in parallel and affect the higher-level properties of the
 * filesystem like: device add/deleting/resize/replace, or balance
 *
 * balance_mutex
 * -------------
 * protects balance structures (status, state) and context accessed from
 * several places (internally, ioctl)
 *
 * chunk_mutex
 * -----------
 * protects chunks, adding or removing during allocation, trim or when a new
 * device is added/removed
 *
 * cleaner_mutex
 * -------------
 * a big lock that is held by the cleaner thread and prevents running subvolume
 * cleaning together with relocation or delayed iputs
 *
 *
 * Lock nesting
 * ============
 *
 * uuid_mutex
 *   volume_mutex
 *     device_list_mutex
 *       chunk_mutex
 *     balance_mutex
 */

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2017-11-06 15:11 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-31 17:44 [PATCH 00/11] Device fixes and cleanups David Sterba
2017-10-31 17:44 ` [PATCH 01/11] btrfs: add missing device::flush_bio puts David Sterba
2017-11-02  9:41   ` Nikolay Borisov
2017-11-06 13:24     ` David Sterba
2017-11-02 10:40   ` Anand Jain
2017-10-31 17:44 ` [PATCH 02/11] btrfs: rename device free rcu helper to free_device_rcu David Sterba
2017-10-31 17:44 ` [PATCH 03/11] btrfs: introduce free_device helper David Sterba
2017-10-31 17:44 ` [PATCH 04/11] btrfs: use free_device where opencoded David Sterba
2017-11-02 11:25   ` Anand Jain
2017-10-31 17:44 ` [PATCH 05/11] btrfs: simplify exit paths in btrfs_init_new_device David Sterba
2017-11-06  1:53   ` Anand Jain
2017-10-31 17:44 ` [PATCH 06/11] btrfs: document device locking David Sterba
2017-11-02 10:29   ` Nikolay Borisov
2017-11-06 13:51     ` David Sterba
2017-11-06 15:02       ` Nikolay Borisov
2017-11-06 15:09         ` David Sterba
2017-11-03 11:13   ` Anand Jain
2017-11-06  2:32     ` Anand Jain
2017-11-06 13:40       ` David Sterba
2017-11-06 13:36     ` David Sterba
2017-10-31 17:44 ` [PATCH 07/11] btrfs: dev_alloc_list is not protected by RCU, use normal list_del David Sterba
2017-10-31 17:44 ` [PATCH 08/11] btrfs: simplify btrfs_close_bdev David Sterba
2017-10-31 17:44 ` [PATCH 09/11] btrfs: switch to RCU for device traversal in btrfs_ioctl_dev_info David Sterba
2017-10-31 17:44 ` [PATCH 10/11] btrfs: switch to RCU for device traversal in btrfs_ioctl_fs_info David Sterba
2017-10-31 17:44 ` [PATCH 11/11] btrfs: use non-RCU list traversal in write_all_supers callees David Sterba
2017-11-02  9:49   ` Nikolay Borisov
2017-11-06 13:59     ` David Sterba

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).