All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/16 v1] Kill fs_info::volume_mutex
@ 2018-04-03 18:34 David Sterba
  2018-04-03 18:34 ` [PATCH 01/16] btrfs: squeeze btrfs_dev_replace_continue_on_mount to its caller David Sterba
                   ` (16 more replies)
  0 siblings, 17 replies; 47+ messages in thread
From: David Sterba @ 2018-04-03 18:34 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

This series gets rid of the volume mutex. The fstests do not pass
cleanly, 2 or more tests fail so this needs to be fixed, but otherwise
majority of the work ready for review.

The merge target is 4.18 and I'll probably not get back to this pathset
during merge window (nor add it to next), so there should be enough time
for review.

David Sterba (16):
  btrfs: squeeze btrfs_dev_replace_continue_on_mount to its caller
  btrfs: make success path out of btrfs_init_dev_replace_tgtdev more
    clear
  btrfs: export and rename free_device
  btrfs: move btrfs_init_dev_replace_tgtdev to dev-replace.c and make
    static
  btrfs: move volume_mutex to callers of btrfs_rm_device
  btrfs: move clearing of EXCL_OP out of __cancel_balance
  btrfs: add proper safety check before resuming dev-replace
  btrfs: add sanity check when resuming balance after mount
  btrfs: cleanup helpers that reset balance state
  btrfs: remove wrong use of volume_mutex from btrfs_dev_replace_start
  btrfs: kill btrfs_fs_info::volume_mutex
  btrfs: track running balance in a simpler way
  btrfs: remove redundant read-only check from btrfs_cancel_balance
  btrfs: drop lock parameter from update_ioctl_balance_args and rename
  btrfs: use mutex in btrfs_resume_balance_async
  btrfs: open code set_balance_control

 fs/btrfs/ctree.h       |   9 +-
 fs/btrfs/dev-replace.c | 135 +++++++++++++++++++++-----
 fs/btrfs/disk-io.c     |   2 -
 fs/btrfs/extent-tree.c |   2 +-
 fs/btrfs/ioctl.c       |  41 ++++----
 fs/btrfs/volumes.c     | 252 +++++++++++++------------------------------------
 fs/btrfs/volumes.h     |   5 +-
 7 files changed, 205 insertions(+), 241 deletions(-)

-- 
2.16.2


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH 01/16] btrfs: squeeze btrfs_dev_replace_continue_on_mount to its caller
  2018-04-03 18:34 [PATCH 00/16 v1] Kill fs_info::volume_mutex David Sterba
@ 2018-04-03 18:34 ` David Sterba
  2018-04-05  9:40   ` Anand Jain
  2018-04-03 18:34 ` [PATCH 02/16] btrfs: make success path out of btrfs_init_dev_replace_tgtdev more clear David Sterba
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 47+ messages in thread
From: David Sterba @ 2018-04-03 18:34 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

The function is called once and is fairly small, we can merge it with
the caller.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/dev-replace.c | 16 +++-------------
 1 file changed, 3 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 0d203633bb96..c0397cd2a3ba 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -45,8 +45,6 @@ static void btrfs_dev_replace_update_device_in_mapping_tree(
 						struct btrfs_device *srcdev,
 						struct btrfs_device *tgtdev);
 static int btrfs_dev_replace_kthread(void *data);
-static int btrfs_dev_replace_continue_on_mount(struct btrfs_fs_info *fs_info);
-
 
 int btrfs_init_dev_replace(struct btrfs_fs_info *fs_info)
 {
@@ -822,6 +820,7 @@ static int btrfs_dev_replace_kthread(void *data)
 	struct btrfs_fs_info *fs_info = data;
 	struct btrfs_dev_replace *dev_replace = &fs_info->dev_replace;
 	u64 progress;
+	int ret;
 
 	progress = btrfs_dev_replace_progress(fs_info);
 	progress = div_u64(progress, 10);
@@ -832,23 +831,14 @@ static int btrfs_dev_replace_kthread(void *data)
 		btrfs_dev_name(dev_replace->tgtdev),
 		(unsigned int)progress);
 
-	btrfs_dev_replace_continue_on_mount(fs_info);
-	clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
-
-	return 0;
-}
-
-static int btrfs_dev_replace_continue_on_mount(struct btrfs_fs_info *fs_info)
-{
-	struct btrfs_dev_replace *dev_replace = &fs_info->dev_replace;
-	int ret;
-
 	ret = btrfs_scrub_dev(fs_info, dev_replace->srcdev->devid,
 			      dev_replace->committed_cursor_left,
 			      btrfs_device_get_total_bytes(dev_replace->srcdev),
 			      &dev_replace->scrub_progress, 0, 1);
 	ret = btrfs_dev_replace_finishing(fs_info, ret);
 	WARN_ON(ret);
+
+	clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
 	return 0;
 }
 
-- 
2.16.2


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 02/16] btrfs: make success path out of btrfs_init_dev_replace_tgtdev more clear
  2018-04-03 18:34 [PATCH 00/16 v1] Kill fs_info::volume_mutex David Sterba
  2018-04-03 18:34 ` [PATCH 01/16] btrfs: squeeze btrfs_dev_replace_continue_on_mount to its caller David Sterba
@ 2018-04-03 18:34 ` David Sterba
  2018-04-05  9:40   ` Anand Jain
  2018-04-03 18:34 ` [PATCH 03/16] btrfs: export and rename free_device David Sterba
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 47+ messages in thread
From: David Sterba @ 2018-04-03 18:34 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

This is a preparatory cleanup that will make clear that the only
successful way out of btrfs_init_dev_replace_tgtdev will also set the
device_out to a valid pointer. With this guarantee, the callers can be
simplified.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/dev-replace.c | 1 -
 fs/btrfs/volumes.c     | 8 +++++++-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index c0397cd2a3ba..e3b8a79c1665 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -370,7 +370,6 @@ int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
 	dev_replace->cont_reading_from_srcdev_mode = read_src;
 	WARN_ON(!src_device);
 	dev_replace->srcdev = src_device;
-	WARN_ON(!tgt_device);
 	dev_replace->tgtdev = tgt_device;
 
 	btrfs_info_in_rcu(fs_info,
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 93f8f17cacca..8086c5687b72 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2592,6 +2592,12 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 	return ret;
 }
 
+/*
+ * Initialize a new device for device replace target from a given source dev
+ * and path.
+ *
+ * Return 0 and new device in @device_out, otherwise return < 0
+ */
 int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
 				  const char *device_path,
 				  struct btrfs_device *srcdev,
@@ -2678,7 +2684,7 @@ int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
 	mutex_unlock(&fs_info->fs_devices->device_list_mutex);
 
 	*device_out = device;
-	return ret;
+	return 0;
 
 error:
 	blkdev_put(bdev, FMODE_EXCL);
-- 
2.16.2


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 03/16] btrfs: export and rename free_device
  2018-04-03 18:34 [PATCH 00/16 v1] Kill fs_info::volume_mutex David Sterba
  2018-04-03 18:34 ` [PATCH 01/16] btrfs: squeeze btrfs_dev_replace_continue_on_mount to its caller David Sterba
  2018-04-03 18:34 ` [PATCH 02/16] btrfs: make success path out of btrfs_init_dev_replace_tgtdev more clear David Sterba
@ 2018-04-03 18:34 ` David Sterba
  2018-04-05  9:41   ` Anand Jain
  2018-04-03 18:34 ` [PATCH 04/16] btrfs: move btrfs_init_dev_replace_tgtdev to dev-replace.c and make static David Sterba
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 47+ messages in thread
From: David Sterba @ 2018-04-03 18:34 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

The function will be used outside of volumes.c, the allocation
btrfs_alloc_device is also exported.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/volumes.c | 24 ++++++++++++------------
 fs/btrfs/volumes.h |  1 +
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 8086c5687b72..ca5521cc1a5b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -246,7 +246,7 @@ static struct btrfs_fs_devices *alloc_fs_devices(const u8 *fsid)
 	return fs_devs;
 }
 
-static void free_device(struct btrfs_device *device)
+void btrfs_free_device(struct btrfs_device *device)
 {
 	rcu_string_free(device->name);
 	bio_put(device->flush_bio);
@@ -261,7 +261,7 @@ static void free_fs_devices(struct btrfs_fs_devices *fs_devices)
 		device = list_entry(fs_devices->devices.next,
 				    struct btrfs_device, dev_list);
 		list_del(&device->dev_list);
-		free_device(device);
+		btrfs_free_device(device);
 	}
 	kfree(fs_devices);
 }
@@ -294,7 +294,7 @@ void __exit btrfs_cleanup_fs_uuids(void)
 /*
  * Returns a pointer to a new btrfs_device on success; ERR_PTR() on error.
  * Returned struct is not linked onto any lists and must be destroyed using
- * free_device.
+ * btrfs_free_device.
  */
 static struct btrfs_device *__alloc_device(void)
 {
@@ -650,7 +650,7 @@ static void btrfs_free_stale_devices(const char *path,
 			} else {
 				fs_devs->num_devices--;
 				list_del(&dev->dev_list);
-				free_device(dev);
+				btrfs_free_device(dev);
 			}
 		}
 	}
@@ -765,7 +765,7 @@ static noinline struct btrfs_device *device_list_add(const char *path,
 
 		name = rcu_string_strdup(path, GFP_NOFS);
 		if (!name) {
-			free_device(device);
+			btrfs_free_device(device);
 			return ERR_PTR(-ENOMEM);
 		}
 		rcu_assign_pointer(device->name, name);
@@ -878,7 +878,7 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig)
 			name = rcu_string_strdup(orig_dev->name->str,
 					GFP_KERNEL);
 			if (!name) {
-				free_device(device);
+				btrfs_free_device(device);
 				goto error;
 			}
 			rcu_assign_pointer(device->name, name);
@@ -950,7 +950,7 @@ void btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices, int step)
 		}
 		list_del_init(&device->dev_list);
 		fs_devices->num_devices--;
-		free_device(device);
+		btrfs_free_device(device);
 	}
 
 	if (fs_devices->seed) {
@@ -968,7 +968,7 @@ static void free_device_rcu(struct rcu_head *head)
 	struct btrfs_device *device;
 
 	device = container_of(head, struct btrfs_device, rcu);
-	free_device(device);
+	btrfs_free_device(device);
 }
 
 static void btrfs_close_bdev(struct btrfs_device *device)
@@ -2582,7 +2582,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 	if (trans)
 		btrfs_end_transaction(trans);
 error_free_device:
-	free_device(device);
+	btrfs_free_device(device);
 error:
 	blkdev_put(bdev, FMODE_EXCL);
 	if (seeding_dev && !unlocked) {
@@ -2653,7 +2653,7 @@ int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
 
 	name = rcu_string_strdup(device_path, GFP_KERNEL);
 	if (!name) {
-		free_device(device);
+		btrfs_free_device(device);
 		ret = -ENOMEM;
 		goto error;
 	}
@@ -6419,7 +6419,7 @@ static struct btrfs_device *add_missing_dev(struct btrfs_fs_devices *fs_devices,
  *
  * Return: a pointer to a new &struct btrfs_device on success; ERR_PTR()
  * on error.  Returned struct is not linked onto any lists and must be
- * destroyed with free_device.
+ * destroyed with btrfs_free_device.
  */
 struct btrfs_device *btrfs_alloc_device(struct btrfs_fs_info *fs_info,
 					const u64 *devid,
@@ -6442,7 +6442,7 @@ struct btrfs_device *btrfs_alloc_device(struct btrfs_fs_info *fs_info,
 
 		ret = find_next_devid(fs_info, &tmp);
 		if (ret) {
-			free_device(dev);
+			btrfs_free_device(dev);
 			return ERR_PTR(ret);
 		}
 	}
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index d1fcaea9fef5..45e3ece21290 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -434,6 +434,7 @@ int btrfs_find_device_by_devspec(struct btrfs_fs_info *fs_info, u64 devid,
 struct btrfs_device *btrfs_alloc_device(struct btrfs_fs_info *fs_info,
 					const u64 *devid,
 					const u8 *uuid);
+void btrfs_free_device(struct btrfs_device *device);
 int btrfs_rm_device(struct btrfs_fs_info *fs_info,
 		    const char *device_path, u64 devid);
 void __exit btrfs_cleanup_fs_uuids(void);
-- 
2.16.2


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 04/16] btrfs: move btrfs_init_dev_replace_tgtdev to dev-replace.c and make static
  2018-04-03 18:34 [PATCH 00/16 v1] Kill fs_info::volume_mutex David Sterba
                   ` (2 preceding siblings ...)
  2018-04-03 18:34 ` [PATCH 03/16] btrfs: export and rename free_device David Sterba
@ 2018-04-03 18:34 ` David Sterba
  2018-04-05  9:41   ` Anand Jain
  2018-04-03 18:34 ` [PATCH 05/16] btrfs: move volume_mutex to callers of btrfs_rm_device David Sterba
                   ` (12 subsequent siblings)
  16 siblings, 1 reply; 47+ messages in thread
From: David Sterba @ 2018-04-03 18:34 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

The function logically belongs there and there's only a single caller,
no need to export it. No code changes.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/dev-replace.c | 99 ++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/volumes.c     | 99 --------------------------------------------------
 fs/btrfs/volumes.h     |  4 --
 3 files changed, 99 insertions(+), 103 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index e3b8a79c1665..7a87ffad041e 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -188,6 +188,105 @@ int btrfs_init_dev_replace(struct btrfs_fs_info *fs_info)
 	return ret;
 }
 
+/*
+ * Initialize a new device for device replace target from a given source dev
+ * and path.
+ *
+ * Return 0 and new device in @device_out, otherwise return < 0
+ */
+static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
+				  const char *device_path,
+				  struct btrfs_device *srcdev,
+				  struct btrfs_device **device_out)
+{
+	struct btrfs_device *device;
+	struct block_device *bdev;
+	struct list_head *devices;
+	struct rcu_string *name;
+	u64 devid = BTRFS_DEV_REPLACE_DEVID;
+	int ret = 0;
+
+	*device_out = NULL;
+	if (fs_info->fs_devices->seeding) {
+		btrfs_err(fs_info, "the filesystem is a seed filesystem!");
+		return -EINVAL;
+	}
+
+	bdev = blkdev_get_by_path(device_path, FMODE_WRITE | FMODE_EXCL,
+				  fs_info->bdev_holder);
+	if (IS_ERR(bdev)) {
+		btrfs_err(fs_info, "target device %s is invalid!", device_path);
+		return PTR_ERR(bdev);
+	}
+
+	filemap_write_and_wait(bdev->bd_inode->i_mapping);
+
+	devices = &fs_info->fs_devices->devices;
+	list_for_each_entry(device, devices, dev_list) {
+		if (device->bdev == bdev) {
+			btrfs_err(fs_info,
+				  "target device is in the filesystem!");
+			ret = -EEXIST;
+			goto error;
+		}
+	}
+
+
+	if (i_size_read(bdev->bd_inode) <
+	    btrfs_device_get_total_bytes(srcdev)) {
+		btrfs_err(fs_info,
+			  "target device is smaller than source device!");
+		ret = -EINVAL;
+		goto error;
+	}
+
+
+	device = btrfs_alloc_device(NULL, &devid, NULL);
+	if (IS_ERR(device)) {
+		ret = PTR_ERR(device);
+		goto error;
+	}
+
+	name = rcu_string_strdup(device_path, GFP_KERNEL);
+	if (!name) {
+		btrfs_free_device(device);
+		ret = -ENOMEM;
+		goto error;
+	}
+	rcu_assign_pointer(device->name, name);
+
+	mutex_lock(&fs_info->fs_devices->device_list_mutex);
+	set_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
+	device->generation = 0;
+	device->io_width = fs_info->sectorsize;
+	device->io_align = fs_info->sectorsize;
+	device->sector_size = fs_info->sectorsize;
+	device->total_bytes = btrfs_device_get_total_bytes(srcdev);
+	device->disk_total_bytes = btrfs_device_get_disk_total_bytes(srcdev);
+	device->bytes_used = btrfs_device_get_bytes_used(srcdev);
+	device->commit_total_bytes = srcdev->commit_total_bytes;
+	device->commit_bytes_used = device->bytes_used;
+	device->fs_info = fs_info;
+	device->bdev = bdev;
+	set_bit(BTRFS_DEV_STATE_IN_FS_METADATA, &device->dev_state);
+	set_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state);
+	device->mode = FMODE_EXCL;
+	device->dev_stats_valid = 1;
+	set_blocksize(device->bdev, BTRFS_BDEV_BLOCKSIZE);
+	device->fs_devices = fs_info->fs_devices;
+	list_add(&device->dev_list, &fs_info->fs_devices->devices);
+	fs_info->fs_devices->num_devices++;
+	fs_info->fs_devices->open_devices++;
+	mutex_unlock(&fs_info->fs_devices->device_list_mutex);
+
+	*device_out = device;
+	return 0;
+
+error:
+	blkdev_put(bdev, FMODE_EXCL);
+	return ret;
+}
+
 /*
  * called from commit_transaction. Writes changed device replace state to
  * disk.
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index ca5521cc1a5b..df2d6d06e014 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2592,105 +2592,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 	return ret;
 }
 
-/*
- * Initialize a new device for device replace target from a given source dev
- * and path.
- *
- * Return 0 and new device in @device_out, otherwise return < 0
- */
-int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
-				  const char *device_path,
-				  struct btrfs_device *srcdev,
-				  struct btrfs_device **device_out)
-{
-	struct btrfs_device *device;
-	struct block_device *bdev;
-	struct list_head *devices;
-	struct rcu_string *name;
-	u64 devid = BTRFS_DEV_REPLACE_DEVID;
-	int ret = 0;
-
-	*device_out = NULL;
-	if (fs_info->fs_devices->seeding) {
-		btrfs_err(fs_info, "the filesystem is a seed filesystem!");
-		return -EINVAL;
-	}
-
-	bdev = blkdev_get_by_path(device_path, FMODE_WRITE | FMODE_EXCL,
-				  fs_info->bdev_holder);
-	if (IS_ERR(bdev)) {
-		btrfs_err(fs_info, "target device %s is invalid!", device_path);
-		return PTR_ERR(bdev);
-	}
-
-	filemap_write_and_wait(bdev->bd_inode->i_mapping);
-
-	devices = &fs_info->fs_devices->devices;
-	list_for_each_entry(device, devices, dev_list) {
-		if (device->bdev == bdev) {
-			btrfs_err(fs_info,
-				  "target device is in the filesystem!");
-			ret = -EEXIST;
-			goto error;
-		}
-	}
-
-
-	if (i_size_read(bdev->bd_inode) <
-	    btrfs_device_get_total_bytes(srcdev)) {
-		btrfs_err(fs_info,
-			  "target device is smaller than source device!");
-		ret = -EINVAL;
-		goto error;
-	}
-
-
-	device = btrfs_alloc_device(NULL, &devid, NULL);
-	if (IS_ERR(device)) {
-		ret = PTR_ERR(device);
-		goto error;
-	}
-
-	name = rcu_string_strdup(device_path, GFP_KERNEL);
-	if (!name) {
-		btrfs_free_device(device);
-		ret = -ENOMEM;
-		goto error;
-	}
-	rcu_assign_pointer(device->name, name);
-
-	mutex_lock(&fs_info->fs_devices->device_list_mutex);
-	set_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
-	device->generation = 0;
-	device->io_width = fs_info->sectorsize;
-	device->io_align = fs_info->sectorsize;
-	device->sector_size = fs_info->sectorsize;
-	device->total_bytes = btrfs_device_get_total_bytes(srcdev);
-	device->disk_total_bytes = btrfs_device_get_disk_total_bytes(srcdev);
-	device->bytes_used = btrfs_device_get_bytes_used(srcdev);
-	device->commit_total_bytes = srcdev->commit_total_bytes;
-	device->commit_bytes_used = device->bytes_used;
-	device->fs_info = fs_info;
-	device->bdev = bdev;
-	set_bit(BTRFS_DEV_STATE_IN_FS_METADATA, &device->dev_state);
-	set_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state);
-	device->mode = FMODE_EXCL;
-	device->dev_stats_valid = 1;
-	set_blocksize(device->bdev, BTRFS_BDEV_BLOCKSIZE);
-	device->fs_devices = fs_info->fs_devices;
-	list_add(&device->dev_list, &fs_info->fs_devices->devices);
-	fs_info->fs_devices->num_devices++;
-	fs_info->fs_devices->open_devices++;
-	mutex_unlock(&fs_info->fs_devices->device_list_mutex);
-
-	*device_out = device;
-	return 0;
-
-error:
-	blkdev_put(bdev, FMODE_EXCL);
-	return ret;
-}
-
 static noinline int btrfs_update_device(struct btrfs_trans_handle *trans,
 					struct btrfs_device *device)
 {
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 45e3ece21290..681be2dc0c6a 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -445,10 +445,6 @@ struct btrfs_device *btrfs_find_device(struct btrfs_fs_info *fs_info, u64 devid,
 				       u8 *uuid, u8 *fsid);
 int btrfs_shrink_device(struct btrfs_device *device, u64 new_size);
 int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *path);
-int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
-				  const char *device_path,
-				  struct btrfs_device *srcdev,
-				  struct btrfs_device **device_out);
 int btrfs_balance(struct btrfs_balance_control *bctl,
 		  struct btrfs_ioctl_balance_args *bargs);
 int btrfs_resume_balance_async(struct btrfs_fs_info *fs_info);
-- 
2.16.2


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 05/16] btrfs: move volume_mutex to callers of btrfs_rm_device
  2018-04-03 18:34 [PATCH 00/16 v1] Kill fs_info::volume_mutex David Sterba
                   ` (3 preceding siblings ...)
  2018-04-03 18:34 ` [PATCH 04/16] btrfs: move btrfs_init_dev_replace_tgtdev to dev-replace.c and make static David Sterba
@ 2018-04-03 18:34 ` David Sterba
  2018-04-05  9:41   ` Anand Jain
  2018-04-03 18:34 ` [PATCH 06/16] btrfs: move clearing of EXCL_OP out of __cancel_balance David Sterba
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 47+ messages in thread
From: David Sterba @ 2018-04-03 18:34 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

Move locking and unlocking next to the BTRFS_FS_EXCL_OP bit manipulation
so it's obvious that the two happen at the same time.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/ioctl.c   | 4 ++++
 fs/btrfs/volumes.c | 2 --
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index ac85e07f567b..b93dea445802 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2674,6 +2674,7 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
 		ret = BTRFS_ERROR_DEV_EXCL_RUN_IN_PROGRESS;
 		goto out;
 	}
+	mutex_lock(&fs_info->volume_mutex);
 
 	if (vol_args->flags & BTRFS_DEVICE_SPEC_BY_ID) {
 		ret = btrfs_rm_device(fs_info, NULL, vol_args->devid);
@@ -2681,6 +2682,7 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
 		vol_args->name[BTRFS_SUBVOL_NAME_MAX] = '\0';
 		ret = btrfs_rm_device(fs_info, vol_args->name, 0);
 	}
+	mutex_unlock(&fs_info->volume_mutex);
 	clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
 
 	if (!ret) {
@@ -2716,6 +2718,7 @@ static long btrfs_ioctl_rm_dev(struct file *file, void __user *arg)
 		ret = BTRFS_ERROR_DEV_EXCL_RUN_IN_PROGRESS;
 		goto out_drop_write;
 	}
+	mutex_lock(&fs_info->volume_mutex);
 
 	vol_args = memdup_user(arg, sizeof(*vol_args));
 	if (IS_ERR(vol_args)) {
@@ -2730,6 +2733,7 @@ static long btrfs_ioctl_rm_dev(struct file *file, void __user *arg)
 		btrfs_info(fs_info, "disk deleted %s", vol_args->name);
 	kfree(vol_args);
 out:
+	mutex_unlock(&fs_info->volume_mutex);
 	clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
 out_drop_write:
 	mnt_drop_write_file(file);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index df2d6d06e014..e0bd181dc9e0 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1932,7 +1932,6 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
 	u64 num_devices;
 	int ret = 0;
 
-	mutex_lock(&fs_info->volume_mutex);
 	mutex_lock(&uuid_mutex);
 
 	num_devices = fs_info->fs_devices->num_devices;
@@ -2048,7 +2047,6 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
 
 out:
 	mutex_unlock(&uuid_mutex);
-	mutex_unlock(&fs_info->volume_mutex);
 	return ret;
 
 error_undo:
-- 
2.16.2


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 06/16] btrfs: move clearing of EXCL_OP out of __cancel_balance
  2018-04-03 18:34 [PATCH 00/16 v1] Kill fs_info::volume_mutex David Sterba
                   ` (4 preceding siblings ...)
  2018-04-03 18:34 ` [PATCH 05/16] btrfs: move volume_mutex to callers of btrfs_rm_device David Sterba
@ 2018-04-03 18:34 ` David Sterba
  2018-04-05  9:42   ` Anand Jain
  2018-04-03 18:34 ` [PATCH 07/16] btrfs: add proper safety check before resuming dev-replace David Sterba
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 47+ messages in thread
From: David Sterba @ 2018-04-03 18:34 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

Make the clearning visible in the callers so we can pair it with the
test_and_set part.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/ioctl.c   |  2 +-
 fs/btrfs/volumes.c | 15 ++++++++-------
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index b93dea445802..582bde5b7eda 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -4656,7 +4656,7 @@ static long btrfs_ioctl_balance(struct file *file, void __user *arg)
 	 * Ownership of bctl and filesystem flag BTRFS_FS_EXCL_OP
 	 * goes to to btrfs_balance.  bctl is freed in __cancel_balance,
 	 * or, if restriper was paused all the way until unmount, in
-	 * free_fs_info.  The flag is cleared in __cancel_balance.
+	 * free_fs_info.  The flag should be cleared after __cancel_balance.
 	 */
 	need_unlock = false;
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index e0bd181dc9e0..eb78c1d0ce2b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3760,8 +3760,6 @@ static void __cancel_balance(struct btrfs_fs_info *fs_info)
 	ret = del_balance_item(fs_info);
 	if (ret)
 		btrfs_handle_fs_error(fs_info, ret, NULL);
-
-	clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
 }
 
 /* Non-zero return value signifies invalidity */
@@ -3919,6 +3917,7 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
 	if ((ret && ret != -ECANCELED && ret != -ENOSPC) ||
 	    balance_need_close(fs_info)) {
 		__cancel_balance(fs_info);
+		clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
 	}
 
 	wake_up(&fs_info->balance_wait_q);
@@ -3926,11 +3925,11 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
 	return ret;
 out:
 	if (bctl->flags & BTRFS_BALANCE_RESUME)
-		__cancel_balance(fs_info);
-	else {
+		reset_balance_state(fs_info);
+	else
 		kfree(bctl);
-		clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
-	}
+	clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
+
 	return ret;
 }
 
@@ -4089,8 +4088,10 @@ int btrfs_cancel_balance(struct btrfs_fs_info *fs_info)
 		mutex_lock(&fs_info->volume_mutex);
 		mutex_lock(&fs_info->balance_mutex);
 
-		if (fs_info->balance_ctl)
+		if (fs_info->balance_ctl) {
 			__cancel_balance(fs_info);
+			clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
+		}
 
 		mutex_unlock(&fs_info->volume_mutex);
 	}
-- 
2.16.2


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 07/16] btrfs: add proper safety check before resuming dev-replace
  2018-04-03 18:34 [PATCH 00/16 v1] Kill fs_info::volume_mutex David Sterba
                   ` (5 preceding siblings ...)
  2018-04-03 18:34 ` [PATCH 06/16] btrfs: move clearing of EXCL_OP out of __cancel_balance David Sterba
@ 2018-04-03 18:34 ` David Sterba
  2018-04-06 20:06   ` Sasha Levin
  2018-04-07  6:42   ` Anand Jain
  2018-04-03 18:34 ` [PATCH 08/16] btrfs: add sanity check when resuming balance after mount David Sterba
                   ` (9 subsequent siblings)
  16 siblings, 2 replies; 47+ messages in thread
From: David Sterba @ 2018-04-03 18:34 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

The device replace is paused by unmount or read only remount, and
resumed on next mount or write remount.

The exclusive status should be checked properly as it's a global
invariant and we must not allow 2 operations run. In this case, the
balance can be also paused and resumed under same conditions. It's
always checked first so dev-replace could see the EXCL_OP already taken,
BUT, the ioctl would never let start both at the same time.

Replace the WARN_ON with message and return 0, indicating no error as
this is purely theoretical and the user will be informed. Resolving that
manually should be possible by waiting for the other operation to finish
or cancel the paused state.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/dev-replace.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 7a87ffad041e..346bd460f8e7 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -908,7 +908,17 @@ int btrfs_resume_dev_replace_async(struct btrfs_fs_info *fs_info)
 	}
 	btrfs_dev_replace_write_unlock(dev_replace);
 
-	WARN_ON(test_and_set_bit(BTRFS_FS_EXCL_OP, &fs_info->flags));
+	/*
+	 * This could collide with a paused balance, but the exclusive op logic
+	 * should never allow both to start and pause. We don't want to allow
+	 * dev-replace to start anyway.
+	 */
+	if (test_and_set_bit(BTRFS_FS_EXCL_OP, &fs_info->flags)) {
+		btrfs_info(fs_info,
+		"cannot resume dev-replace, other exclusive operation running");
+		return 0;
+	}
+
 	task = kthread_run(btrfs_dev_replace_kthread, fs_info, "btrfs-devrepl");
 	return PTR_ERR_OR_ZERO(task);
 }
-- 
2.16.2


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 08/16] btrfs: add sanity check when resuming balance after mount
  2018-04-03 18:34 [PATCH 00/16 v1] Kill fs_info::volume_mutex David Sterba
                   ` (6 preceding siblings ...)
  2018-04-03 18:34 ` [PATCH 07/16] btrfs: add proper safety check before resuming dev-replace David Sterba
@ 2018-04-03 18:34 ` David Sterba
  2018-04-06 20:06   ` Sasha Levin
                     ` (2 more replies)
  2018-04-03 18:34 ` [PATCH 09/16] btrfs: cleanup helpers that reset balance state David Sterba
                   ` (8 subsequent siblings)
  16 siblings, 3 replies; 47+ messages in thread
From: David Sterba @ 2018-04-03 18:34 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

Replace a WARN_ON with a proper check and message in case something goes
really wrong and resumed balance cannot set up its exclusive status.
The check is a user friendly assertion, I don't expect to ever happen
under normal circumstances.

Also document that the paused balance starts here and owns the exclusive
op status.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/volumes.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index eb78c1d0ce2b..843982a2cbdb 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3982,6 +3982,20 @@ int btrfs_recover_balance(struct btrfs_fs_info *fs_info)
 	struct btrfs_key key;
 	int ret;
 
+	/*
+	 * This should never happen, as the paused balance state is recovered
+	 * during mount without any chance of other exclusive ops to collide.
+	 * Let it fail early and do not continue mount.
+	 *
+	 * Otherwise, this gives the exclusive op status to balance and keeps
+	 * in paused state until user intervention (cancel or umount).
+	 */
+	if (test_and_set_bit(BTRFS_FS_EXCL_OP, &fs_info->flags)) {
+		btrfs_err(fs_info,
+			"cannot set exclusive op status to resume balance");
+		return -EINVAL;
+	}
+
 	path = btrfs_alloc_path();
 	if (!path)
 		return -ENOMEM;
@@ -4018,8 +4032,6 @@ int btrfs_recover_balance(struct btrfs_fs_info *fs_info)
 	btrfs_balance_sys(leaf, item, &disk_bargs);
 	btrfs_disk_balance_args_to_cpu(&bctl->sys, &disk_bargs);
 
-	WARN_ON(test_and_set_bit(BTRFS_FS_EXCL_OP, &fs_info->flags));
-
 	mutex_lock(&fs_info->volume_mutex);
 	mutex_lock(&fs_info->balance_mutex);
 
-- 
2.16.2


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 09/16] btrfs: cleanup helpers that reset balance state
  2018-04-03 18:34 [PATCH 00/16 v1] Kill fs_info::volume_mutex David Sterba
                   ` (7 preceding siblings ...)
  2018-04-03 18:34 ` [PATCH 08/16] btrfs: add sanity check when resuming balance after mount David Sterba
@ 2018-04-03 18:34 ` David Sterba
  2018-04-09  7:43   ` Anand Jain
  2018-04-03 18:34 ` [PATCH 10/16] btrfs: remove wrong use of volume_mutex from btrfs_dev_replace_start David Sterba
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 47+ messages in thread
From: David Sterba @ 2018-04-03 18:34 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

The function __cancel_balance name is confusing with the cancel
operation of balance and it really resets the state of balance back to
zero. The unset_balance_control helper is called only from one place and
simple enough to be inlined.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/ioctl.c   |  8 ++++----
 fs/btrfs/volumes.c | 27 ++++++++++++---------------
 2 files changed, 16 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 582bde5b7eda..2f172122b63d 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -4653,10 +4653,10 @@ static long btrfs_ioctl_balance(struct file *file, void __user *arg)
 
 do_balance:
 	/*
-	 * Ownership of bctl and filesystem flag BTRFS_FS_EXCL_OP
-	 * goes to to btrfs_balance.  bctl is freed in __cancel_balance,
-	 * or, if restriper was paused all the way until unmount, in
-	 * free_fs_info.  The flag should be cleared after __cancel_balance.
+	 * Ownership of bctl and filesystem flag BTRFS_FS_EXCL_OP goes to to
+	 * btrfs_balance.  bctl is freed in reset_balance_state, or, if
+	 * restriper was paused all the way until unmount, in free_fs_info.
+	 * The flag should be cleared after reset_balance_state.
 	 */
 	need_unlock = false;
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 843982a2cbdb..b073ab4c3c70 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3192,7 +3192,7 @@ static void update_balance_args(struct btrfs_balance_control *bctl)
 /*
  * Should be called with both balance and volume mutexes held to
  * serialize other volume operations (add_dev/rm_dev/resize) with
- * restriper.  Same goes for unset_balance_control.
+ * restriper.  Same goes for reset_balance_state.
  */
 static void set_balance_control(struct btrfs_balance_control *bctl)
 {
@@ -3205,9 +3205,13 @@ static void set_balance_control(struct btrfs_balance_control *bctl)
 	spin_unlock(&fs_info->balance_lock);
 }
 
-static void unset_balance_control(struct btrfs_fs_info *fs_info)
+/*
+ * Clear the balance status in fs_info and delete the balance item from disk.
+ */
+static void reset_balance_state(struct btrfs_fs_info *fs_info)
 {
 	struct btrfs_balance_control *bctl = fs_info->balance_ctl;
+	int ret;
 
 	BUG_ON(!fs_info->balance_ctl);
 
@@ -3216,6 +3220,9 @@ static void unset_balance_control(struct btrfs_fs_info *fs_info)
 	spin_unlock(&fs_info->balance_lock);
 
 	kfree(bctl);
+	ret = del_balance_item(fs_info);
+	if (ret)
+		btrfs_handle_fs_error(fs_info, ret, NULL);
 }
 
 /*
@@ -3752,16 +3759,6 @@ static inline int balance_need_close(struct btrfs_fs_info *fs_info)
 		 atomic_read(&fs_info->balance_cancel_req) == 0);
 }
 
-static void __cancel_balance(struct btrfs_fs_info *fs_info)
-{
-	int ret;
-
-	unset_balance_control(fs_info);
-	ret = del_balance_item(fs_info);
-	if (ret)
-		btrfs_handle_fs_error(fs_info, ret, NULL);
-}
-
 /* Non-zero return value signifies invalidity */
 static inline int validate_convert_profile(struct btrfs_balance_args *bctl_arg,
 		u64 allowed)
@@ -3916,7 +3913,7 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
 
 	if ((ret && ret != -ECANCELED && ret != -ENOSPC) ||
 	    balance_need_close(fs_info)) {
-		__cancel_balance(fs_info);
+		reset_balance_state(fs_info);
 		clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
 	}
 
@@ -4095,13 +4092,13 @@ int btrfs_cancel_balance(struct btrfs_fs_info *fs_info)
 			   atomic_read(&fs_info->balance_running) == 0);
 		mutex_lock(&fs_info->balance_mutex);
 	} else {
-		/* __cancel_balance needs volume_mutex */
+		/* reset_balance_state needs volume_mutex */
 		mutex_unlock(&fs_info->balance_mutex);
 		mutex_lock(&fs_info->volume_mutex);
 		mutex_lock(&fs_info->balance_mutex);
 
 		if (fs_info->balance_ctl) {
-			__cancel_balance(fs_info);
+			reset_balance_state(fs_info);
 			clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
 		}
 
-- 
2.16.2


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 10/16] btrfs: remove wrong use of volume_mutex from btrfs_dev_replace_start
  2018-04-03 18:34 [PATCH 00/16 v1] Kill fs_info::volume_mutex David Sterba
                   ` (8 preceding siblings ...)
  2018-04-03 18:34 ` [PATCH 09/16] btrfs: cleanup helpers that reset balance state David Sterba
@ 2018-04-03 18:34 ` David Sterba
  2018-04-09  8:39   ` Anand Jain
  2018-04-09 14:55   ` Anand Jain
  2018-04-03 18:34 ` [PATCH 11/16] btrfs: kill btrfs_fs_info::volume_mutex David Sterba
                   ` (6 subsequent siblings)
  16 siblings, 2 replies; 47+ messages in thread
From: David Sterba @ 2018-04-03 18:34 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

The volume mutex does not protect against anything in this case, the
comment about scrub is right but not related to locking and looks
confusing. The comment in btrfs_find_device_missing_or_by_path is wrong
and confusing too.

The device_list_mutex is not held here to protect device lookup, but in
this case device replace cannot run in parallel with device removal (due
to exclusive op protection), so we don't need further locking here.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/dev-replace.c | 7 +------
 fs/btrfs/volumes.c     | 4 ----
 2 files changed, 1 insertion(+), 10 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 346bd460f8e7..ba011af9b0d2 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -426,18 +426,13 @@ int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
 	struct btrfs_device *tgt_device = NULL;
 	struct btrfs_device *src_device = NULL;
 
-	/* the disk copy procedure reuses the scrub code */
-	mutex_lock(&fs_info->volume_mutex);
 	ret = btrfs_find_device_by_devspec(fs_info, srcdevid,
 					    srcdev_name, &src_device);
-	if (ret) {
-		mutex_unlock(&fs_info->volume_mutex);
+	if (ret)
 		return ret;
-	}
 
 	ret = btrfs_init_dev_replace_tgtdev(fs_info, tgtdev_name,
 					    src_device, &tgt_device);
-	mutex_unlock(&fs_info->volume_mutex);
 	if (ret)
 		return ret;
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index b073ab4c3c70..0ae29cd69363 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2198,10 +2198,6 @@ int btrfs_find_device_missing_or_by_path(struct btrfs_fs_info *fs_info,
 		struct btrfs_device *tmp;
 
 		devices = &fs_info->fs_devices->devices;
-		/*
-		 * It is safe to read the devices since the volume_mutex
-		 * is held by the caller.
-		 */
 		list_for_each_entry(tmp, devices, dev_list) {
 			if (test_bit(BTRFS_DEV_STATE_IN_FS_METADATA,
 					&tmp->dev_state) && !tmp->bdev) {
-- 
2.16.2


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 11/16] btrfs: kill btrfs_fs_info::volume_mutex
  2018-04-03 18:34 [PATCH 00/16 v1] Kill fs_info::volume_mutex David Sterba
                   ` (9 preceding siblings ...)
  2018-04-03 18:34 ` [PATCH 10/16] btrfs: remove wrong use of volume_mutex from btrfs_dev_replace_start David Sterba
@ 2018-04-03 18:34 ` David Sterba
  2018-04-09 14:52   ` Anand Jain
  2018-04-13  5:30   ` Anand Jain
  2018-04-03 18:34 ` [PATCH 12/16] btrfs: track running balance in a simpler way David Sterba
                   ` (5 subsequent siblings)
  16 siblings, 2 replies; 47+ messages in thread
From: David Sterba @ 2018-04-03 18:34 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

Mutual exclusion of device add/rm and balance was done by the volume
mutex up to version 3.7. The commit 5ac00addc7ac091109 ("Btrfs: disallow
mutually exclusive admin operations from user mode") added a bit that
essentially tracked the same information.

The status bit has an advantage over a mutex that it can be set without
restrictions of function context, so it started to be used in the
mount-time resuming of balance or device replace.

But we don't really need to track the same information in two ways.

1) After the previous cleanups, the main ioctl handlers for
   add/del/resize copy the EXCL_OP bit next to the volume mutex, here
   it's clearly safe.

2) Resuming balance during mount or after rw remount will set only the
   EXCL_OP bit and the volume_mutex is held in the kernel thread that
   calls btrfs_balance.

3) Resuming device replace during mount or after rw remount is done
   after balance and is excluded by the EXCL_OP bit. It does not take
   the volume_mutex at all and completely relies on the EXCL_OP bit.

4) The resuming of balance and dev-replace cannot hapen at the same time
   as the ioctls cannot be started in parallel. Nevertheless, a crafted
   image could trigger that and a warning is printed.

5) Balance is normally excluded by EXCL_OP and also uses own mutex to
   protect against concurrent access to its status data. There's some
   trickery to maintain the right lock nesting in case we need to
   reexamine the status in btrfs_ioctl_balance. The volume_mutex is
   removed and the unlock/lock sequence is left in place as we might
   expect other waiters to proceed.

6) Similar to 5, the unlock/lock sequence is kept in
   btrfs_cancel_balance to allow waiters to continue.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/ctree.h       |  1 -
 fs/btrfs/disk-io.c     |  1 -
 fs/btrfs/extent-tree.c |  2 +-
 fs/btrfs/ioctl.c       | 17 ++++-------------
 fs/btrfs/volumes.c     | 36 ++++++++++--------------------------
 5 files changed, 15 insertions(+), 42 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 0eb55825862a..011cb9a8a967 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -826,7 +826,6 @@ struct btrfs_fs_info {
 	struct mutex transaction_kthread_mutex;
 	struct mutex cleaner_mutex;
 	struct mutex chunk_mutex;
-	struct mutex volume_mutex;
 
 	/*
 	 * this is taken to make sure we don't set block groups ro after
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 07b5e6f7df67..c0482ecea11f 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2605,7 +2605,6 @@ int open_ctree(struct super_block *sb,
 	mutex_init(&fs_info->chunk_mutex);
 	mutex_init(&fs_info->transaction_kthread_mutex);
 	mutex_init(&fs_info->cleaner_mutex);
-	mutex_init(&fs_info->volume_mutex);
 	mutex_init(&fs_info->ro_block_group_mutex);
 	init_rwsem(&fs_info->commit_root_sem);
 	init_rwsem(&fs_info->cleanup_work_sem);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index f30548d7e0d2..90d28a3727c6 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4122,7 +4122,7 @@ static void set_avail_alloc_bits(struct btrfs_fs_info *fs_info, u64 flags)
  * returns target flags in extended format or 0 if restripe for this
  * chunk_type is not in progress
  *
- * should be called with either volume_mutex or balance_lock held
+ * should be called with balance_lock held
  */
 static u64 get_restripe_target(struct btrfs_fs_info *fs_info, u64 flags)
 {
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 2f172122b63d..410037281f2a 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1470,7 +1470,6 @@ static noinline int btrfs_ioctl_resize(struct file *file,
 		return BTRFS_ERROR_DEV_EXCL_RUN_IN_PROGRESS;
 	}
 
-	mutex_lock(&fs_info->volume_mutex);
 	vol_args = memdup_user(arg, sizeof(*vol_args));
 	if (IS_ERR(vol_args)) {
 		ret = PTR_ERR(vol_args);
@@ -1578,7 +1577,6 @@ static noinline int btrfs_ioctl_resize(struct file *file,
 out_free:
 	kfree(vol_args);
 out:
-	mutex_unlock(&fs_info->volume_mutex);
 	clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
 	mnt_drop_write_file(file);
 	return ret;
@@ -2626,7 +2624,6 @@ static long btrfs_ioctl_add_dev(struct btrfs_fs_info *fs_info, void __user *arg)
 	if (test_and_set_bit(BTRFS_FS_EXCL_OP, &fs_info->flags))
 		return BTRFS_ERROR_DEV_EXCL_RUN_IN_PROGRESS;
 
-	mutex_lock(&fs_info->volume_mutex);
 	vol_args = memdup_user(arg, sizeof(*vol_args));
 	if (IS_ERR(vol_args)) {
 		ret = PTR_ERR(vol_args);
@@ -2641,7 +2638,6 @@ static long btrfs_ioctl_add_dev(struct btrfs_fs_info *fs_info, void __user *arg)
 
 	kfree(vol_args);
 out:
-	mutex_unlock(&fs_info->volume_mutex);
 	clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
 	return ret;
 }
@@ -2674,7 +2670,6 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
 		ret = BTRFS_ERROR_DEV_EXCL_RUN_IN_PROGRESS;
 		goto out;
 	}
-	mutex_lock(&fs_info->volume_mutex);
 
 	if (vol_args->flags & BTRFS_DEVICE_SPEC_BY_ID) {
 		ret = btrfs_rm_device(fs_info, NULL, vol_args->devid);
@@ -2682,7 +2677,6 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
 		vol_args->name[BTRFS_SUBVOL_NAME_MAX] = '\0';
 		ret = btrfs_rm_device(fs_info, vol_args->name, 0);
 	}
-	mutex_unlock(&fs_info->volume_mutex);
 	clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
 
 	if (!ret) {
@@ -2718,7 +2712,6 @@ static long btrfs_ioctl_rm_dev(struct file *file, void __user *arg)
 		ret = BTRFS_ERROR_DEV_EXCL_RUN_IN_PROGRESS;
 		goto out_drop_write;
 	}
-	mutex_lock(&fs_info->volume_mutex);
 
 	vol_args = memdup_user(arg, sizeof(*vol_args));
 	if (IS_ERR(vol_args)) {
@@ -2733,7 +2726,6 @@ static long btrfs_ioctl_rm_dev(struct file *file, void __user *arg)
 		btrfs_info(fs_info, "disk deleted %s", vol_args->name);
 	kfree(vol_args);
 out:
-	mutex_unlock(&fs_info->volume_mutex);
 	clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
 out_drop_write:
 	mnt_drop_write_file(file);
@@ -4552,7 +4544,6 @@ static long btrfs_ioctl_balance(struct file *file, void __user *arg)
 
 again:
 	if (!test_and_set_bit(BTRFS_FS_EXCL_OP, &fs_info->flags)) {
-		mutex_lock(&fs_info->volume_mutex);
 		mutex_lock(&fs_info->balance_mutex);
 		need_unlock = true;
 		goto locked;
@@ -4569,8 +4560,10 @@ static long btrfs_ioctl_balance(struct file *file, void __user *arg)
 		/* this is either (2) or (3) */
 		if (!atomic_read(&fs_info->balance_running)) {
 			mutex_unlock(&fs_info->balance_mutex);
-			if (!mutex_trylock(&fs_info->volume_mutex))
-				goto again;
+			/*
+			 * Lock released to allow other waiters to continue,
+			 * we'll reexamine the status again.
+			 */
 			mutex_lock(&fs_info->balance_mutex);
 
 			if (fs_info->balance_ctl &&
@@ -4581,7 +4574,6 @@ static long btrfs_ioctl_balance(struct file *file, void __user *arg)
 			}
 
 			mutex_unlock(&fs_info->balance_mutex);
-			mutex_unlock(&fs_info->volume_mutex);
 			goto again;
 		} else {
 			/* this is (2) */
@@ -4674,7 +4666,6 @@ static long btrfs_ioctl_balance(struct file *file, void __user *arg)
 	kfree(bargs);
 out_unlock:
 	mutex_unlock(&fs_info->balance_mutex);
-	mutex_unlock(&fs_info->volume_mutex);
 	if (need_unlock)
 		clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
 out:
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 0ae29cd69363..d1e3486343ae 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -179,12 +179,6 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
  * may be used to exclude some operations from running concurrently without any
  * modifications to the list (see write_all_supers)
  *
- * volume_mutex
- * ------------
- * coarse lock owned by a mounted filesystem; used to exclude some operations
- * that cannot run in parallel and affect the higher-level properties of the
- * filesystem like: device add/deleting/resize/replace, or balance
- *
  * balance_mutex
  * -------------
  * protects balance structures (status, state) and context accessed from
@@ -205,10 +199,9 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
  * ============
  *
  * uuid_mutex
- *   volume_mutex
- *     device_list_mutex
- *       chunk_mutex
- *     balance_mutex
+ *   device_list_mutex
+ *     chunk_mutex
+ *   balance_mutex
  */
 
 DEFINE_MUTEX(uuid_mutex);
@@ -3186,9 +3179,8 @@ static void update_balance_args(struct btrfs_balance_control *bctl)
 }
 
 /*
- * Should be called with both balance and volume mutexes held to
- * serialize other volume operations (add_dev/rm_dev/resize) with
- * restriper.  Same goes for reset_balance_state.
+ * Should be called with balance mutex held to protect against checking the
+ * balance status or progress. Same goes for reset_balance_state.
  */
 static void set_balance_control(struct btrfs_balance_control *bctl)
 {
@@ -3765,7 +3757,7 @@ static inline int validate_convert_profile(struct btrfs_balance_args *bctl_arg,
 }
 
 /*
- * Should be called with both balance and volume mutexes held
+ * Should be called with balance mutexe held
  */
 int btrfs_balance(struct btrfs_balance_control *bctl,
 		  struct btrfs_ioctl_balance_args *bargs)
@@ -3931,16 +3923,12 @@ static int balance_kthread(void *data)
 	struct btrfs_fs_info *fs_info = data;
 	int ret = 0;
 
-	mutex_lock(&fs_info->volume_mutex);
 	mutex_lock(&fs_info->balance_mutex);
-
 	if (fs_info->balance_ctl) {
 		btrfs_info(fs_info, "continuing balance");
 		ret = btrfs_balance(fs_info->balance_ctl, NULL);
 	}
-
 	mutex_unlock(&fs_info->balance_mutex);
-	mutex_unlock(&fs_info->volume_mutex);
 
 	return ret;
 }
@@ -4025,13 +4013,9 @@ int btrfs_recover_balance(struct btrfs_fs_info *fs_info)
 	btrfs_balance_sys(leaf, item, &disk_bargs);
 	btrfs_disk_balance_args_to_cpu(&bctl->sys, &disk_bargs);
 
-	mutex_lock(&fs_info->volume_mutex);
 	mutex_lock(&fs_info->balance_mutex);
-
 	set_balance_control(bctl);
-
 	mutex_unlock(&fs_info->balance_mutex);
-	mutex_unlock(&fs_info->volume_mutex);
 out:
 	btrfs_free_path(path);
 	return ret;
@@ -4088,17 +4072,17 @@ int btrfs_cancel_balance(struct btrfs_fs_info *fs_info)
 			   atomic_read(&fs_info->balance_running) == 0);
 		mutex_lock(&fs_info->balance_mutex);
 	} else {
-		/* reset_balance_state needs volume_mutex */
 		mutex_unlock(&fs_info->balance_mutex);
-		mutex_lock(&fs_info->volume_mutex);
+		/*
+		 * Lock released to allow other waiters to continue, we'll
+		 * reexamine the status again.
+		 */
 		mutex_lock(&fs_info->balance_mutex);
 
 		if (fs_info->balance_ctl) {
 			reset_balance_state(fs_info);
 			clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
 		}
-
-		mutex_unlock(&fs_info->volume_mutex);
 	}
 
 	BUG_ON(fs_info->balance_ctl || atomic_read(&fs_info->balance_running));
-- 
2.16.2


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 12/16] btrfs: track running balance in a simpler way
  2018-04-03 18:34 [PATCH 00/16 v1] Kill fs_info::volume_mutex David Sterba
                   ` (10 preceding siblings ...)
  2018-04-03 18:34 ` [PATCH 11/16] btrfs: kill btrfs_fs_info::volume_mutex David Sterba
@ 2018-04-03 18:34 ` David Sterba
  2018-04-03 18:34 ` [PATCH 13/16] btrfs: remove redundant read-only check from btrfs_cancel_balance David Sterba
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 47+ messages in thread
From: David Sterba @ 2018-04-03 18:34 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

Currently fs_info::balance_running is 0 or 1 and does not use the
semantics of atomics. The pause and cancel check for 0, that can happen
only after __btrfs_balance exits for whatever reason.

Parallel calls to balance ioctl may enter btrfs_ioctl_balance multiple
times but will block on the balance_mutex that protects the
fs_info::flags bit.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/ctree.h   |  6 +++++-
 fs/btrfs/disk-io.c |  1 -
 fs/btrfs/ioctl.c   |  6 +++---
 fs/btrfs/volumes.c | 18 ++++++++++--------
 4 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 011cb9a8a967..fda94a264eb7 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -726,6 +726,11 @@ struct btrfs_delayed_root;
  * (device replace, resize, device add/delete, balance)
  */
 #define BTRFS_FS_EXCL_OP			16
+/*
+ * Indicate that balance has been set up from the ioctl and is in the main
+ * phase. The fs_info::balance_ctl is initialized.
+ */
+#define BTRFS_FS_BALANCE_RUNNING		17
 
 struct btrfs_fs_info {
 	u8 fsid[BTRFS_FSID_SIZE];
@@ -991,7 +996,6 @@ struct btrfs_fs_info {
 	/* restriper state */
 	spinlock_t balance_lock;
 	struct mutex balance_mutex;
-	atomic_t balance_running;
 	atomic_t balance_pause_req;
 	atomic_t balance_cancel_req;
 	struct btrfs_balance_control *balance_ctl;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index c0482ecea11f..b62559dfb053 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2168,7 +2168,6 @@ static void btrfs_init_balance(struct btrfs_fs_info *fs_info)
 {
 	spin_lock_init(&fs_info->balance_lock);
 	mutex_init(&fs_info->balance_mutex);
-	atomic_set(&fs_info->balance_running, 0);
 	atomic_set(&fs_info->balance_pause_req, 0);
 	atomic_set(&fs_info->balance_cancel_req, 0);
 	fs_info->balance_ctl = NULL;
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 410037281f2a..e6f21f30416a 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -4506,7 +4506,7 @@ void update_ioctl_balance_args(struct btrfs_fs_info *fs_info, int lock,
 
 	bargs->flags = bctl->flags;
 
-	if (atomic_read(&fs_info->balance_running))
+	if (test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags))
 		bargs->state |= BTRFS_BALANCE_STATE_RUNNING;
 	if (atomic_read(&fs_info->balance_pause_req))
 		bargs->state |= BTRFS_BALANCE_STATE_PAUSE_REQ;
@@ -4558,7 +4558,7 @@ static long btrfs_ioctl_balance(struct file *file, void __user *arg)
 	mutex_lock(&fs_info->balance_mutex);
 	if (fs_info->balance_ctl) {
 		/* this is either (2) or (3) */
-		if (!atomic_read(&fs_info->balance_running)) {
+		if (!test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags)) {
 			mutex_unlock(&fs_info->balance_mutex);
 			/*
 			 * Lock released to allow other waiters to continue,
@@ -4567,7 +4567,7 @@ static long btrfs_ioctl_balance(struct file *file, void __user *arg)
 			mutex_lock(&fs_info->balance_mutex);
 
 			if (fs_info->balance_ctl &&
-			    !atomic_read(&fs_info->balance_running)) {
+			    !test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags)) {
 				/* this is (3) */
 				need_unlock = false;
 				goto locked;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index d1e3486343ae..a0420af1fad9 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3886,13 +3886,14 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
 		spin_unlock(&fs_info->balance_lock);
 	}
 
-	atomic_inc(&fs_info->balance_running);
+	ASSERT(!test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags));
+	set_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags);
 	mutex_unlock(&fs_info->balance_mutex);
 
 	ret = __btrfs_balance(fs_info);
 
 	mutex_lock(&fs_info->balance_mutex);
-	atomic_dec(&fs_info->balance_running);
+	clear_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags);
 
 	if (bargs) {
 		memset(bargs, 0, sizeof(*bargs));
@@ -4031,16 +4032,16 @@ int btrfs_pause_balance(struct btrfs_fs_info *fs_info)
 		return -ENOTCONN;
 	}
 
-	if (atomic_read(&fs_info->balance_running)) {
+	if (test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags)) {
 		atomic_inc(&fs_info->balance_pause_req);
 		mutex_unlock(&fs_info->balance_mutex);
 
 		wait_event(fs_info->balance_wait_q,
-			   atomic_read(&fs_info->balance_running) == 0);
+			   !test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags));
 
 		mutex_lock(&fs_info->balance_mutex);
 		/* we are good with balance_ctl ripped off from under us */
-		BUG_ON(atomic_read(&fs_info->balance_running));
+		BUG_ON(test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags));
 		atomic_dec(&fs_info->balance_pause_req);
 	} else {
 		ret = -ENOTCONN;
@@ -4066,10 +4067,10 @@ int btrfs_cancel_balance(struct btrfs_fs_info *fs_info)
 	 * if we are running just wait and return, balance item is
 	 * deleted in btrfs_balance in this case
 	 */
-	if (atomic_read(&fs_info->balance_running)) {
+	if (test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags)) {
 		mutex_unlock(&fs_info->balance_mutex);
 		wait_event(fs_info->balance_wait_q,
-			   atomic_read(&fs_info->balance_running) == 0);
+			   !test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags));
 		mutex_lock(&fs_info->balance_mutex);
 	} else {
 		mutex_unlock(&fs_info->balance_mutex);
@@ -4085,7 +4086,8 @@ int btrfs_cancel_balance(struct btrfs_fs_info *fs_info)
 		}
 	}
 
-	BUG_ON(fs_info->balance_ctl || atomic_read(&fs_info->balance_running));
+	BUG_ON(fs_info->balance_ctl ||
+		test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags));
 	atomic_dec(&fs_info->balance_cancel_req);
 	mutex_unlock(&fs_info->balance_mutex);
 	return 0;
-- 
2.16.2


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 13/16] btrfs: remove redundant read-only check from btrfs_cancel_balance
  2018-04-03 18:34 [PATCH 00/16 v1] Kill fs_info::volume_mutex David Sterba
                   ` (11 preceding siblings ...)
  2018-04-03 18:34 ` [PATCH 12/16] btrfs: track running balance in a simpler way David Sterba
@ 2018-04-03 18:34 ` David Sterba
  2018-04-16  9:43   ` Anand Jain
  2018-04-03 18:34 ` [PATCH 14/16] btrfs: drop lock parameter from update_ioctl_balance_args and rename David Sterba
                   ` (3 subsequent siblings)
  16 siblings, 1 reply; 47+ messages in thread
From: David Sterba @ 2018-04-03 18:34 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

Balance cannot be started on a read-only filesystem and will have to
finish/exit before eg. going to read-only via remount. Cancelling does
not need to check for that.

In case the filesystem is forcibly set to read-only after an error,
balance will finish anyway and if the cancel call is too fast it will
just wait for that to happen. Again does not have to check.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/volumes.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a0420af1fad9..2956e7b4cb9f 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4053,9 +4053,6 @@ int btrfs_pause_balance(struct btrfs_fs_info *fs_info)
 
 int btrfs_cancel_balance(struct btrfs_fs_info *fs_info)
 {
-	if (sb_rdonly(fs_info->sb))
-		return -EROFS;
-
 	mutex_lock(&fs_info->balance_mutex);
 	if (!fs_info->balance_ctl) {
 		mutex_unlock(&fs_info->balance_mutex);
-- 
2.16.2


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 14/16] btrfs: drop lock parameter from update_ioctl_balance_args and rename
  2018-04-03 18:34 [PATCH 00/16 v1] Kill fs_info::volume_mutex David Sterba
                   ` (12 preceding siblings ...)
  2018-04-03 18:34 ` [PATCH 13/16] btrfs: remove redundant read-only check from btrfs_cancel_balance David Sterba
@ 2018-04-03 18:34 ` David Sterba
  2018-04-03 18:34 ` [PATCH 15/16] btrfs: use mutex in btrfs_resume_balance_async David Sterba
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 47+ messages in thread
From: David Sterba @ 2018-04-03 18:34 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

The parameter controls locking of the stats part but we can lock it
unconditionally, as this only happens once when balance starts. This is
not performance critical.

Add the prefix for an exported function.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/ctree.h   |  2 +-
 fs/btrfs/ioctl.c   | 14 +++++---------
 fs/btrfs/volumes.c |  2 +-
 3 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index fda94a264eb7..7ac2e69bfb03 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3258,7 +3258,7 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
 		      u64 newer_than, unsigned long max_pages);
 void btrfs_get_block_group_info(struct list_head *groups_list,
 				struct btrfs_ioctl_space_info *space);
-void update_ioctl_balance_args(struct btrfs_fs_info *fs_info, int lock,
+void btrfs_update_ioctl_balance_args(struct btrfs_fs_info *fs_info,
 			       struct btrfs_ioctl_balance_args *bargs);
 ssize_t btrfs_dedupe_file_range(struct file *src_file, u64 loff, u64 olen,
 			   struct file *dst_file, u64 dst_loff);
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index e6f21f30416a..45759f1ea438 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -4499,7 +4499,7 @@ static long btrfs_ioctl_logical_to_ino(struct btrfs_fs_info *fs_info,
 	return ret;
 }
 
-void update_ioctl_balance_args(struct btrfs_fs_info *fs_info, int lock,
+void btrfs_update_ioctl_balance_args(struct btrfs_fs_info *fs_info,
 			       struct btrfs_ioctl_balance_args *bargs)
 {
 	struct btrfs_balance_control *bctl = fs_info->balance_ctl;
@@ -4517,13 +4517,9 @@ void update_ioctl_balance_args(struct btrfs_fs_info *fs_info, int lock,
 	memcpy(&bargs->meta, &bctl->meta, sizeof(bargs->meta));
 	memcpy(&bargs->sys, &bctl->sys, sizeof(bargs->sys));
 
-	if (lock) {
-		spin_lock(&fs_info->balance_lock);
-		memcpy(&bargs->stat, &bctl->stat, sizeof(bargs->stat));
-		spin_unlock(&fs_info->balance_lock);
-	} else {
-		memcpy(&bargs->stat, &bctl->stat, sizeof(bargs->stat));
-	}
+	spin_lock(&fs_info->balance_lock);
+	memcpy(&bargs->stat, &bctl->stat, sizeof(bargs->stat));
+	spin_unlock(&fs_info->balance_lock);
 }
 
 static long btrfs_ioctl_balance(struct file *file, void __user *arg)
@@ -4709,7 +4705,7 @@ static long btrfs_ioctl_balance_progress(struct btrfs_fs_info *fs_info,
 		goto out;
 	}
 
-	update_ioctl_balance_args(fs_info, 1, bargs);
+	btrfs_update_ioctl_balance_args(fs_info, bargs);
 
 	if (copy_to_user(arg, bargs, sizeof(*bargs)))
 		ret = -EFAULT;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 2956e7b4cb9f..4b7d54abfa34 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3897,7 +3897,7 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
 
 	if (bargs) {
 		memset(bargs, 0, sizeof(*bargs));
-		update_ioctl_balance_args(fs_info, 0, bargs);
+		btrfs_update_ioctl_balance_args(fs_info, bargs);
 	}
 
 	if ((ret && ret != -ECANCELED && ret != -ENOSPC) ||
-- 
2.16.2


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 15/16] btrfs: use mutex in btrfs_resume_balance_async
  2018-04-03 18:34 [PATCH 00/16 v1] Kill fs_info::volume_mutex David Sterba
                   ` (13 preceding siblings ...)
  2018-04-03 18:34 ` [PATCH 14/16] btrfs: drop lock parameter from update_ioctl_balance_args and rename David Sterba
@ 2018-04-03 18:34 ` David Sterba
  2018-04-03 18:34 ` [PATCH 16/16] btrfs: open code set_balance_control David Sterba
  2018-04-05 14:31 ` [PATCH 00/16 v1] Kill fs_info::volume_mutex David Sterba
  16 siblings, 0 replies; 47+ messages in thread
From: David Sterba @ 2018-04-03 18:34 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

While the spinlock does not cause problems, using the mutex is more
correct and consistent with others. The global status of balance is eg.
checked from btrfs_pause_balance or btrfs_cancel_balance with mutex.

Resuming balance happens during mount or ro->rw remount. In the former
case, no other user of the balance_ctl exists, in the latter, balance
cannot run until the ro/rw transition is finished.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/volumes.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 4b7d54abfa34..e93c7ba44d06 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3938,12 +3938,12 @@ int btrfs_resume_balance_async(struct btrfs_fs_info *fs_info)
 {
 	struct task_struct *tsk;
 
-	spin_lock(&fs_info->balance_lock);
+	mutex_lock(&fs_info->balance_mutex);
 	if (!fs_info->balance_ctl) {
-		spin_unlock(&fs_info->balance_lock);
+		mutex_unlock(&fs_info->balance_mutex);
 		return 0;
 	}
-	spin_unlock(&fs_info->balance_lock);
+	mutex_unlock(&fs_info->balance_mutex);
 
 	if (btrfs_test_opt(fs_info, SKIP_BALANCE)) {
 		btrfs_info(fs_info, "force skipping balance");
-- 
2.16.2


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 16/16] btrfs: open code set_balance_control
  2018-04-03 18:34 [PATCH 00/16 v1] Kill fs_info::volume_mutex David Sterba
                   ` (14 preceding siblings ...)
  2018-04-03 18:34 ` [PATCH 15/16] btrfs: use mutex in btrfs_resume_balance_async David Sterba
@ 2018-04-03 18:34 ` David Sterba
  2018-04-09 15:24   ` Anand Jain
  2018-04-05 14:31 ` [PATCH 00/16 v1] Kill fs_info::volume_mutex David Sterba
  16 siblings, 1 reply; 47+ messages in thread
From: David Sterba @ 2018-04-03 18:34 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

The helper is quite simple and I'd like to see the locking in the
caller.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/volumes.c | 20 ++++----------------
 1 file changed, 4 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index e93c7ba44d06..db9f72c6af85 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3178,21 +3178,6 @@ static void update_balance_args(struct btrfs_balance_control *bctl)
 	}
 }
 
-/*
- * Should be called with balance mutex held to protect against checking the
- * balance status or progress. Same goes for reset_balance_state.
- */
-static void set_balance_control(struct btrfs_balance_control *bctl)
-{
-	struct btrfs_fs_info *fs_info = bctl->fs_info;
-
-	BUG_ON(fs_info->balance_ctl);
-
-	spin_lock(&fs_info->balance_lock);
-	fs_info->balance_ctl = bctl;
-	spin_unlock(&fs_info->balance_lock);
-}
-
 /*
  * Clear the balance status in fs_info and delete the balance item from disk.
  */
@@ -3878,7 +3863,10 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
 
 	if (!(bctl->flags & BTRFS_BALANCE_RESUME)) {
 		BUG_ON(ret == -EEXIST);
-		set_balance_control(bctl);
+		BUG_ON(fs_info->balance_ctl);
+		spin_lock(&fs_info->balance_lock);
+		fs_info->balance_ctl = bctl;
+		spin_unlock(&fs_info->balance_lock);
 	} else {
 		BUG_ON(ret != -EEXIST);
 		spin_lock(&fs_info->balance_lock);
-- 
2.16.2


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH 01/16] btrfs: squeeze btrfs_dev_replace_continue_on_mount to its caller
  2018-04-03 18:34 ` [PATCH 01/16] btrfs: squeeze btrfs_dev_replace_continue_on_mount to its caller David Sterba
@ 2018-04-05  9:40   ` Anand Jain
  0 siblings, 0 replies; 47+ messages in thread
From: Anand Jain @ 2018-04-05  9:40 UTC (permalink / raw)
  To: David Sterba, linux-btrfs



On 04/04/2018 02:34 AM, David Sterba wrote:
> The function is called once and is fairly small, we can merge it with
> the caller.
> 
> Signed-off-by: David Sterba <dsterba@suse.com>

Reviewed-by: Anand Jain <anand.jain@oracle.com>

Thanks, Anand

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 02/16] btrfs: make success path out of btrfs_init_dev_replace_tgtdev more clear
  2018-04-03 18:34 ` [PATCH 02/16] btrfs: make success path out of btrfs_init_dev_replace_tgtdev more clear David Sterba
@ 2018-04-05  9:40   ` Anand Jain
  0 siblings, 0 replies; 47+ messages in thread
From: Anand Jain @ 2018-04-05  9:40 UTC (permalink / raw)
  To: David Sterba, linux-btrfs



On 04/04/2018 02:34 AM, David Sterba wrote:
> This is a preparatory cleanup that will make clear that the only
> successful way out of btrfs_init_dev_replace_tgtdev will also set the
> device_out to a valid pointer. With this guarantee, the callers can be
> simplified.
> 
> Signed-off-by: David Sterba <dsterba@suse.com>

Reviewed-by: Anand Jain <anand.jain@oracle.com>

Thanks, Anand

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 03/16] btrfs: export and rename free_device
  2018-04-03 18:34 ` [PATCH 03/16] btrfs: export and rename free_device David Sterba
@ 2018-04-05  9:41   ` Anand Jain
  0 siblings, 0 replies; 47+ messages in thread
From: Anand Jain @ 2018-04-05  9:41 UTC (permalink / raw)
  To: David Sterba, linux-btrfs



On 04/04/2018 02:34 AM, David Sterba wrote:
> The function will be used outside of volumes.c, the allocation
> btrfs_alloc_device is also exported.
> 
> Signed-off-by: David Sterba <dsterba@suse.com>

Reviewed-by: Anand Jain <anand.jain@oracle.com>

Thanks, Anand

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 04/16] btrfs: move btrfs_init_dev_replace_tgtdev to dev-replace.c and make static
  2018-04-03 18:34 ` [PATCH 04/16] btrfs: move btrfs_init_dev_replace_tgtdev to dev-replace.c and make static David Sterba
@ 2018-04-05  9:41   ` Anand Jain
  0 siblings, 0 replies; 47+ messages in thread
From: Anand Jain @ 2018-04-05  9:41 UTC (permalink / raw)
  To: David Sterba, linux-btrfs



On 04/04/2018 02:34 AM, David Sterba wrote:
> The function logically belongs there and there's only a single caller,
> no need to export it. No code changes.
> 
> Signed-off-by: David Sterba <dsterba@suse.com>


Reviewed-by: Anand Jain <anand.jain@oracle.com>

Thanks, Anand


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 05/16] btrfs: move volume_mutex to callers of btrfs_rm_device
  2018-04-03 18:34 ` [PATCH 05/16] btrfs: move volume_mutex to callers of btrfs_rm_device David Sterba
@ 2018-04-05  9:41   ` Anand Jain
  2018-04-05 14:28     ` David Sterba
  0 siblings, 1 reply; 47+ messages in thread
From: Anand Jain @ 2018-04-05  9:41 UTC (permalink / raw)
  To: David Sterba, linux-btrfs


> @@ -2716,6 +2718,7 @@ static long btrfs_ioctl_rm_dev(struct file *file, void __user *arg)
>   		ret = BTRFS_ERROR_DEV_EXCL_RUN_IN_PROGRESS;
>   		goto out_drop_write;
>   	}
> +	mutex_lock(&fs_info->volume_mutex);
>   
>   	vol_args = memdup_user(arg, sizeof(*vol_args));
>   	if (IS_ERR(vol_args)) { > @@ -2730,6 +2733,7 @@ static long btrfs_ioctl_rm_dev(struct file 
*file, void __user *arg)
>   		btrfs_info(fs_info, "disk deleted %s", vol_args->name);
>   	kfree(vol_args);
>   out:
> +	mutex_unlock(&fs_info->volume_mutex);
>   	clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
>   out_drop_write:
>   	mnt_drop_write_file(file);

  Why not memdup_user() be outside of volume_mutex?
  But not a big deal either.


Reviewed-by: Anand Jain <anand.jain@oracle.com>

Thanks, Anand

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 06/16] btrfs: move clearing of EXCL_OP out of __cancel_balance
  2018-04-03 18:34 ` [PATCH 06/16] btrfs: move clearing of EXCL_OP out of __cancel_balance David Sterba
@ 2018-04-05  9:42   ` Anand Jain
  2018-04-05 14:04     ` David Sterba
  0 siblings, 1 reply; 47+ messages in thread
From: Anand Jain @ 2018-04-05  9:42 UTC (permalink / raw)
  To: David Sterba, linux-btrfs



> @@ -3926,11 +3925,11 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
>   	return ret;
>   out:
>   	if (bctl->flags & BTRFS_BALANCE_RESUME)
> -		__cancel_balance(fs_info);
> -	else {
> +		reset_balance_state(fs_info);

  reset_balance_state() is something unrelated. More over compile fails 
at this patch.

----
fs/btrfs/volumes.c: In function ‘btrfs_balance’:
fs/btrfs/volumes.c:3928:3: error: implicit declaration of function 
‘reset_balance_state’; did you mean ‘insert_balance_item’? 
[-Werror=implicit-function-declaration]
    reset_balance_state(fs_info);
----

Thanks, Anand

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 06/16] btrfs: move clearing of EXCL_OP out of __cancel_balance
  2018-04-05  9:42   ` Anand Jain
@ 2018-04-05 14:04     ` David Sterba
  0 siblings, 0 replies; 47+ messages in thread
From: David Sterba @ 2018-04-05 14:04 UTC (permalink / raw)
  To: Anand Jain; +Cc: David Sterba, linux-btrfs

On Thu, Apr 05, 2018 at 05:42:14PM +0800, Anand Jain wrote:
> 
> 
> > @@ -3926,11 +3925,11 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
> >   	return ret;
> >   out:
> >   	if (bctl->flags & BTRFS_BALANCE_RESUME)
> > -		__cancel_balance(fs_info);
> > -	else {
> > +		reset_balance_state(fs_info);
> 
>   reset_balance_state() is something unrelated. More over compile fails 
> at this patch.
> 
> ----
> fs/btrfs/volumes.c: In function ‘btrfs_balance’:
> fs/btrfs/volumes.c:3928:3: error: implicit declaration of function 
> ‘reset_balance_state’; did you mean ‘insert_balance_item’? 
> [-Werror=implicit-function-declaration]
>     reset_balance_state(fs_info);

Right, it looks like a misapplied fixup for patch "btrfs: cleanup
helpers that reset balance state" that's farther in the series. Will
fix.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 05/16] btrfs: move volume_mutex to callers of btrfs_rm_device
  2018-04-05  9:41   ` Anand Jain
@ 2018-04-05 14:28     ` David Sterba
  0 siblings, 0 replies; 47+ messages in thread
From: David Sterba @ 2018-04-05 14:28 UTC (permalink / raw)
  To: Anand Jain; +Cc: David Sterba, linux-btrfs

On Thu, Apr 05, 2018 at 05:41:57PM +0800, Anand Jain wrote:
> 
> > @@ -2716,6 +2718,7 @@ static long btrfs_ioctl_rm_dev(struct file *file, void __user *arg)
> >   		ret = BTRFS_ERROR_DEV_EXCL_RUN_IN_PROGRESS;
> >   		goto out_drop_write;
> >   	}
> > +	mutex_lock(&fs_info->volume_mutex);
> >   
> >   	vol_args = memdup_user(arg, sizeof(*vol_args));
> >   	if (IS_ERR(vol_args)) { > @@ -2730,6 +2733,7 @@ static long btrfs_ioctl_rm_dev(struct file 
> *file, void __user *arg)
> >   		btrfs_info(fs_info, "disk deleted %s", vol_args->name);
> >   	kfree(vol_args);
> >   out:
> > +	mutex_unlock(&fs_info->volume_mutex);
> >   	clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
> >   out_drop_write:
> >   	mnt_drop_write_file(file);
> 
>   Why not memdup_user() be outside of volume_mutex?

The point of the patch is to put the mutex_lock right next to the
exclusive operation bit setting. It's not optimal regarding the size of
critical section and normally the memdup should be outside.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 00/16 v1] Kill fs_info::volume_mutex
  2018-04-03 18:34 [PATCH 00/16 v1] Kill fs_info::volume_mutex David Sterba
                   ` (15 preceding siblings ...)
  2018-04-03 18:34 ` [PATCH 16/16] btrfs: open code set_balance_control David Sterba
@ 2018-04-05 14:31 ` David Sterba
  16 siblings, 0 replies; 47+ messages in thread
From: David Sterba @ 2018-04-05 14:31 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-btrfs

On Tue, Apr 03, 2018 at 08:34:01PM +0200, David Sterba wrote:
> This series gets rid of the volume mutex. The fstests do not pass
> cleanly, 2 or more tests fail so this needs to be fixed, but otherwise
> majority of the work ready for review.
> 
> The merge target is 4.18 and I'll probably not get back to this pathset
> during merge window (nor add it to next), so there should be enough time
> for review.
> 
> David Sterba (16):
>   btrfs: squeeze btrfs_dev_replace_continue_on_mount to its caller
>   btrfs: make success path out of btrfs_init_dev_replace_tgtdev more
>     clear
>   btrfs: export and rename free_device
>   btrfs: move btrfs_init_dev_replace_tgtdev to dev-replace.c and make
>     static
>   btrfs: move volume_mutex to callers of btrfs_rm_device
>   btrfs: move clearing of EXCL_OP out of __cancel_balance
>   btrfs: add proper safety check before resuming dev-replace
>   btrfs: add sanity check when resuming balance after mount
>   btrfs: cleanup helpers that reset balance state
>   btrfs: remove wrong use of volume_mutex from btrfs_dev_replace_start
>   btrfs: kill btrfs_fs_info::volume_mutex
>   btrfs: track running balance in a simpler way
>   btrfs: remove redundant read-only check from btrfs_cancel_balance
>   btrfs: drop lock parameter from update_ioctl_balance_args and rename
>   btrfs: use mutex in btrfs_resume_balance_async
>   btrfs: open code set_balance_control

Branch updated in

git://github.com/kdave/btrfs-devel dev/remove-volume-mutex

the misapplied patch hunks have been sorted out and now it also compiles.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 07/16] btrfs: add proper safety check before resuming dev-replace
  2018-04-03 18:34 ` [PATCH 07/16] btrfs: add proper safety check before resuming dev-replace David Sterba
@ 2018-04-06 20:06   ` Sasha Levin
  2018-04-07  6:42   ` Anand Jain
  1 sibling, 0 replies; 47+ messages in thread
From: Sasha Levin @ 2018-04-06 20:06 UTC (permalink / raw)
  To: Sasha Levin, David Sterba, linux-btrfs; +Cc: David Sterba, stable

Hi,

[This is an automated email]

This commit has been processed by the -stable helper bot and determined
to be a high probability candidate for -stable trees. (score: 34.4419)

The bot has tested the following trees: v4.16, v4.15.15, v4.14.32, v4.9.92, v4.4.126.

v4.16: Build OK!
v4.15.15: Build OK!
v4.14.32: Build OK!
v4.9.92: Failed to apply! Possible dependencies:
    2799d90f3887 ("btrfs: add proper safety check before resuming dev-replace")

v4.4.126: Failed to apply! Possible dependencies:
    2799d90f3887 ("btrfs: add proper safety check before resuming dev-replace")


Please let us know if you'd like to have this patch included in a stable tree.

--
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 08/16] btrfs: add sanity check when resuming balance after mount
  2018-04-03 18:34 ` [PATCH 08/16] btrfs: add sanity check when resuming balance after mount David Sterba
@ 2018-04-06 20:06   ` Sasha Levin
  2018-04-09  7:23   ` Anand Jain
  2018-04-16  6:10   ` Anand Jain
  2 siblings, 0 replies; 47+ messages in thread
From: Sasha Levin @ 2018-04-06 20:06 UTC (permalink / raw)
  To: Sasha Levin, David Sterba, linux-btrfs; +Cc: David Sterba, stable

Hi,

[This is an automated email]

This commit has been processed by the -stable helper bot and determined
to be a high probability candidate for -stable trees. (score: 16.7330)

The bot has tested the following trees: v4.16, v4.15.15, v4.14.32, v4.9.92, v4.4.126.

v4.16: Build OK!
v4.15.15: Build OK!
v4.14.32: Build OK!
v4.9.92: Failed to apply! Possible dependencies:
    509cdd5c938a ("btrfs: add sanity check when resuming balance after mount")

v4.4.126: Failed to apply! Possible dependencies:
    509cdd5c938a ("btrfs: add sanity check when resuming balance after mount")


Please let us know if you'd like to have this patch included in a stable tree.

--
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 07/16] btrfs: add proper safety check before resuming dev-replace
  2018-04-03 18:34 ` [PATCH 07/16] btrfs: add proper safety check before resuming dev-replace David Sterba
  2018-04-06 20:06   ` Sasha Levin
@ 2018-04-07  6:42   ` Anand Jain
  2018-04-07 10:43     ` Anand Jain
  1 sibling, 1 reply; 47+ messages in thread
From: Anand Jain @ 2018-04-07  6:42 UTC (permalink / raw)
  To: David Sterba, linux-btrfs



On 04/04/2018 02:34 AM, David Sterba wrote:
> The device replace is paused by unmount or read only remount, and
> resumed on next mount or write remount.
> 
> The exclusive status should be checked properly as it's a global
> invariant and we must not allow 2 operations run. In this case, the
> balance can be also paused and resumed under same conditions. It's
> always checked first so dev-replace could see the EXCL_OP already taken,
> BUT, the ioctl would never let start both at the same time.
> 
> Replace the WARN_ON with message and return 0, indicating no error as
> this is purely theoretical and the user will be informed. Resolving that
> manually should be possible by waiting for the other operation to finish
> or cancel the paused state.

  So if there is a paused balance and replace is triggered, a write
  remount will reverse the process? that is balance will start and
  replace will hit EXCL_OP and thus canceled? How does it work in
  this case?

Thanks, Anand

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 07/16] btrfs: add proper safety check before resuming dev-replace
  2018-04-07  6:42   ` Anand Jain
@ 2018-04-07 10:43     ` Anand Jain
  2018-04-09 11:43       ` David Sterba
  0 siblings, 1 reply; 47+ messages in thread
From: Anand Jain @ 2018-04-07 10:43 UTC (permalink / raw)
  To: David Sterba, linux-btrfs



On 04/07/2018 02:42 PM, Anand Jain wrote:
> 
> 
> On 04/04/2018 02:34 AM, David Sterba wrote:
>> The device replace is paused by unmount or read only remount, and
>> resumed on next mount or write remount.
>>
>> The exclusive status should be checked properly as it's a global
>> invariant and we must not allow 2 operations run. In this case, the
>> balance can be also paused and resumed under same conditions. It's
>> always checked first so dev-replace could see the EXCL_OP already taken,
>> BUT, the ioctl would never let start both at the same time.
>>
>> Replace the WARN_ON with message and return 0, indicating no error as
>> this is purely theoretical and the user will be informed. Resolving that
>> manually should be possible by waiting for the other operation to finish
>> or cancel the paused state.
> 
>   So if there is a paused balance and replace is triggered, a write
>   remount will reverse the process? 

  Ok, I am answering my own question:
  Even if the balance is paused that's considered as an exclusive
  operation in progress and does not let replace to start. So there
  is no scenario where paused-balance and replace could co-exist.

  So an asset will be better insted of return 0.

+	if (test_and_set_bit(BTRFS_FS_EXCL_OP, &fs_info->flags)) {
+		btrfs_info(fs_info,
+		"cannot resume dev-replace, other exclusive operation running");
+		return 0;
+	}


Thanks, Anand

> that is balance will start and
>   replace will hit EXCL_OP and thus canceled? How does it work in
>   this case?
> 
> Thanks, Anand
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 08/16] btrfs: add sanity check when resuming balance after mount
  2018-04-03 18:34 ` [PATCH 08/16] btrfs: add sanity check when resuming balance after mount David Sterba
  2018-04-06 20:06   ` Sasha Levin
@ 2018-04-09  7:23   ` Anand Jain
  2018-04-16  6:10   ` Anand Jain
  2 siblings, 0 replies; 47+ messages in thread
From: Anand Jain @ 2018-04-09  7:23 UTC (permalink / raw)
  To: David Sterba, linux-btrfs



On 04/04/2018 02:34 AM, David Sterba wrote:
> Replace a WARN_ON with a proper check and message in case something goes
> really wrong and resumed balance cannot set up its exclusive status.
> The check is a user friendly assertion, I don't expect to ever happen
> under normal circumstances.
> 
> Also document that the paused balance starts here and owns the exclusive
> op status.
> 
> Signed-off-by: David Sterba <dsterba@suse.com>
> ---
>   fs/btrfs/volumes.c | 16 ++++++++++++++--
>   1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index eb78c1d0ce2b..843982a2cbdb 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -3982,6 +3982,20 @@ int btrfs_recover_balance(struct btrfs_fs_info *fs_info)
>   	struct btrfs_key key;
>   	int ret;
>   
> +	/*
> +	 * This should never happen, as the paused balance state is recovered
> +	 * during mount without any chance of other exclusive ops to collide.
> +	 * Let it fail early and do not continue mount.
> +	 *
> +	 * Otherwise, this gives the exclusive op status to balance and keeps
> +	 * in paused state until user intervention (cancel or umount).
> +	 */
> +	if (test_and_set_bit(BTRFS_FS_EXCL_OP, &fs_info->flags)) {
> +		btrfs_err(fs_info,
> +			"cannot set exclusive op status to resume balance");
> +		return -EINVAL;
> +	}
> +


  Reviewed-by: Anand Jain <anand.jain@oracle.com>

Thanks, Anand


>   	path = btrfs_alloc_path();
>   	if (!path)
>   		return -ENOMEM;
> @@ -4018,8 +4032,6 @@ int btrfs_recover_balance(struct btrfs_fs_info *fs_info)
>   	btrfs_balance_sys(leaf, item, &disk_bargs);
>   	btrfs_disk_balance_args_to_cpu(&bctl->sys, &disk_bargs);
>   
> -	WARN_ON(test_and_set_bit(BTRFS_FS_EXCL_OP, &fs_info->flags));
> -
>   	mutex_lock(&fs_info->volume_mutex);
>   	mutex_lock(&fs_info->balance_mutex);
>   
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 09/16] btrfs: cleanup helpers that reset balance state
  2018-04-03 18:34 ` [PATCH 09/16] btrfs: cleanup helpers that reset balance state David Sterba
@ 2018-04-09  7:43   ` Anand Jain
  0 siblings, 0 replies; 47+ messages in thread
From: Anand Jain @ 2018-04-09  7:43 UTC (permalink / raw)
  To: David Sterba, linux-btrfs



On 04/04/2018 02:34 AM, David Sterba wrote:
> The function __cancel_balance name is confusing with the cancel
> operation of balance and it really resets the state of balance back to
> zero. The unset_balance_control helper is called only from one place and
> simple enough to be inlined.
> 
> Signed-off-by: David Sterba <dsterba@suse.com>

Reviewed-by: Anand Jain <anand.jain@oracle.com>

A nitpick below.

> ---
>   fs/btrfs/ioctl.c   |  8 ++++----
>   fs/btrfs/volumes.c | 27 ++++++++++++---------------
>   2 files changed, 16 insertions(+), 19 deletions(-)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 582bde5b7eda..2f172122b63d 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -4653,10 +4653,10 @@ static long btrfs_ioctl_balance(struct file *file, void __user *arg)
>   
>   do_balance:
>   	/*
> -	 * Ownership of bctl and filesystem flag BTRFS_FS_EXCL_OP
> -	 * goes to to btrfs_balance.  bctl is freed in __cancel_balance,
> -	 * or, if restriper was paused all the way until unmount, in
> -	 * free_fs_info.  The flag should be cleared after __cancel_balance.


> +	 * Ownership of bctl and filesystem flag BTRFS_FS_EXCL_OP goes to to

  delete one 'to'.

Thanks, Anand

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 10/16] btrfs: remove wrong use of volume_mutex from btrfs_dev_replace_start
  2018-04-03 18:34 ` [PATCH 10/16] btrfs: remove wrong use of volume_mutex from btrfs_dev_replace_start David Sterba
@ 2018-04-09  8:39   ` Anand Jain
  2018-04-09  8:54     ` Nikolay Borisov
  2018-04-09 11:53     ` David Sterba
  2018-04-09 14:55   ` Anand Jain
  1 sibling, 2 replies; 47+ messages in thread
From: Anand Jain @ 2018-04-09  8:39 UTC (permalink / raw)
  To: David Sterba, linux-btrfs



On 04/04/2018 02:34 AM, David Sterba wrote:
> The volume mutex does not protect against anything in this case, the
> comment about scrub is right but not related to locking and looks
> confusing. The comment in btrfs_find_device_missing_or_by_path is wrong
> and confusing too.
> 
> The device_list_mutex is not held here to protect device lookup, but in
> this case device replace cannot run in parallel with device removal (due
> to exclusive op protection), so we don't need further locking here.

Agreed, usage of device_list_mutex and volume_mutex is kind of redundant.

There are unfinished features in btrfs volume, such as device offline 
while it was mounted (patches in the ML).

It's better to replace volume_mutex with device_list_mutex instead of 
removing it, as we might need it in the context mentioned above.

Or it's not a good idea to clean up until all the features are in place 
otherwise we end up adding the locks again at some point.

Thanks, Anand


> Signed-off-by: David Sterba <dsterba@suse.com>
> ---
>   fs/btrfs/dev-replace.c | 7 +------
>   fs/btrfs/volumes.c     | 4 ----
>   2 files changed, 1 insertion(+), 10 deletions(-)
> 
> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
> index 346bd460f8e7..ba011af9b0d2 100644
> --- a/fs/btrfs/dev-replace.c
> +++ b/fs/btrfs/dev-replace.c
> @@ -426,18 +426,13 @@ int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
>   	struct btrfs_device *tgt_device = NULL;
>   	struct btrfs_device *src_device = NULL;
>   
> -	/* the disk copy procedure reuses the scrub code */
> -	mutex_lock(&fs_info->volume_mutex);
>   	ret = btrfs_find_device_by_devspec(fs_info, srcdevid,
>   					    srcdev_name, &src_device);
> -	if (ret) {
> -		mutex_unlock(&fs_info->volume_mutex);
> +	if (ret)
>   		return ret;
> -	}
>   
>   	ret = btrfs_init_dev_replace_tgtdev(fs_info, tgtdev_name,
>   					    src_device, &tgt_device);
> -	mutex_unlock(&fs_info->volume_mutex);
>   	if (ret)
>   		return ret;
>   
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index b073ab4c3c70..0ae29cd69363 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2198,10 +2198,6 @@ int btrfs_find_device_missing_or_by_path(struct btrfs_fs_info *fs_info,
>   		struct btrfs_device *tmp;
>   
>   		devices = &fs_info->fs_devices->devices;
> -		/*
> -		 * It is safe to read the devices since the volume_mutex
> -		 * is held by the caller.
> -		 */
>   		list_for_each_entry(tmp, devices, dev_list) {
>   			if (test_bit(BTRFS_DEV_STATE_IN_FS_METADATA,
>   					&tmp->dev_state) && !tmp->bdev) {
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 10/16] btrfs: remove wrong use of volume_mutex from btrfs_dev_replace_start
  2018-04-09  8:39   ` Anand Jain
@ 2018-04-09  8:54     ` Nikolay Borisov
  2018-04-09  9:31       ` Anand Jain
  2018-04-09 11:53     ` David Sterba
  1 sibling, 1 reply; 47+ messages in thread
From: Nikolay Borisov @ 2018-04-09  8:54 UTC (permalink / raw)
  To: Anand Jain, David Sterba, linux-btrfs



On  9.04.2018 11:39, Anand Jain wrote:
> 
> 
> On 04/04/2018 02:34 AM, David Sterba wrote:
>> The volume mutex does not protect against anything in this case, the
>> comment about scrub is right but not related to locking and looks
>> confusing. The comment in btrfs_find_device_missing_or_by_path is wrong
>> and confusing too.
>>
>> The device_list_mutex is not held here to protect device lookup, but in
>> this case device replace cannot run in parallel with device removal (due
>> to exclusive op protection), so we don't need further locking here.
> 
> Agreed, usage of device_list_mutex and volume_mutex is kind of redundant.
> 
> There are unfinished features in btrfs volume, such as device offline
> while it was mounted (patches in the ML).
> 
> It's better to replace volume_mutex with device_list_mutex instead of
> removing it, as we might need it in the context mentioned above.
> 
> Or it's not a good idea to clean up until all the features are in place
> otherwise we end up adding the locks again at some point.

It's better that cleanups go so they leave the code in a better shape
than it is. Then when/if the missing features are complete and the
patches that implement them contain proper explanation how locking
works the code can be committed.

Taking into account how there is preparatory code in btrfs for features
which haven't landed for quite some time (i.e the in-band dedup stuff)
I'm against people future-proofing without clear path to merging the
feature the future-proofing pertains to.

> 
> Thanks, Anand
> 
> 
>> Signed-off-by: David Sterba <dsterba@suse.com>
>> ---
>>   fs/btrfs/dev-replace.c | 7 +------
>>   fs/btrfs/volumes.c     | 4 ----
>>   2 files changed, 1 insertion(+), 10 deletions(-)
>>
>> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
>> index 346bd460f8e7..ba011af9b0d2 100644
>> --- a/fs/btrfs/dev-replace.c
>> +++ b/fs/btrfs/dev-replace.c
>> @@ -426,18 +426,13 @@ int btrfs_dev_replace_start(struct btrfs_fs_info
>> *fs_info,
>>       struct btrfs_device *tgt_device = NULL;
>>       struct btrfs_device *src_device = NULL;
>>   -    /* the disk copy procedure reuses the scrub code */
>> -    mutex_lock(&fs_info->volume_mutex);
>>       ret = btrfs_find_device_by_devspec(fs_info, srcdevid,
>>                           srcdev_name, &src_device);
>> -    if (ret) {
>> -        mutex_unlock(&fs_info->volume_mutex);
>> +    if (ret)
>>           return ret;
>> -    }
>>         ret = btrfs_init_dev_replace_tgtdev(fs_info, tgtdev_name,
>>                           src_device, &tgt_device);
>> -    mutex_unlock(&fs_info->volume_mutex);
>>       if (ret)
>>           return ret;
>>   diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index b073ab4c3c70..0ae29cd69363 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -2198,10 +2198,6 @@ int btrfs_find_device_missing_or_by_path(struct
>> btrfs_fs_info *fs_info,
>>           struct btrfs_device *tmp;
>>             devices = &fs_info->fs_devices->devices;
>> -        /*
>> -         * It is safe to read the devices since the volume_mutex
>> -         * is held by the caller.
>> -         */
>>           list_for_each_entry(tmp, devices, dev_list) {
>>               if (test_bit(BTRFS_DEV_STATE_IN_FS_METADATA,
>>                       &tmp->dev_state) && !tmp->bdev) {
>>
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 10/16] btrfs: remove wrong use of volume_mutex from btrfs_dev_replace_start
  2018-04-09  8:54     ` Nikolay Borisov
@ 2018-04-09  9:31       ` Anand Jain
  0 siblings, 0 replies; 47+ messages in thread
From: Anand Jain @ 2018-04-09  9:31 UTC (permalink / raw)
  To: Nikolay Borisov, David Sterba, linux-btrfs



On 04/09/2018 04:54 PM, Nikolay Borisov wrote:
> 
> 
> On  9.04.2018 11:39, Anand Jain wrote:
>>
>>
>> On 04/04/2018 02:34 AM, David Sterba wrote:
>>> The volume mutex does not protect against anything in this case, the
>>> comment about scrub is right but not related to locking and looks
>>> confusing. The comment in btrfs_find_device_missing_or_by_path is wrong
>>> and confusing too.
>>>
>>> The device_list_mutex is not held here to protect device lookup, but in
>>> this case device replace cannot run in parallel with device removal (due
>>> to exclusive op protection), so we don't need further locking here.
>>
>> Agreed, usage of device_list_mutex and volume_mutex is kind of redundant.
>>
>> There are unfinished features in btrfs volume, such as device offline
>> while it was mounted (patches in the ML).
>>
>> It's better to replace volume_mutex with device_list_mutex instead of
>> removing it, as we might need it in the context mentioned above.
>>
>> Or it's not a good idea to clean up until all the features are in place
>> otherwise we end up adding the locks again at some point.
> 
> It's better that cleanups go so they leave the code in a better shape
> than it is. Then when/if the missing features are complete and the
> patches that implement them contain proper explanation how locking
> works the code can be committed.
> 
> Taking into account how there is preparatory code in btrfs for features
> which haven't landed for quite some time (i.e the in-band dedup stuff)
> I'm against people future-proofing without clear path to merging the
> feature the future-proofing pertains to.

  Ok. Lets clean it up first.

Thanks, Anand

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 07/16] btrfs: add proper safety check before resuming dev-replace
  2018-04-07 10:43     ` Anand Jain
@ 2018-04-09 11:43       ` David Sterba
  0 siblings, 0 replies; 47+ messages in thread
From: David Sterba @ 2018-04-09 11:43 UTC (permalink / raw)
  To: Anand Jain; +Cc: David Sterba, linux-btrfs

On Sat, Apr 07, 2018 at 06:43:19PM +0800, Anand Jain wrote:
> On 04/07/2018 02:42 PM, Anand Jain wrote:
> > On 04/04/2018 02:34 AM, David Sterba wrote:
> >> The device replace is paused by unmount or read only remount, and
> >> resumed on next mount or write remount.
> >>
> >> The exclusive status should be checked properly as it's a global
> >> invariant and we must not allow 2 operations run. In this case, the
> >> balance can be also paused and resumed under same conditions. It's
> >> always checked first so dev-replace could see the EXCL_OP already taken,
> >> BUT, the ioctl would never let start both at the same time.
> >>
> >> Replace the WARN_ON with message and return 0, indicating no error as
> >> this is purely theoretical and the user will be informed. Resolving that
> >> manually should be possible by waiting for the other operation to finish
> >> or cancel the paused state.
> > 
> >   So if there is a paused balance and replace is triggered, a write
> >   remount will reverse the process? 
> 
>   Ok, I am answering my own question:
>   Even if the balance is paused that's considered as an exclusive
>   operation in progress and does not let replace to start. So there
>   is no scenario where paused-balance and replace could co-exist.

The code is there to handle a theoretical but possible scenario, when the
balance item and replace items are both on the filesytem image.

>   So an asset will be better insted of return 0.

An assert may not be compile in and we still don't want to let
dev-replace and balance run together. A WARN_ON or BUG in this case is
too intrusive, a human readable error message says what to do, compared
to a stacktrace and insight from somebody else.

> +	if (test_and_set_bit(BTRFS_FS_EXCL_OP, &fs_info->flags)) {
> +		btrfs_info(fs_info,
> +		"cannot resume dev-replace, other exclusive operation running");
> +		return 0;
> +	}

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 10/16] btrfs: remove wrong use of volume_mutex from btrfs_dev_replace_start
  2018-04-09  8:39   ` Anand Jain
  2018-04-09  8:54     ` Nikolay Borisov
@ 2018-04-09 11:53     ` David Sterba
  2018-04-09 14:44       ` Anand Jain
  1 sibling, 1 reply; 47+ messages in thread
From: David Sterba @ 2018-04-09 11:53 UTC (permalink / raw)
  To: Anand Jain; +Cc: David Sterba, linux-btrfs

On Mon, Apr 09, 2018 at 04:39:03PM +0800, Anand Jain wrote:
> 
> 
> On 04/04/2018 02:34 AM, David Sterba wrote:
> > The volume mutex does not protect against anything in this case, the
> > comment about scrub is right but not related to locking and looks
> > confusing. The comment in btrfs_find_device_missing_or_by_path is wrong
> > and confusing too.
> > 
> > The device_list_mutex is not held here to protect device lookup, but in
> > this case device replace cannot run in parallel with device removal (due
> > to exclusive op protection), so we don't need further locking here.
> 
> Agreed, usage of device_list_mutex and volume_mutex is kind of redundant.
> 
> There are unfinished features in btrfs volume, such as device offline 
> while it was mounted (patches in the ML).
> 
> It's better to replace volume_mutex with device_list_mutex instead of 
> removing it, as we might need it in the context mentioned above.

The device_list_mutex will not do anything useful here. It's taken in
contexts where we need to make sure the device list is locked for writes
or we don't want to let the device disappear (superblock write).

As dev-replace requires the device to exist throughout the whole
operation (src_device), the EXCL_OP bit is the correct protection and we
can't hold device_list_mutex throughout the whole
btrfs_dev_replace_start.

> Or it's not a good idea to clean up until all the features are in place 
> otherwise we end up adding the locks again at some point.

The lock is redundant and no new code should depend on it. If you're
going to add a more fine grained locking of devices, then we can revisit
what's the best way to do it and I'm quite sure this will not require
re-adding the volume_mutex as it is used now.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 10/16] btrfs: remove wrong use of volume_mutex from btrfs_dev_replace_start
  2018-04-09 11:53     ` David Sterba
@ 2018-04-09 14:44       ` Anand Jain
  0 siblings, 0 replies; 47+ messages in thread
From: Anand Jain @ 2018-04-09 14:44 UTC (permalink / raw)
  To: dsterba, David Sterba, linux-btrfs



On 04/09/2018 07:53 PM, David Sterba wrote:
> On Mon, Apr 09, 2018 at 04:39:03PM +0800, Anand Jain wrote:
>>
>>
>> On 04/04/2018 02:34 AM, David Sterba wrote:
>>> The volume mutex does not protect against anything in this case, the
>>> comment about scrub is right but not related to locking and looks
>>> confusing. The comment in btrfs_find_device_missing_or_by_path is wrong
>>> and confusing too.
>>>
>>> The device_list_mutex is not held here to protect device lookup, but in
>>> this case device replace cannot run in parallel with device removal (due
>>> to exclusive op protection), so we don't need further locking here.
>>
>> Agreed, usage of device_list_mutex and volume_mutex is kind of redundant.
>>
>> There are unfinished features in btrfs volume, such as device offline
>> while it was mounted (patches in the ML).
>>
>> It's better to replace volume_mutex with device_list_mutex instead of
>> removing it, as we might need it in the context mentioned above.
> 
> The device_list_mutex will not do anything useful here. It's taken in
> contexts where we need to make sure the device list is locked for writes
> or we don't want to let the device disappear (superblock write).
> 
> As dev-replace requires the device to exist throughout the whole
> operation (src_device), the EXCL_OP bit is the correct protection and we
> can't hold device_list_mutex throughout the whole
> btrfs_dev_replace_start.
 >
>> Or it's not a good idea to clean up until all the features are in place
>> otherwise we end up adding the locks again at some point.
> 
> The lock is redundant and no new code should depend on it. If you're
> going to add a more fine grained locking of devices, then we can revisit
> what's the best way to do it and I'm quite sure this will not require
> re-adding the volume_mutex as it is used now.

Agreed.

Thanks, Anand


> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 11/16] btrfs: kill btrfs_fs_info::volume_mutex
  2018-04-03 18:34 ` [PATCH 11/16] btrfs: kill btrfs_fs_info::volume_mutex David Sterba
@ 2018-04-09 14:52   ` Anand Jain
  2018-04-13  5:30   ` Anand Jain
  1 sibling, 0 replies; 47+ messages in thread
From: Anand Jain @ 2018-04-09 14:52 UTC (permalink / raw)
  To: David Sterba, linux-btrfs



On 04/04/2018 02:34 AM, David Sterba wrote:
> Mutual exclusion of device add/rm and balance was done by the volume
> mutex up to version 3.7. The commit 5ac00addc7ac091109 ("Btrfs: disallow
> mutually exclusive admin operations from user mode") added a bit that
> essentially tracked the same information.
> 
> The status bit has an advantage over a mutex that it can be set without
> restrictions of function context, so it started to be used in the
> mount-time resuming of balance or device replace.
> 
> But we don't really need to track the same information in two ways.
> 
> 1) After the previous cleanups, the main ioctl handlers for
>     add/del/resize copy the EXCL_OP bit next to the volume mutex, here
>     it's clearly safe.
> 
> 2) Resuming balance during mount or after rw remount will set only the
>     EXCL_OP bit and the volume_mutex is held in the kernel thread that
>     calls btrfs_balance.
> 
> 3) Resuming device replace during mount or after rw remount is done
>     after balance and is excluded by the EXCL_OP bit. It does not take
>     the volume_mutex at all and completely relies on the EXCL_OP bit.
> 
> 4) The resuming of balance and dev-replace cannot hapen at the same time
>     as the ioctls cannot be started in parallel. Nevertheless, a crafted
>     image could trigger that and a warning is printed.
> 
> 5) Balance is normally excluded by EXCL_OP and also uses own mutex to
>     protect against concurrent access to its status data. There's some
>     trickery to maintain the right lock nesting in case we need to
>     reexamine the status in btrfs_ioctl_balance. The volume_mutex is
>     removed and the unlock/lock sequence is left in place as we might
>     expect other waiters to proceed.
> 
> 6) Similar to 5, the unlock/lock sequence is kept in
>     btrfs_cancel_balance to allow waiters to continue. >
> Signed-off-by: David Sterba <dsterba@suse.com>

  Thanks for cleaning volume_mutex.

  Reviewed-by: Anand Jain <anand.jain@oracle.com>

Anand


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 10/16] btrfs: remove wrong use of volume_mutex from btrfs_dev_replace_start
  2018-04-03 18:34 ` [PATCH 10/16] btrfs: remove wrong use of volume_mutex from btrfs_dev_replace_start David Sterba
  2018-04-09  8:39   ` Anand Jain
@ 2018-04-09 14:55   ` Anand Jain
  1 sibling, 0 replies; 47+ messages in thread
From: Anand Jain @ 2018-04-09 14:55 UTC (permalink / raw)
  To: David Sterba, linux-btrfs



On 04/04/2018 02:34 AM, David Sterba wrote:
> The volume mutex does not protect against anything in this case, the
> comment about scrub is right but not related to locking and looks
> confusing. The comment in btrfs_find_device_missing_or_by_path is wrong
> and confusing too.
> 
> The device_list_mutex is not held here to protect device lookup, but in
> this case device replace cannot run in parallel with device removal (due
> to exclusive op protection), so we don't need further locking here.
> 
> Signed-off-by: David Sterba <dsterba@suse.com>

  Reviewed-by: Anand Jain <anand.jain@oracle.com>

Thanks, Anand


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 16/16] btrfs: open code set_balance_control
  2018-04-03 18:34 ` [PATCH 16/16] btrfs: open code set_balance_control David Sterba
@ 2018-04-09 15:24   ` Anand Jain
  0 siblings, 0 replies; 47+ messages in thread
From: Anand Jain @ 2018-04-09 15:24 UTC (permalink / raw)
  To: David Sterba, linux-btrfs



On 04/04/2018 02:34 AM, David Sterba wrote:
> The helper is quite simple and I'd like to see the locking in the
> caller.
> 
> Signed-off-by: David Sterba <dsterba@suse.com>

For the version at:
  git://github.com/kdave/btrfs-devel dev/remove-volume-mutex

Reviewed-by: Anand Jain <anand.jain@oracle.com>

Thanks, Anand

> ---
>   fs/btrfs/volumes.c | 20 ++++----------------
>   1 file changed, 4 insertions(+), 16 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index e93c7ba44d06..db9f72c6af85 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -3178,21 +3178,6 @@ static void update_balance_args(struct btrfs_balance_control *bctl)
>   	}
>   }
>   
> -/*
> - * Should be called with balance mutex held to protect against checking the
> - * balance status or progress. Same goes for reset_balance_state.
> - */
> -static void set_balance_control(struct btrfs_balance_control *bctl)
> -{
> -	struct btrfs_fs_info *fs_info = bctl->fs_info;
> -
> -	BUG_ON(fs_info->balance_ctl);
> -
> -	spin_lock(&fs_info->balance_lock);
> -	fs_info->balance_ctl = bctl;
> -	spin_unlock(&fs_info->balance_lock);
> -}
> -
>   /*
>    * Clear the balance status in fs_info and delete the balance item from disk.
>    */
> @@ -3878,7 +3863,10 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
>   
>   	if (!(bctl->flags & BTRFS_BALANCE_RESUME)) {
>   		BUG_ON(ret == -EEXIST);
> -		set_balance_control(bctl);
> +		BUG_ON(fs_info->balance_ctl);
> +		spin_lock(&fs_info->balance_lock);
> +		fs_info->balance_ctl = bctl;
> +		spin_unlock(&fs_info->balance_lock);
>   	} else {
>   		BUG_ON(ret != -EEXIST);
>   		spin_lock(&fs_info->balance_lock);
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 11/16] btrfs: kill btrfs_fs_info::volume_mutex
  2018-04-03 18:34 ` [PATCH 11/16] btrfs: kill btrfs_fs_info::volume_mutex David Sterba
  2018-04-09 14:52   ` Anand Jain
@ 2018-04-13  5:30   ` Anand Jain
  2018-04-13 13:15     ` David Sterba
  1 sibling, 1 reply; 47+ messages in thread
From: Anand Jain @ 2018-04-13  5:30 UTC (permalink / raw)
  To: David Sterba, linux-btrfs



> @@ -4569,8 +4560,10 @@ static long btrfs_ioctl_balance(struct file *file, void __user *arg)
>   		/* this is either (2) or (3) */
>   		if (!atomic_read(&fs_info->balance_running)) {
>   			mutex_unlock(&fs_info->balance_mutex);
> -			if (!mutex_trylock(&fs_info->volume_mutex))
> -				goto again;


> +			/*
> +			 * Lock released to allow other waiters to continue,
> +			 * we'll reexamine the status again.
> +			 */

  I wonder if there was any case where performing the unlock and lock
  sequence on the balance_mutex has helped?
  Otherwise IMO we can clean this up as well. It looks like this sequence
  was needed only to acquire the volume_mutex.

Thanks, Anand


>   			mutex_lock(&fs_info->balance_mutex);
>   
>   			if (fs_info->balance_ctl &&


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 11/16] btrfs: kill btrfs_fs_info::volume_mutex
  2018-04-13  5:30   ` Anand Jain
@ 2018-04-13 13:15     ` David Sterba
  0 siblings, 0 replies; 47+ messages in thread
From: David Sterba @ 2018-04-13 13:15 UTC (permalink / raw)
  To: Anand Jain; +Cc: David Sterba, linux-btrfs

On Fri, Apr 13, 2018 at 01:30:22PM +0800, Anand Jain wrote:
> 
> 
> > @@ -4569,8 +4560,10 @@ static long btrfs_ioctl_balance(struct file *file, void __user *arg)
> >   		/* this is either (2) or (3) */
> >   		if (!atomic_read(&fs_info->balance_running)) {
> >   			mutex_unlock(&fs_info->balance_mutex);
> > -			if (!mutex_trylock(&fs_info->volume_mutex))
> > -				goto again;
> 
> 
> > +			/*
> > +			 * Lock released to allow other waiters to continue,
> > +			 * we'll reexamine the status again.
> > +			 */
> 
>   I wonder if there was any case where performing the unlock and lock
>   sequence on the balance_mutex has helped?

It might help in cases something is waiting on the balance mutex, like
reading status, calling pause in parallel, but I doubt that it has some
significant impact.

>   Otherwise IMO we can clean this up as well. It looks like this sequence
>   was needed only to acquire the volume_mutex.

Agreed and I probably had removed in some early versions of the patch,
but this needs to be considered separately from volume_mutex so this
patch preserves the code though it's likely not needed.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 08/16] btrfs: add sanity check when resuming balance after mount
  2018-04-03 18:34 ` [PATCH 08/16] btrfs: add sanity check when resuming balance after mount David Sterba
  2018-04-06 20:06   ` Sasha Levin
  2018-04-09  7:23   ` Anand Jain
@ 2018-04-16  6:10   ` Anand Jain
  2018-04-17 17:38     ` David Sterba
  2 siblings, 1 reply; 47+ messages in thread
From: Anand Jain @ 2018-04-16  6:10 UTC (permalink / raw)
  To: David Sterba, linux-btrfs



On 04/04/2018 02:34 AM, David Sterba wrote:
> Replace a WARN_ON with a proper check and message in case something goes
> really wrong and resumed balance cannot set up its exclusive status.

> The check is a user friendly assertion, I don't expect to ever happen
> under normal circumstances.
> 
> Also document that the paused balance starts here and owns the exclusive
> op status.


> Signed-off-by: David Sterba <dsterba@suse.com>
> ---
>   fs/btrfs/volumes.c | 16 ++++++++++++++--
>   1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index eb78c1d0ce2b..843982a2cbdb 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -3982,6 +3982,20 @@ int btrfs_recover_balance(struct btrfs_fs_info *fs_info)
>   	struct btrfs_key key;
>   	int ret;
>   
> +	/*
> +	 * This should never happen, as the paused balance state is recovered
> +	 * during mount without any chance of other exclusive ops to collide.
> +	 * Let it fail early and do not continue mount.
> +	 *
> +	 * Otherwise, this gives the exclusive op status to balance and keeps
> +	 * in paused state until user intervention (cancel or umount).
> +	 */
> +	if (test_and_set_bit(BTRFS_FS_EXCL_OP, &fs_info->flags)) {
> +		btrfs_err(fs_info,
> +			"cannot set exclusive op status to resume balance");
> +		return -EINVAL;
> +	}
>

  We need the test_and_set_bit(BTRFS_FS_EXCL_OP) only if we confirm that
  there is a pending balance. Its better to test and set at the same
  place as WARN_ON before.

Thanks, Anand


>   	path = btrfs_alloc_path();
>   	if (!path)
>   		return -ENOMEM;
> @@ -4018,8 +4032,6 @@ int btrfs_recover_balance(struct btrfs_fs_info *fs_info)
>   	btrfs_balance_sys(leaf, item, &disk_bargs);
>   	btrfs_disk_balance_args_to_cpu(&bctl->sys, &disk_bargs);
>   
> -	WARN_ON(test_and_set_bit(BTRFS_FS_EXCL_OP, &fs_info->flags));
> -
>   	mutex_lock(&fs_info->volume_mutex);
>   	mutex_lock(&fs_info->balance_mutex);
>   
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 13/16] btrfs: remove redundant read-only check from btrfs_cancel_balance
  2018-04-03 18:34 ` [PATCH 13/16] btrfs: remove redundant read-only check from btrfs_cancel_balance David Sterba
@ 2018-04-16  9:43   ` Anand Jain
  2018-04-17 17:47     ` David Sterba
  0 siblings, 1 reply; 47+ messages in thread
From: Anand Jain @ 2018-04-16  9:43 UTC (permalink / raw)
  To: David Sterba, linux-btrfs



On 04/04/2018 02:34 AM, David Sterba wrote:
> Balance cannot be started on a read-only filesystem and will have to
> finish/exit before eg. going to read-only via remount. Cancelling does
> not need to check for that.
> 
> In case the filesystem is forcibly set to read-only after an error,
> balance will finish anyway and if the cancel call is too fast it will
> just wait for that to happen. Again does not have to check.

  What if there is a power recycle and mounted as readonly
  after the reboot?

Thanks, Anand


> Signed-off-by: David Sterba <dsterba@suse.com>
> ---
>   fs/btrfs/volumes.c | 3 ---
>   1 file changed, 3 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index a0420af1fad9..2956e7b4cb9f 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -4053,9 +4053,6 @@ int btrfs_pause_balance(struct btrfs_fs_info *fs_info)
>   
>   int btrfs_cancel_balance(struct btrfs_fs_info *fs_info)
>   {
> -	if (sb_rdonly(fs_info->sb))
> -		return -EROFS;
> -
>   	mutex_lock(&fs_info->balance_mutex);
>   	if (!fs_info->balance_ctl) {
>   		mutex_unlock(&fs_info->balance_mutex);
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 08/16] btrfs: add sanity check when resuming balance after mount
  2018-04-16  6:10   ` Anand Jain
@ 2018-04-17 17:38     ` David Sterba
  0 siblings, 0 replies; 47+ messages in thread
From: David Sterba @ 2018-04-17 17:38 UTC (permalink / raw)
  To: Anand Jain; +Cc: David Sterba, linux-btrfs

On Mon, Apr 16, 2018 at 02:10:08PM +0800, Anand Jain wrote:
> 
> 
> On 04/04/2018 02:34 AM, David Sterba wrote:
> > Replace a WARN_ON with a proper check and message in case something goes
> > really wrong and resumed balance cannot set up its exclusive status.
> 
> > The check is a user friendly assertion, I don't expect to ever happen
> > under normal circumstances.
> > 
> > Also document that the paused balance starts here and owns the exclusive
> > op status.
> 
> 
> > Signed-off-by: David Sterba <dsterba@suse.com>
> > ---
> >   fs/btrfs/volumes.c | 16 ++++++++++++++--
> >   1 file changed, 14 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> > index eb78c1d0ce2b..843982a2cbdb 100644
> > --- a/fs/btrfs/volumes.c
> > +++ b/fs/btrfs/volumes.c
> > @@ -3982,6 +3982,20 @@ int btrfs_recover_balance(struct btrfs_fs_info *fs_info)
> >   	struct btrfs_key key;
> >   	int ret;
> >   
> > +	/*
> > +	 * This should never happen, as the paused balance state is recovered
> > +	 * during mount without any chance of other exclusive ops to collide.
> > +	 * Let it fail early and do not continue mount.
> > +	 *
> > +	 * Otherwise, this gives the exclusive op status to balance and keeps
> > +	 * in paused state until user intervention (cancel or umount).
> > +	 */
> > +	if (test_and_set_bit(BTRFS_FS_EXCL_OP, &fs_info->flags)) {
> > +		btrfs_err(fs_info,
> > +			"cannot set exclusive op status to resume balance");
> > +		return -EINVAL;
> > +	}
> >
> 
>   We need the test_and_set_bit(BTRFS_FS_EXCL_OP) only if we confirm that
>   there is a pending balance. Its better to test and set at the same
>   place as WARN_ON before.

Right.  And the patch is wrong, because we need to set up
fs_info::balance_ctl in all cases, otherwise unpausing balance would not
work as expected. The only change should be a better message and maybe
not even that, as it's just preparing the state but the exclusive op is
claimed later in btrfs_resume_balance_async.

I'll revisit how the error handling is done after resuming balance or
dev-replace, this is considered a hard failure (mount, rw remount) but I
don't think it's necessary.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 13/16] btrfs: remove redundant read-only check from btrfs_cancel_balance
  2018-04-16  9:43   ` Anand Jain
@ 2018-04-17 17:47     ` David Sterba
  0 siblings, 0 replies; 47+ messages in thread
From: David Sterba @ 2018-04-17 17:47 UTC (permalink / raw)
  To: Anand Jain; +Cc: David Sterba, linux-btrfs

On Mon, Apr 16, 2018 at 05:43:46PM +0800, Anand Jain wrote:
> On 04/04/2018 02:34 AM, David Sterba wrote:
> > Balance cannot be started on a read-only filesystem and will have to
> > finish/exit before eg. going to read-only via remount. Cancelling does
> > not need to check for that.
> > 
> > In case the filesystem is forcibly set to read-only after an error,
> > balance will finish anyway and if the cancel call is too fast it will
> > just wait for that to happen. Again does not have to check.
> 
>   What if there is a power recycle and mounted as readonly
>   after the reboot?

So the balance item would be stored on disk from previous run, mount
will set it up but balance will not be resumed (sb_rdonly check in
open_ctree happens earlier).

Calling btrfs_cancel_balance by ioctl would reset the in-memory state
and also would want to delete the balance item on disk, so yes, the
check is needed.

I'll add a comment explaining why the read-only check is there instead.
Thanks for catching it.

^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2018-04-17 17:50 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-03 18:34 [PATCH 00/16 v1] Kill fs_info::volume_mutex David Sterba
2018-04-03 18:34 ` [PATCH 01/16] btrfs: squeeze btrfs_dev_replace_continue_on_mount to its caller David Sterba
2018-04-05  9:40   ` Anand Jain
2018-04-03 18:34 ` [PATCH 02/16] btrfs: make success path out of btrfs_init_dev_replace_tgtdev more clear David Sterba
2018-04-05  9:40   ` Anand Jain
2018-04-03 18:34 ` [PATCH 03/16] btrfs: export and rename free_device David Sterba
2018-04-05  9:41   ` Anand Jain
2018-04-03 18:34 ` [PATCH 04/16] btrfs: move btrfs_init_dev_replace_tgtdev to dev-replace.c and make static David Sterba
2018-04-05  9:41   ` Anand Jain
2018-04-03 18:34 ` [PATCH 05/16] btrfs: move volume_mutex to callers of btrfs_rm_device David Sterba
2018-04-05  9:41   ` Anand Jain
2018-04-05 14:28     ` David Sterba
2018-04-03 18:34 ` [PATCH 06/16] btrfs: move clearing of EXCL_OP out of __cancel_balance David Sterba
2018-04-05  9:42   ` Anand Jain
2018-04-05 14:04     ` David Sterba
2018-04-03 18:34 ` [PATCH 07/16] btrfs: add proper safety check before resuming dev-replace David Sterba
2018-04-06 20:06   ` Sasha Levin
2018-04-07  6:42   ` Anand Jain
2018-04-07 10:43     ` Anand Jain
2018-04-09 11:43       ` David Sterba
2018-04-03 18:34 ` [PATCH 08/16] btrfs: add sanity check when resuming balance after mount David Sterba
2018-04-06 20:06   ` Sasha Levin
2018-04-09  7:23   ` Anand Jain
2018-04-16  6:10   ` Anand Jain
2018-04-17 17:38     ` David Sterba
2018-04-03 18:34 ` [PATCH 09/16] btrfs: cleanup helpers that reset balance state David Sterba
2018-04-09  7:43   ` Anand Jain
2018-04-03 18:34 ` [PATCH 10/16] btrfs: remove wrong use of volume_mutex from btrfs_dev_replace_start David Sterba
2018-04-09  8:39   ` Anand Jain
2018-04-09  8:54     ` Nikolay Borisov
2018-04-09  9:31       ` Anand Jain
2018-04-09 11:53     ` David Sterba
2018-04-09 14:44       ` Anand Jain
2018-04-09 14:55   ` Anand Jain
2018-04-03 18:34 ` [PATCH 11/16] btrfs: kill btrfs_fs_info::volume_mutex David Sterba
2018-04-09 14:52   ` Anand Jain
2018-04-13  5:30   ` Anand Jain
2018-04-13 13:15     ` David Sterba
2018-04-03 18:34 ` [PATCH 12/16] btrfs: track running balance in a simpler way David Sterba
2018-04-03 18:34 ` [PATCH 13/16] btrfs: remove redundant read-only check from btrfs_cancel_balance David Sterba
2018-04-16  9:43   ` Anand Jain
2018-04-17 17:47     ` David Sterba
2018-04-03 18:34 ` [PATCH 14/16] btrfs: drop lock parameter from update_ioctl_balance_args and rename David Sterba
2018-04-03 18:34 ` [PATCH 15/16] btrfs: use mutex in btrfs_resume_balance_async David Sterba
2018-04-03 18:34 ` [PATCH 16/16] btrfs: open code set_balance_control David Sterba
2018-04-09 15:24   ` Anand Jain
2018-04-05 14:31 ` [PATCH 00/16 v1] Kill fs_info::volume_mutex David Sterba

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.