All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 00/16] Kill fs_info::volume_mutex
@ 2018-04-19 16:33 David Sterba
  2018-04-19 16:33 ` [PATCH v2 01/16] btrfs: squeeze btrfs_dev_replace_continue_on_mount to its caller David Sterba
                   ` (15 more replies)
  0 siblings, 16 replies; 37+ messages in thread
From: David Sterba @ 2018-04-19 16:33 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

This series gets rid of the volume mutex because it's redundant. Updated
branch: git://github.com/kdave/btrfs-devel dev/remove-volume-mutex

Changes for v2:
- sanity check in balance resume is only a warning
- read-only check in balance cancel remains and is only moved
- typo fixes

The fstests seem to pass all relevant tests now and qualifies for
conditional addition to for-next.

David Sterba (16):
  btrfs: squeeze btrfs_dev_replace_continue_on_mount to its caller
  btrfs: make success path out of btrfs_init_dev_replace_tgtdev more clear
  btrfs: export and rename free_device
  btrfs: move btrfs_init_dev_replace_tgtdev to dev-replace.c and make static
  btrfs: move volume_mutex to callers of btrfs_rm_device
  btrfs: move clearing of EXCL_OP out of __cancel_balance
  btrfs: add proper safety check before resuming dev-replace
  btrfs: add sanity check when resuming balance after mount
  btrfs: cleanup helpers that reset balance state
  btrfs: remove wrong use of volume_mutex from btrfs_dev_replace_start
  btrfs: kill btrfs_fs_info::volume_mutex
  btrfs: track running balance in a simpler way
  btrfs: move and comment read-only check in btrfs_cancel_balance
  btrfs: drop lock parameter from update_ioctl_balance_args and rename
  btrfs: use mutex in btrfs_resume_balance_async
  btrfs: open code set_balance_control

 fs/btrfs/ctree.h       |   9 +-
 fs/btrfs/dev-replace.c | 135 +++++++++++++++++++++----
 fs/btrfs/disk-io.c     |   2 -
 fs/btrfs/extent-tree.c |   2 +-
 fs/btrfs/ioctl.c       |  41 +++-----
 fs/btrfs/volumes.c     | 263 +++++++++++++++----------------------------------
 fs/btrfs/volumes.h     |   5 +-
 7 files changed, 216 insertions(+), 241 deletions(-)

-- 
2.16.2


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v2 01/16] btrfs: squeeze btrfs_dev_replace_continue_on_mount to its caller
  2018-04-19 16:33 [PATCH v2 00/16] Kill fs_info::volume_mutex David Sterba
@ 2018-04-19 16:33 ` David Sterba
  2018-04-19 16:33 ` [PATCH v2 02/16] btrfs: make success path out of btrfs_init_dev_replace_tgtdev more clear David Sterba
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 37+ messages in thread
From: David Sterba @ 2018-04-19 16:33 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

The function is called once and is fairly small, we can merge it with
the caller.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/dev-replace.c | 16 +++-------------
 1 file changed, 3 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 0d203633bb96..c0397cd2a3ba 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -45,8 +45,6 @@ static void btrfs_dev_replace_update_device_in_mapping_tree(
 						struct btrfs_device *srcdev,
 						struct btrfs_device *tgtdev);
 static int btrfs_dev_replace_kthread(void *data);
-static int btrfs_dev_replace_continue_on_mount(struct btrfs_fs_info *fs_info);
-
 
 int btrfs_init_dev_replace(struct btrfs_fs_info *fs_info)
 {
@@ -822,6 +820,7 @@ static int btrfs_dev_replace_kthread(void *data)
 	struct btrfs_fs_info *fs_info = data;
 	struct btrfs_dev_replace *dev_replace = &fs_info->dev_replace;
 	u64 progress;
+	int ret;
 
 	progress = btrfs_dev_replace_progress(fs_info);
 	progress = div_u64(progress, 10);
@@ -832,23 +831,14 @@ static int btrfs_dev_replace_kthread(void *data)
 		btrfs_dev_name(dev_replace->tgtdev),
 		(unsigned int)progress);
 
-	btrfs_dev_replace_continue_on_mount(fs_info);
-	clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
-
-	return 0;
-}
-
-static int btrfs_dev_replace_continue_on_mount(struct btrfs_fs_info *fs_info)
-{
-	struct btrfs_dev_replace *dev_replace = &fs_info->dev_replace;
-	int ret;
-
 	ret = btrfs_scrub_dev(fs_info, dev_replace->srcdev->devid,
 			      dev_replace->committed_cursor_left,
 			      btrfs_device_get_total_bytes(dev_replace->srcdev),
 			      &dev_replace->scrub_progress, 0, 1);
 	ret = btrfs_dev_replace_finishing(fs_info, ret);
 	WARN_ON(ret);
+
+	clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
 	return 0;
 }
 
-- 
2.16.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 02/16] btrfs: make success path out of btrfs_init_dev_replace_tgtdev more clear
  2018-04-19 16:33 [PATCH v2 00/16] Kill fs_info::volume_mutex David Sterba
  2018-04-19 16:33 ` [PATCH v2 01/16] btrfs: squeeze btrfs_dev_replace_continue_on_mount to its caller David Sterba
@ 2018-04-19 16:33 ` David Sterba
  2018-04-19 16:33 ` [PATCH v2 03/16] btrfs: export and rename free_device David Sterba
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 37+ messages in thread
From: David Sterba @ 2018-04-19 16:33 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

This is a preparatory cleanup that will make clear that the only
successful way out of btrfs_init_dev_replace_tgtdev will also set the
device_out to a valid pointer. With this guarantee, the callers can be
simplified.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/dev-replace.c | 1 -
 fs/btrfs/volumes.c     | 8 +++++++-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index c0397cd2a3ba..e3b8a79c1665 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -370,7 +370,6 @@ int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
 	dev_replace->cont_reading_from_srcdev_mode = read_src;
 	WARN_ON(!src_device);
 	dev_replace->srcdev = src_device;
-	WARN_ON(!tgt_device);
 	dev_replace->tgtdev = tgt_device;
 
 	btrfs_info_in_rcu(fs_info,
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 93f8f17cacca..8086c5687b72 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2592,6 +2592,12 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 	return ret;
 }
 
+/*
+ * Initialize a new device for device replace target from a given source dev
+ * and path.
+ *
+ * Return 0 and new device in @device_out, otherwise return < 0
+ */
 int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
 				  const char *device_path,
 				  struct btrfs_device *srcdev,
@@ -2678,7 +2684,7 @@ int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
 	mutex_unlock(&fs_info->fs_devices->device_list_mutex);
 
 	*device_out = device;
-	return ret;
+	return 0;
 
 error:
 	blkdev_put(bdev, FMODE_EXCL);
-- 
2.16.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 03/16] btrfs: export and rename free_device
  2018-04-19 16:33 [PATCH v2 00/16] Kill fs_info::volume_mutex David Sterba
  2018-04-19 16:33 ` [PATCH v2 01/16] btrfs: squeeze btrfs_dev_replace_continue_on_mount to its caller David Sterba
  2018-04-19 16:33 ` [PATCH v2 02/16] btrfs: make success path out of btrfs_init_dev_replace_tgtdev more clear David Sterba
@ 2018-04-19 16:33 ` David Sterba
  2018-04-19 16:33 ` [PATCH v2 04/16] btrfs: move btrfs_init_dev_replace_tgtdev to dev-replace.c and make static David Sterba
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 37+ messages in thread
From: David Sterba @ 2018-04-19 16:33 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

The function will be used outside of volumes.c, the allocation
btrfs_alloc_device is also exported.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/volumes.c | 24 ++++++++++++------------
 fs/btrfs/volumes.h |  1 +
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 8086c5687b72..ca5521cc1a5b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -246,7 +246,7 @@ static struct btrfs_fs_devices *alloc_fs_devices(const u8 *fsid)
 	return fs_devs;
 }
 
-static void free_device(struct btrfs_device *device)
+void btrfs_free_device(struct btrfs_device *device)
 {
 	rcu_string_free(device->name);
 	bio_put(device->flush_bio);
@@ -261,7 +261,7 @@ static void free_fs_devices(struct btrfs_fs_devices *fs_devices)
 		device = list_entry(fs_devices->devices.next,
 				    struct btrfs_device, dev_list);
 		list_del(&device->dev_list);
-		free_device(device);
+		btrfs_free_device(device);
 	}
 	kfree(fs_devices);
 }
@@ -294,7 +294,7 @@ void __exit btrfs_cleanup_fs_uuids(void)
 /*
  * Returns a pointer to a new btrfs_device on success; ERR_PTR() on error.
  * Returned struct is not linked onto any lists and must be destroyed using
- * free_device.
+ * btrfs_free_device.
  */
 static struct btrfs_device *__alloc_device(void)
 {
@@ -650,7 +650,7 @@ static void btrfs_free_stale_devices(const char *path,
 			} else {
 				fs_devs->num_devices--;
 				list_del(&dev->dev_list);
-				free_device(dev);
+				btrfs_free_device(dev);
 			}
 		}
 	}
@@ -765,7 +765,7 @@ static noinline struct btrfs_device *device_list_add(const char *path,
 
 		name = rcu_string_strdup(path, GFP_NOFS);
 		if (!name) {
-			free_device(device);
+			btrfs_free_device(device);
 			return ERR_PTR(-ENOMEM);
 		}
 		rcu_assign_pointer(device->name, name);
@@ -878,7 +878,7 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig)
 			name = rcu_string_strdup(orig_dev->name->str,
 					GFP_KERNEL);
 			if (!name) {
-				free_device(device);
+				btrfs_free_device(device);
 				goto error;
 			}
 			rcu_assign_pointer(device->name, name);
@@ -950,7 +950,7 @@ void btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices, int step)
 		}
 		list_del_init(&device->dev_list);
 		fs_devices->num_devices--;
-		free_device(device);
+		btrfs_free_device(device);
 	}
 
 	if (fs_devices->seed) {
@@ -968,7 +968,7 @@ static void free_device_rcu(struct rcu_head *head)
 	struct btrfs_device *device;
 
 	device = container_of(head, struct btrfs_device, rcu);
-	free_device(device);
+	btrfs_free_device(device);
 }
 
 static void btrfs_close_bdev(struct btrfs_device *device)
@@ -2582,7 +2582,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 	if (trans)
 		btrfs_end_transaction(trans);
 error_free_device:
-	free_device(device);
+	btrfs_free_device(device);
 error:
 	blkdev_put(bdev, FMODE_EXCL);
 	if (seeding_dev && !unlocked) {
@@ -2653,7 +2653,7 @@ int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
 
 	name = rcu_string_strdup(device_path, GFP_KERNEL);
 	if (!name) {
-		free_device(device);
+		btrfs_free_device(device);
 		ret = -ENOMEM;
 		goto error;
 	}
@@ -6419,7 +6419,7 @@ static struct btrfs_device *add_missing_dev(struct btrfs_fs_devices *fs_devices,
  *
  * Return: a pointer to a new &struct btrfs_device on success; ERR_PTR()
  * on error.  Returned struct is not linked onto any lists and must be
- * destroyed with free_device.
+ * destroyed with btrfs_free_device.
  */
 struct btrfs_device *btrfs_alloc_device(struct btrfs_fs_info *fs_info,
 					const u64 *devid,
@@ -6442,7 +6442,7 @@ struct btrfs_device *btrfs_alloc_device(struct btrfs_fs_info *fs_info,
 
 		ret = find_next_devid(fs_info, &tmp);
 		if (ret) {
-			free_device(dev);
+			btrfs_free_device(dev);
 			return ERR_PTR(ret);
 		}
 	}
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index d1fcaea9fef5..45e3ece21290 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -434,6 +434,7 @@ int btrfs_find_device_by_devspec(struct btrfs_fs_info *fs_info, u64 devid,
 struct btrfs_device *btrfs_alloc_device(struct btrfs_fs_info *fs_info,
 					const u64 *devid,
 					const u8 *uuid);
+void btrfs_free_device(struct btrfs_device *device);
 int btrfs_rm_device(struct btrfs_fs_info *fs_info,
 		    const char *device_path, u64 devid);
 void __exit btrfs_cleanup_fs_uuids(void);
-- 
2.16.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 04/16] btrfs: move btrfs_init_dev_replace_tgtdev to dev-replace.c and make static
  2018-04-19 16:33 [PATCH v2 00/16] Kill fs_info::volume_mutex David Sterba
                   ` (2 preceding siblings ...)
  2018-04-19 16:33 ` [PATCH v2 03/16] btrfs: export and rename free_device David Sterba
@ 2018-04-19 16:33 ` David Sterba
  2018-04-19 16:33 ` [PATCH v2 05/16] btrfs: move volume_mutex to callers of btrfs_rm_device David Sterba
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 37+ messages in thread
From: David Sterba @ 2018-04-19 16:33 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

The function logically belongs there and there's only a single caller,
no need to export it. No code changes.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/dev-replace.c | 99 ++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/volumes.c     | 99 --------------------------------------------------
 fs/btrfs/volumes.h     |  4 --
 3 files changed, 99 insertions(+), 103 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index e3b8a79c1665..7a87ffad041e 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -188,6 +188,105 @@ int btrfs_init_dev_replace(struct btrfs_fs_info *fs_info)
 	return ret;
 }
 
+/*
+ * Initialize a new device for device replace target from a given source dev
+ * and path.
+ *
+ * Return 0 and new device in @device_out, otherwise return < 0
+ */
+static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
+				  const char *device_path,
+				  struct btrfs_device *srcdev,
+				  struct btrfs_device **device_out)
+{
+	struct btrfs_device *device;
+	struct block_device *bdev;
+	struct list_head *devices;
+	struct rcu_string *name;
+	u64 devid = BTRFS_DEV_REPLACE_DEVID;
+	int ret = 0;
+
+	*device_out = NULL;
+	if (fs_info->fs_devices->seeding) {
+		btrfs_err(fs_info, "the filesystem is a seed filesystem!");
+		return -EINVAL;
+	}
+
+	bdev = blkdev_get_by_path(device_path, FMODE_WRITE | FMODE_EXCL,
+				  fs_info->bdev_holder);
+	if (IS_ERR(bdev)) {
+		btrfs_err(fs_info, "target device %s is invalid!", device_path);
+		return PTR_ERR(bdev);
+	}
+
+	filemap_write_and_wait(bdev->bd_inode->i_mapping);
+
+	devices = &fs_info->fs_devices->devices;
+	list_for_each_entry(device, devices, dev_list) {
+		if (device->bdev == bdev) {
+			btrfs_err(fs_info,
+				  "target device is in the filesystem!");
+			ret = -EEXIST;
+			goto error;
+		}
+	}
+
+
+	if (i_size_read(bdev->bd_inode) <
+	    btrfs_device_get_total_bytes(srcdev)) {
+		btrfs_err(fs_info,
+			  "target device is smaller than source device!");
+		ret = -EINVAL;
+		goto error;
+	}
+
+
+	device = btrfs_alloc_device(NULL, &devid, NULL);
+	if (IS_ERR(device)) {
+		ret = PTR_ERR(device);
+		goto error;
+	}
+
+	name = rcu_string_strdup(device_path, GFP_KERNEL);
+	if (!name) {
+		btrfs_free_device(device);
+		ret = -ENOMEM;
+		goto error;
+	}
+	rcu_assign_pointer(device->name, name);
+
+	mutex_lock(&fs_info->fs_devices->device_list_mutex);
+	set_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
+	device->generation = 0;
+	device->io_width = fs_info->sectorsize;
+	device->io_align = fs_info->sectorsize;
+	device->sector_size = fs_info->sectorsize;
+	device->total_bytes = btrfs_device_get_total_bytes(srcdev);
+	device->disk_total_bytes = btrfs_device_get_disk_total_bytes(srcdev);
+	device->bytes_used = btrfs_device_get_bytes_used(srcdev);
+	device->commit_total_bytes = srcdev->commit_total_bytes;
+	device->commit_bytes_used = device->bytes_used;
+	device->fs_info = fs_info;
+	device->bdev = bdev;
+	set_bit(BTRFS_DEV_STATE_IN_FS_METADATA, &device->dev_state);
+	set_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state);
+	device->mode = FMODE_EXCL;
+	device->dev_stats_valid = 1;
+	set_blocksize(device->bdev, BTRFS_BDEV_BLOCKSIZE);
+	device->fs_devices = fs_info->fs_devices;
+	list_add(&device->dev_list, &fs_info->fs_devices->devices);
+	fs_info->fs_devices->num_devices++;
+	fs_info->fs_devices->open_devices++;
+	mutex_unlock(&fs_info->fs_devices->device_list_mutex);
+
+	*device_out = device;
+	return 0;
+
+error:
+	blkdev_put(bdev, FMODE_EXCL);
+	return ret;
+}
+
 /*
  * called from commit_transaction. Writes changed device replace state to
  * disk.
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index ca5521cc1a5b..df2d6d06e014 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2592,105 +2592,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 	return ret;
 }
 
-/*
- * Initialize a new device for device replace target from a given source dev
- * and path.
- *
- * Return 0 and new device in @device_out, otherwise return < 0
- */
-int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
-				  const char *device_path,
-				  struct btrfs_device *srcdev,
-				  struct btrfs_device **device_out)
-{
-	struct btrfs_device *device;
-	struct block_device *bdev;
-	struct list_head *devices;
-	struct rcu_string *name;
-	u64 devid = BTRFS_DEV_REPLACE_DEVID;
-	int ret = 0;
-
-	*device_out = NULL;
-	if (fs_info->fs_devices->seeding) {
-		btrfs_err(fs_info, "the filesystem is a seed filesystem!");
-		return -EINVAL;
-	}
-
-	bdev = blkdev_get_by_path(device_path, FMODE_WRITE | FMODE_EXCL,
-				  fs_info->bdev_holder);
-	if (IS_ERR(bdev)) {
-		btrfs_err(fs_info, "target device %s is invalid!", device_path);
-		return PTR_ERR(bdev);
-	}
-
-	filemap_write_and_wait(bdev->bd_inode->i_mapping);
-
-	devices = &fs_info->fs_devices->devices;
-	list_for_each_entry(device, devices, dev_list) {
-		if (device->bdev == bdev) {
-			btrfs_err(fs_info,
-				  "target device is in the filesystem!");
-			ret = -EEXIST;
-			goto error;
-		}
-	}
-
-
-	if (i_size_read(bdev->bd_inode) <
-	    btrfs_device_get_total_bytes(srcdev)) {
-		btrfs_err(fs_info,
-			  "target device is smaller than source device!");
-		ret = -EINVAL;
-		goto error;
-	}
-
-
-	device = btrfs_alloc_device(NULL, &devid, NULL);
-	if (IS_ERR(device)) {
-		ret = PTR_ERR(device);
-		goto error;
-	}
-
-	name = rcu_string_strdup(device_path, GFP_KERNEL);
-	if (!name) {
-		btrfs_free_device(device);
-		ret = -ENOMEM;
-		goto error;
-	}
-	rcu_assign_pointer(device->name, name);
-
-	mutex_lock(&fs_info->fs_devices->device_list_mutex);
-	set_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
-	device->generation = 0;
-	device->io_width = fs_info->sectorsize;
-	device->io_align = fs_info->sectorsize;
-	device->sector_size = fs_info->sectorsize;
-	device->total_bytes = btrfs_device_get_total_bytes(srcdev);
-	device->disk_total_bytes = btrfs_device_get_disk_total_bytes(srcdev);
-	device->bytes_used = btrfs_device_get_bytes_used(srcdev);
-	device->commit_total_bytes = srcdev->commit_total_bytes;
-	device->commit_bytes_used = device->bytes_used;
-	device->fs_info = fs_info;
-	device->bdev = bdev;
-	set_bit(BTRFS_DEV_STATE_IN_FS_METADATA, &device->dev_state);
-	set_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state);
-	device->mode = FMODE_EXCL;
-	device->dev_stats_valid = 1;
-	set_blocksize(device->bdev, BTRFS_BDEV_BLOCKSIZE);
-	device->fs_devices = fs_info->fs_devices;
-	list_add(&device->dev_list, &fs_info->fs_devices->devices);
-	fs_info->fs_devices->num_devices++;
-	fs_info->fs_devices->open_devices++;
-	mutex_unlock(&fs_info->fs_devices->device_list_mutex);
-
-	*device_out = device;
-	return 0;
-
-error:
-	blkdev_put(bdev, FMODE_EXCL);
-	return ret;
-}
-
 static noinline int btrfs_update_device(struct btrfs_trans_handle *trans,
 					struct btrfs_device *device)
 {
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 45e3ece21290..681be2dc0c6a 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -445,10 +445,6 @@ struct btrfs_device *btrfs_find_device(struct btrfs_fs_info *fs_info, u64 devid,
 				       u8 *uuid, u8 *fsid);
 int btrfs_shrink_device(struct btrfs_device *device, u64 new_size);
 int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *path);
-int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
-				  const char *device_path,
-				  struct btrfs_device *srcdev,
-				  struct btrfs_device **device_out);
 int btrfs_balance(struct btrfs_balance_control *bctl,
 		  struct btrfs_ioctl_balance_args *bargs);
 int btrfs_resume_balance_async(struct btrfs_fs_info *fs_info);
-- 
2.16.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 05/16] btrfs: move volume_mutex to callers of btrfs_rm_device
  2018-04-19 16:33 [PATCH v2 00/16] Kill fs_info::volume_mutex David Sterba
                   ` (3 preceding siblings ...)
  2018-04-19 16:33 ` [PATCH v2 04/16] btrfs: move btrfs_init_dev_replace_tgtdev to dev-replace.c and make static David Sterba
@ 2018-04-19 16:33 ` David Sterba
  2018-04-19 16:33 ` [PATCH v2 06/16] btrfs: move clearing of EXCL_OP out of __cancel_balance David Sterba
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 37+ messages in thread
From: David Sterba @ 2018-04-19 16:33 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

Move locking and unlocking next to the BTRFS_FS_EXCL_OP bit manipulation
so it's obvious that the two happen at the same time.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/ioctl.c   | 4 ++++
 fs/btrfs/volumes.c | 2 --
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index ac85e07f567b..b93dea445802 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2674,6 +2674,7 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
 		ret = BTRFS_ERROR_DEV_EXCL_RUN_IN_PROGRESS;
 		goto out;
 	}
+	mutex_lock(&fs_info->volume_mutex);
 
 	if (vol_args->flags & BTRFS_DEVICE_SPEC_BY_ID) {
 		ret = btrfs_rm_device(fs_info, NULL, vol_args->devid);
@@ -2681,6 +2682,7 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
 		vol_args->name[BTRFS_SUBVOL_NAME_MAX] = '\0';
 		ret = btrfs_rm_device(fs_info, vol_args->name, 0);
 	}
+	mutex_unlock(&fs_info->volume_mutex);
 	clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
 
 	if (!ret) {
@@ -2716,6 +2718,7 @@ static long btrfs_ioctl_rm_dev(struct file *file, void __user *arg)
 		ret = BTRFS_ERROR_DEV_EXCL_RUN_IN_PROGRESS;
 		goto out_drop_write;
 	}
+	mutex_lock(&fs_info->volume_mutex);
 
 	vol_args = memdup_user(arg, sizeof(*vol_args));
 	if (IS_ERR(vol_args)) {
@@ -2730,6 +2733,7 @@ static long btrfs_ioctl_rm_dev(struct file *file, void __user *arg)
 		btrfs_info(fs_info, "disk deleted %s", vol_args->name);
 	kfree(vol_args);
 out:
+	mutex_unlock(&fs_info->volume_mutex);
 	clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
 out_drop_write:
 	mnt_drop_write_file(file);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index df2d6d06e014..e0bd181dc9e0 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1932,7 +1932,6 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
 	u64 num_devices;
 	int ret = 0;
 
-	mutex_lock(&fs_info->volume_mutex);
 	mutex_lock(&uuid_mutex);
 
 	num_devices = fs_info->fs_devices->num_devices;
@@ -2048,7 +2047,6 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
 
 out:
 	mutex_unlock(&uuid_mutex);
-	mutex_unlock(&fs_info->volume_mutex);
 	return ret;
 
 error_undo:
-- 
2.16.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 06/16] btrfs: move clearing of EXCL_OP out of __cancel_balance
  2018-04-19 16:33 [PATCH v2 00/16] Kill fs_info::volume_mutex David Sterba
                   ` (4 preceding siblings ...)
  2018-04-19 16:33 ` [PATCH v2 05/16] btrfs: move volume_mutex to callers of btrfs_rm_device David Sterba
@ 2018-04-19 16:33 ` David Sterba
  2018-04-20  7:02   ` Nikolay Borisov
  2018-04-20  7:35   ` Anand Jain
  2018-04-19 16:33 ` [PATCH v2 07/16] btrfs: add proper safety check before resuming dev-replace David Sterba
                   ` (9 subsequent siblings)
  15 siblings, 2 replies; 37+ messages in thread
From: David Sterba @ 2018-04-19 16:33 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

Make the clearning visible in the callers so we can pair it with the
test_and_set part.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/ioctl.c   |  2 +-
 fs/btrfs/volumes.c | 13 +++++++------
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index b93dea445802..582bde5b7eda 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -4656,7 +4656,7 @@ static long btrfs_ioctl_balance(struct file *file, void __user *arg)
 	 * Ownership of bctl and filesystem flag BTRFS_FS_EXCL_OP
 	 * goes to to btrfs_balance.  bctl is freed in __cancel_balance,
 	 * or, if restriper was paused all the way until unmount, in
-	 * free_fs_info.  The flag is cleared in __cancel_balance.
+	 * free_fs_info.  The flag should be cleared after __cancel_balance.
 	 */
 	need_unlock = false;
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index e0bd181dc9e0..5c83ebc8e199 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3760,8 +3760,6 @@ static void __cancel_balance(struct btrfs_fs_info *fs_info)
 	ret = del_balance_item(fs_info);
 	if (ret)
 		btrfs_handle_fs_error(fs_info, ret, NULL);
-
-	clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
 }
 
 /* Non-zero return value signifies invalidity */
@@ -3919,6 +3917,7 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
 	if ((ret && ret != -ECANCELED && ret != -ENOSPC) ||
 	    balance_need_close(fs_info)) {
 		__cancel_balance(fs_info);
+		clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
 	}
 
 	wake_up(&fs_info->balance_wait_q);
@@ -3927,10 +3926,10 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
 out:
 	if (bctl->flags & BTRFS_BALANCE_RESUME)
 		__cancel_balance(fs_info);
-	else {
+	else
 		kfree(bctl);
-		clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
-	}
+	clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
+
 	return ret;
 }
 
@@ -4089,8 +4088,10 @@ int btrfs_cancel_balance(struct btrfs_fs_info *fs_info)
 		mutex_lock(&fs_info->volume_mutex);
 		mutex_lock(&fs_info->balance_mutex);
 
-		if (fs_info->balance_ctl)
+		if (fs_info->balance_ctl) {
 			__cancel_balance(fs_info);
+			clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
+		}
 
 		mutex_unlock(&fs_info->volume_mutex);
 	}
-- 
2.16.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 07/16] btrfs: add proper safety check before resuming dev-replace
  2018-04-19 16:33 [PATCH v2 00/16] Kill fs_info::volume_mutex David Sterba
                   ` (5 preceding siblings ...)
  2018-04-19 16:33 ` [PATCH v2 06/16] btrfs: move clearing of EXCL_OP out of __cancel_balance David Sterba
@ 2018-04-19 16:33 ` David Sterba
  2018-04-20  7:04   ` Nikolay Borisov
  2018-04-20  7:36   ` Anand Jain
  2018-04-19 16:33 ` [PATCH v2 08/16] btrfs: add sanity check when resuming balance after mount David Sterba
                   ` (8 subsequent siblings)
  15 siblings, 2 replies; 37+ messages in thread
From: David Sterba @ 2018-04-19 16:33 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

The device replace is paused by unmount or read only remount, and
resumed on next mount or write remount.

The exclusive status should be checked properly as it's a global
invariant and we must not allow 2 operations run. In this case, the
balance can be also paused and resumed under same conditions. It's
always checked first so dev-replace could see the EXCL_OP already taken,
BUT, the ioctl would never let start both at the same time.

Replace the WARN_ON with message and return 0, indicating no error as
this is purely theoretical and the user will be informed. Resolving that
manually should be possible by waiting for the other operation to finish
or cancel the paused state.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/dev-replace.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 7a87ffad041e..346bd460f8e7 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -908,7 +908,17 @@ int btrfs_resume_dev_replace_async(struct btrfs_fs_info *fs_info)
 	}
 	btrfs_dev_replace_write_unlock(dev_replace);
 
-	WARN_ON(test_and_set_bit(BTRFS_FS_EXCL_OP, &fs_info->flags));
+	/*
+	 * This could collide with a paused balance, but the exclusive op logic
+	 * should never allow both to start and pause. We don't want to allow
+	 * dev-replace to start anyway.
+	 */
+	if (test_and_set_bit(BTRFS_FS_EXCL_OP, &fs_info->flags)) {
+		btrfs_info(fs_info,
+		"cannot resume dev-replace, other exclusive operation running");
+		return 0;
+	}
+
 	task = kthread_run(btrfs_dev_replace_kthread, fs_info, "btrfs-devrepl");
 	return PTR_ERR_OR_ZERO(task);
 }
-- 
2.16.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 08/16] btrfs: add sanity check when resuming balance after mount
  2018-04-19 16:33 [PATCH v2 00/16] Kill fs_info::volume_mutex David Sterba
                   ` (6 preceding siblings ...)
  2018-04-19 16:33 ` [PATCH v2 07/16] btrfs: add proper safety check before resuming dev-replace David Sterba
@ 2018-04-19 16:33 ` David Sterba
  2018-04-20  7:38   ` Anand Jain
  2018-04-19 16:33 ` [PATCH v2 09/16] btrfs: cleanup helpers that reset balance state David Sterba
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 37+ messages in thread
From: David Sterba @ 2018-04-19 16:33 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

Replace a WARN_ON with a proper check and message in case something goes
really wrong and resumed balance cannot set up its exclusive status.
The check is a user friendly assertion, I don't expect to ever happen
under normal circumstances.

Also document that the paused balance starts here and owns the exclusive
op status.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/volumes.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 5c83ebc8e199..83fbe9d624f5 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4018,7 +4018,19 @@ int btrfs_recover_balance(struct btrfs_fs_info *fs_info)
 	btrfs_balance_sys(leaf, item, &disk_bargs);
 	btrfs_disk_balance_args_to_cpu(&bctl->sys, &disk_bargs);
 
-	WARN_ON(test_and_set_bit(BTRFS_FS_EXCL_OP, &fs_info->flags));
+	/*
+	 * This should never happen, as the paused balance state is recovered
+	 * during mount without any chance of other exclusive ops to collide.
+	 *
+	 * This gives the exclusive op status to balance and keeps in paused
+	 * state until user intervention (cancel or umount). If the ownership
+	 * cannot be assigned, show a message but do not fail. The balance
+	 * is in a paused state and must have fs_info::balance_ctl properly
+	 * set up.
+	 */
+	if (test_and_set_bit(BTRFS_FS_EXCL_OP, &fs_info->flags))
+		btrfs_warn(fs_info,
+	"cannot set exclusive op status to balance, resume manually");
 
 	mutex_lock(&fs_info->volume_mutex);
 	mutex_lock(&fs_info->balance_mutex);
-- 
2.16.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 09/16] btrfs: cleanup helpers that reset balance state
  2018-04-19 16:33 [PATCH v2 00/16] Kill fs_info::volume_mutex David Sterba
                   ` (7 preceding siblings ...)
  2018-04-19 16:33 ` [PATCH v2 08/16] btrfs: add sanity check when resuming balance after mount David Sterba
@ 2018-04-19 16:33 ` David Sterba
  2018-04-20  7:07   ` Nikolay Borisov
  2018-04-20  9:04   ` Anand Jain
  2018-04-19 16:33 ` [PATCH v2 10/16] btrfs: remove wrong use of volume_mutex from btrfs_dev_replace_start David Sterba
                   ` (6 subsequent siblings)
  15 siblings, 2 replies; 37+ messages in thread
From: David Sterba @ 2018-04-19 16:33 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

The function __cancel_balance name is confusing with the cancel
operation of balance and it really resets the state of balance back to
zero. The unset_balance_control helper is called only from one place and
simple enough to be inlined.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/ioctl.c   |  8 ++++----
 fs/btrfs/volumes.c | 29 +++++++++++++----------------
 2 files changed, 17 insertions(+), 20 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 582bde5b7eda..6724029443fa 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -4653,10 +4653,10 @@ static long btrfs_ioctl_balance(struct file *file, void __user *arg)
 
 do_balance:
 	/*
-	 * Ownership of bctl and filesystem flag BTRFS_FS_EXCL_OP
-	 * goes to to btrfs_balance.  bctl is freed in __cancel_balance,
-	 * or, if restriper was paused all the way until unmount, in
-	 * free_fs_info.  The flag should be cleared after __cancel_balance.
+	 * Ownership of bctl and filesystem flag BTRFS_FS_EXCL_OP goes to
+	 * btrfs_balance.  bctl is freed in reset_balance_state, or, if
+	 * restriper was paused all the way until unmount, in free_fs_info.
+	 * The flag should be cleared after reset_balance_state.
 	 */
 	need_unlock = false;
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 83fbe9d624f5..3b8085b5655d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3192,7 +3192,7 @@ static void update_balance_args(struct btrfs_balance_control *bctl)
 /*
  * Should be called with both balance and volume mutexes held to
  * serialize other volume operations (add_dev/rm_dev/resize) with
- * restriper.  Same goes for unset_balance_control.
+ * restriper.  Same goes for reset_balance_state.
  */
 static void set_balance_control(struct btrfs_balance_control *bctl)
 {
@@ -3205,9 +3205,13 @@ static void set_balance_control(struct btrfs_balance_control *bctl)
 	spin_unlock(&fs_info->balance_lock);
 }
 
-static void unset_balance_control(struct btrfs_fs_info *fs_info)
+/*
+ * Clear the balance status in fs_info and delete the balance item from disk.
+ */
+static void reset_balance_state(struct btrfs_fs_info *fs_info)
 {
 	struct btrfs_balance_control *bctl = fs_info->balance_ctl;
+	int ret;
 
 	BUG_ON(!fs_info->balance_ctl);
 
@@ -3216,6 +3220,9 @@ static void unset_balance_control(struct btrfs_fs_info *fs_info)
 	spin_unlock(&fs_info->balance_lock);
 
 	kfree(bctl);
+	ret = del_balance_item(fs_info);
+	if (ret)
+		btrfs_handle_fs_error(fs_info, ret, NULL);
 }
 
 /*
@@ -3752,16 +3759,6 @@ static inline int balance_need_close(struct btrfs_fs_info *fs_info)
 		 atomic_read(&fs_info->balance_cancel_req) == 0);
 }
 
-static void __cancel_balance(struct btrfs_fs_info *fs_info)
-{
-	int ret;
-
-	unset_balance_control(fs_info);
-	ret = del_balance_item(fs_info);
-	if (ret)
-		btrfs_handle_fs_error(fs_info, ret, NULL);
-}
-
 /* Non-zero return value signifies invalidity */
 static inline int validate_convert_profile(struct btrfs_balance_args *bctl_arg,
 		u64 allowed)
@@ -3916,7 +3913,7 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
 
 	if ((ret && ret != -ECANCELED && ret != -ENOSPC) ||
 	    balance_need_close(fs_info)) {
-		__cancel_balance(fs_info);
+		reset_balance_state(fs_info);
 		clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
 	}
 
@@ -3925,7 +3922,7 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
 	return ret;
 out:
 	if (bctl->flags & BTRFS_BALANCE_RESUME)
-		__cancel_balance(fs_info);
+		reset_balance_state(fs_info);
 	else
 		kfree(bctl);
 	clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
@@ -4095,13 +4092,13 @@ int btrfs_cancel_balance(struct btrfs_fs_info *fs_info)
 			   atomic_read(&fs_info->balance_running) == 0);
 		mutex_lock(&fs_info->balance_mutex);
 	} else {
-		/* __cancel_balance needs volume_mutex */
+		/* reset_balance_state needs volume_mutex */
 		mutex_unlock(&fs_info->balance_mutex);
 		mutex_lock(&fs_info->volume_mutex);
 		mutex_lock(&fs_info->balance_mutex);
 
 		if (fs_info->balance_ctl) {
-			__cancel_balance(fs_info);
+			reset_balance_state(fs_info);
 			clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
 		}
 
-- 
2.16.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 10/16] btrfs: remove wrong use of volume_mutex from btrfs_dev_replace_start
  2018-04-19 16:33 [PATCH v2 00/16] Kill fs_info::volume_mutex David Sterba
                   ` (8 preceding siblings ...)
  2018-04-19 16:33 ` [PATCH v2 09/16] btrfs: cleanup helpers that reset balance state David Sterba
@ 2018-04-19 16:33 ` David Sterba
  2018-04-19 16:33 ` [PATCH v2 11/16] btrfs: kill btrfs_fs_info::volume_mutex David Sterba
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 37+ messages in thread
From: David Sterba @ 2018-04-19 16:33 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

The volume mutex does not protect against anything in this case, the
comment about scrub is right but not related to locking and looks
confusing. The comment in btrfs_find_device_missing_or_by_path is wrong
and confusing too.

The device_list_mutex is not held here to protect device lookup, but in
this case device replace cannot run in parallel with device removal (due
to exclusive op protection), so we don't need further locking here.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/dev-replace.c | 7 +------
 fs/btrfs/volumes.c     | 4 ----
 2 files changed, 1 insertion(+), 10 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 346bd460f8e7..ba011af9b0d2 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -426,18 +426,13 @@ int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
 	struct btrfs_device *tgt_device = NULL;
 	struct btrfs_device *src_device = NULL;
 
-	/* the disk copy procedure reuses the scrub code */
-	mutex_lock(&fs_info->volume_mutex);
 	ret = btrfs_find_device_by_devspec(fs_info, srcdevid,
 					    srcdev_name, &src_device);
-	if (ret) {
-		mutex_unlock(&fs_info->volume_mutex);
+	if (ret)
 		return ret;
-	}
 
 	ret = btrfs_init_dev_replace_tgtdev(fs_info, tgtdev_name,
 					    src_device, &tgt_device);
-	mutex_unlock(&fs_info->volume_mutex);
 	if (ret)
 		return ret;
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 3b8085b5655d..8a5b022e9cde 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2198,10 +2198,6 @@ int btrfs_find_device_missing_or_by_path(struct btrfs_fs_info *fs_info,
 		struct btrfs_device *tmp;
 
 		devices = &fs_info->fs_devices->devices;
-		/*
-		 * It is safe to read the devices since the volume_mutex
-		 * is held by the caller.
-		 */
 		list_for_each_entry(tmp, devices, dev_list) {
 			if (test_bit(BTRFS_DEV_STATE_IN_FS_METADATA,
 					&tmp->dev_state) && !tmp->bdev) {
-- 
2.16.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 11/16] btrfs: kill btrfs_fs_info::volume_mutex
  2018-04-19 16:33 [PATCH v2 00/16] Kill fs_info::volume_mutex David Sterba
                   ` (9 preceding siblings ...)
  2018-04-19 16:33 ` [PATCH v2 10/16] btrfs: remove wrong use of volume_mutex from btrfs_dev_replace_start David Sterba
@ 2018-04-19 16:33 ` David Sterba
  2018-04-19 16:33 ` [PATCH v2 12/16] btrfs: track running balance in a simpler way David Sterba
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 37+ messages in thread
From: David Sterba @ 2018-04-19 16:33 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

Mutual exclusion of device add/rm and balance was done by the volume
mutex up to version 3.7. The commit 5ac00addc7ac091109 ("Btrfs: disallow
mutually exclusive admin operations from user mode") added a bit that
essentially tracked the same information.

The status bit has an advantage over a mutex that it can be set without
restrictions of function context, so it started to be used in the
mount-time resuming of balance or device replace.

But we don't really need to track the same information in two ways.

1) After the previous cleanups, the main ioctl handlers for
   add/del/resize copy the EXCL_OP bit next to the volume mutex, here
   it's clearly safe.

2) Resuming balance during mount or after rw remount will set only the
   EXCL_OP bit and the volume_mutex is held in the kernel thread that
   calls btrfs_balance.

3) Resuming device replace during mount or after rw remount is done
   after balance and is excluded by the EXCL_OP bit. It does not take
   the volume_mutex at all and completely relies on the EXCL_OP bit.

4) The resuming of balance and dev-replace cannot hapen at the same time
   as the ioctls cannot be started in parallel. Nevertheless, a crafted
   image could trigger that and a warning is printed.

5) Balance is normally excluded by EXCL_OP and also uses own mutex to
   protect against concurrent access to its status data. There's some
   trickery to maintain the right lock nesting in case we need to
   reexamine the status in btrfs_ioctl_balance. The volume_mutex is
   removed and the unlock/lock sequence is left in place as we might
   expect other waiters to proceed.

6) Similar to 5, the unlock/lock sequence is kept in
   btrfs_cancel_balance to allow waiters to continue.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/ctree.h       |  1 -
 fs/btrfs/disk-io.c     |  1 -
 fs/btrfs/extent-tree.c |  2 +-
 fs/btrfs/ioctl.c       | 17 ++++-------------
 fs/btrfs/volumes.c     | 36 ++++++++++--------------------------
 5 files changed, 15 insertions(+), 42 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 0eb55825862a..011cb9a8a967 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -826,7 +826,6 @@ struct btrfs_fs_info {
 	struct mutex transaction_kthread_mutex;
 	struct mutex cleaner_mutex;
 	struct mutex chunk_mutex;
-	struct mutex volume_mutex;
 
 	/*
 	 * this is taken to make sure we don't set block groups ro after
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 07b5e6f7df67..c0482ecea11f 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2605,7 +2605,6 @@ int open_ctree(struct super_block *sb,
 	mutex_init(&fs_info->chunk_mutex);
 	mutex_init(&fs_info->transaction_kthread_mutex);
 	mutex_init(&fs_info->cleaner_mutex);
-	mutex_init(&fs_info->volume_mutex);
 	mutex_init(&fs_info->ro_block_group_mutex);
 	init_rwsem(&fs_info->commit_root_sem);
 	init_rwsem(&fs_info->cleanup_work_sem);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index f30548d7e0d2..90d28a3727c6 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4122,7 +4122,7 @@ static void set_avail_alloc_bits(struct btrfs_fs_info *fs_info, u64 flags)
  * returns target flags in extended format or 0 if restripe for this
  * chunk_type is not in progress
  *
- * should be called with either volume_mutex or balance_lock held
+ * should be called with balance_lock held
  */
 static u64 get_restripe_target(struct btrfs_fs_info *fs_info, u64 flags)
 {
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 6724029443fa..7c99f74c200e 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1470,7 +1470,6 @@ static noinline int btrfs_ioctl_resize(struct file *file,
 		return BTRFS_ERROR_DEV_EXCL_RUN_IN_PROGRESS;
 	}
 
-	mutex_lock(&fs_info->volume_mutex);
 	vol_args = memdup_user(arg, sizeof(*vol_args));
 	if (IS_ERR(vol_args)) {
 		ret = PTR_ERR(vol_args);
@@ -1578,7 +1577,6 @@ static noinline int btrfs_ioctl_resize(struct file *file,
 out_free:
 	kfree(vol_args);
 out:
-	mutex_unlock(&fs_info->volume_mutex);
 	clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
 	mnt_drop_write_file(file);
 	return ret;
@@ -2626,7 +2624,6 @@ static long btrfs_ioctl_add_dev(struct btrfs_fs_info *fs_info, void __user *arg)
 	if (test_and_set_bit(BTRFS_FS_EXCL_OP, &fs_info->flags))
 		return BTRFS_ERROR_DEV_EXCL_RUN_IN_PROGRESS;
 
-	mutex_lock(&fs_info->volume_mutex);
 	vol_args = memdup_user(arg, sizeof(*vol_args));
 	if (IS_ERR(vol_args)) {
 		ret = PTR_ERR(vol_args);
@@ -2641,7 +2638,6 @@ static long btrfs_ioctl_add_dev(struct btrfs_fs_info *fs_info, void __user *arg)
 
 	kfree(vol_args);
 out:
-	mutex_unlock(&fs_info->volume_mutex);
 	clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
 	return ret;
 }
@@ -2674,7 +2670,6 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
 		ret = BTRFS_ERROR_DEV_EXCL_RUN_IN_PROGRESS;
 		goto out;
 	}
-	mutex_lock(&fs_info->volume_mutex);
 
 	if (vol_args->flags & BTRFS_DEVICE_SPEC_BY_ID) {
 		ret = btrfs_rm_device(fs_info, NULL, vol_args->devid);
@@ -2682,7 +2677,6 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
 		vol_args->name[BTRFS_SUBVOL_NAME_MAX] = '\0';
 		ret = btrfs_rm_device(fs_info, vol_args->name, 0);
 	}
-	mutex_unlock(&fs_info->volume_mutex);
 	clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
 
 	if (!ret) {
@@ -2718,7 +2712,6 @@ static long btrfs_ioctl_rm_dev(struct file *file, void __user *arg)
 		ret = BTRFS_ERROR_DEV_EXCL_RUN_IN_PROGRESS;
 		goto out_drop_write;
 	}
-	mutex_lock(&fs_info->volume_mutex);
 
 	vol_args = memdup_user(arg, sizeof(*vol_args));
 	if (IS_ERR(vol_args)) {
@@ -2733,7 +2726,6 @@ static long btrfs_ioctl_rm_dev(struct file *file, void __user *arg)
 		btrfs_info(fs_info, "disk deleted %s", vol_args->name);
 	kfree(vol_args);
 out:
-	mutex_unlock(&fs_info->volume_mutex);
 	clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
 out_drop_write:
 	mnt_drop_write_file(file);
@@ -4552,7 +4544,6 @@ static long btrfs_ioctl_balance(struct file *file, void __user *arg)
 
 again:
 	if (!test_and_set_bit(BTRFS_FS_EXCL_OP, &fs_info->flags)) {
-		mutex_lock(&fs_info->volume_mutex);
 		mutex_lock(&fs_info->balance_mutex);
 		need_unlock = true;
 		goto locked;
@@ -4569,8 +4560,10 @@ static long btrfs_ioctl_balance(struct file *file, void __user *arg)
 		/* this is either (2) or (3) */
 		if (!atomic_read(&fs_info->balance_running)) {
 			mutex_unlock(&fs_info->balance_mutex);
-			if (!mutex_trylock(&fs_info->volume_mutex))
-				goto again;
+			/*
+			 * Lock released to allow other waiters to continue,
+			 * we'll reexamine the status again.
+			 */
 			mutex_lock(&fs_info->balance_mutex);
 
 			if (fs_info->balance_ctl &&
@@ -4581,7 +4574,6 @@ static long btrfs_ioctl_balance(struct file *file, void __user *arg)
 			}
 
 			mutex_unlock(&fs_info->balance_mutex);
-			mutex_unlock(&fs_info->volume_mutex);
 			goto again;
 		} else {
 			/* this is (2) */
@@ -4674,7 +4666,6 @@ static long btrfs_ioctl_balance(struct file *file, void __user *arg)
 	kfree(bargs);
 out_unlock:
 	mutex_unlock(&fs_info->balance_mutex);
-	mutex_unlock(&fs_info->volume_mutex);
 	if (need_unlock)
 		clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
 out:
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 8a5b022e9cde..e95cc2f09fdd 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -179,12 +179,6 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
  * may be used to exclude some operations from running concurrently without any
  * modifications to the list (see write_all_supers)
  *
- * volume_mutex
- * ------------
- * coarse lock owned by a mounted filesystem; used to exclude some operations
- * that cannot run in parallel and affect the higher-level properties of the
- * filesystem like: device add/deleting/resize/replace, or balance
- *
  * balance_mutex
  * -------------
  * protects balance structures (status, state) and context accessed from
@@ -205,10 +199,9 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
  * ============
  *
  * uuid_mutex
- *   volume_mutex
- *     device_list_mutex
- *       chunk_mutex
- *     balance_mutex
+ *   device_list_mutex
+ *     chunk_mutex
+ *   balance_mutex
  */
 
 DEFINE_MUTEX(uuid_mutex);
@@ -3186,9 +3179,8 @@ static void update_balance_args(struct btrfs_balance_control *bctl)
 }
 
 /*
- * Should be called with both balance and volume mutexes held to
- * serialize other volume operations (add_dev/rm_dev/resize) with
- * restriper.  Same goes for reset_balance_state.
+ * Should be called with balance mutex held to protect against checking the
+ * balance status or progress. Same goes for reset_balance_state.
  */
 static void set_balance_control(struct btrfs_balance_control *bctl)
 {
@@ -3765,7 +3757,7 @@ static inline int validate_convert_profile(struct btrfs_balance_args *bctl_arg,
 }
 
 /*
- * Should be called with both balance and volume mutexes held
+ * Should be called with balance mutexe held
  */
 int btrfs_balance(struct btrfs_balance_control *bctl,
 		  struct btrfs_ioctl_balance_args *bargs)
@@ -3931,16 +3923,12 @@ static int balance_kthread(void *data)
 	struct btrfs_fs_info *fs_info = data;
 	int ret = 0;
 
-	mutex_lock(&fs_info->volume_mutex);
 	mutex_lock(&fs_info->balance_mutex);
-
 	if (fs_info->balance_ctl) {
 		btrfs_info(fs_info, "continuing balance");
 		ret = btrfs_balance(fs_info->balance_ctl, NULL);
 	}
-
 	mutex_unlock(&fs_info->balance_mutex);
-	mutex_unlock(&fs_info->volume_mutex);
 
 	return ret;
 }
@@ -4025,13 +4013,9 @@ int btrfs_recover_balance(struct btrfs_fs_info *fs_info)
 		btrfs_warn(fs_info,
 	"cannot set exclusive op status to balance, resume manually");
 
-	mutex_lock(&fs_info->volume_mutex);
 	mutex_lock(&fs_info->balance_mutex);
-
 	set_balance_control(bctl);
-
 	mutex_unlock(&fs_info->balance_mutex);
-	mutex_unlock(&fs_info->volume_mutex);
 out:
 	btrfs_free_path(path);
 	return ret;
@@ -4088,17 +4072,17 @@ int btrfs_cancel_balance(struct btrfs_fs_info *fs_info)
 			   atomic_read(&fs_info->balance_running) == 0);
 		mutex_lock(&fs_info->balance_mutex);
 	} else {
-		/* reset_balance_state needs volume_mutex */
 		mutex_unlock(&fs_info->balance_mutex);
-		mutex_lock(&fs_info->volume_mutex);
+		/*
+		 * Lock released to allow other waiters to continue, we'll
+		 * reexamine the status again.
+		 */
 		mutex_lock(&fs_info->balance_mutex);
 
 		if (fs_info->balance_ctl) {
 			reset_balance_state(fs_info);
 			clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
 		}
-
-		mutex_unlock(&fs_info->volume_mutex);
 	}
 
 	BUG_ON(fs_info->balance_ctl || atomic_read(&fs_info->balance_running));
-- 
2.16.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 12/16] btrfs: track running balance in a simpler way
  2018-04-19 16:33 [PATCH v2 00/16] Kill fs_info::volume_mutex David Sterba
                   ` (10 preceding siblings ...)
  2018-04-19 16:33 ` [PATCH v2 11/16] btrfs: kill btrfs_fs_info::volume_mutex David Sterba
@ 2018-04-19 16:33 ` David Sterba
  2018-04-20  7:52   ` Anand Jain
  2018-04-27  2:10   ` Anand Jain
  2018-04-19 16:33 ` [PATCH v2 13/16] btrfs: move and comment read-only check in btrfs_cancel_balance David Sterba
                   ` (3 subsequent siblings)
  15 siblings, 2 replies; 37+ messages in thread
From: David Sterba @ 2018-04-19 16:33 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

Currently fs_info::balance_running is 0 or 1 and does not use the
semantics of atomics. The pause and cancel check for 0, that can happen
only after __btrfs_balance exits for whatever reason.

Parallel calls to balance ioctl may enter btrfs_ioctl_balance multiple
times but will block on the balance_mutex that protects the
fs_info::flags bit.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/ctree.h   |  6 +++++-
 fs/btrfs/disk-io.c |  1 -
 fs/btrfs/ioctl.c   |  6 +++---
 fs/btrfs/volumes.c | 18 ++++++++++--------
 4 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 011cb9a8a967..fda94a264eb7 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -726,6 +726,11 @@ struct btrfs_delayed_root;
  * (device replace, resize, device add/delete, balance)
  */
 #define BTRFS_FS_EXCL_OP			16
+/*
+ * Indicate that balance has been set up from the ioctl and is in the main
+ * phase. The fs_info::balance_ctl is initialized.
+ */
+#define BTRFS_FS_BALANCE_RUNNING		17
 
 struct btrfs_fs_info {
 	u8 fsid[BTRFS_FSID_SIZE];
@@ -991,7 +996,6 @@ struct btrfs_fs_info {
 	/* restriper state */
 	spinlock_t balance_lock;
 	struct mutex balance_mutex;
-	atomic_t balance_running;
 	atomic_t balance_pause_req;
 	atomic_t balance_cancel_req;
 	struct btrfs_balance_control *balance_ctl;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index c0482ecea11f..b62559dfb053 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2168,7 +2168,6 @@ static void btrfs_init_balance(struct btrfs_fs_info *fs_info)
 {
 	spin_lock_init(&fs_info->balance_lock);
 	mutex_init(&fs_info->balance_mutex);
-	atomic_set(&fs_info->balance_running, 0);
 	atomic_set(&fs_info->balance_pause_req, 0);
 	atomic_set(&fs_info->balance_cancel_req, 0);
 	fs_info->balance_ctl = NULL;
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 7c99f74c200e..3dfd5ab2807b 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -4506,7 +4506,7 @@ void update_ioctl_balance_args(struct btrfs_fs_info *fs_info, int lock,
 
 	bargs->flags = bctl->flags;
 
-	if (atomic_read(&fs_info->balance_running))
+	if (test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags))
 		bargs->state |= BTRFS_BALANCE_STATE_RUNNING;
 	if (atomic_read(&fs_info->balance_pause_req))
 		bargs->state |= BTRFS_BALANCE_STATE_PAUSE_REQ;
@@ -4558,7 +4558,7 @@ static long btrfs_ioctl_balance(struct file *file, void __user *arg)
 	mutex_lock(&fs_info->balance_mutex);
 	if (fs_info->balance_ctl) {
 		/* this is either (2) or (3) */
-		if (!atomic_read(&fs_info->balance_running)) {
+		if (!test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags)) {
 			mutex_unlock(&fs_info->balance_mutex);
 			/*
 			 * Lock released to allow other waiters to continue,
@@ -4567,7 +4567,7 @@ static long btrfs_ioctl_balance(struct file *file, void __user *arg)
 			mutex_lock(&fs_info->balance_mutex);
 
 			if (fs_info->balance_ctl &&
-			    !atomic_read(&fs_info->balance_running)) {
+			    !test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags)) {
 				/* this is (3) */
 				need_unlock = false;
 				goto locked;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index e95cc2f09fdd..a766d2f988c1 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3886,13 +3886,14 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
 		spin_unlock(&fs_info->balance_lock);
 	}
 
-	atomic_inc(&fs_info->balance_running);
+	ASSERT(!test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags));
+	set_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags);
 	mutex_unlock(&fs_info->balance_mutex);
 
 	ret = __btrfs_balance(fs_info);
 
 	mutex_lock(&fs_info->balance_mutex);
-	atomic_dec(&fs_info->balance_running);
+	clear_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags);
 
 	if (bargs) {
 		memset(bargs, 0, sizeof(*bargs));
@@ -4031,16 +4032,16 @@ int btrfs_pause_balance(struct btrfs_fs_info *fs_info)
 		return -ENOTCONN;
 	}
 
-	if (atomic_read(&fs_info->balance_running)) {
+	if (test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags)) {
 		atomic_inc(&fs_info->balance_pause_req);
 		mutex_unlock(&fs_info->balance_mutex);
 
 		wait_event(fs_info->balance_wait_q,
-			   atomic_read(&fs_info->balance_running) == 0);
+			   !test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags));
 
 		mutex_lock(&fs_info->balance_mutex);
 		/* we are good with balance_ctl ripped off from under us */
-		BUG_ON(atomic_read(&fs_info->balance_running));
+		BUG_ON(test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags));
 		atomic_dec(&fs_info->balance_pause_req);
 	} else {
 		ret = -ENOTCONN;
@@ -4066,10 +4067,10 @@ int btrfs_cancel_balance(struct btrfs_fs_info *fs_info)
 	 * if we are running just wait and return, balance item is
 	 * deleted in btrfs_balance in this case
 	 */
-	if (atomic_read(&fs_info->balance_running)) {
+	if (test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags)) {
 		mutex_unlock(&fs_info->balance_mutex);
 		wait_event(fs_info->balance_wait_q,
-			   atomic_read(&fs_info->balance_running) == 0);
+			   !test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags));
 		mutex_lock(&fs_info->balance_mutex);
 	} else {
 		mutex_unlock(&fs_info->balance_mutex);
@@ -4085,7 +4086,8 @@ int btrfs_cancel_balance(struct btrfs_fs_info *fs_info)
 		}
 	}
 
-	BUG_ON(fs_info->balance_ctl || atomic_read(&fs_info->balance_running));
+	BUG_ON(fs_info->balance_ctl ||
+		test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags));
 	atomic_dec(&fs_info->balance_cancel_req);
 	mutex_unlock(&fs_info->balance_mutex);
 	return 0;
-- 
2.16.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 13/16] btrfs: move and comment read-only check in btrfs_cancel_balance
  2018-04-19 16:33 [PATCH v2 00/16] Kill fs_info::volume_mutex David Sterba
                   ` (11 preceding siblings ...)
  2018-04-19 16:33 ` [PATCH v2 12/16] btrfs: track running balance in a simpler way David Sterba
@ 2018-04-19 16:33 ` David Sterba
  2018-04-20  9:13   ` Anand Jain
  2018-04-19 16:33 ` [PATCH v2 14/16] btrfs: drop lock parameter from update_ioctl_balance_args and rename David Sterba
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 37+ messages in thread
From: David Sterba @ 2018-04-19 16:33 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

Balance cannot be started on a read-only filesystem and will have to
finish/exit before eg. going to read-only via remount.

In case the filesystem is forcibly set to read-only after an error,
balance will finish anyway and if the cancel call is too fast it will
just wait for that to happen.

The last case is when the balance is paused after mount but it's
read-only and cancelling would want to delete the item. The test is
moved after the check if balance is running at all, as it looks more
logical to report "no balance running" instead of "read-only
filesystem".

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/volumes.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a766d2f988c1..9e48a4654d3a 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4053,15 +4053,20 @@ int btrfs_pause_balance(struct btrfs_fs_info *fs_info)
 
 int btrfs_cancel_balance(struct btrfs_fs_info *fs_info)
 {
-	if (sb_rdonly(fs_info->sb))
-		return -EROFS;
-
 	mutex_lock(&fs_info->balance_mutex);
 	if (!fs_info->balance_ctl) {
 		mutex_unlock(&fs_info->balance_mutex);
 		return -ENOTCONN;
 	}
 
+	/*
+	 * A paused balance with the item stored on disk can be resumed at
+	 * mount time if the mount is read-write. Otherwise it's still paused
+	 * and we must not allow cancelling as it deletes the item.
+	 */
+	if (sb_rdonly(fs_info->sb))
+		return -EROFS;
+
 	atomic_inc(&fs_info->balance_cancel_req);
 	/*
 	 * if we are running just wait and return, balance item is
-- 
2.16.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 14/16] btrfs: drop lock parameter from update_ioctl_balance_args and rename
  2018-04-19 16:33 [PATCH v2 00/16] Kill fs_info::volume_mutex David Sterba
                   ` (12 preceding siblings ...)
  2018-04-19 16:33 ` [PATCH v2 13/16] btrfs: move and comment read-only check in btrfs_cancel_balance David Sterba
@ 2018-04-19 16:33 ` David Sterba
  2018-04-20  9:21   ` Anand Jain
  2018-04-19 16:33 ` [PATCH v2 15/16] btrfs: use mutex in btrfs_resume_balance_async David Sterba
  2018-04-19 16:33 ` [PATCH v2 16/16] btrfs: open code set_balance_control David Sterba
  15 siblings, 1 reply; 37+ messages in thread
From: David Sterba @ 2018-04-19 16:33 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

The parameter controls locking of the stats part but we can lock it
unconditionally, as this only happens once when balance starts. This is
not performance critical.

Add the prefix for an exported function.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/ctree.h   |  2 +-
 fs/btrfs/ioctl.c   | 14 +++++---------
 fs/btrfs/volumes.c |  2 +-
 3 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index fda94a264eb7..7ac2e69bfb03 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3258,7 +3258,7 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
 		      u64 newer_than, unsigned long max_pages);
 void btrfs_get_block_group_info(struct list_head *groups_list,
 				struct btrfs_ioctl_space_info *space);
-void update_ioctl_balance_args(struct btrfs_fs_info *fs_info, int lock,
+void btrfs_update_ioctl_balance_args(struct btrfs_fs_info *fs_info,
 			       struct btrfs_ioctl_balance_args *bargs);
 ssize_t btrfs_dedupe_file_range(struct file *src_file, u64 loff, u64 olen,
 			   struct file *dst_file, u64 dst_loff);
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 3dfd5ab2807b..b85d2a71136d 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -4499,7 +4499,7 @@ static long btrfs_ioctl_logical_to_ino(struct btrfs_fs_info *fs_info,
 	return ret;
 }
 
-void update_ioctl_balance_args(struct btrfs_fs_info *fs_info, int lock,
+void btrfs_update_ioctl_balance_args(struct btrfs_fs_info *fs_info,
 			       struct btrfs_ioctl_balance_args *bargs)
 {
 	struct btrfs_balance_control *bctl = fs_info->balance_ctl;
@@ -4517,13 +4517,9 @@ void update_ioctl_balance_args(struct btrfs_fs_info *fs_info, int lock,
 	memcpy(&bargs->meta, &bctl->meta, sizeof(bargs->meta));
 	memcpy(&bargs->sys, &bctl->sys, sizeof(bargs->sys));
 
-	if (lock) {
-		spin_lock(&fs_info->balance_lock);
-		memcpy(&bargs->stat, &bctl->stat, sizeof(bargs->stat));
-		spin_unlock(&fs_info->balance_lock);
-	} else {
-		memcpy(&bargs->stat, &bctl->stat, sizeof(bargs->stat));
-	}
+	spin_lock(&fs_info->balance_lock);
+	memcpy(&bargs->stat, &bctl->stat, sizeof(bargs->stat));
+	spin_unlock(&fs_info->balance_lock);
 }
 
 static long btrfs_ioctl_balance(struct file *file, void __user *arg)
@@ -4709,7 +4705,7 @@ static long btrfs_ioctl_balance_progress(struct btrfs_fs_info *fs_info,
 		goto out;
 	}
 
-	update_ioctl_balance_args(fs_info, 1, bargs);
+	btrfs_update_ioctl_balance_args(fs_info, bargs);
 
 	if (copy_to_user(arg, bargs, sizeof(*bargs)))
 		ret = -EFAULT;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 9e48a4654d3a..1abe0fc5c105 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3897,7 +3897,7 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
 
 	if (bargs) {
 		memset(bargs, 0, sizeof(*bargs));
-		update_ioctl_balance_args(fs_info, 0, bargs);
+		btrfs_update_ioctl_balance_args(fs_info, bargs);
 	}
 
 	if ((ret && ret != -ECANCELED && ret != -ENOSPC) ||
-- 
2.16.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 15/16] btrfs: use mutex in btrfs_resume_balance_async
  2018-04-19 16:33 [PATCH v2 00/16] Kill fs_info::volume_mutex David Sterba
                   ` (13 preceding siblings ...)
  2018-04-19 16:33 ` [PATCH v2 14/16] btrfs: drop lock parameter from update_ioctl_balance_args and rename David Sterba
@ 2018-04-19 16:33 ` David Sterba
  2018-04-20  9:25   ` Anand Jain
  2018-04-19 16:33 ` [PATCH v2 16/16] btrfs: open code set_balance_control David Sterba
  15 siblings, 1 reply; 37+ messages in thread
From: David Sterba @ 2018-04-19 16:33 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

While the spinlock does not cause problems, using the mutex is more
correct and consistent with others. The global status of balance is eg.
checked from btrfs_pause_balance or btrfs_cancel_balance with mutex.

Resuming balance happens during mount or ro->rw remount. In the former
case, no other user of the balance_ctl exists, in the latter, balance
cannot run until the ro/rw transition is finished.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/volumes.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 1abe0fc5c105..445380c4ac72 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3938,12 +3938,12 @@ int btrfs_resume_balance_async(struct btrfs_fs_info *fs_info)
 {
 	struct task_struct *tsk;
 
-	spin_lock(&fs_info->balance_lock);
+	mutex_lock(&fs_info->balance_mutex);
 	if (!fs_info->balance_ctl) {
-		spin_unlock(&fs_info->balance_lock);
+		mutex_unlock(&fs_info->balance_mutex);
 		return 0;
 	}
-	spin_unlock(&fs_info->balance_lock);
+	mutex_unlock(&fs_info->balance_mutex);
 
 	if (btrfs_test_opt(fs_info, SKIP_BALANCE)) {
 		btrfs_info(fs_info, "force skipping balance");
-- 
2.16.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 16/16] btrfs: open code set_balance_control
  2018-04-19 16:33 [PATCH v2 00/16] Kill fs_info::volume_mutex David Sterba
                   ` (14 preceding siblings ...)
  2018-04-19 16:33 ` [PATCH v2 15/16] btrfs: use mutex in btrfs_resume_balance_async David Sterba
@ 2018-04-19 16:33 ` David Sterba
  15 siblings, 0 replies; 37+ messages in thread
From: David Sterba @ 2018-04-19 16:33 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

The helper is quite simple and I'd like to see the locking in the
caller.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/volumes.c | 25 ++++++++-----------------
 1 file changed, 8 insertions(+), 17 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 445380c4ac72..6bab436a94d5 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3178,21 +3178,6 @@ static void update_balance_args(struct btrfs_balance_control *bctl)
 	}
 }
 
-/*
- * Should be called with balance mutex held to protect against checking the
- * balance status or progress. Same goes for reset_balance_state.
- */
-static void set_balance_control(struct btrfs_balance_control *bctl)
-{
-	struct btrfs_fs_info *fs_info = bctl->fs_info;
-
-	BUG_ON(fs_info->balance_ctl);
-
-	spin_lock(&fs_info->balance_lock);
-	fs_info->balance_ctl = bctl;
-	spin_unlock(&fs_info->balance_lock);
-}
-
 /*
  * Clear the balance status in fs_info and delete the balance item from disk.
  */
@@ -3878,7 +3863,10 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
 
 	if (!(bctl->flags & BTRFS_BALANCE_RESUME)) {
 		BUG_ON(ret == -EEXIST);
-		set_balance_control(bctl);
+		BUG_ON(fs_info->balance_ctl);
+		spin_lock(&fs_info->balance_lock);
+		fs_info->balance_ctl = bctl;
+		spin_unlock(&fs_info->balance_lock);
 	} else {
 		BUG_ON(ret != -EEXIST);
 		spin_lock(&fs_info->balance_lock);
@@ -4015,7 +4003,10 @@ int btrfs_recover_balance(struct btrfs_fs_info *fs_info)
 	"cannot set exclusive op status to balance, resume manually");
 
 	mutex_lock(&fs_info->balance_mutex);
-	set_balance_control(bctl);
+	BUG_ON(fs_info->balance_ctl);
+	spin_lock(&fs_info->balance_lock);
+	fs_info->balance_ctl = bctl;
+	spin_unlock(&fs_info->balance_lock);
 	mutex_unlock(&fs_info->balance_mutex);
 out:
 	btrfs_free_path(path);
-- 
2.16.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 06/16] btrfs: move clearing of EXCL_OP out of __cancel_balance
  2018-04-19 16:33 ` [PATCH v2 06/16] btrfs: move clearing of EXCL_OP out of __cancel_balance David Sterba
@ 2018-04-20  7:02   ` Nikolay Borisov
  2018-04-20  7:35   ` Anand Jain
  1 sibling, 0 replies; 37+ messages in thread
From: Nikolay Borisov @ 2018-04-20  7:02 UTC (permalink / raw)
  To: David Sterba, linux-btrfs



On 19.04.2018 19:33, David Sterba wrote:
> Make the clearning visible in the callers so we can pair it with the
> test_and_set part.
> 
> Signed-off-by: David Sterba <dsterba@suse.com>

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

> ---
>  fs/btrfs/ioctl.c   |  2 +-
>  fs/btrfs/volumes.c | 13 +++++++------
>  2 files changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index b93dea445802..582bde5b7eda 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -4656,7 +4656,7 @@ static long btrfs_ioctl_balance(struct file *file, void __user *arg)
>  	 * Ownership of bctl and filesystem flag BTRFS_FS_EXCL_OP
>  	 * goes to to btrfs_balance.  bctl is freed in __cancel_balance,
>  	 * or, if restriper was paused all the way until unmount, in
> -	 * free_fs_info.  The flag is cleared in __cancel_balance.
> +	 * free_fs_info.  The flag should be cleared after __cancel_balance.
>  	 */
>  	need_unlock = false;
>  
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index e0bd181dc9e0..5c83ebc8e199 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -3760,8 +3760,6 @@ static void __cancel_balance(struct btrfs_fs_info *fs_info)
>  	ret = del_balance_item(fs_info);
>  	if (ret)
>  		btrfs_handle_fs_error(fs_info, ret, NULL);
> -
> -	clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
>  }
>  
>  /* Non-zero return value signifies invalidity */
> @@ -3919,6 +3917,7 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
>  	if ((ret && ret != -ECANCELED && ret != -ENOSPC) ||
>  	    balance_need_close(fs_info)) {
>  		__cancel_balance(fs_info);
> +		clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
>  	}
>  
>  	wake_up(&fs_info->balance_wait_q);
> @@ -3927,10 +3926,10 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
>  out:
>  	if (bctl->flags & BTRFS_BALANCE_RESUME)
>  		__cancel_balance(fs_info);
> -	else {
> +	else
>  		kfree(bctl);
> -		clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
> -	}
> +	clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
> +
>  	return ret;
>  }
>  
> @@ -4089,8 +4088,10 @@ int btrfs_cancel_balance(struct btrfs_fs_info *fs_info)
>  		mutex_lock(&fs_info->volume_mutex);
>  		mutex_lock(&fs_info->balance_mutex);
>  
> -		if (fs_info->balance_ctl)
> +		if (fs_info->balance_ctl) {
>  			__cancel_balance(fs_info);
> +			clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
> +		}
>  
>  		mutex_unlock(&fs_info->volume_mutex);
>  	}
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 07/16] btrfs: add proper safety check before resuming dev-replace
  2018-04-19 16:33 ` [PATCH v2 07/16] btrfs: add proper safety check before resuming dev-replace David Sterba
@ 2018-04-20  7:04   ` Nikolay Borisov
  2018-04-20  7:36   ` Anand Jain
  1 sibling, 0 replies; 37+ messages in thread
From: Nikolay Borisov @ 2018-04-20  7:04 UTC (permalink / raw)
  To: David Sterba, linux-btrfs



On 19.04.2018 19:33, David Sterba wrote:
> The device replace is paused by unmount or read only remount, and
> resumed on next mount or write remount.
> 
> The exclusive status should be checked properly as it's a global
> invariant and we must not allow 2 operations run. In this case, the
> balance can be also paused and resumed under same conditions. It's
> always checked first so dev-replace could see the EXCL_OP already taken,
> BUT, the ioctl would never let start both at the same time.
> 
> Replace the WARN_ON with message and return 0, indicating no error as
> this is purely theoretical and the user will be informed. Resolving that
> manually should be possible by waiting for the other operation to finish
> or cancel the paused state.
> 
> Signed-off-by: David Sterba <dsterba@suse.com>

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

> ---
>  fs/btrfs/dev-replace.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
> index 7a87ffad041e..346bd460f8e7 100644
> --- a/fs/btrfs/dev-replace.c
> +++ b/fs/btrfs/dev-replace.c
> @@ -908,7 +908,17 @@ int btrfs_resume_dev_replace_async(struct btrfs_fs_info *fs_info)
>  	}
>  	btrfs_dev_replace_write_unlock(dev_replace);
>  
> -	WARN_ON(test_and_set_bit(BTRFS_FS_EXCL_OP, &fs_info->flags));
> +	/*
> +	 * This could collide with a paused balance, but the exclusive op logic
> +	 * should never allow both to start and pause. We don't want to allow
> +	 * dev-replace to start anyway.
> +	 */
> +	if (test_and_set_bit(BTRFS_FS_EXCL_OP, &fs_info->flags)) {
> +		btrfs_info(fs_info,
> +		"cannot resume dev-replace, other exclusive operation running");
> +		return 0;
> +	}
> +
>  	task = kthread_run(btrfs_dev_replace_kthread, fs_info, "btrfs-devrepl");
>  	return PTR_ERR_OR_ZERO(task);
>  }
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 09/16] btrfs: cleanup helpers that reset balance state
  2018-04-19 16:33 ` [PATCH v2 09/16] btrfs: cleanup helpers that reset balance state David Sterba
@ 2018-04-20  7:07   ` Nikolay Borisov
  2018-04-20 11:28     ` David Sterba
  2018-04-20  9:04   ` Anand Jain
  1 sibling, 1 reply; 37+ messages in thread
From: Nikolay Borisov @ 2018-04-20  7:07 UTC (permalink / raw)
  To: David Sterba, linux-btrfs



On 19.04.2018 19:33, David Sterba wrote:
> +		/* reset_balance_state needs volume_mutex */

Does it make sense to codify this invariant as lockdep_assert_held in
reset_balance_state ?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 06/16] btrfs: move clearing of EXCL_OP out of __cancel_balance
  2018-04-19 16:33 ` [PATCH v2 06/16] btrfs: move clearing of EXCL_OP out of __cancel_balance David Sterba
  2018-04-20  7:02   ` Nikolay Borisov
@ 2018-04-20  7:35   ` Anand Jain
  1 sibling, 0 replies; 37+ messages in thread
From: Anand Jain @ 2018-04-20  7:35 UTC (permalink / raw)
  To: David Sterba, linux-btrfs



On 04/20/2018 12:33 AM, David Sterba wrote:
> Make the clearning visible in the callers so we can pair it with the
> test_and_set part.
> 
> Signed-off-by: David Sterba <dsterba@suse.com>

Reviewed-by: Anand Jain <anand.jain@oracle.com>

Thanks, Anand


> ---
>   fs/btrfs/ioctl.c   |  2 +-
>   fs/btrfs/volumes.c | 13 +++++++------
>   2 files changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index b93dea445802..582bde5b7eda 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -4656,7 +4656,7 @@ static long btrfs_ioctl_balance(struct file *file, void __user *arg)
>   	 * Ownership of bctl and filesystem flag BTRFS_FS_EXCL_OP
>   	 * goes to to btrfs_balance.  bctl is freed in __cancel_balance,
>   	 * or, if restriper was paused all the way until unmount, in
> -	 * free_fs_info.  The flag is cleared in __cancel_balance.
> +	 * free_fs_info.  The flag should be cleared after __cancel_balance.
>   	 */
>   	need_unlock = false;
>   
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index e0bd181dc9e0..5c83ebc8e199 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -3760,8 +3760,6 @@ static void __cancel_balance(struct btrfs_fs_info *fs_info)
>   	ret = del_balance_item(fs_info);
>   	if (ret)
>   		btrfs_handle_fs_error(fs_info, ret, NULL);
> -
> -	clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
>   }
>   
>   /* Non-zero return value signifies invalidity */
> @@ -3919,6 +3917,7 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
>   	if ((ret && ret != -ECANCELED && ret != -ENOSPC) ||
>   	    balance_need_close(fs_info)) {
>   		__cancel_balance(fs_info);
> +		clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
>   	}
>   
>   	wake_up(&fs_info->balance_wait_q);
> @@ -3927,10 +3926,10 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
>   out:
>   	if (bctl->flags & BTRFS_BALANCE_RESUME)
>   		__cancel_balance(fs_info);
> -	else {
> +	else
>   		kfree(bctl);
> -		clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
> -	}
> +	clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
> +
>   	return ret;
>   }
>   
> @@ -4089,8 +4088,10 @@ int btrfs_cancel_balance(struct btrfs_fs_info *fs_info)
>   		mutex_lock(&fs_info->volume_mutex);
>   		mutex_lock(&fs_info->balance_mutex);
>   
> -		if (fs_info->balance_ctl)
> +		if (fs_info->balance_ctl) {
>   			__cancel_balance(fs_info);
> +			clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
> +		}
>   
>   		mutex_unlock(&fs_info->volume_mutex);
>   	}
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 07/16] btrfs: add proper safety check before resuming dev-replace
  2018-04-19 16:33 ` [PATCH v2 07/16] btrfs: add proper safety check before resuming dev-replace David Sterba
  2018-04-20  7:04   ` Nikolay Borisov
@ 2018-04-20  7:36   ` Anand Jain
  1 sibling, 0 replies; 37+ messages in thread
From: Anand Jain @ 2018-04-20  7:36 UTC (permalink / raw)
  To: David Sterba, linux-btrfs



On 04/20/2018 12:33 AM, David Sterba wrote:
> The device replace is paused by unmount or read only remount, and
> resumed on next mount or write remount.
> 
> The exclusive status should be checked properly as it's a global
> invariant and we must not allow 2 operations run. In this case, the
> balance can be also paused and resumed under same conditions. It's
> always checked first so dev-replace could see the EXCL_OP already taken,
> BUT, the ioctl would never let start both at the same time.
> 
> Replace the WARN_ON with message and return 0, indicating no error as
> this is purely theoretical and the user will be informed. Resolving that
> manually should be possible by waiting for the other operation to finish
> or cancel the paused state.
> 
> Signed-off-by: David Sterba <dsterba@suse.com>

Reviewed-by: Anand Jain <anand.jain@oracle.com>

Thanks, Anand


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 08/16] btrfs: add sanity check when resuming balance after mount
  2018-04-19 16:33 ` [PATCH v2 08/16] btrfs: add sanity check when resuming balance after mount David Sterba
@ 2018-04-20  7:38   ` Anand Jain
  0 siblings, 0 replies; 37+ messages in thread
From: Anand Jain @ 2018-04-20  7:38 UTC (permalink / raw)
  To: David Sterba, linux-btrfs



On 04/20/2018 12:33 AM, David Sterba wrote:
> Replace a WARN_ON with a proper check and message in case something goes
> really wrong and resumed balance cannot set up its exclusive status.
> The check is a user friendly assertion, I don't expect to ever happen
> under normal circumstances.
> 
> Also document that the paused balance starts here and owns the exclusive
> op status.
> 
> Signed-off-by: David Sterba <dsterba@suse.com>

Reviewed-by: Anand Jain <anand.jain@oracle.com>

Thanks, Anand

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 12/16] btrfs: track running balance in a simpler way
  2018-04-19 16:33 ` [PATCH v2 12/16] btrfs: track running balance in a simpler way David Sterba
@ 2018-04-20  7:52   ` Anand Jain
  2018-04-20 11:58     ` David Sterba
  2018-04-27  2:10   ` Anand Jain
  1 sibling, 1 reply; 37+ messages in thread
From: Anand Jain @ 2018-04-20  7:52 UTC (permalink / raw)
  To: David Sterba, linux-btrfs



On 04/20/2018 12:33 AM, David Sterba wrote:
> Currently fs_info::balance_running is 0 or 1 and does not use the
> semantics of atomics. The pause and cancel check for 0, that can happen
> only after __btrfs_balance exits for whatever reason.
> 
> Parallel calls to balance ioctl may enter btrfs_ioctl_balance multiple
> times but will block on the balance_mutex that protects the
> fs_info::flags bit.
> 
> Signed-off-by: David Sterba <dsterba@suse.com>
> ---
>   fs/btrfs/ctree.h   |  6 +++++-
>   fs/btrfs/disk-io.c |  1 -
>   fs/btrfs/ioctl.c   |  6 +++---
>   fs/btrfs/volumes.c | 18 ++++++++++--------
>   4 files changed, 18 insertions(+), 13 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 011cb9a8a967..fda94a264eb7 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -726,6 +726,11 @@ struct btrfs_delayed_root;
>    * (device replace, resize, device add/delete, balance)
>    */
>   #define BTRFS_FS_EXCL_OP			16
> +/*
> + * Indicate that balance has been set up from the ioctl and is in the main
> + * phase. The fs_info::balance_ctl is initialized.
> + */
> +#define BTRFS_FS_BALANCE_RUNNING		17


  Change looks good to me as such and its a good idea to drop
    fs_info::balance_running;

  However instead of making BTRFS_FS_BALANCE_RUNNING part of
    btrfs_fs_info::flags
  pls make it part of
    btrfs_balance_control::flags

  Because:
  We already have BTRFS_BALANCE_RESUME there.
  And at fs_info level BTRFS_FS_EXCL_OP tells someone excl is running.
  And if its a balance (either in running or implicit-paused state)
  then btrfs_fs_info::balance_ctl is not NULL.

Thanks, Anand

>   
>   struct btrfs_fs_info {
>   	u8 fsid[BTRFS_FSID_SIZE];
> @@ -991,7 +996,6 @@ struct btrfs_fs_info {
>   	/* restriper state */
>   	spinlock_t balance_lock;
>   	struct mutex balance_mutex;
> -	atomic_t balance_running;
>   	atomic_t balance_pause_req;
>   	atomic_t balance_cancel_req;
>   	struct btrfs_balance_control *balance_ctl;
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index c0482ecea11f..b62559dfb053 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2168,7 +2168,6 @@ static void btrfs_init_balance(struct btrfs_fs_info *fs_info)
>   {
>   	spin_lock_init(&fs_info->balance_lock);
>   	mutex_init(&fs_info->balance_mutex);
> -	atomic_set(&fs_info->balance_running, 0);
>   	atomic_set(&fs_info->balance_pause_req, 0);
>   	atomic_set(&fs_info->balance_cancel_req, 0);
>   	fs_info->balance_ctl = NULL;
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 7c99f74c200e..3dfd5ab2807b 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -4506,7 +4506,7 @@ void update_ioctl_balance_args(struct btrfs_fs_info *fs_info, int lock,
>   
>   	bargs->flags = bctl->flags;
>   
> -	if (atomic_read(&fs_info->balance_running))
> +	if (test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags))
>   		bargs->state |= BTRFS_BALANCE_STATE_RUNNING;
>   	if (atomic_read(&fs_info->balance_pause_req))
>   		bargs->state |= BTRFS_BALANCE_STATE_PAUSE_REQ;
> @@ -4558,7 +4558,7 @@ static long btrfs_ioctl_balance(struct file *file, void __user *arg)
>   	mutex_lock(&fs_info->balance_mutex);
>   	if (fs_info->balance_ctl) {
>   		/* this is either (2) or (3) */
> -		if (!atomic_read(&fs_info->balance_running)) {
> +		if (!test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags)) {
>   			mutex_unlock(&fs_info->balance_mutex);
>   			/*
>   			 * Lock released to allow other waiters to continue,
> @@ -4567,7 +4567,7 @@ static long btrfs_ioctl_balance(struct file *file, void __user *arg)
>   			mutex_lock(&fs_info->balance_mutex);
>   
>   			if (fs_info->balance_ctl &&
> -			    !atomic_read(&fs_info->balance_running)) {
> +			    !test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags)) {
>   				/* this is (3) */
>   				need_unlock = false;
>   				goto locked;
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index e95cc2f09fdd..a766d2f988c1 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -3886,13 +3886,14 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
>   		spin_unlock(&fs_info->balance_lock);
>   	}
>   
> -	atomic_inc(&fs_info->balance_running);
> +	ASSERT(!test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags));
> +	set_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags);
>   	mutex_unlock(&fs_info->balance_mutex);
>   
>   	ret = __btrfs_balance(fs_info);
>   
>   	mutex_lock(&fs_info->balance_mutex);
> -	atomic_dec(&fs_info->balance_running);
> +	clear_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags);
>   
>   	if (bargs) {
>   		memset(bargs, 0, sizeof(*bargs));
> @@ -4031,16 +4032,16 @@ int btrfs_pause_balance(struct btrfs_fs_info *fs_info)
>   		return -ENOTCONN;
>   	}
>   
> -	if (atomic_read(&fs_info->balance_running)) {
> +	if (test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags)) {
>   		atomic_inc(&fs_info->balance_pause_req);
>   		mutex_unlock(&fs_info->balance_mutex);
>   
>   		wait_event(fs_info->balance_wait_q,
> -			   atomic_read(&fs_info->balance_running) == 0);
> +			   !test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags));
>   
>   		mutex_lock(&fs_info->balance_mutex);
>   		/* we are good with balance_ctl ripped off from under us */
> -		BUG_ON(atomic_read(&fs_info->balance_running));
> +		BUG_ON(test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags));
>   		atomic_dec(&fs_info->balance_pause_req);
>   	} else {
>   		ret = -ENOTCONN;
> @@ -4066,10 +4067,10 @@ int btrfs_cancel_balance(struct btrfs_fs_info *fs_info)
>   	 * if we are running just wait and return, balance item is
>   	 * deleted in btrfs_balance in this case
>   	 */
> -	if (atomic_read(&fs_info->balance_running)) {
> +	if (test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags)) {
>   		mutex_unlock(&fs_info->balance_mutex);
>   		wait_event(fs_info->balance_wait_q,
> -			   atomic_read(&fs_info->balance_running) == 0);
> +			   !test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags));
>   		mutex_lock(&fs_info->balance_mutex);
>   	} else {
>   		mutex_unlock(&fs_info->balance_mutex);
> @@ -4085,7 +4086,8 @@ int btrfs_cancel_balance(struct btrfs_fs_info *fs_info)
>   		}
>   	}
>   
> -	BUG_ON(fs_info->balance_ctl || atomic_read(&fs_info->balance_running));
> +	BUG_ON(fs_info->balance_ctl ||
> +		test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags));
>   	atomic_dec(&fs_info->balance_cancel_req);
>   	mutex_unlock(&fs_info->balance_mutex);
>   	return 0;
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 09/16] btrfs: cleanup helpers that reset balance state
  2018-04-19 16:33 ` [PATCH v2 09/16] btrfs: cleanup helpers that reset balance state David Sterba
  2018-04-20  7:07   ` Nikolay Borisov
@ 2018-04-20  9:04   ` Anand Jain
  1 sibling, 0 replies; 37+ messages in thread
From: Anand Jain @ 2018-04-20  9:04 UTC (permalink / raw)
  To: David Sterba, linux-btrfs



On 04/20/2018 12:33 AM, David Sterba wrote:
> The function __cancel_balance name is confusing with the cancel
> operation of balance and it really resets the state of balance back to
> zero. The unset_balance_control helper is called only from one place and
> simple enough to be inlined.
> 
> Signed-off-by: David Sterba <dsterba@suse.com>

  Reviewed-by: Anand Jain <anand.jain@oracle.com>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 13/16] btrfs: move and comment read-only check in btrfs_cancel_balance
  2018-04-19 16:33 ` [PATCH v2 13/16] btrfs: move and comment read-only check in btrfs_cancel_balance David Sterba
@ 2018-04-20  9:13   ` Anand Jain
  2018-04-20 11:59     ` David Sterba
  2018-04-20 12:06     ` [PATCH v2.1 " David Sterba
  0 siblings, 2 replies; 37+ messages in thread
From: Anand Jain @ 2018-04-20  9:13 UTC (permalink / raw)
  To: David Sterba, linux-btrfs



On 04/20/2018 12:33 AM, David Sterba wrote:
> Balance cannot be started on a read-only filesystem and will have to
> finish/exit before eg. going to read-only via remount.

  It can be paused as well.
    btrfs balance pause /btrfs && mount -o remount,ro /dev/sdb /btrfs


> @@ -4053,15 +4053,20 @@ int btrfs_pause_balance(struct btrfs_fs_info *fs_info)
>   
>   int btrfs_cancel_balance(struct btrfs_fs_info *fs_info)
>   {
> -	if (sb_rdonly(fs_info->sb))
> -		return -EROFS;
> -
>   	mutex_lock(&fs_info->balance_mutex);
>   	if (!fs_info->balance_ctl) {
>   		mutex_unlock(&fs_info->balance_mutex);
>   		return -ENOTCONN;
>   	}
>   
> +	/*
> +	 * A paused balance with the item stored on disk can be resumed at
> +	 * mount time if the mount is read-write. Otherwise it's still paused
> +	 * and we must not allow cancelling as it deletes the item.
> +	 */
> +	if (sb_rdonly(fs_info->sb))
> +		return -EROFS;
> +

   mutex_unlock(&fs_info->balance_mutex); ?

Thanks, Anand


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 14/16] btrfs: drop lock parameter from update_ioctl_balance_args and rename
  2018-04-19 16:33 ` [PATCH v2 14/16] btrfs: drop lock parameter from update_ioctl_balance_args and rename David Sterba
@ 2018-04-20  9:21   ` Anand Jain
  0 siblings, 0 replies; 37+ messages in thread
From: Anand Jain @ 2018-04-20  9:21 UTC (permalink / raw)
  To: David Sterba, linux-btrfs



On 04/20/2018 12:33 AM, David Sterba wrote:
> The parameter controls locking of the stats part but we can lock it
> unconditionally, as this only happens once when balance starts. This is
> not performance critical.
> 
> Add the prefix for an exported function.
> 
> Signed-off-by: David Sterba <dsterba@suse.com>

  Reviewed-by: Anand Jain <anand.jain@oracle.com>

Thanks, Anand

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 15/16] btrfs: use mutex in btrfs_resume_balance_async
  2018-04-19 16:33 ` [PATCH v2 15/16] btrfs: use mutex in btrfs_resume_balance_async David Sterba
@ 2018-04-20  9:25   ` Anand Jain
  0 siblings, 0 replies; 37+ messages in thread
From: Anand Jain @ 2018-04-20  9:25 UTC (permalink / raw)
  To: David Sterba, linux-btrfs



On 04/20/2018 12:33 AM, David Sterba wrote:
> While the spinlock does not cause problems, using the mutex is more
> correct and consistent with others. The global status of balance is eg.
> checked from btrfs_pause_balance or btrfs_cancel_balance with mutex.
> 
> Resuming balance happens during mount or ro->rw remount. In the former
> case, no other user of the balance_ctl exists, in the latter, balance
> cannot run until the ro/rw transition is finished.
> 
> Signed-off-by: David Sterba <dsterba@suse.com>

Reviewed-by: Anand Jain <anand.jain@oracle.com>

Thanks, Anand

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 09/16] btrfs: cleanup helpers that reset balance state
  2018-04-20  7:07   ` Nikolay Borisov
@ 2018-04-20 11:28     ` David Sterba
  0 siblings, 0 replies; 37+ messages in thread
From: David Sterba @ 2018-04-20 11:28 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: David Sterba, linux-btrfs

On Fri, Apr 20, 2018 at 10:07:17AM +0300, Nikolay Borisov wrote:
> 
> 
> On 19.04.2018 19:33, David Sterba wrote:
> > +		/* reset_balance_state needs volume_mutex */
> 
> Does it make sense to codify this invariant as lockdep_assert_held in
> reset_balance_state ?

No, the comment and the mutex will be removed in the following patches.

But yeah in general the lockdep annotations are better than the comments
stating which lock is supposed to be held.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 12/16] btrfs: track running balance in a simpler way
  2018-04-20  7:52   ` Anand Jain
@ 2018-04-20 11:58     ` David Sterba
  2018-04-20 12:19       ` David Sterba
  0 siblings, 1 reply; 37+ messages in thread
From: David Sterba @ 2018-04-20 11:58 UTC (permalink / raw)
  To: Anand Jain; +Cc: David Sterba, linux-btrfs

On Fri, Apr 20, 2018 at 03:52:24PM +0800, Anand Jain wrote:
> 
> 
> On 04/20/2018 12:33 AM, David Sterba wrote:
> > Currently fs_info::balance_running is 0 or 1 and does not use the
> > semantics of atomics. The pause and cancel check for 0, that can happen
> > only after __btrfs_balance exits for whatever reason.
> > 
> > Parallel calls to balance ioctl may enter btrfs_ioctl_balance multiple
> > times but will block on the balance_mutex that protects the
> > fs_info::flags bit.
> > 
> > Signed-off-by: David Sterba <dsterba@suse.com>
> > ---
> >   fs/btrfs/ctree.h   |  6 +++++-
> >   fs/btrfs/disk-io.c |  1 -
> >   fs/btrfs/ioctl.c   |  6 +++---
> >   fs/btrfs/volumes.c | 18 ++++++++++--------
> >   4 files changed, 18 insertions(+), 13 deletions(-)
> > 
> > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> > index 011cb9a8a967..fda94a264eb7 100644
> > --- a/fs/btrfs/ctree.h
> > +++ b/fs/btrfs/ctree.h
> > @@ -726,6 +726,11 @@ struct btrfs_delayed_root;
> >    * (device replace, resize, device add/delete, balance)
> >    */
> >   #define BTRFS_FS_EXCL_OP			16
> > +/*
> > + * Indicate that balance has been set up from the ioctl and is in the main
> > + * phase. The fs_info::balance_ctl is initialized.
> > + */
> > +#define BTRFS_FS_BALANCE_RUNNING		17
> 
> 
>   Change looks good to me as such and its a good idea to drop
>     fs_info::balance_running;
> 
>   However instead of making BTRFS_FS_BALANCE_RUNNING part of
>     btrfs_fs_info::flags
>   pls make it part of
>     btrfs_balance_control::flags
> 
>   Because:
>   We already have BTRFS_BALANCE_RESUME there.
>   And at fs_info level BTRFS_FS_EXCL_OP tells someone excl is running.
>   And if its a balance (either in running or implicit-paused state)
>   then btrfs_fs_info::balance_ctl is not NULL.

This would work fine, if the btrfs_balance_control::flags were not copy
of the ioctl structure. There are the filter flags stored there, in
addition to BTRFS_BALANCE_RESUME.

>From that point it's the balance ioctl interface and adding the internal
runtime flag does not seem to fit.

The status bit could be tracked separatelly in btrfs_balance_control so
it does not use the whole filesystem flag namespace, as it's always used
under balance mutex.

As this is simple change to the patch, I'll do that.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 13/16] btrfs: move and comment read-only check in btrfs_cancel_balance
  2018-04-20  9:13   ` Anand Jain
@ 2018-04-20 11:59     ` David Sterba
  2018-04-20 12:06     ` [PATCH v2.1 " David Sterba
  1 sibling, 0 replies; 37+ messages in thread
From: David Sterba @ 2018-04-20 11:59 UTC (permalink / raw)
  To: Anand Jain; +Cc: David Sterba, linux-btrfs

On Fri, Apr 20, 2018 at 05:13:02PM +0800, Anand Jain wrote:
> 
> 
> On 04/20/2018 12:33 AM, David Sterba wrote:
> > Balance cannot be started on a read-only filesystem and will have to
> > finish/exit before eg. going to read-only via remount.
> 
>   It can be paused as well.
>     btrfs balance pause /btrfs && mount -o remount,ro /dev/sdb /btrfs
> 
> 
> > @@ -4053,15 +4053,20 @@ int btrfs_pause_balance(struct btrfs_fs_info *fs_info)
> >   
> >   int btrfs_cancel_balance(struct btrfs_fs_info *fs_info)
> >   {
> > -	if (sb_rdonly(fs_info->sb))
> > -		return -EROFS;
> > -
> >   	mutex_lock(&fs_info->balance_mutex);
> >   	if (!fs_info->balance_ctl) {
> >   		mutex_unlock(&fs_info->balance_mutex);
> >   		return -ENOTCONN;
> >   	}
> >   
> > +	/*
> > +	 * A paused balance with the item stored on disk can be resumed at
> > +	 * mount time if the mount is read-write. Otherwise it's still paused
> > +	 * and we must not allow cancelling as it deletes the item.
> > +	 */
> > +	if (sb_rdonly(fs_info->sb))
> > +		return -EROFS;
> > +
> 
>    mutex_unlock(&fs_info->balance_mutex); ?

Ah, of course. Fixed, thanks.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v2.1 13/16] btrfs: move and comment read-only check in btrfs_cancel_balance
  2018-04-20  9:13   ` Anand Jain
  2018-04-20 11:59     ` David Sterba
@ 2018-04-20 12:06     ` David Sterba
  2018-04-27  2:02       ` Anand Jain
  1 sibling, 1 reply; 37+ messages in thread
From: David Sterba @ 2018-04-20 12:06 UTC (permalink / raw)
  To: Anand Jain; +Cc: David Sterba, linux-btrfs

Balance cannot be started on a read-only filesystem and will have to
finish/exit before eg. going to read-only via remount.

In case the filesystem is forcibly set to read-only after an error,
balance will finish anyway and if the cancel call is too fast it will
just wait for that to happen.

The last case is when the balance is paused after mount but it's
read-only and cancelling would want to delete the item. The test is
moved after the check if balance is running at all, as it looks more
logical to report "no balance running" instead of "read-only
filesystem".

Signed-off-by: David Sterba <dsterba@suse.com>
---

- Add missing unlock

 fs/btrfs/volumes.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a766d2f988c1..9176d77e02ee 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4053,15 +4053,22 @@ int btrfs_pause_balance(struct btrfs_fs_info *fs_info)
 
 int btrfs_cancel_balance(struct btrfs_fs_info *fs_info)
 {
-	if (sb_rdonly(fs_info->sb))
-		return -EROFS;
-
 	mutex_lock(&fs_info->balance_mutex);
 	if (!fs_info->balance_ctl) {
 		mutex_unlock(&fs_info->balance_mutex);
 		return -ENOTCONN;
 	}
 
+	/*
+	 * A paused balance with the item stored on disk can be resumed at
+	 * mount time if the mount is read-write. Otherwise it's still paused
+	 * and we must not allow cancelling as it deletes the item.
+	 */
+	if (sb_rdonly(fs_info->sb)) {
+		mutex_unlock(&fs_info->balance_mutex);
+		return -EROFS;
+	}
+
 	atomic_inc(&fs_info->balance_cancel_req);
 	/*
 	 * if we are running just wait and return, balance item is
-- 
2.16.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 12/16] btrfs: track running balance in a simpler way
  2018-04-20 11:58     ` David Sterba
@ 2018-04-20 12:19       ` David Sterba
  2018-04-20 13:32         ` Anand Jain
  0 siblings, 1 reply; 37+ messages in thread
From: David Sterba @ 2018-04-20 12:19 UTC (permalink / raw)
  To: dsterba, Anand Jain, David Sterba, linux-btrfs

On Fri, Apr 20, 2018 at 01:58:11PM +0200, David Sterba wrote:
> On Fri, Apr 20, 2018 at 03:52:24PM +0800, Anand Jain wrote:
> > 
> > 
> > On 04/20/2018 12:33 AM, David Sterba wrote:
> > > Currently fs_info::balance_running is 0 or 1 and does not use the
> > > semantics of atomics. The pause and cancel check for 0, that can happen
> > > only after __btrfs_balance exits for whatever reason.
> > > 
> > > Parallel calls to balance ioctl may enter btrfs_ioctl_balance multiple
> > > times but will block on the balance_mutex that protects the
> > > fs_info::flags bit.
> > > 
> > > Signed-off-by: David Sterba <dsterba@suse.com>
> > > ---
> > >   fs/btrfs/ctree.h   |  6 +++++-
> > >   fs/btrfs/disk-io.c |  1 -
> > >   fs/btrfs/ioctl.c   |  6 +++---
> > >   fs/btrfs/volumes.c | 18 ++++++++++--------
> > >   4 files changed, 18 insertions(+), 13 deletions(-)
> > > 
> > > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> > > index 011cb9a8a967..fda94a264eb7 100644
> > > --- a/fs/btrfs/ctree.h
> > > +++ b/fs/btrfs/ctree.h
> > > @@ -726,6 +726,11 @@ struct btrfs_delayed_root;
> > >    * (device replace, resize, device add/delete, balance)
> > >    */
> > >   #define BTRFS_FS_EXCL_OP			16
> > > +/*
> > > + * Indicate that balance has been set up from the ioctl and is in the main
> > > + * phase. The fs_info::balance_ctl is initialized.
> > > + */
> > > +#define BTRFS_FS_BALANCE_RUNNING		17
> > 
> > 
> >   Change looks good to me as such and its a good idea to drop
> >     fs_info::balance_running;
> > 
> >   However instead of making BTRFS_FS_BALANCE_RUNNING part of
> >     btrfs_fs_info::flags
> >   pls make it part of
> >     btrfs_balance_control::flags
> > 
> >   Because:
> >   We already have BTRFS_BALANCE_RESUME there.
> >   And at fs_info level BTRFS_FS_EXCL_OP tells someone excl is running.
> >   And if its a balance (either in running or implicit-paused state)
> >   then btrfs_fs_info::balance_ctl is not NULL.
> 
> This would work fine, if the btrfs_balance_control::flags were not copy
> of the ioctl structure. There are the filter flags stored there, in
> addition to BTRFS_BALANCE_RESUME.
> 
> >From that point it's the balance ioctl interface and adding the internal
> runtime flag does not seem to fit.
> 
> The status bit could be tracked separatelly in btrfs_balance_control so
> it does not use the whole filesystem flag namespace, as it's always used
> under balance mutex.
> 
> As this is simple change to the patch, I'll do that.

Ok not that simple, the running status is checked outside of
balance_mutex and there's one more assertion that does not expect the
balance_ctl to exist:

@@ -4031,16 +4032,16 @@ int btrfs_pause_balance(struct btrfs_fs_info *fs_info)
                return -ENOTCONN;
        }

-       if (atomic_read(&fs_info->balance_running)) {
+       if (test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags)) {
                atomic_inc(&fs_info->balance_pause_req);
                mutex_unlock(&fs_info->balance_mutex);

                wait_event(fs_info->balance_wait_q,
-                          atomic_read(&fs_info->balance_running) == 0);
+                          !test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags));

here it's unlocked


                mutex_lock(&fs_info->balance_mutex);
                /* we are good with balance_ctl ripped off from under us */
-               BUG_ON(atomic_read(&fs_info->balance_running));
+               BUG_ON(test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags));

and rewriting the code so this could be checked the same way is not a simple
fixup as I expected.

As there's still the extra balance mutex lock/unlock after the volume
mutex removal, I'll have a look how this could be cleaned up further.

                atomic_dec(&fs_info->balance_pause_req);
        } else {
                ret = -ENOTCONN;

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 12/16] btrfs: track running balance in a simpler way
  2018-04-20 12:19       ` David Sterba
@ 2018-04-20 13:32         ` Anand Jain
  0 siblings, 0 replies; 37+ messages in thread
From: Anand Jain @ 2018-04-20 13:32 UTC (permalink / raw)
  To: dsterba, David Sterba, linux-btrfs

	



> Ok not that simple, the running status is checked outside of
> balance_mutex and there's one more assertion that does not expect the
> balance_ctl to exist:
> 
> @@ -4031,16 +4032,16 @@ int btrfs_pause_balance(struct btrfs_fs_info *fs_info)
>                  return -ENOTCONN;
>          }
> 
> -       if (atomic_read(&fs_info->balance_running)) {
> +       if (test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags)) {
>                  atomic_inc(&fs_info->balance_pause_req);
>                  mutex_unlock(&fs_info->balance_mutex);
> 
>                  wait_event(fs_info->balance_wait_q,
> -                          atomic_read(&fs_info->balance_running) == 0);
> +                          !test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags));
> 
> here it's unlocked
> 
>                  mutex_lock(&fs_info->balance_mutex);
>                  /* we are good with balance_ctl ripped off from under us */
> -               BUG_ON(atomic_read(&fs_info->balance_running));
> +               BUG_ON(test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags));
> 
> and rewriting the code so this could be checked the same way is not a simple
> fixup as I expected.
> 
> As there's still the extra balance mutex lock/unlock after the volume
> mutex removal, I'll have a look how this could be cleaned up further.
> 
>                  atomic_dec(&fs_info->balance_pause_req);
>          } else {
>                  ret = -ENOTCONN;

  Makes sense.

Thanks, Anand


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2.1 13/16] btrfs: move and comment read-only check in btrfs_cancel_balance
  2018-04-20 12:06     ` [PATCH v2.1 " David Sterba
@ 2018-04-27  2:02       ` Anand Jain
  0 siblings, 0 replies; 37+ messages in thread
From: Anand Jain @ 2018-04-27  2:02 UTC (permalink / raw)
  To: dsterba, David Sterba, linux-btrfs



On 04/20/2018 08:06 PM, David Sterba wrote:
> Balance cannot be started on a read-only filesystem and will have to
> finish/exit before eg. going to read-only via remount.
> 
> In case the filesystem is forcibly set to read-only after an error,
> balance will finish anyway and if the cancel call is too fast it will
> just wait for that to happen.
> 
> The last case is when the balance is paused after mount but it's
> read-only and cancelling would want to delete the item. The test is
> moved after the check if balance is running at all, as it looks more
> logical to report "no balance running" instead of "read-only
> filesystem".
> 
> Signed-off-by: David Sterba <dsterba@suse.com>

Reviewed-by: Anand Jain <anand.jain@oracle.com>

Thanks, Anand


> ---
> 
> - Add missing unlock
> 
>   fs/btrfs/volumes.c | 13 ++++++++++---
>   1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index a766d2f988c1..9176d77e02ee 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -4053,15 +4053,22 @@ int btrfs_pause_balance(struct btrfs_fs_info *fs_info)
>   
>   int btrfs_cancel_balance(struct btrfs_fs_info *fs_info)
>   {
> -	if (sb_rdonly(fs_info->sb))
> -		return -EROFS;
> -
>   	mutex_lock(&fs_info->balance_mutex);
>   	if (!fs_info->balance_ctl) {
>   		mutex_unlock(&fs_info->balance_mutex);
>   		return -ENOTCONN;
>   	}
>   
> +	/*
> +	 * A paused balance with the item stored on disk can be resumed at
> +	 * mount time if the mount is read-write. Otherwise it's still paused
> +	 * and we must not allow cancelling as it deletes the item.
> +	 */
> +	if (sb_rdonly(fs_info->sb)) {
> +		mutex_unlock(&fs_info->balance_mutex);
> +		return -EROFS;
> +	}
> +
>   	atomic_inc(&fs_info->balance_cancel_req);
>   	/*
>   	 * if we are running just wait and return, balance item is
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 12/16] btrfs: track running balance in a simpler way
  2018-04-19 16:33 ` [PATCH v2 12/16] btrfs: track running balance in a simpler way David Sterba
  2018-04-20  7:52   ` Anand Jain
@ 2018-04-27  2:10   ` Anand Jain
  2018-04-27 16:10     ` David Sterba
  1 sibling, 1 reply; 37+ messages in thread
From: Anand Jain @ 2018-04-27  2:10 UTC (permalink / raw)
  To: David Sterba, linux-btrfs



On 04/20/2018 12:33 AM, David Sterba wrote:
> Currently fs_info::balance_running is 0 or 1 and does not use the
> semantics of atomics. The pause and cancel check for 0, that can happen
> only after __btrfs_balance exits for whatever reason.
> 
> Parallel calls to balance ioctl may enter btrfs_ioctl_balance multiple
> times but will block on the balance_mutex that protects the
> fs_info::flags bit.
> 
> Signed-off-by: David Sterba <dsterba@suse.com>

  Though it appears to me that we could optimize the lock and atomic bit
  usage a little more, however it can be taken for the next round of
  cleanups. So for now..

  Reviewed-by: Anand Jain <anand.jain@oracle.com>

Thanks, Anand

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 12/16] btrfs: track running balance in a simpler way
  2018-04-27  2:10   ` Anand Jain
@ 2018-04-27 16:10     ` David Sterba
  0 siblings, 0 replies; 37+ messages in thread
From: David Sterba @ 2018-04-27 16:10 UTC (permalink / raw)
  To: Anand Jain; +Cc: David Sterba, linux-btrfs

On Fri, Apr 27, 2018 at 10:10:24AM +0800, Anand Jain wrote:
> 
> 
> On 04/20/2018 12:33 AM, David Sterba wrote:
> > Currently fs_info::balance_running is 0 or 1 and does not use the
> > semantics of atomics. The pause and cancel check for 0, that can happen
> > only after __btrfs_balance exits for whatever reason.
> > 
> > Parallel calls to balance ioctl may enter btrfs_ioctl_balance multiple
> > times but will block on the balance_mutex that protects the
> > fs_info::flags bit.
> > 
> > Signed-off-by: David Sterba <dsterba@suse.com>
> 
>   Though it appears to me that we could optimize the lock and atomic bit
>   usage a little more, however it can be taken for the next round of
>   cleanups. So for now..
> 
>   Reviewed-by: Anand Jain <anand.jain@oracle.com>

Thanks, patches updated.

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2018-04-27 16:12 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-19 16:33 [PATCH v2 00/16] Kill fs_info::volume_mutex David Sterba
2018-04-19 16:33 ` [PATCH v2 01/16] btrfs: squeeze btrfs_dev_replace_continue_on_mount to its caller David Sterba
2018-04-19 16:33 ` [PATCH v2 02/16] btrfs: make success path out of btrfs_init_dev_replace_tgtdev more clear David Sterba
2018-04-19 16:33 ` [PATCH v2 03/16] btrfs: export and rename free_device David Sterba
2018-04-19 16:33 ` [PATCH v2 04/16] btrfs: move btrfs_init_dev_replace_tgtdev to dev-replace.c and make static David Sterba
2018-04-19 16:33 ` [PATCH v2 05/16] btrfs: move volume_mutex to callers of btrfs_rm_device David Sterba
2018-04-19 16:33 ` [PATCH v2 06/16] btrfs: move clearing of EXCL_OP out of __cancel_balance David Sterba
2018-04-20  7:02   ` Nikolay Borisov
2018-04-20  7:35   ` Anand Jain
2018-04-19 16:33 ` [PATCH v2 07/16] btrfs: add proper safety check before resuming dev-replace David Sterba
2018-04-20  7:04   ` Nikolay Borisov
2018-04-20  7:36   ` Anand Jain
2018-04-19 16:33 ` [PATCH v2 08/16] btrfs: add sanity check when resuming balance after mount David Sterba
2018-04-20  7:38   ` Anand Jain
2018-04-19 16:33 ` [PATCH v2 09/16] btrfs: cleanup helpers that reset balance state David Sterba
2018-04-20  7:07   ` Nikolay Borisov
2018-04-20 11:28     ` David Sterba
2018-04-20  9:04   ` Anand Jain
2018-04-19 16:33 ` [PATCH v2 10/16] btrfs: remove wrong use of volume_mutex from btrfs_dev_replace_start David Sterba
2018-04-19 16:33 ` [PATCH v2 11/16] btrfs: kill btrfs_fs_info::volume_mutex David Sterba
2018-04-19 16:33 ` [PATCH v2 12/16] btrfs: track running balance in a simpler way David Sterba
2018-04-20  7:52   ` Anand Jain
2018-04-20 11:58     ` David Sterba
2018-04-20 12:19       ` David Sterba
2018-04-20 13:32         ` Anand Jain
2018-04-27  2:10   ` Anand Jain
2018-04-27 16:10     ` David Sterba
2018-04-19 16:33 ` [PATCH v2 13/16] btrfs: move and comment read-only check in btrfs_cancel_balance David Sterba
2018-04-20  9:13   ` Anand Jain
2018-04-20 11:59     ` David Sterba
2018-04-20 12:06     ` [PATCH v2.1 " David Sterba
2018-04-27  2:02       ` Anand Jain
2018-04-19 16:33 ` [PATCH v2 14/16] btrfs: drop lock parameter from update_ioctl_balance_args and rename David Sterba
2018-04-20  9:21   ` Anand Jain
2018-04-19 16:33 ` [PATCH v2 15/16] btrfs: use mutex in btrfs_resume_balance_async David Sterba
2018-04-20  9:25   ` Anand Jain
2018-04-19 16:33 ` [PATCH v2 16/16] btrfs: open code set_balance_control David Sterba

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.