* [PATCH 00/20] btrfs: refactor and generalize chunk/dev_extent/extent allocation
@ 2020-02-06 10:41 Naohiro Aota
  2020-02-06 10:41 ` [PATCH 01/20] btrfs: change type of full_search to bool Naohiro Aota
                   ` (20 more replies)
  0 siblings, 21 replies; 52+ messages in thread
From: Naohiro Aota @ 2020-02-06 10:41 UTC (permalink / raw)
  To: linux-btrfs, David Sterba
  Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
	Johannes Thumshirn, Hannes Reinecke, Anand Jain, linux-fsdevel,
	Naohiro Aota

This series refactors the chunk allocation, device extent allocation and
extent allocation functions and generalizes them so that other
allocation policies can be implemented easily.

On top of this series, we can simplify parts of the "btrfs: zoned
block device support" series by adding a new type of chunk allocator
and extent allocator for zoned block devices. Furthermore, we will be
able to implement and test other allocators from the ideas page of
the wiki, e.g. SSD caching, a dedicated metadata drive, chunk
allocation groups, and so on.

This series has no functional changes except introducing "enum
btrfs_chunk_allocation_policy" and "enum
btrfs_extent_allocation_policy".

* Refactoring chunk/dev_extent allocator

Two functions are split out of find_free_dev_extent_start().
dev_extent_search_start() decides the starting position of the search,
and dev_extent_hole_check() checks whether a found hole is suitable for
device extent allocation.

__btrfs_alloc_chunk() is split into four functions. set_parameters()
initializes the parameters of an allocation. gather_device_info()
loops over the devices and gathers information about
them. decide_stripe_size() decides the size of the chunk and the
device extents. Finally, create_chunk() creates a chunk and the
device extents. The resulting top-level flow is sketched below.
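
After patch 8, __btrfs_alloc_chunk() reduces to roughly the following
sequence (condensed from the diff in patch 8; the devices_info
allocation and the error paths are trimmed here):

  static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
                                 u64 start, u64 type)
  {
          struct btrfs_fs_devices *fs_devices = trans->fs_info->fs_devices;
          struct btrfs_device_info *devices_info;
          struct alloc_chunk_ctl ctl;
          int ret;

          /* ... kcalloc() of devices_info omitted ... */

          ctl.start = start;
          ctl.type = type;
          set_parameters(fs_devices, &ctl);

          ret = gather_device_info(fs_devices, &ctl, devices_info);
          if (ret < 0)
                  goto error;

          ret = decide_stripe_size(fs_devices, &ctl, devices_info);
          if (ret < 0)
                  goto error;

          ret = create_chunk(trans, &ctl, devices_info);
          /* ... kfree(devices_info) and error label omitted ... */
  }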

* Refactoring extent allocator

Three functions are introduced in
find_free_extent(). prepare_allocation() initializes the parameters
and provides a hint byte to start the allocation from. do_allocation()
handles the actual allocation in a given block group.
release_block_group() is called when the allocator gives up on a
block group, so that the allocation context can be reset.

Two functions are introduced in find_free_extent_update_loop().
found_extent() is called when the allocator finally finds a proper
extent. chunk_allocation_failed() is called when it fails to allocate
a new chunk. An allocator implementation can use this hook to set the
next stage to try, e.g. LOOP_NO_EMPTY_SIZE.

Furthermore, the LOOP_NO_EMPTY_SIZE stage is tweaked so that
allocators other than the current clustered allocator skip this stage.
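
Each of these hooks dispatches on ffe_ctl->policy with the same
switch-case pattern used on the chunk side. A minimal sketch of the
shape (the clustered helper name here is illustrative; only
BTRFS_EXTENT_ALLOC_CLUSTERED exists after this series):

  static int do_allocation(struct btrfs_block_group *block_group,
                           struct find_free_extent_ctl *ffe_ctl,
                           struct btrfs_block_group **bg_ret)
  {
          switch (ffe_ctl->policy) {
          case BTRFS_EXTENT_ALLOC_CLUSTERED:
                  /* the current, clustered allocation path */
                  return do_allocation_clustered(block_group, ffe_ctl, bg_ret);
          default:
                  BUG();
          }
  }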

* Patch organization

Patch 1 is a trivial patch to fix the type of an argument of
find_free_extent_update_loop().

Patches 2-9 refactor the chunk and device extent allocation functions:
find_free_dev_extent_start() and __btrfs_alloc_chunk().

Patches 10-20 refactor the extent allocation functions: find_free_extent()
and find_free_extent_update_loop().

Naohiro Aota (20):
  btrfs: change type of full_search to bool
  btrfs: introduce chunk allocation policy
  btrfs: refactor find_free_dev_extent_start()
  btrfs: introduce alloc_chunk_ctl
  btrfs: factor out set_parameters()
  btrfs: factor out gather_device_info()
  btrfs: factor out decide_stripe_size()
  btrfs: factor out create_chunk()
  btrfs: parameterize dev_extent_min
  btrfs: introduce extent allocation policy
  btrfs: move hint_byte into find_free_extent_ctl
  btrfs: introduce clustered_alloc_info
  btrfs: factor out do_allocation()
  btrfs: drop unnecessary arguments from clustered allocation functions
  btrfs: factor out release_block_group()
  btrfs: factor out found_extent()
  btrfs: drop unnecessary arguments from find_free_extent_update_loop()
  btrfs: factor out chunk_allocation_failed()
  btrfs: add assertion on LOOP_NO_EMPTY_SIZE stage
  btrfs: factor out prepare_allocation()

 fs/btrfs/extent-tree.c | 378 ++++++++++++++++++++++++++-------------
 fs/btrfs/volumes.c     | 397 +++++++++++++++++++++++++++--------------
 fs/btrfs/volumes.h     |   6 +
 3 files changed, 525 insertions(+), 256 deletions(-)

-- 
2.25.0

* [PATCH 01/20] btrfs: change type of full_search to bool
  2020-02-06 10:41 [PATCH 00/20] btrfs: refactor and generalize chunk/dev_extent/extent allocation Naohiro Aota
@ 2020-02-06 10:41 ` Naohiro Aota
  2020-02-06 11:26   ` Johannes Thumshirn
  2020-02-06 16:03   ` Josef Bacik
  2020-02-06 10:41 ` [PATCH 02/20] btrfs: introduce chunk allocation policy Naohiro Aota
                   ` (19 subsequent siblings)
  20 siblings, 2 replies; 52+ messages in thread
From: Naohiro Aota @ 2020-02-06 10:41 UTC (permalink / raw)
  To: linux-btrfs, David Sterba
  Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
	Johannes Thumshirn, Hannes Reinecke, Anand Jain, linux-fsdevel,
	Naohiro Aota

While the "full_search" variable defined in find_free_extent() is bool,
the full_search argument of find_free_extent_update_loop() is defined as
int. Trivially fix the argument type.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/extent-tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 0163fdd59f8f..227d4d628b90 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3672,7 +3672,7 @@ static int find_free_extent_update_loop(struct btrfs_fs_info *fs_info,
 					struct btrfs_free_cluster *last_ptr,
 					struct btrfs_key *ins,
 					struct find_free_extent_ctl *ffe_ctl,
-					int full_search, bool use_cluster)
+					bool full_search, bool use_cluster)
 {
 	struct btrfs_root *root = fs_info->extent_root;
 	int ret;
-- 
2.25.0

* [PATCH 02/20] btrfs: introduce chunk allocation policy
  2020-02-06 10:41 [PATCH 00/20] btrfs: refactor and generalize chunk/dev_extent/extent allocation Naohiro Aota
  2020-02-06 10:41 ` [PATCH 01/20] btrfs: change type of full_search to bool Naohiro Aota
@ 2020-02-06 10:41 ` Naohiro Aota
  2020-02-06 11:30   ` Johannes Thumshirn
                     ` (2 more replies)
  2020-02-06 10:41 ` [PATCH 03/20] btrfs: refactor find_free_dev_extent_start() Naohiro Aota
                   ` (18 subsequent siblings)
  20 siblings, 3 replies; 52+ messages in thread
From: Naohiro Aota @ 2020-02-06 10:41 UTC (permalink / raw)
  To: linux-btrfs, David Sterba
  Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
	Johannes Thumshirn, Hannes Reinecke, Anand Jain, linux-fsdevel,
	Naohiro Aota

This commit introduces a chunk allocation policy for btrfs. This policy
controls how btrfs allocates a chunk and device extents from the devices.

There is no functional change introduced with this commit.
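
The enum gives future allocators a single point to hook into. As a
hypothetical example (nothing below is part of this series), a later
series could extend it like this:

  enum btrfs_chunk_allocation_policy {
          BTRFS_CHUNK_ALLOC_REGULAR,
          BTRFS_CHUNK_ALLOC_ZONED,        /* hypothetical future policy */
  };

Each helper that behaves differently per policy (e.g.
dev_extent_search_start() in patch 3) then gains a case for the new
value in its switch statement.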

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/volumes.c | 1 +
 fs/btrfs/volumes.h | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 9cfc668f91f4..beee9a94142f 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1209,6 +1209,7 @@ static int open_fs_devices(struct btrfs_fs_devices *fs_devices,
 	fs_devices->opened = 1;
 	fs_devices->latest_bdev = latest_dev->bdev;
 	fs_devices->total_rw_bytes = 0;
+	fs_devices->chunk_alloc_policy = BTRFS_CHUNK_ALLOC_REGULAR;
 out:
 	return ret;
 }
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 690d4f5a0653..f07e1c4240b9 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -209,6 +209,10 @@ BTRFS_DEVICE_GETSET_FUNCS(total_bytes);
 BTRFS_DEVICE_GETSET_FUNCS(disk_total_bytes);
 BTRFS_DEVICE_GETSET_FUNCS(bytes_used);
 
+enum btrfs_chunk_allocation_policy {
+	BTRFS_CHUNK_ALLOC_REGULAR,
+};
+
 struct btrfs_fs_devices {
 	u8 fsid[BTRFS_FSID_SIZE]; /* FS specific uuid */
 	u8 metadata_uuid[BTRFS_FSID_SIZE];
@@ -259,6 +263,8 @@ struct btrfs_fs_devices {
 	struct kobject fsid_kobj;
 	struct kobject *devices_kobj;
 	struct completion kobj_unregister;
+
+	enum btrfs_chunk_allocation_policy chunk_alloc_policy;
 };
 
 #define BTRFS_BIO_INLINE_CSUM_SIZE	64
-- 
2.25.0

* [PATCH 03/20] btrfs: refactor find_free_dev_extent_start()
  2020-02-06 10:41 [PATCH 00/20] btrfs: refactor and generalize chunk/dev_extent/extent allocation Naohiro Aota
  2020-02-06 10:41 ` [PATCH 01/20] btrfs: change type of full_search to bool Naohiro Aota
  2020-02-06 10:41 ` [PATCH 02/20] btrfs: introduce chunk allocation policy Naohiro Aota
@ 2020-02-06 10:41 ` Naohiro Aota
  2020-02-06 12:02   ` Johannes Thumshirn
  2020-02-06 16:34   ` Josef Bacik
  2020-02-06 10:41 ` [PATCH 04/20] btrfs: introduce alloc_chunk_ctl Naohiro Aota
                   ` (17 subsequent siblings)
  20 siblings, 2 replies; 52+ messages in thread
From: Naohiro Aota @ 2020-02-06 10:41 UTC (permalink / raw)
  To: linux-btrfs, David Sterba
  Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
	Johannes Thumshirn, Hannes Reinecke, Anand Jain, linux-fsdevel,
	Naohiro Aota

Factor out two functions from find_free_dev_extent_start().
dev_extent_search_start() decides the starting position of the search,
and dev_extent_hole_check() checks whether a found hole is suitable for
device extent allocation.

These functions also contain the switch-cases that change the allocation
behavior depending on the policy.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/volumes.c | 80 ++++++++++++++++++++++++++++++++++------------
 1 file changed, 60 insertions(+), 20 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index beee9a94142f..9bb673df777a 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1384,6 +1384,61 @@ static bool contains_pending_extent(struct btrfs_device *device, u64 *start,
 	return false;
 }
 
+static u64 dev_extent_search_start(struct btrfs_device *device, u64 start)
+{
+	switch (device->fs_devices->chunk_alloc_policy) {
+	case BTRFS_CHUNK_ALLOC_REGULAR:
+		/*
+		 * We don't want to overwrite the superblock on the
+		 * drive nor any area used by the boot loader (grub
+		 * for example), so we make sure to start at an offset
+		 * of at least 1MB.
+		 */
+		return max_t(u64, start, SZ_1M);
+	default:
+		BUG();
+	}
+}
+
+/*
+ * dev_extent_hole_check - check if specified hole is suitable for allocation
+ * @device:	the device which we have the hole
+ * @hole_start: starting position of the hole
+ * @hole_size:	the size of the hole
+ * @num_bytes:	the size of the free space that we need
+ *
+ * This function may modify @hole_start and @hole_size to reflect the
+ * suitable position for allocation. Returns 1 if hole position is
+ * updated, 0 otherwise.
+ */
+static int dev_extent_hole_check(struct btrfs_device *device, u64 *hole_start,
+				 u64 *hole_size, u64 num_bytes)
+{
+	int ret = 0;
+	u64 hole_end = *hole_start + *hole_size;
+
+	/*
+	 * Have to check before we set max_hole_start, otherwise we
+	 * could end up sending back this offset anyway.
+	 */
+	if (contains_pending_extent(device, hole_start, *hole_size)) {
+		if (hole_end >= *hole_start)
+			*hole_size = hole_end - *hole_start;
+		else
+			*hole_size = 0;
+		ret = 1;
+	}
+
+	switch (device->fs_devices->chunk_alloc_policy) {
+	case BTRFS_CHUNK_ALLOC_REGULAR:
+		/* No extra check */
+		break;
+	default:
+		BUG();
+	}
+
+	return ret;
+}
 
 /*
  * find_free_dev_extent_start - find free space in the specified device
@@ -1430,12 +1485,7 @@ static int find_free_dev_extent_start(struct btrfs_device *device,
 	int slot;
 	struct extent_buffer *l;
 
-	/*
-	 * We don't want to overwrite the superblock on the drive nor any area
-	 * used by the boot loader (grub for example), so we make sure to start
-	 * at an offset of at least 1MB.
-	 */
-	search_start = max_t(u64, search_start, SZ_1M);
+	search_start = dev_extent_search_start(device, search_start);
 
 	path = btrfs_alloc_path();
 	if (!path)
@@ -1493,18 +1543,8 @@ static int find_free_dev_extent_start(struct btrfs_device *device,
 
 		if (key.offset > search_start) {
 			hole_size = key.offset - search_start;
-
-			/*
-			 * Have to check before we set max_hole_start, otherwise
-			 * we could end up sending back this offset anyway.
-			 */
-			if (contains_pending_extent(device, &search_start,
-						    hole_size)) {
-				if (key.offset >= search_start)
-					hole_size = key.offset - search_start;
-				else
-					hole_size = 0;
-			}
+			dev_extent_hole_check(device, &search_start, &hole_size,
+					      num_bytes);
 
 			if (hole_size > max_hole_size) {
 				max_hole_start = search_start;
@@ -1543,8 +1583,8 @@ static int find_free_dev_extent_start(struct btrfs_device *device,
 	 */
 	if (search_end > search_start) {
 		hole_size = search_end - search_start;
-
-		if (contains_pending_extent(device, &search_start, hole_size)) {
+		if (dev_extent_hole_check(device, &search_start, &hole_size,
+					  num_bytes)) {
 			btrfs_release_path(path);
 			goto again;
 		}
-- 
2.25.0

* [PATCH 04/20] btrfs: introduce alloc_chunk_ctl
  2020-02-06 10:41 [PATCH 00/20] btrfs: refactor and generalize chunk/dev_extent/extent allocation Naohiro Aota
                   ` (2 preceding siblings ...)
  2020-02-06 10:41 ` [PATCH 03/20] btrfs: refactor find_free_dev_extent_start() Naohiro Aota
@ 2020-02-06 10:41 ` Naohiro Aota
  2020-02-06 12:07   ` Johannes Thumshirn
  2020-02-06 16:38   ` Josef Bacik
  2020-02-06 10:41 ` [PATCH 05/20] btrfs: factor out set_parameters() Naohiro Aota
                   ` (16 subsequent siblings)
  20 siblings, 2 replies; 52+ messages in thread
From: Naohiro Aota @ 2020-02-06 10:41 UTC (permalink / raw)
  To: linux-btrfs, David Sterba
  Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
	Johannes Thumshirn, Hannes Reinecke, Anand Jain, linux-fsdevel,
	Naohiro Aota

Introduce "struct alloc_chunk_ctl" to wrap needed parameters for the chunk
allocation.  This will be used to split __btrfs_alloc_chunk() into smaller
functions.

This commit folds a number of local variables in __btrfs_alloc_chunk() into
one "struct alloc_chunk_ctl ctl". There is no functional change.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/volumes.c | 143 +++++++++++++++++++++++++--------------------
 1 file changed, 81 insertions(+), 62 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 9bb673df777a..cfde302bf297 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4818,6 +4818,29 @@ static void check_raid1c34_incompat_flag(struct btrfs_fs_info *info, u64 type)
 	btrfs_set_fs_incompat(info, RAID1C34);
 }
 
+/*
+ * Structure used internally for __btrfs_alloc_chunk() function.
+ * Wraps needed parameters.
+ */
+struct alloc_chunk_ctl {
+	u64 start;
+	u64 type;
+	int num_stripes;	/* total number of stripes to allocate */
+	int sub_stripes;	/* sub_stripes info for map */
+	int dev_stripes;	/* stripes per dev */
+	int devs_max;		/* max devs to use */
+	int devs_min;		/* min devs needed */
+	int devs_increment;	/* ndevs has to be a multiple of this */
+	int ncopies;		/* how many copies to data has */
+	int nparity;		/* number of stripes worth of bytes to
+				   store parity information */
+	u64 max_stripe_size;
+	u64 max_chunk_size;
+	u64 stripe_size;
+	u64 chunk_size;
+	int ndevs;
+};
+
 static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 			       u64 start, u64 type)
 {
@@ -4828,23 +4851,11 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	struct extent_map_tree *em_tree;
 	struct extent_map *em;
 	struct btrfs_device_info *devices_info = NULL;
+	struct alloc_chunk_ctl ctl;
 	u64 total_avail;
-	int num_stripes;	/* total number of stripes to allocate */
 	int data_stripes;	/* number of stripes that count for
 				   block group size */
-	int sub_stripes;	/* sub_stripes info for map */
-	int dev_stripes;	/* stripes per dev */
-	int devs_max;		/* max devs to use */
-	int devs_min;		/* min devs needed */
-	int devs_increment;	/* ndevs has to be a multiple of this */
-	int ncopies;		/* how many copies to data has */
-	int nparity;		/* number of stripes worth of bytes to
-				   store parity information */
 	int ret;
-	u64 max_stripe_size;
-	u64 max_chunk_size;
-	u64 stripe_size;
-	u64 chunk_size;
 	int ndevs;
 	int i;
 	int j;
@@ -4858,32 +4869,36 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 		return -ENOSPC;
 	}
 
+	ctl.start = start;
+	ctl.type = type;
+
 	index = btrfs_bg_flags_to_raid_index(type);
 
-	sub_stripes = btrfs_raid_array[index].sub_stripes;
-	dev_stripes = btrfs_raid_array[index].dev_stripes;
-	devs_max = btrfs_raid_array[index].devs_max;
-	if (!devs_max)
-		devs_max = BTRFS_MAX_DEVS(info);
-	devs_min = btrfs_raid_array[index].devs_min;
-	devs_increment = btrfs_raid_array[index].devs_increment;
-	ncopies = btrfs_raid_array[index].ncopies;
-	nparity = btrfs_raid_array[index].nparity;
+	ctl.sub_stripes = btrfs_raid_array[index].sub_stripes;
+	ctl.dev_stripes = btrfs_raid_array[index].dev_stripes;
+	ctl.devs_max = btrfs_raid_array[index].devs_max;
+	if (!ctl.devs_max)
+		ctl.devs_max = BTRFS_MAX_DEVS(info);
+	ctl.devs_min = btrfs_raid_array[index].devs_min;
+	ctl.devs_increment = btrfs_raid_array[index].devs_increment;
+	ctl.ncopies = btrfs_raid_array[index].ncopies;
+	ctl.nparity = btrfs_raid_array[index].nparity;
 
 	if (type & BTRFS_BLOCK_GROUP_DATA) {
-		max_stripe_size = SZ_1G;
-		max_chunk_size = BTRFS_MAX_DATA_CHUNK_SIZE;
+		ctl.max_stripe_size = SZ_1G;
+		ctl.max_chunk_size = BTRFS_MAX_DATA_CHUNK_SIZE;
 	} else if (type & BTRFS_BLOCK_GROUP_METADATA) {
 		/* for larger filesystems, use larger metadata chunks */
 		if (fs_devices->total_rw_bytes > 50ULL * SZ_1G)
-			max_stripe_size = SZ_1G;
+			ctl.max_stripe_size = SZ_1G;
 		else
-			max_stripe_size = SZ_256M;
-		max_chunk_size = max_stripe_size;
+			ctl.max_stripe_size = SZ_256M;
+		ctl.max_chunk_size = ctl.max_stripe_size;
 	} else if (type & BTRFS_BLOCK_GROUP_SYSTEM) {
-		max_stripe_size = SZ_32M;
-		max_chunk_size = 2 * max_stripe_size;
-		devs_max = min_t(int, devs_max, BTRFS_MAX_DEVS_SYS_CHUNK);
+		ctl.max_stripe_size = SZ_32M;
+		ctl.max_chunk_size = 2 * ctl.max_stripe_size;
+		ctl.devs_max = min_t(int, ctl.devs_max,
+				      BTRFS_MAX_DEVS_SYS_CHUNK);
 	} else {
 		btrfs_err(info, "invalid chunk type 0x%llx requested",
 		       type);
@@ -4891,8 +4906,8 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	}
 
 	/* We don't want a chunk larger than 10% of writable space */
-	max_chunk_size = min(div_factor(fs_devices->total_rw_bytes, 1),
-			     max_chunk_size);
+	ctl.max_chunk_size = min(div_factor(fs_devices->total_rw_bytes, 1),
+				  ctl.max_chunk_size);
 
 	devices_info = kcalloc(fs_devices->rw_devices, sizeof(*devices_info),
 			       GFP_NOFS);
@@ -4929,20 +4944,20 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 			continue;
 
 		ret = find_free_dev_extent(device,
-					   max_stripe_size * dev_stripes,
+				ctl.max_stripe_size * ctl.dev_stripes,
 					   &dev_offset, &max_avail);
 		if (ret && ret != -ENOSPC)
 			goto error;
 
 		if (ret == 0)
-			max_avail = max_stripe_size * dev_stripes;
+			max_avail = ctl.max_stripe_size * ctl.dev_stripes;
 
-		if (max_avail < BTRFS_STRIPE_LEN * dev_stripes) {
+		if (max_avail < BTRFS_STRIPE_LEN * ctl.dev_stripes) {
 			if (btrfs_test_opt(info, ENOSPC_DEBUG))
 				btrfs_debug(info,
 			"%s: devid %llu has no free space, have=%llu want=%u",
 					    __func__, device->devid, max_avail,
-					    BTRFS_STRIPE_LEN * dev_stripes);
+				BTRFS_STRIPE_LEN * ctl.dev_stripes);
 			continue;
 		}
 
@@ -4957,30 +4972,31 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 		devices_info[ndevs].dev = device;
 		++ndevs;
 	}
+	ctl.ndevs = ndevs;
 
 	/*
 	 * now sort the devices by hole size / available space
 	 */
-	sort(devices_info, ndevs, sizeof(struct btrfs_device_info),
+	sort(devices_info, ctl.ndevs, sizeof(struct btrfs_device_info),
 	     btrfs_cmp_device_info, NULL);
 
 	/*
 	 * Round down to number of usable stripes, devs_increment can be any
 	 * number so we can't use round_down()
 	 */
-	ndevs -= ndevs % devs_increment;
+	ctl.ndevs -= ctl.ndevs % ctl.devs_increment;
 
-	if (ndevs < devs_min) {
+	if (ctl.ndevs < ctl.devs_min) {
 		ret = -ENOSPC;
 		if (btrfs_test_opt(info, ENOSPC_DEBUG)) {
 			btrfs_debug(info,
 	"%s: not enough devices with free space: have=%d minimum required=%d",
-				    __func__, ndevs, devs_min);
+				    __func__, ctl.ndevs, ctl.devs_min);
 		}
 		goto error;
 	}
 
-	ndevs = min(ndevs, devs_max);
+	ctl.ndevs = min(ctl.ndevs, ctl.devs_max);
 
 	/*
 	 * The primary goal is to maximize the number of stripes, so use as
@@ -4989,14 +5005,15 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	 * The DUP profile stores more than one stripe per device, the
 	 * max_avail is the total size so we have to adjust.
 	 */
-	stripe_size = div_u64(devices_info[ndevs - 1].max_avail, dev_stripes);
-	num_stripes = ndevs * dev_stripes;
+	ctl.stripe_size = div_u64(devices_info[ctl.ndevs - 1].max_avail,
+				   ctl.dev_stripes);
+	ctl.num_stripes = ctl.ndevs * ctl.dev_stripes;
 
 	/*
 	 * this will have to be fixed for RAID1 and RAID10 over
 	 * more drives
 	 */
-	data_stripes = (num_stripes - nparity) / ncopies;
+	data_stripes = (ctl.num_stripes - ctl.nparity) / ctl.ncopies;
 
 	/*
 	 * Use the number of data stripes to figure out how big this chunk
@@ -5004,44 +5021,44 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	 * and compare that answer with the max chunk size. If it's higher,
 	 * we try to reduce stripe_size.
 	 */
-	if (stripe_size * data_stripes > max_chunk_size) {
+	if (ctl.stripe_size * data_stripes > ctl.max_chunk_size) {
 		/*
 		 * Reduce stripe_size, round it up to a 16MB boundary again and
 		 * then use it, unless it ends up being even bigger than the
 		 * previous value we had already.
 		 */
-		stripe_size = min(round_up(div_u64(max_chunk_size,
+		ctl.stripe_size = min(round_up(div_u64(ctl.max_chunk_size,
 						   data_stripes), SZ_16M),
-				  stripe_size);
+				       ctl.stripe_size);
 	}
 
 	/* align to BTRFS_STRIPE_LEN */
-	stripe_size = round_down(stripe_size, BTRFS_STRIPE_LEN);
+	ctl.stripe_size = round_down(ctl.stripe_size, BTRFS_STRIPE_LEN);
 
-	map = kmalloc(map_lookup_size(num_stripes), GFP_NOFS);
+	map = kmalloc(map_lookup_size(ctl.num_stripes), GFP_NOFS);
 	if (!map) {
 		ret = -ENOMEM;
 		goto error;
 	}
-	map->num_stripes = num_stripes;
+	map->num_stripes = ctl.num_stripes;
 
-	for (i = 0; i < ndevs; ++i) {
-		for (j = 0; j < dev_stripes; ++j) {
-			int s = i * dev_stripes + j;
+	for (i = 0; i < ctl.ndevs; ++i) {
+		for (j = 0; j < ctl.dev_stripes; ++j) {
+			int s = i * ctl.dev_stripes + j;
 			map->stripes[s].dev = devices_info[i].dev;
 			map->stripes[s].physical = devices_info[i].dev_offset +
-						   j * stripe_size;
+						   j * ctl.stripe_size;
 		}
 	}
 	map->stripe_len = BTRFS_STRIPE_LEN;
 	map->io_align = BTRFS_STRIPE_LEN;
 	map->io_width = BTRFS_STRIPE_LEN;
 	map->type = type;
-	map->sub_stripes = sub_stripes;
+	map->sub_stripes = ctl.sub_stripes;
 
-	chunk_size = stripe_size * data_stripes;
+	ctl.chunk_size = ctl.stripe_size * data_stripes;
 
-	trace_btrfs_chunk_alloc(info, map, start, chunk_size);
+	trace_btrfs_chunk_alloc(info, map, start, ctl.chunk_size);
 
 	em = alloc_extent_map();
 	if (!em) {
@@ -5052,10 +5069,10 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	set_bit(EXTENT_FLAG_FS_MAPPING, &em->flags);
 	em->map_lookup = map;
 	em->start = start;
-	em->len = chunk_size;
+	em->len = ctl.chunk_size;
 	em->block_start = 0;
 	em->block_len = em->len;
-	em->orig_block_len = stripe_size;
+	em->orig_block_len = ctl.stripe_size;
 
 	em_tree = &info->mapping_tree;
 	write_lock(&em_tree->lock);
@@ -5067,20 +5084,22 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	}
 	write_unlock(&em_tree->lock);
 
-	ret = btrfs_make_block_group(trans, 0, type, start, chunk_size);
+	ret = btrfs_make_block_group(trans, 0, type, start, ctl.chunk_size);
 	if (ret)
 		goto error_del_extent;
 
 	for (i = 0; i < map->num_stripes; i++) {
 		struct btrfs_device *dev = map->stripes[i].dev;
 
-		btrfs_device_set_bytes_used(dev, dev->bytes_used + stripe_size);
+		btrfs_device_set_bytes_used(dev,
+					    dev->bytes_used + ctl.stripe_size);
 		if (list_empty(&dev->post_commit_list))
 			list_add_tail(&dev->post_commit_list,
 				      &trans->transaction->dev_update_list);
 	}
 
-	atomic64_sub(stripe_size * map->num_stripes, &info->free_chunk_space);
+	atomic64_sub(ctl.stripe_size * map->num_stripes,
+		     &info->free_chunk_space);
 
 	free_extent_map(em);
 	check_raid56_incompat_flag(info, type);
-- 
2.25.0

* [PATCH 05/20] btrfs: factor out set_parameters()
  2020-02-06 10:41 [PATCH 00/20] btrfs: refactor and generalize chunk/dev_extent/extent allocation Naohiro Aota
                   ` (3 preceding siblings ...)
  2020-02-06 10:41 ` [PATCH 04/20] btrfs: introduce alloc_chunk_ctl Naohiro Aota
@ 2020-02-06 10:41 ` Naohiro Aota
  2020-02-06 13:51   ` Johannes Thumshirn
  2020-02-06 16:40   ` Josef Bacik
  2020-02-06 10:42 ` [PATCH 06/20] btrfs: factor out gather_device_info() Naohiro Aota
                   ` (15 subsequent siblings)
  20 siblings, 2 replies; 52+ messages in thread
From: Naohiro Aota @ 2020-02-06 10:41 UTC (permalink / raw)
  To: linux-btrfs, David Sterba
  Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
	Johannes Thumshirn, Hannes Reinecke, Anand Jain, linux-fsdevel,
	Naohiro Aota

Factor out set_parameters() from __btrfs_alloc_chunk(). This function
initialises the parameters of "struct alloc_chunk_ctl" for an allocation.
set_parameters() handles the common part of the initialisation, loading
the RAID parameters from btrfs_raid_array, while set_parameters_regular()
decides the parameters specific to the regular allocation policy.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/volumes.c | 96 ++++++++++++++++++++++++++++------------------
 1 file changed, 59 insertions(+), 37 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index cfde302bf297..a5d6f0b5ca70 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4841,6 +4841,60 @@ struct alloc_chunk_ctl {
 	int ndevs;
 };
 
+static void set_parameters_regular(struct btrfs_fs_devices *fs_devices,
+				   struct alloc_chunk_ctl *ctl)
+{
+	u64 type = ctl->type;
+
+	if (type & BTRFS_BLOCK_GROUP_DATA) {
+		ctl->max_stripe_size = SZ_1G;
+		ctl->max_chunk_size = BTRFS_MAX_DATA_CHUNK_SIZE;
+	} else if (type & BTRFS_BLOCK_GROUP_METADATA) {
+		/* for larger filesystems, use larger metadata chunks */
+		if (fs_devices->total_rw_bytes > 50ULL * SZ_1G)
+			ctl->max_stripe_size = SZ_1G;
+		else
+			ctl->max_stripe_size = SZ_256M;
+		ctl->max_chunk_size = ctl->max_stripe_size;
+	} else if (type & BTRFS_BLOCK_GROUP_SYSTEM) {
+		ctl->max_stripe_size = SZ_32M;
+		ctl->max_chunk_size = 2 * ctl->max_stripe_size;
+		ctl->devs_max = min_t(int, ctl->devs_max,
+				      BTRFS_MAX_DEVS_SYS_CHUNK);
+	} else {
+		BUG();
+	}
+
+	/* We don't want a chunk larger than 10% of writable space */
+	ctl->max_chunk_size = min(div_factor(fs_devices->total_rw_bytes, 1),
+				  ctl->max_chunk_size);
+}
+
+static void set_parameters(struct btrfs_fs_devices *fs_devices,
+			   struct alloc_chunk_ctl *ctl)
+{
+	int index = btrfs_bg_flags_to_raid_index(ctl->type);
+
+	ctl->sub_stripes = btrfs_raid_array[index].sub_stripes;
+	ctl->dev_stripes = btrfs_raid_array[index].dev_stripes;
+	ctl->devs_max = btrfs_raid_array[index].devs_max;
+	if (!ctl->devs_max)
+		ctl->devs_max = BTRFS_MAX_DEVS(fs_devices->fs_info);
+	ctl->devs_min = btrfs_raid_array[index].devs_min;
+	ctl->devs_increment = btrfs_raid_array[index].devs_increment;
+	ctl->ncopies = btrfs_raid_array[index].ncopies;
+	ctl->nparity = btrfs_raid_array[index].nparity;
+	ctl->ndevs = 0;
+
+	switch (fs_devices->chunk_alloc_policy) {
+	case BTRFS_CHUNK_ALLOC_REGULAR:
+		set_parameters_regular(fs_devices, ctl);
+		break;
+	default:
+		BUG();
+	}
+}
+
 static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 			       u64 start, u64 type)
 {
@@ -4859,7 +4913,6 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	int ndevs;
 	int i;
 	int j;
-	int index;
 
 	BUG_ON(!alloc_profile_is_valid(type, 0));
 
@@ -4869,45 +4922,14 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 		return -ENOSPC;
 	}
 
-	ctl.start = start;
-	ctl.type = type;
-
-	index = btrfs_bg_flags_to_raid_index(type);
-
-	ctl.sub_stripes = btrfs_raid_array[index].sub_stripes;
-	ctl.dev_stripes = btrfs_raid_array[index].dev_stripes;
-	ctl.devs_max = btrfs_raid_array[index].devs_max;
-	if (!ctl.devs_max)
-		ctl.devs_max = BTRFS_MAX_DEVS(info);
-	ctl.devs_min = btrfs_raid_array[index].devs_min;
-	ctl.devs_increment = btrfs_raid_array[index].devs_increment;
-	ctl.ncopies = btrfs_raid_array[index].ncopies;
-	ctl.nparity = btrfs_raid_array[index].nparity;
-
-	if (type & BTRFS_BLOCK_GROUP_DATA) {
-		ctl.max_stripe_size = SZ_1G;
-		ctl.max_chunk_size = BTRFS_MAX_DATA_CHUNK_SIZE;
-	} else if (type & BTRFS_BLOCK_GROUP_METADATA) {
-		/* for larger filesystems, use larger metadata chunks */
-		if (fs_devices->total_rw_bytes > 50ULL * SZ_1G)
-			ctl.max_stripe_size = SZ_1G;
-		else
-			ctl.max_stripe_size = SZ_256M;
-		ctl.max_chunk_size = ctl.max_stripe_size;
-	} else if (type & BTRFS_BLOCK_GROUP_SYSTEM) {
-		ctl.max_stripe_size = SZ_32M;
-		ctl.max_chunk_size = 2 * ctl.max_stripe_size;
-		ctl.devs_max = min_t(int, ctl.devs_max,
-				      BTRFS_MAX_DEVS_SYS_CHUNK);
-	} else {
-		btrfs_err(info, "invalid chunk type 0x%llx requested",
-		       type);
+	if (!(type & BTRFS_BLOCK_GROUP_TYPE_MASK)) {
+		btrfs_err(info, "invalid chunk type 0x%llx requested", type);
 		BUG();
 	}
 
-	/* We don't want a chunk larger than 10% of writable space */
-	ctl.max_chunk_size = min(div_factor(fs_devices->total_rw_bytes, 1),
-				  ctl.max_chunk_size);
+	ctl.start = start;
+	ctl.type = type;
+	set_parameters(fs_devices, &ctl);
 
 	devices_info = kcalloc(fs_devices->rw_devices, sizeof(*devices_info),
 			       GFP_NOFS);
-- 
2.25.0

* [PATCH 06/20] btrfs: factor out gather_device_info()
  2020-02-06 10:41 [PATCH 00/20] btrfs: refactor and generalize chunk/dev_extent/extent allocation Naohiro Aota
                   ` (4 preceding siblings ...)
  2020-02-06 10:41 ` [PATCH 05/20] btrfs: factor out set_parameters() Naohiro Aota
@ 2020-02-06 10:42 ` Naohiro Aota
  2020-02-06 15:43   ` Johannes Thumshirn
  2020-02-06 16:44   ` Josef Bacik
  2020-02-06 10:42 ` [PATCH 07/20] btrfs: factor out decide_stripe_size() Naohiro Aota
                   ` (14 subsequent siblings)
  20 siblings, 2 replies; 52+ messages in thread
From: Naohiro Aota @ 2020-02-06 10:42 UTC (permalink / raw)
  To: linux-btrfs, David Sterba
  Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
	Johannes Thumshirn, Hannes Reinecke, Anand Jain, linux-fsdevel,
	Naohiro Aota

Factor out gather_device_info() from __btrfs_alloc_chunk(). This function
iterates over the devices list and gathers information about the devices.
This commit also introduces "dev_extent_want" and "dev_extent_min" to fold
the repeated calculations into single variables.

This commit has no functional changes.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/volumes.c | 113 +++++++++++++++++++++++++--------------------
 1 file changed, 63 insertions(+), 50 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a5d6f0b5ca70..02bd86d126ff 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4895,56 +4895,25 @@ static void set_parameters(struct btrfs_fs_devices *fs_devices,
 	}
 }
 
-static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
-			       u64 start, u64 type)
+static int gather_device_info(struct btrfs_fs_devices *fs_devices,
+			      struct alloc_chunk_ctl *ctl,
+			      struct btrfs_device_info *devices_info)
 {
-	struct btrfs_fs_info *info = trans->fs_info;
-	struct btrfs_fs_devices *fs_devices = info->fs_devices;
+	struct btrfs_fs_info *info = fs_devices->fs_info;
 	struct btrfs_device *device;
-	struct map_lookup *map = NULL;
-	struct extent_map_tree *em_tree;
-	struct extent_map *em;
-	struct btrfs_device_info *devices_info = NULL;
-	struct alloc_chunk_ctl ctl;
 	u64 total_avail;
-	int data_stripes;	/* number of stripes that count for
-				   block group size */
+	u64 dev_extent_want = ctl->max_stripe_size * ctl->dev_stripes;
+	u64 dev_extent_min = BTRFS_STRIPE_LEN * ctl->dev_stripes;
 	int ret;
-	int ndevs;
-	int i;
-	int j;
-
-	BUG_ON(!alloc_profile_is_valid(type, 0));
-
-	if (list_empty(&fs_devices->alloc_list)) {
-		if (btrfs_test_opt(info, ENOSPC_DEBUG))
-			btrfs_debug(info, "%s: no writable device", __func__);
-		return -ENOSPC;
-	}
-
-	if (!(type & BTRFS_BLOCK_GROUP_TYPE_MASK)) {
-		btrfs_err(info, "invalid chunk type 0x%llx requested", type);
-		BUG();
-	}
-
-	ctl.start = start;
-	ctl.type = type;
-	set_parameters(fs_devices, &ctl);
-
-	devices_info = kcalloc(fs_devices->rw_devices, sizeof(*devices_info),
-			       GFP_NOFS);
-	if (!devices_info)
-		return -ENOMEM;
+	int ndevs = 0;
+	u64 max_avail;
+	u64 dev_offset;
 
 	/*
 	 * in the first pass through the devices list, we gather information
 	 * about the available holes on each device.
 	 */
-	ndevs = 0;
 	list_for_each_entry(device, &fs_devices->alloc_list, dev_alloc_list) {
-		u64 max_avail;
-		u64 dev_offset;
-
 		if (!test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) {
 			WARN(1, KERN_ERR
 			       "BTRFS: read-only device in alloc_list\n");
@@ -4965,21 +4934,20 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 		if (total_avail == 0)
 			continue;
 
-		ret = find_free_dev_extent(device,
-				ctl.max_stripe_size * ctl.dev_stripes,
-					   &dev_offset, &max_avail);
+		ret = find_free_dev_extent(device, dev_extent_want, &dev_offset,
+					   &max_avail);
 		if (ret && ret != -ENOSPC)
-			goto error;
+			return ret;
 
 		if (ret == 0)
-			max_avail = ctl.max_stripe_size * ctl.dev_stripes;
+			max_avail = dev_extent_want;
 
-		if (max_avail < BTRFS_STRIPE_LEN * ctl.dev_stripes) {
+		if (max_avail < dev_extent_min) {
 			if (btrfs_test_opt(info, ENOSPC_DEBUG))
 				btrfs_debug(info,
-			"%s: devid %llu has no free space, have=%llu want=%u",
+			"%s: devid %llu has no free space, have=%llu want=%llu",
 					    __func__, device->devid, max_avail,
-				BTRFS_STRIPE_LEN * ctl.dev_stripes);
+					    dev_extent_min);
 			continue;
 		}
 
@@ -4994,14 +4962,59 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 		devices_info[ndevs].dev = device;
 		++ndevs;
 	}
-	ctl.ndevs = ndevs;
+	ctl->ndevs = ndevs;
 
 	/*
 	 * now sort the devices by hole size / available space
 	 */
-	sort(devices_info, ctl.ndevs, sizeof(struct btrfs_device_info),
+	sort(devices_info, ndevs, sizeof(struct btrfs_device_info),
 	     btrfs_cmp_device_info, NULL);
 
+	return 0;
+}
+
+static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
+			       u64 start, u64 type)
+{
+	struct btrfs_fs_info *info = trans->fs_info;
+	struct btrfs_fs_devices *fs_devices = info->fs_devices;
+	struct map_lookup *map = NULL;
+	struct extent_map_tree *em_tree;
+	struct extent_map *em;
+	struct btrfs_device_info *devices_info = NULL;
+	struct alloc_chunk_ctl ctl;
+	int data_stripes;	/* number of stripes that count for
+				   block group size */
+	int ret;
+	int i;
+	int j;
+
+	BUG_ON(!alloc_profile_is_valid(type, 0));
+
+	if (list_empty(&fs_devices->alloc_list)) {
+		if (btrfs_test_opt(info, ENOSPC_DEBUG))
+			btrfs_debug(info, "%s: no writable device", __func__);
+		return -ENOSPC;
+	}
+
+	if (!(type & BTRFS_BLOCK_GROUP_TYPE_MASK)) {
+		btrfs_err(info, "invalid chunk type 0x%llx requested", type);
+		BUG();
+	}
+
+	ctl.start = start;
+	ctl.type = type;
+	set_parameters(fs_devices, &ctl);
+
+	devices_info = kcalloc(fs_devices->rw_devices, sizeof(*devices_info),
+			       GFP_NOFS);
+	if (!devices_info)
+		return -ENOMEM;
+
+	ret = gather_device_info(fs_devices, &ctl, devices_info);
+	if (ret < 0)
+		goto error;
+
 	/*
 	 * Round down to number of usable stripes, devs_increment can be any
 	 * number so we can't use round_down()
-- 
2.25.0

* [PATCH 07/20] btrfs: factor out decide_stripe_size()
  2020-02-06 10:41 [PATCH 00/20] btrfs: refactor and generalize chunk/dev_extent/extent allocation Naohiro Aota
                   ` (5 preceding siblings ...)
  2020-02-06 10:42 ` [PATCH 06/20] btrfs: factor out gather_device_info() Naohiro Aota
@ 2020-02-06 10:42 ` Naohiro Aota
  2020-02-06 15:59   ` Johannes Thumshirn
  2020-02-06 16:47   ` Josef Bacik
  2020-02-06 10:42 ` [PATCH 08/20] btrfs: factor out create_chunk() Naohiro Aota
                   ` (13 subsequent siblings)
  20 siblings, 2 replies; 52+ messages in thread
From: Naohiro Aota @ 2020-02-06 10:42 UTC (permalink / raw)
  To: linux-btrfs, David Sterba
  Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
	Johannes Thumshirn, Hannes Reinecke, Anand Jain, linux-fsdevel,
	Naohiro Aota

Factor out decide_stripe_size() from __btrfs_alloc_chunk(). This function
calculates the actual stripe size to allocate. decide_stripe_size() handles
the common part: rounding 'ndevs' down to a multiple of 'devs_increment'
and checking the lower and upper limits on 'ndevs'.
decide_stripe_size_regular() then decides the size of a stripe and the size
of a chunk. The policy is to maximize the number of stripes.

This commit has no functional changes.
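
As a worked example (numbers assumed for illustration): a RAID1
metadata chunk on a filesystem with less than 50G of writable space has
dev_stripes = 1, ncopies = 2, nparity = 0 and
max_stripe_size = max_chunk_size = 256M (assuming the total writable
space is large enough that the 10% cap does not reduce it further).
With ndevs = 2 and both devices reporting max_avail = 256M:
stripe_size = 256M / 1 = 256M, num_stripes = 2 * 1 = 2,
data_stripes = (2 - 0) / 2 = 1, and chunk_size = 256M * 1 = 256M,
i.e. a 256M logical chunk backed by one 256M device extent on each of
the two devices.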
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/volumes.c | 137 ++++++++++++++++++++++++++-------------------
 1 file changed, 80 insertions(+), 57 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 02bd86d126ff..85c01df26852 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4973,6 +4973,84 @@ static int gather_device_info(struct btrfs_fs_devices *fs_devices,
 	return 0;
 }
 
+static int decide_stripe_size_regular(struct alloc_chunk_ctl *ctl,
+				      struct btrfs_device_info *devices_info)
+{
+	int data_stripes;	/* number of stripes that count for
+				   block group size */
+
+	/*
+	 * The primary goal is to maximize the number of stripes, so use as
+	 * many devices as possible, even if the stripes are not maximum sized.
+	 *
+	 * The DUP profile stores more than one stripe per device, the
+	 * max_avail is the total size so we have to adjust.
+	 */
+	ctl->stripe_size = div_u64(devices_info[ctl->ndevs - 1].max_avail,
+				   ctl->dev_stripes);
+	ctl->num_stripes = ctl->ndevs * ctl->dev_stripes;
+
+	/*
+	 * this will have to be fixed for RAID1 and RAID10 over
+	 * more drives
+	 */
+	data_stripes = (ctl->num_stripes - ctl->nparity) / ctl->ncopies;
+
+	/*
+	 * Use the number of data stripes to figure out how big this chunk
+	 * is really going to be in terms of logical address space,
+	 * and compare that answer with the max chunk size. If it's higher,
+	 * we try to reduce stripe_size.
+	 */
+	if (ctl->stripe_size * data_stripes > ctl->max_chunk_size) {
+		/*
+		 * Reduce stripe_size, round it up to a 16MB boundary again and
+		 * then use it, unless it ends up being even bigger than the
+		 * previous value we had already.
+		 */
+		ctl->stripe_size = min(round_up(div_u64(ctl->max_chunk_size,
+							data_stripes), SZ_16M),
+				       ctl->stripe_size);
+	}
+
+	/* align to BTRFS_STRIPE_LEN */
+	ctl->stripe_size = round_down(ctl->stripe_size, BTRFS_STRIPE_LEN);
+	ctl->chunk_size = ctl->stripe_size * data_stripes;
+
+	return 0;
+}
+
+static int decide_stripe_size(struct btrfs_fs_devices *fs_devices,
+			      struct alloc_chunk_ctl *ctl,
+			      struct btrfs_device_info *devices_info)
+{
+	struct btrfs_fs_info *info = fs_devices->fs_info;
+
+	/*
+	 * Round down to number of usable stripes, devs_increment can be any
+	 * number so we can't use round_down()
+	 */
+	ctl->ndevs -= ctl->ndevs % ctl->devs_increment;
+
+	if (ctl->ndevs < ctl->devs_min) {
+		if (btrfs_test_opt(info, ENOSPC_DEBUG)) {
+			btrfs_debug(info,
+	"%s: not enough devices with free space: have=%d minimum required=%d",
+				    __func__, ctl->ndevs, ctl->devs_min);
+		}
+		return -ENOSPC;
+	}
+
+	ctl->ndevs = min(ctl->ndevs, ctl->devs_max);
+
+	switch (fs_devices->chunk_alloc_policy) {
+	case BTRFS_CHUNK_ALLOC_REGULAR:
+		return decide_stripe_size_regular(ctl, devices_info);
+	default:
+		BUG();
+	}
+}
+
 static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 			       u64 start, u64 type)
 {
@@ -4983,8 +5061,6 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	struct extent_map *em;
 	struct btrfs_device_info *devices_info = NULL;
 	struct alloc_chunk_ctl ctl;
-	int data_stripes;	/* number of stripes that count for
-				   block group size */
 	int ret;
 	int i;
 	int j;
@@ -5015,60 +5091,9 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	if (ret < 0)
 		goto error;
 
-	/*
-	 * Round down to number of usable stripes, devs_increment can be any
-	 * number so we can't use round_down()
-	 */
-	ctl.ndevs -= ctl.ndevs % ctl.devs_increment;
-
-	if (ctl.ndevs < ctl.devs_min) {
-		ret = -ENOSPC;
-		if (btrfs_test_opt(info, ENOSPC_DEBUG)) {
-			btrfs_debug(info,
-	"%s: not enough devices with free space: have=%d minimum required=%d",
-				    __func__, ctl.ndevs, ctl.devs_min);
-		}
+	ret = decide_stripe_size(fs_devices, &ctl, devices_info);
+	if (ret < 0)
 		goto error;
-	}
-
-	ctl.ndevs = min(ctl.ndevs, ctl.devs_max);
-
-	/*
-	 * The primary goal is to maximize the number of stripes, so use as
-	 * many devices as possible, even if the stripes are not maximum sized.
-	 *
-	 * The DUP profile stores more than one stripe per device, the
-	 * max_avail is the total size so we have to adjust.
-	 */
-	ctl.stripe_size = div_u64(devices_info[ctl.ndevs - 1].max_avail,
-				   ctl.dev_stripes);
-	ctl.num_stripes = ctl.ndevs * ctl.dev_stripes;
-
-	/*
-	 * this will have to be fixed for RAID1 and RAID10 over
-	 * more drives
-	 */
-	data_stripes = (ctl.num_stripes - ctl.nparity) / ctl.ncopies;
-
-	/*
-	 * Use the number of data stripes to figure out how big this chunk
-	 * is really going to be in terms of logical address space,
-	 * and compare that answer with the max chunk size. If it's higher,
-	 * we try to reduce stripe_size.
-	 */
-	if (ctl.stripe_size * data_stripes > ctl.max_chunk_size) {
-		/*
-		 * Reduce stripe_size, round it up to a 16MB boundary again and
-		 * then use it, unless it ends up being even bigger than the
-		 * previous value we had already.
-		 */
-		ctl.stripe_size = min(round_up(div_u64(ctl.max_chunk_size,
-						   data_stripes), SZ_16M),
-				       ctl.stripe_size);
-	}
-
-	/* align to BTRFS_STRIPE_LEN */
-	ctl.stripe_size = round_down(ctl.stripe_size, BTRFS_STRIPE_LEN);
 
 	map = kmalloc(map_lookup_size(ctl.num_stripes), GFP_NOFS);
 	if (!map) {
@@ -5091,8 +5116,6 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	map->type = type;
 	map->sub_stripes = ctl.sub_stripes;
 
-	ctl.chunk_size = ctl.stripe_size * data_stripes;
-
 	trace_btrfs_chunk_alloc(info, map, start, ctl.chunk_size);
 
 	em = alloc_extent_map();
-- 
2.25.0

* [PATCH 08/20] btrfs: factor out create_chunk()
  2020-02-06 10:41 [PATCH 00/20] btrfs: refactor and generalize chunk/dev_extent/extent allocation Naohiro Aota
                   ` (6 preceding siblings ...)
  2020-02-06 10:42 ` [PATCH 07/20] btrfs: factor out decide_stripe_size() Naohiro Aota
@ 2020-02-06 10:42 ` Naohiro Aota
  2020-02-06 16:49   ` Josef Bacik
  2020-02-06 10:42 ` [PATCH 09/20] btrfs: parameterize dev_extent_min Naohiro Aota
                   ` (12 subsequent siblings)
  20 siblings, 1 reply; 52+ messages in thread
From: Naohiro Aota @ 2020-02-06 10:42 UTC (permalink / raw)
  To: linux-btrfs, David Sterba
  Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
	Johannes Thumshirn, Hannes Reinecke, Anand Jain, linux-fsdevel,
	Naohiro Aota

Factor out create_chunk() from __btrfs_alloc_chunk(). This function
finally creates a chunk. There are no functional changes.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/volumes.c | 126 +++++++++++++++++++++++++--------------------
 1 file changed, 71 insertions(+), 55 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 85c01df26852..15837374db9c 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5051,86 +5051,53 @@ static int decide_stripe_size(struct btrfs_fs_devices *fs_devices,
 	}
 }
 
-static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
-			       u64 start, u64 type)
+static int create_chunk(struct btrfs_trans_handle *trans,
+			struct alloc_chunk_ctl *ctl,
+			struct btrfs_device_info *devices_info)
 {
 	struct btrfs_fs_info *info = trans->fs_info;
-	struct btrfs_fs_devices *fs_devices = info->fs_devices;
 	struct map_lookup *map = NULL;
 	struct extent_map_tree *em_tree;
 	struct extent_map *em;
-	struct btrfs_device_info *devices_info = NULL;
-	struct alloc_chunk_ctl ctl;
+	u64 start = ctl->start;
+	u64 type = ctl->type;
 	int ret;
 	int i;
 	int j;
 
-	BUG_ON(!alloc_profile_is_valid(type, 0));
-
-	if (list_empty(&fs_devices->alloc_list)) {
-		if (btrfs_test_opt(info, ENOSPC_DEBUG))
-			btrfs_debug(info, "%s: no writable device", __func__);
-		return -ENOSPC;
-	}
-
-	if (!(type & BTRFS_BLOCK_GROUP_TYPE_MASK)) {
-		btrfs_err(info, "invalid chunk type 0x%llx requested", type);
-		BUG();
-	}
-
-	ctl.start = start;
-	ctl.type = type;
-	set_parameters(fs_devices, &ctl);
-
-	devices_info = kcalloc(fs_devices->rw_devices, sizeof(*devices_info),
-			       GFP_NOFS);
-	if (!devices_info)
+	map = kmalloc(map_lookup_size(ctl->num_stripes), GFP_NOFS);
+	if (!map)
 		return -ENOMEM;
+	map->num_stripes = ctl->num_stripes;
 
-	ret = gather_device_info(fs_devices, &ctl, devices_info);
-	if (ret < 0)
-		goto error;
-
-	ret = decide_stripe_size(fs_devices, &ctl, devices_info);
-	if (ret < 0)
-		goto error;
-
-	map = kmalloc(map_lookup_size(ctl.num_stripes), GFP_NOFS);
-	if (!map) {
-		ret = -ENOMEM;
-		goto error;
-	}
-	map->num_stripes = ctl.num_stripes;
-
-	for (i = 0; i < ctl.ndevs; ++i) {
-		for (j = 0; j < ctl.dev_stripes; ++j) {
-			int s = i * ctl.dev_stripes + j;
+	for (i = 0; i < ctl->ndevs; ++i) {
+		for (j = 0; j < ctl->dev_stripes; ++j) {
+			int s = i * ctl->dev_stripes + j;
 			map->stripes[s].dev = devices_info[i].dev;
 			map->stripes[s].physical = devices_info[i].dev_offset +
-						   j * ctl.stripe_size;
+						   j * ctl->stripe_size;
 		}
 	}
 	map->stripe_len = BTRFS_STRIPE_LEN;
 	map->io_align = BTRFS_STRIPE_LEN;
 	map->io_width = BTRFS_STRIPE_LEN;
 	map->type = type;
-	map->sub_stripes = ctl.sub_stripes;
+	map->sub_stripes = ctl->sub_stripes;
 
-	trace_btrfs_chunk_alloc(info, map, start, ctl.chunk_size);
+	trace_btrfs_chunk_alloc(info, map, start, ctl->chunk_size);
 
 	em = alloc_extent_map();
 	if (!em) {
 		kfree(map);
-		ret = -ENOMEM;
-		goto error;
+		return -ENOMEM;
 	}
 	set_bit(EXTENT_FLAG_FS_MAPPING, &em->flags);
 	em->map_lookup = map;
 	em->start = start;
-	em->len = ctl.chunk_size;
+	em->len = ctl->chunk_size;
 	em->block_start = 0;
 	em->block_len = em->len;
-	em->orig_block_len = ctl.stripe_size;
+	em->orig_block_len = ctl->stripe_size;
 
 	em_tree = &info->mapping_tree;
 	write_lock(&em_tree->lock);
@@ -5138,11 +5105,11 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	if (ret) {
 		write_unlock(&em_tree->lock);
 		free_extent_map(em);
-		goto error;
+		return ret;
 	}
 	write_unlock(&em_tree->lock);
 
-	ret = btrfs_make_block_group(trans, 0, type, start, ctl.chunk_size);
+	ret = btrfs_make_block_group(trans, 0, type, start, ctl->chunk_size);
 	if (ret)
 		goto error_del_extent;
 
@@ -5150,20 +5117,19 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 		struct btrfs_device *dev = map->stripes[i].dev;
 
 		btrfs_device_set_bytes_used(dev,
-					    dev->bytes_used + ctl.stripe_size);
+					    dev->bytes_used + ctl->stripe_size);
 		if (list_empty(&dev->post_commit_list))
 			list_add_tail(&dev->post_commit_list,
 				      &trans->transaction->dev_update_list);
 	}
 
-	atomic64_sub(ctl.stripe_size * map->num_stripes,
+	atomic64_sub(ctl->stripe_size * map->num_stripes,
 		     &info->free_chunk_space);
 
 	free_extent_map(em);
 	check_raid56_incompat_flag(info, type);
 	check_raid1c34_incompat_flag(info, type);
 
-	kfree(devices_info);
 	return 0;
 
 error_del_extent:
@@ -5175,6 +5141,56 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	free_extent_map(em);
 	/* One for the tree reference */
 	free_extent_map(em);
+
+	return ret;
+}
+
+static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
+			       u64 start, u64 type)
+{
+	struct btrfs_fs_info *info = trans->fs_info;
+	struct btrfs_fs_devices *fs_devices = info->fs_devices;
+	struct btrfs_device_info *devices_info = NULL;
+	struct alloc_chunk_ctl ctl;
+	int ret;
+
+	BUG_ON(!alloc_profile_is_valid(type, 0));
+
+	if (list_empty(&fs_devices->alloc_list)) {
+		if (btrfs_test_opt(info, ENOSPC_DEBUG))
+			btrfs_debug(info, "%s: no writable device", __func__);
+		return -ENOSPC;
+	}
+
+	if (!(type & BTRFS_BLOCK_GROUP_TYPE_MASK)) {
+		btrfs_err(info, "invalid chunk type 0x%llx requested", type);
+		BUG();
+	}
+
+	ctl.start = start;
+	ctl.type = type;
+	set_parameters(fs_devices, &ctl);
+
+	devices_info = kcalloc(fs_devices->rw_devices, sizeof(*devices_info),
+			       GFP_NOFS);
+	if (!devices_info)
+		return -ENOMEM;
+
+	ret = gather_device_info(fs_devices, &ctl, devices_info);
+	if (ret < 0)
+		goto error;
+
+	ret = decide_stripe_size(fs_devices, &ctl, devices_info);
+	if (ret < 0)
+		goto error;
+
+	ret = create_chunk(trans, &ctl, devices_info);
+	if (ret < 0)
+		goto error;
+
+	kfree(devices_info);
+	return 0;
+
 error:
 	kfree(devices_info);
 	return ret;
-- 
2.25.0

* [PATCH 09/20] btrfs: parameterize dev_extent_min
  2020-02-06 10:41 [PATCH 00/20] btrfs: refactor and generalize chunk/dev_extent/extent allocation Naohiro Aota
                   ` (7 preceding siblings ...)
  2020-02-06 10:42 ` [PATCH 08/20] btrfs: factor out create_chunk() Naohiro Aota
@ 2020-02-06 10:42 ` Naohiro Aota
  2020-02-06 16:52   ` Josef Bacik
  2020-02-06 10:42 ` [PATCH 10/20] btrfs: introduce extent allocation policy Naohiro Aota
                   ` (11 subsequent siblings)
  20 siblings, 1 reply; 52+ messages in thread
From: Naohiro Aota @ 2020-02-06 10:42 UTC (permalink / raw)
  To: linux-btrfs, David Sterba
  Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
	Johannes Thumshirn, Hannes Reinecke, Anand Jain, linux-fsdevel,
	Naohiro Aota

Currently, we ignore a device whose available space is less than
"BTRFS_STRIPE_LEN * dev_stripes". This is the lower limit for the current
allocation policy (which maximizes the number of stripes). This commit
parameterizes dev_extent_min so that other policies can set their own
lower limit for ignoring devices with insufficient space.
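
As a hypothetical illustration (nothing below exists in this series),
a policy that must allocate whole zones could set its own minimum in
its set_parameters_*() helper; "zone_size" and the fs_devices member
it comes from are assumptions here:

  static void set_parameters_zoned(struct btrfs_fs_devices *fs_devices,
                                   struct alloc_chunk_ctl *ctl)
  {
          u64 zone_size = fs_devices->zone_size;  /* hypothetical member */

          ctl->max_stripe_size = zone_size;
          ctl->max_chunk_size = zone_size;
          /* ignore devices that cannot fit one zone per stripe */
          ctl->dev_extent_min = zone_size * ctl->dev_stripes;
  }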

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/volumes.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 15837374db9c..4a6cc098ee3e 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4836,6 +4836,7 @@ struct alloc_chunk_ctl {
 				   store parity information */
 	u64 max_stripe_size;
 	u64 max_chunk_size;
+	u64 dev_extent_min;
 	u64 stripe_size;
 	u64 chunk_size;
 	int ndevs;
@@ -4868,6 +4869,7 @@ static void set_parameters_regular(struct btrfs_fs_devices *fs_devices,
 	/* We don't want a chunk larger than 10% of writable space */
 	ctl->max_chunk_size = min(div_factor(fs_devices->total_rw_bytes, 1),
 				  ctl->max_chunk_size);
+	ctl->dev_extent_min = BTRFS_STRIPE_LEN * ctl->dev_stripes;
 }
 
 static void set_parameters(struct btrfs_fs_devices *fs_devices,
@@ -4903,7 +4905,6 @@ static int gather_device_info(struct btrfs_fs_devices *fs_devices,
 	struct btrfs_device *device;
 	u64 total_avail;
 	u64 dev_extent_want = ctl->max_stripe_size * ctl->dev_stripes;
-	u64 dev_extent_min = BTRFS_STRIPE_LEN * ctl->dev_stripes;
 	int ret;
 	int ndevs = 0;
 	u64 max_avail;
@@ -4931,7 +4932,7 @@ static int gather_device_info(struct btrfs_fs_devices *fs_devices,
 			total_avail = 0;
 
 		/* If there is no space on this device, skip it. */
-		if (total_avail == 0)
+		if (total_avail < ctl->dev_extent_min)
 			continue;
 
 		ret = find_free_dev_extent(device, dev_extent_want, &dev_offset,
@@ -4942,12 +4943,12 @@ static int gather_device_info(struct btrfs_fs_devices *fs_devices,
 		if (ret == 0)
 			max_avail = dev_extent_want;
 
-		if (max_avail < dev_extent_min) {
+		if (max_avail < ctl->dev_extent_min) {
 			if (btrfs_test_opt(info, ENOSPC_DEBUG))
 				btrfs_debug(info,
 			"%s: devid %llu has no free space, have=%llu want=%llu",
 					    __func__, device->devid, max_avail,
-					    dev_extent_min);
+					    ctl->dev_extent_min);
 			continue;
 		}
 
-- 
2.25.0

* [PATCH 10/20] btrfs: introduce extent allocation policy
  2020-02-06 10:41 [PATCH 00/20] btrfs: refactor and generalize chunk/dev_extent/extent allocation Naohiro Aota
                   ` (8 preceding siblings ...)
  2020-02-06 10:42 ` [PATCH 09/20] btrfs: parameterize dev_extent_min Naohiro Aota
@ 2020-02-06 10:42 ` Naohiro Aota
  2020-02-06 10:42 ` [PATCH 11/20] btrfs: move hint_byte into find_free_extent_ctl Naohiro Aota
                   ` (10 subsequent siblings)
  20 siblings, 0 replies; 52+ messages in thread
From: Naohiro Aota @ 2020-02-06 10:42 UTC (permalink / raw)
  To: linux-btrfs, David Sterba
  Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
	Johannes Thumshirn, Hannes Reinecke, Anand Jain, linux-fsdevel,
	Naohiro Aota

This commit introduces an extent allocation policy for btrfs. This policy
controls how btrfs allocates extents from the block groups.

There is no functional change introduced with this commit.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/extent-tree.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 227d4d628b90..247d68eb4735 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3438,6 +3438,10 @@ btrfs_release_block_group(struct btrfs_block_group *cache,
 	btrfs_put_block_group(cache);
 }
 
+enum btrfs_extent_allocation_policy {
+	BTRFS_EXTENT_ALLOC_CLUSTERED,
+};
+
 /*
  * Structure used internally for find_free_extent() function.  Wraps needed
  * parameters.
@@ -3489,6 +3493,9 @@ struct find_free_extent_ctl {
 
 	/* Found result */
 	u64 found_offset;
+
+	/* Allocation policy */
+	enum btrfs_extent_allocation_policy policy;
 };
 
 
@@ -3826,6 +3833,7 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
 	ffe_ctl.have_caching_bg = false;
 	ffe_ctl.orig_have_caching_bg = false;
 	ffe_ctl.found_offset = 0;
+	ffe_ctl.policy = BTRFS_EXTENT_ALLOC_CLUSTERED;
 
 	ins->type = BTRFS_EXTENT_ITEM_KEY;
 	ins->objectid = 0;
-- 
2.25.0

* [PATCH 11/20] btrfs: move hint_byte into find_free_extent_ctl
  2020-02-06 10:41 [PATCH 00/20] btrfs: refactor and generalize chunk/dev_extent/extent allocation Naohiro Aota
                   ` (9 preceding siblings ...)
  2020-02-06 10:42 ` [PATCH 10/20] btrfs: introduce extent allocation policy Naohiro Aota
@ 2020-02-06 10:42 ` Naohiro Aota
  2020-02-06 10:42 ` [PATCH 12/20] btrfs: introduce clustered_alloc_info Naohiro Aota
                   ` (9 subsequent siblings)
  20 siblings, 0 replies; 52+ messages in thread
From: Naohiro Aota @ 2020-02-06 10:42 UTC (permalink / raw)
  To: linux-btrfs, David Sterba
  Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
	Johannes Thumshirn, Hannes Reinecke, Anand Jain, linux-fsdevel,
	Naohiro Aota

This commit moves hint_byte into find_free_extent_ctl, so that we can
modify hint_byte in other functions. This will help us split
find_free_extent() further. This commit also renames the function argument
"hint_byte" to "hint_byte_orig" to avoid misuse.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/extent-tree.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 247d68eb4735..b1f52eee24fe 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3494,6 +3494,9 @@ struct find_free_extent_ctl {
 	/* Found result */
 	u64 found_offset;
 
+	/* Hint byte to start looking for an empty space */
+	u64 hint_byte;
+
 	/* Allocation policy */
 	enum btrfs_extent_allocation_policy policy;
 };
@@ -3808,7 +3811,7 @@ static int find_free_extent_update_loop(struct btrfs_fs_info *fs_info,
  */
 static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
 				u64 ram_bytes, u64 num_bytes, u64 empty_size,
-				u64 hint_byte, struct btrfs_key *ins,
+				u64 hint_byte_orig, struct btrfs_key *ins,
 				u64 flags, int delalloc)
 {
 	int ret = 0;
@@ -3833,6 +3836,7 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
 	ffe_ctl.have_caching_bg = false;
 	ffe_ctl.orig_have_caching_bg = false;
 	ffe_ctl.found_offset = 0;
+	ffe_ctl.hint_byte = hint_byte_orig;
 	ffe_ctl.policy = BTRFS_EXTENT_ALLOC_CLUSTERED;
 
 	ins->type = BTRFS_EXTENT_ITEM_KEY;
@@ -3875,14 +3879,14 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
 	if (last_ptr) {
 		spin_lock(&last_ptr->lock);
 		if (last_ptr->block_group)
-			hint_byte = last_ptr->window_start;
+			ffe_ctl.hint_byte = last_ptr->window_start;
 		if (last_ptr->fragmented) {
 			/*
 			 * We still set window_start so we can keep track of the
 			 * last place we found an allocation to try and save
 			 * some time.
 			 */
-			hint_byte = last_ptr->window_start;
+			ffe_ctl.hint_byte = last_ptr->window_start;
 			use_cluster = false;
 		}
 		spin_unlock(&last_ptr->lock);
@@ -3890,8 +3894,8 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
 
 	ffe_ctl.search_start = max(ffe_ctl.search_start,
 				   first_logical_byte(fs_info, 0));
-	ffe_ctl.search_start = max(ffe_ctl.search_start, hint_byte);
-	if (ffe_ctl.search_start == hint_byte) {
+	ffe_ctl.search_start = max(ffe_ctl.search_start, ffe_ctl.hint_byte);
+	if (ffe_ctl.search_start == ffe_ctl.hint_byte) {
 		block_group = btrfs_lookup_block_group(fs_info,
 						       ffe_ctl.search_start);
 		/*
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 12/20] btrfs: introduce clustered_alloc_info
  2020-02-06 10:41 [PATCH 00/20] btrfs: refactor and generalize chunk/dev_extent/extent allocation Naohiro Aota
                   ` (10 preceding siblings ...)
  2020-02-06 10:42 ` [PATCH 11/20] btrfs: move hint_byte into find_free_extent_ctl Naohiro Aota
@ 2020-02-06 10:42 ` Naohiro Aota
  2020-02-06 12:44   ` Su Yue
  2020-02-06 17:01   ` Josef Bacik
  2020-02-06 10:42 ` [PATCH 13/20] btrfs: factor out do_allocation() Naohiro Aota
                   ` (8 subsequent siblings)
  20 siblings, 2 replies; 52+ messages in thread
From: Naohiro Aota @ 2020-02-06 10:42 UTC (permalink / raw)
  To: linux-btrfs, David Sterba
  Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
	Johannes Thumshirn, Hannes Reinecke, Anand Jain, linux-fsdevel,
	Naohiro Aota

Introduce struct clustered_alloc_info to manage parameters related to
clustered allocation. By separating clustered_alloc_info from
find_free_extent_ctl, we can introduce other allocation policies. One
can access per-policy private information through "alloc_info" of
struct find_free_extent_ctl.
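
A policy-specific helper then pulls its private data back out with a
plain pointer assignment, as the hunks below do:

	struct clustered_alloc_info *clustered = ffe_ctl->alloc_info;

A hypothetical second policy would keep its own context struct in the
same field.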

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/extent-tree.c | 99 +++++++++++++++++++++++++-----------------
 1 file changed, 59 insertions(+), 40 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index b1f52eee24fe..8124a6461043 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3456,9 +3456,6 @@ struct find_free_extent_ctl {
 	/* Where to start the search inside the bg */
 	u64 search_start;
 
-	/* For clustered allocation */
-	u64 empty_cluster;
-
 	bool have_caching_bg;
 	bool orig_have_caching_bg;
 
@@ -3470,18 +3467,6 @@ struct find_free_extent_ctl {
 	 */
 	int loop;
 
-	/*
-	 * Whether we're refilling a cluster, if true we need to re-search
-	 * current block group but don't try to refill the cluster again.
-	 */
-	bool retry_clustered;
-
-	/*
-	 * Whether we're updating free space cache, if true we need to re-search
-	 * current block group but don't try updating free space cache again.
-	 */
-	bool retry_unclustered;
-
 	/* If current block group is cached */
 	int cached;
 
@@ -3499,8 +3484,28 @@ struct find_free_extent_ctl {
 
 	/* Allocation policy */
 	enum btrfs_extent_allocation_policy policy;
+	void *alloc_info;
 };
 
+struct clustered_alloc_info {
+	/* For clustered allocation */
+	u64 empty_cluster;
+
+	/*
+	 * Whether we're refilling a cluster, if true we need to re-search
+	 * current block group but don't try to refill the cluster again.
+	 */
+	bool retry_clustered;
+
+	/*
+	 * Whether we're updating free space cache, if true we need to re-search
+	 * current block group but don't try updating free space cache again.
+	 */
+	bool retry_unclustered;
+
+	struct btrfs_free_cluster *last_ptr;
+	bool use_cluster;
+};
 
 /*
  * Helper function for find_free_extent().
@@ -3516,6 +3521,7 @@ static int find_free_extent_clustered(struct btrfs_block_group *bg,
 		struct btrfs_block_group **cluster_bg_ret)
 {
 	struct btrfs_block_group *cluster_bg;
+	struct clustered_alloc_info *clustered = ffe_ctl->alloc_info;
 	u64 aligned_cluster;
 	u64 offset;
 	int ret;
@@ -3572,7 +3578,7 @@ static int find_free_extent_clustered(struct btrfs_block_group *bg,
 	}
 
 	aligned_cluster = max_t(u64,
-			ffe_ctl->empty_cluster + ffe_ctl->empty_size,
+			clustered->empty_cluster + ffe_ctl->empty_size,
 			bg->full_stripe_len);
 	ret = btrfs_find_space_cluster(bg, last_ptr, ffe_ctl->search_start,
 			ffe_ctl->num_bytes, aligned_cluster);
@@ -3591,12 +3597,12 @@ static int find_free_extent_clustered(struct btrfs_block_group *bg,
 			return 0;
 		}
 	} else if (!ffe_ctl->cached && ffe_ctl->loop > LOOP_CACHING_NOWAIT &&
-		   !ffe_ctl->retry_clustered) {
+		   !clustered->retry_clustered) {
 		spin_unlock(&last_ptr->refill_lock);
 
-		ffe_ctl->retry_clustered = true;
+		clustered->retry_clustered = true;
 		btrfs_wait_block_group_cache_progress(bg, ffe_ctl->num_bytes +
-				ffe_ctl->empty_cluster + ffe_ctl->empty_size);
+				clustered->empty_cluster + ffe_ctl->empty_size);
 		return -EAGAIN;
 	}
 	/*
@@ -3618,6 +3624,7 @@ static int find_free_extent_unclustered(struct btrfs_block_group *bg,
 		struct btrfs_free_cluster *last_ptr,
 		struct find_free_extent_ctl *ffe_ctl)
 {
+	struct clustered_alloc_info *clustered = ffe_ctl->alloc_info;
 	u64 offset;
 
 	/*
@@ -3636,7 +3643,7 @@ static int find_free_extent_unclustered(struct btrfs_block_group *bg,
 		free_space_ctl = bg->free_space_ctl;
 		spin_lock(&free_space_ctl->tree_lock);
 		if (free_space_ctl->free_space <
-		    ffe_ctl->num_bytes + ffe_ctl->empty_cluster +
+		    ffe_ctl->num_bytes + clustered->empty_cluster +
 		    ffe_ctl->empty_size) {
 			ffe_ctl->total_free_space = max_t(u64,
 					ffe_ctl->total_free_space,
@@ -3660,11 +3667,11 @@ static int find_free_extent_unclustered(struct btrfs_block_group *bg,
 	 * If @retry_unclustered is true then we've already waited on this
 	 * block group once and should move on to the next block group.
 	 */
-	if (!offset && !ffe_ctl->retry_unclustered && !ffe_ctl->cached &&
+	if (!offset && !clustered->retry_unclustered && !ffe_ctl->cached &&
 	    ffe_ctl->loop > LOOP_CACHING_NOWAIT) {
 		btrfs_wait_block_group_cache_progress(bg, ffe_ctl->num_bytes +
 						      ffe_ctl->empty_size);
-		ffe_ctl->retry_unclustered = true;
+		clustered->retry_unclustered = true;
 		return -EAGAIN;
 	} else if (!offset) {
 		return 1;
@@ -3685,6 +3692,7 @@ static int find_free_extent_update_loop(struct btrfs_fs_info *fs_info,
 					bool full_search, bool use_cluster)
 {
 	struct btrfs_root *root = fs_info->extent_root;
+	struct clustered_alloc_info *clustered = ffe_ctl->alloc_info;
 	int ret;
 
 	if ((ffe_ctl->loop == LOOP_CACHING_NOWAIT) &&
@@ -3774,10 +3782,10 @@ static int find_free_extent_update_loop(struct btrfs_fs_info *fs_info,
 			 * no empty_cluster.
 			 */
 			if (ffe_ctl->empty_size == 0 &&
-			    ffe_ctl->empty_cluster == 0)
+			    clustered->empty_cluster == 0)
 				return -ENOSPC;
 			ffe_ctl->empty_size = 0;
-			ffe_ctl->empty_cluster = 0;
+			clustered->empty_cluster = 0;
 		}
 		return 1;
 	}
@@ -3816,11 +3824,10 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
 {
 	int ret = 0;
 	int cache_block_group_error = 0;
-	struct btrfs_free_cluster *last_ptr = NULL;
 	struct btrfs_block_group *block_group = NULL;
 	struct find_free_extent_ctl ffe_ctl = {0};
 	struct btrfs_space_info *space_info;
-	bool use_cluster = true;
+	struct clustered_alloc_info *clustered = NULL;
 	bool full_search = false;
 
 	WARN_ON(num_bytes < fs_info->sectorsize);
@@ -3829,8 +3836,6 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
 	ffe_ctl.empty_size = empty_size;
 	ffe_ctl.flags = flags;
 	ffe_ctl.search_start = 0;
-	ffe_ctl.retry_clustered = false;
-	ffe_ctl.retry_unclustered = false;
 	ffe_ctl.delalloc = delalloc;
 	ffe_ctl.index = btrfs_bg_flags_to_raid_index(flags);
 	ffe_ctl.have_caching_bg = false;
@@ -3851,6 +3856,15 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
 		return -ENOSPC;
 	}
 
+	clustered = kzalloc(sizeof(*clustered), GFP_NOFS);
+	if (!clustered)
+		return -ENOMEM;
+	clustered->last_ptr = NULL;
+	clustered->use_cluster = true;
+	clustered->retry_clustered = false;
+	clustered->retry_unclustered = false;
+	ffe_ctl.alloc_info = clustered;
+
 	/*
 	 * If our free space is heavily fragmented we may not be able to make
 	 * big contiguous allocations, so instead of doing the expensive search
@@ -3869,14 +3883,16 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
 			spin_unlock(&space_info->lock);
 			return -ENOSPC;
 		} else if (space_info->max_extent_size) {
-			use_cluster = false;
+			clustered->use_cluster = false;
 		}
 		spin_unlock(&space_info->lock);
 	}
 
-	last_ptr = fetch_cluster_info(fs_info, space_info,
-				      &ffe_ctl.empty_cluster);
-	if (last_ptr) {
+	clustered->last_ptr = fetch_cluster_info(fs_info, space_info,
+						 &clustered->empty_cluster);
+	if (clustered->last_ptr) {
+		struct btrfs_free_cluster *last_ptr = clustered->last_ptr;
+
 		spin_lock(&last_ptr->lock);
 		if (last_ptr->block_group)
 			ffe_ctl.hint_byte = last_ptr->window_start;
@@ -3887,7 +3903,7 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
 			 * some time.
 			 */
 			ffe_ctl.hint_byte = last_ptr->window_start;
-			use_cluster = false;
+			clustered->use_cluster = false;
 		}
 		spin_unlock(&last_ptr->lock);
 	}
@@ -4000,10 +4016,11 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
 		 * Ok we want to try and use the cluster allocator, so
 		 * lets look there
 		 */
-		if (last_ptr && use_cluster) {
+		if (clustered->last_ptr && clustered->use_cluster) {
 			struct btrfs_block_group *cluster_bg = NULL;
 
-			ret = find_free_extent_clustered(block_group, last_ptr,
+			ret = find_free_extent_clustered(block_group,
+							 clustered->last_ptr,
 							 &ffe_ctl, &cluster_bg);
 
 			if (ret == 0) {
@@ -4021,7 +4038,8 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
 			/* ret == -ENOENT case falls through */
 		}
 
-		ret = find_free_extent_unclustered(block_group, last_ptr,
+		ret = find_free_extent_unclustered(block_group,
+						   clustered->last_ptr,
 						   &ffe_ctl);
 		if (ret == -EAGAIN)
 			goto have_block_group;
@@ -4062,8 +4080,8 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
 		btrfs_release_block_group(block_group, delalloc);
 		break;
 loop:
-		ffe_ctl.retry_clustered = false;
-		ffe_ctl.retry_unclustered = false;
+		clustered->retry_clustered = false;
+		clustered->retry_unclustered = false;
 		BUG_ON(btrfs_bg_flags_to_raid_index(block_group->flags) !=
 		       ffe_ctl.index);
 		btrfs_release_block_group(block_group, delalloc);
@@ -4071,8 +4089,9 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
 	}
 	up_read(&space_info->groups_sem);
 
-	ret = find_free_extent_update_loop(fs_info, last_ptr, ins, &ffe_ctl,
-					   full_search, use_cluster);
+	ret = find_free_extent_update_loop(fs_info, clustered->last_ptr, ins,
+					   &ffe_ctl, full_search,
+					   clustered->use_cluster);
 	if (ret > 0)
 		goto search;
 
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 13/20] btrfs: factor out do_allocation()
  2020-02-06 10:41 [PATCH 00/20] btrfs: refactor and generalize chunk/dev_extent/extent allocation Naohiro Aota
                   ` (11 preceding siblings ...)
  2020-02-06 10:42 ` [PATCH 12/20] btrfs: introduce clustered_alloc_info Naohiro Aota
@ 2020-02-06 10:42 ` Naohiro Aota
  2020-02-06 10:42 ` [PATCH 14/20] btrfs: drop unnecessary arguments from clustered allocation functions Naohiro Aota
                   ` (7 subsequent siblings)
  20 siblings, 0 replies; 52+ messages in thread
From: Naohiro Aota @ 2020-02-06 10:42 UTC (permalink / raw)
  To: linux-btrfs, David Sterba
  Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
	Johannes Thumshirn, Hannes Reinecke, Anand Jain, linux-fsdevel,
	Naohiro Aota

Factor out do_allocation() from find_free_extent(). This function does
the actual allocation in a given block group. The ffe_ctl->policy is
used to determine the allocator function to use.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/extent-tree.c | 81 +++++++++++++++++++++++++-----------------
 1 file changed, 49 insertions(+), 32 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 8124a6461043..d033d537a31d 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3680,6 +3680,41 @@ static int find_free_extent_unclustered(struct btrfs_block_group *bg,
 	return 0;
 }
 
+static int do_allocation_clustered(struct btrfs_block_group *block_group,
+				   struct find_free_extent_ctl *ffe_ctl,
+				   struct btrfs_block_group **bg_ret)
+{
+	struct clustered_alloc_info *clustered = ffe_ctl->alloc_info;
+	int ret;
+
+	/*
+	 * Ok we want to try and use the cluster allocator, so lets look there
+	 */
+	if (clustered->last_ptr && clustered->use_cluster) {
+		ret = find_free_extent_clustered(block_group,
+						 clustered->last_ptr, ffe_ctl,
+						 bg_ret);
+		if (ret >= 0 || ret == -EAGAIN)
+			return ret;
+		/* ret == -ENOENT case falls through */
+	}
+
+	return find_free_extent_unclustered(block_group, clustered->last_ptr,
+					    ffe_ctl);
+}
+
+static int do_allocation(struct btrfs_block_group *block_group,
+			 struct find_free_extent_ctl *ffe_ctl,
+			 struct btrfs_block_group **bg_ret)
+{
+	switch (ffe_ctl->policy) {
+	case BTRFS_EXTENT_ALLOC_CLUSTERED:
+		return do_allocation_clustered(block_group, ffe_ctl, bg_ret);
+	default:
+		BUG();
+	}
+}
+
 /*
  * Return >0 means caller needs to re-search for free extent
  * Return 0 means we have the needed free extent.
@@ -3952,6 +3987,8 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
 	down_read(&space_info->groups_sem);
 	list_for_each_entry(block_group,
 			    &space_info->block_groups[ffe_ctl.index], list) {
+		struct btrfs_block_group *bg_ret;
+
 		/* If the block group is read-only, we can skip it entirely. */
 		if (unlikely(block_group->ro))
 			continue;
@@ -4012,41 +4049,21 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
 		if (unlikely(block_group->cached == BTRFS_CACHE_ERROR))
 			goto loop;
 
-		/*
-		 * Ok we want to try and use the cluster allocator, so
-		 * lets look there
-		 */
-		if (clustered->last_ptr && clustered->use_cluster) {
-			struct btrfs_block_group *cluster_bg = NULL;
-
-			ret = find_free_extent_clustered(block_group,
-							 clustered->last_ptr,
-							 &ffe_ctl, &cluster_bg);
-
-			if (ret == 0) {
-				if (cluster_bg && cluster_bg != block_group) {
-					btrfs_release_block_group(block_group,
-								  delalloc);
-					block_group = cluster_bg;
-				}
-				goto checks;
-			} else if (ret == -EAGAIN) {
-				goto have_block_group;
-			} else if (ret > 0) {
-				goto loop;
+		bg_ret = NULL;
+		ret = do_allocation(block_group, &ffe_ctl, &bg_ret);
+		if (ret == 0) {
+			if (bg_ret && bg_ret != block_group) {
+				btrfs_release_block_group(block_group,
+							  delalloc);
+				block_group = bg_ret;
 			}
-			/* ret == -ENOENT case falls through */
-		}
-
-		ret = find_free_extent_unclustered(block_group,
-						   clustered->last_ptr,
-						   &ffe_ctl);
-		if (ret == -EAGAIN)
+		} else if (ret == -EAGAIN) {
 			goto have_block_group;
-		else if (ret > 0)
+		} else if (ret > 0) {
 			goto loop;
-		/* ret == 0 case falls through */
-checks:
+		}
+
+		/* checks */
 		ffe_ctl.search_start = round_up(ffe_ctl.found_offset,
 					     fs_info->stripesize);
 
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 14/20] btrfs: drop unnecessary arguments from clustered allocation functions
  2020-02-06 10:41 [PATCH 00/20] btrfs: refactor and generalize chunk/dev_extent/extent allocation Naohiro Aota
                   ` (12 preceding siblings ...)
  2020-02-06 10:42 ` [PATCH 13/20] btrfs: factor out do_allocation() Naohiro Aota
@ 2020-02-06 10:42 ` Naohiro Aota
  2020-02-06 10:42 ` [PATCH 15/20] btrfs: factor out release_block_group() Naohiro Aota
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 52+ messages in thread
From: Naohiro Aota @ 2020-02-06 10:42 UTC (permalink / raw)
  To: linux-btrfs, David Sterba
  Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
	Johannes Thumshirn, Hannes Reinecke, Anand Jain, linux-fsdevel,
	Naohiro Aota

Now that find_free_extent_clustered() and find_free_extent_unclustered()
can access "last_ptr" through the "clustered" variable, we can drop it
from their arguments.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/extent-tree.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index d033d537a31d..0e1fe83e5d79 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3516,12 +3516,12 @@ struct clustered_alloc_info {
  * Return 0 means we have found a location and set ffe_ctl->found_offset.
  */
 static int find_free_extent_clustered(struct btrfs_block_group *bg,
-		struct btrfs_free_cluster *last_ptr,
 		struct find_free_extent_ctl *ffe_ctl,
 		struct btrfs_block_group **cluster_bg_ret)
 {
 	struct btrfs_block_group *cluster_bg;
 	struct clustered_alloc_info *clustered = ffe_ctl->alloc_info;
+	struct btrfs_free_cluster *last_ptr = clustered->last_ptr;
 	u64 aligned_cluster;
 	u64 offset;
 	int ret;
@@ -3621,10 +3621,10 @@ static int find_free_extent_clustered(struct btrfs_block_group *bg,
  * Return -EAGAIN to inform caller that we need to re-search this block group
  */
 static int find_free_extent_unclustered(struct btrfs_block_group *bg,
-		struct btrfs_free_cluster *last_ptr,
 		struct find_free_extent_ctl *ffe_ctl)
 {
 	struct clustered_alloc_info *clustered = ffe_ctl->alloc_info;
+	struct btrfs_free_cluster *last_ptr = clustered->last_ptr;
 	u64 offset;
 
 	/*
@@ -3691,16 +3691,13 @@ static int do_allocation_clustered(struct btrfs_block_group *block_group,
 	 * Ok we want to try and use the cluster allocator, so lets look there
 	 */
 	if (clustered->last_ptr && clustered->use_cluster) {
-		ret = find_free_extent_clustered(block_group,
-						 clustered->last_ptr, ffe_ctl,
-						 bg_ret);
+		ret = find_free_extent_clustered(block_group, ffe_ctl, bg_ret);
 		if (ret >= 0 || ret == -EAGAIN)
 			return ret;
 		/* ret == -ENOENT case falls through */
 	}
 
-	return find_free_extent_unclustered(block_group, clustered->last_ptr,
-					    ffe_ctl);
+	return find_free_extent_unclustered(block_group, ffe_ctl);
 }
 
 static int do_allocation(struct btrfs_block_group *block_group,
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 15/20] btrfs: factor out release_block_group()
  2020-02-06 10:41 [PATCH 00/20] btrfs: refactor and generalize chunk/dev_extent/extent allocation Naohiro Aota
                   ` (13 preceding siblings ...)
  2020-02-06 10:42 ` [PATCH 14/20] btrfs: drop unnecessary arguments from clustered allocation functions Naohiro Aota
@ 2020-02-06 10:42 ` Naohiro Aota
  2020-02-06 10:42 ` [PATCH 16/20] btrfs: factor out found_extent() Naohiro Aota
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 52+ messages in thread
From: Naohiro Aota @ 2020-02-06 10:42 UTC (permalink / raw)
  To: linux-btrfs, David Sterba
  Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
	Johannes Thumshirn, Hannes Reinecke, Anand Jain, linux-fsdevel,
	Naohiro Aota

Factor out release_block_group() from find_free_extent(). This function
is called when the allocator gives up on a block group. Allocator hook
functions like release_block_group_clustered() should reset their state
here for the allocation in the next block group.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/extent-tree.c | 31 ++++++++++++++++++++++++++-----
 1 file changed, 26 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 0e1fe83e5d79..9f01c2bf7e11 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3712,6 +3712,31 @@ static int do_allocation(struct btrfs_block_group *block_group,
 	}
 }
 
+static void release_block_group_clustered(struct find_free_extent_ctl *ffe_ctl)
+{
+	struct clustered_alloc_info *clustered = ffe_ctl->alloc_info;
+
+	clustered->retry_clustered = false;
+	clustered->retry_unclustered = false;
+}
+
+static void release_block_group(struct btrfs_block_group *block_group,
+				struct find_free_extent_ctl *ffe_ctl,
+				int delalloc)
+{
+	switch (ffe_ctl->policy) {
+	case BTRFS_EXTENT_ALLOC_CLUSTERED:
+		release_block_group_clustered(ffe_ctl);
+		break;
+	default:
+		BUG();
+	}
+
+	BUG_ON(btrfs_bg_flags_to_raid_index(block_group->flags) !=
+		ffe_ctl->index);
+	btrfs_release_block_group(block_group, delalloc);
+}
+
 /*
  * Return >0 means caller needs to re-search for free extent
  * Return 0 means we have the needed free extent.
@@ -4094,11 +4119,7 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
 		btrfs_release_block_group(block_group, delalloc);
 		break;
 loop:
-		clustered->retry_clustered = false;
-		clustered->retry_unclustered = false;
-		BUG_ON(btrfs_bg_flags_to_raid_index(block_group->flags) !=
-		       ffe_ctl.index);
-		btrfs_release_block_group(block_group, delalloc);
+		release_block_group(block_group, &ffe_ctl, delalloc);
 		cond_resched();
 	}
 	up_read(&space_info->groups_sem);
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 16/20] btrfs: factor out found_extent()
  2020-02-06 10:41 [PATCH 00/20] btrfs: refactor and generalize chunk/dev_extent/extent allocation Naohiro Aota
                   ` (14 preceding siblings ...)
  2020-02-06 10:42 ` [PATCH 15/20] btrfs: factor out release_block_group() Naohiro Aota
@ 2020-02-06 10:42 ` Naohiro Aota
  2020-02-06 10:42 ` [PATCH 17/20] btrfs: drop unnecessary arguments from find_free_extent_update_loop() Naohiro Aota
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 52+ messages in thread
From: Naohiro Aota @ 2020-02-06 10:42 UTC (permalink / raw)
  To: linux-btrfs, David Sterba
  Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
	Johannes Thumshirn, Hannes Reinecke, Anand Jain, linux-fsdevel,
	Naohiro Aota

Factor out found_extent() from find_free_extent_update_loop(). This
function is called when a proper extent is found, before returning from
find_free_extent(). Hook functions like found_extent_clustered() should
save information for the next allocation.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/extent-tree.c | 32 +++++++++++++++++++++++++++-----
 1 file changed, 27 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 9f01c2bf7e11..d70ef18de832 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3737,6 +3737,32 @@ static void release_block_group(struct btrfs_block_group *block_group,
 	btrfs_release_block_group(block_group, delalloc);
 }
 
+static void found_extent_clustered(struct find_free_extent_ctl *ffe_ctl,
+				   struct btrfs_key *ins)
+{
+	struct clustered_alloc_info *clustered = ffe_ctl->alloc_info;
+	struct btrfs_free_cluster *last_ptr = clustered->last_ptr;
+	bool use_cluster = clustered->use_cluster;
+
+	if (!use_cluster && last_ptr) {
+		spin_lock(&last_ptr->lock);
+		last_ptr->window_start = ins->objectid;
+		spin_unlock(&last_ptr->lock);
+	}
+}
+
+static void found_extent(struct find_free_extent_ctl *ffe_ctl,
+			 struct btrfs_key *ins)
+{
+	switch (ffe_ctl->policy) {
+	case BTRFS_EXTENT_ALLOC_CLUSTERED:
+		found_extent_clustered(ffe_ctl, ins);
+		break;
+	default:
+		BUG();
+	}
+}
+
 /*
  * Return >0 means caller needs to re-search for free extent
  * Return 0 means we have the needed free extent.
@@ -3764,11 +3790,7 @@ static int find_free_extent_update_loop(struct btrfs_fs_info *fs_info,
 		return 1;
 
 	if (ins->objectid) {
-		if (!use_cluster && last_ptr) {
-			spin_lock(&last_ptr->lock);
-			last_ptr->window_start = ins->objectid;
-			spin_unlock(&last_ptr->lock);
-		}
+		found_extent(ffe_ctl, ins);
 		return 0;
 	}
 
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 17/20] btrfs: drop unnecessary arguments from find_free_extent_update_loop()
  2020-02-06 10:41 [PATCH 00/20] btrfs: refactor and generalize chunk/dev_extent/extent allocation Naohiro Aota
                   ` (15 preceding siblings ...)
  2020-02-06 10:42 ` [PATCH 16/20] btrfs: factor out found_extent() Naohiro Aota
@ 2020-02-06 10:42 ` Naohiro Aota
  2020-02-06 10:42 ` [PATCH 18/20] btrfs: factor out chunk_allocation_failed() Naohiro Aota
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 52+ messages in thread
From: Naohiro Aota @ 2020-02-06 10:42 UTC (permalink / raw)
  To: linux-btrfs, David Sterba
  Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
	Johannes Thumshirn, Hannes Reinecke, Anand Jain, linux-fsdevel,
	Naohiro Aota

Now that last_ptr and use_cluster are no longer used in the function,
drop these arguments from it.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/extent-tree.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index d70ef18de832..03d17639d975 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3769,10 +3769,9 @@ static void found_extent(struct find_free_extent_ctl *ffe_ctl,
  * Return <0 means we failed to locate any free extent.
  */
 static int find_free_extent_update_loop(struct btrfs_fs_info *fs_info,
-					struct btrfs_free_cluster *last_ptr,
 					struct btrfs_key *ins,
 					struct find_free_extent_ctl *ffe_ctl,
-					bool full_search, bool use_cluster)
+					bool full_search)
 {
 	struct btrfs_root *root = fs_info->extent_root;
 	struct clustered_alloc_info *clustered = ffe_ctl->alloc_info;
@@ -4146,9 +4145,7 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
 	}
 	up_read(&space_info->groups_sem);
 
-	ret = find_free_extent_update_loop(fs_info, clustered->last_ptr, ins,
-					   &ffe_ctl, full_search,
-					   clustered->use_cluster);
+	ret = find_free_extent_update_loop(fs_info, ins, &ffe_ctl, full_search);
 	if (ret > 0)
 		goto search;
 
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 18/20] btrfs: factor out chunk_allocation_failed()
  2020-02-06 10:41 [PATCH 00/20] btrfs: refactor and generalize chunk/dev_extent/extent allocation Naohiro Aota
                   ` (16 preceding siblings ...)
  2020-02-06 10:42 ` [PATCH 17/20] btrfs: drop unnecessary arguments from find_free_extent_update_loop() Naohiro Aota
@ 2020-02-06 10:42 ` Naohiro Aota
  2020-02-06 10:42 ` [PATCH 19/20] btrfs: skip LOOP_NO_EMPTY_SIZE if not clustered allocation Naohiro Aota
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 52+ messages in thread
From: Naohiro Aota @ 2020-02-06 10:42 UTC (permalink / raw)
  To: linux-btrfs, David Sterba
  Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
	Johannes Thumshirn, Hannes Reinecke, Anand Jain, linux-fsdevel,
	Naohiro Aota

Factor out chunk_allocation_failed() from find_free_extent_update_loop().
This function is called when allocating a new chunk has failed. The
function can modify "ffe_ctl->loop" and return 0 to continue with the
next stage, or it can return -ENOSPC to give up here.
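
For instance, a policy with no NO_EMPTY_SIZE fallback could simply give
up at this point (hypothetical sketch, not part of this patch):

	case BTRFS_EXTENT_ALLOC_ZONED:
		/* Hypothetical: no empty_size/empty_cluster to retry with. */
		return -ENOSPC;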

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/extent-tree.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 03d17639d975..123b1a4e797a 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3763,6 +3763,21 @@ static void found_extent(struct find_free_extent_ctl *ffe_ctl,
 	}
 }
 
+static int chunk_allocation_failed(struct find_free_extent_ctl *ffe_ctl)
+{
+	switch (ffe_ctl->policy) {
+	case BTRFS_EXTENT_ALLOC_CLUSTERED:
+		/*
+		 * If we can't allocate a new chunk we've already looped through
+		 * at least once, move on to the NO_EMPTY_SIZE case.
+		 */
+		ffe_ctl->loop = LOOP_NO_EMPTY_SIZE;
+		return 0;
+	default:
+		BUG();
+	}
+}
+
 /*
  * Return >0 means caller needs to re-search for free extent
  * Return 0 means we have the needed free extent.
@@ -3835,19 +3850,12 @@ static int find_free_extent_update_loop(struct btrfs_fs_info *fs_info,
 			ret = btrfs_chunk_alloc(trans, ffe_ctl->flags,
 						CHUNK_ALLOC_FORCE);
 
-			/*
-			 * If we can't allocate a new chunk we've already looped
-			 * through at least once, move on to the NO_EMPTY_SIZE
-			 * case.
-			 */
 			if (ret == -ENOSPC)
-				ffe_ctl->loop = LOOP_NO_EMPTY_SIZE;
+				ret = chunk_allocation_failed(ffe_ctl);
 
 			/* Do not bail out on ENOSPC since we can do more. */
 			if (ret < 0 && ret != -ENOSPC)
 				btrfs_abort_transaction(trans, ret);
-			else
-				ret = 0;
 			if (!exist)
 				btrfs_end_transaction(trans);
 			if (ret)
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 19/20] btrfs: skip LOOP_NO_EMPTY_SIZE if not clustered allocation
  2020-02-06 10:41 [PATCH 00/20] btrfs: refactor and generalize chunk/dev_extent/extent allocation Naohiro Aota
                   ` (17 preceding siblings ...)
  2020-02-06 10:42 ` [PATCH 18/20] btrfs: factor out chunk_allocation_failed() Naohiro Aota
@ 2020-02-06 10:42 ` Naohiro Aota
  2020-02-06 10:42 ` [PATCH 20/20] btrfs: factor out prepare_allocation() Naohiro Aota
  2020-02-06 11:43 ` [PATCH 00/20] btrfs: refactor and generalize chunk/dev_extent/extent allocation Martin Steigerwald
  20 siblings, 0 replies; 52+ messages in thread
From: Naohiro Aota @ 2020-02-06 10:42 UTC (permalink / raw)
  To: linux-btrfs, David Sterba
  Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
	Johannes Thumshirn, Hannes Reinecke, Anand Jain, linux-fsdevel,
	Naohiro Aota

LOOP_NO_EMPTY_SIZE is dedicated solely to clustered allocation, so other
allocators can skip this stage and go to the LOOP_GIVEUP stage to
indicate that the allocation was given up. This commit also narrows the
scope of the "clustered" variable.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/extent-tree.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 123b1a4e797a..2631ce2e123c 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3373,6 +3373,7 @@ enum btrfs_loop_type {
 	LOOP_CACHING_WAIT,
 	LOOP_ALLOC_CHUNK,
 	LOOP_NO_EMPTY_SIZE,
+	LOOP_GIVEUP,
 };
 
 static inline void
@@ -3789,7 +3790,6 @@ static int find_free_extent_update_loop(struct btrfs_fs_info *fs_info,
 					bool full_search)
 {
 	struct btrfs_root *root = fs_info->extent_root;
-	struct clustered_alloc_info *clustered = ffe_ctl->alloc_info;
 	int ret;
 
 	if ((ffe_ctl->loop == LOOP_CACHING_NOWAIT) &&
@@ -3863,6 +3863,14 @@ static int find_free_extent_update_loop(struct btrfs_fs_info *fs_info,
 		}
 
 		if (ffe_ctl->loop == LOOP_NO_EMPTY_SIZE) {
+			struct clustered_alloc_info *clustered =
+				ffe_ctl->alloc_info;
+
+			if (ffe_ctl->policy != BTRFS_EXTENT_ALLOC_CLUSTERED) {
+				ffe_ctl->loop = LOOP_GIVEUP;
+				return -ENOSPC;
+			}
+
 			/*
 			 * Don't loop again if we already have no empty_size and
 			 * no empty_cluster.
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 20/20] btrfs: factor out prepare_allocation()
  2020-02-06 10:41 [PATCH 00/20] btrfs: refactor and generalize chunk/dev_extent/extent allocation Naohiro Aota
                   ` (18 preceding siblings ...)
  2020-02-06 10:42 ` [PATCH 19/20] btrfs: skip LOOP_NO_EMPTY_SIZE if not clustered allocation Naohiro Aota
@ 2020-02-06 10:42 ` Naohiro Aota
  2020-02-06 11:43 ` [PATCH 00/20] btrfs: refactor and generalize chunk/dev_extent/extent allocation Martin Steigerwald
  20 siblings, 0 replies; 52+ messages in thread
From: Naohiro Aota @ 2020-02-06 10:42 UTC (permalink / raw)
  To: linux-btrfs, David Sterba
  Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
	Johannes Thumshirn, Hannes Reinecke, Anand Jain, linux-fsdevel,
	Naohiro Aota

This patch finally factors out prepare_allocation() from
find_free_extent(). This function is called before the allocation loop;
a policy-specific function like prepare_allocation_clustered() should
initialize its private information there and can set a proper hint_byte
to indicate where to start the allocation.
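
As with the other hooks, another policy would plug in its own
preparation here; a hypothetical sketch of the dispatch extension:

	case BTRFS_EXTENT_ALLOC_ZONED:
		return prepare_allocation_zoned(fs_info, ffe_ctl,
						space_info, ins);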

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/extent-tree.c | 131 +++++++++++++++++++++++++----------------
 1 file changed, 79 insertions(+), 52 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 2631ce2e123c..7742786b4675 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3886,6 +3886,82 @@ static int find_free_extent_update_loop(struct btrfs_fs_info *fs_info,
 	return -ENOSPC;
 }
 
+static int prepare_allocation_clustered(struct btrfs_fs_info *fs_info,
+					struct find_free_extent_ctl *ffe_ctl,
+					struct btrfs_space_info *space_info,
+					struct btrfs_key *ins)
+{
+	struct clustered_alloc_info *clustered;
+
+	clustered = kzalloc(sizeof(*clustered), GFP_NOFS);
+	if (!clustered)
+		return -ENOMEM;
+	clustered->last_ptr = NULL;
+	clustered->use_cluster = true;
+	clustered->retry_clustered = false;
+	clustered->retry_unclustered = false;
+	ffe_ctl->alloc_info = clustered;
+
+	/*
+	 * If our free space is heavily fragmented we may not be able to make
+	 * big contiguous allocations, so instead of doing the expensive search
+	 * for free space, simply return ENOSPC with our max_extent_size so we
+	 * can go ahead and search for a more manageable chunk.
+	 *
+	 * If our max_extent_size is large enough for our allocation simply
+	 * disable clustering since we will likely not be able to find enough
+	 * space to create a cluster and induce latency trying.
+	 */
+	if (unlikely(space_info->max_extent_size)) {
+		spin_lock(&space_info->lock);
+		if (space_info->max_extent_size &&
+		    ffe_ctl->num_bytes > space_info->max_extent_size) {
+			ins->offset = space_info->max_extent_size;
+			spin_unlock(&space_info->lock);
+			return -ENOSPC;
+		} else if (space_info->max_extent_size) {
+			clustered->use_cluster = false;
+		}
+		spin_unlock(&space_info->lock);
+	}
+
+	clustered->last_ptr = fetch_cluster_info(fs_info, space_info,
+						 &clustered->empty_cluster);
+	if (clustered->last_ptr) {
+		struct btrfs_free_cluster *last_ptr = clustered->last_ptr;
+
+		spin_lock(&last_ptr->lock);
+		if (last_ptr->block_group)
+			ffe_ctl->hint_byte = last_ptr->window_start;
+		if (last_ptr->fragmented) {
+			/*
+			 * We still set window_start so we can keep track of the
+			 * last place we found an allocation to try and save
+			 * some time.
+			 */
+			ffe_ctl->hint_byte = last_ptr->window_start;
+			clustered->use_cluster = false;
+		}
+		spin_unlock(&last_ptr->lock);
+	}
+
+	return 0;
+}
+
+static int prepare_allocation(struct btrfs_fs_info *fs_info,
+			      struct find_free_extent_ctl *ffe_ctl,
+			      struct btrfs_space_info *space_info,
+			      struct btrfs_key *ins)
+{
+	switch (ffe_ctl->policy) {
+	case BTRFS_EXTENT_ALLOC_CLUSTERED:
+		return prepare_allocation_clustered(fs_info, ffe_ctl,
+						    space_info, ins);
+	default:
+		BUG();
+	}
+}
+
 /*
  * walks the btree of allocated extents and find a hole of a given size.
  * The key ins is changed to record the hole:
@@ -3921,7 +3997,6 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
 	struct btrfs_block_group *block_group = NULL;
 	struct find_free_extent_ctl ffe_ctl = {0};
 	struct btrfs_space_info *space_info;
-	struct clustered_alloc_info *clustered = NULL;
 	bool full_search = false;
 
 	WARN_ON(num_bytes < fs_info->sectorsize);
@@ -3950,57 +4025,9 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
 		return -ENOSPC;
 	}
 
-	clustered = kzalloc(sizeof(*clustered), GFP_NOFS);
-	if (!clustered)
-		return -ENOMEM;
-	clustered->last_ptr = NULL;
-	clustered->use_cluster = true;
-	clustered->retry_clustered = false;
-	clustered->retry_unclustered = false;
-	ffe_ctl.alloc_info = clustered;
-
-	/*
-	 * If our free space is heavily fragmented we may not be able to make
-	 * big contiguous allocations, so instead of doing the expensive search
-	 * for free space, simply return ENOSPC with our max_extent_size so we
-	 * can go ahead and search for a more manageable chunk.
-	 *
-	 * If our max_extent_size is large enough for our allocation simply
-	 * disable clustering since we will likely not be able to find enough
-	 * space to create a cluster and induce latency trying.
-	 */
-	if (unlikely(space_info->max_extent_size)) {
-		spin_lock(&space_info->lock);
-		if (space_info->max_extent_size &&
-		    num_bytes > space_info->max_extent_size) {
-			ins->offset = space_info->max_extent_size;
-			spin_unlock(&space_info->lock);
-			return -ENOSPC;
-		} else if (space_info->max_extent_size) {
-			clustered->use_cluster = false;
-		}
-		spin_unlock(&space_info->lock);
-	}
-
-	clustered->last_ptr = fetch_cluster_info(fs_info, space_info,
-						 &clustered->empty_cluster);
-	if (clustered->last_ptr) {
-		struct btrfs_free_cluster *last_ptr = clustered->last_ptr;
-
-		spin_lock(&last_ptr->lock);
-		if (last_ptr->block_group)
-			ffe_ctl.hint_byte = last_ptr->window_start;
-		if (last_ptr->fragmented) {
-			/*
-			 * We still set window_start so we can keep track of the
-			 * last place we found an allocation to try and save
-			 * some time.
-			 */
-			ffe_ctl.hint_byte = last_ptr->window_start;
-			clustered->use_cluster = false;
-		}
-		spin_unlock(&last_ptr->lock);
-	}
+	ret = prepare_allocation(fs_info, &ffe_ctl, space_info, ins);
+	if (ret < 0)
+		return ret;
 
 	ffe_ctl.search_start = max(ffe_ctl.search_start,
 				   first_logical_byte(fs_info, 0));
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH 01/20] btrfs: change type of full_search to bool
  2020-02-06 10:41 ` [PATCH 01/20] btrfs: change type of full_search to bool Naohiro Aota
@ 2020-02-06 11:26   ` Johannes Thumshirn
  2020-02-06 16:03   ` Josef Bacik
  1 sibling, 0 replies; 52+ messages in thread
From: Johannes Thumshirn @ 2020-02-06 11:26 UTC (permalink / raw)
  To: Naohiro Aota, linux-btrfs, David Sterba
  Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
	Hannes Reinecke, Anand Jain, linux-fsdevel

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 02/20] btrfs: introduce chunk allocation policy
  2020-02-06 10:41 ` [PATCH 02/20] btrfs: introduce chunk allocation policy Naohiro Aota
@ 2020-02-06 11:30   ` Johannes Thumshirn
  2020-02-07  6:11     ` Naohiro Aota
  2020-02-06 16:06   ` Josef Bacik
  2020-02-06 16:07   ` Josef Bacik
  2 siblings, 1 reply; 52+ messages in thread
From: Johannes Thumshirn @ 2020-02-06 11:30 UTC (permalink / raw)
  To: Naohiro Aota, linux-btrfs, David Sterba
  Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
	Hannes Reinecke, Anand Jain, linux-fsdevel

On 06/02/2020 11:44, Naohiro Aota wrote:
> This commit introduces chuk allocation policy for btrfs. 

Maybe "Introduce a per-device chink allocation policy for btrfs."

Code wise,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 00/20] btrfs: refactor and generalize chunk/dev_extent/extent allocation
  2020-02-06 10:41 [PATCH 00/20] btrfs: refactor and generalize chunk/dev_extent/extent allocation Naohiro Aota
                   ` (19 preceding siblings ...)
  2020-02-06 10:42 ` [PATCH 20/20] btrfs: factor out prepare_allocation() Naohiro Aota
@ 2020-02-06 11:43 ` Martin Steigerwald
  2020-02-07  6:06   ` Naohiro Aota
  20 siblings, 1 reply; 52+ messages in thread
From: Martin Steigerwald @ 2020-02-06 11:43 UTC (permalink / raw)
  To: Naohiro Aota
  Cc: linux-btrfs, David Sterba, Chris Mason, Josef Bacik,
	Nikolay Borisov, Damien Le Moal, Johannes Thumshirn,
	Hannes Reinecke, Anand Jain, linux-fsdevel

Hi Naohiro.

Naohiro Aota - 06.02.20, 11:41:54 CET:
> This series refactors chunk allocation, device_extent allocation and
> extent allocation functions and make them generalized to be able to
> implement other allocation policy easily.
> 
> On top of this series, we can simplify some part of the "btrfs: zoned
> block device support" series as adding a new type of chunk allocator
> and extent allocator for zoned block devices. Furthermore, we will be
> able to implement and test some other allocator in the idea page of
> the wiki e.g. SSD caching, dedicated metadata drive, chunk allocation
> groups, and so on.

Regarding SSD caching, are you aware that there has been previous work 
that even involved handling part of it in the Virtual Filesystem Switch 
(VFS)?

VFS hot-data tracking, LWN article:

https://lwn.net/Articles/525651/

Patchset, not sure whether it is the most recent one:

[PATCH v2 00/12] VFS hot tracking

https://lore.kernel.org/linux-btrfs/1368493184-5939-1-git-send-email-zwu.kernel@gmail.com/

So for SSD caching you may be able to re-use or pick up some of this 
work, unless it is unsuitable for use with this new approach.

Thanks,
-- 
Martin



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 03/20] btrfs: refactor find_free_dev_extent_start()
  2020-02-06 10:41 ` [PATCH 03/20] btrfs: refactor find_free_dev_extent_start() Naohiro Aota
@ 2020-02-06 12:02   ` Johannes Thumshirn
  2020-02-07  6:22     ` Naohiro Aota
  2020-02-06 16:34   ` Josef Bacik
  1 sibling, 1 reply; 52+ messages in thread
From: Johannes Thumshirn @ 2020-02-06 12:02 UTC (permalink / raw)
  To: Naohiro Aota, linux-btrfs, David Sterba
  Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
	Hannes Reinecke, Anand Jain, linux-fsdevel

On 06/02/2020 11:44, Naohiro Aota wrote:
> +/*

Nit: /**

> + * dev_extent_hole_check - check if specified hole is suitable for allocation
> + * @device:	the device which we have the hole
> + * @hole_start: starting position of the hole
> + * @hole_size:	the size of the hole
> + * @num_bytes:	the size of the free space that we need
> + *
> + * This function may modify @hole_start and @hole_end to reflect the
> + * suitable position for allocation. Returns 1 if hole position is
> + * updated, 0 otherwise.
> + */
> +static int dev_extent_hole_check(struct btrfs_device *device, u64 *hole_start,
> +				 u64 *hole_size, u64 num_bytes)
> +{
> +	int ret = 0;
> +	u64 hole_end = *hole_start + *hole_size;
> +

Couldn't this be bool?
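
Something like this, perhaps (untested sketch):

	static bool dev_extent_hole_check(struct btrfs_device *device,
					  u64 *hole_start, u64 *hole_size,
					  u64 num_bytes)

returning true when the hole position was updated.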

Thanks,
	Johannes

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 04/20] btrfs: introduce alloc_chunk_ctl
  2020-02-06 10:41 ` [PATCH 04/20] btrfs: introduce alloc_chunk_ctl Naohiro Aota
@ 2020-02-06 12:07   ` Johannes Thumshirn
  2020-02-06 16:38   ` Josef Bacik
  1 sibling, 0 replies; 52+ messages in thread
From: Johannes Thumshirn @ 2020-02-06 12:07 UTC (permalink / raw)
  To: Naohiro Aota, linux-btrfs, David Sterba
  Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
	Hannes Reinecke, Anand Jain, linux-fsdevel

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 12/20] btrfs: introduce clustered_alloc_info
  2020-02-06 10:42 ` [PATCH 12/20] btrfs: introduce clustered_alloc_info Naohiro Aota
@ 2020-02-06 12:44   ` Su Yue
  2020-02-07  9:25     ` Naohiro Aota
  2020-02-06 17:01   ` Josef Bacik
  1 sibling, 1 reply; 52+ messages in thread
From: Su Yue @ 2020-02-06 12:44 UTC (permalink / raw)
  To: Naohiro Aota, linux-btrfs, David Sterba
  Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
	Johannes Thumshirn, Hannes Reinecke, Anand Jain, linux-fsdevel

On 2020/2/6 6:42 PM, Naohiro Aota wrote:
> Introduce struct clustered_alloc_info to manage parameters related to
> clustered allocation. By separating clustered_alloc_info and
> find_free_extent_ctl, we can introduce other allocation policy. One can
> access per-allocation policy private information from "alloc_info" of
> struct find_free_extent_ctl.
>
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> ---
>   fs/btrfs/extent-tree.c | 99 +++++++++++++++++++++++++-----------------
>   1 file changed, 59 insertions(+), 40 deletions(-)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index b1f52eee24fe..8124a6461043 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -3456,9 +3456,6 @@ struct find_free_extent_ctl {
>   	/* Where to start the search inside the bg */
>   	u64 search_start;
>
> -	/* For clustered allocation */
> -	u64 empty_cluster;
> -
>   	bool have_caching_bg;
>   	bool orig_have_caching_bg;
>
> @@ -3470,18 +3467,6 @@ struct find_free_extent_ctl {
>   	 */
>   	int loop;
>
> -	/*
> -	 * Whether we're refilling a cluster, if true we need to re-search
> -	 * current block group but don't try to refill the cluster again.
> -	 */
> -	bool retry_clustered;
> -
> -	/*
> -	 * Whether we're updating free space cache, if true we need to re-search
> -	 * current block group but don't try updating free space cache again.
> -	 */
> -	bool retry_unclustered;
> -
>   	/* If current block group is cached */
>   	int cached;
>
> @@ -3499,8 +3484,28 @@ struct find_free_extent_ctl {
>
>   	/* Allocation policy */
>   	enum btrfs_extent_allocation_policy policy;
> +	void *alloc_info;
>   };
>
> +struct clustered_alloc_info {
> +	/* For clustered allocation */
> +	u64 empty_cluster;
> +
> +	/*
> +	 * Whether we're refilling a cluster, if true we need to re-search
> +	 * current block group but don't try to refill the cluster again.
> +	 */
> +	bool retry_clustered;
> +
> +	/*
> +	 * Whether we're updating free space cache, if true we need to re-search
> +	 * current block group but don't try updating free space cache again.
> +	 */
> +	bool retry_unclustered;
> +
> +	struct btrfs_free_cluster *last_ptr;
> +	bool use_cluster;
> +};
>
>   /*
>    * Helper function for find_free_extent().
> @@ -3516,6 +3521,7 @@ static int find_free_extent_clustered(struct btrfs_block_group *bg,
>   		struct btrfs_block_group **cluster_bg_ret)
>   {
>   	struct btrfs_block_group *cluster_bg;
> +	struct clustered_alloc_info *clustered = ffe_ctl->alloc_info;
>   	u64 aligned_cluster;
>   	u64 offset;
>   	int ret;
> @@ -3572,7 +3578,7 @@ static int find_free_extent_clustered(struct btrfs_block_group *bg,
>   	}
>
>   	aligned_cluster = max_t(u64,
> -			ffe_ctl->empty_cluster + ffe_ctl->empty_size,
> +			clustered->empty_cluster + ffe_ctl->empty_size,
>   			bg->full_stripe_len);
>   	ret = btrfs_find_space_cluster(bg, last_ptr, ffe_ctl->search_start,
>   			ffe_ctl->num_bytes, aligned_cluster);
> @@ -3591,12 +3597,12 @@ static int find_free_extent_clustered(struct btrfs_block_group *bg,
>   			return 0;
>   		}
>   	} else if (!ffe_ctl->cached && ffe_ctl->loop > LOOP_CACHING_NOWAIT &&
> -		   !ffe_ctl->retry_clustered) {
> +		   !clustered->retry_clustered) {
>   		spin_unlock(&last_ptr->refill_lock);
>
> -		ffe_ctl->retry_clustered = true;
> +		clustered->retry_clustered = true;
>   		btrfs_wait_block_group_cache_progress(bg, ffe_ctl->num_bytes +
> -				ffe_ctl->empty_cluster + ffe_ctl->empty_size);
> +				clustered->empty_cluster + ffe_ctl->empty_size);
>   		return -EAGAIN;
>   	}
>   	/*
> @@ -3618,6 +3624,7 @@ static int find_free_extent_unclustered(struct btrfs_block_group *bg,
>   		struct btrfs_free_cluster *last_ptr,
>   		struct find_free_extent_ctl *ffe_ctl)
>   {
> +	struct clustered_alloc_info *clustered = ffe_ctl->alloc_info;
>   	u64 offset;
>
>   	/*
> @@ -3636,7 +3643,7 @@ static int find_free_extent_unclustered(struct btrfs_block_group *bg,
>   		free_space_ctl = bg->free_space_ctl;
>   		spin_lock(&free_space_ctl->tree_lock);
>   		if (free_space_ctl->free_space <
> -		    ffe_ctl->num_bytes + ffe_ctl->empty_cluster +
> +		    ffe_ctl->num_bytes + clustered->empty_cluster +
>   		    ffe_ctl->empty_size) {
>   			ffe_ctl->total_free_space = max_t(u64,
>   					ffe_ctl->total_free_space,
> @@ -3660,11 +3667,11 @@ static int find_free_extent_unclustered(struct btrfs_block_group *bg,
>   	 * If @retry_unclustered is true then we've already waited on this
>   	 * block group once and should move on to the next block group.
>   	 */
> -	if (!offset && !ffe_ctl->retry_unclustered && !ffe_ctl->cached &&
> +	if (!offset && !clustered->retry_unclustered && !ffe_ctl->cached &&
>   	    ffe_ctl->loop > LOOP_CACHING_NOWAIT) {
>   		btrfs_wait_block_group_cache_progress(bg, ffe_ctl->num_bytes +
>   						      ffe_ctl->empty_size);
> -		ffe_ctl->retry_unclustered = true;
> +		clustered->retry_unclustered = true;
>   		return -EAGAIN;
>   	} else if (!offset) {
>   		return 1;
> @@ -3685,6 +3692,7 @@ static int find_free_extent_update_loop(struct btrfs_fs_info *fs_info,
>   					bool full_search, bool use_cluster)
>   {
>   	struct btrfs_root *root = fs_info->extent_root;
> +	struct clustered_alloc_info *clustered = ffe_ctl->alloc_info;
>   	int ret;
>
>   	if ((ffe_ctl->loop == LOOP_CACHING_NOWAIT) &&
> @@ -3774,10 +3782,10 @@ static int find_free_extent_update_loop(struct btrfs_fs_info *fs_info,
>   			 * no empty_cluster.
>   			 */
>   			if (ffe_ctl->empty_size == 0 &&
> -			    ffe_ctl->empty_cluster == 0)
> +			    clustered->empty_cluster == 0)
>   				return -ENOSPC;
>   			ffe_ctl->empty_size = 0;
> -			ffe_ctl->empty_cluster = 0;
> +			clustered->empty_cluster = 0;
>   		}
>   		return 1;
>   	}
> @@ -3816,11 +3824,10 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
>   {
>   	int ret = 0;
>   	int cache_block_group_error = 0;
> -	struct btrfs_free_cluster *last_ptr = NULL;
>   	struct btrfs_block_group *block_group = NULL;
>   	struct find_free_extent_ctl ffe_ctl = {0};
>   	struct btrfs_space_info *space_info;
> -	bool use_cluster = true;
> +	struct clustered_alloc_info *clustered = NULL;
>   	bool full_search = false;
>
>   	WARN_ON(num_bytes < fs_info->sectorsize);
> @@ -3829,8 +3836,6 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
>   	ffe_ctl.empty_size = empty_size;
>   	ffe_ctl.flags = flags;
>   	ffe_ctl.search_start = 0;
> -	ffe_ctl.retry_clustered = false;
> -	ffe_ctl.retry_unclustered = false;
>   	ffe_ctl.delalloc = delalloc;
>   	ffe_ctl.index = btrfs_bg_flags_to_raid_index(flags);
>   	ffe_ctl.have_caching_bg = false;
> @@ -3851,6 +3856,15 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
>   		return -ENOSPC;
>   	}
>
> +	clustered = kzalloc(sizeof(*clustered), GFP_NOFS);
> +	if (!clustered)
> +		return -ENOMEM;

Nit of coding style: please place the kzalloc() after the whole
assignment block.

> +	clustered->last_ptr = NULL;
> +	clustered->use_cluster = true;
> +	clustered->retry_clustered = false;
> +	clustered->retry_unclustered = false;
> +	ffe_ctl.alloc_info = clustered;
> +
>   	/*
>   	 * If our free space is heavily fragmented we may not be able to make
>   	 * big contiguous allocations, so instead of doing the expensive search
> @@ -3869,14 +3883,16 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
>   			spin_unlock(&space_info->lock);
>   			return -ENOSPC;
>   		} else if (space_info->max_extent_size) {
> -			use_cluster = false;
> +			clustered->use_cluster = false;
>   		}
>   		spin_unlock(&space_info->lock);
>   	}
>
> -	last_ptr = fetch_cluster_info(fs_info, space_info,
> -				      &ffe_ctl.empty_cluster);
> -	if (last_ptr) {
> +	clustered->last_ptr = fetch_cluster_info(fs_info, space_info,
> +						 &clustered->empty_cluster);
> +	if (clustered->last_ptr) {
> +		struct btrfs_free_cluster *last_ptr = clustered->last_ptr;
> +
>   		spin_lock(&last_ptr->lock);
>   		if (last_ptr->block_group)
>   			ffe_ctl.hint_byte = last_ptr->window_start;
> @@ -3887,7 +3903,7 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
>   			 * some time.
>   			 */
>   			ffe_ctl.hint_byte = last_ptr->window_start;
> -			use_cluster = false;
> +			clustered->use_cluster = false;
>   		}
>   		spin_unlock(&last_ptr->lock);
>   	}
> @@ -4000,10 +4016,11 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
>   		 * Ok we want to try and use the cluster allocator, so
>   		 * lets look there
>   		 */
> -		if (last_ptr && use_cluster) {
> +		if (clustered->last_ptr && clustered->use_cluster) {
>   			struct btrfs_block_group *cluster_bg = NULL;
>
> -			ret = find_free_extent_clustered(block_group, last_ptr,
> +			ret = find_free_extent_clustered(block_group,
> +							 clustered->last_ptr,
>   							 &ffe_ctl, &cluster_bg);
>
>   			if (ret == 0) {
> @@ -4021,7 +4038,8 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
>   			/* ret == -ENOENT case falls through */
>   		}
>
> -		ret = find_free_extent_unclustered(block_group, last_ptr,
> +		ret = find_free_extent_unclustered(block_group,
> +						   clustered->last_ptr,
>   						   &ffe_ctl);
>   		if (ret == -EAGAIN)
>   			goto have_block_group;
> @@ -4062,8 +4080,8 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
>   		btrfs_release_block_group(block_group, delalloc);
>   		break;
>   loop:
> -		ffe_ctl.retry_clustered = false;
> -		ffe_ctl.retry_unclustered = false;
> +		clustered->retry_clustered = false;
> +		clustered->retry_unclustered = false;
>   		BUG_ON(btrfs_bg_flags_to_raid_index(block_group->flags) !=
>   		       ffe_ctl.index);
>   		btrfs_release_block_group(block_group, delalloc);
> @@ -4071,8 +4089,9 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
>   	}
>   	up_read(&space_info->groups_sem);
>
> -	ret = find_free_extent_update_loop(fs_info, last_ptr, ins, &ffe_ctl,
> -					   full_search, use_cluster);
> +	ret = find_free_extent_update_loop(fs_info, clustered->last_ptr, ins,
> +					   &ffe_ctl, full_search,
> +					   clustered->use_cluster);
>   	if (ret > 0)
>   		goto search;
>
>


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 05/20] btrfs: factor out set_parameters()
  2020-02-06 10:41 ` [PATCH 05/20] btrfs: factor out set_parameters() Naohiro Aota
@ 2020-02-06 13:51   ` Johannes Thumshirn
  2020-02-06 16:40   ` Josef Bacik
  1 sibling, 0 replies; 52+ messages in thread
From: Johannes Thumshirn @ 2020-02-06 13:51 UTC (permalink / raw)
  To: Naohiro Aota, linux-btrfs, David Sterba
  Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
	Hannes Reinecke, Anand Jain, linux-fsdevel

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 06/20] btrfs: factor out gather_device_info()
  2020-02-06 10:42 ` [PATCH 06/20] btrfs: factor out gather_device_info() Naohiro Aota
@ 2020-02-06 15:43   ` Johannes Thumshirn
  2020-02-07  9:54     ` Naohiro Aota
  2020-02-06 16:44   ` Josef Bacik
  1 sibling, 1 reply; 52+ messages in thread
From: Johannes Thumshirn @ 2020-02-06 15:43 UTC (permalink / raw)
  To: Naohiro Aota, linux-btrfs, David Sterba
  Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
	Hannes Reinecke, Anand Jain, linux-fsdevel

On 06/02/2020 11:44, Naohiro Aota wrote:
> +	BUG_ON(!alloc_profile_is_valid(type, 0));

I know this was present in the old code as well, but can we turn this 
into an ASSERT() + error return?
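
Something like (untested):

	if (!alloc_profile_is_valid(type, 0)) {
		ASSERT(0);
		return -EINVAL;
	}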

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 07/20] btrfs: factor out decide_stripe_size()
  2020-02-06 10:42 ` [PATCH 07/20] btrfs: factor out decide_stripe_size() Naohiro Aota
@ 2020-02-06 15:59   ` Johannes Thumshirn
  2020-02-06 16:47   ` Josef Bacik
  1 sibling, 0 replies; 52+ messages in thread
From: Johannes Thumshirn @ 2020-02-06 15:59 UTC (permalink / raw)
  To: Naohiro Aota, linux-btrfs, David Sterba
  Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
	Hannes Reinecke, Anand Jain, linux-fsdevel

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 01/20] btrfs: change type of full_search to bool
  2020-02-06 10:41 ` [PATCH 01/20] btrfs: change type of full_search to bool Naohiro Aota
  2020-02-06 11:26   ` Johannes Thumshirn
@ 2020-02-06 16:03   ` Josef Bacik
  1 sibling, 0 replies; 52+ messages in thread
From: Josef Bacik @ 2020-02-06 16:03 UTC (permalink / raw)
  To: Naohiro Aota, linux-btrfs, David Sterba
  Cc: Chris Mason, Nikolay Borisov, Damien Le Moal, Johannes Thumshirn,
	Hannes Reinecke, Anand Jain, linux-fsdevel

On 2/6/20 5:41 AM, Naohiro Aota wrote:
> While the "full_search" variable defined in find_free_extent() is bool, but
> the full_search argument of find_free_extent_update_loop() is defined as
> int. Let's trivially fix the argument type.
> 
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 02/20] btrfs: introduce chunk allocation policy
  2020-02-06 10:41 ` [PATCH 02/20] btrfs: introduce chunk allocation policy Naohiro Aota
  2020-02-06 11:30   ` Johannes Thumshirn
@ 2020-02-06 16:06   ` Josef Bacik
  2020-02-06 16:07   ` Josef Bacik
  2 siblings, 0 replies; 52+ messages in thread
From: Josef Bacik @ 2020-02-06 16:06 UTC (permalink / raw)
  To: Naohiro Aota, linux-btrfs, David Sterba
  Cc: Chris Mason, Nikolay Borisov, Damien Le Moal, Johannes Thumshirn,
	Hannes Reinecke, Anand Jain, linux-fsdevel

On 2/6/20 5:41 AM, Naohiro Aota wrote:
> This commit introduces chuk allocation policy for btrfs. This policy
> controls how btrfs allocate a chunk and device extents from devices.
> 
> There is no functional change introduced with this commit.
> 
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>

I'd rather see this right before it's actually used, so I can decide if it's the 
correct place/approach for this.  Maybe once I'm through the whole series it'll 
make sense, but right now I can't tell where it gets used and thus can't really 
say if it's ok or not.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 02/20] btrfs: introduce chunk allocation policy
  2020-02-06 10:41 ` [PATCH 02/20] btrfs: introduce chunk allocation policy Naohiro Aota
  2020-02-06 11:30   ` Johannes Thumshirn
  2020-02-06 16:06   ` Josef Bacik
@ 2020-02-06 16:07   ` Josef Bacik
  2 siblings, 0 replies; 52+ messages in thread
From: Josef Bacik @ 2020-02-06 16:07 UTC (permalink / raw)
  To: Naohiro Aota, linux-btrfs, David Sterba
  Cc: Chris Mason, Nikolay Borisov, Damien Le Moal, Johannes Thumshirn,
	Hannes Reinecke, Anand Jain, linux-fsdevel

On 2/6/20 5:41 AM, Naohiro Aota wrote:
> This commit introduces chuk allocation policy for btrfs. This policy
                           ^^
                         chunk

> controls how btrfs allocate a chunk and device extents from devices.
> 
> There is no functional change introduced with this commit.
> 
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>

Of course then I notice my mail client re-ordered your emails and it was 
actually in the next patch.  You can add

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

to this.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 03/20] btrfs: refactor find_free_dev_extent_start()
  2020-02-06 10:41 ` [PATCH 03/20] btrfs: refactor find_free_dev_extent_start() Naohiro Aota
  2020-02-06 12:02   ` Johannes Thumshirn
@ 2020-02-06 16:34   ` Josef Bacik
  1 sibling, 0 replies; 52+ messages in thread
From: Josef Bacik @ 2020-02-06 16:34 UTC (permalink / raw)
  To: Naohiro Aota, linux-btrfs, David Sterba
  Cc: Chris Mason, Nikolay Borisov, Damien Le Moal, Johannes Thumshirn,
	Hannes Reinecke, Anand Jain, linux-fsdevel

On 2/6/20 5:41 AM, Naohiro Aota wrote:
> Factor out two functions from find_free_dev_extent_start().
> dev_extent_search_start() decides the starting position of the search.
> dev_extent_hole_check() checks if a hole found is suitable for device
> extent allocation.
> 
> These functions also have the switch-cases to change the allocation
> behavior depending on the policy.
> 
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 04/20] btrfs: introduce alloc_chunk_ctl
  2020-02-06 10:41 ` [PATCH 04/20] btrfs: introduce alloc_chunk_ctl Naohiro Aota
  2020-02-06 12:07   ` Johannes Thumshirn
@ 2020-02-06 16:38   ` Josef Bacik
  2020-02-07  7:08     ` Naohiro Aota
  1 sibling, 1 reply; 52+ messages in thread
From: Josef Bacik @ 2020-02-06 16:38 UTC (permalink / raw)
  To: Naohiro Aota, linux-btrfs, David Sterba
  Cc: Chris Mason, Nikolay Borisov, Damien Le Moal, Johannes Thumshirn,
	Hannes Reinecke, Anand Jain, linux-fsdevel

On 2/6/20 5:41 AM, Naohiro Aota wrote:
> Introduce "struct alloc_chunk_ctl" to wrap needed parameters for the chunk
> allocation.  This will be used to split __btrfs_alloc_chunk() into smaller
> functions.
> 
> This commit folds a number of local variables in __btrfs_alloc_chunk() into
> one "struct alloc_chunk_ctl ctl". There is no functional change.
> 
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> ---
>   fs/btrfs/volumes.c | 143 +++++++++++++++++++++++++--------------------
>   1 file changed, 81 insertions(+), 62 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 9bb673df777a..cfde302bf297 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -4818,6 +4818,29 @@ static void check_raid1c34_incompat_flag(struct btrfs_fs_info *info, u64 type)
>   	btrfs_set_fs_incompat(info, RAID1C34);
>   }
>   
> +/*
> + * Structure used internally for __btrfs_alloc_chunk() function.
> + * Wraps needed parameters.
> + */
> +struct alloc_chunk_ctl {
> +	u64 start;
> +	u64 type;
> +	int num_stripes;	/* total number of stripes to allocate */
> +	int sub_stripes;	/* sub_stripes info for map */
> +	int dev_stripes;	/* stripes per dev */
> +	int devs_max;		/* max devs to use */
> +	int devs_min;		/* min devs needed */
> +	int devs_increment;	/* ndevs has to be a multiple of this */
> +	int ncopies;		/* how many copies to data has */
> +	int nparity;		/* number of stripes worth of bytes to
> +				   store parity information */
> +	u64 max_stripe_size;
> +	u64 max_chunk_size;
> +	u64 stripe_size;
> +	u64 chunk_size;
> +	int ndevs;
> +};
> +
>   static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>   			       u64 start, u64 type)
>   {
> @@ -4828,23 +4851,11 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>   	struct extent_map_tree *em_tree;
>   	struct extent_map *em;
>   	struct btrfs_device_info *devices_info = NULL;
> +	struct alloc_chunk_ctl ctl;
>   	u64 total_avail;
> -	int num_stripes;	/* total number of stripes to allocate */
>   	int data_stripes;	/* number of stripes that count for
>   				   block group size */
> -	int sub_stripes;	/* sub_stripes info for map */
> -	int dev_stripes;	/* stripes per dev */
> -	int devs_max;		/* max devs to use */
> -	int devs_min;		/* min devs needed */
> -	int devs_increment;	/* ndevs has to be a multiple of this */
> -	int ncopies;		/* how many copies to data has */
> -	int nparity;		/* number of stripes worth of bytes to
> -				   store parity information */
>   	int ret;
> -	u64 max_stripe_size;
> -	u64 max_chunk_size;
> -	u64 stripe_size;
> -	u64 chunk_size;
>   	int ndevs;
>   	int i;
>   	int j;
> @@ -4858,32 +4869,36 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>   		return -ENOSPC;
>   	}
>   
> +	ctl.start = start;
> +	ctl.type = type;
> +
>   	index = btrfs_bg_flags_to_raid_index(type);
>   
> -	sub_stripes = btrfs_raid_array[index].sub_stripes;
> -	dev_stripes = btrfs_raid_array[index].dev_stripes;
> -	devs_max = btrfs_raid_array[index].devs_max;
> -	if (!devs_max)
> -		devs_max = BTRFS_MAX_DEVS(info);
> -	devs_min = btrfs_raid_array[index].devs_min;
> -	devs_increment = btrfs_raid_array[index].devs_increment;
> -	ncopies = btrfs_raid_array[index].ncopies;
> -	nparity = btrfs_raid_array[index].nparity;
> +	ctl.sub_stripes = btrfs_raid_array[index].sub_stripes;
> +	ctl.dev_stripes = btrfs_raid_array[index].dev_stripes;
> +	ctl.devs_max = btrfs_raid_array[index].devs_max;
> +	if (!ctl.devs_max)
> +		ctl.devs_max = BTRFS_MAX_DEVS(info);
> +	ctl.devs_min = btrfs_raid_array[index].devs_min;
> +	ctl.devs_increment = btrfs_raid_array[index].devs_increment;
> +	ctl.ncopies = btrfs_raid_array[index].ncopies;
> +	ctl.nparity = btrfs_raid_array[index].nparity;
>   
>   	if (type & BTRFS_BLOCK_GROUP_DATA) {
> -		max_stripe_size = SZ_1G;
> -		max_chunk_size = BTRFS_MAX_DATA_CHUNK_SIZE;
> +		ctl.max_stripe_size = SZ_1G;
> +		ctl.max_chunk_size = BTRFS_MAX_DATA_CHUNK_SIZE;
>   	} else if (type & BTRFS_BLOCK_GROUP_METADATA) {
>   		/* for larger filesystems, use larger metadata chunks */
>   		if (fs_devices->total_rw_bytes > 50ULL * SZ_1G)
> -			max_stripe_size = SZ_1G;
> +			ctl.max_stripe_size = SZ_1G;
>   		else
> -			max_stripe_size = SZ_256M;
> -		max_chunk_size = max_stripe_size;
> +			ctl.max_stripe_size = SZ_256M;
> +		ctl.max_chunk_size = ctl.max_stripe_size;
>   	} else if (type & BTRFS_BLOCK_GROUP_SYSTEM) {
> -		max_stripe_size = SZ_32M;
> -		max_chunk_size = 2 * max_stripe_size;
> -		devs_max = min_t(int, devs_max, BTRFS_MAX_DEVS_SYS_CHUNK);
> +		ctl.max_stripe_size = SZ_32M;
> +		ctl.max_chunk_size = 2 * ctl.max_stripe_size;
> +		ctl.devs_max = min_t(int, ctl.devs_max,
> +				      BTRFS_MAX_DEVS_SYS_CHUNK);
>   	} else {
>   		btrfs_err(info, "invalid chunk type 0x%llx requested",
>   		       type);
> @@ -4891,8 +4906,8 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>   	}
>   
>   	/* We don't want a chunk larger than 10% of writable space */
> -	max_chunk_size = min(div_factor(fs_devices->total_rw_bytes, 1),
> -			     max_chunk_size);
> +	ctl.max_chunk_size = min(div_factor(fs_devices->total_rw_bytes, 1),
> +				  ctl.max_chunk_size);
>   
>   	devices_info = kcalloc(fs_devices->rw_devices, sizeof(*devices_info),
>   			       GFP_NOFS);
> @@ -4929,20 +4944,20 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>   			continue;
>   
>   		ret = find_free_dev_extent(device,
> -					   max_stripe_size * dev_stripes,
> +				ctl.max_stripe_size * ctl.dev_stripes,
>   					   &dev_offset, &max_avail);

If you are going to adjust the indentation of arguments, you need to adjust them 
all.

>   		if (ret && ret != -ENOSPC)
>   			goto error;
>   
>   		if (ret == 0)
> -			max_avail = max_stripe_size * dev_stripes;
> +			max_avail = ctl.max_stripe_size * ctl.dev_stripes;
>   
> -		if (max_avail < BTRFS_STRIPE_LEN * dev_stripes) {
> +		if (max_avail < BTRFS_STRIPE_LEN * ctl.dev_stripes) {
>   			if (btrfs_test_opt(info, ENOSPC_DEBUG))
>   				btrfs_debug(info,
>   			"%s: devid %llu has no free space, have=%llu want=%u",
>   					    __func__, device->devid, max_avail,
> -					    BTRFS_STRIPE_LEN * dev_stripes);
> +				BTRFS_STRIPE_LEN * ctl.dev_stripes);

Same here.

>   			continue;
>   		}
>   
> @@ -4957,30 +4972,31 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>   		devices_info[ndevs].dev = device;
>   		++ndevs;
>   	}
> +	ctl.ndevs = ndevs;
>   
>   	/*
>   	 * now sort the devices by hole size / available space
>   	 */
> -	sort(devices_info, ndevs, sizeof(struct btrfs_device_info),
> +	sort(devices_info, ctl.ndevs, sizeof(struct btrfs_device_info),
>   	     btrfs_cmp_device_info, NULL);
>   
>   	/*
>   	 * Round down to number of usable stripes, devs_increment can be any
>   	 * number so we can't use round_down()
>   	 */
> -	ndevs -= ndevs % devs_increment;
> +	ctl.ndevs -= ctl.ndevs % ctl.devs_increment;
>   
> -	if (ndevs < devs_min) {
> +	if (ctl.ndevs < ctl.devs_min) {
>   		ret = -ENOSPC;
>   		if (btrfs_test_opt(info, ENOSPC_DEBUG)) {
>   			btrfs_debug(info,
>   	"%s: not enough devices with free space: have=%d minimum required=%d",
> -				    __func__, ndevs, devs_min);
> +				    __func__, ctl.ndevs, ctl.devs_min);
>   		}
>   		goto error;
>   	}
>   
> -	ndevs = min(ndevs, devs_max);
> +	ctl.ndevs = min(ctl.ndevs, ctl.devs_max);
>   
>   	/*
>   	 * The primary goal is to maximize the number of stripes, so use as
> @@ -4989,14 +5005,15 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>   	 * The DUP profile stores more than one stripe per device, the
>   	 * max_avail is the total size so we have to adjust.
>   	 */
> -	stripe_size = div_u64(devices_info[ndevs - 1].max_avail, dev_stripes);
> -	num_stripes = ndevs * dev_stripes;
> +	ctl.stripe_size = div_u64(devices_info[ctl.ndevs - 1].max_avail,
> +				   ctl.dev_stripes);
> +	ctl.num_stripes = ctl.ndevs * ctl.dev_stripes;
>   
>   	/*
>   	 * this will have to be fixed for RAID1 and RAID10 over
>   	 * more drives
>   	 */
> -	data_stripes = (num_stripes - nparity) / ncopies;
> +	data_stripes = (ctl.num_stripes - ctl.nparity) / ctl.ncopies;
>   
>   	/*
>   	 * Use the number of data stripes to figure out how big this chunk
> @@ -5004,44 +5021,44 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>   	 * and compare that answer with the max chunk size. If it's higher,
>   	 * we try to reduce stripe_size.
>   	 */
> -	if (stripe_size * data_stripes > max_chunk_size) {
> +	if (ctl.stripe_size * data_stripes > ctl.max_chunk_size) {
>   		/*
>   		 * Reduce stripe_size, round it up to a 16MB boundary again and
>   		 * then use it, unless it ends up being even bigger than the
>   		 * previous value we had already.
>   		 */
> -		stripe_size = min(round_up(div_u64(max_chunk_size,
> +		ctl.stripe_size = min(round_up(div_u64(ctl.max_chunk_size,
>   						   data_stripes), SZ_16M),
> -				  stripe_size);
> +				       ctl.stripe_size);

And here.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 05/20] btrfs: factor out set_parameters()
  2020-02-06 10:41 ` [PATCH 05/20] btrfs: factor out set_parameters() Naohiro Aota
  2020-02-06 13:51   ` Johannes Thumshirn
@ 2020-02-06 16:40   ` Josef Bacik
  2020-02-07  7:59     ` Naohiro Aota
  1 sibling, 1 reply; 52+ messages in thread
From: Josef Bacik @ 2020-02-06 16:40 UTC (permalink / raw)
  To: Naohiro Aota, linux-btrfs, David Sterba
  Cc: Chris Mason, Nikolay Borisov, Damien Le Moal, Johannes Thumshirn,
	Hannes Reinecke, Anand Jain, linux-fsdevel

On 2/6/20 5:41 AM, Naohiro Aota wrote:
> Factor out set_parameters() from __btrfs_alloc_chunk(). This function
> initialises the parameters of "struct alloc_chunk_ctl" for an allocation.
> set_parameters() handles the common part of the initialisation, loading the
> RAID parameters from btrfs_raid_array. set_parameters_regular() decides the
> parameters specific to the regular allocation policy.
> 
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> ---
>   fs/btrfs/volumes.c | 96 ++++++++++++++++++++++++++++------------------
>   1 file changed, 59 insertions(+), 37 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index cfde302bf297..a5d6f0b5ca70 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -4841,6 +4841,60 @@ struct alloc_chunk_ctl {
>   	int ndevs;
>   };
>   
> +static void set_parameters_regular(struct btrfs_fs_devices *fs_devices,
> +				   struct alloc_chunk_ctl *ctl)

init_alloc_chunk_ctl_policy_regular()

> +{
> +	u64 type = ctl->type;
> +
> +	if (type & BTRFS_BLOCK_GROUP_DATA) {
> +		ctl->max_stripe_size = SZ_1G;
> +		ctl->max_chunk_size = BTRFS_MAX_DATA_CHUNK_SIZE;
> +	} else if (type & BTRFS_BLOCK_GROUP_METADATA) {
> +		/* for larger filesystems, use larger metadata chunks */
> +		if (fs_devices->total_rw_bytes > 50ULL * SZ_1G)
> +			ctl->max_stripe_size = SZ_1G;
> +		else
> +			ctl->max_stripe_size = SZ_256M;
> +		ctl->max_chunk_size = ctl->max_stripe_size;
> +	} else if (type & BTRFS_BLOCK_GROUP_SYSTEM) {
> +		ctl->max_stripe_size = SZ_32M;
> +		ctl->max_chunk_size = 2 * ctl->max_stripe_size;
> +		ctl->devs_max = min_t(int, ctl->devs_max,
> +				      BTRFS_MAX_DEVS_SYS_CHUNK);
> +	} else {
> +		BUG();
> +	}
> +
> +	/* We don't want a chunk larger than 10% of writable space */
> +	ctl->max_chunk_size = min(div_factor(fs_devices->total_rw_bytes, 1),
> +				  ctl->max_chunk_size);
> +}
> +
> +static void set_parameters(struct btrfs_fs_devices *fs_devices,
> +			   struct alloc_chunk_ctl *ctl)

init_alloc_chunk_ctl().  These function names need to be more descriptive.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 06/20] btrfs: factor out gather_device_info()
  2020-02-06 10:42 ` [PATCH 06/20] btrfs: factor out gather_device_info() Naohiro Aota
  2020-02-06 15:43   ` Johannes Thumshirn
@ 2020-02-06 16:44   ` Josef Bacik
  1 sibling, 0 replies; 52+ messages in thread
From: Josef Bacik @ 2020-02-06 16:44 UTC (permalink / raw)
  To: Naohiro Aota, linux-btrfs, David Sterba
  Cc: Chris Mason, Nikolay Borisov, Damien Le Moal, Johannes Thumshirn,
	Hannes Reinecke, Anand Jain, linux-fsdevel

On 2/6/20 5:42 AM, Naohiro Aota wrote:
> Factor out gather_device_info() from __btrfs_alloc_chunk(). This function
> iterates over the devices list and gathers information about the devices.
> This commit also introduces "max_avail" and "dev_extent_min" to fold the
> same calculation into one variable.
> 
> This commit has no functional changes.
> 
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 07/20] btrfs: factor out decide_stripe_size()
  2020-02-06 10:42 ` [PATCH 07/20] btrfs: factor out decide_stripe_size() Naohiro Aota
  2020-02-06 15:59   ` Johannes Thumshirn
@ 2020-02-06 16:47   ` Josef Bacik
  1 sibling, 0 replies; 52+ messages in thread
From: Josef Bacik @ 2020-02-06 16:47 UTC (permalink / raw)
  To: Naohiro Aota, linux-btrfs, David Sterba
  Cc: Chris Mason, Nikolay Borisov, Damien Le Moal, Johannes Thumshirn,
	Hannes Reinecke, Anand Jain, linux-fsdevel

On 2/6/20 5:42 AM, Naohiro Aota wrote:
> Factor out decide_stripe_size() from __btrfs_alloc_chunk(). This function
> calculates the actual stripe size to allocate. decide_stripe_size() handles
> the common part: rounding 'ndevs' down to a multiple of 'devs_increment' and
> checking the upper and lower limits of 'ndevs'. decide_stripe_size_regular()
> decides the size of a stripe and the size of a chunk. The policy is to
> maximize the number of stripes.
> 
> This commit has no functional changes.
> 
> 
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 08/20] btrfs: factor out create_chunk()
  2020-02-06 10:42 ` [PATCH 08/20] btrfs: factor out create_chunk() Naohiro Aota
@ 2020-02-06 16:49   ` Josef Bacik
  2020-02-07  9:17     ` Naohiro Aota
  0 siblings, 1 reply; 52+ messages in thread
From: Josef Bacik @ 2020-02-06 16:49 UTC (permalink / raw)
  To: Naohiro Aota, linux-btrfs, David Sterba
  Cc: Chris Mason, Nikolay Borisov, Damien Le Moal, Johannes Thumshirn,
	Hannes Reinecke, Anand Jain, linux-fsdevel

On 2/6/20 5:42 AM, Naohiro Aota wrote:
> Factor out create_chunk() from __btrfs_alloc_chunk(). This function finally
> creates a chunk. There are no functional changes.
> 
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>

<snip>
> +
> +	ctl.start = start;
> +	ctl.type = type;
> +	set_parameters(fs_devices, &ctl);
> +
> +	devices_info = kcalloc(fs_devices->rw_devices, sizeof(*devices_info),
> +			       GFP_NOFS);
> +	if (!devices_info)
> +		return -ENOMEM;
> +
> +	ret = gather_device_info(fs_devices, &ctl, devices_info);
> +	if (ret < 0)
> +		goto error;
> +
> +	ret = decide_stripe_size(fs_devices, &ctl, devices_info);
> +	if (ret < 0)
> +		goto error;
> +
> +	ret = create_chunk(trans, &ctl, devices_info);
> +	if (ret < 0)
> +		goto error;
> +

This can just be

out:
	kfree(devices_info);
	return ret;

and all the above people can just do goto out on error and we can drop the 
error: part below.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 09/20] btrfs: parameterize dev_extent_min
  2020-02-06 10:42 ` [PATCH 09/20] btrfs: parameterize dev_extent_min Naohiro Aota
@ 2020-02-06 16:52   ` Josef Bacik
  2020-02-07  9:00     ` Naohiro Aota
  0 siblings, 1 reply; 52+ messages in thread
From: Josef Bacik @ 2020-02-06 16:52 UTC (permalink / raw)
  To: Naohiro Aota, linux-btrfs, David Sterba
  Cc: Chris Mason, Nikolay Borisov, Damien Le Moal, Johannes Thumshirn,
	Hannes Reinecke, Anand Jain, linux-fsdevel

On 2/6/20 5:42 AM, Naohiro Aota wrote:
> Currently, we ignore a device whose available space is less than
> "BTRFS_STRIPE_LEN * dev_stripes". This is a lower limit for the current
> allocation policy (to maximize the number of stripes). This commit
> parameterizes dev_extent_min, so that other policies can set their own
> lower limit to ignore a device with insufficient space.
> 
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> ---
>   fs/btrfs/volumes.c | 9 +++++----
>   1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 15837374db9c..4a6cc098ee3e 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -4836,6 +4836,7 @@ struct alloc_chunk_ctl {
>   				   store parity information */
>   	u64 max_stripe_size;
>   	u64 max_chunk_size;
> +	u64 dev_extent_min;
>   	u64 stripe_size;
>   	u64 chunk_size;
>   	int ndevs;
> @@ -4868,6 +4869,7 @@ static void set_parameters_regular(struct btrfs_fs_devices *fs_devices,
>   	/* We don't want a chunk larger than 10% of writable space */
>   	ctl->max_chunk_size = min(div_factor(fs_devices->total_rw_bytes, 1),
>   				  ctl->max_chunk_size);
> +	ctl->dev_extent_min = BTRFS_STRIPE_LEN * ctl->dev_stripes;
>   }
>   
>   static void set_parameters(struct btrfs_fs_devices *fs_devices,
> @@ -4903,7 +4905,6 @@ static int gather_device_info(struct btrfs_fs_devices *fs_devices,
>   	struct btrfs_device *device;
>   	u64 total_avail;
>   	u64 dev_extent_want = ctl->max_stripe_size * ctl->dev_stripes;
> -	u64 dev_extent_min = BTRFS_STRIPE_LEN * ctl->dev_stripes;
>   	int ret;
>   	int ndevs = 0;
>   	u64 max_avail;
> @@ -4931,7 +4932,7 @@ static int gather_device_info(struct btrfs_fs_devices *fs_devices,
>   			total_avail = 0;
>   
>   		/* If there is no space on this device, skip it. */
> -		if (total_avail == 0)
> +		if (total_avail < ctl->dev_extent_min)

This isn't correct, dev_extent_min is the total size with all stripes added up, 
not the size of a single stripe.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 12/20] btrfs: introduce clustered_alloc_info
  2020-02-06 10:42 ` [PATCH 12/20] btrfs: introduce clustered_alloc_info Naohiro Aota
  2020-02-06 12:44   ` Su Yue
@ 2020-02-06 17:01   ` Josef Bacik
  2020-02-07  9:53     ` Naohiro Aota
  1 sibling, 1 reply; 52+ messages in thread
From: Josef Bacik @ 2020-02-06 17:01 UTC (permalink / raw)
  To: Naohiro Aota, linux-btrfs, David Sterba
  Cc: Chris Mason, Nikolay Borisov, Damien Le Moal, Johannes Thumshirn,
	Hannes Reinecke, Anand Jain, linux-fsdevel

On 2/6/20 5:42 AM, Naohiro Aota wrote:
> Introduce struct clustered_alloc_info to manage parameters related to
> clustered allocation. By separating clustered_alloc_info and
> find_free_extent_ctl, we can introduce other allocation policies. One can
> access per-allocation-policy private information from "alloc_info" of
> struct find_free_extent_ctl.
> 
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> ---
>   fs/btrfs/extent-tree.c | 99 +++++++++++++++++++++++++-----------------
>   1 file changed, 59 insertions(+), 40 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index b1f52eee24fe..8124a6461043 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -3456,9 +3456,6 @@ struct find_free_extent_ctl {
>   	/* Where to start the search inside the bg */
>   	u64 search_start;
>   
> -	/* For clustered allocation */
> -	u64 empty_cluster;
> -
>   	bool have_caching_bg;
>   	bool orig_have_caching_bg;
>   
> @@ -3470,18 +3467,6 @@ struct find_free_extent_ctl {
>   	 */
>   	int loop;
>   
> -	/*
> -	 * Whether we're refilling a cluster, if true we need to re-search
> -	 * current block group but don't try to refill the cluster again.
> -	 */
> -	bool retry_clustered;
> -
> -	/*
> -	 * Whether we're updating free space cache, if true we need to re-search
> -	 * current block group but don't try updating free space cache again.
> -	 */
> -	bool retry_unclustered;
> -
>   	/* If current block group is cached */
>   	int cached;
>   
> @@ -3499,8 +3484,28 @@ struct find_free_extent_ctl {
>   
>   	/* Allocation policy */
>   	enum btrfs_extent_allocation_policy policy;
> +	void *alloc_info;
>   };
>   
> +struct clustered_alloc_info {
> +	/* For clustered allocation */
> +	u64 empty_cluster;
> +
> +	/*
> +	 * Whether we're refilling a cluster, if true we need to re-search
> +	 * current block group but don't try to refill the cluster again.
> +	 */
> +	bool retry_clustered;
> +
> +	/*
> +	 * Whether we're updating free space cache, if true we need to re-search
> +	 * current block group but don't try updating free space cache again.
> +	 */
> +	bool retry_unclustered;
> +
> +	struct btrfs_free_cluster *last_ptr;
> +	bool use_cluster;
This isn't the right place for this; rather, I'd put it in 
find_free_extent_ctl, if you want it at all.

And in fact I question the whole need for this in the first place.  I assume 
your goal is to just disable clustered allocation for shingled drives, so why 
don't you just handle that with your extent allocation policy flag?  If it's set 
to shingled then use_cluster = false and you are good to go, with no need to 
add all this complication of the cluster ctl.
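
E.g. something like this (a sketch; the policy value name is hypothetical, 
not from this series):

	if (ffe_ctl.policy == BTRFS_EXTENT_ALLOC_SHINGLED)
		use_cluster = false;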

If you are looking to save space in the ctl, then I would just union {} the 
cluster stuff inside of the find_free_extent_ctl so the right flags are used for 
the correct allocation policy.
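
i.e. something like this (a sketch, reusing the fields from this patch):

	struct find_free_extent_ctl {
		...
		union {
			/* clustered allocation */
			struct {
				u64 empty_cluster;
				bool retry_clustered;
				bool retry_unclustered;
			};
			/* private data for other allocation policies */
		};
	};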

This whole last set of 10 patches needs to be reworked.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 00/20] btrfs: refactor and generalize chunk/dev_extent/extent allocation
  2020-02-06 11:43 ` [PATCH 00/20] btrfs: refactor and generalize chunk/dev_extent/extent allocation Martin Steigerwald
@ 2020-02-07  6:06   ` Naohiro Aota
  2020-02-07  8:02     ` Martin Steigerwald
  0 siblings, 1 reply; 52+ messages in thread
From: Naohiro Aota @ 2020-02-07  6:06 UTC (permalink / raw)
  To: Martin Steigerwald
  Cc: linux-btrfs, David Sterba, Chris Mason, Josef Bacik,
	Nikolay Borisov, Damien Le Moal, Johannes Thumshirn,
	Hannes Reinecke, Anand Jain, linux-fsdevel

On Thu, Feb 06, 2020 at 12:43:30PM +0100, Martin Steigerwald wrote:
>Hi Naohiro.
>
>Naohiro Aota - 06.02.20, 11:41:54 CET:
>> This series refactors chunk allocation, device_extent allocation and
>> extent allocation functions and make them generalized to be able to
>> implement other allocation policy easily.
>>
>> On top of this series, we can simplify some part of the "btrfs: zoned
>> block device support" series as adding a new type of chunk allocator
>> and extent allocator for zoned block devices. Furthermore, we will be
>> able to implement and test some other allocator in the idea page of
>> the wiki e.g. SSD caching, dedicated metadata drive, chunk allocation
>> groups, and so on.
>
>Regarding SSD caching, are you aware that there has been previous work
>with even involved handling part of it in the Virtual Filesystem Switch
>(VFS)?
>
>VFS hot-data tracking, LWN article:
>
>https://lwn.net/Articles/525651/
>
>Patchset, not sure whether it is the most recent one:
>
>[PATCH v2 00/12] VFS hot tracking
>
>https://lore.kernel.org/linux-btrfs/1368493184-5939-1-git-send-email-zwu.kernel@gmail.com/

Yes, I once saw the patchset. Not sure about the details, though.

>So for SSD caching you may be able to re-use or pick up some of this
>work, unless it would be unsuitable to be used with this new approach.

Currently, I have no plan to implement the SSD caching feature
myself. I think some patches of that series, like this one [1], can be
reworked on top of my series by adding an "SSD caching chunk
allocator." So suggestions about the hook interface are welcome.

[1] https://lore.kernel.org/linux-btrfs/1371817260-8615-3-git-send-email-zwu.kernel@gmail.com/
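
For example, such an allocator could become another value of the chunk
allocation policy enum from this series (a sketch; only the regular value
exists in these patches, the other values are hypothetical):

	enum btrfs_chunk_allocation_policy {
		BTRFS_CHUNK_ALLOC_REGULAR,
		/* hypothetical future policies */
		BTRFS_CHUNK_ALLOC_ZONED,
		BTRFS_CHUNK_ALLOC_SSD_CACHING,
	};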

>Thanks,
>-- 
>Martin
>
>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 02/20] btrfs: introduce chunk allocation policy
  2020-02-06 11:30   ` Johannes Thumshirn
@ 2020-02-07  6:11     ` Naohiro Aota
  0 siblings, 0 replies; 52+ messages in thread
From: Naohiro Aota @ 2020-02-07  6:11 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: linux-btrfs, David Sterba, Chris Mason, Josef Bacik,
	Nikolay Borisov, Damien Le Moal, Hannes Reinecke, Anand Jain,
	linux-fsdevel

On Thu, Feb 06, 2020 at 11:30:06AM +0000, Johannes Thumshirn wrote:
>On 06/02/2020 11:44, Naohiro Aota wrote:
>> This commit introduces chuk allocation policy for btrfs.
>
>Maybe "Introduce a per-device chink allocation policy for btrfs."

What do you mean by "per-device"? It might be a misunderstanding: one
chunk allocation policy is set per btrfs file system. There is no
per-device policy for now.

# yep, I found my typo and fixed it: "chuk" -> "chunk"

>Code wise,
>Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 03/20] btrfs: refactor find_free_dev_extent_start()
  2020-02-06 12:02   ` Johannes Thumshirn
@ 2020-02-07  6:22     ` Naohiro Aota
  0 siblings, 0 replies; 52+ messages in thread
From: Naohiro Aota @ 2020-02-07  6:22 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: linux-btrfs, David Sterba, Chris Mason, Josef Bacik,
	Nikolay Borisov, Damien Le Moal, Hannes Reinecke, Anand Jain,
	linux-fsdevel

On Thu, Feb 06, 2020 at 12:02:12PM +0000, Johannes Thumshirn wrote:
>On 06/02/2020 11:44, Naohiro Aota wrote:
>> +/*
>
>Nit: /**

Fixed.

>> + * dev_extent_hole_check - check if specified hole is suitable for allocation
>> + * @device:	the device which we have the hole
>> + * @hole_start: starting position of the hole
>> + * @hole_size:	the size of the hole
>> + * @num_bytes:	the size of the free space that we need
>> + *
>> + * This function may modify @hole_start and @hole_end to reflect the
>> + * suitable position for allocation. Returns 1 if hole position is
>> + * updated, 0 otherwise.
>> + */
>> +static int dev_extent_hole_check(struct btrfs_device *device, u64 *hole_start,
>> +				 u64 *hole_size, u64 num_bytes)
>> +{
>> +	int ret = 0;
>> +	u64 hole_end = *hole_start + *hole_size;
>> +
>
>Couldn't this be bool?

Good point. I changed it to bool and also renamed "ret" to "changed"
to make it clear.
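
So the helper now looks like this (sketch):

	static bool dev_extent_hole_check(struct btrfs_device *device,
					  u64 *hole_start, u64 *hole_size,
					  u64 num_bytes)
	{
		bool changed = false;
		u64 hole_end = *hole_start + *hole_size;
		...
		return changed;
	}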

Thanks,

>Thanks,
>	Johannes
>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 04/20] btrfs: introduce alloc_chunk_ctl
  2020-02-06 16:38   ` Josef Bacik
@ 2020-02-07  7:08     ` Naohiro Aota
  0 siblings, 0 replies; 52+ messages in thread
From: Naohiro Aota @ 2020-02-07  7:08 UTC (permalink / raw)
  To: Josef Bacik
  Cc: linux-btrfs, David Sterba, Chris Mason, Nikolay Borisov,
	Damien Le Moal, Johannes Thumshirn, Hannes Reinecke, Anand Jain,
	linux-fsdevel

On Thu, Feb 06, 2020 at 11:38:14AM -0500, Josef Bacik wrote:
>On 2/6/20 5:41 AM, Naohiro Aota wrote:
>>Introduce "struct alloc_chunk_ctl" to wrap needed parameters for the chunk
>>allocation.  This will be used to split __btrfs_alloc_chunk() into smaller
>>functions.
>>
>>This commit folds a number of local variables in __btrfs_alloc_chunk() into
>>one "struct alloc_chunk_ctl ctl". There is no functional change.
>>
>>Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
>>---
>>  fs/btrfs/volumes.c | 143 +++++++++++++++++++++++++--------------------
>>  1 file changed, 81 insertions(+), 62 deletions(-)
>>
>>diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>index 9bb673df777a..cfde302bf297 100644
>>--- a/fs/btrfs/volumes.c
>>+++ b/fs/btrfs/volumes.c
>>@@ -4818,6 +4818,29 @@ static void check_raid1c34_incompat_flag(struct btrfs_fs_info *info, u64 type)
>>  	btrfs_set_fs_incompat(info, RAID1C34);
>>  }
>>+/*
>>+ * Structure used internally for __btrfs_alloc_chunk() function.
>>+ * Wraps needed parameters.
>>+ */
>>+struct alloc_chunk_ctl {
>>+	u64 start;
>>+	u64 type;
>>+	int num_stripes;	/* total number of stripes to allocate */
>>+	int sub_stripes;	/* sub_stripes info for map */
>>+	int dev_stripes;	/* stripes per dev */
>>+	int devs_max;		/* max devs to use */
>>+	int devs_min;		/* min devs needed */
>>+	int devs_increment;	/* ndevs has to be a multiple of this */
>>+	int ncopies;		/* how many copies to data has */
>>+	int nparity;		/* number of stripes worth of bytes to
>>+				   store parity information */
>>+	u64 max_stripe_size;
>>+	u64 max_chunk_size;
>>+	u64 stripe_size;
>>+	u64 chunk_size;
>>+	int ndevs;
>>+};
>>+
>>  static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>>  			       u64 start, u64 type)
>>  {
>>@@ -4828,23 +4851,11 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>>  	struct extent_map_tree *em_tree;
>>  	struct extent_map *em;
>>  	struct btrfs_device_info *devices_info = NULL;
>>+	struct alloc_chunk_ctl ctl;
>>  	u64 total_avail;
>>-	int num_stripes;	/* total number of stripes to allocate */
>>  	int data_stripes;	/* number of stripes that count for
>>  				   block group size */
>>-	int sub_stripes;	/* sub_stripes info for map */
>>-	int dev_stripes;	/* stripes per dev */
>>-	int devs_max;		/* max devs to use */
>>-	int devs_min;		/* min devs needed */
>>-	int devs_increment;	/* ndevs has to be a multiple of this */
>>-	int ncopies;		/* how many copies to data has */
>>-	int nparity;		/* number of stripes worth of bytes to
>>-				   store parity information */
>>  	int ret;
>>-	u64 max_stripe_size;
>>-	u64 max_chunk_size;
>>-	u64 stripe_size;
>>-	u64 chunk_size;
>>  	int ndevs;
>>  	int i;
>>  	int j;
>>@@ -4858,32 +4869,36 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>>  		return -ENOSPC;
>>  	}
>>+	ctl.start = start;
>>+	ctl.type = type;
>>+
>>  	index = btrfs_bg_flags_to_raid_index(type);
>>-	sub_stripes = btrfs_raid_array[index].sub_stripes;
>>-	dev_stripes = btrfs_raid_array[index].dev_stripes;
>>-	devs_max = btrfs_raid_array[index].devs_max;
>>-	if (!devs_max)
>>-		devs_max = BTRFS_MAX_DEVS(info);
>>-	devs_min = btrfs_raid_array[index].devs_min;
>>-	devs_increment = btrfs_raid_array[index].devs_increment;
>>-	ncopies = btrfs_raid_array[index].ncopies;
>>-	nparity = btrfs_raid_array[index].nparity;
>>+	ctl.sub_stripes = btrfs_raid_array[index].sub_stripes;
>>+	ctl.dev_stripes = btrfs_raid_array[index].dev_stripes;
>>+	ctl.devs_max = btrfs_raid_array[index].devs_max;
>>+	if (!ctl.devs_max)
>>+		ctl.devs_max = BTRFS_MAX_DEVS(info);
>>+	ctl.devs_min = btrfs_raid_array[index].devs_min;
>>+	ctl.devs_increment = btrfs_raid_array[index].devs_increment;
>>+	ctl.ncopies = btrfs_raid_array[index].ncopies;
>>+	ctl.nparity = btrfs_raid_array[index].nparity;
>>  	if (type & BTRFS_BLOCK_GROUP_DATA) {
>>-		max_stripe_size = SZ_1G;
>>-		max_chunk_size = BTRFS_MAX_DATA_CHUNK_SIZE;
>>+		ctl.max_stripe_size = SZ_1G;
>>+		ctl.max_chunk_size = BTRFS_MAX_DATA_CHUNK_SIZE;
>>  	} else if (type & BTRFS_BLOCK_GROUP_METADATA) {
>>  		/* for larger filesystems, use larger metadata chunks */
>>  		if (fs_devices->total_rw_bytes > 50ULL * SZ_1G)
>>-			max_stripe_size = SZ_1G;
>>+			ctl.max_stripe_size = SZ_1G;
>>  		else
>>-			max_stripe_size = SZ_256M;
>>-		max_chunk_size = max_stripe_size;
>>+			ctl.max_stripe_size = SZ_256M;
>>+		ctl.max_chunk_size = ctl.max_stripe_size;
>>  	} else if (type & BTRFS_BLOCK_GROUP_SYSTEM) {
>>-		max_stripe_size = SZ_32M;
>>-		max_chunk_size = 2 * max_stripe_size;
>>-		devs_max = min_t(int, devs_max, BTRFS_MAX_DEVS_SYS_CHUNK);
>>+		ctl.max_stripe_size = SZ_32M;
>>+		ctl.max_chunk_size = 2 * ctl.max_stripe_size;
>>+		ctl.devs_max = min_t(int, ctl.devs_max,
>>+				      BTRFS_MAX_DEVS_SYS_CHUNK);
>>  	} else {
>>  		btrfs_err(info, "invalid chunk type 0x%llx requested",
>>  		       type);
>>@@ -4891,8 +4906,8 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>>  	}
>>  	/* We don't want a chunk larger than 10% of writable space */
>>-	max_chunk_size = min(div_factor(fs_devices->total_rw_bytes, 1),
>>-			     max_chunk_size);
>>+	ctl.max_chunk_size = min(div_factor(fs_devices->total_rw_bytes, 1),
>>+				  ctl.max_chunk_size);
>>  	devices_info = kcalloc(fs_devices->rw_devices, sizeof(*devices_info),
>>  			       GFP_NOFS);
>>@@ -4929,20 +4944,20 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>>  			continue;
>>  		ret = find_free_dev_extent(device,
>>-					   max_stripe_size * dev_stripes,
>>+				ctl.max_stripe_size * ctl.dev_stripes,
>>  					   &dev_offset, &max_avail);
>
>If you are going to adjust the indentation of arguments, you need to 
>adjust them all.
>

So, the below would be fine here, right?

		ret = find_free_dev_extent(
			device, ctl.max_stripe_size * ctl.dev_stripes,
			&dev_offset, &max_avail);


>>  		if (ret && ret != -ENOSPC)
>>  			goto error;
>>  		if (ret == 0)
>>-			max_avail = max_stripe_size * dev_stripes;
>>+			max_avail = ctl.max_stripe_size * ctl.dev_stripes;
>>-		if (max_avail < BTRFS_STRIPE_LEN * dev_stripes) {
>>+		if (max_avail < BTRFS_STRIPE_LEN * ctl.dev_stripes) {
>>  			if (btrfs_test_opt(info, ENOSPC_DEBUG))
>>  				btrfs_debug(info,
>>  			"%s: devid %llu has no free space, have=%llu want=%u",
>>  					    __func__, device->devid, max_avail,
>>-					    BTRFS_STRIPE_LEN * dev_stripes);
>>+				BTRFS_STRIPE_LEN * ctl.dev_stripes);
>
>Same here.

Actually, the line fit in just 80 characters, so I removed the indent.

>>  			continue;
>>  		}
>>@@ -4957,30 +4972,31 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>>  		devices_info[ndevs].dev = device;
>>  		++ndevs;
>>  	}
>>+	ctl.ndevs = ndevs;
>>  	/*
>>  	 * now sort the devices by hole size / available space
>>  	 */
>>-	sort(devices_info, ndevs, sizeof(struct btrfs_device_info),
>>+	sort(devices_info, ctl.ndevs, sizeof(struct btrfs_device_info),
>>  	     btrfs_cmp_device_info, NULL);
>>  	/*
>>  	 * Round down to number of usable stripes, devs_increment can be any
>>  	 * number so we can't use round_down()
>>  	 */
>>-	ndevs -= ndevs % devs_increment;
>>+	ctl.ndevs -= ctl.ndevs % ctl.devs_increment;
>>-	if (ndevs < devs_min) {
>>+	if (ctl.ndevs < ctl.devs_min) {
>>  		ret = -ENOSPC;
>>  		if (btrfs_test_opt(info, ENOSPC_DEBUG)) {
>>  			btrfs_debug(info,
>>  	"%s: not enough devices with free space: have=%d minimum required=%d",
>>-				    __func__, ndevs, devs_min);
>>+				    __func__, ctl.ndevs, ctl.devs_min);
>>  		}
>>  		goto error;
>>  	}
>>-	ndevs = min(ndevs, devs_max);
>>+	ctl.ndevs = min(ctl.ndevs, ctl.devs_max);
>>  	/*
>>  	 * The primary goal is to maximize the number of stripes, so use as
>>@@ -4989,14 +5005,15 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>>  	 * The DUP profile stores more than one stripe per device, the
>>  	 * max_avail is the total size so we have to adjust.
>>  	 */
>>-	stripe_size = div_u64(devices_info[ndevs - 1].max_avail, dev_stripes);
>>-	num_stripes = ndevs * dev_stripes;
>>+	ctl.stripe_size = div_u64(devices_info[ctl.ndevs - 1].max_avail,
>>+				   ctl.dev_stripes);
>>+	ctl.num_stripes = ctl.ndevs * ctl.dev_stripes;
>>  	/*
>>  	 * this will have to be fixed for RAID1 and RAID10 over
>>  	 * more drives
>>  	 */
>>-	data_stripes = (num_stripes - nparity) / ncopies;
>>+	data_stripes = (ctl.num_stripes - ctl.nparity) / ctl.ncopies;
>>  	/*
>>  	 * Use the number of data stripes to figure out how big this chunk
>>@@ -5004,44 +5021,44 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>>  	 * and compare that answer with the max chunk size. If it's higher,
>>  	 * we try to reduce stripe_size.
>>  	 */
>>-	if (stripe_size * data_stripes > max_chunk_size) {
>>+	if (ctl.stripe_size * data_stripes > ctl.max_chunk_size) {
>>  		/*
>>  		 * Reduce stripe_size, round it up to a 16MB boundary again and
>>  		 * then use it, unless it ends up being even bigger than the
>>  		 * previous value we had already.
>>  		 */
>>-		stripe_size = min(round_up(div_u64(max_chunk_size,
>>+		ctl.stripe_size = min(round_up(div_u64(ctl.max_chunk_size,
>>  						   data_stripes), SZ_16M),
>>-				  stripe_size);
>>+				       ctl.stripe_size);
>
>And here.  Thanks,

Changed to:

		ctl.stripe_size =
			min(round_up(div_u64(ctl.max_chunk_size, data_stripes),
				     SZ_16M),
			    ctl.stripe_size);

Thanks,

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 05/20] btrfs: factor out set_parameters()
  2020-02-06 16:40   ` Josef Bacik
@ 2020-02-07  7:59     ` Naohiro Aota
  0 siblings, 0 replies; 52+ messages in thread
From: Naohiro Aota @ 2020-02-07  7:59 UTC (permalink / raw)
  To: Josef Bacik
  Cc: linux-btrfs, David Sterba, Chris Mason, Nikolay Borisov,
	Damien Le Moal, Johannes Thumshirn, Hannes Reinecke, Anand Jain,
	linux-fsdevel

On Thu, Feb 06, 2020 at 11:40:37AM -0500, Josef Bacik wrote:
>On 2/6/20 5:41 AM, Naohiro Aota wrote:
>>Factor out set_parameters() from __btrfs_alloc_chunk(). This function
>>initialises the parameters of "struct alloc_chunk_ctl" for an allocation.
>>set_parameters() handles the common part of the initialisation, loading the
>>RAID parameters from btrfs_raid_array. set_parameters_regular() decides the
>>parameters specific to the regular allocation policy.
>>
>>Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
>>---
>>  fs/btrfs/volumes.c | 96 ++++++++++++++++++++++++++++------------------
>>  1 file changed, 59 insertions(+), 37 deletions(-)
>>
>>diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>index cfde302bf297..a5d6f0b5ca70 100644
>>--- a/fs/btrfs/volumes.c
>>+++ b/fs/btrfs/volumes.c
>>@@ -4841,6 +4841,60 @@ struct alloc_chunk_ctl {
>>  	int ndevs;
>>  };
>>+static void set_parameters_regular(struct btrfs_fs_devices *fs_devices,
>>+				   struct alloc_chunk_ctl *ctl)
>
>init_alloc_chunk_ctl_policy_regular()
>
>>+{
>>+	u64 type = ctl->type;
>>+
>>+	if (type & BTRFS_BLOCK_GROUP_DATA) {
>>+		ctl->max_stripe_size = SZ_1G;
>>+		ctl->max_chunk_size = BTRFS_MAX_DATA_CHUNK_SIZE;
>>+	} else if (type & BTRFS_BLOCK_GROUP_METADATA) {
>>+		/* for larger filesystems, use larger metadata chunks */
>>+		if (fs_devices->total_rw_bytes > 50ULL * SZ_1G)
>>+			ctl->max_stripe_size = SZ_1G;
>>+		else
>>+			ctl->max_stripe_size = SZ_256M;
>>+		ctl->max_chunk_size = ctl->max_stripe_size;
>>+	} else if (type & BTRFS_BLOCK_GROUP_SYSTEM) {
>>+		ctl->max_stripe_size = SZ_32M;
>>+		ctl->max_chunk_size = 2 * ctl->max_stripe_size;
>>+		ctl->devs_max = min_t(int, ctl->devs_max,
>>+				      BTRFS_MAX_DEVS_SYS_CHUNK);
>>+	} else {
>>+		BUG();
>>+	}
>>+
>>+	/* We don't want a chunk larger than 10% of writable space */
>>+	ctl->max_chunk_size = min(div_factor(fs_devices->total_rw_bytes, 1),
>>+				  ctl->max_chunk_size);
>>+}
>>+
>>+static void set_parameters(struct btrfs_fs_devices *fs_devices,
>>+			   struct alloc_chunk_ctl *ctl)
>
>init_alloc_chunk_ctl().  These function names need to be more descriptive.  Thanks,

I see. I renamed these two.
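
i.e. (sketch of the new names, signatures unchanged):

	static void init_alloc_chunk_ctl_policy_regular(
				struct btrfs_fs_devices *fs_devices,
				struct alloc_chunk_ctl *ctl);

	static void init_alloc_chunk_ctl(struct btrfs_fs_devices *fs_devices,
					 struct alloc_chunk_ctl *ctl);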

Thanks,

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 00/20] btrfs: refactor and generalize chunk/dev_extent/extent allocation
  2020-02-07  6:06   ` Naohiro Aota
@ 2020-02-07  8:02     ` Martin Steigerwald
  0 siblings, 0 replies; 52+ messages in thread
From: Martin Steigerwald @ 2020-02-07  8:02 UTC (permalink / raw)
  To: Naohiro Aota
  Cc: linux-btrfs, David Sterba, Chris Mason, Josef Bacik,
	Nikolay Borisov, Damien Le Moal, Johannes Thumshirn,
	Hannes Reinecke, Anand Jain, linux-fsdevel, Zhi Yong Wu

Naohiro Aota - 07.02.20, 07:06:00 CET:
> On Thu, Feb 06, 2020 at 12:43:30PM +0100, Martin Steigerwald wrote:
> >Hi Naohiro.
> >
> >Naohiro Aota - 06.02.20, 11:41:54 CET:
> >> This series refactors chunk allocation, device_extent allocation
> >> and
> >> extent allocation functions and make them generalized to be able to
> >> implement other allocation policy easily.
> >> 
> >> On top of this series, we can simplify some part of the "btrfs:
> >> zoned
> >> block device support" series as adding a new type of chunk
> >> allocator
> >> and extent allocator for zoned block devices. Furthermore, we will
> >> be
> >> able to implement and test some other allocator in the idea page of
> >> the wiki e.g. SSD caching, dedicated metadata drive, chunk
> >> allocation
> >> groups, and so on.
> >
> >Regarding SSD caching, are you aware that there has been previous
> >work with even involved handling part of it in the Virtual
> >Filesystem Switch (VFS)?
> >
> >VFS hot-data tracking, LWN article:
> >
> >https://lwn.net/Articles/525651/
> >
> >Patchset, not sure whether it is the most recent one:
> >
> >[PATCH v2 00/12] VFS hot tracking
> >
> >https://lore.kernel.org/linux-btrfs/1368493184-5939-1-git-send-email-zwu.kernel@gmail.com/
> Yes, I once saw the patchset. Not sure about the detail, though.
> 
> >So for SSD caching you may be able to re-use or pick up some of this
> >work, unless it would be unsuitable to be used with this new
> >approach.
> Currently, I have no plan to implement the SSD caching feature by
> myself. I think some patches of the series like this [1] can be
> reworked on my series as adding an "SSD caching chunk allocator." So,
> it's welcome to hear suggestions about the hook interface.

Thank you. Adding Zhi Yong Wu to Cc.

I don't know details about this patch set either.

@Zhi Yong Wu: Are you interested interested in rebasing your SSD caching 
patch above this refactoring work?

Ciao,
Martin

> [1]
> https://lore.kernel.org/linux-btrfs/1371817260-8615-3-git-send-email-zwu.kernel@gmail.com/
> >Thanks,


-- 
Martin



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 09/20] btrfs: parameterize dev_extent_min
  2020-02-06 16:52   ` Josef Bacik
@ 2020-02-07  9:00     ` Naohiro Aota
  0 siblings, 0 replies; 52+ messages in thread
From: Naohiro Aota @ 2020-02-07  9:00 UTC (permalink / raw)
  To: Josef Bacik
  Cc: linux-btrfs, David Sterba, Chris Mason, Nikolay Borisov,
	Damien Le Moal, Johannes Thumshirn, Hannes Reinecke, Anand Jain,
	linux-fsdevel

On Thu, Feb 06, 2020 at 11:52:19AM -0500, Josef Bacik wrote:
>On 2/6/20 5:42 AM, Naohiro Aota wrote:
>>Currently, we ignore a device whose available space is less than
>>"BTRFS_STRIPE_LEN * dev_stripes". This is a lower limit for the current
>>allocation policy (to maximize the number of stripes). This commit
>>parameterizes dev_extent_min, so that other policies can set their own
>>lower limit to ignore a device with insufficient space.
>>
>>Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
>>---
>>  fs/btrfs/volumes.c | 9 +++++----
>>  1 file changed, 5 insertions(+), 4 deletions(-)
>>
>>diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>index 15837374db9c..4a6cc098ee3e 100644
>>--- a/fs/btrfs/volumes.c
>>+++ b/fs/btrfs/volumes.c
>>@@ -4836,6 +4836,7 @@ struct alloc_chunk_ctl {
>>  				   store parity information */
>>  	u64 max_stripe_size;
>>  	u64 max_chunk_size;
>>+	u64 dev_extent_min;
>>  	u64 stripe_size;
>>  	u64 chunk_size;
>>  	int ndevs;
>>@@ -4868,6 +4869,7 @@ static void set_parameters_regular(struct btrfs_fs_devices *fs_devices,
>>  	/* We don't want a chunk larger than 10% of writable space */
>>  	ctl->max_chunk_size = min(div_factor(fs_devices->total_rw_bytes, 1),
>>  				  ctl->max_chunk_size);
>>+	ctl->dev_extent_min = BTRFS_STRIPE_LEN * ctl->dev_stripes;
>>  }
>>  static void set_parameters(struct btrfs_fs_devices *fs_devices,
>>@@ -4903,7 +4905,6 @@ static int gather_device_info(struct btrfs_fs_devices *fs_devices,
>>  	struct btrfs_device *device;
>>  	u64 total_avail;
>>  	u64 dev_extent_want = ctl->max_stripe_size * ctl->dev_stripes;
>>-	u64 dev_extent_min = BTRFS_STRIPE_LEN * ctl->dev_stripes;
>>  	int ret;
>>  	int ndevs = 0;
>>  	u64 max_avail;
>>@@ -4931,7 +4932,7 @@ static int gather_device_info(struct btrfs_fs_devices *fs_devices,
>>  			total_avail = 0;
>>  		/* If there is no space on this device, skip it. */
>>-		if (total_avail == 0)
>>+		if (total_avail < ctl->dev_extent_min)
>
>This isn't correct, dev_extent_min is the total size with all stripes 
>added up, not the size of a single stripe.  Thanks,

Hm, I can revert this, but isn't it useless to search a device
whose available bytes are less than dev_extent_min (= BTRFS_STRIPE_LEN
* ctl->dev_stripes)? Since max_avail can only be less than
dev_extent_min there, we skip such a device anyway with the "if
(max_avail < dev_extent_min) { ... }" part below.
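
For reference, this is the check I mean, later in gather_device_info()
after this patch (sketch):

	if (max_avail < ctl->dev_extent_min)
		continue;	/* device too small for this policy */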

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 08/20] btrfs: factor out create_chunk()
  2020-02-06 16:49   ` Josef Bacik
@ 2020-02-07  9:17     ` Naohiro Aota
  0 siblings, 0 replies; 52+ messages in thread
From: Naohiro Aota @ 2020-02-07  9:17 UTC (permalink / raw)
  To: Josef Bacik
  Cc: linux-btrfs, David Sterba, Chris Mason, Nikolay Borisov,
	Damien Le Moal, Johannes Thumshirn, Hannes Reinecke, Anand Jain,
	linux-fsdevel

On Thu, Feb 06, 2020 at 11:49:58AM -0500, Josef Bacik wrote:
>On 2/6/20 5:42 AM, Naohiro Aota wrote:
>>Factor out create_chunk() from __btrfs_alloc_chunk(). This function finally
>>creates a chunk. There are no functional changes.
>>
>>Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
>
><snip>
>>+
>>+	ctl.start = start;
>>+	ctl.type = type;
>>+	set_parameters(fs_devices, &ctl);
>>+
>>+	devices_info = kcalloc(fs_devices->rw_devices, sizeof(*devices_info),
>>+			       GFP_NOFS);
>>+	if (!devices_info)
>>+		return -ENOMEM;
>>+
>>+	ret = gather_device_info(fs_devices, &ctl, devices_info);
>>+	if (ret < 0)
>>+		goto error;
>>+
>>+	ret = decide_stripe_size(fs_devices, &ctl, devices_info);
>>+	if (ret < 0)
>>+		goto error;
>>+
>>+	ret = create_chunk(trans, &ctl, devices_info);
>>+	if (ret < 0)
>>+		goto error;
>>+
>
>This can just be
>
>out:
>	kfree(devices_info);
>	return ret;
>
>and all the above people can just do goto out on error and we can drop 
>the error: part below.  Thanks,
>
>Josef

Great. I followed that style.
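
The tail of __btrfs_alloc_chunk() now reads (sketch):

	ret = gather_device_info(fs_devices, &ctl, devices_info);
	if (ret < 0)
		goto out;

	ret = decide_stripe_size(fs_devices, &ctl, devices_info);
	if (ret < 0)
		goto out;

	ret = create_chunk(trans, &ctl, devices_info);
out:
	kfree(devices_info);
	return ret;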

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 12/20] btrfs: introduce clustered_alloc_info
  2020-02-06 12:44   ` Su Yue
@ 2020-02-07  9:25     ` Naohiro Aota
  0 siblings, 0 replies; 52+ messages in thread
From: Naohiro Aota @ 2020-02-07  9:25 UTC (permalink / raw)
  To: Su Yue
  Cc: linux-btrfs, David Sterba, Chris Mason, Josef Bacik,
	Nikolay Borisov, Damien Le Moal, Johannes Thumshirn,
	Hannes Reinecke, Anand Jain, linux-fsdevel

On Thu, Feb 06, 2020 at 08:44:51PM +0800, Su Yue wrote:
>On 2020/2/6 6:42 PM, Naohiro Aota wrote:
>>Introduce struct clustered_alloc_info to manage parameters related to
>>clustered allocation. By separating clustered_alloc_info and
>>find_free_extent_ctl, we can introduce other allocation policies. One can
>>access per-allocation-policy private information from "alloc_info" of
>>struct find_free_extent_ctl.
>>
>>Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
>>---
>>  fs/btrfs/extent-tree.c | 99 +++++++++++++++++++++++++-----------------
>>  1 file changed, 59 insertions(+), 40 deletions(-)
>>
>>diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>index b1f52eee24fe..8124a6461043 100644
>>--- a/fs/btrfs/extent-tree.c
>>+++ b/fs/btrfs/extent-tree.c
>>@@ -3456,9 +3456,6 @@ struct find_free_extent_ctl {
>>  	/* Where to start the search inside the bg */
>>  	u64 search_start;
>>
>>-	/* For clustered allocation */
>>-	u64 empty_cluster;
>>-
>>  	bool have_caching_bg;
>>  	bool orig_have_caching_bg;
>>
>>@@ -3470,18 +3467,6 @@ struct find_free_extent_ctl {
>>  	 */
>>  	int loop;
>>
>>-	/*
>>-	 * Whether we're refilling a cluster, if true we need to re-search
>>-	 * current block group but don't try to refill the cluster again.
>>-	 */
>>-	bool retry_clustered;
>>-
>>-	/*
>>-	 * Whether we're updating free space cache, if true we need to re-search
>>-	 * current block group but don't try updating free space cache again.
>>-	 */
>>-	bool retry_unclustered;
>>-
>>  	/* If current block group is cached */
>>  	int cached;
>>
>>@@ -3499,8 +3484,28 @@ struct find_free_extent_ctl {
>>
>>  	/* Allocation policy */
>>  	enum btrfs_extent_allocation_policy policy;
>>+	void *alloc_info;
>>  };
>>
>>+struct clustered_alloc_info {
>>+	/* For clustered allocation */
>>+	u64 empty_cluster;
>>+
>>+	/*
>>+	 * Whether we're refilling a cluster, if true we need to re-search
>>+	 * current block group but don't try to refill the cluster again.
>>+	 */
>>+	bool retry_clustered;
>>+
>>+	/*
>>+	 * Whether we're updating free space cache, if true we need to re-search
>>+	 * current block group but don't try updating free space cache again.
>>+	 */
>>+	bool retry_unclustered;
>>+
>>+	struct btrfs_free_cluster *last_ptr;
>>+	bool use_cluster;
>>+};
>>
>>  /*
>>   * Helper function for find_free_extent().
>>@@ -3516,6 +3521,7 @@ static int find_free_extent_clustered(struct btrfs_block_group *bg,
>>  		struct btrfs_block_group **cluster_bg_ret)
>>  {
>>  	struct btrfs_block_group *cluster_bg;
>>+	struct clustered_alloc_info *clustered = ffe_ctl->alloc_info;
>>  	u64 aligned_cluster;
>>  	u64 offset;
>>  	int ret;
>>@@ -3572,7 +3578,7 @@ static int find_free_extent_clustered(struct btrfs_block_group *bg,
>>  	}
>>
>>  	aligned_cluster = max_t(u64,
>>-			ffe_ctl->empty_cluster + ffe_ctl->empty_size,
>>+			clustered->empty_cluster + ffe_ctl->empty_size,
>>  			bg->full_stripe_len);
>>  	ret = btrfs_find_space_cluster(bg, last_ptr, ffe_ctl->search_start,
>>  			ffe_ctl->num_bytes, aligned_cluster);
>>@@ -3591,12 +3597,12 @@ static int find_free_extent_clustered(struct btrfs_block_group *bg,
>>  			return 0;
>>  		}
>>  	} else if (!ffe_ctl->cached && ffe_ctl->loop > LOOP_CACHING_NOWAIT &&
>>-		   !ffe_ctl->retry_clustered) {
>>+		   !clustered->retry_clustered) {
>>  		spin_unlock(&last_ptr->refill_lock);
>>
>>-		ffe_ctl->retry_clustered = true;
>>+		clustered->retry_clustered = true;
>>  		btrfs_wait_block_group_cache_progress(bg, ffe_ctl->num_bytes +
>>-				ffe_ctl->empty_cluster + ffe_ctl->empty_size);
>>+				clustered->empty_cluster + ffe_ctl->empty_size);
>>  		return -EAGAIN;
>>  	}
>>  	/*
>>@@ -3618,6 +3624,7 @@ static int find_free_extent_unclustered(struct btrfs_block_group *bg,
>>  		struct btrfs_free_cluster *last_ptr,
>>  		struct find_free_extent_ctl *ffe_ctl)
>>  {
>>+	struct clustered_alloc_info *clustered = ffe_ctl->alloc_info;
>>  	u64 offset;
>>
>>  	/*
>>@@ -3636,7 +3643,7 @@ static int find_free_extent_unclustered(struct btrfs_block_group *bg,
>>  		free_space_ctl = bg->free_space_ctl;
>>  		spin_lock(&free_space_ctl->tree_lock);
>>  		if (free_space_ctl->free_space <
>>-		    ffe_ctl->num_bytes + ffe_ctl->empty_cluster +
>>+		    ffe_ctl->num_bytes + clustered->empty_cluster +
>>  		    ffe_ctl->empty_size) {
>>  			ffe_ctl->total_free_space = max_t(u64,
>>  					ffe_ctl->total_free_space,
>>@@ -3660,11 +3667,11 @@ static int find_free_extent_unclustered(struct btrfs_block_group *bg,
>>  	 * If @retry_unclustered is true then we've already waited on this
>>  	 * block group once and should move on to the next block group.
>>  	 */
>>-	if (!offset && !ffe_ctl->retry_unclustered && !ffe_ctl->cached &&
>>+	if (!offset && !clustered->retry_unclustered && !ffe_ctl->cached &&
>>  	    ffe_ctl->loop > LOOP_CACHING_NOWAIT) {
>>  		btrfs_wait_block_group_cache_progress(bg, ffe_ctl->num_bytes +
>>  						      ffe_ctl->empty_size);
>>-		ffe_ctl->retry_unclustered = true;
>>+		clustered->retry_unclustered = true;
>>  		return -EAGAIN;
>>  	} else if (!offset) {
>>  		return 1;
>>@@ -3685,6 +3692,7 @@ static int find_free_extent_update_loop(struct btrfs_fs_info *fs_info,
>>  					bool full_search, bool use_cluster)
>>  {
>>  	struct btrfs_root *root = fs_info->extent_root;
>>+	struct clustered_alloc_info *clustered = ffe_ctl->alloc_info;
>>  	int ret;
>>
>>  	if ((ffe_ctl->loop == LOOP_CACHING_NOWAIT) &&
>>@@ -3774,10 +3782,10 @@ static int find_free_extent_update_loop(struct btrfs_fs_info *fs_info,
>>  			 * no empty_cluster.
>>  			 */
>>  			if (ffe_ctl->empty_size == 0 &&
>>-			    ffe_ctl->empty_cluster == 0)
>>+			    clustered->empty_cluster == 0)
>>  				return -ENOSPC;
>>  			ffe_ctl->empty_size = 0;
>>-			ffe_ctl->empty_cluster = 0;
>>+			clustered->empty_cluster = 0;
>>  		}
>>  		return 1;
>>  	}
>>@@ -3816,11 +3824,10 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
>>  {
>>  	int ret = 0;
>>  	int cache_block_group_error = 0;
>>-	struct btrfs_free_cluster *last_ptr = NULL;
>>  	struct btrfs_block_group *block_group = NULL;
>>  	struct find_free_extent_ctl ffe_ctl = {0};
>>  	struct btrfs_space_info *space_info;
>>-	bool use_cluster = true;
>>+	struct clustered_alloc_info *clustered = NULL;
>>  	bool full_search = false;
>>
>>  	WARN_ON(num_bytes < fs_info->sectorsize);
>>@@ -3829,8 +3836,6 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
>>  	ffe_ctl.empty_size = empty_size;
>>  	ffe_ctl.flags = flags;
>>  	ffe_ctl.search_start = 0;
>>-	ffe_ctl.retry_clustered = false;
>>-	ffe_ctl.retry_unclustered = false;
>>  	ffe_ctl.delalloc = delalloc;
>>  	ffe_ctl.index = btrfs_bg_flags_to_raid_index(flags);
>>  	ffe_ctl.have_caching_bg = false;
>>@@ -3851,6 +3856,15 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
>>  		return -ENOSPC;
>>  	}
>>
>>+	clustered = kzalloc(sizeof(*clustered), GFP_NOFS);
>>+	if (!clustered)
>>+		return -ENOMEM;
>
>Coding-style nit: please move the kzalloc() to after the whole
>assignment block.

Hm, this part will eventually be folded into
prepare_allocation_clustered(), together with the part below (doing
fetch_cluster_info() etc.), in patch 20. So I'm not sure I really need
to move it within this patch...
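
For reference, this is roughly the shape it takes after that fold (a
sketch with a simplified signature, not the actual patch 20; note that
kzalloc() already zeroes the retry flags and last_ptr):

	static int prepare_allocation_clustered(struct btrfs_fs_info *fs_info,
						struct find_free_extent_ctl *ffe_ctl,
						struct btrfs_space_info *space_info)
	{
		struct clustered_alloc_info *clustered;

		/* Allocation and initialization end up next to each other. */
		clustered = kzalloc(sizeof(*clustered), GFP_NOFS);
		if (!clustered)
			return -ENOMEM;

		clustered->use_cluster = true;
		clustered->last_ptr = fetch_cluster_info(fs_info, space_info,
							 &clustered->empty_cluster);
		ffe_ctl->alloc_info = clustered;

		return 0;
	}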

>>+	clustered->last_ptr = NULL;
>>+	clustered->use_cluster = true;
>>+	clustered->retry_clustered = false;
>>+	clustered->retry_unclustered = false;
>>+	ffe_ctl.alloc_info = clustered;
>>+
>>  	/*
>>  	 * If our free space is heavily fragmented we may not be able to make
>>  	 * big contiguous allocations, so instead of doing the expensive search
>>@@ -3869,14 +3883,16 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
>>  			spin_unlock(&space_info->lock);
>>  			return -ENOSPC;
>>  		} else if (space_info->max_extent_size) {
>>-			use_cluster = false;
>>+			clustered->use_cluster = false;
>>  		}
>>  		spin_unlock(&space_info->lock);
>>  	}
>>
>>-	last_ptr = fetch_cluster_info(fs_info, space_info,
>>-				      &ffe_ctl.empty_cluster);
>>-	if (last_ptr) {
>>+	clustered->last_ptr = fetch_cluster_info(fs_info, space_info,
>>+						 &clustered->empty_cluster);
>>+	if (clustered->last_ptr) {
>>+		struct btrfs_free_cluster *last_ptr = clustered->last_ptr;
>>+
>>  		spin_lock(&last_ptr->lock);
>>  		if (last_ptr->block_group)
>>  			ffe_ctl.hint_byte = last_ptr->window_start;
>>@@ -3887,7 +3903,7 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
>>  			 * some time.
>>  			 */
>>  			ffe_ctl.hint_byte = last_ptr->window_start;
>>-			use_cluster = false;
>>+			clustered->use_cluster = false;
>>  		}
>>  		spin_unlock(&last_ptr->lock);
>>  	}
>>@@ -4000,10 +4016,11 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
>>  		 * Ok we want to try and use the cluster allocator, so
>>  		 * lets look there
>>  		 */
>>-		if (last_ptr && use_cluster) {
>>+		if (clustered->last_ptr && clustered->use_cluster) {
>>  			struct btrfs_block_group *cluster_bg = NULL;
>>
>>-			ret = find_free_extent_clustered(block_group, last_ptr,
>>+			ret = find_free_extent_clustered(block_group,
>>+							 clustered->last_ptr,
>>  							 &ffe_ctl, &cluster_bg);
>>
>>  			if (ret == 0) {
>>@@ -4021,7 +4038,8 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
>>  			/* ret == -ENOENT case falls through */
>>  		}
>>
>>-		ret = find_free_extent_unclustered(block_group, last_ptr,
>>+		ret = find_free_extent_unclustered(block_group,
>>+						   clustered->last_ptr,
>>  						   &ffe_ctl);
>>  		if (ret == -EAGAIN)
>>  			goto have_block_group;
>>@@ -4062,8 +4080,8 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
>>  		btrfs_release_block_group(block_group, delalloc);
>>  		break;
>>  loop:
>>-		ffe_ctl.retry_clustered = false;
>>-		ffe_ctl.retry_unclustered = false;
>>+		clustered->retry_clustered = false;
>>+		clustered->retry_unclustered = false;
>>  		BUG_ON(btrfs_bg_flags_to_raid_index(block_group->flags) !=
>>  		       ffe_ctl.index);
>>  		btrfs_release_block_group(block_group, delalloc);
>>@@ -4071,8 +4089,9 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
>>  	}
>>  	up_read(&space_info->groups_sem);
>>
>>-	ret = find_free_extent_update_loop(fs_info, last_ptr, ins, &ffe_ctl,
>>-					   full_search, use_cluster);
>>+	ret = find_free_extent_update_loop(fs_info, clustered->last_ptr, ins,
>>+					   &ffe_ctl, full_search,
>>+					   clustered->use_cluster);
>>  	if (ret > 0)
>>  		goto search;
>>
>>
>


* Re: [PATCH 12/20] btrfs: introduce clustered_alloc_info
  2020-02-06 17:01   ` Josef Bacik
@ 2020-02-07  9:53     ` Naohiro Aota
  0 siblings, 0 replies; 52+ messages in thread
From: Naohiro Aota @ 2020-02-07  9:53 UTC (permalink / raw)
  To: Josef Bacik
  Cc: linux-btrfs, David Sterba, Chris Mason, Nikolay Borisov,
	Damien Le Moal, Johannes Thumshirn, Hannes Reinecke, Anand Jain,
	linux-fsdevel

On Thu, Feb 06, 2020 at 12:01:28PM -0500, Josef Bacik wrote:
>On 2/6/20 5:42 AM, Naohiro Aota wrote:
>>Introduce struct clustered_alloc_info to manage parameters related to
>>clustered allocation. By separating clustered_alloc_info from
>>find_free_extent_ctl, we can introduce other allocation policies. One
>>can access per-allocation-policy private information through
>>"alloc_info" of struct find_free_extent_ctl.
>>
>>Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
>>---
>>  fs/btrfs/extent-tree.c | 99 +++++++++++++++++++++++++-----------------
>>  1 file changed, 59 insertions(+), 40 deletions(-)
>>
>>diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>index b1f52eee24fe..8124a6461043 100644
>>--- a/fs/btrfs/extent-tree.c
>>+++ b/fs/btrfs/extent-tree.c
>>@@ -3456,9 +3456,6 @@ struct find_free_extent_ctl {
>>  	/* Where to start the search inside the bg */
>>  	u64 search_start;
>>-	/* For clustered allocation */
>>-	u64 empty_cluster;
>>-
>>  	bool have_caching_bg;
>>  	bool orig_have_caching_bg;
>>@@ -3470,18 +3467,6 @@ struct find_free_extent_ctl {
>>  	 */
>>  	int loop;
>>-	/*
>>-	 * Whether we're refilling a cluster, if true we need to re-search
>>-	 * current block group but don't try to refill the cluster again.
>>-	 */
>>-	bool retry_clustered;
>>-
>>-	/*
>>-	 * Whether we're updating free space cache, if true we need to re-search
>>-	 * current block group but don't try updating free space cache again.
>>-	 */
>>-	bool retry_unclustered;
>>-
>>  	/* If current block group is cached */
>>  	int cached;
>>@@ -3499,8 +3484,28 @@ struct find_free_extent_ctl {
>>  	/* Allocation policy */
>>  	enum btrfs_extent_allocation_policy policy;
>>+	void *alloc_info;
>>  };
>>+struct clustered_alloc_info {
>>+	/* For clustered allocation */
>>+	u64 empty_cluster;
>>+
>>+	/*
>>+	 * Whether we're refilling a cluster, if true we need to re-search
>>+	 * current block group but don't try to refill the cluster again.
>>+	 */
>>+	bool retry_clustered;
>>+
>>+	/*
>>+	 * Whether we're updating free space cache, if true we need to re-search
>>+	 * current block group but don't try updating free space cache again.
>>+	 */
>>+	bool retry_unclustered;
>>+
>>+	struct btrfs_free_cluster *last_ptr;
>>+	bool use_cluster;
>This isn't the right place for this; rather, I'd put it in
>find_free_extent_ctl, if you want it at all.
>
>And in fact I question the whole need for this in the first place. I
>assume your goal is to just disable clustered allocation for shingled
>drives, so why don't you just handle that with your extent allocation
>policy flag? If it's set to shingled, then use_cluster = false and you
>are good to go; no need to add all this complication of the cluster
>ctl.

Not really. The clustered allocator (= the current allocator) first
tries allocation using a cluster by calling find_free_extent_clustered()
and, if that fails, tries allocation without a cluster by calling
find_free_extent_unclustered(). This "use_cluster" flag controls
whether the first find_free_extent_clustered() call is made at all.

When the policy is not BTRFS_EXTENT_ALLOC_CLUSTERED, both of these
functions are skipped (actually, another function for that policy is
called). So the policy selects the higher-level function calls, while
"use_cluster" controls the lower-level call (only within the
BTRFS_EXTENT_ALLOC_CLUSTERED policy).
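
To make the two levels concrete, a minimal sketch of the dispatch (the
helper name do_allocation_sketch() and the -EOPNOTSUPP fallback are
illustration only; the series routes this through do_allocation() from
patch 13):

	static int do_allocation_sketch(struct btrfs_block_group *bg,
					struct find_free_extent_ctl *ffe_ctl,
					struct btrfs_block_group **bg_ret)
	{
		struct clustered_alloc_info *clustered = ffe_ctl->alloc_info;
		int ret;

		switch (ffe_ctl->policy) {
		case BTRFS_EXTENT_ALLOC_CLUSTERED:
			/* Lower level: use_cluster gates only the cluster path. */
			if (clustered->last_ptr && clustered->use_cluster) {
				ret = find_free_extent_clustered(bg,
						clustered->last_ptr,
						ffe_ctl, bg_ret);
				/* Only -ENOENT falls through to unclustered. */
				if (ret != -ENOENT)
					return ret;
			}
			return find_free_extent_unclustered(bg,
					clustered->last_ptr, ffe_ctl);
		default:
			/* Higher level: another policy calls its own function. */
			return -EOPNOTSUPP;
		}
	}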

>If you are looking to save space in the ctl, then I would just union{}
>the cluster stuff inside find_free_extent_ctl so the right flags are
>used for the correct allocation policy.
>
>This whole last set of 10 patches needs to be reworked.  Thanks,

I'm fine with keeping these variables in find_free_extent_ctl. We can
delay this separation into clustered_alloc_info until we really need
it, e.g. when some other policy wants to use many per-policy variables.
(And, actually, the zoned allocator does not use any per-policy
variables.)
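
For reference, the union{} variant you suggest would look roughly like
this (a sketch; only the clustered members from this patch are shown):

	struct find_free_extent_ctl {
		/* ... existing members elided ... */
		enum btrfs_extent_allocation_policy policy;
		union {
			/* for BTRFS_EXTENT_ALLOC_CLUSTERED */
			struct {
				u64 empty_cluster;
				bool retry_clustered;
				bool retry_unclustered;
				struct btrfs_free_cluster *last_ptr;
				bool use_cluster;
			} clustered;
			/* other policies would add their members here */
		};
	};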

Thanks,
Naohiro


* Re: [PATCH 06/20] btrfs: factor out gather_device_info()
  2020-02-06 15:43   ` Johannes Thumshirn
@ 2020-02-07  9:54     ` Naohiro Aota
  0 siblings, 0 replies; 52+ messages in thread
From: Naohiro Aota @ 2020-02-07  9:54 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: linux-btrfs, David Sterba, Chris Mason, Josef Bacik,
	Nikolay Borisov, Damien Le Moal, Hannes Reinecke, Anand Jain,
	linux-fsdevel

On Thu, Feb 06, 2020 at 03:43:10PM +0000, Johannes Thumshirn wrote:
>On 06/02/2020 11:44, Naohiro Aota wrote:
>> +	BUG_ON(!alloc_profile_is_valid(type, 0));
>
>I know this was present in the old code as well, but can we turn this
>into an ASSERT() + error return?
>

OK, I'll add this as a separate patch.
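
Something like the following (a sketch of the conversion only; the
-EINVAL value is my assumption, not taken from the series):

	if (!alloc_profile_is_valid(type, 0)) {
		ASSERT(0);	/* replaces the old BUG_ON() */
		return -EINVAL;
	}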

Thanks,
Naohiro


Thread overview: 52+ messages
2020-02-06 10:41 [PATCH 00/20] btrfs: refactor and generalize chunk/dev_extent/extent allocation Naohiro Aota
2020-02-06 10:41 ` [PATCH 01/20] btrfs: change type of full_search to bool Naohiro Aota
2020-02-06 11:26   ` Johannes Thumshirn
2020-02-06 16:03   ` Josef Bacik
2020-02-06 10:41 ` [PATCH 02/20] btrfs: introduce chunk allocation policy Naohiro Aota
2020-02-06 11:30   ` Johannes Thumshirn
2020-02-07  6:11     ` Naohiro Aota
2020-02-06 16:06   ` Josef Bacik
2020-02-06 16:07   ` Josef Bacik
2020-02-06 10:41 ` [PATCH 03/20] btrfs: refactor find_free_dev_extent_start() Naohiro Aota
2020-02-06 12:02   ` Johannes Thumshirn
2020-02-07  6:22     ` Naohiro Aota
2020-02-06 16:34   ` Josef Bacik
2020-02-06 10:41 ` [PATCH 04/20] btrfs: introduce alloc_chunk_ctl Naohiro Aota
2020-02-06 12:07   ` Johannes Thumshirn
2020-02-06 16:38   ` Josef Bacik
2020-02-07  7:08     ` Naohiro Aota
2020-02-06 10:41 ` [PATCH 05/20] btrfs: factor out set_parameters() Naohiro Aota
2020-02-06 13:51   ` Johannes Thumshirn
2020-02-06 16:40   ` Josef Bacik
2020-02-07  7:59     ` Naohiro Aota
2020-02-06 10:42 ` [PATCH 06/20] btrfs: factor out gather_device_info() Naohiro Aota
2020-02-06 15:43   ` Johannes Thumshirn
2020-02-07  9:54     ` Naohiro Aota
2020-02-06 16:44   ` Josef Bacik
2020-02-06 10:42 ` [PATCH 07/20] btrfs: factor out decide_stripe_size() Naohiro Aota
2020-02-06 15:59   ` Johannes Thumshirn
2020-02-06 16:47   ` Josef Bacik
2020-02-06 10:42 ` [PATCH 08/20] btrfs: factor out create_chunk() Naohiro Aota
2020-02-06 16:49   ` Josef Bacik
2020-02-07  9:17     ` Naohiro Aota
2020-02-06 10:42 ` [PATCH 09/20] btrfs: parameterize dev_extent_min Naohiro Aota
2020-02-06 16:52   ` Josef Bacik
2020-02-07  9:00     ` Naohiro Aota
2020-02-06 10:42 ` [PATCH 10/20] btrfs: introduce extent allocation policy Naohiro Aota
2020-02-06 10:42 ` [PATCH 11/20] btrfs: move hint_byte into find_free_extent_ctl Naohiro Aota
2020-02-06 10:42 ` [PATCH 12/20] btrfs: introduce clustered_alloc_info Naohiro Aota
2020-02-06 12:44   ` Su Yue
2020-02-07  9:25     ` Naohiro Aota
2020-02-06 17:01   ` Josef Bacik
2020-02-07  9:53     ` Naohiro Aota
2020-02-06 10:42 ` [PATCH 13/20] btrfs: factor out do_allocation() Naohiro Aota
2020-02-06 10:42 ` [PATCH 14/20] btrfs: drop unnecessary arguments from clustered allocation functions Naohiro Aota
2020-02-06 10:42 ` [PATCH 15/20] btrfs: factor out release_block_group() Naohiro Aota
2020-02-06 10:42 ` [PATCH 16/20] btrfs: factor out found_extent() Naohiro Aota
2020-02-06 10:42 ` [PATCH 17/20] btrfs: drop unnecessary arguments from find_free_extent_update_loop() Naohiro Aota
2020-02-06 10:42 ` [PATCH 18/20] btrfs: factor out chunk_allocation_failed() Naohiro Aota
2020-02-06 10:42 ` [PATCH 19/20] btrfs: skip LOOP_NO_EMPTY_SIZE if not clustered allocation Naohiro Aota
2020-02-06 10:42 ` [PATCH 20/20] btrfs: factor out prepare_allocation() Naohiro Aota
2020-02-06 11:43 ` [PATCH 00/20] btrfs: refactor and generalize chunk/dev_extent/extent allocation Martin Steigerwald
2020-02-07  6:06   ` Naohiro Aota
2020-02-07  8:02     ` Martin Steigerwald
