Linux-BTRFS Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH 0/2] btrfs: Speedup chunk allocation for large fs
@ 2019-01-30  7:39 Qu Wenruo
  2019-01-30  7:39 ` [PATCH 1/2] btrfs: Don't search devid for every verify_one_dev_extent() call Qu Wenruo
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Qu Wenruo @ 2019-01-30  7:39 UTC (permalink / raw)
  To: linux-btrfs

This patchset can be fetched from github:
https://github.com/adam900710/linux/tree/falloc_speedup

Which is based on v5.0-rc1 tag, with another enospc debug patch.

Btrfs falloc can be slower and slower when there are more and more block
groups.

One cause of this problem is find_free_dev_extent(), as it always search
from device offset 0, and if there are thousands existing dev extents
btrfs will search leaf by leaf until it reaches a free slot.

This is super slow and inefficient.

This patchset will introduce a new member,
btrfs_device::hint_free_dev_extent to give some hint for
find_free_dev_extent().

The full cause analyse and benchmark can be found in the 2nd patch.

Qu Wenruo (2):
  btrfs: Don't search devid for every verify_one_dev_extent() call
  btrfs: Introduce free dev extent hint to speed up chunk allocation

 fs/btrfs/volumes.c | 49 ++++++++++++++++++++++++++++-----------
 fs/btrfs/volumes.h | 58 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 94 insertions(+), 13 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] btrfs: Don't search devid for every verify_one_dev_extent() call
  2019-01-30  7:39 [PATCH 0/2] btrfs: Speedup chunk allocation for large fs Qu Wenruo
@ 2019-01-30  7:39 ` Qu Wenruo
  2019-01-30  9:13   ` Nikolay Borisov
  2019-01-31  9:38   ` Anand Jain
  2019-01-30  7:40 ` [PATCH RFC 2/2] btrfs: Introduce free dev extent hint to speed up chunk allocation Qu Wenruo
  2019-02-08 22:27 ` [PATCH 0/2] btrfs: Speedup chunk allocation for large fs David Sterba
  2 siblings, 2 replies; 7+ messages in thread
From: Qu Wenruo @ 2019-01-30  7:39 UTC (permalink / raw)
  To: linux-btrfs

verify_one_dev_extent() will call btrfs_find_device() for each dev
extent, this waste some CPU time just searching the devices list.

Move the search one level up, into the btrfs_verify_dev_extents(), so
for each device we only call btrfs_find_device() once.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/volumes.c | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 2576b1a379c9..8e932d7d2fe6 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -7761,13 +7761,14 @@ static u64 calc_stripe_length(u64 type, u64 chunk_len, int num_stripes)
 }
 
 static int verify_one_dev_extent(struct btrfs_fs_info *fs_info,
-				 u64 chunk_offset, u64 devid,
-				 u64 physical_offset, u64 physical_len)
+				 struct btrfs_device *dev,
+				 u64 chunk_offset, u64 physical_offset,
+				 u64 physical_len)
 {
 	struct extent_map_tree *em_tree = &fs_info->mapping_tree.map_tree;
 	struct extent_map *em;
 	struct map_lookup *map;
-	struct btrfs_device *dev;
+	u64 devid = dev->devid;
 	u64 stripe_len;
 	bool found = false;
 	int ret = 0;
@@ -7819,12 +7820,6 @@ static int verify_one_dev_extent(struct btrfs_fs_info *fs_info,
 	}
 
 	/* Make sure no dev extent is beyond device bondary */
-	dev = btrfs_find_device(fs_info, devid, NULL, NULL);
-	if (!dev) {
-		btrfs_err(fs_info, "failed to find devid %llu", devid);
-		ret = -EUCLEAN;
-		goto out;
-	}
 	if (physical_offset + physical_len > dev->disk_total_bytes) {
 		btrfs_err(fs_info,
 "dev extent devid %llu physical offset %llu len %llu is beyond device boundary %llu",
@@ -7874,6 +7869,7 @@ int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info)
 {
 	struct btrfs_path *path;
 	struct btrfs_root *root = fs_info->dev_root;
+	struct btrfs_device *device = NULL;
 	struct btrfs_key key;
 	u64 prev_devid = 0;
 	u64 prev_dev_ext_end = 0;
@@ -7917,6 +7913,16 @@ int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info)
 		devid = key.objectid;
 		physical_offset = key.offset;
 
+		if (!device || devid != device->devid) {
+			device = btrfs_find_device(fs_info, devid, NULL, NULL);
+			if (!device) {
+				btrfs_err(fs_info, "failed to find devid %llu",
+					  devid);
+				ret = -EUCLEAN;
+				goto out;
+			}
+		}
+
 		dext = btrfs_item_ptr(leaf, slot, struct btrfs_dev_extent);
 		chunk_offset = btrfs_dev_extent_chunk_offset(leaf, dext);
 		physical_len = btrfs_dev_extent_length(leaf, dext);
@@ -7930,7 +7936,7 @@ int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info)
 			goto out;
 		}
 
-		ret = verify_one_dev_extent(fs_info, chunk_offset, devid,
+		ret = verify_one_dev_extent(fs_info, device, chunk_offset,
 					    physical_offset, physical_len);
 		if (ret < 0)
 			goto out;
-- 
2.20.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH RFC 2/2] btrfs: Introduce free dev extent hint to speed up chunk allocation
  2019-01-30  7:39 [PATCH 0/2] btrfs: Speedup chunk allocation for large fs Qu Wenruo
  2019-01-30  7:39 ` [PATCH 1/2] btrfs: Don't search devid for every verify_one_dev_extent() call Qu Wenruo
@ 2019-01-30  7:40 ` Qu Wenruo
  2019-01-31  2:38   ` Qu Wenruo
  2019-02-08 22:27 ` [PATCH 0/2] btrfs: Speedup chunk allocation for large fs David Sterba
  2 siblings, 1 reply; 7+ messages in thread
From: Qu Wenruo @ 2019-01-30  7:40 UTC (permalink / raw)
  To: linux-btrfs

[PROBLEM]
When allocating a 4G file, falloc() call is slower and slower if the fs
is more and more filled.

This is going to be super obvious when doing some crazy test.
E.g. fallocating a 1PiB file TiB by TiB.
The first 16T only takes 10 seconds to finish while the last 16TiB can
take over 15min to finish.

[CAUSE]
When allocating new dev extents for a chunk, btrfs doesn't have any free
dev extent cache, thus it can only search device tree to find free slot.

However it always search from device offset 0, and if there are a lot of
dev extents already allocated, such search will be super time consuming.

The following is function execution time for __btrfs_alloc_chunk()
and find_free_dev_extent_start(), one is allocating 4G on an empty fs,
the other one is allocating 4G on a 4T used fs.
      empty fs   |   4T used fs    |    function
-----------------------------------------------------------------------
 4)              | 7)              |  __btrfs_alloc_chunk [btrfs]() {
 4)   4.839 us   | 7) ! 152.496 us |    find_free_dev_extent_start [btrfs]()
 4) + 64.692 us  | 7) ! 185.488 us |  }
 4)              | 7)              |  __btrfs_alloc_chunk [btrfs]() {
 4) + 12.844 us  | 7) ! 132.889 us |    find_free_dev_extent_start [btrfs]()
 4) ! 131.877 us | 7) ! 152.115 us |  }
 4)              | 7)              |  __btrfs_alloc_chunk [btrfs]() {
 4)   2.375 us   | 7) ! 127.689 us |    find_free_dev_extent_start [btrfs]()
 4) + 16.992 us  | 7) ! 146.595 us |  }
 4)              | 7)              |  __btrfs_alloc_chunk [btrfs]() {
 4)   2.144 us   | 7) ! 126.657 us |    find_free_dev_extent_start [btrfs]()
 4) + 16.280 us  | 7) ! 144.521 us |  }

It's pretty ovbious that find_free_free_dev_extent_start() takes over
20x more time for 4TB used fs, causing chunk allocation much slower.

[ENHANCEMENT]
This patch will introduce btrfs_device::hint_free_dev_extent member to
give some hint for chunk allocator to find free dev extents.

The hint itself is pretty simple, only tells where the first free slot
could possibly be.

It is not 100% correct, unlike free space cache, but since
find_free_dev_extent_start() is already robust enough to handle
search_hint, so there is not need to introduce a complex and fancy free
dev extent cache.

With this patch, allocating 4G on a 4T filled fs will be way more
faster:

      v5.0-rc1   |   patched      |    function
---------------------------------------------------------------------
 7)              | 7)             |  __btrfs_alloc_chunk [btrfs]() {
 7) ! 152.496 us | 7)   7.885 us  |    find_free_dev_extent_start [btrfs]();
 7) ! 185.488 us | 7) + 36.649 us |  }
 7)              | 7)             |  __btrfs_alloc_chunk [btrfs]() {
 7) ! 132.889 us | 7)   2.454 us  |    find_free_dev_extent_start [btrfs]();
 7) ! 152.115 us | 7) + 24.145 us |  }
 7)              | 7)             |  __btrfs_alloc_chunk [btrfs]() {
 7) ! 127.689 us | 7)   2.245 us  |    find_free_dev_extent_start [btrfs]();
 7) ! 146.595 us | 7) + 19.376 us |  }
 7)              | 7)             |  __btrfs_alloc_chunk [btrfs]() {
 7) ! 126.657 us | 7)   2.174 us  |    find_free_dev_extent_start [btrfs]();
 7) ! 144.521 us | 7) + 16.321 us |  }

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/volumes.c | 23 +++++++++++++++---
 fs/btrfs/volumes.h | 58 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 78 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 8e932d7d2fe6..cc15bf70dc72 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -411,6 +411,7 @@ static struct btrfs_device *__alloc_device(void)
 	btrfs_device_data_ordered_init(dev);
 	INIT_RADIX_TREE(&dev->reada_zones, GFP_NOFS & ~__GFP_DIRECT_RECLAIM);
 	INIT_RADIX_TREE(&dev->reada_extents, GFP_NOFS & ~__GFP_DIRECT_RECLAIM);
+	dev->hint_free_dev_extent = (u64)-1;
 
 	return dev;
 }
@@ -1741,9 +1742,9 @@ int find_free_dev_extent(struct btrfs_trans_handle *trans,
 			 struct btrfs_device *device, u64 num_bytes,
 			 u64 *start, u64 *len)
 {
-	/* FIXME use last free of some kind */
-	return find_free_dev_extent_start(trans->transaction, device,
-					  num_bytes, 0, start, len);
+	return find_free_dev_extent_start(trans->transaction, device, num_bytes,
+					  device->hint_free_dev_extent, start,
+					  len);
 }
 
 static int btrfs_free_dev_extent(struct btrfs_trans_handle *trans,
@@ -1799,6 +1800,7 @@ static int btrfs_free_dev_extent(struct btrfs_trans_handle *trans,
 				      "Failed to remove dev extent item");
 	} else {
 		set_bit(BTRFS_TRANS_HAVE_FREE_BGS, &trans->transaction->flags);
+		btrfs_device_hint_add_free(device, key.offset, *dev_extent_len);
 	}
 out:
 	btrfs_free_path(path);
@@ -1841,6 +1843,7 @@ static int btrfs_alloc_dev_extent(struct btrfs_trans_handle *trans,
 	btrfs_set_dev_extent_chunk_offset(leaf, extent, chunk_offset);
 
 	btrfs_set_dev_extent_length(leaf, extent, num_bytes);
+	btrfs_device_hint_del_free(device, key.offset, num_bytes);
 	btrfs_mark_buffer_dirty(leaf);
 out:
 	btrfs_free_path(path);
@@ -7913,6 +7916,14 @@ int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info)
 		devid = key.objectid;
 		physical_offset = key.offset;
 
+		/*
+		 * previous device verification is done, update its free dev
+		 * extent hint
+		 */
+		if (device && devid != device->devid)
+			btrfs_device_hint_add_free(device, prev_dev_ext_end,
+				device->disk_total_bytes - prev_dev_ext_end);
+
 		if (!device || devid != device->devid) {
 			device = btrfs_find_device(fs_info, devid, NULL, NULL);
 			if (!device) {
@@ -7940,6 +7951,10 @@ int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info)
 					    physical_offset, physical_len);
 		if (ret < 0)
 			goto out;
+
+		btrfs_device_hint_add_free(device, prev_dev_ext_end,
+				physical_offset - prev_dev_ext_end);
+
 		prev_devid = devid;
 		prev_dev_ext_end = physical_offset + physical_len;
 
@@ -7951,6 +7966,8 @@ int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info)
 			break;
 		}
 	}
+	btrfs_device_hint_add_free(device, prev_dev_ext_end,
+			device->disk_total_bytes - prev_dev_ext_end);
 
 	/* Ensure all chunks have corresponding dev extents */
 	ret = verify_chunk_dev_extent_mapping(fs_info);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index ed806649a473..00f7ef72466f 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -108,6 +108,14 @@ struct btrfs_device {
 
 	/* bytes used on the current transaction */
 	u64 commit_bytes_used;
+
+	/*
+	 * hint about where the first possible free dev extent is.
+	 *
+	 * u64(-1) means no hint.
+	 */
+	u64 hint_free_dev_extent;
+
 	/*
 	 * used to manage the device which is resized
 	 *
@@ -569,4 +577,54 @@ bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info,
 int btrfs_bg_type_to_factor(u64 flags);
 int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info);
 
+static inline void btrfs_device_hint_add_free(struct btrfs_device *dev,
+					      u64 start, u64 len)
+{
+	if (dev->disk_total_bytes == 0 || start + len > dev->disk_total_bytes)
+		return;
+	if (len < SZ_16M)
+		return;
+	if (start > dev->hint_free_dev_extent)
+		return;
+	dev->hint_free_dev_extent = start;
+}
+
+static inline void btrfs_device_hint_del_free(struct btrfs_device *dev,
+					      u64 start, u64 len)
+{
+	u64 free_hint = dev->hint_free_dev_extent;
+
+	if (dev->disk_total_bytes == 0 || start + len > dev->disk_total_bytes)
+		return;
+	/*
+	 * |<- to be removed ->|
+	 * 			| free hint
+	 * Not affecting free hint
+	 */
+	if (start + len <= free_hint)
+		return;
+	/*
+	 * |<- to be removed ->|
+	 * 		| free hint
+	 * Or
+	 * 	|<- to be removed ->|
+	 * | free hint
+	 * |<-->| Less than 16M
+	 *
+	 * Move the hint to the range end
+	 */
+	if ((start <= free_hint && start + len > free_hint) ||
+	    (start > free_hint && free_hint - start < SZ_16M)) {
+		dev->hint_free_dev_extent = start + len;
+		return;
+	}
+
+	/*
+	 * 			|<- to be removed ->|
+	 * | free hint
+	 *
+	 * We still have larger than 16M free space, no need to update
+	 * free hint
+	 */
+}
 #endif
-- 
2.20.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] btrfs: Don't search devid for every verify_one_dev_extent() call
  2019-01-30  7:39 ` [PATCH 1/2] btrfs: Don't search devid for every verify_one_dev_extent() call Qu Wenruo
@ 2019-01-30  9:13   ` Nikolay Borisov
  2019-01-31  9:38   ` Anand Jain
  1 sibling, 0 replies; 7+ messages in thread
From: Nikolay Borisov @ 2019-01-30  9:13 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 30.01.19 г. 9:39 ч., Qu Wenruo wrote:
> verify_one_dev_extent() will call btrfs_find_device() for each dev
> extent, this waste some CPU time just searching the devices list.
> 
> Move the search one level up, into the btrfs_verify_dev_extents(), so
> for each device we only call btrfs_find_device() once.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Seems the sane thing to do, so:

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

> ---
>  fs/btrfs/volumes.c | 26 ++++++++++++++++----------
>  1 file changed, 16 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 2576b1a379c9..8e932d7d2fe6 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -7761,13 +7761,14 @@ static u64 calc_stripe_length(u64 type, u64 chunk_len, int num_stripes)
>  }
>  
>  static int verify_one_dev_extent(struct btrfs_fs_info *fs_info,
> -				 u64 chunk_offset, u64 devid,
> -				 u64 physical_offset, u64 physical_len)
> +				 struct btrfs_device *dev,
> +				 u64 chunk_offset, u64 physical_offset,
> +				 u64 physical_len)
>  {
>  	struct extent_map_tree *em_tree = &fs_info->mapping_tree.map_tree;
>  	struct extent_map *em;
>  	struct map_lookup *map;
> -	struct btrfs_device *dev;
> +	u64 devid = dev->devid;
>  	u64 stripe_len;
>  	bool found = false;
>  	int ret = 0;
> @@ -7819,12 +7820,6 @@ static int verify_one_dev_extent(struct btrfs_fs_info *fs_info,
>  	}
>  
>  	/* Make sure no dev extent is beyond device bondary */
> -	dev = btrfs_find_device(fs_info, devid, NULL, NULL);
> -	if (!dev) {
> -		btrfs_err(fs_info, "failed to find devid %llu", devid);
> -		ret = -EUCLEAN;
> -		goto out;
> -	}
>  	if (physical_offset + physical_len > dev->disk_total_bytes) {
>  		btrfs_err(fs_info,
>  "dev extent devid %llu physical offset %llu len %llu is beyond device boundary %llu",
> @@ -7874,6 +7869,7 @@ int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info)
>  {
>  	struct btrfs_path *path;
>  	struct btrfs_root *root = fs_info->dev_root;
> +	struct btrfs_device *device = NULL;
>  	struct btrfs_key key;
>  	u64 prev_devid = 0;
>  	u64 prev_dev_ext_end = 0;
> @@ -7917,6 +7913,16 @@ int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info)
>  		devid = key.objectid;
>  		physical_offset = key.offset;
>  
> +		if (!device || devid != device->devid) {
> +			device = btrfs_find_device(fs_info, devid, NULL, NULL);
> +			if (!device) {
> +				btrfs_err(fs_info, "failed to find devid %llu",
> +					  devid);
> +				ret = -EUCLEAN;
> +				goto out;
> +			}
> +		}
> +
>  		dext = btrfs_item_ptr(leaf, slot, struct btrfs_dev_extent);
>  		chunk_offset = btrfs_dev_extent_chunk_offset(leaf, dext);
>  		physical_len = btrfs_dev_extent_length(leaf, dext);
> @@ -7930,7 +7936,7 @@ int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info)
>  			goto out;
>  		}
>  
> -		ret = verify_one_dev_extent(fs_info, chunk_offset, devid,
> +		ret = verify_one_dev_extent(fs_info, device, chunk_offset,
>  					    physical_offset, physical_len);
>  		if (ret < 0)
>  			goto out;
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH RFC 2/2] btrfs: Introduce free dev extent hint to speed up chunk allocation
  2019-01-30  7:40 ` [PATCH RFC 2/2] btrfs: Introduce free dev extent hint to speed up chunk allocation Qu Wenruo
@ 2019-01-31  2:38   ` Qu Wenruo
  0 siblings, 0 replies; 7+ messages in thread
From: Qu Wenruo @ 2019-01-31  2:38 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

[-- Attachment #1.1: Type: text/plain, Size: 7102 bytes --]

> [ENHANCEMENT]
> This patch will introduce btrfs_device::hint_free_dev_extent member to
> give some hint for chunk allocator to find free dev extents.
> 
> The hint itself is pretty simple, only tells where the first free slot
> could possibly be.
> 
> It is not 100% correct, unlike free space cache, but since
> find_free_dev_extent_start() is already robust enough to handle
> search_hint, so there is not need to introduce a complex and fancy free
> dev extent cache.
> 
> With this patch, allocating 4G on a 4T filled fs will be way more
> faster:
> 
>       v5.0-rc1   |   patched      |    function
> ---------------------------------------------------------------------
>  7)              | 7)             |  __btrfs_alloc_chunk [btrfs]() {
>  7) ! 152.496 us | 7)   7.885 us  |    find_free_dev_extent_start [btrfs]();
>  7) ! 185.488 us | 7) + 36.649 us |  }
>  7)              | 7)             |  __btrfs_alloc_chunk [btrfs]() {
>  7) ! 132.889 us | 7)   2.454 us  |    find_free_dev_extent_start [btrfs]();
>  7) ! 152.115 us | 7) + 24.145 us |  }
>  7)              | 7)             |  __btrfs_alloc_chunk [btrfs]() {
>  7) ! 127.689 us | 7)   2.245 us  |    find_free_dev_extent_start [btrfs]();
>  7) ! 146.595 us | 7) + 19.376 us |  }
>  7)              | 7)             |  __btrfs_alloc_chunk [btrfs]() {
>  7) ! 126.657 us | 7)   2.174 us  |    find_free_dev_extent_start [btrfs]();
>  7) ! 144.521 us | 7) + 16.321 us |  }

For anyone who is interesting in unrealistic workload, without this
patch, fallocating a 1PiB file TiB by TiB will take 5+ hours!!

With this patch, it's just going to take around 15~20min.

Anyway, we're still far from customer oriented 1PiB HDDs, so that's not
something we need to bother yet.

Thanks,
Qu

> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/volumes.c | 23 +++++++++++++++---
>  fs/btrfs/volumes.h | 58 ++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 78 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 8e932d7d2fe6..cc15bf70dc72 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -411,6 +411,7 @@ static struct btrfs_device *__alloc_device(void)
>  	btrfs_device_data_ordered_init(dev);
>  	INIT_RADIX_TREE(&dev->reada_zones, GFP_NOFS & ~__GFP_DIRECT_RECLAIM);
>  	INIT_RADIX_TREE(&dev->reada_extents, GFP_NOFS & ~__GFP_DIRECT_RECLAIM);
> +	dev->hint_free_dev_extent = (u64)-1;
>  
>  	return dev;
>  }
> @@ -1741,9 +1742,9 @@ int find_free_dev_extent(struct btrfs_trans_handle *trans,
>  			 struct btrfs_device *device, u64 num_bytes,
>  			 u64 *start, u64 *len)
>  {
> -	/* FIXME use last free of some kind */
> -	return find_free_dev_extent_start(trans->transaction, device,
> -					  num_bytes, 0, start, len);
> +	return find_free_dev_extent_start(trans->transaction, device, num_bytes,
> +					  device->hint_free_dev_extent, start,
> +					  len);
>  }
>  
>  static int btrfs_free_dev_extent(struct btrfs_trans_handle *trans,
> @@ -1799,6 +1800,7 @@ static int btrfs_free_dev_extent(struct btrfs_trans_handle *trans,
>  				      "Failed to remove dev extent item");
>  	} else {
>  		set_bit(BTRFS_TRANS_HAVE_FREE_BGS, &trans->transaction->flags);
> +		btrfs_device_hint_add_free(device, key.offset, *dev_extent_len);
>  	}
>  out:
>  	btrfs_free_path(path);
> @@ -1841,6 +1843,7 @@ static int btrfs_alloc_dev_extent(struct btrfs_trans_handle *trans,
>  	btrfs_set_dev_extent_chunk_offset(leaf, extent, chunk_offset);
>  
>  	btrfs_set_dev_extent_length(leaf, extent, num_bytes);
> +	btrfs_device_hint_del_free(device, key.offset, num_bytes);
>  	btrfs_mark_buffer_dirty(leaf);
>  out:
>  	btrfs_free_path(path);
> @@ -7913,6 +7916,14 @@ int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info)
>  		devid = key.objectid;
>  		physical_offset = key.offset;
>  
> +		/*
> +		 * previous device verification is done, update its free dev
> +		 * extent hint
> +		 */
> +		if (device && devid != device->devid)
> +			btrfs_device_hint_add_free(device, prev_dev_ext_end,
> +				device->disk_total_bytes - prev_dev_ext_end);
> +
>  		if (!device || devid != device->devid) {
>  			device = btrfs_find_device(fs_info, devid, NULL, NULL);
>  			if (!device) {
> @@ -7940,6 +7951,10 @@ int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info)
>  					    physical_offset, physical_len);
>  		if (ret < 0)
>  			goto out;
> +
> +		btrfs_device_hint_add_free(device, prev_dev_ext_end,
> +				physical_offset - prev_dev_ext_end);
> +
>  		prev_devid = devid;
>  		prev_dev_ext_end = physical_offset + physical_len;
>  
> @@ -7951,6 +7966,8 @@ int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info)
>  			break;
>  		}
>  	}
> +	btrfs_device_hint_add_free(device, prev_dev_ext_end,
> +			device->disk_total_bytes - prev_dev_ext_end);
>  
>  	/* Ensure all chunks have corresponding dev extents */
>  	ret = verify_chunk_dev_extent_mapping(fs_info);
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index ed806649a473..00f7ef72466f 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -108,6 +108,14 @@ struct btrfs_device {
>  
>  	/* bytes used on the current transaction */
>  	u64 commit_bytes_used;
> +
> +	/*
> +	 * hint about where the first possible free dev extent is.
> +	 *
> +	 * u64(-1) means no hint.
> +	 */
> +	u64 hint_free_dev_extent;
> +
>  	/*
>  	 * used to manage the device which is resized
>  	 *
> @@ -569,4 +577,54 @@ bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info,
>  int btrfs_bg_type_to_factor(u64 flags);
>  int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info);
>  
> +static inline void btrfs_device_hint_add_free(struct btrfs_device *dev,
> +					      u64 start, u64 len)
> +{
> +	if (dev->disk_total_bytes == 0 || start + len > dev->disk_total_bytes)
> +		return;
> +	if (len < SZ_16M)
> +		return;
> +	if (start > dev->hint_free_dev_extent)
> +		return;
> +	dev->hint_free_dev_extent = start;
> +}
> +
> +static inline void btrfs_device_hint_del_free(struct btrfs_device *dev,
> +					      u64 start, u64 len)
> +{
> +	u64 free_hint = dev->hint_free_dev_extent;
> +
> +	if (dev->disk_total_bytes == 0 || start + len > dev->disk_total_bytes)
> +		return;
> +	/*
> +	 * |<- to be removed ->|
> +	 * 			| free hint
> +	 * Not affecting free hint
> +	 */
> +	if (start + len <= free_hint)
> +		return;
> +	/*
> +	 * |<- to be removed ->|
> +	 * 		| free hint
> +	 * Or
> +	 * 	|<- to be removed ->|
> +	 * | free hint
> +	 * |<-->| Less than 16M
> +	 *
> +	 * Move the hint to the range end
> +	 */
> +	if ((start <= free_hint && start + len > free_hint) ||
> +	    (start > free_hint && free_hint - start < SZ_16M)) {
> +		dev->hint_free_dev_extent = start + len;
> +		return;
> +	}
> +
> +	/*
> +	 * 			|<- to be removed ->|
> +	 * | free hint
> +	 *
> +	 * We still have larger than 16M free space, no need to update
> +	 * free hint
> +	 */
> +}
>  #endif
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] btrfs: Don't search devid for every verify_one_dev_extent() call
  2019-01-30  7:39 ` [PATCH 1/2] btrfs: Don't search devid for every verify_one_dev_extent() call Qu Wenruo
  2019-01-30  9:13   ` Nikolay Borisov
@ 2019-01-31  9:38   ` Anand Jain
  1 sibling, 0 replies; 7+ messages in thread
From: Anand Jain @ 2019-01-31  9:38 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs




On 1/30/19 3:39 PM, Qu Wenruo wrote:

> verify_one_dev_extent() will call btrfs_find_device() for each dev
> extent, this waste some CPU time just searching the devices list.


> Move the search one level up, into the btrfs_verify_dev_extents(), so
> for each device we only call btrfs_find_device() once.


This does not apply on misc-next. Looks like this patch's branch is 
missing the seed fix.

Looks good.

Reviewed-by: Anand Jain <anand.jain@oracle.com>


> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>   fs/btrfs/volumes.c | 26 ++++++++++++++++----------
>   1 file changed, 16 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 2576b1a379c9..8e932d7d2fe6 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -7761,13 +7761,14 @@ static u64 calc_stripe_length(u64 type, u64 chunk_len, int num_stripes)
>   }
>   
>   static int verify_one_dev_extent(struct btrfs_fs_info *fs_info,
> -				 u64 chunk_offset, u64 devid,
> -				 u64 physical_offset, u64 physical_len)
> +				 struct btrfs_device *dev,
> +				 u64 chunk_offset, u64 physical_offset,
> +				 u64 physical_len)
>   {
>   	struct extent_map_tree *em_tree = &fs_info->mapping_tree.map_tree;
>   	struct extent_map *em;
>   	struct map_lookup *map;
> -	struct btrfs_device *dev;
> +	u64 devid = dev->devid;
>   	u64 stripe_len;
>   	bool found = false;
>   	int ret = 0;
> @@ -7819,12 +7820,6 @@ static int verify_one_dev_extent(struct btrfs_fs_info *fs_info,
>   	}
>   
>   	/* Make sure no dev extent is beyond device bondary */
> -	dev = btrfs_find_device(fs_info, devid, NULL, NULL);
> -	if (!dev) {
> -		btrfs_err(fs_info, "failed to find devid %llu", devid);
> -		ret = -EUCLEAN;
> -		goto out;
> -	}
>   	if (physical_offset + physical_len > dev->disk_total_bytes) {
>   		btrfs_err(fs_info,
>   "dev extent devid %llu physical offset %llu len %llu is beyond device boundary %llu",
> @@ -7874,6 +7869,7 @@ int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info)
>   {
>   	struct btrfs_path *path;
>   	struct btrfs_root *root = fs_info->dev_root;
> +	struct btrfs_device *device = NULL;
>   	struct btrfs_key key;
>   	u64 prev_devid = 0;
>   	u64 prev_dev_ext_end = 0;
> @@ -7917,6 +7913,16 @@ int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info)
>   		devid = key.objectid;
>   		physical_offset = key.offset;
>   
> +		if (!device || devid != device->devid) {
> +			device = btrfs_find_device(fs_info, devid, NULL, NULL);
> +			if (!device) {
> +				btrfs_err(fs_info, "failed to find devid %llu",
> +					  devid);
> +				ret = -EUCLEAN;
> +				goto out;
> +			}
> +		}
> +
>   		dext = btrfs_item_ptr(leaf, slot, struct btrfs_dev_extent);
>   		chunk_offset = btrfs_dev_extent_chunk_offset(leaf, dext);
>   		physical_len = btrfs_dev_extent_length(leaf, dext);
> @@ -7930,7 +7936,7 @@ int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info)
>   			goto out;
>   		}
>   
> -		ret = verify_one_dev_extent(fs_info, chunk_offset, devid,
> +		ret = verify_one_dev_extent(fs_info, device, chunk_offset,
>   					    physical_offset, physical_len);
>   		if (ret < 0)
>   			goto out;
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/2] btrfs: Speedup chunk allocation for large fs
  2019-01-30  7:39 [PATCH 0/2] btrfs: Speedup chunk allocation for large fs Qu Wenruo
  2019-01-30  7:39 ` [PATCH 1/2] btrfs: Don't search devid for every verify_one_dev_extent() call Qu Wenruo
  2019-01-30  7:40 ` [PATCH RFC 2/2] btrfs: Introduce free dev extent hint to speed up chunk allocation Qu Wenruo
@ 2019-02-08 22:27 ` David Sterba
  2 siblings, 0 replies; 7+ messages in thread
From: David Sterba @ 2019-02-08 22:27 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Wed, Jan 30, 2019 at 03:39:58PM +0800, Qu Wenruo wrote:
> This patchset can be fetched from github:
> https://github.com/adam900710/linux/tree/falloc_speedup
> 
> Which is based on v5.0-rc1 tag, with another enospc debug patch.
> 
> Btrfs falloc can be slower and slower when there are more and more block
> groups.
> 
> One cause of this problem is find_free_dev_extent(), as it always search
> from device offset 0, and if there are thousands existing dev extents
> btrfs will search leaf by leaf until it reaches a free slot.
> 
> This is super slow and inefficient.
> 
> This patchset will introduce a new member,
> btrfs_device::hint_free_dev_extent to give some hint for
> find_free_dev_extent().
> 
> The full cause analyse and benchmark can be found in the 2nd patch.
> 
> Qu Wenruo (2):
>   btrfs: Don't search devid for every verify_one_dev_extent() call
>   btrfs: Introduce free dev extent hint to speed up chunk allocation

fstests complain. It's the 2G VM, all tests up to that one are fine then it
goes donwhill and logs are flooded.

btrfs/161               [20:53:28][ 4057.174855] run fstests btrfs/161 at 2019-02-08 20:53:28
[ 4057.318947] BTRFS info (device vda): disk space caching is enabled
[ 4057.323914] BTRFS info (device vda): has skinny extents
[ 4057.515090] BTRFS: device fsid 8760735f-f548-42ff-a0c4-9fa470681954 devid 1 transid 5 /dev/vdb
[ 4057.530213] BTRFS info (device vdb): disk space caching is enabled
[ 4057.532582] BTRFS info (device vdb): has skinny extents
[ 4057.534353] BTRFS info (device vdb): flagging fs with big metadata feature
[ 4057.538604] BTRFS info (device vdb): checking UUID tree
[ 4057.626129] BTRFS info (device vdb): disk space caching is enabled
[ 4057.628347] BTRFS info (device vdb): has skinny extents
[ 4057.661631] ------------[ cut here ]------------
[ 4057.663927] BTRFS: Transaction aborted (error -28)
[ 4057.666545] WARNING: CPU: 1 PID: 3034 at fs/btrfs/volumes.c:2715 btrfs_init_new_device+0x12bd/0x1350 [btrfs]
[ 4057.671389] Modules linked in: btrfs libcrc32c xor zstd_decompress zstd_compress xxhash raid6_pq dm_flakey dm_mod loop [last unloaded: libcrc32c]
[ 4057.675003] CPU: 1 PID: 3034 Comm: btrfs Not tainted 5.0.0-rc5-default+ #460
[ 4057.676487] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626cc-prebuilt.qemu-project.org 04/01/2014
[ 4057.678239] RIP: 0010:btrfs_init_new_device+0x12bd/0x1350 [btrfs]
[ 4057.684383] RSP: 0018:ffffb8bf0581bc70 EFLAGS: 00010282
[ 4057.685866] RAX: 0000000000000000 RBX: ffff9ce1fc874000 RCX: 0000000000000000
[ 4057.687677] RDX: 0000000000000002 RSI: 0000000000000001 RDI: ffffffffa30c5345
[ 4057.689126] RBP: ffffb8bf0581bd90 R08: 0000000000000001 R09: 0000000000000000
[ 4057.690770] R10: 0000000000000000 R11: ffffb8bf0581bb18 R12: 00000000ffffffe4
[ 4057.692258] R13: ffff9ce1f8278000 R14: ffff9ce1fc874980 R15: ffff9ce18afa0800
[ 4057.693732] FS:  00007f94294ec8c0(0000) GS:ffff9ce1fd600000(0000) knlGS:0000000000000000
[ 4057.695754] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4057.697206] CR2: 00007ffcf7a96fd0 CR3: 0000000037ffd000 CR4: 00000000000006e0
[ 4057.699078] Call Trace:
[ 4057.699832]  btrfs_ioctl+0x24ae/0x2e10 [btrfs]
[ 4057.700962]  ? _copy_to_user+0x5e/0x70
[ 4057.701684]  ? cp_new_stat+0x12c/0x160
[ 4057.702258]  ? do_vfs_ioctl+0xa2/0x6d0
[ 4057.702823]  do_vfs_ioctl+0xa2/0x6d0
[ 4057.703386]  ? __do_sys_newlstat+0x48/0x70
[ 4057.704202]  ksys_ioctl+0x3a/0x70
[ 4057.704784]  __x64_sys_ioctl+0x16/0x20
[ 4057.705371]  do_syscall_64+0x54/0x180
[ 4057.705929]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 4057.706633] RIP: 0033:0x7f94295e1a97
[ 4057.711767] RSP: 002b:00007ffcf7a9c508 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 4057.713815] RAX: ffffffffffffffda RBX: 00007ffcf7a9d6c8 RCX: 00007f94295e1a97
[ 4057.715370] RDX: 00007ffcf7a9c540 RSI: 000000005000940a RDI: 0000000000000003
[ 4057.717266] RBP: 0000000000000001 R08: 000000000000ffff R09: 0000000000000001
[ 4057.719098] R10: 0000556403361a00 R11: 0000000000000246 R12: 0000000000000004
[ 4057.720925] R13: 00007ffcf7a9d6d0 R14: 0000000000000000 R15: 0000556403360a20
[ 4057.722731] irq event stamp: 0
[ 4057.723632] hardirqs last  enabled at (0): [<0000000000000000>]           (null)
[ 4057.725718] hardirqs last disabled at (0): [<ffffffffa305c84b>] copy_process.part.72+0x84b/0x1df0
[ 4057.728217] softirqs last  enabled at (0): [<ffffffffa305c84b>] copy_process.part.72+0x84b/0x1df0
[ 4057.730514] softirqs last disabled at (0): [<0000000000000000>]           (null)
[ 4057.732558] ---[ end trace d75e6b78fb58b18b ]---
[ 4057.733808] BTRFS warning (device vdb): btrfs_init_new_device:2715: Aborting unused transaction(No space left).
[failed, exit status 1][ 4057.776999] assertion failed: nr_devices, file: fs/btrfs/super.c, line: 1918
[ 4057.781086] ------------[ cut here ]------------
[ 4057.783603] kernel BUG at fs/btrfs/ctree.h:3530!
[ 4057.786148] invalid opcode: 0000 [#1] PREEMPT SMP
[ 4057.788739] CPU: 1 PID: 3039 Comm: umount Tainted: G        W         5.0.0-rc5-default+ #460
[ 4057.792391] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626cc-prebuilt.qemu-project.org 04/01/2014
[ 4057.795704] RIP: 0010:btrfs_statfs.cold.68+0x1f/0x21 [btrfs]
[ 4057.801880] RSP: 0018:ffffb8bf05843dc0 EFLAGS: 00010246
[ 4057.802703] RAX: 0000000000000040 RBX: ffff9ce1fc8742d0 RCX: 0000000000000000
[ 4057.803919] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffffa30c5345
[ 4057.805846] RBP: ffffb8bf05843e38 R08: 0000000000000001 R09: 0000000000000000
[ 4057.807157] R10: 0000000000000000 R11: ffffb8bf05843c70 R12: ffffb8bf05843eb0
[ 4057.808932] R13: ffff9ce18afa0800 R14: ffff9ce1fc874000 R15: 0000000000000001
[ 4057.810237] FS:  00007fe011459fc0(0000) GS:ffff9ce1fd600000(0000) knlGS:0000000000000000
[ 4057.811801] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4057.813081] CR2: 00007fe0117985d0 CR3: 000000004c613000 CR4: 00000000000006e0
[ 4057.814549] Call Trace:
[ 4057.815171]  statfs_by_dentry+0x57/0x80
[ 4057.816129]  vfs_statfs+0x16/0xc0
[ 4057.816916]  user_statfs+0x54/0xa0
[ 4057.817782]  __do_sys_statfs+0x20/0x50
[ 4057.818678]  do_syscall_64+0x54/0x180
[ 4057.819603]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 4057.820814] RIP: 0033:0x7fe0116945e7
[ 4057.821745] Code: 44 00 00 48 8b 05 b1 f8 0c 00 64 c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 b8 89 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 f8 0c 00 f7 d8 64 89 01 48
[ 4057.826876] RSP: 002b:00007ffeea10e7e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000089
[ 4057.828936] RAX: ffffffffffffffda RBX: 000055d89a521970 RCX: 00007fe0116945e7
[ 4057.830023] RDX: 0000000000000001 RSI: 00007ffeea10e820 RDI: 000055d89a521b80
[ 4057.830968] RBP: 00007ffeea10e820 R08: 0000000000000000 R09: 0000000000000000
[ 4057.832374] R10: 0000000000000000 R11: 0000000000000246 R12: 000055d89a521b80
[ 4057.833827] R13: 0000000000000000 R14: 00007fe0117c91c4 R15: 00007ffeea10fc00
[ 4057.834781] Modules linked in: btrfs libcrc32c xor zstd_decompress zstd_compress xxhash raid6_pq dm_flakey dm_mod loop [last unloaded: libcrc32c]
[ 4057.837089] ---[ end trace d75e6b78fb58b18c ]---
[ 4057.838144] RIP: 0010:btrfs_statfs.cold.68+0x1f/0x21 [btrfs]
[ 4057.843474] RSP: 0018:ffffb8bf05843dc0 EFLAGS: 00010246
[ 4057.844636] RAX: 0000000000000040 RBX: ffff9ce1fc8742d0 RCX: 0000000000000000
[ 4057.846229] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffffa30c5345
[ 4057.848112] RBP: ffffb8bf05843e38 R08: 0000000000000001 R09: 0000000000000000
[ 4057.849874] R10: 0000000000000000 R11: ffffb8bf05843c70 R12: ffffb8bf05843eb0
[ 4057.851783] R13: ffff9ce18afa0800 R14: ffff9ce1fc874000 R15: 0000000000000001
[ 4057.853636] FS:  00007fe011459fc0(0000) GS:ffff9ce1fd600000(0000) knlGS:0000000000000000
[ 4057.854943] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4057.855973] CR2: 00007fe0117985d0 CR3: 000000004c613000 CR4: 00000000000006e0


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, back to index

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-30  7:39 [PATCH 0/2] btrfs: Speedup chunk allocation for large fs Qu Wenruo
2019-01-30  7:39 ` [PATCH 1/2] btrfs: Don't search devid for every verify_one_dev_extent() call Qu Wenruo
2019-01-30  9:13   ` Nikolay Borisov
2019-01-31  9:38   ` Anand Jain
2019-01-30  7:40 ` [PATCH RFC 2/2] btrfs: Introduce free dev extent hint to speed up chunk allocation Qu Wenruo
2019-01-31  2:38   ` Qu Wenruo
2019-02-08 22:27 ` [PATCH 0/2] btrfs: Speedup chunk allocation for large fs David Sterba

Linux-BTRFS Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-btrfs/0 linux-btrfs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-btrfs linux-btrfs/ https://lore.kernel.org/linux-btrfs \
		linux-btrfs@vger.kernel.org linux-btrfs@archiver.kernel.org
	public-inbox-index linux-btrfs


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-btrfs


AGPL code for this site: git clone https://public-inbox.org/ public-inbox