linux-btrfs.vger.kernel.org archive mirror
* [PATCH 0/3] Rework the worst case calculations for space reservation
@ 2019-08-16 15:05 Josef Bacik
  2019-08-16 15:05 ` [PATCH 1/3] btrfs: rename the btrfs_calc_*_metadata_size helpers Josef Bacik
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Josef Bacik @ 2019-08-16 15:05 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

We have two worst-case calculations for space reservation: one that takes into
account splitting at every level when cow'ing down the btree, and another that
doesn't account for splitting at all.  The first is used everywhere; the
second is used mostly for truncate.

However, we also do not split nodes when we're only changing an existing item,
for example when updating the inode item.  The name of the second helper is
therefore misleading, because it can be used for in-place updates as well as
for truncates.  Rename the helpers, and then use the smaller worst-case
reservation for inode updates in a few places.

As a rule we still want to use the insert calculation when we can't be sure
what kind of operation is going to happen.  But for things like delayed inode
updates and file writes, where we know there is going to be an existing inode
item, we can use the smaller reservation.  Thanks,

Josef


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/3] btrfs: rename the btrfs_calc_*_metadata_size helpers
  2019-08-16 15:05 [PATCH 0/3] Rework the worst case calculations for space reservation Josef Bacik
@ 2019-08-16 15:05 ` Josef Bacik
  2019-08-19  8:30   ` Nikolay Borisov
  2019-08-16 15:05 ` [PATCH 2/3] btrfs: only reserve metadata_size for inodes Josef Bacik
  2019-08-16 15:06 ` [PATCH 3/3] btrfs: global reserve fallback should use metadata_size Josef Bacik
  2 siblings, 1 reply; 11+ messages in thread
From: Josef Bacik @ 2019-08-16 15:05 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

btrfs_calc_trunc_metadata_size differs from btrfs_calc_trans_metadata_size
in that it doesn't take into account any splitting at any level, because
truncate will never split nodes.  However, neither truncate nor an
in-place change will ever split nodes, so rename
btrfs_calc_trunc_metadata_size to btrfs_calc_metadata_size.
btrfs_calc_trans_metadata_size, in turn, is purely for inserting items,
so rename it to btrfs_calc_insert_metadata_size.  Making these names
clearer will help when I start using them differently in upcoming
patches.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/block-group.c      |  4 ++--
 fs/btrfs/ctree.h            | 14 +++++++++-----
 fs/btrfs/delalloc-space.c   |  8 ++++----
 fs/btrfs/delayed-inode.c    |  4 ++--
 fs/btrfs/delayed-ref.c      |  8 ++++----
 fs/btrfs/file.c             |  4 ++--
 fs/btrfs/free-space-cache.c |  4 ++--
 fs/btrfs/inode-map.c        |  2 +-
 fs/btrfs/inode.c            |  6 +++---
 fs/btrfs/props.c            |  2 +-
 fs/btrfs/root-tree.c        |  2 +-
 fs/btrfs/space-info.c       |  2 +-
 fs/btrfs/transaction.c      |  4 ++--
 13 files changed, 34 insertions(+), 30 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index afae5c731904..3147e840f839 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -3014,8 +3014,8 @@ void check_system_chunk(struct btrfs_trans_handle *trans, u64 type)
 	num_devs = get_profile_num_devs(fs_info, type);
 
 	/* num_devs device items to update and 1 chunk item to add or remove */
-	thresh = btrfs_calc_trunc_metadata_size(fs_info, num_devs) +
-		btrfs_calc_trans_metadata_size(fs_info, 1);
+	thresh = btrfs_calc_metadata_size(fs_info, num_devs) +
+		btrfs_calc_insert_metadata_size(fs_info, 1);
 
 	if (left < thresh && btrfs_test_opt(fs_info, ENOSPC_DEBUG)) {
 		btrfs_info(fs_info, "left=%llu, need=%llu, flags=%llu",
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 85b808e3ea42..f352aa098015 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2450,17 +2450,21 @@ int btrfs_get_extent_inline_ref_type(const struct extent_buffer *eb,
 
 u64 btrfs_csum_bytes_to_leaves(struct btrfs_fs_info *fs_info, u64 csum_bytes);
 
-static inline u64 btrfs_calc_trans_metadata_size(struct btrfs_fs_info *fs_info,
-						 unsigned num_items)
+/*
+ * Use this if we would be adding new items, as we could split nodes as we cow
+ * down the tree.
+ */
+static inline u64 btrfs_calc_insert_metadata_size(struct btrfs_fs_info *fs_info,
+						  unsigned num_items)
 {
 	return (u64)fs_info->nodesize * BTRFS_MAX_LEVEL * 2 * num_items;
 }
 
 /*
- * Doing a truncate won't result in new nodes or leaves, just what we need for
- * COW.
+ * Doing a truncate or a modification won't result in new nodes or leaves, just
+ * what we need for COW.
  */
-static inline u64 btrfs_calc_trunc_metadata_size(struct btrfs_fs_info *fs_info,
+static inline u64 btrfs_calc_metadata_size(struct btrfs_fs_info *fs_info,
 						 unsigned num_items)
 {
 	return (u64)fs_info->nodesize * BTRFS_MAX_LEVEL * num_items;
diff --git a/fs/btrfs/delalloc-space.c b/fs/btrfs/delalloc-space.c
index 1fc6bef3ccdf..2412be4a3de2 100644
--- a/fs/btrfs/delalloc-space.c
+++ b/fs/btrfs/delalloc-space.c
@@ -252,12 +252,12 @@ static void btrfs_calculate_inode_block_rsv_size(struct btrfs_fs_info *fs_info,
 	lockdep_assert_held(&inode->lock);
 	outstanding_extents = inode->outstanding_extents;
 	if (outstanding_extents)
-		reserve_size = btrfs_calc_trans_metadata_size(fs_info,
+		reserve_size = btrfs_calc_insert_metadata_size(fs_info,
 						outstanding_extents + 1);
 	csum_leaves = btrfs_csum_bytes_to_leaves(fs_info,
 						 inode->csum_bytes);
-	reserve_size += btrfs_calc_trans_metadata_size(fs_info,
-						       csum_leaves);
+	reserve_size += btrfs_calc_insert_metadata_size(fs_info,
+							csum_leaves);
 	/*
 	 * For qgroup rsv, the calculation is very simple:
 	 * account one nodesize for each outstanding extent
@@ -280,7 +280,7 @@ static void calc_inode_reservations(struct btrfs_fs_info *fs_info,
 	u64 csum_leaves = btrfs_csum_bytes_to_leaves(fs_info, num_bytes);
 
 	/* We add one for the inode update at finish ordered time */
-	*meta_reserve = btrfs_calc_trans_metadata_size(fs_info,
+	*meta_reserve = btrfs_calc_insert_metadata_size(fs_info,
 						nr_extents + csum_leaves + 1);
 	*qgroup_reserve = nr_extents * fs_info->nodesize;
 }
diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index 6858a05606dd..de87ea7ce84d 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -558,7 +558,7 @@ static int btrfs_delayed_item_reserve_metadata(struct btrfs_trans_handle *trans,
 	src_rsv = trans->block_rsv;
 	dst_rsv = &fs_info->delayed_block_rsv;
 
-	num_bytes = btrfs_calc_trans_metadata_size(fs_info, 1);
+	num_bytes = btrfs_calc_insert_metadata_size(fs_info, 1);
 
 	/*
 	 * Here we migrate space rsv from transaction rsv, since have already
@@ -612,7 +612,7 @@ static int btrfs_delayed_inode_reserve_metadata(
 	src_rsv = trans->block_rsv;
 	dst_rsv = &fs_info->delayed_block_rsv;
 
-	num_bytes = btrfs_calc_trans_metadata_size(fs_info, 1);
+	num_bytes = btrfs_calc_insert_metadata_size(fs_info, 1);
 
 	/*
 	 * btrfs_dirty_inode will update the inode under btrfs_join_transaction
diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 9a91d1eb0af4..951a60c740e7 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -79,7 +79,7 @@ int btrfs_should_throttle_delayed_refs(struct btrfs_trans_handle *trans)
 void btrfs_delayed_refs_rsv_release(struct btrfs_fs_info *fs_info, int nr)
 {
 	struct btrfs_block_rsv *block_rsv = &fs_info->delayed_refs_rsv;
-	u64 num_bytes = btrfs_calc_trans_metadata_size(fs_info, nr);
+	u64 num_bytes = btrfs_calc_insert_metadata_size(fs_info, nr);
 	u64 released = 0;
 
 	released = __btrfs_block_rsv_release(fs_info, block_rsv, num_bytes,
@@ -105,8 +105,8 @@ void btrfs_update_delayed_refs_rsv(struct btrfs_trans_handle *trans)
 	if (!trans->delayed_ref_updates)
 		return;
 
-	num_bytes = btrfs_calc_trans_metadata_size(fs_info,
-						   trans->delayed_ref_updates);
+	num_bytes = btrfs_calc_insert_metadata_size(fs_info,
+						    trans->delayed_ref_updates);
 	spin_lock(&delayed_rsv->lock);
 	delayed_rsv->size += num_bytes;
 	delayed_rsv->full = 0;
@@ -174,7 +174,7 @@ int btrfs_delayed_refs_rsv_refill(struct btrfs_fs_info *fs_info,
 				  enum btrfs_reserve_flush_enum flush)
 {
 	struct btrfs_block_rsv *block_rsv = &fs_info->delayed_refs_rsv;
-	u64 limit = btrfs_calc_trans_metadata_size(fs_info, 1);
+	u64 limit = btrfs_calc_insert_metadata_size(fs_info, 1);
 	u64 num_bytes = 0;
 	int ret = -ENOSPC;
 
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index b31991f0f440..1cb694c96500 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2511,7 +2511,7 @@ int btrfs_punch_hole_range(struct inode *inode, struct btrfs_path *path,
 			   struct btrfs_trans_handle **trans_out)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-	u64 min_size = btrfs_calc_trans_metadata_size(fs_info, 1);
+	u64 min_size = btrfs_calc_insert_metadata_size(fs_info, 1);
 	u64 ino_size = round_up(inode->i_size, fs_info->sectorsize);
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 	struct btrfs_trans_handle *trans = NULL;
@@ -2530,7 +2530,7 @@ int btrfs_punch_hole_range(struct inode *inode, struct btrfs_path *path,
 		ret = -ENOMEM;
 		goto out;
 	}
-	rsv->size = btrfs_calc_trans_metadata_size(fs_info, 1);
+	rsv->size = btrfs_calc_insert_metadata_size(fs_info, 1);
 	rsv->failfast = 1;
 
 	/*
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index faaf57a7c289..265dc75f7a7a 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -211,8 +211,8 @@ int btrfs_check_trunc_cache_free_space(struct btrfs_fs_info *fs_info,
 	int ret;
 
 	/* 1 for slack space, 1 for updating the inode */
-	needed_bytes = btrfs_calc_trunc_metadata_size(fs_info, 1) +
-		btrfs_calc_trans_metadata_size(fs_info, 1);
+	needed_bytes = btrfs_calc_insert_metadata_size(fs_info, 1) +
+		btrfs_calc_metadata_size(fs_info, 1);
 
 	spin_lock(&rsv->lock);
 	if (rsv->reserved < needed_bytes)
diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
index 86031cdfc356..63cad7865d75 100644
--- a/fs/btrfs/inode-map.c
+++ b/fs/btrfs/inode-map.c
@@ -436,7 +436,7 @@ int btrfs_save_ino_cache(struct btrfs_root *root,
 	 * 1 item for free space object
 	 * 3 items for pre-allocation
 	 */
-	trans->bytes_reserved = btrfs_calc_trans_metadata_size(fs_info, 10);
+	trans->bytes_reserved = btrfs_calc_insert_metadata_size(fs_info, 10);
 	ret = btrfs_block_rsv_add(root, trans->block_rsv,
 				  trans->bytes_reserved,
 				  BTRFS_RESERVE_NO_FLUSH);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 612c25aac15c..a884a34131a4 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5336,7 +5336,7 @@ static struct btrfs_trans_handle *evict_refill_and_join(struct btrfs_root *root,
 {
 	struct btrfs_fs_info *fs_info = root->fs_info;
 	struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv;
-	u64 delayed_refs_extra = btrfs_calc_trans_metadata_size(fs_info, 1);
+	u64 delayed_refs_extra = btrfs_calc_insert_metadata_size(fs_info, 1);
 	int failures = 0;
 
 	for (;;) {
@@ -5435,7 +5435,7 @@ void btrfs_evict_inode(struct inode *inode)
 	rsv = btrfs_alloc_block_rsv(fs_info, BTRFS_BLOCK_RSV_TEMP);
 	if (!rsv)
 		goto no_delete;
-	rsv->size = btrfs_calc_trunc_metadata_size(fs_info, 1);
+	rsv->size = btrfs_calc_metadata_size(fs_info, 1);
 	rsv->failfast = 1;
 
 	btrfs_i_size_write(BTRFS_I(inode), 0);
@@ -9049,7 +9049,7 @@ static int btrfs_truncate(struct inode *inode, bool skip_writeback)
 	int ret;
 	struct btrfs_trans_handle *trans;
 	u64 mask = fs_info->sectorsize - 1;
-	u64 min_size = btrfs_calc_trunc_metadata_size(fs_info, 1);
+	u64 min_size = btrfs_calc_metadata_size(fs_info, 1);
 
 	if (!skip_writeback) {
 		ret = btrfs_wait_ordered_range(inode, inode->i_size & (~mask),
diff --git a/fs/btrfs/props.c b/fs/btrfs/props.c
index e0469816c678..1e664e0b59b8 100644
--- a/fs/btrfs/props.c
+++ b/fs/btrfs/props.c
@@ -362,7 +362,7 @@ static int inherit_props(struct btrfs_trans_handle *trans,
 		 * reservations if we do add more properties in the future.
 		 */
 		if (need_reserve) {
-			num_bytes = btrfs_calc_trans_metadata_size(fs_info, 1);
+			num_bytes = btrfs_calc_insert_metadata_size(fs_info, 1);
 			ret = btrfs_block_rsv_add(root, trans->block_rsv,
 					num_bytes, BTRFS_RESERVE_NO_FLUSH);
 			if (ret)
diff --git a/fs/btrfs/root-tree.c b/fs/btrfs/root-tree.c
index 47733fb55df7..3b17b647d002 100644
--- a/fs/btrfs/root-tree.c
+++ b/fs/btrfs/root-tree.c
@@ -533,7 +533,7 @@ int btrfs_subvolume_reserve_metadata(struct btrfs_root *root,
 			return ret;
 	}
 
-	num_bytes = btrfs_calc_trans_metadata_size(fs_info, items);
+	num_bytes = btrfs_calc_insert_metadata_size(fs_info, items);
 	rsv->space_info = btrfs_find_space_info(fs_info,
 					    BTRFS_BLOCK_GROUP_METADATA);
 	ret = btrfs_block_rsv_add(root, rsv, num_bytes,
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index c8867a0d9f5a..9c5f81074cd5 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -346,7 +346,7 @@ static inline u64 calc_reclaim_items_nr(struct btrfs_fs_info *fs_info,
 	u64 bytes;
 	u64 nr;
 
-	bytes = btrfs_calc_trans_metadata_size(fs_info, 1);
+	bytes = btrfs_calc_insert_metadata_size(fs_info, 1);
 	nr = div64_u64(to_reclaim, bytes);
 	if (!nr)
 		nr = 1;
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 2e3f6778bfa3..f21416d68c2c 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -485,7 +485,7 @@ start_transaction(struct btrfs_root *root, unsigned int num_items,
 		 * worth of delayed refs updates in this trans handle, and
 		 * refill that amount for whatever is missing in the reserve.
 		 */
-		num_bytes = btrfs_calc_trans_metadata_size(fs_info, num_items);
+		num_bytes = btrfs_calc_insert_metadata_size(fs_info, num_items);
 		if (delayed_refs_rsv->full == 0) {
 			delayed_refs_bytes = num_bytes;
 			num_bytes <<= 1;
@@ -636,7 +636,7 @@ struct btrfs_trans_handle *btrfs_start_transaction_fallback_global_rsv(
 	if (IS_ERR(trans))
 		return trans;
 
-	num_bytes = btrfs_calc_trans_metadata_size(fs_info, num_items);
+	num_bytes = btrfs_calc_insert_metadata_size(fs_info, num_items);
 	ret = btrfs_cond_migrate_bytes(fs_info, &fs_info->trans_block_rsv,
 				       num_bytes, min_factor);
 	if (ret) {
-- 
2.21.0



* [PATCH 2/3] btrfs: only reserve metadata_size for inodes
  2019-08-16 15:05 [PATCH 0/3] Rework the worst case calculations for space reservation Josef Bacik
  2019-08-16 15:05 ` [PATCH 1/3] btrfs: rename the btrfs_calc_*_metadata_size helpers Josef Bacik
@ 2019-08-16 15:05 ` Josef Bacik
  2019-08-19  9:17   ` Nikolay Borisov
  2019-08-16 15:06 ` [PATCH 3/3] btrfs: global reserve fallback should use metadata_size Josef Bacik
  2 siblings, 1 reply; 11+ messages in thread
From: Josef Bacik @ 2019-08-16 15:05 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Historically we reserved the worst case for every btree operation, and
generally speaking we want to do that in cases where it could really be
the worst case.  However, for updating inodes we know the inode item is
already in the tree, so it will only ever be an update operation and
never an insert.  This allows us to reserve only the metadata_size
amount for inode updates rather than the insert_metadata_size amount.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/delalloc-space.c | 15 ++++++++++++---
 fs/btrfs/delayed-inode.c  |  2 +-
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/delalloc-space.c b/fs/btrfs/delalloc-space.c
index 2412be4a3de2..b8111ebdc92a 100644
--- a/fs/btrfs/delalloc-space.c
+++ b/fs/btrfs/delalloc-space.c
@@ -251,9 +251,16 @@ static void btrfs_calculate_inode_block_rsv_size(struct btrfs_fs_info *fs_info,
 
 	lockdep_assert_held(&inode->lock);
 	outstanding_extents = inode->outstanding_extents;
-	if (outstanding_extents)
+
+	/*
+	 * Insert size for the number of outstanding extents, 1 normal size for
+	 * updating the inode.
+	 */
+	if (outstanding_extents) {
 		reserve_size = btrfs_calc_insert_metadata_size(fs_info,
-						outstanding_extents + 1);
+						outstanding_extents);
+		reserve_size += btrfs_calc_metadata_size(fs_info, 1);
+	}
 	csum_leaves = btrfs_csum_bytes_to_leaves(fs_info,
 						 inode->csum_bytes);
 	reserve_size += btrfs_calc_insert_metadata_size(fs_info,
@@ -278,10 +285,12 @@ static void calc_inode_reservations(struct btrfs_fs_info *fs_info,
 {
 	u64 nr_extents = count_max_extents(num_bytes);
 	u64 csum_leaves = btrfs_csum_bytes_to_leaves(fs_info, num_bytes);
+	u64 inode_update = btrfs_calc_metadata_size(fs_info, 1);
 
 	/* We add one for the inode update at finish ordered time */
 	*meta_reserve = btrfs_calc_insert_metadata_size(fs_info,
-						nr_extents + csum_leaves + 1);
+						nr_extents + csum_leaves);
+	*meta_reserve += inode_update;
 	*qgroup_reserve = nr_extents * fs_info->nodesize;
 }
 
diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index de87ea7ce84d..9318cf761a07 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -612,7 +612,7 @@ static int btrfs_delayed_inode_reserve_metadata(
 	src_rsv = trans->block_rsv;
 	dst_rsv = &fs_info->delayed_block_rsv;
 
-	num_bytes = btrfs_calc_insert_metadata_size(fs_info, 1);
+	num_bytes = btrfs_calc_metadata_size(fs_info, 1);
 
 	/*
 	 * btrfs_dirty_inode will update the inode under btrfs_join_transaction
-- 
2.21.0



* [PATCH 3/3] btrfs: global reserve fallback should use metadata_size
  2019-08-16 15:05 [PATCH 0/3] Rework the worst case calculations for space reservation Josef Bacik
  2019-08-16 15:05 ` [PATCH 1/3] btrfs: rename the btrfs_calc_*_metadata_size helpers Josef Bacik
  2019-08-16 15:05 ` [PATCH 2/3] btrfs: only reserve metadata_size for inodes Josef Bacik
@ 2019-08-16 15:06 ` Josef Bacik
  2019-08-16 15:35   ` Filipe Manana
  2 siblings, 1 reply; 11+ messages in thread
From: Josef Bacik @ 2019-08-16 15:06 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

We only use the global reserve fallback for truncates, so use
calc_metadata_size instead of calc_insert_metadata_size.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/transaction.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index f21416d68c2c..cc1a3000a344 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -636,7 +636,7 @@ struct btrfs_trans_handle *btrfs_start_transaction_fallback_global_rsv(
 	if (IS_ERR(trans))
 		return trans;
 
-	num_bytes = btrfs_calc_insert_metadata_size(fs_info, num_items);
+	num_bytes = btrfs_calc_metadata_size(fs_info, num_items);
 	ret = btrfs_cond_migrate_bytes(fs_info, &fs_info->trans_block_rsv,
 				       num_bytes, min_factor);
 	if (ret) {
-- 
2.21.0



* Re: [PATCH 3/3] btrfs: global reserve fallback should use metadata_size
  2019-08-16 15:06 ` [PATCH 3/3] btrfs: global reserve fallback should use metadata_size Josef Bacik
@ 2019-08-16 15:35   ` Filipe Manana
  2019-08-16 16:51     ` Josef Bacik
  0 siblings, 1 reply; 11+ messages in thread
From: Filipe Manana @ 2019-08-16 15:35 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Fri, Aug 16, 2019 at 4:08 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> We only use the global reserve fallback for truncates, so use

For truncates?
I would say only for unlinks, rmdir and removing empty block groups.
Or did some of your previous patches change that, and I missed it,
and now only truncates use it?

> calc_metadata_size instead of calc_insert_metadata_size.

It wouldn't hurt to be less vague and mention why we make this change (if
this is still used for unlinks/bg removal, we still need to insert an
orphan item, not just remove items).

Thanks.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/transaction.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index f21416d68c2c..cc1a3000a344 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -636,7 +636,7 @@ struct btrfs_trans_handle *btrfs_start_transaction_fallback_global_rsv(
>         if (IS_ERR(trans))
>                 return trans;
>
> -       num_bytes = btrfs_calc_insert_metadata_size(fs_info, num_items);
> +       num_bytes = btrfs_calc_metadata_size(fs_info, num_items);
>         ret = btrfs_cond_migrate_bytes(fs_info, &fs_info->trans_block_rsv,
>                                        num_bytes, min_factor);
>         if (ret) {
> --
> 2.21.0
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”


* Re: [PATCH 3/3] btrfs: global reserve fallback should use metadata_size
  2019-08-16 15:35   ` Filipe Manana
@ 2019-08-16 16:51     ` Josef Bacik
  0 siblings, 0 replies; 11+ messages in thread
From: Josef Bacik @ 2019-08-16 16:51 UTC (permalink / raw)
  To: Filipe Manana; +Cc: Josef Bacik, linux-btrfs, kernel-team

On Fri, Aug 16, 2019 at 04:35:42PM +0100, Filipe Manana wrote:
> On Fri, Aug 16, 2019 at 4:08 PM Josef Bacik <josef@toxicpanda.com> wrote:
> >
> > We only use the global reserve fallback for truncates, so use
> 
> For truncates?
> I would say only for unlinks, rmdir and removing empty block groups.
> Or did some of your previous patches changed that, and I missed it,
> and now only truncates use it?
> 

Sorry, I misspoke, but the same basic idea applies: we only use it when we're
removing shit, not adding shit.

> > calc_metadata_size instead of calc_insert_metadata_size.
> 
> I wouldn't hurt to be less vague and mention why we do this change (if
> this is still used for unlinks/bg removal, we still need to insertion
> orphan item, not just remove items).
> 

Argh, I misread the orphan stuff; I thought it was earlier/separate from this
usage.  You're right, let me see if I can just axe this altogether.  Thanks,

Josef


* Re: [PATCH 1/3] btrfs: rename the btrfs_calc_*_metadata_size helpers
  2019-08-16 15:05 ` [PATCH 1/3] btrfs: rename the btrfs_calc_*_metadata_size helpers Josef Bacik
@ 2019-08-19  8:30   ` Nikolay Borisov
  2019-08-19 12:47     ` Josef Bacik
  0 siblings, 1 reply; 11+ messages in thread
From: Nikolay Borisov @ 2019-08-19  8:30 UTC (permalink / raw)
  To: Josef Bacik, kernel-team, linux-btrfs



On 16.08.19 г. 18:05 ч., Josef Bacik wrote:
> btrfs_calc_trunc_metadata_size differs from btrfs_calc_trans_metadata_size
> in that it doesn't take into account any splitting at any level, because
> truncate will never split nodes.  However, neither truncate nor an
> in-place change will ever split nodes, so rename
> btrfs_calc_trunc_metadata_size to btrfs_calc_metadata_size.
> btrfs_calc_trans_metadata_size, in turn, is purely for inserting items,
> so rename it to btrfs_calc_insert_metadata_size.  Making these names
> clearer will help when I start using them differently in upcoming
> patches.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/block-group.c      |  4 ++--
>  fs/btrfs/ctree.h            | 14 +++++++++-----
>  fs/btrfs/delalloc-space.c   |  8 ++++----
>  fs/btrfs/delayed-inode.c    |  4 ++--
>  fs/btrfs/delayed-ref.c      |  8 ++++----
>  fs/btrfs/file.c             |  4 ++--
>  fs/btrfs/free-space-cache.c |  4 ++--
>  fs/btrfs/inode-map.c        |  2 +-
>  fs/btrfs/inode.c            |  6 +++---
>  fs/btrfs/props.c            |  2 +-
>  fs/btrfs/root-tree.c        |  2 +-
>  fs/btrfs/space-info.c       |  2 +-
>  fs/btrfs/transaction.c      |  4 ++--
>  13 files changed, 34 insertions(+), 30 deletions(-)

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

See one discussion point below.
> 
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index afae5c731904..3147e840f839 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -3014,8 +3014,8 @@ void check_system_chunk(struct btrfs_trans_handle *trans, u64 type)
>  	num_devs = get_profile_num_devs(fs_info, type);
>  
>  	/* num_devs device items to update and 1 chunk item to add or remove */
> -	thresh = btrfs_calc_trunc_metadata_size(fs_info, num_devs) +
> -		btrfs_calc_trans_metadata_size(fs_info, 1);
> +	thresh = btrfs_calc_metadata_size(fs_info, num_devs) +
> +		btrfs_calc_insert_metadata_size(fs_info, 1);
>  
>  	if (left < thresh && btrfs_test_opt(fs_info, ENOSPC_DEBUG)) {
>  		btrfs_info(fs_info, "left=%llu, need=%llu, flags=%llu",
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 85b808e3ea42..f352aa098015 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -2450,17 +2450,21 @@ int btrfs_get_extent_inline_ref_type(const struct extent_buffer *eb,
>  
>  u64 btrfs_csum_bytes_to_leaves(struct btrfs_fs_info *fs_info, u64 csum_bytes);
>  
> -static inline u64 btrfs_calc_trans_metadata_size(struct btrfs_fs_info *fs_info,
> -						 unsigned num_items)
> +/*
> + * Use this if we would be adding new items, as we could split nodes as we cow
> + * down the tree.
> + */
> +static inline u64 btrfs_calc_insert_metadata_size(struct btrfs_fs_info *fs_info,
> +						  unsigned num_items)
>  {
>  	return (u64)fs_info->nodesize * BTRFS_MAX_LEVEL * 2 * num_items;
>  }
>  

Isn't assuming that we are going to split on every level of the cow
rather pessimistic, bordering on impossible?  Isn't it realistically
possible that we will only ever split up to root->level?

<snip>


* Re: [PATCH 2/3] btrfs: only reserve metadata_size for inodes
  2019-08-16 15:05 ` [PATCH 2/3] btrfs: only reserve metadata_size for inodes Josef Bacik
@ 2019-08-19  9:17   ` Nikolay Borisov
  2019-08-19 12:49     ` Josef Bacik
  0 siblings, 1 reply; 11+ messages in thread
From: Nikolay Borisov @ 2019-08-19  9:17 UTC (permalink / raw)
  To: Josef Bacik, kernel-team, linux-btrfs



On 16.08.19 г. 18:05 ч., Josef Bacik wrote:
> Historically we reserved the worst case for every btree operation, and
> generally speaking we want to do that in cases where it could really be
> the worst case.  However, for updating inodes we know the inode item is
> already in the tree, so it will only ever be an update operation and
> never an insert.  This allows us to reserve only the metadata_size
> amount for inode updates rather than the insert_metadata_size amount.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

This alleviates some of the reservation pressure, so:

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

However, see one small nit below.

> ---
>  fs/btrfs/delalloc-space.c | 15 ++++++++++++---
>  fs/btrfs/delayed-inode.c  |  2 +-
>  2 files changed, 13 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/delalloc-space.c b/fs/btrfs/delalloc-space.c
> index 2412be4a3de2..b8111ebdc92a 100644
> --- a/fs/btrfs/delalloc-space.c
> +++ b/fs/btrfs/delalloc-space.c
> @@ -251,9 +251,16 @@ static void btrfs_calculate_inode_block_rsv_size(struct btrfs_fs_info *fs_info,
>  
>  	lockdep_assert_held(&inode->lock);
>  	outstanding_extents = inode->outstanding_extents;
> -	if (outstanding_extents)
> +
> +	/*
> +	 * Insert size for the number of outstanding extents, 1 normal size for
> +	 * updating the inode.
> +	 */
> +	if (outstanding_extents) {
>  		reserve_size = btrfs_calc_insert_metadata_size(fs_info,
> -						outstanding_extents + 1);
> +						outstanding_extents);
> +		reserve_size += btrfs_calc_metadata_size(fs_info, 1);
> +	}
>  	csum_leaves = btrfs_csum_bytes_to_leaves(fs_info,
>  						 inode->csum_bytes);
>  	reserve_size += btrfs_calc_insert_metadata_size(fs_info,
> @@ -278,10 +285,12 @@ static void calc_inode_reservations(struct btrfs_fs_info *fs_info,
>  {
>  	u64 nr_extents = count_max_extents(num_bytes);
>  	u64 csum_leaves = btrfs_csum_bytes_to_leaves(fs_info, num_bytes);
> +	u64 inode_update = btrfs_calc_metadata_size(fs_info, 1);
>  
>  	/* We add one for the inode update at finish ordered time */

This comment becomes somewhat outdated and should be removed/reworded.
Perhaps put it above *meta_reserve += inode_update.

>  	*meta_reserve = btrfs_calc_insert_metadata_size(fs_info,
> -						nr_extents + csum_leaves + 1);
> +						nr_extents + csum_leaves);
> +	*meta_reserve += inode_update;
>  	*qgroup_reserve = nr_extents * fs_info->nodesize;
>  }
>  
> diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
> index de87ea7ce84d..9318cf761a07 100644
> --- a/fs/btrfs/delayed-inode.c
> +++ b/fs/btrfs/delayed-inode.c
> @@ -612,7 +612,7 @@ static int btrfs_delayed_inode_reserve_metadata(
>  	src_rsv = trans->block_rsv;
>  	dst_rsv = &fs_info->delayed_block_rsv;
>  
> -	num_bytes = btrfs_calc_insert_metadata_size(fs_info, 1);
> +	num_bytes = btrfs_calc_metadata_size(fs_info, 1);
>  
>  	/*
>  	 * btrfs_dirty_inode will update the inode under btrfs_join_transaction
> 


* Re: [PATCH 1/3] btrfs: rename the btrfs_calc_*_metadata_size helpers
  2019-08-19  8:30   ` Nikolay Borisov
@ 2019-08-19 12:47     ` Josef Bacik
  2019-08-20 17:40       ` David Sterba
  0 siblings, 1 reply; 11+ messages in thread
From: Josef Bacik @ 2019-08-19 12:47 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: Josef Bacik, kernel-team, linux-btrfs

On Mon, Aug 19, 2019 at 11:30:16AM +0300, Nikolay Borisov wrote:
> 
> 
> On 16.08.19 г. 18:05 ч., Josef Bacik wrote:
> > btrfs_calc_trunc_metadata_size differs from btrfs_calc_trans_metadata_size
> > in that it doesn't take into account any splitting at any level, because
> > truncate will never split nodes.  However, neither truncate nor an
> > in-place change will ever split nodes, so rename
> > btrfs_calc_trunc_metadata_size to btrfs_calc_metadata_size.
> > btrfs_calc_trans_metadata_size, in turn, is purely for inserting items,
> > so rename it to btrfs_calc_insert_metadata_size.  Making these names
> > clearer will help when I start using them differently in upcoming
> > patches.
> > 
> > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> > ---
> >  fs/btrfs/block-group.c      |  4 ++--
> >  fs/btrfs/ctree.h            | 14 +++++++++-----
> >  fs/btrfs/delalloc-space.c   |  8 ++++----
> >  fs/btrfs/delayed-inode.c    |  4 ++--
> >  fs/btrfs/delayed-ref.c      |  8 ++++----
> >  fs/btrfs/file.c             |  4 ++--
> >  fs/btrfs/free-space-cache.c |  4 ++--
> >  fs/btrfs/inode-map.c        |  2 +-
> >  fs/btrfs/inode.c            |  6 +++---
> >  fs/btrfs/props.c            |  2 +-
> >  fs/btrfs/root-tree.c        |  2 +-
> >  fs/btrfs/space-info.c       |  2 +-
> >  fs/btrfs/transaction.c      |  4 ++--
> >  13 files changed, 34 insertions(+), 30 deletions(-)
> 
> Reviewed-by: Nikolay Borisov <nborisov@suse.com> see one discussion
> point below.
> > 
> > diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> > index afae5c731904..3147e840f839 100644
> > --- a/fs/btrfs/block-group.c
> > +++ b/fs/btrfs/block-group.c
> > @@ -3014,8 +3014,8 @@ void check_system_chunk(struct btrfs_trans_handle *trans, u64 type)
> >  	num_devs = get_profile_num_devs(fs_info, type);
> >  
> >  	/* num_devs device items to update and 1 chunk item to add or remove */
> > -	thresh = btrfs_calc_trunc_metadata_size(fs_info, num_devs) +
> > -		btrfs_calc_trans_metadata_size(fs_info, 1);
> > +	thresh = btrfs_calc_metadata_size(fs_info, num_devs) +
> > +		btrfs_calc_insert_metadata_size(fs_info, 1);
> >  
> >  	if (left < thresh && btrfs_test_opt(fs_info, ENOSPC_DEBUG)) {
> >  		btrfs_info(fs_info, "left=%llu, need=%llu, flags=%llu",
> > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> > index 85b808e3ea42..f352aa098015 100644
> > --- a/fs/btrfs/ctree.h
> > +++ b/fs/btrfs/ctree.h
> > @@ -2450,17 +2450,21 @@ int btrfs_get_extent_inline_ref_type(const struct extent_buffer *eb,
> >  
> >  u64 btrfs_csum_bytes_to_leaves(struct btrfs_fs_info *fs_info, u64 csum_bytes);
> >  
> > -static inline u64 btrfs_calc_trans_metadata_size(struct btrfs_fs_info *fs_info,
> > -						 unsigned num_items)
> > +/*
> > + * Use this if we would be adding new items, as we could split nodes as we cow
> > + * down the tree.
> > + */
> > +static inline u64 btrfs_calc_insert_metadata_size(struct btrfs_fs_info *fs_info,
> > +						  unsigned num_items)
> >  {
> >  	return (u64)fs_info->nodesize * BTRFS_MAX_LEVEL * 2 * num_items;
> >  }
> >  
> 
> Isn't assuming that we are going to split on every level of the cow
> rather pessimistic, bordering on impossible? Isn't it realistically
> possible that we will only ever split up to root->level?
> 

I had this discussion with Chris when I messed with this.  We do pass in the
root we intend on messing with so we could very well do this, but it sort of
scares me because maybe we've been using more of our worst case reservations
than I expected.

My plan is to get these last corners worked out, get a few more months of
production testing, and then start experimenting with using the root->level + 1
for the maximum level and see how that goes.  The ramp up from 1 level to 3
level roots happens pretty fast, so I suspect there'll be weird corner cases
going from empty->not empty, but I _think_ we should be fine to make this change
further down the road?

Also I will probably start with _just_ the transaction reservations, and leave
the delayed refs calculations the absolute worst case since those can explode.
Thanks,

Josef

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] btrfs: only reserve metadata_size for inodes
  2019-08-19  9:17   ` Nikolay Borisov
@ 2019-08-19 12:49     ` Josef Bacik
  0 siblings, 0 replies; 11+ messages in thread
From: Josef Bacik @ 2019-08-19 12:49 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: Josef Bacik, kernel-team, linux-btrfs

On Mon, Aug 19, 2019 at 12:17:07PM +0300, Nikolay Borisov wrote:
> 
> 
> On 16.08.19 at 18:05, Josef Bacik wrote:
> > Historically we reserved worst case for every btree operation, and
> > generally speaking we want to do that in cases where it could be the
> > worst case.  However for updating inodes we know the inode items are
> > already in the tree, so it will only be an update operation and never an
> > insert operation.  This allows us to always reserve only the
> > metadata_size amount for inode updates rather than the
> > insert_metadata_size amount.
> > 
> > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> 
> This alleviates some of the reservation pressure, so:
> 
> Reviewed-by: Nikolay Borisov <nborisov@suse.com>, however one small nit
> below.
> 
> > ---
> >  fs/btrfs/delalloc-space.c | 15 ++++++++++++---
> >  fs/btrfs/delayed-inode.c  |  2 +-
> >  2 files changed, 13 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/btrfs/delalloc-space.c b/fs/btrfs/delalloc-space.c
> > index 2412be4a3de2..b8111ebdc92a 100644
> > --- a/fs/btrfs/delalloc-space.c
> > +++ b/fs/btrfs/delalloc-space.c
> > @@ -251,9 +251,16 @@ static void btrfs_calculate_inode_block_rsv_size(struct btrfs_fs_info *fs_info,
> >  
> >  	lockdep_assert_held(&inode->lock);
> >  	outstanding_extents = inode->outstanding_extents;
> > -	if (outstanding_extents)
> > +
> > +	/*
> > +	 * Insert size for the number of outstanding extents, 1 normal size for
> > +	 * updating the inode.
> > +	 */
> > +	if (outstanding_extents) {
> >  		reserve_size = btrfs_calc_insert_metadata_size(fs_info,
> > -						outstanding_extents + 1);
> > +						outstanding_extents);
> > +		reserve_size += btrfs_calc_metadata_size(fs_info, 1);
> > +	}
> >  	csum_leaves = btrfs_csum_bytes_to_leaves(fs_info,
> >  						 inode->csum_bytes);
> >  	reserve_size += btrfs_calc_insert_metadata_size(fs_info,
> > @@ -278,10 +285,12 @@ static void calc_inode_reservations(struct btrfs_fs_info *fs_info,
> >  {
> >  	u64 nr_extents = count_max_extents(num_bytes);
> >  	u64 csum_leaves = btrfs_csum_bytes_to_leaves(fs_info, num_bytes);
> > +	u64 inode_update = btrfs_calc_metadata_size(fs_info, 1);
> >  
> >  	/* We add one for the inode update at finish ordered time */
> 
> This comment becomes somewhat outdated and should be removed/reworded.
> Perhaps put above *meta_reserve += inode_update.
> 

Yup I'll update this so it's fixed in the next version.  Thanks,

Josef


* Re: [PATCH 1/3] btrfs: rename the btrfs_calc_*_metadata_size helpers
  2019-08-19 12:47     ` Josef Bacik
@ 2019-08-20 17:40       ` David Sterba
  0 siblings, 0 replies; 11+ messages in thread
From: David Sterba @ 2019-08-20 17:40 UTC (permalink / raw)
  To: Josef Bacik; +Cc: Nikolay Borisov, kernel-team, linux-btrfs

On Mon, Aug 19, 2019 at 08:47:28AM -0400, Josef Bacik wrote:
> > > +static inline u64 btrfs_calc_insert_metadata_size(struct btrfs_fs_info *fs_info,
> > > +						  unsigned num_items)
> > >  {
> > >  	return (u64)fs_info->nodesize * BTRFS_MAX_LEVEL * 2 * num_items;
> > >  }
> > 
> > Isn't assuming that we are going to split on every level of the cow
> > rather pessimistic, bordering on impossible? Isn't it realistically
> > possible that we will only ever split up to root->level?

This is a natural idea, I had it as well a long time ago, and the
argument why this does not work is that the level can increase between
the time of reservation and the time of commit. So this would lead to
ENOSPC where it is not expected, and then an abort.

> I had this discussion with Chris when I messed with this.  We do pass in the
> root we intend on messing with so we could very well do this, but it sort of
> scares me because maybe we've been using more of our worst case reservations
> than I expected.
> 
> My plan is to get these last corners worked out, get a few more months of
> production testing, and then start experimenting with using the root->level + 1
> for the maximum level and see how that goes.  The ramp up from 1 level to 3
> level roots happens pretty fast, so I suspect there'll be weird corner cases
> going from empty->not empty, but I _think_ we should be fine to make this change
> further down the road?

I had a patch for that and tried to do worst case calculations for each
level, ie. what's the minimum number of items that would require
splitting and increasing the level.

Starting with level 4, the +1 should be safe; for 5+ it is safe unless,
e.g., the commit period is so long that the amount of dirty metadata is
out of scale (the period is affected by the commit= mount option, so not
a typical use case).

So the whole idea should work. The formula I used was

  level = max(4, root->level + 1)

instead of BTRFS_MAX_LEVEL.

For start, setting the minimum to 5 would be IMHO safe enough, the
savings in the over-reservations would be 3*nodesize in most cases, or
~38%. This could help to avoid early ENOSPC.


Thread overview: 11+ messages
2019-08-16 15:05 [PATCH 0/3] Rework the worst case calculations for space reservation Josef Bacik
2019-08-16 15:05 ` [PATCH 1/3] btrfs: rename the btrfs_calc_*_metadata_size helpers Josef Bacik
2019-08-19  8:30   ` Nikolay Borisov
2019-08-19 12:47     ` Josef Bacik
2019-08-20 17:40       ` David Sterba
2019-08-16 15:05 ` [PATCH 2/3] btrfs: only reserve metadata_size for inodes Josef Bacik
2019-08-19  9:17   ` Nikolay Borisov
2019-08-19 12:49     ` Josef Bacik
2019-08-16 15:06 ` [PATCH 3/3] btrfs: global reserve fallback should use metadata_size Josef Bacik
2019-08-16 15:35   ` Filipe Manana
2019-08-16 16:51     ` Josef Bacik
