From: fdmanana@kernel.org
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 07/12] btrfs: improve batch deletion of delayed dir index items
Date: Tue, 31 May 2022 16:06:38 +0100
Message-ID: <ddec7fd521fe5b0158241a2e111a54bab253f6d3.1654009356.git.fdmanana@suse.com>
In-Reply-To: <cover.1654009356.git.fdmanana@suse.com>

From: Filipe Manana <fdmanana@suse.com>

Currently we group delayed dir index items for deletion in a single batch
(single btree operation) as long as they all exist in the same leaf and as
long as their keys are sequential in the key space. For example, if we have
a leaf that has dir index items with offsets:

    2, 3, 4, 6, 7, 10

And we have delayed dir index items for deleting all these indexes, and
no delayed items for any other index keys in between, then we end up
deleting in 3 batches:

1) First batch for indexes 2, 3 and 4;
2) Second batch for indexes 6 and 7;
3) Third batch for index 10.

This is a waste because we can delete all the index keys in a single
batch. What matters is that each consecutive delayed index key matches
each consecutive dir index key in a leaf.
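
The difference between the two grouping rules can be illustrated with a
small userspace model. This is only a sketch, not kernel code: plain arrays
of offsets stand in for the leaf's dir index items and for the delayed
items, a linear scan stands in for the btree search, and the function names
count_batches_sequential() and count_batches_leaf_match() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Old rule (model): a batch continues only while consecutive delayed
 * offsets are sequential in the key space (next == prev + 1).
 */
size_t count_batches_sequential(const uint64_t *delayed, size_t ndelayed)
{
	size_t batches = 0;

	for (size_t i = 0; i < ndelayed; i++) {
		if (i == 0 || delayed[i] != delayed[i - 1] + 1)
			batches++;
	}
	return batches;
}

/*
 * New rule (model): a batch continues while the next delayed offset
 * matches the offset in the next slot of the leaf, gaps or not.
 */
size_t count_batches_leaf_match(const uint64_t *leaf, size_t nleaf,
				const uint64_t *delayed, size_t ndelayed)
{
	size_t batches = 0;
	size_t d = 0;

	while (d < ndelayed) {
		size_t slot = 0;

		/* Stand-in for the btree search that locates delayed[d]. */
		while (slot < nleaf && leaf[slot] != delayed[d])
			slot++;
		/* Model assumption: every delayed offset exists in the leaf. */
		assert(slot < nleaf);

		/* The first item of a batch always matches, so just take it. */
		batches++;
		d++;
		slot++;
		while (d < ndelayed && slot < nleaf && leaf[slot] == delayed[d]) {
			d++;
			slot++;
		}
	}
	return batches;
}
```

For the leaf above, with delayed deletions for all six offsets, the first
function returns 3 and the second returns 1.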

So update the logic in btrfs_batch_delete_items() to check only for a
key match between delayed dir index items and dir index items in a leaf.
Also skip the useless first iteration that compares the key of the first
slot to delete with the key of the first delayed item: they always match,
because the delayed item's key was used for the btree search that gave us
the path we have.

This is more efficient: it reduces the time spent running delayed items,
as well as lock contention on the subvolume's tree.
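
A userspace model of the new batch sizing is sketched below. It is not the
kernel code: struct model_key is a simplified stand-in for struct btrfs_key,
arrays replace the leaf and the delayed item tree, and the names
model_comp_keys() and model_batch_size() are hypothetical. The comparison
order (objectid, then type, then offset) follows btrfs key ordering.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct btrfs_key (hypothetical name). */
struct model_key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

/* Three-way comparison in (objectid, type, offset) order. */
int model_comp_keys(const struct model_key *a, const struct model_key *b)
{
	if (a->objectid != b->objectid)
		return a->objectid < b->objectid ? -1 : 1;
	if (a->type != b->type)
		return a->type < b->type ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/*
 * Number of items in the batch that starts at leaf[slot]. delayed[0] is
 * assumed to be the key used for the search that produced 'slot', so it
 * is counted without being compared again.
 */
size_t model_batch_size(const struct model_key *leaf, size_t nritems,
			size_t slot,
			const struct model_key *delayed, size_t ndelayed)
{
	size_t nitems = 1;
	size_t last_slot;

	/* The caller must point at an existing item. */
	assert(nritems > 0 && slot < nritems && ndelayed > 0);
	last_slot = nritems - 1;

	/* Extend the batch while the next delayed key matches the next slot. */
	while (slot < last_slot && nitems < ndelayed) {
		slot++;
		if (model_comp_keys(&delayed[nitems], &leaf[slot]) != 0)
			break;
		nitems++;
	}
	return nitems;
}
```

The batch ends at the first leaf item that has no matching delayed item, at
the last slot of the leaf, or when there are no more delayed items,
whichever comes first.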

For example, the following test script:

  $ cat test.sh
  #!/bin/bash

  DEV=/dev/sdj
  MNT=/mnt/sdj

  mkfs.btrfs -f $DEV
  mount $DEV $MNT

  NUM_FILES=1000

  mkdir $MNT/testdir
  for ((i = 1; i <= $NUM_FILES; i++)); do
      echo -n > $MNT/testdir/file_$i
  done

  # Now delete every other file, to create gaps in the dir index keys.
  for ((i = 1; i <= $NUM_FILES; i += 2)); do
      rm -f $MNT/testdir/file_$i
  done

  # Sync to force any delayed items to be flushed to the tree.
  sync

  start=$(date +%s%N)
  rm -fr $MNT/testdir
  end=$(date +%s%N)
  dur=$(( (end - start) / 1000000 ))

  echo -e "\nrm -fr took $dur milliseconds"

  umount $MNT

Running that test script while having the following bpftrace script
running in another shell:

  $ cat bpf-measure.sh
  #!/usr/bin/bpftrace

  /* Add 'noinline' to btrfs_delete_delayed_items()'s definition. */
  k:btrfs_delete_delayed_items
  {
      @start_delete_delayed_items[tid] = nsecs;
  }

  k:btrfs_del_items
  /@start_delete_delayed_items[tid]/
  {
      @delete_batches = count();
  }

  kr:btrfs_delete_delayed_items
  /@start_delete_delayed_items[tid]/
  {
      $dur = (nsecs - @start_delete_delayed_items[tid]) / 1000;
      @btrfs_delete_delayed_items_total_time = sum($dur);
      delete(@start_delete_delayed_items[tid]);
  }

Before this change (total time is in microseconds):

@btrfs_delete_delayed_items_total_time: 9563
@delete_batches: 1001

After this change:

@btrfs_delete_delayed_items_total_time: 7328
@delete_batches: 509
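
The relative improvement can be derived from these figures with a helper
like the one below. It is only a sketch and the name percent_reduction() is
hypothetical. The total time values are in microseconds, since the bpftrace
script divides the nanosecond delta by 1000.

```c
#include <assert.h>

/* Percentage by which 'after' is smaller than 'before'. */
double percent_reduction(double before, double after)
{
	assert(before > 0.0);
	return (before - after) * 100.0 / before;
}
```

With the numbers above, percent_reduction(9563, 7328) is about 23 (total
time) and percent_reduction(1001, 509) is about 49 (number of batches).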

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/delayed-inode.c | 60 +++++++++++++++++-----------------------
 1 file changed, 25 insertions(+), 35 deletions(-)

diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index 0125586fd565..74c806d3ab2a 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -791,68 +791,58 @@ static int btrfs_batch_delete_items(struct btrfs_trans_handle *trans,
 {
 	struct btrfs_delayed_item *curr, *next;
 	struct extent_buffer *leaf = path->nodes[0];
-	struct btrfs_key key;
-	struct list_head head;
-	int nitems, i, last_item;
-	int ret = 0;
+	LIST_HEAD(batch_list);
+	int nitems, slot, last_slot;
+	int ret;
 
 	ASSERT(leaf != NULL);
 
-	i = path->slots[0];
-	last_item = btrfs_header_nritems(leaf) - 1;
+	slot = path->slots[0];
+	last_slot = btrfs_header_nritems(leaf) - 1;
 	/*
 	 * Our caller always gives us a path pointing to an existing item, so
 	 * this can not happen.
 	 */
-	ASSERT(i <= last_item);
-	if (WARN_ON(i > last_item))
+	ASSERT(slot <= last_slot);
+	if (WARN_ON(slot > last_slot))
 		return -ENOENT;
 
-	next = item;
-	INIT_LIST_HEAD(&head);
-	btrfs_item_key_to_cpu(leaf, &key, i);
-	nitems = 0;
+	nitems = 1;
+	curr = item;
+	list_add_tail(&curr->tree_list, &batch_list);
+
 	/*
-	 * count the number of the dir index items that we can delete in batch
+	 * Keep checking if the next delayed item matches the next item in the
+	 * leaf - if so, we can add it to the batch of items to delete from the
+	 * leaf.
 	 */
-	while (btrfs_comp_cpu_keys(&next->key, &key) == 0) {
-		list_add_tail(&next->tree_list, &head);
-		nitems++;
+	while (slot < last_slot) {
+		struct btrfs_key key;
 
-		curr = next;
 		next = __btrfs_next_delayed_item(curr);
 		if (!next)
 			break;
 
-		if (!btrfs_is_continuous_delayed_item(curr, next))
+		slot++;
+		btrfs_item_key_to_cpu(leaf, &key, slot);
+		if (btrfs_comp_cpu_keys(&next->key, &key) != 0)
 			break;
-
-		i++;
-		if (i > last_item)
-			break;
-		btrfs_item_key_to_cpu(leaf, &key, i);
+		nitems++;
+		curr = next;
+		list_add_tail(&curr->tree_list, &batch_list);
 	}
 
-	/*
-	 * Our caller always gives us a path pointing to an existing item, so
-	 * this can not happen.
-	 */
-	ASSERT(nitems >= 1);
-	if (nitems < 1)
-		return -ENOENT;
-
 	ret = btrfs_del_items(trans, root, path, path->slots[0], nitems);
 	if (ret)
-		goto out;
+		return ret;
 
-	list_for_each_entry_safe(curr, next, &head, tree_list) {
+	list_for_each_entry_safe(curr, next, &batch_list, tree_list) {
 		btrfs_delayed_item_release_metadata(root, curr);
 		list_del(&curr->tree_list);
 		btrfs_release_delayed_item(curr);
 	}
 
-out:
-	return ret;
+	return 0;
 }
 
 static int btrfs_delete_delayed_items(struct btrfs_trans_handle *trans,
-- 
2.35.1

