All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/23] Change data reservations to use the ticketing infra
@ 2020-06-30 13:58 Josef Bacik
  2020-06-30 13:58 ` [PATCH 01/23] btrfs: change nr to u64 in btrfs_start_delalloc_roots Josef Bacik
                   ` (23 more replies)
  0 siblings, 24 replies; 37+ messages in thread
From: Josef Bacik @ 2020-06-30 13:58 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

We've had two different things in place to reserve data and metadata space,
because generally speaking data is much simpler.  However the data reservations
still suffered from the same issues that plagued metadata reservations, you
could get multiple tasks racing in to get reservations.  This causes problems
with cases like write/delete loops where we should be able to fill up the fs,
delete everything, and go again.  You would sometimes race with a flusher that's
trying to unpin the free space, take it's reservations, and then it would fail.

Fix this by moving the data reservations under the metadata ticketing
infrastructure.  This gets rid of that problem, and simplifies our enospc code
more by consolidating it into a single path.  Thanks,

Josef


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 01/23] btrfs: change nr to u64 in btrfs_start_delalloc_roots
  2020-06-30 13:58 [PATCH 00/23] Change data reservations to use the ticketing infra Josef Bacik
@ 2020-06-30 13:58 ` Josef Bacik
  2020-06-30 13:59 ` [PATCH 02/23] btrfs: remove orig from shrink_delalloc Josef Bacik
                   ` (22 subsequent siblings)
  23 siblings, 0 replies; 37+ messages in thread
From: Josef Bacik @ 2020-06-30 13:58 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Nikolay Borisov

We have btrfs_wait_ordered_roots() which takes a u64 for nr, but
btrfs_start_delalloc_roots() that takes an int for nr, which makes using
them in conjunction, especially for something like (u64)-1, annoying and
inconsistent.  Fix btrfs_start_delalloc_roots() to take a u64 for nr and
adjust start_delalloc_inodes() and it's callers appropriately.

This means we've adjusted start_delalloc_inodes() to take a pointer of
nr since we want to preserve the ability for start-delalloc_inodes() to
return an error, so simply make it do the nr adjusting as necessary.

Part of adjusting the callers to this means changing
btrfs_writeback_inodes_sb_nr() to take a u64 for items.  This may be
confusing because it seems unrelated, but the caller of
btrfs_writeback_inodes_sb_nr() already passes in a u64, it's just the
function variable that needs to be changed.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/ctree.h       |  2 +-
 fs/btrfs/dev-replace.c |  2 +-
 fs/btrfs/inode.c       | 27 +++++++++++----------------
 fs/btrfs/ioctl.c       |  2 +-
 fs/btrfs/space-info.c  |  2 +-
 5 files changed, 15 insertions(+), 20 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index e40eb210670d..93cd1ff878cb 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2926,7 +2926,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 			       u32 min_type);
 
 int btrfs_start_delalloc_snapshot(struct btrfs_root *root);
-int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, int nr);
+int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, u64 nr);
 int btrfs_set_extent_delalloc(struct inode *inode, u64 start, u64 end,
 			      unsigned int extra_bits,
 			      struct extent_state **cached_state);
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index db93909b25e0..769ac3098880 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -630,7 +630,7 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
 	 * flush all outstanding I/O and inode extent mappings before the
 	 * copy operation is declared as being finished
 	 */
-	ret = btrfs_start_delalloc_roots(fs_info, -1);
+	ret = btrfs_start_delalloc_roots(fs_info, U64_MAX);
 	if (ret) {
 		mutex_unlock(&dev_replace->lock_finishing_cancel_unmount);
 		return ret;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d301550b9c70..b1ca1e3fcc54 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -9394,7 +9394,8 @@ static struct btrfs_delalloc_work *btrfs_alloc_delalloc_work(struct inode *inode
  * some fairly slow code that needs optimization. This walks the list
  * of all the inodes with pending delalloc and forces them to disk.
  */
-static int start_delalloc_inodes(struct btrfs_root *root, int nr, bool snapshot)
+static int start_delalloc_inodes(struct btrfs_root *root, u64 *nr,
+				 bool snapshot)
 {
 	struct btrfs_inode *binode;
 	struct inode *inode;
@@ -9434,9 +9435,11 @@ static int start_delalloc_inodes(struct btrfs_root *root, int nr, bool snapshot)
 		list_add_tail(&work->list, &works);
 		btrfs_queue_work(root->fs_info->flush_workers,
 				 &work->work);
-		ret++;
-		if (nr != -1 && ret >= nr)
-			goto out;
+		if (*nr != U64_MAX) {
+			(*nr)--;
+			if (*nr == 0)
+				goto out;
+		}
 		cond_resched();
 		spin_lock(&root->delalloc_lock);
 	}
@@ -9461,18 +9464,15 @@ static int start_delalloc_inodes(struct btrfs_root *root, int nr, bool snapshot)
 int btrfs_start_delalloc_snapshot(struct btrfs_root *root)
 {
 	struct btrfs_fs_info *fs_info = root->fs_info;
-	int ret;
+	u64 nr = U64_MAX;
 
 	if (test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state))
 		return -EROFS;
 
-	ret = start_delalloc_inodes(root, -1, true);
-	if (ret > 0)
-		ret = 0;
-	return ret;
+	return start_delalloc_inodes(root, &nr, true);
 }
 
-int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, int nr)
+int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, u64 nr)
 {
 	struct btrfs_root *root;
 	struct list_head splice;
@@ -9495,15 +9495,10 @@ int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, int nr)
 			       &fs_info->delalloc_roots);
 		spin_unlock(&fs_info->delalloc_root_lock);
 
-		ret = start_delalloc_inodes(root, nr, false);
+		ret = start_delalloc_inodes(root, &nr, false);
 		btrfs_put_root(root);
 		if (ret < 0)
 			goto out;
-
-		if (nr != -1) {
-			nr -= ret;
-			WARN_ON(nr < 0);
-		}
 		spin_lock(&fs_info->delalloc_root_lock);
 	}
 	spin_unlock(&fs_info->delalloc_root_lock);
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index b3e4c632d80c..0ffc4b2d2627 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -4829,7 +4829,7 @@ long btrfs_ioctl(struct file *file, unsigned int
 	case BTRFS_IOC_SYNC: {
 		int ret;
 
-		ret = btrfs_start_delalloc_roots(fs_info, -1);
+		ret = btrfs_start_delalloc_roots(fs_info, U64_MAX);
 		if (ret)
 			return ret;
 		ret = btrfs_sync_fs(inode->i_sb, 1);
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 41ee88633769..571b87cf1c41 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -477,7 +477,7 @@ void btrfs_dump_space_info(struct btrfs_fs_info *fs_info,
 }
 
 static void btrfs_writeback_inodes_sb_nr(struct btrfs_fs_info *fs_info,
-					 unsigned long nr_pages, int nr_items)
+					 unsigned long nr_pages, u64 nr_items)
 {
 	struct super_block *sb = fs_info->sb;
 
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 02/23] btrfs: remove orig from shrink_delalloc
  2020-06-30 13:58 [PATCH 00/23] Change data reservations to use the ticketing infra Josef Bacik
  2020-06-30 13:58 ` [PATCH 01/23] btrfs: change nr to u64 in btrfs_start_delalloc_roots Josef Bacik
@ 2020-06-30 13:59 ` Josef Bacik
  2020-06-30 13:59 ` [PATCH 03/23] btrfs: handle U64_MAX for shrink_delalloc Josef Bacik
                   ` (21 subsequent siblings)
  23 siblings, 0 replies; 37+ messages in thread
From: Josef Bacik @ 2020-06-30 13:59 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Nikolay Borisov, Johannes Thumshirn

We don't use this anywhere inside of shrink_delalloc, remove it.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/space-info.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 571b87cf1c41..266b9887fda4 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -517,7 +517,7 @@ static inline u64 calc_reclaim_items_nr(struct btrfs_fs_info *fs_info,
  * shrink metadata reservation for delalloc
  */
 static void shrink_delalloc(struct btrfs_fs_info *fs_info, u64 to_reclaim,
-			    u64 orig, bool wait_ordered)
+			    bool wait_ordered)
 {
 	struct btrfs_space_info *space_info;
 	struct btrfs_trans_handle *trans;
@@ -742,7 +742,7 @@ static void flush_space(struct btrfs_fs_info *fs_info,
 		break;
 	case FLUSH_DELALLOC:
 	case FLUSH_DELALLOC_WAIT:
-		shrink_delalloc(fs_info, num_bytes * 2, num_bytes,
+		shrink_delalloc(fs_info, num_bytes * 2,
 				state == FLUSH_DELALLOC_WAIT);
 		break;
 	case FLUSH_DELAYED_REFS_NR:
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 03/23] btrfs: handle U64_MAX for shrink_delalloc
  2020-06-30 13:58 [PATCH 00/23] Change data reservations to use the ticketing infra Josef Bacik
  2020-06-30 13:58 ` [PATCH 01/23] btrfs: change nr to u64 in btrfs_start_delalloc_roots Josef Bacik
  2020-06-30 13:59 ` [PATCH 02/23] btrfs: remove orig from shrink_delalloc Josef Bacik
@ 2020-06-30 13:59 ` Josef Bacik
  2020-06-30 13:59 ` [PATCH 04/23] btrfs: make shrink_delalloc take space_info as an arg Josef Bacik
                   ` (20 subsequent siblings)
  23 siblings, 0 replies; 37+ messages in thread
From: Josef Bacik @ 2020-06-30 13:59 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Nikolay Borisov, Johannes Thumshirn

Data allocations are going to want to pass in U64_MAX for flushing
space, adjust shrink_delalloc to handle this properly.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/space-info.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 266b9887fda4..c2a2326f9a79 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -530,8 +530,19 @@ static void shrink_delalloc(struct btrfs_fs_info *fs_info, u64 to_reclaim,
 	int loops;
 
 	/* Calc the number of the pages we need flush for space reservation */
-	items = calc_reclaim_items_nr(fs_info, to_reclaim);
-	to_reclaim = items * EXTENT_SIZE_PER_ITEM;
+	if (to_reclaim == U64_MAX) {
+		items = U64_MAX;
+	} else {
+		/*
+		 * to_reclaim is set to however much metadata we need to
+		 * reclaim, but reclaiming that much data doesn't really track
+		 * exactly, so increase the amount to reclaim by 2x in order to
+		 * make sure we're flushing enough delalloc to hopefully reclaim
+		 * some metadata reservations.
+		 */
+		items = calc_reclaim_items_nr(fs_info, to_reclaim) * 2;
+		to_reclaim = items * EXTENT_SIZE_PER_ITEM;
+	}
 
 	trans = (struct btrfs_trans_handle *)current->journal_info;
 	space_info = btrfs_find_space_info(fs_info, BTRFS_BLOCK_GROUP_METADATA);
@@ -742,7 +753,7 @@ static void flush_space(struct btrfs_fs_info *fs_info,
 		break;
 	case FLUSH_DELALLOC:
 	case FLUSH_DELALLOC_WAIT:
-		shrink_delalloc(fs_info, num_bytes * 2,
+		shrink_delalloc(fs_info, num_bytes,
 				state == FLUSH_DELALLOC_WAIT);
 		break;
 	case FLUSH_DELAYED_REFS_NR:
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 04/23] btrfs: make shrink_delalloc take space_info as an arg
  2020-06-30 13:58 [PATCH 00/23] Change data reservations to use the ticketing infra Josef Bacik
                   ` (2 preceding siblings ...)
  2020-06-30 13:59 ` [PATCH 03/23] btrfs: handle U64_MAX for shrink_delalloc Josef Bacik
@ 2020-06-30 13:59 ` Josef Bacik
  2020-06-30 13:59 ` [PATCH 05/23] btrfs: make ALLOC_CHUNK use the space info flags Josef Bacik
                   ` (19 subsequent siblings)
  23 siblings, 0 replies; 37+ messages in thread
From: Josef Bacik @ 2020-06-30 13:59 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Nikolay Borisov, Johannes Thumshirn

Currently shrink_delalloc just looks up the metadata space info, but
this won't work if we're trying to reclaim space for data chunks.  We
get the right space_info we want passed into flush_space, so simply pass
that along to shrink_delalloc.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/space-info.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index c2a2326f9a79..dda0e9ec68e3 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -516,10 +516,10 @@ static inline u64 calc_reclaim_items_nr(struct btrfs_fs_info *fs_info,
 /*
  * shrink metadata reservation for delalloc
  */
-static void shrink_delalloc(struct btrfs_fs_info *fs_info, u64 to_reclaim,
-			    bool wait_ordered)
+static void shrink_delalloc(struct btrfs_fs_info *fs_info,
+			    struct btrfs_space_info *space_info,
+			    u64 to_reclaim, bool wait_ordered)
 {
-	struct btrfs_space_info *space_info;
 	struct btrfs_trans_handle *trans;
 	u64 delalloc_bytes;
 	u64 dio_bytes;
@@ -545,7 +545,6 @@ static void shrink_delalloc(struct btrfs_fs_info *fs_info, u64 to_reclaim,
 	}
 
 	trans = (struct btrfs_trans_handle *)current->journal_info;
-	space_info = btrfs_find_space_info(fs_info, BTRFS_BLOCK_GROUP_METADATA);
 
 	delalloc_bytes = percpu_counter_sum_positive(
 						&fs_info->delalloc_bytes);
@@ -753,7 +752,7 @@ static void flush_space(struct btrfs_fs_info *fs_info,
 		break;
 	case FLUSH_DELALLOC:
 	case FLUSH_DELALLOC_WAIT:
-		shrink_delalloc(fs_info, num_bytes,
+		shrink_delalloc(fs_info, space_info, num_bytes,
 				state == FLUSH_DELALLOC_WAIT);
 		break;
 	case FLUSH_DELAYED_REFS_NR:
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 05/23] btrfs: make ALLOC_CHUNK use the space info flags
  2020-06-30 13:58 [PATCH 00/23] Change data reservations to use the ticketing infra Josef Bacik
                   ` (3 preceding siblings ...)
  2020-06-30 13:59 ` [PATCH 04/23] btrfs: make shrink_delalloc take space_info as an arg Josef Bacik
@ 2020-06-30 13:59 ` Josef Bacik
  2020-06-30 13:59 ` [PATCH 06/23] btrfs: call btrfs_try_granting_tickets when freeing reserved bytes Josef Bacik
                   ` (18 subsequent siblings)
  23 siblings, 0 replies; 37+ messages in thread
From: Josef Bacik @ 2020-06-30 13:59 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Nikolay Borisov, Johannes Thumshirn

We have traditionally used flush_space() to flush metadata space, so
we've been unconditionally using btrfs_metadata_alloc_profile() for our
profile to allocate a chunk.  However if we're going to use this for
data we need to use btrfs_get_alloc_profile() on the space_info we pass
in.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/space-info.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index dda0e9ec68e3..3f0e6fa31937 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -777,7 +777,8 @@ static void flush_space(struct btrfs_fs_info *fs_info,
 			break;
 		}
 		ret = btrfs_chunk_alloc(trans,
-				btrfs_metadata_alloc_profile(fs_info),
+				btrfs_get_alloc_profile(fs_info,
+							space_info->flags),
 				(state == ALLOC_CHUNK) ? CHUNK_ALLOC_NO_FORCE :
 					CHUNK_ALLOC_FORCE);
 		btrfs_end_transaction(trans);
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 06/23] btrfs: call btrfs_try_granting_tickets when freeing reserved bytes
  2020-06-30 13:58 [PATCH 00/23] Change data reservations to use the ticketing infra Josef Bacik
                   ` (4 preceding siblings ...)
  2020-06-30 13:59 ` [PATCH 05/23] btrfs: make ALLOC_CHUNK use the space info flags Josef Bacik
@ 2020-06-30 13:59 ` Josef Bacik
  2020-06-30 13:59 ` [PATCH 07/23] btrfs: call btrfs_try_granting_tickets when unpinning anything Josef Bacik
                   ` (17 subsequent siblings)
  23 siblings, 0 replies; 37+ messages in thread
From: Josef Bacik @ 2020-06-30 13:59 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Nikolay Borisov

We were missing a call to btrfs_try_granting_tickets in
btrfs_free_reserved_bytes, so add it to handle the case where we're able
to satisfy an allocation because we've freed a pending reservation.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/block-group.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 09b796a081dd..7b04cf4eeff3 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -3090,6 +3090,8 @@ void btrfs_free_reserved_bytes(struct btrfs_block_group *cache,
 	if (delalloc)
 		cache->delalloc_bytes -= num_bytes;
 	spin_unlock(&cache->lock);
+
+	btrfs_try_granting_tickets(cache->fs_info, space_info);
 	spin_unlock(&space_info->lock);
 }
 
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 07/23] btrfs: call btrfs_try_granting_tickets when unpinning anything
  2020-06-30 13:58 [PATCH 00/23] Change data reservations to use the ticketing infra Josef Bacik
                   ` (5 preceding siblings ...)
  2020-06-30 13:59 ` [PATCH 06/23] btrfs: call btrfs_try_granting_tickets when freeing reserved bytes Josef Bacik
@ 2020-06-30 13:59 ` Josef Bacik
  2020-06-30 13:59 ` [PATCH 08/23] btrfs: call btrfs_try_granting_tickets when reserving space Josef Bacik
                   ` (16 subsequent siblings)
  23 siblings, 0 replies; 37+ messages in thread
From: Josef Bacik @ 2020-06-30 13:59 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Nikolay Borisov, Johannes Thumshirn

When unpinning we were only calling btrfs_try_granting_tickets() if
global_rsv->space_info == space_info, which is problematic because we
use ticketing for SYSTEM chunks, and want to use it for DATA as well.
Fix this by moving this call outside of that if statement.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/extent-tree.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index c0bc35f932bf..4b8c59318f6e 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2844,11 +2844,10 @@ static int unpin_extent_range(struct btrfs_fs_info *fs_info,
 				len -= to_add;
 			}
 			spin_unlock(&global_rsv->lock);
-			/* Add to any tickets we may have */
-			if (len)
-				btrfs_try_granting_tickets(fs_info,
-							   space_info);
 		}
+		/* Add to any tickets we may have */
+		if (!readonly && return_free_space && len)
+			btrfs_try_granting_tickets(fs_info, space_info);
 		spin_unlock(&space_info->lock);
 	}
 
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 08/23] btrfs: call btrfs_try_granting_tickets when reserving space
  2020-06-30 13:58 [PATCH 00/23] Change data reservations to use the ticketing infra Josef Bacik
                   ` (6 preceding siblings ...)
  2020-06-30 13:59 ` [PATCH 07/23] btrfs: call btrfs_try_granting_tickets when unpinning anything Josef Bacik
@ 2020-06-30 13:59 ` Josef Bacik
  2020-06-30 13:59 ` [PATCH 09/23] btrfs: use the btrfs_space_info_free_bytes_may_use helper for delalloc Josef Bacik
                   ` (15 subsequent siblings)
  23 siblings, 0 replies; 37+ messages in thread
From: Josef Bacik @ 2020-06-30 13:59 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Nikolay Borisov

If we have compression on we could free up more space than we reserved,
and thus be able to make a space reservation.  Add the call for this
scenario.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/block-group.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 7b04cf4eeff3..df0ae5ab49bb 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -3057,6 +3057,13 @@ int btrfs_add_reserved_bytes(struct btrfs_block_group *cache,
 						      space_info, -ram_bytes);
 		if (delalloc)
 			cache->delalloc_bytes += num_bytes;
+
+		/*
+		 * Compression can use less space than we reserved, so wake
+		 * tickets if that happens.
+		 */
+		if (num_bytes < ram_bytes)
+			btrfs_try_granting_tickets(cache->fs_info, space_info);
 	}
 	spin_unlock(&cache->lock);
 	spin_unlock(&space_info->lock);
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 09/23] btrfs: use the btrfs_space_info_free_bytes_may_use helper for delalloc
  2020-06-30 13:58 [PATCH 00/23] Change data reservations to use the ticketing infra Josef Bacik
                   ` (7 preceding siblings ...)
  2020-06-30 13:59 ` [PATCH 08/23] btrfs: call btrfs_try_granting_tickets when reserving space Josef Bacik
@ 2020-06-30 13:59 ` Josef Bacik
  2020-06-30 13:59 ` [PATCH 10/23] btrfs: use btrfs_start_delalloc_roots in shrink_delalloc Josef Bacik
                   ` (14 subsequent siblings)
  23 siblings, 0 replies; 37+ messages in thread
From: Josef Bacik @ 2020-06-30 13:59 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Nikolay Borisov, Johannes Thumshirn

We are going to use the ticket infrastructure for data, so use the
btrfs_space_info_free_bytes_may_use() helper in
btrfs_free_reserved_data_space_noquota() so we get the
try_granting_tickets call when we free our reservation.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/delalloc-space.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/btrfs/delalloc-space.c b/fs/btrfs/delalloc-space.c
index d05648f882ca..fcd75531272b 100644
--- a/fs/btrfs/delalloc-space.c
+++ b/fs/btrfs/delalloc-space.c
@@ -278,9 +278,7 @@ void btrfs_free_reserved_data_space_noquota(struct inode *inode,
 	ASSERT(IS_ALIGNED(len, fs_info->sectorsize));
 
 	data_sinfo = fs_info->data_sinfo;
-	spin_lock(&data_sinfo->lock);
-	btrfs_space_info_update_bytes_may_use(fs_info, data_sinfo, -len);
-	spin_unlock(&data_sinfo->lock);
+	btrfs_space_info_free_bytes_may_use(fs_info, data_sinfo, len);
 }
 
 /*
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 10/23] btrfs: use btrfs_start_delalloc_roots in shrink_delalloc
  2020-06-30 13:58 [PATCH 00/23] Change data reservations to use the ticketing infra Josef Bacik
                   ` (8 preceding siblings ...)
  2020-06-30 13:59 ` [PATCH 09/23] btrfs: use the btrfs_space_info_free_bytes_may_use helper for delalloc Josef Bacik
@ 2020-06-30 13:59 ` Josef Bacik
  2020-06-30 13:59 ` [PATCH 11/23] btrfs: check tickets after waiting on ordered extents Josef Bacik
                   ` (13 subsequent siblings)
  23 siblings, 0 replies; 37+ messages in thread
From: Josef Bacik @ 2020-06-30 13:59 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Nikolay Borisov

The original iteration of flushing had us flushing delalloc and then
checking to see if we could make our reservation, thus we were very
careful about how many pages we would flush at once.

But now that everything is async and we satisfy tickets as the space
becomes available we don't have to keep track of any of this, simply try
and flush the number of dirty inodes we may have in order to reclaim
space to make our reservation.  This cleans up our delalloc flushing
significantly.

The async_pages stuff is dropped because btrfs_start_delalloc_roots()
handles the case that we generate async extents for us, so we no longer
require this extra logic.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/space-info.c | 55 +------------------------------------------
 1 file changed, 1 insertion(+), 54 deletions(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 3f0e6fa31937..1d3af37b0ad0 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -476,28 +476,6 @@ void btrfs_dump_space_info(struct btrfs_fs_info *fs_info,
 	up_read(&info->groups_sem);
 }
 
-static void btrfs_writeback_inodes_sb_nr(struct btrfs_fs_info *fs_info,
-					 unsigned long nr_pages, u64 nr_items)
-{
-	struct super_block *sb = fs_info->sb;
-
-	if (down_read_trylock(&sb->s_umount)) {
-		writeback_inodes_sb_nr(sb, nr_pages, WB_REASON_FS_FREE_SPACE);
-		up_read(&sb->s_umount);
-	} else {
-		/*
-		 * We needn't worry the filesystem going from r/w to r/o though
-		 * we don't acquire ->s_umount mutex, because the filesystem
-		 * should guarantee the delalloc inodes list be empty after
-		 * the filesystem is readonly(all dirty pages are written to
-		 * the disk).
-		 */
-		btrfs_start_delalloc_roots(fs_info, nr_items);
-		if (!current->journal_info)
-			btrfs_wait_ordered_roots(fs_info, nr_items, 0, (u64)-1);
-	}
-}
-
 static inline u64 calc_reclaim_items_nr(struct btrfs_fs_info *fs_info,
 					u64 to_reclaim)
 {
@@ -523,10 +501,8 @@ static void shrink_delalloc(struct btrfs_fs_info *fs_info,
 	struct btrfs_trans_handle *trans;
 	u64 delalloc_bytes;
 	u64 dio_bytes;
-	u64 async_pages;
 	u64 items;
 	long time_left;
-	unsigned long nr_pages;
 	int loops;
 
 	/* Calc the number of the pages we need flush for space reservation */
@@ -567,37 +543,8 @@ static void shrink_delalloc(struct btrfs_fs_info *fs_info,
 
 	loops = 0;
 	while ((delalloc_bytes || dio_bytes) && loops < 3) {
-		nr_pages = min(delalloc_bytes, to_reclaim) >> PAGE_SHIFT;
-
-		/*
-		 * Triggers inode writeback for up to nr_pages. This will invoke
-		 * ->writepages callback and trigger delalloc filling
-		 *  (btrfs_run_delalloc_range()).
-		 */
-		btrfs_writeback_inodes_sb_nr(fs_info, nr_pages, items);
-
-		/*
-		 * We need to wait for the compressed pages to start before
-		 * we continue.
-		 */
-		async_pages = atomic_read(&fs_info->async_delalloc_pages);
-		if (!async_pages)
-			goto skip_async;
-
-		/*
-		 * Calculate how many compressed pages we want to be written
-		 * before we continue. I.e if there are more async pages than we
-		 * require wait_event will wait until nr_pages are written.
-		 */
-		if (async_pages <= nr_pages)
-			async_pages = 0;
-		else
-			async_pages -= nr_pages;
+		btrfs_start_delalloc_roots(fs_info, items);
 
-		wait_event(fs_info->async_submit_wait,
-			   atomic_read(&fs_info->async_delalloc_pages) <=
-			   (int)async_pages);
-skip_async:
 		spin_lock(&space_info->lock);
 		if (list_empty(&space_info->tickets) &&
 		    list_empty(&space_info->priority_tickets)) {
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 11/23] btrfs: check tickets after waiting on ordered extents
  2020-06-30 13:58 [PATCH 00/23] Change data reservations to use the ticketing infra Josef Bacik
                   ` (9 preceding siblings ...)
  2020-06-30 13:59 ` [PATCH 10/23] btrfs: use btrfs_start_delalloc_roots in shrink_delalloc Josef Bacik
@ 2020-06-30 13:59 ` Josef Bacik
  2020-06-30 13:59 ` [PATCH 12/23] btrfs: add flushing states for handling data reservations Josef Bacik
                   ` (12 subsequent siblings)
  23 siblings, 0 replies; 37+ messages in thread
From: Josef Bacik @ 2020-06-30 13:59 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Nikolay Borisov

Right now if the space is free'd up after the ordered extents complete
(which is likely since the reservations are held until they complete),
we would do extra delalloc flushing before we'd notice that we didn't
have any more tickets.  Fix this by moving the tickets check after our
wait_ordered_extents check.

Tested-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/space-info.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 1d3af37b0ad0..0b6566a3da61 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -545,14 +545,6 @@ static void shrink_delalloc(struct btrfs_fs_info *fs_info,
 	while ((delalloc_bytes || dio_bytes) && loops < 3) {
 		btrfs_start_delalloc_roots(fs_info, items);
 
-		spin_lock(&space_info->lock);
-		if (list_empty(&space_info->tickets) &&
-		    list_empty(&space_info->priority_tickets)) {
-			spin_unlock(&space_info->lock);
-			break;
-		}
-		spin_unlock(&space_info->lock);
-
 		loops++;
 		if (wait_ordered && !trans) {
 			btrfs_wait_ordered_roots(fs_info, items, 0, (u64)-1);
@@ -561,6 +553,15 @@ static void shrink_delalloc(struct btrfs_fs_info *fs_info,
 			if (time_left)
 				break;
 		}
+
+		spin_lock(&space_info->lock);
+		if (list_empty(&space_info->tickets) &&
+		    list_empty(&space_info->priority_tickets)) {
+			spin_unlock(&space_info->lock);
+			break;
+		}
+		spin_unlock(&space_info->lock);
+
 		delalloc_bytes = percpu_counter_sum_positive(
 						&fs_info->delalloc_bytes);
 		dio_bytes = percpu_counter_sum_positive(&fs_info->dio_bytes);
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 12/23] btrfs: add flushing states for handling data reservations
  2020-06-30 13:58 [PATCH 00/23] Change data reservations to use the ticketing infra Josef Bacik
                   ` (10 preceding siblings ...)
  2020-06-30 13:59 ` [PATCH 11/23] btrfs: check tickets after waiting on ordered extents Josef Bacik
@ 2020-06-30 13:59 ` Josef Bacik
  2020-06-30 13:59 ` [PATCH 13/23] btrfs: add the data transaction commit logic into may_commit_transaction Josef Bacik
                   ` (11 subsequent siblings)
  23 siblings, 0 replies; 37+ messages in thread
From: Josef Bacik @ 2020-06-30 13:59 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Nikolay Borisov

Currently the way we do data reservations is by seeing if we have enough
space in our space_info.  If we do not and we're a normal inode we'll

1) Attempt to force a chunk allocation until we can't anymore.
2) If that fails we'll flush delalloc, then commit the transaction, then
   run the delayed iputs.

If we are a free space inode we're only allowed to force a chunk
allocation.  In order to use the normal flushing mechanism we need to
encode this into a flush state array for normal inodes.  Since both will
start with allocating chunks until the space info is full there is no
need to add this as a flush state, this will be handled specially.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/ctree.h      | 2 ++
 fs/btrfs/space-info.c | 6 ++++++
 2 files changed, 8 insertions(+)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 93cd1ff878cb..efa72a204b91 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2569,6 +2569,8 @@ enum btrfs_reserve_flush_enum {
 	 */
 	BTRFS_RESERVE_FLUSH_LIMIT,
 	BTRFS_RESERVE_FLUSH_EVICT,
+	BTRFS_RESERVE_FLUSH_DATA,
+	BTRFS_RESERVE_FLUSH_FREE_SPACE_INODE,
 	BTRFS_RESERVE_FLUSH_ALL,
 	BTRFS_RESERVE_FLUSH_ALL_STEAL,
 };
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 0b6566a3da61..e041b1d58e28 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -1018,6 +1018,12 @@ static const enum btrfs_flush_state evict_flush_states[] = {
 	COMMIT_TRANS,
 };
 
+static const enum btrfs_flush_state data_flush_states[] = {
+	FLUSH_DELALLOC_WAIT,
+	COMMIT_TRANS,
+	RUN_DELAYED_IPUTS,
+};
+
 static void priority_reclaim_metadata_space(struct btrfs_fs_info *fs_info,
 				struct btrfs_space_info *space_info,
 				struct reserve_ticket *ticket,
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 13/23] btrfs: add the data transaction commit logic into may_commit_transaction
  2020-06-30 13:58 [PATCH 00/23] Change data reservations to use the ticketing infra Josef Bacik
                   ` (11 preceding siblings ...)
  2020-06-30 13:59 ` [PATCH 12/23] btrfs: add flushing states for handling data reservations Josef Bacik
@ 2020-06-30 13:59 ` Josef Bacik
  2020-07-07 14:39   ` Nikolay Borisov
  2020-06-30 13:59 ` [PATCH 14/23] btrfs: add btrfs_reserve_data_bytes and use it Josef Bacik
                   ` (10 subsequent siblings)
  23 siblings, 1 reply; 37+ messages in thread
From: Josef Bacik @ 2020-06-30 13:59 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Nikolay Borisov

Data space flushing currently unconditionally commits the transaction
twice in a row, and the last time it checks if there's enough pinned
extents to satisfy it's reservation before deciding to commit the
transaction for the 3rd and final time.

Encode this logic into may_commit_transaction().  In the next patch we
will pass in U64_MAX for bytes_needed the first two times, and the final
time we will pass in the actual bytes we need so the normal logic will
apply.

This patch exists soley to make the logical changes I will make to the
flushing state machine separate to make it easier to bisect any
performance related regressions.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/space-info.c | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index e041b1d58e28..fb63ddc31540 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -579,21 +579,33 @@ static void shrink_delalloc(struct btrfs_fs_info *fs_info,
  * will return -ENOSPC.
  */
 static int may_commit_transaction(struct btrfs_fs_info *fs_info,
-				  struct btrfs_space_info *space_info)
+				  struct btrfs_space_info *space_info,
+				  u64 bytes_needed)
 {
 	struct reserve_ticket *ticket = NULL;
 	struct btrfs_block_rsv *delayed_rsv = &fs_info->delayed_block_rsv;
 	struct btrfs_block_rsv *delayed_refs_rsv = &fs_info->delayed_refs_rsv;
 	struct btrfs_block_rsv *trans_rsv = &fs_info->trans_block_rsv;
 	struct btrfs_trans_handle *trans;
-	u64 bytes_needed;
 	u64 reclaim_bytes = 0;
 	u64 cur_free_bytes = 0;
+	bool do_commit = false;
 
 	trans = (struct btrfs_trans_handle *)current->journal_info;
 	if (trans)
 		return -EAGAIN;
 
+	/*
+	 * If we are data and have passed in U64_MAX we just want to
+	 * unconditionally commit the transaction to match the previous data
+	 * flushing behavior.
+	 */
+	if ((space_info->flags & BTRFS_BLOCK_GROUP_DATA) &&
+	   bytes_needed == U64_MAX) {
+		do_commit = true;
+		goto check_pinned;
+	}
+
 	spin_lock(&space_info->lock);
 	cur_free_bytes = btrfs_space_info_used(space_info, true);
 	if (cur_free_bytes < space_info->total_bytes)
@@ -607,7 +619,7 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
 	else if (!list_empty(&space_info->tickets))
 		ticket = list_first_entry(&space_info->tickets,
 					  struct reserve_ticket, list);
-	bytes_needed = (ticket) ? ticket->bytes : 0;
+	bytes_needed = (ticket) ? ticket->bytes : bytes_needed;
 
 	if (bytes_needed > cur_free_bytes)
 		bytes_needed -= cur_free_bytes;
@@ -618,6 +630,7 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
 	if (!bytes_needed)
 		return 0;
 
+check_pinned:
 	trans = btrfs_join_transaction(fs_info->extent_root);
 	if (IS_ERR(trans))
 		return PTR_ERR(trans);
@@ -627,7 +640,8 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
 	 * we have block groups that are going to be freed, allowing us to
 	 * possibly do a chunk allocation the next loop through.
 	 */
-	if (test_bit(BTRFS_TRANS_HAVE_FREE_BGS, &trans->transaction->flags) ||
+	if (do_commit ||
+	    test_bit(BTRFS_TRANS_HAVE_FREE_BGS, &trans->transaction->flags) ||
 	    __percpu_counter_compare(&space_info->total_bytes_pinned,
 				     bytes_needed,
 				     BTRFS_TOTAL_BYTES_PINNED_BATCH) >= 0)
@@ -743,7 +757,7 @@ static void flush_space(struct btrfs_fs_info *fs_info,
 		btrfs_wait_on_delayed_iputs(fs_info);
 		break;
 	case COMMIT_TRANS:
-		ret = may_commit_transaction(fs_info, space_info);
+		ret = may_commit_transaction(fs_info, space_info, num_bytes);
 		break;
 	default:
 		ret = -ENOSPC;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 14/23] btrfs: add btrfs_reserve_data_bytes and use it
  2020-06-30 13:58 [PATCH 00/23] Change data reservations to use the ticketing infra Josef Bacik
                   ` (12 preceding siblings ...)
  2020-06-30 13:59 ` [PATCH 13/23] btrfs: add the data transaction commit logic into may_commit_transaction Josef Bacik
@ 2020-06-30 13:59 ` Josef Bacik
  2020-06-30 13:59 ` [PATCH 15/23] btrfs: use ticketing for data space reservations Josef Bacik
                   ` (9 subsequent siblings)
  23 siblings, 0 replies; 37+ messages in thread
From: Josef Bacik @ 2020-06-30 13:59 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Nikolay Borisov

Create a new function btrfs_reserve_data_bytes() in order to handle data
reservations.  This uses the new flush types and flush states to handle
making data reservations.

This patch specifically does not change any functionality, and is
purposefully not cleaned up in order to make bisection easier for the
future patches.  The new helper is identical to the old helper in how it
handles data reservations.  We first try to force a chunk allocation,
and then we run through the flush states all at once and in the same
order that they were done with the old helper.

Subsequent patches will clean this up and change the behavior of the
flushing, and it is important to keep those changes separate so we can
easily bisect down to the patch that caused the regression, rather than
the patch that made us start using the new infrastructure.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/delalloc-space.c | 119 ++------------------------------------
 fs/btrfs/space-info.c     |  92 +++++++++++++++++++++++++++++
 fs/btrfs/space-info.h     |   2 +
 3 files changed, 98 insertions(+), 115 deletions(-)

diff --git a/fs/btrfs/delalloc-space.c b/fs/btrfs/delalloc-space.c
index fcd75531272b..52e84552d662 100644
--- a/fs/btrfs/delalloc-space.c
+++ b/fs/btrfs/delalloc-space.c
@@ -115,126 +115,15 @@ int btrfs_alloc_data_chunk_ondemand(struct btrfs_inode *inode, u64 bytes)
 {
 	struct btrfs_root *root = inode->root;
 	struct btrfs_fs_info *fs_info = root->fs_info;
-	struct btrfs_space_info *data_sinfo = fs_info->data_sinfo;
-	u64 used;
-	int ret = 0;
-	int need_commit = 2;
-	int have_pinned_space;
+	enum btrfs_reserve_flush_enum flush = BTRFS_RESERVE_FLUSH_DATA;
 
 	/* Make sure bytes are sectorsize aligned */
 	bytes = ALIGN(bytes, fs_info->sectorsize);
 
-	if (btrfs_is_free_space_inode(inode)) {
-		need_commit = 0;
-		ASSERT(current->journal_info);
-	}
-
-again:
-	/* Make sure we have enough space to handle the data first */
-	spin_lock(&data_sinfo->lock);
-	used = btrfs_space_info_used(data_sinfo, true);
-
-	if (used + bytes > data_sinfo->total_bytes) {
-		struct btrfs_trans_handle *trans;
-
-		/*
-		 * If we don't have enough free bytes in this space then we need
-		 * to alloc a new chunk.
-		 */
-		if (!data_sinfo->full) {
-			u64 alloc_target;
-
-			data_sinfo->force_alloc = CHUNK_ALLOC_FORCE;
-			spin_unlock(&data_sinfo->lock);
-
-			alloc_target = btrfs_data_alloc_profile(fs_info);
-			/*
-			 * It is ugly that we don't call nolock join
-			 * transaction for the free space inode case here.
-			 * But it is safe because we only do the data space
-			 * reservation for the free space cache in the
-			 * transaction context, the common join transaction
-			 * just increase the counter of the current transaction
-			 * handler, doesn't try to acquire the trans_lock of
-			 * the fs.
-			 */
-			trans = btrfs_join_transaction(root);
-			if (IS_ERR(trans))
-				return PTR_ERR(trans);
-
-			ret = btrfs_chunk_alloc(trans, alloc_target,
-						CHUNK_ALLOC_NO_FORCE);
-			btrfs_end_transaction(trans);
-			if (ret < 0) {
-				if (ret != -ENOSPC)
-					return ret;
-				else {
-					have_pinned_space = 1;
-					goto commit_trans;
-				}
-			}
-
-			goto again;
-		}
-
-		/*
-		 * If we don't have enough pinned space to deal with this
-		 * allocation, and no removed chunk in current transaction,
-		 * don't bother committing the transaction.
-		 */
-		have_pinned_space = __percpu_counter_compare(
-			&data_sinfo->total_bytes_pinned,
-			used + bytes - data_sinfo->total_bytes,
-			BTRFS_TOTAL_BYTES_PINNED_BATCH);
-		spin_unlock(&data_sinfo->lock);
-
-		/* Commit the current transaction and try again */
-commit_trans:
-		if (need_commit) {
-			need_commit--;
-
-			if (need_commit > 0) {
-				btrfs_start_delalloc_roots(fs_info, -1);
-				btrfs_wait_ordered_roots(fs_info, U64_MAX, 0,
-							 (u64)-1);
-			}
-
-			trans = btrfs_join_transaction(root);
-			if (IS_ERR(trans))
-				return PTR_ERR(trans);
-			if (have_pinned_space >= 0 ||
-			    test_bit(BTRFS_TRANS_HAVE_FREE_BGS,
-				     &trans->transaction->flags) ||
-			    need_commit > 0) {
-				ret = btrfs_commit_transaction(trans);
-				if (ret)
-					return ret;
-				/*
-				 * The cleaner kthread might still be doing iput
-				 * operations. Wait for it to finish so that
-				 * more space is released.  We don't need to
-				 * explicitly run the delayed iputs here because
-				 * the commit_transaction would have woken up
-				 * the cleaner.
-				 */
-				ret = btrfs_wait_on_delayed_iputs(fs_info);
-				if (ret)
-					return ret;
-				goto again;
-			} else {
-				btrfs_end_transaction(trans);
-			}
-		}
-
-		trace_btrfs_space_reservation(fs_info,
-					      "space_info:enospc",
-					      data_sinfo->flags, bytes, 1);
-		return -ENOSPC;
-	}
-	btrfs_space_info_update_bytes_may_use(fs_info, data_sinfo, bytes);
-	spin_unlock(&data_sinfo->lock);
+	if (btrfs_is_free_space_inode(inode))
+		flush = BTRFS_RESERVE_FLUSH_FREE_SPACE_INODE;
 
-	return 0;
+	return btrfs_reserve_data_bytes(fs_info, bytes, flush);
 }
 
 int btrfs_check_data_free_space(struct inode *inode,
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index fb63ddc31540..799ee6090693 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -1327,3 +1327,95 @@ int btrfs_reserve_metadata_bytes(struct btrfs_root *root,
 	}
 	return ret;
 }
+
+/**
+ * btrfs_reserve_data_bytes - try to reserve data bytes for an allocation
+ * @root - the root we are allocating for
+ * @bytes - the number of bytes we need
+ * @flush - how we are allowed to flush
+ *
+ * This will reserve bytes from the data space info.  If there is not enough
+ * space then we will attempt to flush space as specified ty flush.
+ */
+int btrfs_reserve_data_bytes(struct btrfs_fs_info *fs_info, u64 bytes,
+			     enum btrfs_reserve_flush_enum flush)
+{
+	struct btrfs_space_info *data_sinfo = fs_info->data_sinfo;
+	const enum btrfs_flush_state *states = NULL;
+	u64 used;
+	int states_nr = 0;
+	int commit_cycles = 2;
+	int ret = -ENOSPC;
+
+	ASSERT(!current->journal_info || flush != BTRFS_RESERVE_FLUSH_DATA);
+
+	if (flush == BTRFS_RESERVE_FLUSH_DATA) {
+		states = data_flush_states;
+		states_nr = ARRAY_SIZE(data_flush_states);
+	}
+
+	spin_lock(&data_sinfo->lock);
+again:
+	used = btrfs_space_info_used(data_sinfo, true);
+
+	if (used + bytes > data_sinfo->total_bytes) {
+		u64 prev_total_bytes = data_sinfo->total_bytes;
+		int flush_state = 0;
+
+		spin_unlock(&data_sinfo->lock);
+
+		/*
+		 * Everybody can force chunk allocation, so try this first to
+		 * see if we can just bail here and make our reservation.
+		 */
+		flush_space(fs_info, data_sinfo, bytes, ALLOC_CHUNK_FORCE);
+		spin_lock(&data_sinfo->lock);
+		if (prev_total_bytes < data_sinfo->total_bytes)
+			goto again;
+		spin_unlock(&data_sinfo->lock);
+
+		/*
+		 * Cycle through the rest of the flushing options for our flush
+		 * type, then try again.
+		 */
+		while (flush_state < states_nr) {
+			u64 flush_bytes = U64_MAX;
+
+			/*
+			 * Previously we unconditionally committed the
+			 * transaction twice before finally checking against
+			 * pinned space before committing the final time.  We
+			 * also skipped flushing delalloc the final pass
+			 * through.
+			 */
+			if (!commit_cycles) {
+				if (states[flush_state] == FLUSH_DELALLOC_WAIT) {
+					flush_state++;
+					continue;
+				}
+				if (states[flush_state] == COMMIT_TRANS)
+					flush_bytes = bytes;
+			}
+
+			flush_space(fs_info, data_sinfo, flush_bytes,
+				    states[flush_state]);
+			flush_state++;
+		}
+
+		if (!commit_cycles)
+			goto out;
+
+		commit_cycles--;
+		spin_lock(&data_sinfo->lock);
+		goto again;
+	}
+	btrfs_space_info_update_bytes_may_use(fs_info, data_sinfo, bytes);
+	ret = 0;
+	spin_unlock(&data_sinfo->lock);
+out:
+	if (ret)
+		trace_btrfs_space_reservation(fs_info,
+					      "space_info:enospc",
+					      data_sinfo->flags, bytes, 1);
+	return ret;
+}
diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h
index c3c64019950a..5646393b928c 100644
--- a/fs/btrfs/space-info.h
+++ b/fs/btrfs/space-info.h
@@ -149,5 +149,7 @@ static inline void btrfs_space_info_free_bytes_may_use(
 	btrfs_try_granting_tickets(fs_info, space_info);
 	spin_unlock(&space_info->lock);
 }
+int btrfs_reserve_data_bytes(struct btrfs_fs_info *fs_info, u64 bytes,
+			     enum btrfs_reserve_flush_enum flush);
 
 #endif /* BTRFS_SPACE_INFO_H */
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 15/23] btrfs: use ticketing for data space reservations
  2020-06-30 13:58 [PATCH 00/23] Change data reservations to use the ticketing infra Josef Bacik
                   ` (13 preceding siblings ...)
  2020-06-30 13:59 ` [PATCH 14/23] btrfs: add btrfs_reserve_data_bytes and use it Josef Bacik
@ 2020-06-30 13:59 ` Josef Bacik
  2020-07-07 14:46   ` Nikolay Borisov
  2020-06-30 13:59 ` [PATCH 16/23] btrfs: serialize data reservations if we are flushing Josef Bacik
                   ` (8 subsequent siblings)
  23 siblings, 1 reply; 37+ messages in thread
From: Josef Bacik @ 2020-06-30 13:59 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Nikolay Borisov

Now that we have all the infrastructure in place, use the ticketing
infrastructure to make data allocations.  This still maintains the exact
same flushing behavior, but now we're using tickets to get our
reservations satisfied.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/space-info.c | 125 ++++++++++++++++++++++--------------------
 1 file changed, 67 insertions(+), 58 deletions(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 799ee6090693..ee4747917b81 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -1068,6 +1068,54 @@ static void priority_reclaim_metadata_space(struct btrfs_fs_info *fs_info,
 	} while (flush_state < states_nr);
 }
 
+static void priority_reclaim_data_space(struct btrfs_fs_info *fs_info,
+					struct btrfs_space_info *space_info,
+					struct reserve_ticket *ticket,
+					const enum btrfs_flush_state *states,
+					int states_nr)
+{
+	int flush_state = 0;
+	int commit_cycles = 2;
+
+	while (!space_info->full) {
+		flush_space(fs_info, space_info, U64_MAX, ALLOC_CHUNK_FORCE);
+		spin_lock(&space_info->lock);
+		if (ticket->bytes == 0) {
+			spin_unlock(&space_info->lock);
+			return;
+		}
+		spin_unlock(&space_info->lock);
+	}
+again:
+	while (flush_state < states_nr) {
+		u64 flush_bytes = U64_MAX;
+
+		if (!commit_cycles) {
+			if (states[flush_state] == FLUSH_DELALLOC_WAIT) {
+				flush_state++;
+				continue;
+			}
+			if (states[flush_state] == COMMIT_TRANS)
+				flush_bytes = ticket->bytes;
+		}
+
+		flush_space(fs_info, space_info, flush_bytes,
+			    states[flush_state]);
+		spin_lock(&space_info->lock);
+		if (ticket->bytes == 0) {
+			spin_unlock(&space_info->lock);
+			return;
+		}
+		spin_unlock(&space_info->lock);
+		flush_state++;
+	}
+	if (commit_cycles) {
+		commit_cycles--;
+		flush_state = 0;
+		goto again;
+	}
+}
+
 static void wait_reserve_ticket(struct btrfs_fs_info *fs_info,
 				struct btrfs_space_info *space_info,
 				struct reserve_ticket *ticket)
@@ -1134,6 +1182,15 @@ static int handle_reserve_ticket(struct btrfs_fs_info *fs_info,
 						evict_flush_states,
 						ARRAY_SIZE(evict_flush_states));
 		break;
+	case BTRFS_RESERVE_FLUSH_DATA:
+		priority_reclaim_data_space(fs_info, space_info, ticket,
+					data_flush_states,
+					ARRAY_SIZE(data_flush_states));
+		break;
+	case BTRFS_RESERVE_FLUSH_FREE_SPACE_INODE:
+		priority_reclaim_data_space(fs_info, space_info, ticket,
+					    NULL, 0);
+		break;
 	default:
 		ASSERT(0);
 		break;
@@ -1341,78 +1398,30 @@ int btrfs_reserve_data_bytes(struct btrfs_fs_info *fs_info, u64 bytes,
 			     enum btrfs_reserve_flush_enum flush)
 {
 	struct btrfs_space_info *data_sinfo = fs_info->data_sinfo;
-	const enum btrfs_flush_state *states = NULL;
 	u64 used;
-	int states_nr = 0;
-	int commit_cycles = 2;
 	int ret = -ENOSPC;
 
 	ASSERT(!current->journal_info || flush != BTRFS_RESERVE_FLUSH_DATA);
 
-	if (flush == BTRFS_RESERVE_FLUSH_DATA) {
-		states = data_flush_states;
-		states_nr = ARRAY_SIZE(data_flush_states);
-	}
-
 	spin_lock(&data_sinfo->lock);
-again:
 	used = btrfs_space_info_used(data_sinfo, true);
 
 	if (used + bytes > data_sinfo->total_bytes) {
-		u64 prev_total_bytes = data_sinfo->total_bytes;
-		int flush_state = 0;
+		struct reserve_ticket ticket;
 
+		init_waitqueue_head(&ticket.wait);
+		ticket.bytes = bytes;
+		ticket.error = 0;
+		list_add_tail(&ticket.list, &data_sinfo->priority_tickets);
 		spin_unlock(&data_sinfo->lock);
 
-		/*
-		 * Everybody can force chunk allocation, so try this first to
-		 * see if we can just bail here and make our reservation.
-		 */
-		flush_space(fs_info, data_sinfo, bytes, ALLOC_CHUNK_FORCE);
-		spin_lock(&data_sinfo->lock);
-		if (prev_total_bytes < data_sinfo->total_bytes)
-			goto again;
+		ret = handle_reserve_ticket(fs_info, data_sinfo, &ticket,
+					    flush);
+	} else {
+		btrfs_space_info_update_bytes_may_use(fs_info, data_sinfo, bytes);
+		ret = 0;
 		spin_unlock(&data_sinfo->lock);
-
-		/*
-		 * Cycle through the rest of the flushing options for our flush
-		 * type, then try again.
-		 */
-		while (flush_state < states_nr) {
-			u64 flush_bytes = U64_MAX;
-
-			/*
-			 * Previously we unconditionally committed the
-			 * transaction twice before finally checking against
-			 * pinned space before committing the final time.  We
-			 * also skipped flushing delalloc the final pass
-			 * through.
-			 */
-			if (!commit_cycles) {
-				if (states[flush_state] == FLUSH_DELALLOC_WAIT) {
-					flush_state++;
-					continue;
-				}
-				if (states[flush_state] == COMMIT_TRANS)
-					flush_bytes = bytes;
-			}
-
-			flush_space(fs_info, data_sinfo, flush_bytes,
-				    states[flush_state]);
-			flush_state++;
-		}
-
-		if (!commit_cycles)
-			goto out;
-
-		commit_cycles--;
-		spin_lock(&data_sinfo->lock);
-		goto again;
 	}
-	btrfs_space_info_update_bytes_may_use(fs_info, data_sinfo, bytes);
-	ret = 0;
-	spin_unlock(&data_sinfo->lock);
-out:
 	if (ret)
 		trace_btrfs_space_reservation(fs_info,
 					      "space_info:enospc",
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 16/23] btrfs: serialize data reservations if we are flushing
  2020-06-30 13:58 [PATCH 00/23] Change data reservations to use the ticketing infra Josef Bacik
                   ` (14 preceding siblings ...)
  2020-06-30 13:59 ` [PATCH 15/23] btrfs: use ticketing for data space reservations Josef Bacik
@ 2020-06-30 13:59 ` Josef Bacik
  2020-06-30 13:59 ` [PATCH 17/23] btrfs: use the same helper for data and metadata reservations Josef Bacik
                   ` (7 subsequent siblings)
  23 siblings, 0 replies; 37+ messages in thread
From: Josef Bacik @ 2020-06-30 13:59 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Nikolay Borisov

Nikolay reported a problem where generic/371 would fail sometimes with a
slow drive.  The gist of the test is that we fallocate a file in
parallel with a pwrite of a different file.  These two files combined
are smaller than the file system, but sometimes the pwrite would ENOSPC.

A fair bit of investigation uncovered the fact that the fallocate
workload was racing in and grabbing the free space that the pwrite
workload was trying to free up so it could make its own reservation.
After a few loops of this eventually the pwrite workload would error out
with an ENOSPC.

We've had the same problem with metadata as well, and we serialized all
metadata allocations to satisfy this problem.  This wasn't usually a
problem with data because data reservations are more straightforward,
but obviously could still happen.

Fix this by not allowing reservations to occur if there are any pending
tickets waiting to be satisfied on the space info.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/space-info.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index ee4747917b81..d9a6aefb3a9f 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -1400,13 +1400,17 @@ int btrfs_reserve_data_bytes(struct btrfs_fs_info *fs_info, u64 bytes,
 	struct btrfs_space_info *data_sinfo = fs_info->data_sinfo;
 	u64 used;
 	int ret = -ENOSPC;
+	bool pending_tickets;
 
 	ASSERT(!current->journal_info || flush != BTRFS_RESERVE_FLUSH_DATA);
 
 	spin_lock(&data_sinfo->lock);
 	used = btrfs_space_info_used(data_sinfo, true);
+	pending_tickets = !list_empty(&data_sinfo->tickets) ||
+		!list_empty(&data_sinfo->priority_tickets);
 
-	if (used + bytes > data_sinfo->total_bytes) {
+	if (pending_tickets ||
+	    used + bytes > data_sinfo->total_bytes) {
 		struct reserve_ticket ticket;
 
 		init_waitqueue_head(&ticket.wait);
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 17/23] btrfs: use the same helper for data and metadata reservations
  2020-06-30 13:58 [PATCH 00/23] Change data reservations to use the ticketing infra Josef Bacik
                   ` (15 preceding siblings ...)
  2020-06-30 13:59 ` [PATCH 16/23] btrfs: serialize data reservations if we are flushing Josef Bacik
@ 2020-06-30 13:59 ` Josef Bacik
  2020-06-30 13:59 ` [PATCH 18/23] btrfs: drop the commit_cycles stuff for data reservations Josef Bacik
                   ` (6 subsequent siblings)
  23 siblings, 0 replies; 37+ messages in thread
From: Josef Bacik @ 2020-06-30 13:59 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Nikolay Borisov

Now that data reservations follow the same pattern as metadata
reservations we can simply rename __reserve_metadata_bytes to
__reserve_bytes and use that helper for data reservations.

Things to keep in mind, btrfs_can_overcommit() returns 0 for data,
because we can never overcommit.  We also will never pass in FLUSH_ALL
for data, so we'll simply be added to the priority list and go straight
into handle_reserve_ticket.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/space-info.c | 48 +++++++++++++------------------------------
 1 file changed, 14 insertions(+), 34 deletions(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index d9a6aefb3a9f..b5a007fab743 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -1250,10 +1250,9 @@ static inline bool is_normal_flushing(enum btrfs_reserve_flush_enum flush)
  * regain reservations will be made and this will fail if there is not enough
  * space already.
  */
-static int __reserve_metadata_bytes(struct btrfs_fs_info *fs_info,
-				    struct btrfs_space_info *space_info,
-				    u64 orig_bytes,
-				    enum btrfs_reserve_flush_enum flush)
+static int __reserve_bytes(struct btrfs_fs_info *fs_info,
+			   struct btrfs_space_info *space_info, u64 orig_bytes,
+			   enum btrfs_reserve_flush_enum flush)
 {
 	struct reserve_ticket ticket;
 	u64 used;
@@ -1365,8 +1364,8 @@ int btrfs_reserve_metadata_bytes(struct btrfs_root *root,
 	struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv;
 	int ret;
 
-	ret = __reserve_metadata_bytes(fs_info, block_rsv->space_info,
-				       orig_bytes, flush);
+	ret = __reserve_bytes(fs_info, block_rsv->space_info, orig_bytes,
+			      flush);
 	if (ret == -ENOSPC &&
 	    unlikely(root->orphan_cleanup_state == ORPHAN_CLEANUP_STARTED)) {
 		if (block_rsv != global_rsv &&
@@ -1398,37 +1397,18 @@ int btrfs_reserve_data_bytes(struct btrfs_fs_info *fs_info, u64 bytes,
 			     enum btrfs_reserve_flush_enum flush)
 {
 	struct btrfs_space_info *data_sinfo = fs_info->data_sinfo;
-	u64 used;
-	int ret = -ENOSPC;
-	bool pending_tickets;
+	int ret;
 
+	ASSERT(flush == BTRFS_RESERVE_FLUSH_DATA ||
+	       flush == BTRFS_RESERVE_FLUSH_FREE_SPACE_INODE);
 	ASSERT(!current->journal_info || flush != BTRFS_RESERVE_FLUSH_DATA);
 
-	spin_lock(&data_sinfo->lock);
-	used = btrfs_space_info_used(data_sinfo, true);
-	pending_tickets = !list_empty(&data_sinfo->tickets) ||
-		!list_empty(&data_sinfo->priority_tickets);
-
-	if (pending_tickets ||
-	    used + bytes > data_sinfo->total_bytes) {
-		struct reserve_ticket ticket;
-
-		init_waitqueue_head(&ticket.wait);
-		ticket.bytes = bytes;
-		ticket.error = 0;
-		list_add_tail(&ticket.list, &data_sinfo->priority_tickets);
-		spin_unlock(&data_sinfo->lock);
-
-		ret = handle_reserve_ticket(fs_info, data_sinfo, &ticket,
-					    flush);
-	} else {
-		btrfs_space_info_update_bytes_may_use(fs_info, data_sinfo, bytes);
-		ret = 0;
-		spin_unlock(&data_sinfo->lock);
-	}
-	if (ret)
-		trace_btrfs_space_reservation(fs_info,
-					      "space_info:enospc",
+	ret = __reserve_bytes(fs_info, data_sinfo, bytes, flush);
+	if (ret == -ENOSPC) {
+		trace_btrfs_space_reservation(fs_info, "space_info:enospc",
 					      data_sinfo->flags, bytes, 1);
+		if (btrfs_test_opt(fs_info, ENOSPC_DEBUG))
+			btrfs_dump_space_info(fs_info, data_sinfo, bytes, 0);
+	}
 	return ret;
 }
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 18/23] btrfs: drop the commit_cycles stuff for data reservations
  2020-06-30 13:58 [PATCH 00/23] Change data reservations to use the ticketing infra Josef Bacik
                   ` (16 preceding siblings ...)
  2020-06-30 13:59 ` [PATCH 17/23] btrfs: use the same helper for data and metadata reservations Josef Bacik
@ 2020-06-30 13:59 ` Josef Bacik
  2020-06-30 13:59 ` [PATCH 19/23] btrfs: don't force commit if we are data Josef Bacik
                   ` (5 subsequent siblings)
  23 siblings, 0 replies; 37+ messages in thread
From: Josef Bacik @ 2020-06-30 13:59 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Nikolay Borisov

This was an old wart left over from how we previously did data
reservations.  Before we could have people race in and take a
reservation while we were flushing space, so we needed to make sure we
looped a few times before giving up.  Now that we're using the ticketing
infrastructure we don't have to worry about this and can drop the logic
altogether.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/space-info.c | 22 ++--------------------
 1 file changed, 2 insertions(+), 20 deletions(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index b5a007fab743..c88a31210b9a 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -1075,7 +1075,6 @@ static void priority_reclaim_data_space(struct btrfs_fs_info *fs_info,
 					int states_nr)
 {
 	int flush_state = 0;
-	int commit_cycles = 2;
 
 	while (!space_info->full) {
 		flush_space(fs_info, space_info, U64_MAX, ALLOC_CHUNK_FORCE);
@@ -1086,21 +1085,9 @@ static void priority_reclaim_data_space(struct btrfs_fs_info *fs_info,
 		}
 		spin_unlock(&space_info->lock);
 	}
-again:
-	while (flush_state < states_nr) {
-		u64 flush_bytes = U64_MAX;
-
-		if (!commit_cycles) {
-			if (states[flush_state] == FLUSH_DELALLOC_WAIT) {
-				flush_state++;
-				continue;
-			}
-			if (states[flush_state] == COMMIT_TRANS)
-				flush_bytes = ticket->bytes;
-		}
 
-		flush_space(fs_info, space_info, flush_bytes,
-			    states[flush_state]);
+	while (flush_state < states_nr) {
+		flush_space(fs_info, space_info, U64_MAX, states[flush_state]);
 		spin_lock(&space_info->lock);
 		if (ticket->bytes == 0) {
 			spin_unlock(&space_info->lock);
@@ -1109,11 +1096,6 @@ static void priority_reclaim_data_space(struct btrfs_fs_info *fs_info,
 		spin_unlock(&space_info->lock);
 		flush_state++;
 	}
-	if (commit_cycles) {
-		commit_cycles--;
-		flush_state = 0;
-		goto again;
-	}
 }
 
 static void wait_reserve_ticket(struct btrfs_fs_info *fs_info,
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 19/23] btrfs: don't force commit if we are data
  2020-06-30 13:58 [PATCH 00/23] Change data reservations to use the ticketing infra Josef Bacik
                   ` (17 preceding siblings ...)
  2020-06-30 13:59 ` [PATCH 18/23] btrfs: drop the commit_cycles stuff for data reservations Josef Bacik
@ 2020-06-30 13:59 ` Josef Bacik
  2020-06-30 13:59 ` [PATCH 20/23] btrfs: run delayed iputs before committing the transaction for data Josef Bacik
                   ` (4 subsequent siblings)
  23 siblings, 0 replies; 37+ messages in thread
From: Josef Bacik @ 2020-06-30 13:59 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Nikolay Borisov

We used to unconditionally commit the transaction at least 2 times and
then on the 3rd try check against pinned space to make sure committing
the transaction was worth the effort.  This is overkill, we know nobody
is going to steal our reservation, and if we can't make our reservation
with the pinned amount simply bail out.

This also cleans up the passing of bytes_needed to
may_commit_transaction, as that was the thing we added into place in
order to accomplish this behavior.  We no longer need it so remove that
mess.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/space-info.c | 24 +++++-------------------
 1 file changed, 5 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index c88a31210b9a..490dab7779e8 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -579,8 +579,7 @@ static void shrink_delalloc(struct btrfs_fs_info *fs_info,
  * will return -ENOSPC.
  */
 static int may_commit_transaction(struct btrfs_fs_info *fs_info,
-				  struct btrfs_space_info *space_info,
-				  u64 bytes_needed)
+				  struct btrfs_space_info *space_info)
 {
 	struct reserve_ticket *ticket = NULL;
 	struct btrfs_block_rsv *delayed_rsv = &fs_info->delayed_block_rsv;
@@ -588,24 +587,13 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
 	struct btrfs_block_rsv *trans_rsv = &fs_info->trans_block_rsv;
 	struct btrfs_trans_handle *trans;
 	u64 reclaim_bytes = 0;
+	u64 bytes_needed;
 	u64 cur_free_bytes = 0;
-	bool do_commit = false;
 
 	trans = (struct btrfs_trans_handle *)current->journal_info;
 	if (trans)
 		return -EAGAIN;
 
-	/*
-	 * If we are data and have passed in U64_MAX we just want to
-	 * unconditionally commit the transaction to match the previous data
-	 * flushing behavior.
-	 */
-	if ((space_info->flags & BTRFS_BLOCK_GROUP_DATA) &&
-	   bytes_needed == U64_MAX) {
-		do_commit = true;
-		goto check_pinned;
-	}
-
 	spin_lock(&space_info->lock);
 	cur_free_bytes = btrfs_space_info_used(space_info, true);
 	if (cur_free_bytes < space_info->total_bytes)
@@ -619,7 +607,7 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
 	else if (!list_empty(&space_info->tickets))
 		ticket = list_first_entry(&space_info->tickets,
 					  struct reserve_ticket, list);
-	bytes_needed = (ticket) ? ticket->bytes : bytes_needed;
+	bytes_needed = (ticket) ? ticket->bytes : 0;
 
 	if (bytes_needed > cur_free_bytes)
 		bytes_needed -= cur_free_bytes;
@@ -630,7 +618,6 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
 	if (!bytes_needed)
 		return 0;
 
-check_pinned:
 	trans = btrfs_join_transaction(fs_info->extent_root);
 	if (IS_ERR(trans))
 		return PTR_ERR(trans);
@@ -640,8 +627,7 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
 	 * we have block groups that are going to be freed, allowing us to
 	 * possibly do a chunk allocation the next loop through.
 	 */
-	if (do_commit ||
-	    test_bit(BTRFS_TRANS_HAVE_FREE_BGS, &trans->transaction->flags) ||
+	if (test_bit(BTRFS_TRANS_HAVE_FREE_BGS, &trans->transaction->flags) ||
 	    __percpu_counter_compare(&space_info->total_bytes_pinned,
 				     bytes_needed,
 				     BTRFS_TOTAL_BYTES_PINNED_BATCH) >= 0)
@@ -757,7 +743,7 @@ static void flush_space(struct btrfs_fs_info *fs_info,
 		btrfs_wait_on_delayed_iputs(fs_info);
 		break;
 	case COMMIT_TRANS:
-		ret = may_commit_transaction(fs_info, space_info, num_bytes);
+		ret = may_commit_transaction(fs_info, space_info);
 		break;
 	default:
 		ret = -ENOSPC;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 20/23] btrfs: run delayed iputs before committing the transaction for data
  2020-06-30 13:58 [PATCH 00/23] Change data reservations to use the ticketing infra Josef Bacik
                   ` (18 preceding siblings ...)
  2020-06-30 13:59 ` [PATCH 19/23] btrfs: don't force commit if we are data Josef Bacik
@ 2020-06-30 13:59 ` Josef Bacik
  2020-06-30 13:59 ` [PATCH 21/23] btrfs: flush delayed refs when trying to reserve data space Josef Bacik
                   ` (3 subsequent siblings)
  23 siblings, 0 replies; 37+ messages in thread
From: Josef Bacik @ 2020-06-30 13:59 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Nikolay Borisov

Before we were waiting on iputs after we committed the transaction, but
this doesn't really make much sense.  We want to reclaim any space we
may have in order to be more likely to commit the transaction, due to
pinned space being added by running the delayed iputs.  Fix this by
making delayed iputs run before committing the transaction.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/space-info.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 490dab7779e8..ba09085b6122 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -1020,8 +1020,8 @@ static const enum btrfs_flush_state evict_flush_states[] = {
 
 static const enum btrfs_flush_state data_flush_states[] = {
 	FLUSH_DELALLOC_WAIT,
-	COMMIT_TRANS,
 	RUN_DELAYED_IPUTS,
+	COMMIT_TRANS,
 };
 
 static void priority_reclaim_metadata_space(struct btrfs_fs_info *fs_info,
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 21/23] btrfs: flush delayed refs when trying to reserve data space
  2020-06-30 13:58 [PATCH 00/23] Change data reservations to use the ticketing infra Josef Bacik
                   ` (19 preceding siblings ...)
  2020-06-30 13:59 ` [PATCH 20/23] btrfs: run delayed iputs before committing the transaction for data Josef Bacik
@ 2020-06-30 13:59 ` Josef Bacik
  2020-06-30 13:59 ` [PATCH 22/23] btrfs: do async reclaim for data reservations Josef Bacik
                   ` (2 subsequent siblings)
  23 siblings, 0 replies; 37+ messages in thread
From: Josef Bacik @ 2020-06-30 13:59 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Nikolay Borisov

We can end up with free'd extents in the delayed refs, and thus
may_commit_transaction() may not think we have enough pinned space to
commit the transaction and we'll ENOSPC early.  Handle this by running
the delayed refs in order to make sure pinned is uptodate before we try
to commit the transaction.

Tested-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/space-info.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index ba09085b6122..8b1a5b644d2f 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -1021,6 +1021,7 @@ static const enum btrfs_flush_state evict_flush_states[] = {
 static const enum btrfs_flush_state data_flush_states[] = {
 	FLUSH_DELALLOC_WAIT,
 	RUN_DELAYED_IPUTS,
+	FLUSH_DELAYED_REFS,
 	COMMIT_TRANS,
 };
 
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 22/23] btrfs: do async reclaim for data reservations
  2020-06-30 13:58 [PATCH 00/23] Change data reservations to use the ticketing infra Josef Bacik
                   ` (20 preceding siblings ...)
  2020-06-30 13:59 ` [PATCH 21/23] btrfs: flush delayed refs when trying to reserve data space Josef Bacik
@ 2020-06-30 13:59 ` Josef Bacik
  2020-06-30 13:59 ` [PATCH 23/23] btrfs: add a comment explaining the data flush steps Josef Bacik
  2020-07-03 16:30 ` [PATCH 00/23] Change data reservations to use the ticketing infra David Sterba
  23 siblings, 0 replies; 37+ messages in thread
From: Josef Bacik @ 2020-06-30 13:59 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Nikolay Borisov

Now that we have the data ticketing stuff in place, move normal data
reservations to use an async reclaim helper to satisfy tickets.  Before
we could have multiple tasks race in and both allocate chunks, resulting
in more data chunks than we would necessarily need.  Serializing these
allocations and making a single thread responsible for flushing will
only allocate chunks as needed, as well as cut down on transaction
commits and other flush related activities.

Priority reservations will still work as they have before, simply
trying to allocate a chunk until they can make their reservation.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/ctree.h      |   3 +-
 fs/btrfs/disk-io.c    |   3 +-
 fs/btrfs/space-info.c | 123 ++++++++++++++++++++++++++++++------------
 fs/btrfs/super.c      |   1 +
 4 files changed, 93 insertions(+), 37 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index efa72a204b91..998bda3132fd 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -494,7 +494,7 @@ enum btrfs_orphan_cleanup_state {
 	ORPHAN_CLEANUP_DONE	= 2,
 };
 
-void btrfs_init_async_reclaim_work(struct work_struct *work);
+void btrfs_init_async_reclaim_work(struct btrfs_fs_info *fs_info);
 
 /* fs_info */
 struct reloc_control;
@@ -916,6 +916,7 @@ struct btrfs_fs_info {
 
 	/* Used to reclaim the metadata space in the background. */
 	struct work_struct async_reclaim_work;
+	struct work_struct async_data_reclaim_work;
 
 	spinlock_t unused_bgs_lock;
 	struct list_head unused_bgs;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 7c07578866f3..0907e02eea54 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2713,7 +2713,7 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
 	fs_info->check_integrity_print_mask = 0;
 #endif
 	btrfs_init_balance(fs_info);
-	btrfs_init_async_reclaim_work(&fs_info->async_reclaim_work);
+	btrfs_init_async_reclaim_work(fs_info);
 
 	spin_lock_init(&fs_info->block_group_cache_lock);
 	fs_info->block_group_cache_tree = RB_ROOT;
@@ -4033,6 +4033,7 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info)
 	btrfs_cleanup_defrag_inodes(fs_info);
 
 	cancel_work_sync(&fs_info->async_reclaim_work);
+	cancel_work_sync(&fs_info->async_data_reclaim_work);
 
 	/* Cancel or finish ongoing discard work */
 	btrfs_discard_cleanup(fs_info);
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 8b1a5b644d2f..0a2fdfaa9fe6 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -996,9 +996,83 @@ static void btrfs_async_reclaim_metadata_space(struct work_struct *work)
 	} while (flush_state <= COMMIT_TRANS);
 }
 
-void btrfs_init_async_reclaim_work(struct work_struct *work)
+static const enum btrfs_flush_state data_flush_states[] = {
+	FLUSH_DELALLOC_WAIT,
+	RUN_DELAYED_IPUTS,
+	FLUSH_DELAYED_REFS,
+	COMMIT_TRANS,
+};
+
+static void btrfs_async_reclaim_data_space(struct work_struct *work)
 {
-	INIT_WORK(work, btrfs_async_reclaim_metadata_space);
+	struct btrfs_fs_info *fs_info;
+	struct btrfs_space_info *space_info;
+	u64 last_tickets_id;
+	int flush_state = 0;
+
+	fs_info = container_of(work, struct btrfs_fs_info,
+			       async_data_reclaim_work);
+	space_info = fs_info->data_sinfo;
+
+	spin_lock(&space_info->lock);
+	if (list_empty(&space_info->tickets)) {
+		space_info->flush = 0;
+		spin_unlock(&space_info->lock);
+		return;
+	}
+	last_tickets_id = space_info->tickets_id;
+	spin_unlock(&space_info->lock);
+
+	while (!space_info->full) {
+		flush_space(fs_info, space_info, U64_MAX, ALLOC_CHUNK_FORCE);
+		spin_lock(&space_info->lock);
+		if (list_empty(&space_info->tickets)) {
+			space_info->flush = 0;
+			spin_unlock(&space_info->lock);
+			return;
+		}
+		last_tickets_id = space_info->tickets_id;
+		spin_unlock(&space_info->lock);
+	}
+
+	while (flush_state < ARRAY_SIZE(data_flush_states)) {
+		flush_space(fs_info, space_info, U64_MAX,
+			    data_flush_states[flush_state]);
+		spin_lock(&space_info->lock);
+		if (list_empty(&space_info->tickets)) {
+			space_info->flush = 0;
+			spin_unlock(&space_info->lock);
+			return;
+		}
+
+		if (last_tickets_id == space_info->tickets_id) {
+			flush_state++;
+		} else {
+			last_tickets_id = space_info->tickets_id;
+			flush_state = 0;
+		}
+
+		if (flush_state >= ARRAY_SIZE(data_flush_states)) {
+			if (space_info->full) {
+				if (maybe_fail_all_tickets(fs_info,
+							   space_info))
+					flush_state = 0;
+				else
+					space_info->flush = 0;
+			} else {
+				flush_state = 0;
+			}
+		}
+		spin_unlock(&space_info->lock);
+	}
+}
+
+void btrfs_init_async_reclaim_work(struct btrfs_fs_info *fs_info)
+{
+	INIT_WORK(&fs_info->async_reclaim_work,
+		  btrfs_async_reclaim_metadata_space);
+	INIT_WORK(&fs_info->async_data_reclaim_work,
+		  btrfs_async_reclaim_data_space);
 }
 
 static const enum btrfs_flush_state priority_flush_states[] = {
@@ -1018,13 +1092,6 @@ static const enum btrfs_flush_state evict_flush_states[] = {
 	COMMIT_TRANS,
 };
 
-static const enum btrfs_flush_state data_flush_states[] = {
-	FLUSH_DELALLOC_WAIT,
-	RUN_DELAYED_IPUTS,
-	FLUSH_DELAYED_REFS,
-	COMMIT_TRANS,
-};
-
 static void priority_reclaim_metadata_space(struct btrfs_fs_info *fs_info,
 				struct btrfs_space_info *space_info,
 				struct reserve_ticket *ticket,
@@ -1057,12 +1124,8 @@ static void priority_reclaim_metadata_space(struct btrfs_fs_info *fs_info,
 
 static void priority_reclaim_data_space(struct btrfs_fs_info *fs_info,
 					struct btrfs_space_info *space_info,
-					struct reserve_ticket *ticket,
-					const enum btrfs_flush_state *states,
-					int states_nr)
+					struct reserve_ticket *ticket)
 {
-	int flush_state = 0;
-
 	while (!space_info->full) {
 		flush_space(fs_info, space_info, U64_MAX, ALLOC_CHUNK_FORCE);
 		spin_lock(&space_info->lock);
@@ -1072,17 +1135,6 @@ static void priority_reclaim_data_space(struct btrfs_fs_info *fs_info,
 		}
 		spin_unlock(&space_info->lock);
 	}
-
-	while (flush_state < states_nr) {
-		flush_space(fs_info, space_info, U64_MAX, states[flush_state]);
-		spin_lock(&space_info->lock);
-		if (ticket->bytes == 0) {
-			spin_unlock(&space_info->lock);
-			return;
-		}
-		spin_unlock(&space_info->lock);
-		flush_state++;
-	}
 }
 
 static void wait_reserve_ticket(struct btrfs_fs_info *fs_info,
@@ -1137,6 +1189,7 @@ static int handle_reserve_ticket(struct btrfs_fs_info *fs_info,
 	int ret;
 
 	switch (flush) {
+	case BTRFS_RESERVE_FLUSH_DATA:
 	case BTRFS_RESERVE_FLUSH_ALL:
 	case BTRFS_RESERVE_FLUSH_ALL_STEAL:
 		wait_reserve_ticket(fs_info, space_info, ticket);
@@ -1151,14 +1204,8 @@ static int handle_reserve_ticket(struct btrfs_fs_info *fs_info,
 						evict_flush_states,
 						ARRAY_SIZE(evict_flush_states));
 		break;
-	case BTRFS_RESERVE_FLUSH_DATA:
-		priority_reclaim_data_space(fs_info, space_info, ticket,
-					data_flush_states,
-					ARRAY_SIZE(data_flush_states));
-		break;
 	case BTRFS_RESERVE_FLUSH_FREE_SPACE_INODE:
-		priority_reclaim_data_space(fs_info, space_info, ticket,
-					    NULL, 0);
+		priority_reclaim_data_space(fs_info, space_info, ticket);
 		break;
 	default:
 		ASSERT(0);
@@ -1223,6 +1270,7 @@ static int __reserve_bytes(struct btrfs_fs_info *fs_info,
 			   struct btrfs_space_info *space_info, u64 orig_bytes,
 			   enum btrfs_reserve_flush_enum flush)
 {
+	struct work_struct *async_work;
 	struct reserve_ticket ticket;
 	u64 used;
 	int ret = 0;
@@ -1231,6 +1279,11 @@ static int __reserve_bytes(struct btrfs_fs_info *fs_info,
 	ASSERT(orig_bytes);
 	ASSERT(!current->journal_info || flush != BTRFS_RESERVE_FLUSH_ALL);
 
+	if (flush == BTRFS_RESERVE_FLUSH_DATA)
+		async_work = &fs_info->async_data_reclaim_work;
+	else
+		async_work = &fs_info->async_reclaim_work;
+
 	spin_lock(&space_info->lock);
 	ret = -ENOSPC;
 	used = btrfs_space_info_used(space_info, true);
@@ -1272,7 +1325,8 @@ static int __reserve_bytes(struct btrfs_fs_info *fs_info,
 		init_waitqueue_head(&ticket.wait);
 		ticket.steal = (flush == BTRFS_RESERVE_FLUSH_ALL_STEAL);
 		if (flush == BTRFS_RESERVE_FLUSH_ALL ||
-		    flush == BTRFS_RESERVE_FLUSH_ALL_STEAL) {
+		    flush == BTRFS_RESERVE_FLUSH_ALL_STEAL ||
+		    flush == BTRFS_RESERVE_FLUSH_DATA) {
 			list_add_tail(&ticket.list, &space_info->tickets);
 			if (!space_info->flush) {
 				space_info->flush = 1;
@@ -1280,8 +1334,7 @@ static int __reserve_bytes(struct btrfs_fs_info *fs_info,
 							  space_info->flags,
 							  orig_bytes, flush,
 							  "enospc");
-				queue_work(system_unbound_wq,
-					   &fs_info->async_reclaim_work);
+				queue_work(system_unbound_wq, async_work);
 			}
 		} else {
 			list_add_tail(&ticket.list,
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 3c9ebd4f2b61..66094f076385 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1880,6 +1880,7 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
 		 * the filesystem is busy.
 		 */
 		cancel_work_sync(&fs_info->async_reclaim_work);
+		cancel_work_sync(&fs_info->async_data_reclaim_work);
 
 		btrfs_discard_cleanup(fs_info);
 
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 23/23] btrfs: add a comment explaining the data flush steps
  2020-06-30 13:58 [PATCH 00/23] Change data reservations to use the ticketing infra Josef Bacik
                   ` (21 preceding siblings ...)
  2020-06-30 13:59 ` [PATCH 22/23] btrfs: do async reclaim for data reservations Josef Bacik
@ 2020-06-30 13:59 ` Josef Bacik
  2020-07-03 16:30 ` [PATCH 00/23] Change data reservations to use the ticketing infra David Sterba
  23 siblings, 0 replies; 37+ messages in thread
From: Josef Bacik @ 2020-06-30 13:59 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

The data flushing steps are not obvious to people other than myself and
Chris.  Write a giant comment explaining the reasoning behind each flush
step for data as well as why it is in that particular order.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/space-info.c | 47 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 0a2fdfaa9fe6..818f0a6caacd 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -996,6 +996,53 @@ static void btrfs_async_reclaim_metadata_space(struct work_struct *work)
 	} while (flush_state <= COMMIT_TRANS);
 }
 
+/*
+ * FLUSH_DELALLOC_WAIT:
+ *   Space is free'd from flushing delalloc in one of two ways.
+ *
+ *   1) compression is on and we allocate less space than we reserved.
+ *   2) We are overwriting existing space.
+ *
+ *   For #1 that extra space is reclaimed as soon as the delalloc pages are
+ *   cow'ed, by way of btrfs_add_reserved_bytes() which adds the actual extent
+ *   length to ->bytes_reserved, and subtracts the reserved space from
+ *   ->bytes_may_use.
+ *
+ *   For #2 this is trickier.  Once the ordered extent runs we will drop the
+ *   extent in the range we are overwriting, which creates a delayed ref for
+ *   that freed extent.  This however is not reclaimed until the transaction
+ *   commits, thus the next stages.
+ *
+ * RUN_DELAYED_IPUTS
+ *   If we are freeing inodes, we want to make sure all delayed iputs have
+ *   completed, because they could have been on an inode with i_nlink == 0, and
+ *   thus have been trunated and free'd up space.  But again this space is not
+ *   immediately re-usable, it comes in the form of a delayed ref, which must be
+ *   run and then the transaction must be committed.
+ *
+ * FLUSH_DELAYED_REFS
+ *   The above two cases generate delayed refs that will affect
+ *   ->total_bytes_pinned.  However this counter can be inconsistent with
+ *   reality if there are outstanding delayed refs.  This is because we adjust
+ *   the counter based soley on the current set of delayed refs and disregard
+ *   any on-disk state which might include more refs.  So for example, if we
+ *   have an extent with 2 references, but we only drop 1, we'll see that there
+ *   is a negative delayed ref count for the extent and assume that the space
+ *   will be free'd, and thus increase ->total_bytes_pinned.
+ *
+ *   Running the delayed refs gives us the actual real view of what will be
+ *   freed at the transaction commit time.  This stage will not actually free
+ *   space for us, it just makes sure that may_commit_transaction() has all of
+ *   the information it needs to make the right decision.
+ *
+ * COMMIT_TRANS
+ *   This is where we reclaim all of the pinned space generated by the previous
+ *   two stages.  We will not commit the transaction if we don't think we're
+ *   likely to satisfy our request, which means if our current free space +
+ *   total_bytes_pinned < reservation we will not commit.  This is why the
+ *   previous states are actually important, to make sure we know for sure
+ *   whether committing the transaction will allow us to make progress.
+ */
 static const enum btrfs_flush_state data_flush_states[] = {
 	FLUSH_DELALLOC_WAIT,
 	RUN_DELAYED_IPUTS,
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH 00/23] Change data reservations to use the ticketing infra
  2020-06-30 13:58 [PATCH 00/23] Change data reservations to use the ticketing infra Josef Bacik
                   ` (22 preceding siblings ...)
  2020-06-30 13:59 ` [PATCH 23/23] btrfs: add a comment explaining the data flush steps Josef Bacik
@ 2020-07-03 16:30 ` David Sterba
  2020-07-07 15:28   ` Josef Bacik
  23 siblings, 1 reply; 37+ messages in thread
From: David Sterba @ 2020-07-03 16:30 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Tue, Jun 30, 2020 at 09:58:58AM -0400, Josef Bacik wrote:
> We've had two different things in place to reserve data and metadata space,
> because generally speaking data is much simpler.  However the data reservations
> still suffered from the same issues that plagued metadata reservations, you
> could get multiple tasks racing in to get reservations.  This causes problems
> with cases like write/delete loops where we should be able to fill up the fs,
> delete everything, and go again.  You would sometimes race with a flusher that's
> trying to unpin the free space, take it's reservations, and then it would fail.
> 
> Fix this by moving the data reservations under the metadata ticketing
> infrastructure.  This gets rid of that problem, and simplifies our enospc code
> more by consolidating it into a single path.  Thanks,

I created a for-next-20200703 snapshot with this branch included, and it
went up to generic/102 and got stuck due to softlockups. This could mean
the machine is overloaded but it usually recovers from that, while not
in this case as it took more than 7 hours before I killed the VM.

There is the dio-iomap so the patchsets could interact in some way, I'll
do more tests next week, this is an early warning that something might
be going on.

generic/102             [02:56:08][11376.253779] run fstests generic/102 at 2020-07-03 02:56:08
[11376.582218] BTRFS info (device vda): disk space caching is enabled
[11376.585713] BTRFS info (device vda): has skinny extents
[11376.692139] BTRFS: device fsid 55b09ca8-60e8-4488-8fa0-3ff05c3e6906 devid 1 transid 5 /dev/vdb scanned by mkfs.btrfs (5196)
[11376.717361] BTRFS info (device vdb): disk space caching is enabled
[11376.720437] BTRFS info (device vdb): has skinny extents
[11376.722974] BTRFS info (device vdb): flagging fs with big metadata feature
[11376.731535] BTRFS info (device vdb): checking UUID tree
[11436.612816] watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [kworker/u8:17:8005]
[11436.615341] Modules linked in: dm_snapshot dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio dm_log_writes dm_flakey dm_mod dax btrfs blake2b_generic libcrc32c crc32c_intel xor zstd_decompress zstd_compres
[11436.621957] irq event stamp: 0
[11436.623062] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
[11436.624882] hardirqs last disabled at (0): [<ffffffff9a07fd9b>] copy_process+0x67b/0x1b00
[11436.627545] softirqs last  enabled at (0): [<ffffffff9a07fd9b>] copy_process+0x67b/0x1b00
[11436.630161] softirqs last disabled at (0): [<0000000000000000>] 0x0
[11436.631919] CPU: 1 PID: 8005 Comm: kworker/u8:17 Tainted: G             L    5.8.0-rc3-default+ #1173
[11436.634311] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
[11436.637193] Workqueue: events_unbound btrfs_async_reclaim_data_space [btrfs]
[11436.638878] RIP: 0010:__slab_alloc+0x5c/0x70
[11436.644622] RSP: 0018:ffffbc1dc60f7cd0 EFLAGS: 00000246
[11436.645961] RAX: 0000000000000000 RBX: 0000000000000246 RCX: 0000000000120001
[11436.647989] RDX: ffffa2a70b349cc0 RSI: ffffffff9a262774 RDI: ffffffff9a2639ba
[11436.649924] RBP: 0000000000000d40 R08: ffffa2a76d1cf138 R09: 0000000000000001
[11436.652014] R10: 0000000000000000 R11: ffffa2a76d1cec10 R12: ffffa2a76d1cf138
[11436.654134] R13: 00000000ffffffff R14: ffffffffc0149633 R15: ffffdc1dbf802b50
[11436.656186] FS:  0000000000000000(0000) GS:ffffa2a77d800000(0000) knlGS:0000000000000000
[11436.658920] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[11436.660679] CR2: 00005641be0178b8 CR3: 0000000041011002 CR4: 0000000000160ee0
[11436.662693] Call Trace:
[11436.663642]  ? start_transaction+0x133/0x5e0 [btrfs]
[11436.664991]  kmem_cache_alloc+0x1a2/0x2f0
[11436.666248]  start_transaction+0x133/0x5e0 [btrfs]
[11436.667733]  flush_space+0x315/0x670 [btrfs]
[11436.669116]  btrfs_async_reclaim_data_space+0xd8/0x1a0 [btrfs]
[11436.670853]  process_one_work+0x22c/0x600
[11436.672182]  worker_thread+0x50/0x3b0
[11436.673436]  ? process_one_work+0x600/0x600
[11436.674675]  kthread+0x137/0x150
[11436.675795]  ? kthread_stop+0x2a0/0x2a0
[11436.677099]  ret_from_fork+0x1f/0x30

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 13/23] btrfs: add the data transaction commit logic into may_commit_transaction
  2020-06-30 13:59 ` [PATCH 13/23] btrfs: add the data transaction commit logic into may_commit_transaction Josef Bacik
@ 2020-07-07 14:39   ` Nikolay Borisov
  2020-07-07 14:54     ` Josef Bacik
  2020-07-07 15:01     ` Josef Bacik
  0 siblings, 2 replies; 37+ messages in thread
From: Nikolay Borisov @ 2020-07-07 14:39 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team



On 30.06.20 г. 16:59 ч., Josef Bacik wrote:
> Data space flushing currently unconditionally commits the transaction
> twice in a row, and the last time it checks if there's enough pinned
> extents to satisfy it's reservation before deciding to commit the
> transaction for the 3rd and final time.
> 
> Encode this logic into may_commit_transaction().  In the next patch we
> will pass in U64_MAX for bytes_needed the first two times, and the final
> time we will pass in the actual bytes we need so the normal logic will
> apply.
> 
> This patch exists soley to make the logical changes I will make to the
> flushing state machine separate to make it easier to bisect any
> performance related regressions.
> 
> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
> Tested-by: Nikolay Borisov <nborisov@suse.com>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

On a second pass through I'm now somewhat reluctant to merge this code.
may_commit_transaction has grown more logic which pertain solely to
metadata. As such I think we should separate that logic (i.e current
may_commit_transaction) and any further adjustments we might want to
make for data space info. For example all the delayed refs/trans
reservation checks etc. make absolutely no sense for data space info.

> ---
>  fs/btrfs/space-info.c | 24 +++++++++++++++++++-----
>  1 file changed, 19 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
> index e041b1d58e28..fb63ddc31540 100644
> --- a/fs/btrfs/space-info.c
> +++ b/fs/btrfs/space-info.c
> @@ -579,21 +579,33 @@ static void shrink_delalloc(struct btrfs_fs_info *fs_info,
>   * will return -ENOSPC.
>   */
>  static int may_commit_transaction(struct btrfs_fs_info *fs_info,
> -				  struct btrfs_space_info *space_info)
> +				  struct btrfs_space_info *space_info,
> +				  u64 bytes_needed)
>  {
>  	struct reserve_ticket *ticket = NULL;
>  	struct btrfs_block_rsv *delayed_rsv = &fs_info->delayed_block_rsv;
>  	struct btrfs_block_rsv *delayed_refs_rsv = &fs_info->delayed_refs_rsv;
>  	struct btrfs_block_rsv *trans_rsv = &fs_info->trans_block_rsv;
>  	struct btrfs_trans_handle *trans;
> -	u64 bytes_needed;
>  	u64 reclaim_bytes = 0;
>  	u64 cur_free_bytes = 0;
> +	bool do_commit = false;
>  
>  	trans = (struct btrfs_trans_handle *)current->journal_info;
>  	if (trans)
>  		return -EAGAIN;
>  
> +	/*
> +	 * If we are data and have passed in U64_MAX we just want to
> +	 * unconditionally commit the transaction to match the previous data
> +	 * flushing behavior.
> +	 */
> +	if ((space_info->flags & BTRFS_BLOCK_GROUP_DATA) &&
> +	   bytes_needed == U64_MAX) {
> +		do_commit = true;
> +		goto check_pinned;
> +	}
> +
>  	spin_lock(&space_info->lock);
>  	cur_free_bytes = btrfs_space_info_used(space_info, true);
>  	if (cur_free_bytes < space_info->total_bytes)
> @@ -607,7 +619,7 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
>  	else if (!list_empty(&space_info->tickets))
>  		ticket = list_first_entry(&space_info->tickets,
>  					  struct reserve_ticket, list);
> -	bytes_needed = (ticket) ? ticket->bytes : 0;
> +	bytes_needed = (ticket) ? ticket->bytes : bytes_needed;
>  
>  	if (bytes_needed > cur_free_bytes)
>  		bytes_needed -= cur_free_bytes;
> @@ -618,6 +630,7 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
>  	if (!bytes_needed)
>  		return 0;
>  
> +check_pinned:
>  	trans = btrfs_join_transaction(fs_info->extent_root);
>  	if (IS_ERR(trans))
>  		return PTR_ERR(trans);
> @@ -627,7 +640,8 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
>  	 * we have block groups that are going to be freed, allowing us to
>  	 * possibly do a chunk allocation the next loop through.
>  	 */
> -	if (test_bit(BTRFS_TRANS_HAVE_FREE_BGS, &trans->transaction->flags) ||
> +	if (do_commit ||
> +	    test_bit(BTRFS_TRANS_HAVE_FREE_BGS, &trans->transaction->flags) ||
>  	    __percpu_counter_compare(&space_info->total_bytes_pinned,
>  				     bytes_needed,
>  				     BTRFS_TOTAL_BYTES_PINNED_BATCH) >= 0)
> @@ -743,7 +757,7 @@ static void flush_space(struct btrfs_fs_info *fs_info,
>  		btrfs_wait_on_delayed_iputs(fs_info);
>  		break;
>  	case COMMIT_TRANS:
> -		ret = may_commit_transaction(fs_info, space_info);
> +		ret = may_commit_transaction(fs_info, space_info, num_bytes);
>  		break;
>  	default:
>  		ret = -ENOSPC;
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 15/23] btrfs: use ticketing for data space reservations
  2020-06-30 13:59 ` [PATCH 15/23] btrfs: use ticketing for data space reservations Josef Bacik
@ 2020-07-07 14:46   ` Nikolay Borisov
  2020-07-07 14:56     ` Josef Bacik
  0 siblings, 1 reply; 37+ messages in thread
From: Nikolay Borisov @ 2020-07-07 14:46 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team



On 30.06.20 г. 16:59 ч., Josef Bacik wrote:
> Now that we have all the infrastructure in place, use the ticketing
> infrastructure to make data allocations.  This still maintains the exact
> same flushing behavior, but now we're using tickets to get our
> reservations satisfied.
> 
> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
> Tested-by: Nikolay Borisov <nborisov@suse.com>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/space-info.c | 125 ++++++++++++++++++++++--------------------
>  1 file changed, 67 insertions(+), 58 deletions(-)
> 
> diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
> index 799ee6090693..ee4747917b81 100644
> --- a/fs/btrfs/space-info.c
> +++ b/fs/btrfs/space-info.c
> @@ -1068,6 +1068,54 @@ static void priority_reclaim_metadata_space(struct btrfs_fs_info *fs_info,
>  	} while (flush_state < states_nr);
>  }
>  
> +static void priority_reclaim_data_space(struct btrfs_fs_info *fs_info,
> +					struct btrfs_space_info *space_info,
> +					struct reserve_ticket *ticket,
> +					const enum btrfs_flush_state *states,
> +					int states_nr)
> +{
> +	int flush_state = 0;
> +	int commit_cycles = 2;
> +
> +	while (!space_info->full) {
> +		flush_space(fs_info, space_info, U64_MAX, ALLOC_CHUNK_FORCE);
> +		spin_lock(&space_info->lock);
> +		if (ticket->bytes == 0) {
> +			spin_unlock(&space_info->lock);
> +			return;
> +		}
> +		spin_unlock(&space_info->lock);
> +	}
> +again:
> +	while (flush_state < states_nr) {
> +		u64 flush_bytes = U64_MAX;
> +
> +		if (!commit_cycles) {
> +			if (states[flush_state] == FLUSH_DELALLOC_WAIT) {
> +				flush_state++;
> +				continue;
> +			}
> +			if (states[flush_state] == COMMIT_TRANS)
> +				flush_bytes = ticket->bytes;
> +		}
> +
> +		flush_space(fs_info, space_info, flush_bytes,
> +			    states[flush_state]);
> +		spin_lock(&space_info->lock);
> +		if (ticket->bytes == 0) {
> +			spin_unlock(&space_info->lock);
> +			return;
> +		}
> +		spin_unlock(&space_info->lock);
> +		flush_state++;
> +	}
> +	if (commit_cycles) {
> +		commit_cycles--;
> +		flush_state = 0;
> +		goto again;
> +	}
> +}
> +
>  static void wait_reserve_ticket(struct btrfs_fs_info *fs_info,
>  				struct btrfs_space_info *space_info,
>  				struct reserve_ticket *ticket)
> @@ -1134,6 +1182,15 @@ static int handle_reserve_ticket(struct btrfs_fs_info *fs_info,
>  						evict_flush_states,
>  						ARRAY_SIZE(evict_flush_states));
>  		break;
> +	case BTRFS_RESERVE_FLUSH_DATA:
> +		priority_reclaim_data_space(fs_info, space_info, ticket,
> +					data_flush_states,
> +					ARRAY_SIZE(data_flush_states));
> +		break;
> +	case BTRFS_RESERVE_FLUSH_FREE_SPACE_INODE:
> +		priority_reclaim_data_space(fs_info, space_info, ticket,
> +					    NULL, 0);
> +		break;
>  	default:
>  		ASSERT(0);
>  		break;
> @@ -1341,78 +1398,30 @@ int btrfs_reserve_data_bytes(struct btrfs_fs_info *fs_info, u64 bytes,
>  			     enum btrfs_reserve_flush_enum flush)
>  {
>  	struct btrfs_space_info *data_sinfo = fs_info->data_sinfo;
> -	const enum btrfs_flush_state *states = NULL;
>  	u64 used;
> -	int states_nr = 0;
> -	int commit_cycles = 2;
>  	int ret = -ENOSPC;
>  
>  	ASSERT(!current->journal_info || flush != BTRFS_RESERVE_FLUSH_DATA);
>  
> -	if (flush == BTRFS_RESERVE_FLUSH_DATA) {
> -		states = data_flush_states;
> -		states_nr = ARRAY_SIZE(data_flush_states);
> -	}
> -
>  	spin_lock(&data_sinfo->lock);
> -again:
>  	used = btrfs_space_info_used(data_sinfo, true);
>  
>  	if (used + bytes > data_sinfo->total_bytes) {
> -		u64 prev_total_bytes = data_sinfo->total_bytes;
> -		int flush_state = 0;
> +		struct reserve_ticket ticket;
>  
> +		init_waitqueue_head(&ticket.wait);
> +		ticket.bytes = bytes;
> +		ticket.error = 0;
> +		list_add_tail(&ticket.list, &data_sinfo->priority_tickets);

nit: Shouldn't adding the ticket also be recorded in
spac_info->reclaim_size?
I see later that you are removing this code and relying on the existing
logic in __reserve_metadata_bytes( renamed to reserve_bytes) which
correctly modifies reclaim_size, but this just means this particular
patch is slightly broken.

>  		spin_unlock(&data_sinfo->lock);
>  
> -		/*
> -		 * Everybody can force chunk allocation, so try this first to
> -		 * see if we can just bail here and make our reservation.
> -		 */
> -		flush_space(fs_info, data_sinfo, bytes, ALLOC_CHUNK_FORCE);
> -		spin_lock(&data_sinfo->lock);
> -		if (prev_total_bytes < data_sinfo->total_bytes)
> -			goto again;
> +		ret = handle_reserve_ticket(fs_info, data_sinfo, &ticket,
> +					    flush);
> +	} else {
> +		btrfs_space_info_update_bytes_may_use(fs_info, data_sinfo, bytes);
> +		ret = 0;

<snip>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 13/23] btrfs: add the data transaction commit logic into may_commit_transaction
  2020-07-07 14:39   ` Nikolay Borisov
@ 2020-07-07 14:54     ` Josef Bacik
  2020-07-07 15:01     ` Josef Bacik
  1 sibling, 0 replies; 37+ messages in thread
From: Josef Bacik @ 2020-07-07 14:54 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs, kernel-team

On 7/7/20 10:39 AM, Nikolay Borisov wrote:
> 
> 
> On 30.06.20 г. 16:59 ч., Josef Bacik wrote:
>> Data space flushing currently unconditionally commits the transaction
>> twice in a row, and the last time it checks if there's enough pinned
>> extents to satisfy it's reservation before deciding to commit the
>> transaction for the 3rd and final time.
>>
>> Encode this logic into may_commit_transaction().  In the next patch we
>> will pass in U64_MAX for bytes_needed the first two times, and the final
>> time we will pass in the actual bytes we need so the normal logic will
>> apply.
>>
>> This patch exists soley to make the logical changes I will make to the
>> flushing state machine separate to make it easier to bisect any
>> performance related regressions.
>>
>> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
>> Tested-by: Nikolay Borisov <nborisov@suse.com>
>> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> 
> On a second pass through I'm now somewhat reluctant to merge this code.
> may_commit_transaction has grown more logic which pertain solely to
> metadata. As such I think we should separate that logic (i.e current
> may_commit_transaction) and any further adjustments we might want to
> make for data space info. For example all the delayed refs/trans
> reservation checks etc. make absolutely no sense for data space info.
> 

Right, which is why there's this check

         /*
          * See if there is some space in the delayed insertion reservation for
          * this reservation.
          */
         if (space_info != delayed_rsv->space_info)
                 goto enospc;

before all of those checks.  The logic is the same, minus the delayed ref stuff, 
which is kept separate.  If you want I can make the comment a bit more clear, 
it's not very verbose.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 15/23] btrfs: use ticketing for data space reservations
  2020-07-07 14:46   ` Nikolay Borisov
@ 2020-07-07 14:56     ` Josef Bacik
  0 siblings, 0 replies; 37+ messages in thread
From: Josef Bacik @ 2020-07-07 14:56 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs, kernel-team

On 7/7/20 10:46 AM, Nikolay Borisov wrote:
> 
> 
> On 30.06.20 г. 16:59 ч., Josef Bacik wrote:
>> Now that we have all the infrastructure in place, use the ticketing
>> infrastructure to make data allocations.  This still maintains the exact
>> same flushing behavior, but now we're using tickets to get our
>> reservations satisfied.
>>
>> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
>> Tested-by: Nikolay Borisov <nborisov@suse.com>
>> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
>> ---
>>   fs/btrfs/space-info.c | 125 ++++++++++++++++++++++--------------------
>>   1 file changed, 67 insertions(+), 58 deletions(-)
>>
>> diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
>> index 799ee6090693..ee4747917b81 100644
>> --- a/fs/btrfs/space-info.c
>> +++ b/fs/btrfs/space-info.c
>> @@ -1068,6 +1068,54 @@ static void priority_reclaim_metadata_space(struct btrfs_fs_info *fs_info,
>>   	} while (flush_state < states_nr);
>>   }
>>   
>> +static void priority_reclaim_data_space(struct btrfs_fs_info *fs_info,
>> +					struct btrfs_space_info *space_info,
>> +					struct reserve_ticket *ticket,
>> +					const enum btrfs_flush_state *states,
>> +					int states_nr)
>> +{
>> +	int flush_state = 0;
>> +	int commit_cycles = 2;
>> +
>> +	while (!space_info->full) {
>> +		flush_space(fs_info, space_info, U64_MAX, ALLOC_CHUNK_FORCE);
>> +		spin_lock(&space_info->lock);
>> +		if (ticket->bytes == 0) {
>> +			spin_unlock(&space_info->lock);
>> +			return;
>> +		}
>> +		spin_unlock(&space_info->lock);
>> +	}
>> +again:
>> +	while (flush_state < states_nr) {
>> +		u64 flush_bytes = U64_MAX;
>> +
>> +		if (!commit_cycles) {
>> +			if (states[flush_state] == FLUSH_DELALLOC_WAIT) {
>> +				flush_state++;
>> +				continue;
>> +			}
>> +			if (states[flush_state] == COMMIT_TRANS)
>> +				flush_bytes = ticket->bytes;
>> +		}
>> +
>> +		flush_space(fs_info, space_info, flush_bytes,
>> +			    states[flush_state]);
>> +		spin_lock(&space_info->lock);
>> +		if (ticket->bytes == 0) {
>> +			spin_unlock(&space_info->lock);
>> +			return;
>> +		}
>> +		spin_unlock(&space_info->lock);
>> +		flush_state++;
>> +	}
>> +	if (commit_cycles) {
>> +		commit_cycles--;
>> +		flush_state = 0;
>> +		goto again;
>> +	}
>> +}
>> +
>>   static void wait_reserve_ticket(struct btrfs_fs_info *fs_info,
>>   				struct btrfs_space_info *space_info,
>>   				struct reserve_ticket *ticket)
>> @@ -1134,6 +1182,15 @@ static int handle_reserve_ticket(struct btrfs_fs_info *fs_info,
>>   						evict_flush_states,
>>   						ARRAY_SIZE(evict_flush_states));
>>   		break;
>> +	case BTRFS_RESERVE_FLUSH_DATA:
>> +		priority_reclaim_data_space(fs_info, space_info, ticket,
>> +					data_flush_states,
>> +					ARRAY_SIZE(data_flush_states));
>> +		break;
>> +	case BTRFS_RESERVE_FLUSH_FREE_SPACE_INODE:
>> +		priority_reclaim_data_space(fs_info, space_info, ticket,
>> +					    NULL, 0);
>> +		break;
>>   	default:
>>   		ASSERT(0);
>>   		break;
>> @@ -1341,78 +1398,30 @@ int btrfs_reserve_data_bytes(struct btrfs_fs_info *fs_info, u64 bytes,
>>   			     enum btrfs_reserve_flush_enum flush)
>>   {
>>   	struct btrfs_space_info *data_sinfo = fs_info->data_sinfo;
>> -	const enum btrfs_flush_state *states = NULL;
>>   	u64 used;
>> -	int states_nr = 0;
>> -	int commit_cycles = 2;
>>   	int ret = -ENOSPC;
>>   
>>   	ASSERT(!current->journal_info || flush != BTRFS_RESERVE_FLUSH_DATA);
>>   
>> -	if (flush == BTRFS_RESERVE_FLUSH_DATA) {
>> -		states = data_flush_states;
>> -		states_nr = ARRAY_SIZE(data_flush_states);
>> -	}
>> -
>>   	spin_lock(&data_sinfo->lock);
>> -again:
>>   	used = btrfs_space_info_used(data_sinfo, true);
>>   
>>   	if (used + bytes > data_sinfo->total_bytes) {
>> -		u64 prev_total_bytes = data_sinfo->total_bytes;
>> -		int flush_state = 0;
>> +		struct reserve_ticket ticket;
>>   
>> +		init_waitqueue_head(&ticket.wait);
>> +		ticket.bytes = bytes;
>> +		ticket.error = 0;
>> +		list_add_tail(&ticket.list, &data_sinfo->priority_tickets);
> 
> nit: Shouldn't adding the ticket also be recorded in
> spac_info->reclaim_size?
> I see later that you are removing this code and relying on the existing
> logic in __reserve_metadata_bytes( renamed to reserve_bytes) which
> correctly modifies reclaim_size, but this just means this particular
> patch is slightly broken.

Yup I'll fix this up, thanks.  Thanks,

Josef


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 13/23] btrfs: add the data transaction commit logic into may_commit_transaction
  2020-07-07 14:39   ` Nikolay Borisov
  2020-07-07 14:54     ` Josef Bacik
@ 2020-07-07 15:01     ` Josef Bacik
  1 sibling, 0 replies; 37+ messages in thread
From: Josef Bacik @ 2020-07-07 15:01 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs, kernel-team

On 7/7/20 10:39 AM, Nikolay Borisov wrote:
> 
> 
> On 30.06.20 г. 16:59 ч., Josef Bacik wrote:
>> Data space flushing currently unconditionally commits the transaction
>> twice in a row, and the last time it checks if there's enough pinned
>> extents to satisfy it's reservation before deciding to commit the
>> transaction for the 3rd and final time.
>>
>> Encode this logic into may_commit_transaction().  In the next patch we
>> will pass in U64_MAX for bytes_needed the first two times, and the final
>> time we will pass in the actual bytes we need so the normal logic will
>> apply.
>>
>> This patch exists soley to make the logical changes I will make to the
>> flushing state machine separate to make it easier to bisect any
>> performance related regressions.
>>
>> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
>> Tested-by: Nikolay Borisov <nborisov@suse.com>
>> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> 
> On a second pass through I'm now somewhat reluctant to merge this code.
> may_commit_transaction has grown more logic which pertain solely to
> metadata. As such I think we should separate that logic (i.e current
> may_commit_transaction) and any further adjustments we might want to
> make for data space info. For example all the delayed refs/trans
> reservation checks etc. make absolutely no sense for data space info.

Oooh I forgot another thing, the reason I did this is because I was trying to 
maintain the behavior while converting, and then change the behavior in 
subsequent patches.  I'm doing this specifically because we may have performance 
regressions, and I didn't want a bisect to show up at "Giant patch that changes 
all the things".  I want it to show up at the patch that actually changed the 
behavior, so we have a better idea of where to look.  So this intermediate step 
is a bit wonky for sure, but the end result at the end of the series is less 
wonky.  But I'm still going to update that comment.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 00/23] Change data reservations to use the ticketing infra
  2020-07-03 16:30 ` [PATCH 00/23] Change data reservations to use the ticketing infra David Sterba
@ 2020-07-07 15:28   ` Josef Bacik
  0 siblings, 0 replies; 37+ messages in thread
From: Josef Bacik @ 2020-07-07 15:28 UTC (permalink / raw)
  To: dsterba, linux-btrfs, kernel-team

On 7/3/20 12:30 PM, David Sterba wrote:
> On Tue, Jun 30, 2020 at 09:58:58AM -0400, Josef Bacik wrote:
>> We've had two different things in place to reserve data and metadata space,
>> because generally speaking data is much simpler.  However the data reservations
>> still suffered from the same issues that plagued metadata reservations, you
>> could get multiple tasks racing in to get reservations.  This causes problems
>> with cases like write/delete loops where we should be able to fill up the fs,
>> delete everything, and go again.  You would sometimes race with a flusher that's
>> trying to unpin the free space, take it's reservations, and then it would fail.
>>
>> Fix this by moving the data reservations under the metadata ticketing
>> infrastructure.  This gets rid of that problem, and simplifies our enospc code
>> more by consolidating it into a single path.  Thanks,
> 
> I created a for-next-20200703 snapshot with this branch included, and it
> went up to generic/102 and got stuck due to softlockups. This could mean
> the machine is overloaded but it usually recovers from that, while not
> in this case as it took more than 7 hours before I killed the VM.
> 
> There is the dio-iomap so the patchsets could interact in some way, I'll
> do more tests next week, this is an early warning that something might
> be going on.

Softlockup in __slab_alloc, I think we were just the victim here.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 13/23] btrfs: add the data transaction commit logic into may_commit_transaction
  2020-07-21 14:22 ` [PATCH 13/23] btrfs: add the data transaction commit logic into may_commit_transaction Josef Bacik
@ 2020-08-13 16:35   ` David Sterba
  0 siblings, 0 replies; 37+ messages in thread
From: David Sterba @ 2020-08-13 16:35 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team, Nikolay Borisov

On Tue, Jul 21, 2020 at 10:22:24AM -0400, Josef Bacik wrote:
> -	bytes_needed = (ticket) ? ticket->bytes : 0;
> +	bytes_needed = (ticket) ? ticket->bytes : bytes_needed;

I'd rather write that as

	if (ticket)
		bytes_needed = ticket->bytes;

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 13/23] btrfs: add the data transaction commit logic into may_commit_transaction
  2020-07-21 14:22 [PATCH 00/23][v4] " Josef Bacik
@ 2020-07-21 14:22 ` Josef Bacik
  2020-08-13 16:35   ` David Sterba
  0 siblings, 1 reply; 37+ messages in thread
From: Josef Bacik @ 2020-07-21 14:22 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Nikolay Borisov

Data space flushing currently unconditionally commits the transaction
twice in a row, and the last time it checks if there's enough pinned
extents to satisfy it's reservation before deciding to commit the
transaction for the 3rd and final time.

Encode this logic into may_commit_transaction().  In the next patch we
will pass in U64_MAX for bytes_needed the first two times, and the final
time we will pass in the actual bytes we need so the normal logic will
apply.

This patch exists soley to make the logical changes I will make to the
flushing state machine separate to make it easier to bisect any
performance related regressions.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/space-info.c | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 94da7b43e152..be0f117d4ccf 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -579,21 +579,33 @@ static void shrink_delalloc(struct btrfs_fs_info *fs_info,
  * will return -ENOSPC.
  */
 static int may_commit_transaction(struct btrfs_fs_info *fs_info,
-				  struct btrfs_space_info *space_info)
+				  struct btrfs_space_info *space_info,
+				  u64 bytes_needed)
 {
 	struct reserve_ticket *ticket = NULL;
 	struct btrfs_block_rsv *delayed_rsv = &fs_info->delayed_block_rsv;
 	struct btrfs_block_rsv *delayed_refs_rsv = &fs_info->delayed_refs_rsv;
 	struct btrfs_block_rsv *trans_rsv = &fs_info->trans_block_rsv;
 	struct btrfs_trans_handle *trans;
-	u64 bytes_needed;
 	u64 reclaim_bytes = 0;
 	u64 cur_free_bytes = 0;
+	bool do_commit = false;
 
 	trans = (struct btrfs_trans_handle *)current->journal_info;
 	if (trans)
 		return -EAGAIN;
 
+	/*
+	 * If we are data and have passed in U64_MAX we just want to
+	 * unconditionally commit the transaction to match the previous data
+	 * flushing behavior.
+	 */
+	if ((space_info->flags & BTRFS_BLOCK_GROUP_DATA) &&
+	   bytes_needed == U64_MAX) {
+		do_commit = true;
+		goto check_pinned;
+	}
+
 	spin_lock(&space_info->lock);
 	cur_free_bytes = btrfs_space_info_used(space_info, true);
 	if (cur_free_bytes < space_info->total_bytes)
@@ -607,7 +619,7 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
 	else if (!list_empty(&space_info->tickets))
 		ticket = list_first_entry(&space_info->tickets,
 					  struct reserve_ticket, list);
-	bytes_needed = (ticket) ? ticket->bytes : 0;
+	bytes_needed = (ticket) ? ticket->bytes : bytes_needed;
 
 	if (bytes_needed > cur_free_bytes)
 		bytes_needed -= cur_free_bytes;
@@ -618,6 +630,7 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
 	if (!bytes_needed)
 		return 0;
 
+check_pinned:
 	trans = btrfs_join_transaction(fs_info->extent_root);
 	if (IS_ERR(trans))
 		return PTR_ERR(trans);
@@ -627,7 +640,8 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
 	 * we have block groups that are going to be freed, allowing us to
 	 * possibly do a chunk allocation the next loop through.
 	 */
-	if (test_bit(BTRFS_TRANS_HAVE_FREE_BGS, &trans->transaction->flags) ||
+	if (do_commit ||
+	    test_bit(BTRFS_TRANS_HAVE_FREE_BGS, &trans->transaction->flags) ||
 	    __percpu_counter_compare(&space_info->total_bytes_pinned,
 				     bytes_needed,
 				     BTRFS_TOTAL_BYTES_PINNED_BATCH) >= 0)
@@ -635,7 +649,9 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
 
 	/*
 	 * See if there is some space in the delayed insertion reservation for
-	 * this reservation.
+	 * this reservation.  If the space_info's don't match (like for DATA or
+	 * SYSTEM) then just enospc, reclaiming this space won't recover any
+	 * space to satisfy those reservations.
 	 */
 	if (space_info != delayed_rsv->space_info)
 		goto enospc;
@@ -743,7 +759,7 @@ static void flush_space(struct btrfs_fs_info *fs_info,
 		btrfs_wait_on_delayed_iputs(fs_info);
 		break;
 	case COMMIT_TRANS:
-		ret = may_commit_transaction(fs_info, space_info);
+		ret = may_commit_transaction(fs_info, space_info, num_bytes);
 		break;
 	default:
 		ret = -ENOSPC;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 13/23] btrfs: add the data transaction commit logic into may_commit_transaction
  2020-07-08 13:59 [PATCH 00/23][v3] Change data reservations to use the ticketing infra Josef Bacik
@ 2020-07-08 14:00 ` Josef Bacik
  0 siblings, 0 replies; 37+ messages in thread
From: Josef Bacik @ 2020-07-08 14:00 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Nikolay Borisov

Data space flushing currently unconditionally commits the transaction
twice in a row, and the last time it checks if there's enough pinned
extents to satisfy it's reservation before deciding to commit the
transaction for the 3rd and final time.

Encode this logic into may_commit_transaction().  In the next patch we
will pass in U64_MAX for bytes_needed the first two times, and the final
time we will pass in the actual bytes we need so the normal logic will
apply.

This patch exists soley to make the logical changes I will make to the
flushing state machine separate to make it easier to bisect any
performance related regressions.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/space-info.c | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 94da7b43e152..be0f117d4ccf 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -579,21 +579,33 @@ static void shrink_delalloc(struct btrfs_fs_info *fs_info,
  * will return -ENOSPC.
  */
 static int may_commit_transaction(struct btrfs_fs_info *fs_info,
-				  struct btrfs_space_info *space_info)
+				  struct btrfs_space_info *space_info,
+				  u64 bytes_needed)
 {
 	struct reserve_ticket *ticket = NULL;
 	struct btrfs_block_rsv *delayed_rsv = &fs_info->delayed_block_rsv;
 	struct btrfs_block_rsv *delayed_refs_rsv = &fs_info->delayed_refs_rsv;
 	struct btrfs_block_rsv *trans_rsv = &fs_info->trans_block_rsv;
 	struct btrfs_trans_handle *trans;
-	u64 bytes_needed;
 	u64 reclaim_bytes = 0;
 	u64 cur_free_bytes = 0;
+	bool do_commit = false;
 
 	trans = (struct btrfs_trans_handle *)current->journal_info;
 	if (trans)
 		return -EAGAIN;
 
+	/*
+	 * If we are data and have passed in U64_MAX we just want to
+	 * unconditionally commit the transaction to match the previous data
+	 * flushing behavior.
+	 */
+	if ((space_info->flags & BTRFS_BLOCK_GROUP_DATA) &&
+	   bytes_needed == U64_MAX) {
+		do_commit = true;
+		goto check_pinned;
+	}
+
 	spin_lock(&space_info->lock);
 	cur_free_bytes = btrfs_space_info_used(space_info, true);
 	if (cur_free_bytes < space_info->total_bytes)
@@ -607,7 +619,7 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
 	else if (!list_empty(&space_info->tickets))
 		ticket = list_first_entry(&space_info->tickets,
 					  struct reserve_ticket, list);
-	bytes_needed = (ticket) ? ticket->bytes : 0;
+	bytes_needed = (ticket) ? ticket->bytes : bytes_needed;
 
 	if (bytes_needed > cur_free_bytes)
 		bytes_needed -= cur_free_bytes;
@@ -618,6 +630,7 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
 	if (!bytes_needed)
 		return 0;
 
+check_pinned:
 	trans = btrfs_join_transaction(fs_info->extent_root);
 	if (IS_ERR(trans))
 		return PTR_ERR(trans);
@@ -627,7 +640,8 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
 	 * we have block groups that are going to be freed, allowing us to
 	 * possibly do a chunk allocation the next loop through.
 	 */
-	if (test_bit(BTRFS_TRANS_HAVE_FREE_BGS, &trans->transaction->flags) ||
+	if (do_commit ||
+	    test_bit(BTRFS_TRANS_HAVE_FREE_BGS, &trans->transaction->flags) ||
 	    __percpu_counter_compare(&space_info->total_bytes_pinned,
 				     bytes_needed,
 				     BTRFS_TOTAL_BYTES_PINNED_BATCH) >= 0)
@@ -635,7 +649,9 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
 
 	/*
 	 * See if there is some space in the delayed insertion reservation for
-	 * this reservation.
+	 * this reservation.  If the space_info's don't match (like for DATA or
+	 * SYSTEM) then just enospc, reclaiming this space won't recover any
+	 * space to satisfy those reservations.
 	 */
 	if (space_info != delayed_rsv->space_info)
 		goto enospc;
@@ -743,7 +759,7 @@ static void flush_space(struct btrfs_fs_info *fs_info,
 		btrfs_wait_on_delayed_iputs(fs_info);
 		break;
 	case COMMIT_TRANS:
-		ret = may_commit_transaction(fs_info, space_info);
+		ret = may_commit_transaction(fs_info, space_info, num_bytes);
 		break;
 	default:
 		ret = -ENOSPC;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 13/23] btrfs: add the data transaction commit logic into may_commit_transaction
  2020-07-07 15:42 [PATCH 00/23][v2] Change data reservations to use the ticketing infra Josef Bacik
@ 2020-07-07 15:42 ` Josef Bacik
  0 siblings, 0 replies; 37+ messages in thread
From: Josef Bacik @ 2020-07-07 15:42 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Nikolay Borisov

Data space flushing currently unconditionally commits the transaction
twice in a row, and the last time it checks if there's enough pinned
extents to satisfy it's reservation before deciding to commit the
transaction for the 3rd and final time.

Encode this logic into may_commit_transaction().  In the next patch we
will pass in U64_MAX for bytes_needed the first two times, and the final
time we will pass in the actual bytes we need so the normal logic will
apply.

This patch exists soley to make the logical changes I will make to the
flushing state machine separate to make it easier to bisect any
performance related regressions.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/space-info.c | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index e041b1d58e28..e29c3e318d1b 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -579,21 +579,33 @@ static void shrink_delalloc(struct btrfs_fs_info *fs_info,
  * will return -ENOSPC.
  */
 static int may_commit_transaction(struct btrfs_fs_info *fs_info,
-				  struct btrfs_space_info *space_info)
+				  struct btrfs_space_info *space_info,
+				  u64 bytes_needed)
 {
 	struct reserve_ticket *ticket = NULL;
 	struct btrfs_block_rsv *delayed_rsv = &fs_info->delayed_block_rsv;
 	struct btrfs_block_rsv *delayed_refs_rsv = &fs_info->delayed_refs_rsv;
 	struct btrfs_block_rsv *trans_rsv = &fs_info->trans_block_rsv;
 	struct btrfs_trans_handle *trans;
-	u64 bytes_needed;
 	u64 reclaim_bytes = 0;
 	u64 cur_free_bytes = 0;
+	bool do_commit = false;
 
 	trans = (struct btrfs_trans_handle *)current->journal_info;
 	if (trans)
 		return -EAGAIN;
 
+	/*
+	 * If we are data and have passed in U64_MAX we just want to
+	 * unconditionally commit the transaction to match the previous data
+	 * flushing behavior.
+	 */
+	if ((space_info->flags & BTRFS_BLOCK_GROUP_DATA) &&
+	   bytes_needed == U64_MAX) {
+		do_commit = true;
+		goto check_pinned;
+	}
+
 	spin_lock(&space_info->lock);
 	cur_free_bytes = btrfs_space_info_used(space_info, true);
 	if (cur_free_bytes < space_info->total_bytes)
@@ -607,7 +619,7 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
 	else if (!list_empty(&space_info->tickets))
 		ticket = list_first_entry(&space_info->tickets,
 					  struct reserve_ticket, list);
-	bytes_needed = (ticket) ? ticket->bytes : 0;
+	bytes_needed = (ticket) ? ticket->bytes : bytes_needed;
 
 	if (bytes_needed > cur_free_bytes)
 		bytes_needed -= cur_free_bytes;
@@ -618,6 +630,7 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
 	if (!bytes_needed)
 		return 0;
 
+check_pinned:
 	trans = btrfs_join_transaction(fs_info->extent_root);
 	if (IS_ERR(trans))
 		return PTR_ERR(trans);
@@ -627,7 +640,8 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
 	 * we have block groups that are going to be freed, allowing us to
 	 * possibly do a chunk allocation the next loop through.
 	 */
-	if (test_bit(BTRFS_TRANS_HAVE_FREE_BGS, &trans->transaction->flags) ||
+	if (do_commit ||
+	    test_bit(BTRFS_TRANS_HAVE_FREE_BGS, &trans->transaction->flags) ||
 	    __percpu_counter_compare(&space_info->total_bytes_pinned,
 				     bytes_needed,
 				     BTRFS_TOTAL_BYTES_PINNED_BATCH) >= 0)
@@ -635,7 +649,9 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
 
 	/*
 	 * See if there is some space in the delayed insertion reservation for
-	 * this reservation.
+	 * this reservation.  If the space_info's don't match (like for DATA or
+	 * SYSTEM) then just enospc, reclaiming this space won't recover any
+	 * space to satisfy those reservations.
 	 */
 	if (space_info != delayed_rsv->space_info)
 		goto enospc;
@@ -743,7 +759,7 @@ static void flush_space(struct btrfs_fs_info *fs_info,
 		btrfs_wait_on_delayed_iputs(fs_info);
 		break;
 	case COMMIT_TRANS:
-		ret = may_commit_transaction(fs_info, space_info);
+		ret = may_commit_transaction(fs_info, space_info, num_bytes);
 		break;
 	default:
 		ret = -ENOSPC;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH 13/23] btrfs: add the data transaction commit logic into may_commit_transaction
  2020-02-04 16:19 ` [PATCH 13/23] btrfs: add the data transaction commit logic into may_commit_transaction Josef Bacik
@ 2020-02-04 16:56   ` Johannes Thumshirn
  0 siblings, 0 replies; 37+ messages in thread
From: Johannes Thumshirn @ 2020-02-04 16:56 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team; +Cc: Nikolay Borisov

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 13/23] btrfs: add the data transaction commit logic into may_commit_transaction
  2020-02-04 16:19 [PATCH 0/23][v4] Convert data reservations to the ticketing infrastructure Josef Bacik
@ 2020-02-04 16:19 ` Josef Bacik
  2020-02-04 16:56   ` Johannes Thumshirn
  0 siblings, 1 reply; 37+ messages in thread
From: Josef Bacik @ 2020-02-04 16:19 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Nikolay Borisov

Data space flushing currently unconditionally commits the transaction
twice in a row, and the last time it checks if there's enough pinned
extents to satisfy it's reservation before deciding to commit the
transaction for the 3rd and final time.

Encode this logic into may_commit_transaction().  In the next patch we
will pass in U64_MAX for bytes_needed the first two times, and the final
time we will pass in the actual bytes we need so the normal logic will
apply.

This patch exists soley to make the logical changes I will make to the
flushing state machine separate to make it easier to bisect any
performance related regressions.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/space-info.c | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index e348468489c7..ad203717269c 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -412,20 +412,32 @@ static void shrink_delalloc(struct btrfs_fs_info *fs_info,
  * will return -ENOSPC.
  */
 static int may_commit_transaction(struct btrfs_fs_info *fs_info,
-				  struct btrfs_space_info *space_info)
+				  struct btrfs_space_info *space_info,
+				  u64 bytes_needed)
 {
 	struct reserve_ticket *ticket = NULL;
 	struct btrfs_block_rsv *delayed_rsv = &fs_info->delayed_block_rsv;
 	struct btrfs_block_rsv *delayed_refs_rsv = &fs_info->delayed_refs_rsv;
 	struct btrfs_trans_handle *trans;
-	u64 bytes_needed;
 	u64 reclaim_bytes = 0;
 	u64 cur_free_bytes = 0;
+	bool do_commit = false;
 
 	trans = (struct btrfs_trans_handle *)current->journal_info;
 	if (trans)
 		return -EAGAIN;
 
+	/*
+	 * If we are data and have passed in U64_MAX we just want to
+	 * unconditionally commit the transaction to match the previous data
+	 * flushing behavior.
+	 */
+	if ((space_info->flags & BTRFS_BLOCK_GROUP_DATA) &&
+	   bytes_needed == U64_MAX) {
+		do_commit = true;
+		goto check_pinned;
+	}
+
 	spin_lock(&space_info->lock);
 	cur_free_bytes = btrfs_space_info_used(space_info, true);
 	if (cur_free_bytes < space_info->total_bytes)
@@ -439,7 +451,7 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
 	else if (!list_empty(&space_info->tickets))
 		ticket = list_first_entry(&space_info->tickets,
 					  struct reserve_ticket, list);
-	bytes_needed = (ticket) ? ticket->bytes : 0;
+	bytes_needed = (ticket) ? ticket->bytes : bytes_needed;
 
 	if (bytes_needed > cur_free_bytes)
 		bytes_needed -= cur_free_bytes;
@@ -450,6 +462,7 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
 	if (!bytes_needed)
 		return 0;
 
+check_pinned:
 	trans = btrfs_join_transaction(fs_info->extent_root);
 	if (IS_ERR(trans))
 		return PTR_ERR(trans);
@@ -459,7 +472,8 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
 	 * we have block groups that are going to be freed, allowing us to
 	 * possibly do a chunk allocation the next loop through.
 	 */
-	if (test_bit(BTRFS_TRANS_HAVE_FREE_BGS, &trans->transaction->flags) ||
+	if (do_commit ||
+	    test_bit(BTRFS_TRANS_HAVE_FREE_BGS, &trans->transaction->flags) ||
 	    __percpu_counter_compare(&space_info->total_bytes_pinned,
 				     bytes_needed,
 				     BTRFS_TOTAL_BYTES_PINNED_BATCH) >= 0)
@@ -570,7 +584,7 @@ static void flush_space(struct btrfs_fs_info *fs_info,
 		btrfs_wait_on_delayed_iputs(fs_info);
 		break;
 	case COMMIT_TRANS:
-		ret = may_commit_transaction(fs_info, space_info);
+		ret = may_commit_transaction(fs_info, space_info, num_bytes);
 		break;
 	default:
 		ret = -ENOSPC;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2020-08-13 16:36 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-30 13:58 [PATCH 00/23] Change data reservations to use the ticketing infra Josef Bacik
2020-06-30 13:58 ` [PATCH 01/23] btrfs: change nr to u64 in btrfs_start_delalloc_roots Josef Bacik
2020-06-30 13:59 ` [PATCH 02/23] btrfs: remove orig from shrink_delalloc Josef Bacik
2020-06-30 13:59 ` [PATCH 03/23] btrfs: handle U64_MAX for shrink_delalloc Josef Bacik
2020-06-30 13:59 ` [PATCH 04/23] btrfs: make shrink_delalloc take space_info as an arg Josef Bacik
2020-06-30 13:59 ` [PATCH 05/23] btrfs: make ALLOC_CHUNK use the space info flags Josef Bacik
2020-06-30 13:59 ` [PATCH 06/23] btrfs: call btrfs_try_granting_tickets when freeing reserved bytes Josef Bacik
2020-06-30 13:59 ` [PATCH 07/23] btrfs: call btrfs_try_granting_tickets when unpinning anything Josef Bacik
2020-06-30 13:59 ` [PATCH 08/23] btrfs: call btrfs_try_granting_tickets when reserving space Josef Bacik
2020-06-30 13:59 ` [PATCH 09/23] btrfs: use the btrfs_space_info_free_bytes_may_use helper for delalloc Josef Bacik
2020-06-30 13:59 ` [PATCH 10/23] btrfs: use btrfs_start_delalloc_roots in shrink_delalloc Josef Bacik
2020-06-30 13:59 ` [PATCH 11/23] btrfs: check tickets after waiting on ordered extents Josef Bacik
2020-06-30 13:59 ` [PATCH 12/23] btrfs: add flushing states for handling data reservations Josef Bacik
2020-06-30 13:59 ` [PATCH 13/23] btrfs: add the data transaction commit logic into may_commit_transaction Josef Bacik
2020-07-07 14:39   ` Nikolay Borisov
2020-07-07 14:54     ` Josef Bacik
2020-07-07 15:01     ` Josef Bacik
2020-06-30 13:59 ` [PATCH 14/23] btrfs: add btrfs_reserve_data_bytes and use it Josef Bacik
2020-06-30 13:59 ` [PATCH 15/23] btrfs: use ticketing for data space reservations Josef Bacik
2020-07-07 14:46   ` Nikolay Borisov
2020-07-07 14:56     ` Josef Bacik
2020-06-30 13:59 ` [PATCH 16/23] btrfs: serialize data reservations if we are flushing Josef Bacik
2020-06-30 13:59 ` [PATCH 17/23] btrfs: use the same helper for data and metadata reservations Josef Bacik
2020-06-30 13:59 ` [PATCH 18/23] btrfs: drop the commit_cycles stuff for data reservations Josef Bacik
2020-06-30 13:59 ` [PATCH 19/23] btrfs: don't force commit if we are data Josef Bacik
2020-06-30 13:59 ` [PATCH 20/23] btrfs: run delayed iputs before committing the transaction for data Josef Bacik
2020-06-30 13:59 ` [PATCH 21/23] btrfs: flush delayed refs when trying to reserve data space Josef Bacik
2020-06-30 13:59 ` [PATCH 22/23] btrfs: do async reclaim for data reservations Josef Bacik
2020-06-30 13:59 ` [PATCH 23/23] btrfs: add a comment explaining the data flush steps Josef Bacik
2020-07-03 16:30 ` [PATCH 00/23] Change data reservations to use the ticketing infra David Sterba
2020-07-07 15:28   ` Josef Bacik
  -- strict thread matches above, loose matches on Subject: below --
2020-07-21 14:22 [PATCH 00/23][v4] " Josef Bacik
2020-07-21 14:22 ` [PATCH 13/23] btrfs: add the data transaction commit logic into may_commit_transaction Josef Bacik
2020-08-13 16:35   ` David Sterba
2020-07-08 13:59 [PATCH 00/23][v3] Change data reservations to use the ticketing infra Josef Bacik
2020-07-08 14:00 ` [PATCH 13/23] btrfs: add the data transaction commit logic into may_commit_transaction Josef Bacik
2020-07-07 15:42 [PATCH 00/23][v2] Change data reservations to use the ticketing infra Josef Bacik
2020-07-07 15:42 ` [PATCH 13/23] btrfs: add the data transaction commit logic into may_commit_transaction Josef Bacik
2020-02-04 16:19 [PATCH 0/23][v4] Convert data reservations to the ticketing infrastructure Josef Bacik
2020-02-04 16:19 ` [PATCH 13/23] btrfs: add the data transaction commit logic into may_commit_transaction Josef Bacik
2020-02-04 16:56   ` Johannes Thumshirn

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.