linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/9][v3] Rework reserve ticket handling
@ 2019-08-22 19:10 Josef Bacik
  2019-08-22 19:10 ` [PATCH 1/9] btrfs: do not allow reservations if we have pending tickets Josef Bacik
                   ` (9 more replies)
  0 siblings, 10 replies; 23+ messages in thread
From: Josef Bacik @ 2019-08-22 19:10 UTC (permalink / raw)
  To: kernel-team, linux-btrfs

This is the next round of my reserve ticket refinements.  Most of the changes
are just fixing issues brought up by review.  The updated diffstat is as follows

 fs/btrfs/block-group.c    |   5 +-
 fs/btrfs/block-rsv.c      |  10 +--
 fs/btrfs/delalloc-space.c |   4 --
 fs/btrfs/delayed-ref.c    |   2 +-
 fs/btrfs/extent-tree.c    |  13 +---
 fs/btrfs/space-info.c     | 171 +++++++++++++++++++---------------------------
 fs/btrfs/space-info.h     |  30 +++++---
 7 files changed, 98 insertions(+), 137 deletions(-)

v2->v3:
- added 9/9 to rename btrfs_space_info_add_old_bytes as per discussions with
  Nikolay.
- added a few comments.
- made the logic clearer in the may_commit_transaction patch.
- a few lockdep_assert_held()'s.
- added the reviewed-by's.

v1->v2:
- added "btrfs: fix may_commit_transaction to deal with no partial filling"
- fixed "btrfs: refactor the ticket wakeup code" to return true if we find a
  smaller ticket than our first ticket in the list.

- Original email -
While cleaning up some things around the global reserve and can_overcommit I
started getting ENOSPC's with plenty of space to make reservations.  The root
cause of the problem has to do with how we satisfy ticket reservations.

Previously we would add any space we were returning to the space info to the
first ticket we found.  The reason we did this was because new reservations just
check the counters to see if they can continue, so we didn't want them to get
reservations when we had waiters already queued up.  So instead of returning the
bytes to the space info, I'd add it to the ticket.  Then if we failed to satisfy
that ticket reservation we'd take any space we found and add it to the next guy
in case it satisfied the next ticket reservation.

This works generally well in practice, but there are several xfstests that run
ENOSPC tests against very small file systems.  These tests uncovered a corner
case when it comes to overcommitting.  If we overcommit the space, and then are
no longer allowed to overcommit, we won't actually give any returned space to
the tickets, because that would be really bad.  Instead we return that space to
the space_info and carry on.

What was biting us in these test cases was the fact that we had very small
metadata area, 8mib, and unlink asks for about 2mib of space.  If we had
overcommitted 8.1mib, we'd give back almost 2mib of space to the space_info,
which could have instead been used for the reservation.  This would result in an
early ENOSPC.

Since we are only doing this partial filling dance to avoid racing with new
reservations we just fix that race by checking if we have pending reservations
on the list, closing that race.  Then we are free to use the normal checks to
see if a ticket can be woken up.  This simplifies the code a bunch, we no longer
have to keep track of how much space the tickets were given and return those
bytes, and I could consolidate the wakeup code into one function instead of two.

The diffstat is as follows, this all passes xfstests, and sets us up nicely for
the upcoming changesets.  Thanks,

Josef


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 1/9] btrfs: do not allow reservations if we have pending tickets
  2019-08-22 19:10 [PATCH 0/9][v3] Rework reserve ticket handling Josef Bacik
@ 2019-08-22 19:10 ` Josef Bacik
  2019-08-23  7:33   ` Nikolay Borisov
  2019-08-22 19:10 ` [PATCH 2/9] btrfs: roll tracepoint into btrfs_space_info_update helper Josef Bacik
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 23+ messages in thread
From: Josef Bacik @ 2019-08-22 19:10 UTC (permalink / raw)
  To: kernel-team, linux-btrfs

If we already have tickets on the list we don't want to steal their
reservations.  This is a preparation patch for upcoming changes,
technically this shouldn't happen today because of the way we add bytes
to tickets before adding them to the space_info in most cases.

This does not change the FIFO nature of reserve tickets, it simply
allows us to enforce it in a different way.  Previously it was enforced
because any new space would be added to the first ticket on the list,
which would result in new reservations getting a reserve ticket.  This
replaces that mechanism by simply checking to see if we have outstanding
reserve tickets and skipping straight to adding a ticket for our
reservation.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/space-info.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 5f8f65599de1..33fa0ba49759 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -993,6 +993,7 @@ static int __reserve_metadata_bytes(struct btrfs_fs_info *fs_info,
 	struct reserve_ticket ticket;
 	u64 used;
 	int ret = 0;
+	bool pending_tickets;
 
 	ASSERT(orig_bytes);
 	ASSERT(!current->journal_info || flush != BTRFS_RESERVE_FLUSH_ALL);
@@ -1000,14 +1001,17 @@ static int __reserve_metadata_bytes(struct btrfs_fs_info *fs_info,
 	spin_lock(&space_info->lock);
 	ret = -ENOSPC;
 	used = btrfs_space_info_used(space_info, true);
+	pending_tickets = !list_empty(&space_info->tickets) ||
+		!list_empty(&space_info->priority_tickets);
 
 	/*
 	 * Carry on if we have enough space (short-circuit) OR call
 	 * can_overcommit() to ensure we can overcommit to continue.
 	 */
-	if ((used + orig_bytes <= space_info->total_bytes) ||
-	    can_overcommit(fs_info, space_info, orig_bytes, flush,
-			   system_chunk)) {
+	if (!pending_tickets &&
+	    ((used + orig_bytes <= space_info->total_bytes) ||
+	     can_overcommit(fs_info, space_info, orig_bytes, flush,
+			   system_chunk))) {
 		btrfs_space_info_update_bytes_may_use(fs_info, space_info,
 						      orig_bytes);
 		trace_btrfs_space_reservation(fs_info, "space_info",
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 2/9] btrfs: roll tracepoint into btrfs_space_info_update helper
  2019-08-22 19:10 [PATCH 0/9][v3] Rework reserve ticket handling Josef Bacik
  2019-08-22 19:10 ` [PATCH 1/9] btrfs: do not allow reservations if we have pending tickets Josef Bacik
@ 2019-08-22 19:10 ` Josef Bacik
  2019-08-23 12:12   ` David Sterba
  2019-08-22 19:10 ` [PATCH 3/9] btrfs: add space reservation tracepoint for reserved bytes Josef Bacik
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 23+ messages in thread
From: Josef Bacik @ 2019-08-22 19:10 UTC (permalink / raw)
  To: kernel-team, linux-btrfs

We duplicate this tracepoint everywhere we call these helpers, so update
the helper to have the tracepoint as well.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/block-group.c    |  3 ---
 fs/btrfs/block-rsv.c      |  5 -----
 fs/btrfs/delalloc-space.c |  4 ----
 fs/btrfs/extent-tree.c    |  9 ---------
 fs/btrfs/space-info.c     | 10 ----------
 fs/btrfs/space-info.h     | 10 +++++++---
 6 files changed, 7 insertions(+), 34 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 262e62ef52a5..9867c5d98650 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -2696,9 +2696,6 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
 			spin_unlock(&cache->lock);
 			spin_unlock(&cache->space_info->lock);
 
-			trace_btrfs_space_reservation(info, "pinned",
-						      cache->space_info->flags,
-						      num_bytes, 1);
 			percpu_counter_add_batch(
 					&cache->space_info->total_bytes_pinned,
 					num_bytes,
diff --git a/fs/btrfs/block-rsv.c b/fs/btrfs/block-rsv.c
index 698470b9f32d..c64b460a4301 100644
--- a/fs/btrfs/block-rsv.c
+++ b/fs/btrfs/block-rsv.c
@@ -283,16 +283,11 @@ void btrfs_update_global_block_rsv(struct btrfs_fs_info *fs_info)
 			block_rsv->reserved += num_bytes;
 			btrfs_space_info_update_bytes_may_use(fs_info, sinfo,
 							      num_bytes);
-			trace_btrfs_space_reservation(fs_info, "space_info",
-						      sinfo->flags, num_bytes,
-						      1);
 		}
 	} else if (block_rsv->reserved > block_rsv->size) {
 		num_bytes = block_rsv->reserved - block_rsv->size;
 		btrfs_space_info_update_bytes_may_use(fs_info, sinfo,
 						      -num_bytes);
-		trace_btrfs_space_reservation(fs_info, "space_info",
-				      sinfo->flags, num_bytes, 0);
 		block_rsv->reserved = block_rsv->size;
 	}
 
diff --git a/fs/btrfs/delalloc-space.c b/fs/btrfs/delalloc-space.c
index d2dfc201b2e1..1fc6bef3ccdf 100644
--- a/fs/btrfs/delalloc-space.c
+++ b/fs/btrfs/delalloc-space.c
@@ -130,8 +130,6 @@ int btrfs_alloc_data_chunk_ondemand(struct btrfs_inode *inode, u64 bytes)
 		return -ENOSPC;
 	}
 	btrfs_space_info_update_bytes_may_use(fs_info, data_sinfo, bytes);
-	trace_btrfs_space_reservation(fs_info, "space_info",
-				      data_sinfo->flags, bytes, 1);
 	spin_unlock(&data_sinfo->lock);
 
 	return 0;
@@ -183,8 +181,6 @@ void btrfs_free_reserved_data_space_noquota(struct inode *inode, u64 start,
 	data_sinfo = fs_info->data_sinfo;
 	spin_lock(&data_sinfo->lock);
 	btrfs_space_info_update_bytes_may_use(fs_info, data_sinfo, -len);
-	trace_btrfs_space_reservation(fs_info, "space_info",
-				      data_sinfo->flags, len, 0);
 	spin_unlock(&data_sinfo->lock);
 }
 
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 444844b431f7..8c3e8fdbf2c1 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2583,8 +2583,6 @@ static int pin_down_extent(struct btrfs_block_group_cache *cache,
 	spin_unlock(&cache->lock);
 	spin_unlock(&cache->space_info->lock);
 
-	trace_btrfs_space_reservation(fs_info, "pinned",
-				      cache->space_info->flags, num_bytes, 1);
 	percpu_counter_add_batch(&cache->space_info->total_bytes_pinned,
 		    num_bytes, BTRFS_TOTAL_BYTES_PINNED_BATCH);
 	set_extent_dirty(fs_info->pinned_extents, bytenr,
@@ -2842,9 +2840,6 @@ static int unpin_extent_range(struct btrfs_fs_info *fs_info,
 		spin_lock(&cache->lock);
 		cache->pinned -= len;
 		btrfs_space_info_update_bytes_pinned(fs_info, space_info, -len);
-
-		trace_btrfs_space_reservation(fs_info, "pinned",
-					      space_info->flags, len, 0);
 		space_info->max_extent_size = 0;
 		percpu_counter_add_batch(&space_info->total_bytes_pinned,
 			    -len, BTRFS_TOTAL_BYTES_PINNED_BATCH);
@@ -2866,10 +2861,6 @@ static int unpin_extent_range(struct btrfs_fs_info *fs_info,
 						space_info, to_add);
 				if (global_rsv->reserved >= global_rsv->size)
 					global_rsv->full = 1;
-				trace_btrfs_space_reservation(fs_info,
-							      "space_info",
-							      space_info->flags,
-							      to_add, 1);
 				len -= to_add;
 			}
 			spin_unlock(&global_rsv->lock);
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 33fa0ba49759..a0a36d5768e1 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -279,8 +279,6 @@ void btrfs_space_info_add_old_bytes(struct btrfs_fs_info *fs_info,
 		goto again;
 	}
 	btrfs_space_info_update_bytes_may_use(fs_info, space_info, -num_bytes);
-	trace_btrfs_space_reservation(fs_info, "space_info",
-				      space_info->flags, num_bytes, 0);
 	spin_unlock(&space_info->lock);
 }
 
@@ -301,9 +299,6 @@ void btrfs_space_info_add_new_bytes(struct btrfs_fs_info *fs_info,
 		ticket = list_first_entry(head, struct reserve_ticket,
 					  list);
 		if (num_bytes >= ticket->bytes) {
-			trace_btrfs_space_reservation(fs_info, "space_info",
-						      space_info->flags,
-						      ticket->bytes, 1);
 			list_del_init(&ticket->list);
 			num_bytes -= ticket->bytes;
 			btrfs_space_info_update_bytes_may_use(fs_info,
@@ -313,9 +308,6 @@ void btrfs_space_info_add_new_bytes(struct btrfs_fs_info *fs_info,
 			space_info->tickets_id++;
 			wake_up(&ticket->wait);
 		} else {
-			trace_btrfs_space_reservation(fs_info, "space_info",
-						      space_info->flags,
-						      num_bytes, 1);
 			btrfs_space_info_update_bytes_may_use(fs_info,
 							      space_info,
 							      num_bytes);
@@ -1014,8 +1006,6 @@ static int __reserve_metadata_bytes(struct btrfs_fs_info *fs_info,
 			   system_chunk))) {
 		btrfs_space_info_update_bytes_may_use(fs_info, space_info,
 						      orig_bytes);
-		trace_btrfs_space_reservation(fs_info, "space_info",
-					      space_info->flags, orig_bytes, 1);
 		ret = 0;
 	}
 
diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h
index c2b54b8e1a14..025f7ce2c9b1 100644
--- a/fs/btrfs/space-info.h
+++ b/fs/btrfs/space-info.h
@@ -87,14 +87,18 @@ static inline bool btrfs_mixed_space_info(struct btrfs_space_info *space_info)
  *
  * Declare a helper function to detect underflow of various space info members
  */
-#define DECLARE_SPACE_INFO_UPDATE(name)					\
+#define DECLARE_SPACE_INFO_UPDATE(name, trace_name)			\
 static inline void							\
 btrfs_space_info_update_##name(struct btrfs_fs_info *fs_info,		\
 			       struct btrfs_space_info *sinfo,		\
 			       s64 bytes)				\
 {									\
+	u64 abs_bytes = (bytes < 0) ? -bytes : bytes;			\
 	lockdep_assert_held(&sinfo->lock);				\
 	trace_update_##name(fs_info, sinfo, sinfo->name, bytes);	\
+	trace_btrfs_space_reservation(fs_info, trace_name,		\
+				      sinfo->flags, abs_bytes,		\
+				      bytes > 0);			\
 	if (bytes < 0 && sinfo->name < -bytes) {			\
 		WARN_ON(1);						\
 		sinfo->name = 0;					\
@@ -103,8 +107,8 @@ btrfs_space_info_update_##name(struct btrfs_fs_info *fs_info,		\
 	sinfo->name += bytes;						\
 }
 
-DECLARE_SPACE_INFO_UPDATE(bytes_may_use);
-DECLARE_SPACE_INFO_UPDATE(bytes_pinned);
+DECLARE_SPACE_INFO_UPDATE(bytes_may_use, "space_info");
+DECLARE_SPACE_INFO_UPDATE(bytes_pinned, "pinned");
 
 void btrfs_space_info_add_new_bytes(struct btrfs_fs_info *fs_info,
 				    struct btrfs_space_info *space_info,
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 3/9] btrfs: add space reservation tracepoint for reserved bytes
  2019-08-22 19:10 [PATCH 0/9][v3] Rework reserve ticket handling Josef Bacik
  2019-08-22 19:10 ` [PATCH 1/9] btrfs: do not allow reservations if we have pending tickets Josef Bacik
  2019-08-22 19:10 ` [PATCH 2/9] btrfs: roll tracepoint into btrfs_space_info_update helper Josef Bacik
@ 2019-08-22 19:10 ` Josef Bacik
  2019-08-23 12:17   ` David Sterba
  2019-08-22 19:10 ` [PATCH 4/9] btrfs: rework btrfs_space_info_add_old_bytes Josef Bacik
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 23+ messages in thread
From: Josef Bacik @ 2019-08-22 19:10 UTC (permalink / raw)
  To: kernel-team, linux-btrfs

I noticed when folding the trace_btrfs_space_reservation() tracepoint
into the btrfs_space_info_update_* helpers that we didn't emit a
tracepoint when doing btrfs_add_reserved_bytes().  I know this is
because we were swapping bytes_may_use for bytes_reserved, so in my mind
there was no reason to have the tracepoint there.  But now there is
because we always emit the unreserve for the bytes_may_use side, and
this would have broken if compression was on anyway.  Add a tracepoint
to cover the bytes_reserved counter so the math still comes out right.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/block-group.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 9867c5d98650..afae5c731904 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -2758,6 +2758,8 @@ int btrfs_add_reserved_bytes(struct btrfs_block_group_cache *cache,
 	} else {
 		cache->reserved += num_bytes;
 		space_info->bytes_reserved += num_bytes;
+		trace_btrfs_space_reservation(cache->fs_info, "space_info",
+					      space_info->flags, num_bytes, 1);
 		btrfs_space_info_update_bytes_may_use(cache->fs_info,
 						      space_info, -ram_bytes);
 		if (delalloc)
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 4/9] btrfs: rework btrfs_space_info_add_old_bytes
  2019-08-22 19:10 [PATCH 0/9][v3] Rework reserve ticket handling Josef Bacik
                   ` (2 preceding siblings ...)
  2019-08-22 19:10 ` [PATCH 3/9] btrfs: add space reservation tracepoint for reserved bytes Josef Bacik
@ 2019-08-22 19:10 ` Josef Bacik
  2019-08-23  7:48   ` Nikolay Borisov
  2019-08-28 15:15   ` [PATCH][v2] btrfs: stop partially refilling tickets when releasing space Josef Bacik
  2019-08-22 19:10 ` [PATCH 5/9] btrfs: refactor the ticket wakeup code Josef Bacik
                   ` (5 subsequent siblings)
  9 siblings, 2 replies; 23+ messages in thread
From: Josef Bacik @ 2019-08-22 19:10 UTC (permalink / raw)
  To: kernel-team, linux-btrfs

If there are pending tickets and we are overcommitted we will simply
return free'd reservations to space_info->bytes_may_use if we cannot
overcommit any more.  This is problematic because we assume any free
space would have been added to the tickets, and so if we go from an
overcommitted state to not overcommitted we could have plenty of space
in our space_info but be unable to make progress on our tickets because
we only refill tickets from previous reservations.

Consider the following example.  We were allowed to overcommit to
space_info->total_bytes + 2mib.  Now we've allocated all of our chunks
and are no longer allowed to overcommit those extra 2mib.  Assume there
is a 3mib reservation we are now freeing.  Because we can no longer
overcommit we do not partially refill the ticket with the 3mib, instead
we subtract it from space_info->bytes_may_use.  Now the total reserved
space is 1mib less than total_bytes, meaning we have 1mib of space we
could reserve.  Now assume that our ticket is 2mib, and we only have
1mib of space to reclaim, so we have a partial refilling to 1mib.  We
keep trying to flush and eventually give up and ENOSPC the ticket, when
there was the remaining 1mib left in the space_info for usage.

Instead of doing this partial filling of tickets dance we need to simply
add our space to the space_info, and then do the normal check to see if
we can satisfy the whole reservation.  If we can then we wake up the
ticket and carry on.  This solves the above problem and makes it much
more straightforward to understand how the tickets are satisfied.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/space-info.c | 43 ++++++++++++++++---------------------------
 1 file changed, 16 insertions(+), 27 deletions(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index a0a36d5768e1..357fe7548e07 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -233,52 +233,41 @@ void btrfs_space_info_add_old_bytes(struct btrfs_fs_info *fs_info,
 				    struct btrfs_space_info *space_info,
 				    u64 num_bytes)
 {
-	struct reserve_ticket *ticket;
 	struct list_head *head;
-	u64 used;
 	enum btrfs_reserve_flush_enum flush = BTRFS_RESERVE_NO_FLUSH;
-	bool check_overcommit = false;
 
 	spin_lock(&space_info->lock);
 	head = &space_info->priority_tickets;
+	btrfs_space_info_update_bytes_may_use(fs_info, space_info, -num_bytes);
 
-	/*
-	 * If we are over our limit then we need to check and see if we can
-	 * overcommit, and if we can't then we just need to free up our space
-	 * and not satisfy any requests.
-	 */
-	used = btrfs_space_info_used(space_info, true);
-	if (used - num_bytes >= space_info->total_bytes)
-		check_overcommit = true;
 again:
-	while (!list_empty(head) && num_bytes) {
-		ticket = list_first_entry(head, struct reserve_ticket,
-					  list);
-		/*
-		 * We use 0 bytes because this space is already reserved, so
-		 * adding the ticket space would be a double count.
-		 */
-		if (check_overcommit &&
-		    !can_overcommit(fs_info, space_info, 0, flush, false))
-			break;
-		if (num_bytes >= ticket->bytes) {
+	while (!list_empty(head)) {
+		struct reserve_ticket *ticket;
+		u64 used = btrfs_space_info_used(space_info, true);
+
+		ticket = list_first_entry(head, struct reserve_ticket, list);
+
+		/* Check and see if our ticket can be satisified now. */
+		if ((used + ticket->bytes <= space_info->total_bytes) ||
+		    can_overcommit(fs_info, space_info, ticket->bytes, flush,
+				   false)) {
+			btrfs_space_info_update_bytes_may_use(fs_info,
+							      space_info,
+							      ticket->bytes);
 			list_del_init(&ticket->list);
-			num_bytes -= ticket->bytes;
 			ticket->bytes = 0;
 			space_info->tickets_id++;
 			wake_up(&ticket->wait);
 		} else {
-			ticket->bytes -= num_bytes;
-			num_bytes = 0;
+			break;
 		}
 	}
 
-	if (num_bytes && head == &space_info->priority_tickets) {
+	if (head == &space_info->priority_tickets) {
 		head = &space_info->tickets;
 		flush = BTRFS_RESERVE_FLUSH_ALL;
 		goto again;
 	}
-	btrfs_space_info_update_bytes_may_use(fs_info, space_info, -num_bytes);
 	spin_unlock(&space_info->lock);
 }
 
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 5/9] btrfs: refactor the ticket wakeup code
  2019-08-22 19:10 [PATCH 0/9][v3] Rework reserve ticket handling Josef Bacik
                   ` (3 preceding siblings ...)
  2019-08-22 19:10 ` [PATCH 4/9] btrfs: rework btrfs_space_info_add_old_bytes Josef Bacik
@ 2019-08-22 19:10 ` Josef Bacik
  2019-08-22 19:10 ` [PATCH 6/9] btrfs: rework wake_all_tickets Josef Bacik
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Josef Bacik @ 2019-08-22 19:10 UTC (permalink / raw)
  To: kernel-team, linux-btrfs; +Cc: Nikolay Borisov

Now that btrfs_space_info_add_old_bytes simply checks if we can make the
reservation and updates bytes_may_use, there's no reason to have both
helpers in place.  Factor out the ticket wakeup logic into it's own
helper, make btrfs_space_info_add_old_bytes() update bytes_may_use and
then call the wakeup helper, and replace all calls to
btrfs_space_info_add_new_bytes() with the wakeup helper.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/extent-tree.c |  4 +--
 fs/btrfs/space-info.c  | 55 ++++--------------------------------------
 fs/btrfs/space-info.h  | 19 ++++++++++-----
 3 files changed, 20 insertions(+), 58 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 8c3e8fdbf2c1..2a56232309a3 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2866,8 +2866,8 @@ static int unpin_extent_range(struct btrfs_fs_info *fs_info,
 			spin_unlock(&global_rsv->lock);
 			/* Add to any tickets we may have */
 			if (len)
-				btrfs_space_info_add_new_bytes(fs_info,
-						space_info, len);
+				btrfs_try_granting_tickets(fs_info,
+							   space_info);
 		}
 		spin_unlock(&space_info->lock);
 	}
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 357fe7548e07..c2143ddb7f4a 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -131,9 +131,7 @@ void btrfs_update_space_info(struct btrfs_fs_info *info, u64 flags,
 	found->bytes_readonly += bytes_readonly;
 	if (total_bytes > 0)
 		found->full = 0;
-	btrfs_space_info_add_new_bytes(info, found,
-				       total_bytes - bytes_used -
-				       bytes_readonly);
+	btrfs_try_granting_tickets(info, found);
 	spin_unlock(&found->lock);
 	*space_info = found;
 }
@@ -229,17 +227,15 @@ static int can_overcommit(struct btrfs_fs_info *fs_info,
  * This is for space we already have accounted in space_info->bytes_may_use, so
  * basically when we're returning space from block_rsv's.
  */
-void btrfs_space_info_add_old_bytes(struct btrfs_fs_info *fs_info,
-				    struct btrfs_space_info *space_info,
-				    u64 num_bytes)
+void btrfs_try_granting_tickets(struct btrfs_fs_info *fs_info,
+				struct btrfs_space_info *space_info)
 {
 	struct list_head *head;
 	enum btrfs_reserve_flush_enum flush = BTRFS_RESERVE_NO_FLUSH;
 
-	spin_lock(&space_info->lock);
-	head = &space_info->priority_tickets;
-	btrfs_space_info_update_bytes_may_use(fs_info, space_info, -num_bytes);
+	lockdep_assert_held(&space_info->lock);
 
+	head = &space_info->priority_tickets;
 again:
 	while (!list_empty(head)) {
 		struct reserve_ticket *ticket;
@@ -268,47 +264,6 @@ void btrfs_space_info_add_old_bytes(struct btrfs_fs_info *fs_info,
 		flush = BTRFS_RESERVE_FLUSH_ALL;
 		goto again;
 	}
-	spin_unlock(&space_info->lock);
-}
-
-/*
- * This is for newly allocated space that isn't accounted in
- * space_info->bytes_may_use yet.  So if we allocate a chunk or unpin an extent
- * we use this helper.
- */
-void btrfs_space_info_add_new_bytes(struct btrfs_fs_info *fs_info,
-				    struct btrfs_space_info *space_info,
-				    u64 num_bytes)
-{
-	struct reserve_ticket *ticket;
-	struct list_head *head = &space_info->priority_tickets;
-
-again:
-	while (!list_empty(head) && num_bytes) {
-		ticket = list_first_entry(head, struct reserve_ticket,
-					  list);
-		if (num_bytes >= ticket->bytes) {
-			list_del_init(&ticket->list);
-			num_bytes -= ticket->bytes;
-			btrfs_space_info_update_bytes_may_use(fs_info,
-							      space_info,
-							      ticket->bytes);
-			ticket->bytes = 0;
-			space_info->tickets_id++;
-			wake_up(&ticket->wait);
-		} else {
-			btrfs_space_info_update_bytes_may_use(fs_info,
-							      space_info,
-							      num_bytes);
-			ticket->bytes -= num_bytes;
-			num_bytes = 0;
-		}
-	}
-
-	if (num_bytes && head == &space_info->priority_tickets) {
-		head = &space_info->tickets;
-		goto again;
-	}
 }
 
 #define DUMP_BLOCK_RSV(fs_info, rsv_name)				\
diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h
index 025f7ce2c9b1..0e805b5b1fca 100644
--- a/fs/btrfs/space-info.h
+++ b/fs/btrfs/space-info.h
@@ -110,12 +110,6 @@ btrfs_space_info_update_##name(struct btrfs_fs_info *fs_info,		\
 DECLARE_SPACE_INFO_UPDATE(bytes_may_use, "space_info");
 DECLARE_SPACE_INFO_UPDATE(bytes_pinned, "pinned");
 
-void btrfs_space_info_add_new_bytes(struct btrfs_fs_info *fs_info,
-				    struct btrfs_space_info *space_info,
-				    u64 num_bytes);
-void btrfs_space_info_add_old_bytes(struct btrfs_fs_info *fs_info,
-				    struct btrfs_space_info *space_info,
-				    u64 num_bytes);
 int btrfs_init_space_info(struct btrfs_fs_info *fs_info);
 void btrfs_update_space_info(struct btrfs_fs_info *info, u64 flags,
 			     u64 total_bytes, u64 bytes_used,
@@ -133,5 +127,18 @@ int btrfs_reserve_metadata_bytes(struct btrfs_root *root,
 				 struct btrfs_block_rsv *block_rsv,
 				 u64 orig_bytes,
 				 enum btrfs_reserve_flush_enum flush);
+void btrfs_try_granting_tickets(struct btrfs_fs_info *fs_info,
+				struct btrfs_space_info *space_info);
+
+static inline void
+btrfs_space_info_add_old_bytes(struct btrfs_fs_info *fs_info,
+			       struct btrfs_space_info *space_info,
+			       u64 num_bytes)
+{
+	spin_lock(&space_info->lock);
+	btrfs_space_info_update_bytes_may_use(fs_info, space_info, -num_bytes);
+	btrfs_try_granting_tickets(fs_info, space_info);
+	spin_unlock(&space_info->lock);
+}
 
 #endif /* BTRFS_SPACE_INFO_H */
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 6/9] btrfs: rework wake_all_tickets
  2019-08-22 19:10 [PATCH 0/9][v3] Rework reserve ticket handling Josef Bacik
                   ` (4 preceding siblings ...)
  2019-08-22 19:10 ` [PATCH 5/9] btrfs: refactor the ticket wakeup code Josef Bacik
@ 2019-08-22 19:10 ` Josef Bacik
  2019-08-23  8:17   ` Nikolay Borisov
  2019-08-28 15:12   ` [PATCH][v2] " Josef Bacik
  2019-08-22 19:11 ` [PATCH 7/9] btrfs: fix may_commit_transaction to deal with no partial filling Josef Bacik
                   ` (3 subsequent siblings)
  9 siblings, 2 replies; 23+ messages in thread
From: Josef Bacik @ 2019-08-22 19:10 UTC (permalink / raw)
  To: kernel-team, linux-btrfs

Now that we no longer partially fill tickets we need to rework
wake_all_tickets to call btrfs_try_to_wakeup_tickets() in order to see
if any subsequent tickets are able to be satisfied.  If our tickets_id
changes we know something happened and we can keep flushing.

Also if we find a ticket that is smaller than the first ticket in our
queue then we want to retry the flushing loop again in case
may_commit_transaction() decides we could satisfy the ticket by
committing the transaction.

Rename this to maybe_fail_all_tickets() while we're at it, to better
reflect what the function is actually doing.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/space-info.c | 41 ++++++++++++++++++++++++++++++++++-------
 1 file changed, 34 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index c2143ddb7f4a..dd4adfa75a71 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -679,19 +679,46 @@ static inline int need_do_async_reclaim(struct btrfs_fs_info *fs_info,
 		!test_bit(BTRFS_FS_STATE_REMOUNTING, &fs_info->fs_state));
 }
 
-static bool wake_all_tickets(struct list_head *head)
+static bool maybe_fail_all_tickets(struct btrfs_fs_info *fs_info,
+				   struct btrfs_space_info *space_info)
 {
 	struct reserve_ticket *ticket;
+	u64 tickets_id = space_info->tickets_id;
+	u64 first_ticket_bytes = 0;
+
+	while (!list_empty(&space_info->tickets) &&
+	       tickets_id == space_info->tickets_id) {
+		ticket = list_first_entry(&space_info->tickets,
+					  struct reserve_ticket, list);
+
+		/*
+		 * may_commit_transaction will avoid committing the transaction
+		 * if it doesn't feel like the space reclaimed by the commit
+		 * would result in the ticket succeeding.  However if we have a
+		 * smaller ticket in the queue it may be small enough to be
+		 * satisified by committing the transaction, so if any
+		 * subsequent ticket is smaller than the first ticket go ahead
+		 * and send us back for another loop through the enospc flushing
+		 * code.
+		 */
+		if (first_ticket_bytes == 0)
+			first_ticket_bytes = ticket->bytes;
+		else if (first_ticket_bytes > ticket->bytes)
+			return true;
 
-	while (!list_empty(head)) {
-		ticket = list_first_entry(head, struct reserve_ticket, list);
 		list_del_init(&ticket->list);
 		ticket->error = -ENOSPC;
 		wake_up(&ticket->wait);
-		if (ticket->bytes != ticket->orig_bytes)
-			return true;
+
+		/*
+		 * We're just throwing tickets away, so more flushing may not
+		 * trip over btrfs_try_granting_tickets, so we need to call it
+		 * here to see if we can make progress with the next ticket in
+		 * the list.
+		 */
+		btrfs_try_granting_tickets(fs_info, space_info);
 	}
-	return false;
+	return (tickets_id != space_info->tickets_id);
 }
 
 /*
@@ -759,7 +786,7 @@ static void btrfs_async_reclaim_metadata_space(struct work_struct *work)
 		if (flush_state > COMMIT_TRANS) {
 			commit_cycles++;
 			if (commit_cycles > 2) {
-				if (wake_all_tickets(&space_info->tickets)) {
+				if (maybe_fail_all_tickets(fs_info, space_info)) {
 					flush_state = FLUSH_DELAYED_ITEMS_NR;
 					commit_cycles--;
 				} else {
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 7/9] btrfs: fix may_commit_transaction to deal with no partial filling
  2019-08-22 19:10 [PATCH 0/9][v3] Rework reserve ticket handling Josef Bacik
                   ` (5 preceding siblings ...)
  2019-08-22 19:10 ` [PATCH 6/9] btrfs: rework wake_all_tickets Josef Bacik
@ 2019-08-22 19:11 ` Josef Bacik
  2019-08-23  8:18   ` Nikolay Borisov
  2019-08-22 19:11 ` [PATCH 8/9] btrfs: remove orig_bytes from reserve_ticket Josef Bacik
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 23+ messages in thread
From: Josef Bacik @ 2019-08-22 19:11 UTC (permalink / raw)
  To: kernel-team, linux-btrfs

Now that we aren't partially filling tickets we may have some slack
space left in the space_info.  We need to account for this in
may_commit_transaction, otherwise we may choose to not commit the
transaction despite it actually having enough space to satisfy our
ticket.

Calculate the free space we have in the space_info, if any, and subtract
this from the ticket we have and use that amount to determine if we will
need to commit to reclaim enough space.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/space-info.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index dd4adfa75a71..cec6db0c41cc 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -473,12 +473,19 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
 	struct btrfs_trans_handle *trans;
 	u64 bytes_needed;
 	u64 reclaim_bytes = 0;
+	u64 cur_free_bytes = 0;
 
 	trans = (struct btrfs_trans_handle *)current->journal_info;
 	if (trans)
 		return -EAGAIN;
 
 	spin_lock(&space_info->lock);
+	cur_free_bytes = btrfs_space_info_used(space_info, true);
+	if (cur_free_bytes < space_info->total_bytes)
+		cur_free_bytes = space_info->total_bytes - cur_free_bytes;
+	else
+		cur_free_bytes = 0;
+
 	if (!list_empty(&space_info->priority_tickets))
 		ticket = list_first_entry(&space_info->priority_tickets,
 					  struct reserve_ticket, list);
@@ -486,6 +493,11 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
 		ticket = list_first_entry(&space_info->tickets,
 					  struct reserve_ticket, list);
 	bytes_needed = (ticket) ? ticket->bytes : 0;
+
+	if (bytes_needed > cur_free_bytes)
+		bytes_needed -= cur_free_bytes;
+	else
+		bytes_needed = 0;
 	spin_unlock(&space_info->lock);
 
 	if (!bytes_needed)
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 8/9] btrfs: remove orig_bytes from reserve_ticket
  2019-08-22 19:10 [PATCH 0/9][v3] Rework reserve ticket handling Josef Bacik
                   ` (6 preceding siblings ...)
  2019-08-22 19:11 ` [PATCH 7/9] btrfs: fix may_commit_transaction to deal with no partial filling Josef Bacik
@ 2019-08-22 19:11 ` Josef Bacik
  2019-08-22 19:11 ` [PATCH 9/9] btrfs: rename btrfs_space_info_add_old_bytes Josef Bacik
  2019-08-23 12:55 ` [PATCH 0/9][v3] Rework reserve ticket handling David Sterba
  9 siblings, 0 replies; 23+ messages in thread
From: Josef Bacik @ 2019-08-22 19:11 UTC (permalink / raw)
  To: kernel-team, linux-btrfs; +Cc: Nikolay Borisov

Now that we do not do partial filling of tickets simply remove
orig_bytes, it is no longer needed.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/space-info.c | 8 --------
 fs/btrfs/space-info.h | 1 -
 2 files changed, 9 deletions(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index cec6db0c41cc..c5939c24c963 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -905,7 +905,6 @@ static int handle_reserve_ticket(struct btrfs_fs_info *fs_info,
 				 struct reserve_ticket *ticket,
 				 enum btrfs_reserve_flush_enum flush)
 {
-	u64 reclaim_bytes = 0;
 	int ret;
 
 	switch (flush) {
@@ -930,17 +929,11 @@ static int handle_reserve_ticket(struct btrfs_fs_info *fs_info,
 	spin_lock(&space_info->lock);
 	ret = ticket->error;
 	if (ticket->bytes || ticket->error) {
-		if (ticket->bytes < ticket->orig_bytes)
-			reclaim_bytes = ticket->orig_bytes - ticket->bytes;
 		list_del_init(&ticket->list);
 		if (!ret)
 			ret = -ENOSPC;
 	}
 	spin_unlock(&space_info->lock);
-
-	if (reclaim_bytes)
-		btrfs_space_info_add_old_bytes(fs_info, space_info,
-					       reclaim_bytes);
 	ASSERT(list_empty(&ticket->list));
 	return ret;
 }
@@ -1000,7 +993,6 @@ static int __reserve_metadata_bytes(struct btrfs_fs_info *fs_info,
 	 * the list and we will do our own flushing further down.
 	 */
 	if (ret && flush != BTRFS_RESERVE_NO_FLUSH) {
-		ticket.orig_bytes = orig_bytes;
 		ticket.bytes = orig_bytes;
 		ticket.error = 0;
 		init_waitqueue_head(&ticket.wait);
diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h
index 0e805b5b1fca..d61550f06c94 100644
--- a/fs/btrfs/space-info.h
+++ b/fs/btrfs/space-info.h
@@ -70,7 +70,6 @@ struct btrfs_space_info {
 };
 
 struct reserve_ticket {
-	u64 orig_bytes;
 	u64 bytes;
 	int error;
 	struct list_head list;
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 9/9] btrfs: rename btrfs_space_info_add_old_bytes
  2019-08-22 19:10 [PATCH 0/9][v3] Rework reserve ticket handling Josef Bacik
                   ` (7 preceding siblings ...)
  2019-08-22 19:11 ` [PATCH 8/9] btrfs: remove orig_bytes from reserve_ticket Josef Bacik
@ 2019-08-22 19:11 ` Josef Bacik
  2019-08-23  8:18   ` Nikolay Borisov
  2019-08-23 12:55 ` [PATCH 0/9][v3] Rework reserve ticket handling David Sterba
  9 siblings, 1 reply; 23+ messages in thread
From: Josef Bacik @ 2019-08-22 19:11 UTC (permalink / raw)
  To: kernel-team, linux-btrfs

This name doesn't really fit with how the space reservation stuff works
now, rename it to btrfs_space_info_free_bytes_may_use so it's clear what
the function is doing.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/block-rsv.c   | 5 +++--
 fs/btrfs/delayed-ref.c | 2 +-
 fs/btrfs/space-info.h  | 6 +++---
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/block-rsv.c b/fs/btrfs/block-rsv.c
index c64b460a4301..60f313888a7d 100644
--- a/fs/btrfs/block-rsv.c
+++ b/fs/btrfs/block-rsv.c
@@ -54,8 +54,9 @@ static u64 block_rsv_release_bytes(struct btrfs_fs_info *fs_info,
 			spin_unlock(&dest->lock);
 		}
 		if (num_bytes)
-			btrfs_space_info_add_old_bytes(fs_info, space_info,
-						       num_bytes);
+			btrfs_space_info_free_bytes_may_use(fs_info,
+							    space_info,
+							    num_bytes);
 	}
 	if (qgroup_to_release_ret)
 		*qgroup_to_release_ret = qgroup_to_release;
diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 9a91d1eb0af4..3822edbf54a7 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -158,7 +158,7 @@ void btrfs_migrate_to_delayed_refs_rsv(struct btrfs_fs_info *fs_info,
 		trace_btrfs_space_reservation(fs_info, "delayed_refs_rsv",
 					      0, num_bytes, 1);
 	if (to_free)
-		btrfs_space_info_add_old_bytes(fs_info,
+		btrfs_space_info_free_bytes_may_use(fs_info,
 				delayed_refs_rsv->space_info, to_free);
 }
 
diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h
index d61550f06c94..c93fe9808dc0 100644
--- a/fs/btrfs/space-info.h
+++ b/fs/btrfs/space-info.h
@@ -130,9 +130,9 @@ void btrfs_try_granting_tickets(struct btrfs_fs_info *fs_info,
 				struct btrfs_space_info *space_info);
 
 static inline void
-btrfs_space_info_add_old_bytes(struct btrfs_fs_info *fs_info,
-			       struct btrfs_space_info *space_info,
-			       u64 num_bytes)
+btrfs_space_info_free_bytes_may_use(struct btrfs_fs_info *fs_info,
+				    struct btrfs_space_info *space_info,
+				    u64 num_bytes)
 {
 	spin_lock(&space_info->lock);
 	btrfs_space_info_update_bytes_may_use(fs_info, space_info, -num_bytes);
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/9] btrfs: do not allow reservations if we have pending tickets
  2019-08-22 19:10 ` [PATCH 1/9] btrfs: do not allow reservations if we have pending tickets Josef Bacik
@ 2019-08-23  7:33   ` Nikolay Borisov
  0 siblings, 0 replies; 23+ messages in thread
From: Nikolay Borisov @ 2019-08-23  7:33 UTC (permalink / raw)
  To: Josef Bacik, kernel-team, linux-btrfs



On 22.08.19 г. 22:10 ч., Josef Bacik wrote:
> If we already have tickets on the list we don't want to steal their
> reservations.  This is a preparation patch for upcoming changes,
> technically this shouldn't happen today because of the way we add bytes
> to tickets before adding them to the space_info in most cases.
> 
> This does not change the FIFO nature of reserve tickets, it simply
> allows us to enforce it in a different way.  Previously it was enforced
> because any new space would be added to the first ticket on the list,
> which would result in new reservations getting a reserve ticket.  This
> replaces that mechanism by simply checking to see if we have outstanding
> reserve tickets and skipping straight to adding a ticket for our
> reservation.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

> ---
>  fs/btrfs/space-info.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
> index 5f8f65599de1..33fa0ba49759 100644
> --- a/fs/btrfs/space-info.c
> +++ b/fs/btrfs/space-info.c
> @@ -993,6 +993,7 @@ static int __reserve_metadata_bytes(struct btrfs_fs_info *fs_info,
>  	struct reserve_ticket ticket;
>  	u64 used;
>  	int ret = 0;
> +	bool pending_tickets;
>  
>  	ASSERT(orig_bytes);
>  	ASSERT(!current->journal_info || flush != BTRFS_RESERVE_FLUSH_ALL);
> @@ -1000,14 +1001,17 @@ static int __reserve_metadata_bytes(struct btrfs_fs_info *fs_info,
>  	spin_lock(&space_info->lock);
>  	ret = -ENOSPC;
>  	used = btrfs_space_info_used(space_info, true);
> +	pending_tickets = !list_empty(&space_info->tickets) ||
> +		!list_empty(&space_info->priority_tickets);
>  
>  	/*
>  	 * Carry on if we have enough space (short-circuit) OR call
>  	 * can_overcommit() to ensure we can overcommit to continue.
>  	 */
> -	if ((used + orig_bytes <= space_info->total_bytes) ||
> -	    can_overcommit(fs_info, space_info, orig_bytes, flush,
> -			   system_chunk)) {
> +	if (!pending_tickets &&
> +	    ((used + orig_bytes <= space_info->total_bytes) ||
> +	     can_overcommit(fs_info, space_info, orig_bytes, flush,
> +			   system_chunk))) {
>  		btrfs_space_info_update_bytes_may_use(fs_info, space_info,
>  						      orig_bytes);
>  		trace_btrfs_space_reservation(fs_info, "space_info",
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/9] btrfs: rework btrfs_space_info_add_old_bytes
  2019-08-22 19:10 ` [PATCH 4/9] btrfs: rework btrfs_space_info_add_old_bytes Josef Bacik
@ 2019-08-23  7:48   ` Nikolay Borisov
  2019-08-23 12:30     ` David Sterba
  2019-08-28 15:15   ` [PATCH][v2] btrfs: stop partially refilling tickets when releasing space Josef Bacik
  1 sibling, 1 reply; 23+ messages in thread
From: Nikolay Borisov @ 2019-08-23  7:48 UTC (permalink / raw)
  To: Josef Bacik, kernel-team, linux-btrfs



On 22.08.19 г. 22:10 ч., Josef Bacik wrote:
> If there are pending tickets and we are overcommitted we will simply
> return free'd reservations to space_info->bytes_may_use if we cannot
> overcommit any more.  This is problematic because we assume any free
> space would have been added to the tickets, and so if we go from an
> overcommitted state to not overcommitted we could have plenty of space
> in our space_info but be unable to make progress on our tickets because
> we only refill tickets from previous reservations.
> 
> Consider the following example.  We were allowed to overcommit to
> space_info->total_bytes + 2mib.  Now we've allocated all of our chunks
> and are no longer allowed to overcommit those extra 2mib.  Assume there
> is a 3mib reservation we are now freeing.  Because we can no longer
> overcommit we do not partially refill the ticket with the 3mib, instead
> we subtract it from space_info->bytes_may_use.  Now the total reserved
> space is 1mib less than total_bytes, meaning we have 1mib of space we
> could reserve.  Now assume that our ticket is 2mib, and we only have
> 1mib of space to reclaim, so we have a partial refilling to 1mib.  We
> keep trying to flush and eventually give up and ENOSPC the ticket, when
> there was the remaining 1mib left in the space_info for usage.

The wording of this paragraph makes it a bit hard to understand. How
about something like:

Consider an example where a request is allowed to overcommit
space_info->total_bytes + 2mib. At this point it's no longer possible to
overcommit extra space. At the same time there is an existing 3mib
reservation which is being freed. Due to the existing check failing:

if (check_overcommit &&
  !can_overcommit(fs_info, space_info, 0, flush, false))

btrfs_space_info_add_old_bytes won't partially refill tickets with those
3mib, instead it will subtract them from space_info->bytes_may_use. This
results in the total reserved space being 1mib below
space_info->total_bytes. <You need to expand on where the 2mib ticket
came - was it part of the original reservation that caused the
overcommit or is it a new reservation that comes while we are at 1mb
below space_info->total_bytes>

> 
> Instead of doing this partial filling of tickets dance we need to simply
> add our space to the space_info, and then do the normal check to see if
> we can satisfy the whole reservation.  If we can then we wake up the
> ticket and carry on.  This solves the above problem and makes it much
> more straightforward to understand how the tickets are satisfied.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/space-info.c | 43 ++++++++++++++++---------------------------
>  1 file changed, 16 insertions(+), 27 deletions(-)
> 
> diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
> index a0a36d5768e1..357fe7548e07 100644
> --- a/fs/btrfs/space-info.c
> +++ b/fs/btrfs/space-info.c
> @@ -233,52 +233,41 @@ void btrfs_space_info_add_old_bytes(struct btrfs_fs_info *fs_info,
>  				    struct btrfs_space_info *space_info,
>  				    u64 num_bytes)
>  {
> -	struct reserve_ticket *ticket;
>  	struct list_head *head;
> -	u64 used;
>  	enum btrfs_reserve_flush_enum flush = BTRFS_RESERVE_NO_FLUSH;
> -	bool check_overcommit = false;
>  
>  	spin_lock(&space_info->lock);
>  	head = &space_info->priority_tickets;
> +	btrfs_space_info_update_bytes_may_use(fs_info, space_info, -num_bytes);
>  
> -	/*
> -	 * If we are over our limit then we need to check and see if we can
> -	 * overcommit, and if we can't then we just need to free up our space
> -	 * and not satisfy any requests.
> -	 */
> -	used = btrfs_space_info_used(space_info, true);
> -	if (used - num_bytes >= space_info->total_bytes)
> -		check_overcommit = true;
>  again:
> -	while (!list_empty(head) && num_bytes) {
> -		ticket = list_first_entry(head, struct reserve_ticket,
> -					  list);
> -		/*
> -		 * We use 0 bytes because this space is already reserved, so
> -		 * adding the ticket space would be a double count.
> -		 */
> -		if (check_overcommit &&
> -		    !can_overcommit(fs_info, space_info, 0, flush, false))
> -			break;
> -		if (num_bytes >= ticket->bytes) {
> +	while (!list_empty(head)) {
> +		struct reserve_ticket *ticket;
> +		u64 used = btrfs_space_info_used(space_info, true);
> +
> +		ticket = list_first_entry(head, struct reserve_ticket, list);
> +
> +		/* Check and see if our ticket can be satisified now. */
> +		if ((used + ticket->bytes <= space_info->total_bytes) ||
> +		    can_overcommit(fs_info, space_info, ticket->bytes, flush,
> +				   false)) {
> +			btrfs_space_info_update_bytes_may_use(fs_info,
> +							      space_info,
> +							      ticket->bytes);
>  			list_del_init(&ticket->list);
> -			num_bytes -= ticket->bytes;
>  			ticket->bytes = 0;
>  			space_info->tickets_id++;
>  			wake_up(&ticket->wait);
>  		} else {
> -			ticket->bytes -= num_bytes;
> -			num_bytes = 0;
> +			break;
>  		}
>  	}
>  
> -	if (num_bytes && head == &space_info->priority_tickets) {
> +	if (head == &space_info->priority_tickets) {
>  		head = &space_info->tickets;
>  		flush = BTRFS_RESERVE_FLUSH_ALL;
>  		goto again;
>  	}
> -	btrfs_space_info_update_bytes_may_use(fs_info, space_info, -num_bytes);
>  	spin_unlock(&space_info->lock);
>  }
>  
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 6/9] btrfs: rework wake_all_tickets
  2019-08-22 19:10 ` [PATCH 6/9] btrfs: rework wake_all_tickets Josef Bacik
@ 2019-08-23  8:17   ` Nikolay Borisov
  2019-08-27 13:04     ` David Sterba
  2019-08-28 15:12   ` [PATCH][v2] " Josef Bacik
  1 sibling, 1 reply; 23+ messages in thread
From: Nikolay Borisov @ 2019-08-23  8:17 UTC (permalink / raw)
  To: Josef Bacik, kernel-team, linux-btrfs



On 22.08.19 г. 22:10 ч., Josef Bacik wrote:
> Now that we no longer partially fill tickets we need to rework
> wake_all_tickets to call btrfs_try_to_wakeup_tickets() in order to see
> if any subsequent tickets are able to be satisfied.  If our tickets_id
> changes we know something happened and we can keep flushing.
> 
> Also if we find a ticket that is smaller than the first ticket in our
> queue then we want to retry the flushing loop again in case
> may_commit_transaction() decides we could satisfy the ticket by
> committing the transaction.
> 
> Rename this to maybe_fail_all_tickets() while we're at it, to better
> reflect what the function is actually doing.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/space-info.c | 41 ++++++++++++++++++++++++++++++++++-------
>  1 file changed, 34 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
> index c2143ddb7f4a..dd4adfa75a71 100644
> --- a/fs/btrfs/space-info.c
> +++ b/fs/btrfs/space-info.c
> @@ -679,19 +679,46 @@ static inline int need_do_async_reclaim(struct btrfs_fs_info *fs_info,
>  		!test_bit(BTRFS_FS_STATE_REMOUNTING, &fs_info->fs_state));
>  }
>  
> -static bool wake_all_tickets(struct list_head *head)
> +static bool maybe_fail_all_tickets(struct btrfs_fs_info *fs_info,
> +				   struct btrfs_space_info *space_info)
>  {
>  	struct reserve_ticket *ticket;
> +	u64 tickets_id = space_info->tickets_id;
> +	u64 first_ticket_bytes = 0;
> +
> +	while (!list_empty(&space_info->tickets) &&
> +	       tickets_id == space_info->tickets_id) {
> +		ticket = list_first_entry(&space_info->tickets,
> +					  struct reserve_ticket, list);
> +
> +		/*
> +		 * may_commit_transaction will avoid committing the transaction
> +		 * if it doesn't feel like the space reclaimed by the commit
> +		 * would result in the ticket succeeding.  However if we have a
> +		 * smaller ticket in the queue it may be small enough to be
> +		 * satisified by committing the transaction, so if any
> +		 * subsequent ticket is smaller than the first ticket go ahead
> +		 * and send us back for another loop through the enospc flushing
> +		 * code.
> +		 */
> +		if (first_ticket_bytes == 0)
> +			first_ticket_bytes = ticket->bytes;
> +		else if (first_ticket_bytes > ticket->bytes)
> +			return true;
>  
> -	while (!list_empty(head)) {
> -		ticket = list_first_entry(head, struct reserve_ticket, list);
>  		list_del_init(&ticket->list);
>  		ticket->error = -ENOSPC;
>  		wake_up(&ticket->wait);
> -		if (ticket->bytes != ticket->orig_bytes)
> -			return true;
> +
> +		/*
> +		 * We're just throwing tickets away, so more flushing may not
> +		 * trip over btrfs_try_granting_tickets, so we need to call it
> +		 * here to see if we can make progress with the next ticket in
> +		 * the list.
> +		 */
> +		btrfs_try_granting_tickets(fs_info, space_info);
>  	}
> -	return false;
> +	return (tickets_id != space_info->tickets_id);
>  }
>  
>  /*
> @@ -759,7 +786,7 @@ static void btrfs_async_reclaim_metadata_space(struct work_struct *work)
>  		if (flush_state > COMMIT_TRANS) {
>  			commit_cycles++;
>  			if (commit_cycles > 2) {
> -				if (wake_all_tickets(&space_info->tickets)) {
> +				if (maybe_fail_all_tickets(fs_info, space_info)) {

This looks odd. A function called "maybe_fail" which if it returns true
then we are sure we haven't failed all tickets, instead make another go
through the flushing machinery. I think the problem stems from the fact
it's doing 3 things, namely:

1. Failing all tickets, that aren't smaller than the initial one
2. Trying to satisfy other tickets apart from the one failed
3. If it succeeded it signals to the flushing machinery to make another go

The function's name really reflects what's going on in 1. But 2 and 3
are also major part of the logic. I think there is 'impedance mismatch'
here. I'm at a loss what to do here, honestly.

>  					flush_state = FLUSH_DELAYED_ITEMS_NR;
>  					commit_cycles--;
>  				} else {
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 7/9] btrfs: fix may_commit_transaction to deal with no partial filling
  2019-08-22 19:11 ` [PATCH 7/9] btrfs: fix may_commit_transaction to deal with no partial filling Josef Bacik
@ 2019-08-23  8:18   ` Nikolay Borisov
  0 siblings, 0 replies; 23+ messages in thread
From: Nikolay Borisov @ 2019-08-23  8:18 UTC (permalink / raw)
  To: Josef Bacik, kernel-team, linux-btrfs



On 22.08.19 г. 22:11 ч., Josef Bacik wrote:
> Now that we aren't partially filling tickets we may have some slack
> space left in the space_info.  We need to account for this in
> may_commit_transaction, otherwise we may choose to not commit the
> transaction despite it actually having enough space to satisfy our
> ticket.
> 
> Calculate the free space we have in the space_info, if any, and subtract
> this from the ticket we have and use that amount to determine if we will
> need to commit to reclaim enough space.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

> ---
>  fs/btrfs/space-info.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
> index dd4adfa75a71..cec6db0c41cc 100644
> --- a/fs/btrfs/space-info.c
> +++ b/fs/btrfs/space-info.c
> @@ -473,12 +473,19 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
>  	struct btrfs_trans_handle *trans;
>  	u64 bytes_needed;
>  	u64 reclaim_bytes = 0;
> +	u64 cur_free_bytes = 0;
>  
>  	trans = (struct btrfs_trans_handle *)current->journal_info;
>  	if (trans)
>  		return -EAGAIN;
>  
>  	spin_lock(&space_info->lock);
> +	cur_free_bytes = btrfs_space_info_used(space_info, true);
> +	if (cur_free_bytes < space_info->total_bytes)
> +		cur_free_bytes = space_info->total_bytes - cur_free_bytes;
> +	else
> +		cur_free_bytes = 0;
> +
>  	if (!list_empty(&space_info->priority_tickets))
>  		ticket = list_first_entry(&space_info->priority_tickets,
>  					  struct reserve_ticket, list);
> @@ -486,6 +493,11 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
>  		ticket = list_first_entry(&space_info->tickets,
>  					  struct reserve_ticket, list);
>  	bytes_needed = (ticket) ? ticket->bytes : 0;
> +
> +	if (bytes_needed > cur_free_bytes)
> +		bytes_needed -= cur_free_bytes;
> +	else
> +		bytes_needed = 0;
>  	spin_unlock(&space_info->lock);
>  
>  	if (!bytes_needed)
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 9/9] btrfs: rename btrfs_space_info_add_old_bytes
  2019-08-22 19:11 ` [PATCH 9/9] btrfs: rename btrfs_space_info_add_old_bytes Josef Bacik
@ 2019-08-23  8:18   ` Nikolay Borisov
  0 siblings, 0 replies; 23+ messages in thread
From: Nikolay Borisov @ 2019-08-23  8:18 UTC (permalink / raw)
  To: Josef Bacik, kernel-team, linux-btrfs



On 22.08.19 г. 22:11 ч., Josef Bacik wrote:
> This name doesn't really fit with how the space reservation stuff works
> now, rename it to btrfs_space_info_free_bytes_may_use so it's clear what
> the function is doing.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

> ---
>  fs/btrfs/block-rsv.c   | 5 +++--
>  fs/btrfs/delayed-ref.c | 2 +-
>  fs/btrfs/space-info.h  | 6 +++---
>  3 files changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/btrfs/block-rsv.c b/fs/btrfs/block-rsv.c
> index c64b460a4301..60f313888a7d 100644
> --- a/fs/btrfs/block-rsv.c
> +++ b/fs/btrfs/block-rsv.c
> @@ -54,8 +54,9 @@ static u64 block_rsv_release_bytes(struct btrfs_fs_info *fs_info,
>  			spin_unlock(&dest->lock);
>  		}
>  		if (num_bytes)
> -			btrfs_space_info_add_old_bytes(fs_info, space_info,
> -						       num_bytes);
> +			btrfs_space_info_free_bytes_may_use(fs_info,
> +							    space_info,
> +							    num_bytes);
>  	}
>  	if (qgroup_to_release_ret)
>  		*qgroup_to_release_ret = qgroup_to_release;
> diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
> index 9a91d1eb0af4..3822edbf54a7 100644
> --- a/fs/btrfs/delayed-ref.c
> +++ b/fs/btrfs/delayed-ref.c
> @@ -158,7 +158,7 @@ void btrfs_migrate_to_delayed_refs_rsv(struct btrfs_fs_info *fs_info,
>  		trace_btrfs_space_reservation(fs_info, "delayed_refs_rsv",
>  					      0, num_bytes, 1);
>  	if (to_free)
> -		btrfs_space_info_add_old_bytes(fs_info,
> +		btrfs_space_info_free_bytes_may_use(fs_info,
>  				delayed_refs_rsv->space_info, to_free);
>  }
>  
> diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h
> index d61550f06c94..c93fe9808dc0 100644
> --- a/fs/btrfs/space-info.h
> +++ b/fs/btrfs/space-info.h
> @@ -130,9 +130,9 @@ void btrfs_try_granting_tickets(struct btrfs_fs_info *fs_info,
>  				struct btrfs_space_info *space_info);
>  
>  static inline void
> -btrfs_space_info_add_old_bytes(struct btrfs_fs_info *fs_info,
> -			       struct btrfs_space_info *space_info,
> -			       u64 num_bytes)
> +btrfs_space_info_free_bytes_may_use(struct btrfs_fs_info *fs_info,
> +				    struct btrfs_space_info *space_info,
> +				    u64 num_bytes)
>  {
>  	spin_lock(&space_info->lock);
>  	btrfs_space_info_update_bytes_may_use(fs_info, space_info, -num_bytes);
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/9] btrfs: roll tracepoint into btrfs_space_info_update helper
  2019-08-22 19:10 ` [PATCH 2/9] btrfs: roll tracepoint into btrfs_space_info_update helper Josef Bacik
@ 2019-08-23 12:12   ` David Sterba
  0 siblings, 0 replies; 23+ messages in thread
From: David Sterba @ 2019-08-23 12:12 UTC (permalink / raw)
  To: Josef Bacik; +Cc: kernel-team, linux-btrfs

On Thu, Aug 22, 2019 at 03:10:55PM -0400, Josef Bacik wrote:
> We duplicate this tracepoint everywhere we call these helpers, so update
> the helper to have the tracepoint as well.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Reviewed-by: David Sterba <dsterba@suse.com>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/9] btrfs: add space reservation tracepoint for reserved bytes
  2019-08-22 19:10 ` [PATCH 3/9] btrfs: add space reservation tracepoint for reserved bytes Josef Bacik
@ 2019-08-23 12:17   ` David Sterba
  0 siblings, 0 replies; 23+ messages in thread
From: David Sterba @ 2019-08-23 12:17 UTC (permalink / raw)
  To: Josef Bacik; +Cc: kernel-team, linux-btrfs

On Thu, Aug 22, 2019 at 03:10:56PM -0400, Josef Bacik wrote:
> I noticed when folding the trace_btrfs_space_reservation() tracepoint
> into the btrfs_space_info_update_* helpers that we didn't emit a
> tracepoint when doing btrfs_add_reserved_bytes().  I know this is
> because we were swapping bytes_may_use for bytes_reserved, so in my mind
> there was no reason to have the tracepoint there.  But now there is
> because we always emit the unreserve for the bytes_may_use side, and
> this would have broken if compression was on anyway.  Add a tracepoint
> to cover the bytes_reserved counter so the math still comes out right.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Reviewed-by: David Sterba <dsterba@suse.com>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/9] btrfs: rework btrfs_space_info_add_old_bytes
  2019-08-23  7:48   ` Nikolay Borisov
@ 2019-08-23 12:30     ` David Sterba
  0 siblings, 0 replies; 23+ messages in thread
From: David Sterba @ 2019-08-23 12:30 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: Josef Bacik, kernel-team, linux-btrfs

On Fri, Aug 23, 2019 at 10:48:35AM +0300, Nikolay Borisov wrote:
> 
> 
> On 22.08.19 г. 22:10 ч., Josef Bacik wrote:
> > If there are pending tickets and we are overcommitted we will simply
> > return free'd reservations to space_info->bytes_may_use if we cannot
> > overcommit any more.  This is problematic because we assume any free
> > space would have been added to the tickets, and so if we go from an
> > overcommitted state to not overcommitted we could have plenty of space
> > in our space_info but be unable to make progress on our tickets because
> > we only refill tickets from previous reservations.
> > 
> > Consider the following example.  We were allowed to overcommit to
> > space_info->total_bytes + 2mib.  Now we've allocated all of our chunks
> > and are no longer allowed to overcommit those extra 2mib.  Assume there
> > is a 3mib reservation we are now freeing.  Because we can no longer
> > overcommit we do not partially refill the ticket with the 3mib, instead
> > we subtract it from space_info->bytes_may_use.  Now the total reserved
> > space is 1mib less than total_bytes, meaning we have 1mib of space we
> > could reserve.  Now assume that our ticket is 2mib, and we only have
> > 1mib of space to reclaim, so we have a partial refilling to 1mib.  We
> > keep trying to flush and eventually give up and ENOSPC the ticket, when
> > there was the remaining 1mib left in the space_info for usage.
> 
> The wording of this paragraph makes it a bit hard to understand. How
> about something like:

I got lost there too. It's hard too keep track of what changes,
something a bit more strucutred could help understanding it.

Also the subject is too generic, I'd suggest "update overcommit logic
when refilling tickets" or something like that. Using 'rework' or
'revamp' in the subject applies to very few patches.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/9][v3] Rework reserve ticket handling
  2019-08-22 19:10 [PATCH 0/9][v3] Rework reserve ticket handling Josef Bacik
                   ` (8 preceding siblings ...)
  2019-08-22 19:11 ` [PATCH 9/9] btrfs: rename btrfs_space_info_add_old_bytes Josef Bacik
@ 2019-08-23 12:55 ` David Sterba
  2019-08-28 18:02   ` David Sterba
  9 siblings, 1 reply; 23+ messages in thread
From: David Sterba @ 2019-08-23 12:55 UTC (permalink / raw)
  To: Josef Bacik; +Cc: kernel-team, linux-btrfs

On Thu, Aug 22, 2019 at 03:10:53PM -0400, Josef Bacik wrote:
> This is the next round of my reserve ticket refinements.  Most of the changes
> are just fixing issues brought up by review.  The updated diffstat is as follows
> 
>  fs/btrfs/block-group.c    |   5 +-
>  fs/btrfs/block-rsv.c      |  10 +--
>  fs/btrfs/delalloc-space.c |   4 --
>  fs/btrfs/delayed-ref.c    |   2 +-
>  fs/btrfs/extent-tree.c    |  13 +---
>  fs/btrfs/space-info.c     | 171 +++++++++++++++++++---------------------------
>  fs/btrfs/space-info.h     |  30 +++++---
>  7 files changed, 98 insertions(+), 137 deletions(-)

I'll add the series to for-next as topic branch, the comments seem to be
more in the changelog or function names, that I'll updated and fold to
the patches (no need to resend the whole patchset).

We're running out of time before the merge window freeze, so the enospc
updates are probably the last big thing going in, I'll continue with the
remaining patchsets and will try to push them in for-next today.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 6/9] btrfs: rework wake_all_tickets
  2019-08-23  8:17   ` Nikolay Borisov
@ 2019-08-27 13:04     ` David Sterba
  0 siblings, 0 replies; 23+ messages in thread
From: David Sterba @ 2019-08-27 13:04 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: Josef Bacik, kernel-team, linux-btrfs

On Fri, Aug 23, 2019 at 11:17:06AM +0300, Nikolay Borisov wrote:
> > @@ -759,7 +786,7 @@ static void btrfs_async_reclaim_metadata_space(struct work_struct *work)
> >  		if (flush_state > COMMIT_TRANS) {
> >  			commit_cycles++;
> >  			if (commit_cycles > 2) {
> > -				if (wake_all_tickets(&space_info->tickets)) {
> > +				if (maybe_fail_all_tickets(fs_info, space_info)) {
> 
> This looks odd. A function called "maybe_fail" which if it returns true
> then we are sure we haven't failed all tickets, instead make another go
> through the flushing machinery. I think the problem stems from the fact
> it's doing 3 things, namely:
> 
> 1. Failing all tickets, that aren't smaller than the initial one
> 2. Trying to satisfy other tickets apart from the one failed
> 3. If it succeeded it signals to the flushing machinery to make another go
> 
> The function's name really reflects what's going on in 1. But 2 and 3
> are also major part of the logic. I think there is 'impedance mismatch'
> here. I'm at a loss what to do here, honestly.

The function is quite short and splitting it may not be an improvement,
so the semantics should be at least documented, the 3 points you write
look comprehensible so I'd stick that to the function. As this is not
functional change documentation is probably best we can do now to move
forward.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH][v2] btrfs: rework wake_all_tickets
  2019-08-22 19:10 ` [PATCH 6/9] btrfs: rework wake_all_tickets Josef Bacik
  2019-08-23  8:17   ` Nikolay Borisov
@ 2019-08-28 15:12   ` Josef Bacik
  1 sibling, 0 replies; 23+ messages in thread
From: Josef Bacik @ 2019-08-28 15:12 UTC (permalink / raw)
  To: kernel-team, linux-btrfs

Now that we no longer partially fill tickets we need to rework
wake_all_tickets to call btrfs_try_to_wakeup_tickets() in order to see
if any subsequent tickets are able to be satisfied.  If our tickets_id
changes we know something happened and we can keep flushing.

Also if we find a ticket that is smaller than the first ticket in our
queue then we want to retry the flushing loop again in case
may_commit_transaction() decides we could satisfy the ticket by
committing the transaction.

Rename this to maybe_fail_all_tickets() while we're at it, to better
reflect what the function is actually doing.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
v1->v2:
- added a comment for maybe_fail_all_tickets

 fs/btrfs/space-info.c | 56 +++++++++++++++++++++++++++++++++++++------
 1 file changed, 49 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index c2143ddb7f4a..b2bb9d0bd44e 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -679,19 +679,61 @@ static inline int need_do_async_reclaim(struct btrfs_fs_info *fs_info,
 		!test_bit(BTRFS_FS_STATE_REMOUNTING, &fs_info->fs_state));
 }
 
-static bool wake_all_tickets(struct list_head *head)
+/*
+ * maybe_fail_all_tickets - we've exhausted our flushing, start failing tickets
+ * @fs_info - fs_info for this fs
+ * @space_info - the space info we were flushing
+ *
+ * We call this when we've exhausted our flushing ability and haven't made
+ * progress in satisfying tickets.  The reservation code handles tickets in
+ * order, so if there is a large ticket first and then smaller ones we could
+ * very well satisfy the smaller tickets.  This will attempt to wake up any
+ * tickets in the list to catch this case.
+ *
+ * This function returns true if it was able to make progress by clearing out
+ * other tickets, or if it stumbles across a ticket that was smaller than the
+ * first ticket.
+ */
+static bool maybe_fail_all_tickets(struct btrfs_fs_info *fs_info,
+				   struct btrfs_space_info *space_info)
 {
 	struct reserve_ticket *ticket;
+	u64 tickets_id = space_info->tickets_id;
+	u64 first_ticket_bytes = 0;
+
+	while (!list_empty(&space_info->tickets) &&
+	       tickets_id == space_info->tickets_id) {
+		ticket = list_first_entry(&space_info->tickets,
+					  struct reserve_ticket, list);
+
+		/*
+		 * may_commit_transaction will avoid committing the transaction
+		 * if it doesn't feel like the space reclaimed by the commit
+		 * would result in the ticket succeeding.  However if we have a
+		 * smaller ticket in the queue it may be small enough to be
+		 * satisified by committing the transaction, so if any
+		 * subsequent ticket is smaller than the first ticket go ahead
+		 * and send us back for another loop through the enospc flushing
+		 * code.
+		 */
+		if (first_ticket_bytes == 0)
+			first_ticket_bytes = ticket->bytes;
+		else if (first_ticket_bytes > ticket->bytes)
+			return true;
 
-	while (!list_empty(head)) {
-		ticket = list_first_entry(head, struct reserve_ticket, list);
 		list_del_init(&ticket->list);
 		ticket->error = -ENOSPC;
 		wake_up(&ticket->wait);
-		if (ticket->bytes != ticket->orig_bytes)
-			return true;
+
+		/*
+		 * We're just throwing tickets away, so more flushing may not
+		 * trip over btrfs_try_granting_tickets, so we need to call it
+		 * here to see if we can make progress with the next ticket in
+		 * the list.
+		 */
+		btrfs_try_granting_tickets(fs_info, space_info);
 	}
-	return false;
+	return (tickets_id != space_info->tickets_id);
 }
 
 /*
@@ -759,7 +801,7 @@ static void btrfs_async_reclaim_metadata_space(struct work_struct *work)
 		if (flush_state > COMMIT_TRANS) {
 			commit_cycles++;
 			if (commit_cycles > 2) {
-				if (wake_all_tickets(&space_info->tickets)) {
+				if (maybe_fail_all_tickets(fs_info, space_info)) {
 					flush_state = FLUSH_DELAYED_ITEMS_NR;
 					commit_cycles--;
 				} else {
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH][v2] btrfs: stop partially refilling tickets when releasing space
  2019-08-22 19:10 ` [PATCH 4/9] btrfs: rework btrfs_space_info_add_old_bytes Josef Bacik
  2019-08-23  7:48   ` Nikolay Borisov
@ 2019-08-28 15:15   ` Josef Bacik
  1 sibling, 0 replies; 23+ messages in thread
From: Josef Bacik @ 2019-08-28 15:15 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

btrfs_space_info_add_old_bytes is used when adding the extra space from
an existing reservation back into the space_info to be used by any
waiting tickets.  In order to keep us from overcommitting we check to
make sure that we can still use this space for our reserve ticket, and
if we cannot we'll simply subtract it from space_info->bytes_may_use.

However this is problematic, because it assumes that only changes to
bytes_may_use would affect our ability to make reservations.  Any
changes to bytes_reserved would be missed.  If we were unable to make a
reservation prior because of reserved space, but that reserved space was
free'd due to unlink or truncate and we were allowed to immediately
reclaim that metadata space we would still ENOSPC.

Consider the example where we create a file with a bunch of extents,
using up 2mib of actual space for the new tree blocks.  Then we try to
make a reservation of 2mib but we do not have enough space to make this
reservation.  The iput() occurs in another thread and we remove this
space, and since we did not write the blocks we simply do
space_info->bytes_reserved -= 2mib.  We would never see this because we
do not check our space info used, we just try to re-use the freed
reservations.

To fix this problem, and to greatly simplify the wakeup code, do away
with this partial refilling nonsense.  Use
btrfs_space_info_add_old_bytes to subtract the reservation from
space_info->bytes_may_use, and then check the ticket against the total
used of the space_info the same way we do with the initial reservation
attempt.

This keeps the reservation logic consistent and solves the problem of
early ENOSPC in the case that we free up space in places other than
bytes_may_use and bytes_pinned.  Thanks,

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
v1->v2:
- Simply updated the changelog, what I was describing couldn't actually happen.
  I went back and re-ran tests and added in tracing and realized it was
  bytes_reserved that was changing without telling anybody, not that we were
  removing more of bytes_may_use than expected.  Updated the changelog to
  reflect this as well as hopefully make it clearer the motivation for the
  change.

 fs/btrfs/space-info.c | 43 ++++++++++++++++---------------------------
 1 file changed, 16 insertions(+), 27 deletions(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index a0a36d5768e1..357fe7548e07 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -233,52 +233,41 @@ void btrfs_space_info_add_old_bytes(struct btrfs_fs_info *fs_info,
 				    struct btrfs_space_info *space_info,
 				    u64 num_bytes)
 {
-	struct reserve_ticket *ticket;
 	struct list_head *head;
-	u64 used;
 	enum btrfs_reserve_flush_enum flush = BTRFS_RESERVE_NO_FLUSH;
-	bool check_overcommit = false;
 
 	spin_lock(&space_info->lock);
 	head = &space_info->priority_tickets;
+	btrfs_space_info_update_bytes_may_use(fs_info, space_info, -num_bytes);
 
-	/*
-	 * If we are over our limit then we need to check and see if we can
-	 * overcommit, and if we can't then we just need to free up our space
-	 * and not satisfy any requests.
-	 */
-	used = btrfs_space_info_used(space_info, true);
-	if (used - num_bytes >= space_info->total_bytes)
-		check_overcommit = true;
 again:
-	while (!list_empty(head) && num_bytes) {
-		ticket = list_first_entry(head, struct reserve_ticket,
-					  list);
-		/*
-		 * We use 0 bytes because this space is already reserved, so
-		 * adding the ticket space would be a double count.
-		 */
-		if (check_overcommit &&
-		    !can_overcommit(fs_info, space_info, 0, flush, false))
-			break;
-		if (num_bytes >= ticket->bytes) {
+	while (!list_empty(head)) {
+		struct reserve_ticket *ticket;
+		u64 used = btrfs_space_info_used(space_info, true);
+
+		ticket = list_first_entry(head, struct reserve_ticket, list);
+
+		/* Check and see if our ticket can be satisified now. */
+		if ((used + ticket->bytes <= space_info->total_bytes) ||
+		    can_overcommit(fs_info, space_info, ticket->bytes, flush,
+				   false)) {
+			btrfs_space_info_update_bytes_may_use(fs_info,
+							      space_info,
+							      ticket->bytes);
 			list_del_init(&ticket->list);
-			num_bytes -= ticket->bytes;
 			ticket->bytes = 0;
 			space_info->tickets_id++;
 			wake_up(&ticket->wait);
 		} else {
-			ticket->bytes -= num_bytes;
-			num_bytes = 0;
+			break;
 		}
 	}
 
-	if (num_bytes && head == &space_info->priority_tickets) {
+	if (head == &space_info->priority_tickets) {
 		head = &space_info->tickets;
 		flush = BTRFS_RESERVE_FLUSH_ALL;
 		goto again;
 	}
-	btrfs_space_info_update_bytes_may_use(fs_info, space_info, -num_bytes);
 	spin_unlock(&space_info->lock);
 }
 
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/9][v3] Rework reserve ticket handling
  2019-08-23 12:55 ` [PATCH 0/9][v3] Rework reserve ticket handling David Sterba
@ 2019-08-28 18:02   ` David Sterba
  0 siblings, 0 replies; 23+ messages in thread
From: David Sterba @ 2019-08-28 18:02 UTC (permalink / raw)
  To: dsterba, Josef Bacik, kernel-team, linux-btrfs

On Fri, Aug 23, 2019 at 02:55:33PM +0200, David Sterba wrote:
> On Thu, Aug 22, 2019 at 03:10:53PM -0400, Josef Bacik wrote:
> > This is the next round of my reserve ticket refinements.  Most of the changes
> > are just fixing issues brought up by review.  The updated diffstat is as follows
> > 
> >  fs/btrfs/block-group.c    |   5 +-
> >  fs/btrfs/block-rsv.c      |  10 +--
> >  fs/btrfs/delalloc-space.c |   4 --
> >  fs/btrfs/delayed-ref.c    |   2 +-
> >  fs/btrfs/extent-tree.c    |  13 +---
> >  fs/btrfs/space-info.c     | 171 +++++++++++++++++++---------------------------
> >  fs/btrfs/space-info.h     |  30 +++++---
> >  7 files changed, 98 insertions(+), 137 deletions(-)
> 
> I'll add the series to for-next as topic branch, the comments seem to be
> more in the changelog or function names, that I'll updated and fold to
> the patches (no need to resend the whole patchset).

Updated and pushed to misc-next.

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2019-08-28 18:02 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-22 19:10 [PATCH 0/9][v3] Rework reserve ticket handling Josef Bacik
2019-08-22 19:10 ` [PATCH 1/9] btrfs: do not allow reservations if we have pending tickets Josef Bacik
2019-08-23  7:33   ` Nikolay Borisov
2019-08-22 19:10 ` [PATCH 2/9] btrfs: roll tracepoint into btrfs_space_info_update helper Josef Bacik
2019-08-23 12:12   ` David Sterba
2019-08-22 19:10 ` [PATCH 3/9] btrfs: add space reservation tracepoint for reserved bytes Josef Bacik
2019-08-23 12:17   ` David Sterba
2019-08-22 19:10 ` [PATCH 4/9] btrfs: rework btrfs_space_info_add_old_bytes Josef Bacik
2019-08-23  7:48   ` Nikolay Borisov
2019-08-23 12:30     ` David Sterba
2019-08-28 15:15   ` [PATCH][v2] btrfs: stop partially refilling tickets when releasing space Josef Bacik
2019-08-22 19:10 ` [PATCH 5/9] btrfs: refactor the ticket wakeup code Josef Bacik
2019-08-22 19:10 ` [PATCH 6/9] btrfs: rework wake_all_tickets Josef Bacik
2019-08-23  8:17   ` Nikolay Borisov
2019-08-27 13:04     ` David Sterba
2019-08-28 15:12   ` [PATCH][v2] " Josef Bacik
2019-08-22 19:11 ` [PATCH 7/9] btrfs: fix may_commit_transaction to deal with no partial filling Josef Bacik
2019-08-23  8:18   ` Nikolay Borisov
2019-08-22 19:11 ` [PATCH 8/9] btrfs: remove orig_bytes from reserve_ticket Josef Bacik
2019-08-22 19:11 ` [PATCH 9/9] btrfs: rename btrfs_space_info_add_old_bytes Josef Bacik
2019-08-23  8:18   ` Nikolay Borisov
2019-08-23 12:55 ` [PATCH 0/9][v3] Rework reserve ticket handling David Sterba
2019-08-28 18:02   ` David Sterba

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).