All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] Btrfs: fix reclaim counter leak of space_info objects
@ 2020-04-07 10:38 fdmanana
  2020-04-07 11:33 ` Nikolay Borisov
  2020-04-08 17:18 ` David Sterba
  0 siblings, 2 replies; 4+ messages in thread
From: fdmanana @ 2020-04-07 10:38 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

Whenever we add a ticket to a space_info object we increment the object's
reclaim_size counter witht the ticket's bytes, and we decrement it with
the corresponding amount only when we are able to grant the requested
space to the ticket. When we are not able to grant the space to a ticket,
or when the ticket is removed due to a signal (e.g. an application has
received sigterm from the terminal) we never decrement the counter with
the corresponding bytes from the ticket. This leak can result in the
space reclaim code to later do much more work than necessary. So fix it
by decrementing the counter when those two cases happen as well.

Fixes: db161806dc5615 ("btrfs: account ticket size at add/delete time")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/block-group.c |  1 +
 fs/btrfs/space-info.c  | 20 ++++++++++++++------
 2 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 786849fcc319..47f66c6a7d7f 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -3370,6 +3370,7 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info)
 			    space_info->bytes_reserved > 0 ||
 			    space_info->bytes_may_use > 0))
 			btrfs_dump_space_info(info, space_info, 0, 0);
+		WARN_ON(space_info->reclaim_size > 0);
 		list_del(&space_info->list);
 		btrfs_sysfs_remove_space_info(space_info);
 	}
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 8b0fe053a25d..ff17a4420358 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -361,6 +361,16 @@ int btrfs_can_overcommit(struct btrfs_fs_info *fs_info,
 	return 0;
 }
 
+static void remove_ticket(struct btrfs_space_info *space_info,
+			  struct reserve_ticket *ticket)
+{
+	if (!list_empty(&ticket->list)) {
+		list_del_init(&ticket->list);
+		ASSERT(space_info->reclaim_size >= ticket->bytes);
+		space_info->reclaim_size -= ticket->bytes;
+	}
+}
+
 /*
  * This is for space we already have accounted in space_info->bytes_may_use, so
  * basically when we're returning space from block_rsv's.
@@ -388,9 +398,7 @@ void btrfs_try_granting_tickets(struct btrfs_fs_info *fs_info,
 			btrfs_space_info_update_bytes_may_use(fs_info,
 							      space_info,
 							      ticket->bytes);
-			list_del_init(&ticket->list);
-			ASSERT(space_info->reclaim_size >= ticket->bytes);
-			space_info->reclaim_size -= ticket->bytes;
+			remove_ticket(space_info, ticket);
 			ticket->bytes = 0;
 			space_info->tickets_id++;
 			wake_up(&ticket->wait);
@@ -899,7 +907,7 @@ static bool maybe_fail_all_tickets(struct btrfs_fs_info *fs_info,
 			btrfs_info(fs_info, "failing ticket with %llu bytes",
 				   ticket->bytes);
 
-		list_del_init(&ticket->list);
+		remove_ticket(space_info, ticket);
 		ticket->error = -ENOSPC;
 		wake_up(&ticket->wait);
 
@@ -1063,7 +1071,7 @@ static void wait_reserve_ticket(struct btrfs_fs_info *fs_info,
 			 * despite getting an error, resulting in a space leak
 			 * (bytes_may_use counter of our space_info).
 			 */
-			list_del_init(&ticket->list);
+			remove_ticket(space_info, ticket);
 			ticket->error = -EINTR;
 			break;
 		}
@@ -1121,7 +1129,7 @@ static int handle_reserve_ticket(struct btrfs_fs_info *fs_info,
 		 * either the async reclaim job deletes the ticket from the list
 		 * or we delete it ourselves at wait_reserve_ticket().
 		 */
-		list_del_init(&ticket->list);
+		remove_ticket(space_info, ticket);
 		if (!ret)
 			ret = -ENOSPC;
 	}
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH 1/2] Btrfs: fix reclaim counter leak of space_info objects
  2020-04-07 10:38 [PATCH 1/2] Btrfs: fix reclaim counter leak of space_info objects fdmanana
@ 2020-04-07 11:33 ` Nikolay Borisov
  2020-04-08 17:18 ` David Sterba
  1 sibling, 0 replies; 4+ messages in thread
From: Nikolay Borisov @ 2020-04-07 11:33 UTC (permalink / raw)
  To: fdmanana, linux-btrfs



On 7.04.20 г. 13:38 ч., fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> Whenever we add a ticket to a space_info object we increment the object's
> reclaim_size counter witht the ticket's bytes, and we decrement it with
> the corresponding amount only when we are able to grant the requested
> space to the ticket. When we are not able to grant the space to a ticket,
> or when the ticket is removed due to a signal (e.g. an application has
> received sigterm from the terminal) we never decrement the counter with
> the corresponding bytes from the ticket. This leak can result in the
> space reclaim code to later do much more work than necessary. So fix it
> by decrementing the counter when those two cases happen as well.
> 
> Fixes: db161806dc5615 ("btrfs: account ticket size at add/delete time")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Doh, you are correct. I especially like it you've actually factored
ticket removal into a function.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

> ---
>  fs/btrfs/block-group.c |  1 +
>  fs/btrfs/space-info.c  | 20 ++++++++++++++------
>  2 files changed, 15 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index 786849fcc319..47f66c6a7d7f 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -3370,6 +3370,7 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info)
>  			    space_info->bytes_reserved > 0 ||
>  			    space_info->bytes_may_use > 0))
>  			btrfs_dump_space_info(info, space_info, 0, 0);
> +		WARN_ON(space_info->reclaim_size > 0);
>  		list_del(&space_info->list);
>  		btrfs_sysfs_remove_space_info(space_info);
>  	}
> diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
> index 8b0fe053a25d..ff17a4420358 100644
> --- a/fs/btrfs/space-info.c
> +++ b/fs/btrfs/space-info.c
> @@ -361,6 +361,16 @@ int btrfs_can_overcommit(struct btrfs_fs_info *fs_info,
>  	return 0;
>  }
>  
> +static void remove_ticket(struct btrfs_space_info *space_info,
> +			  struct reserve_ticket *ticket)
> +{
> +	if (!list_empty(&ticket->list)) {
> +		list_del_init(&ticket->list);
> +		ASSERT(space_info->reclaim_size >= ticket->bytes);
> +		space_info->reclaim_size -= ticket->bytes;
> +	}
> +}
> +
>  /*
>   * This is for space we already have accounted in space_info->bytes_may_use, so
>   * basically when we're returning space from block_rsv's.
> @@ -388,9 +398,7 @@ void btrfs_try_granting_tickets(struct btrfs_fs_info *fs_info,
>  			btrfs_space_info_update_bytes_may_use(fs_info,
>  							      space_info,
>  							      ticket->bytes);
> -			list_del_init(&ticket->list);
> -			ASSERT(space_info->reclaim_size >= ticket->bytes);
> -			space_info->reclaim_size -= ticket->bytes;
> +			remove_ticket(space_info, ticket);
>  			ticket->bytes = 0;
>  			space_info->tickets_id++;
>  			wake_up(&ticket->wait);
> @@ -899,7 +907,7 @@ static bool maybe_fail_all_tickets(struct btrfs_fs_info *fs_info,
>  			btrfs_info(fs_info, "failing ticket with %llu bytes",
>  				   ticket->bytes);
>  
> -		list_del_init(&ticket->list);
> +		remove_ticket(space_info, ticket);
>  		ticket->error = -ENOSPC;
>  		wake_up(&ticket->wait);
>  
> @@ -1063,7 +1071,7 @@ static void wait_reserve_ticket(struct btrfs_fs_info *fs_info,
>  			 * despite getting an error, resulting in a space leak
>  			 * (bytes_may_use counter of our space_info).
>  			 */
> -			list_del_init(&ticket->list);
> +			remove_ticket(space_info, ticket);
>  			ticket->error = -EINTR;
>  			break;
>  		}
> @@ -1121,7 +1129,7 @@ static int handle_reserve_ticket(struct btrfs_fs_info *fs_info,
>  		 * either the async reclaim job deletes the ticket from the list
>  		 * or we delete it ourselves at wait_reserve_ticket().
>  		 */
> -		list_del_init(&ticket->list);
> +		remove_ticket(space_info, ticket);
>  		if (!ret)
>  			ret = -ENOSPC;
>  	}
> 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 1/2] Btrfs: fix reclaim counter leak of space_info objects
  2020-04-07 10:38 [PATCH 1/2] Btrfs: fix reclaim counter leak of space_info objects fdmanana
  2020-04-07 11:33 ` Nikolay Borisov
@ 2020-04-08 17:18 ` David Sterba
  2020-04-08 17:44   ` Filipe Manana
  1 sibling, 1 reply; 4+ messages in thread
From: David Sterba @ 2020-04-08 17:18 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs

On Tue, Apr 07, 2020 at 11:38:49AM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> Whenever we add a ticket to a space_info object we increment the object's
> reclaim_size counter witht the ticket's bytes, and we decrement it with
> the corresponding amount only when we are able to grant the requested
> space to the ticket. When we are not able to grant the space to a ticket,
> or when the ticket is removed due to a signal (e.g. an application has
> received sigterm from the terminal) we never decrement the counter with
> the corresponding bytes from the ticket. This leak can result in the
> space reclaim code to later do much more work than necessary. So fix it
> by decrementing the counter when those two cases happen as well.
> 
> Fixes: db161806dc5615 ("btrfs: account ticket size at add/delete time")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

There's a minor conflict with Josef's patch "btrfs: run
btrfs_try_granting_tickets if a priority ticket fails" so I'll apply
yours to misc-5.7 as it's a regression fix.

> @@ -1121,7 +1129,7 @@ static int handle_reserve_ticket(struct btrfs_fs_info *fs_info,
>  		 * either the async reclaim job deletes the ticket from the list
>  		 * or we delete it ourselves at wait_reserve_ticket().
>  		 */
> -		list_del_init(&ticket->list);
> +		remove_ticket(space_info, ticket);
>  		if (!ret)
>  			ret = -ENOSPC;
>  	}

The conflicting hunk is:

--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -1156,11 +1156,17 @@ static int handle_reserve_ticket(struct btrfs_fs_info *fs_info,
        ret = ticket->error;
        if (ticket->bytes || ticket->error) {
                /*
-                * Need to delete here for priority tickets. For regular tickets
-                * either the async reclaim job deletes the ticket from the list
-                * or we delete it ourselves at wait_reserve_ticket().
+                * We were a priority ticket, so we need to delete ourselves
+                * from the list.  Because we could have other priority tickets
+                * behind us that require less space, run
+                * btrfs_try_granting_tickets() to see if their reservations can
+                * now be made.
                 */
-               list_del_init(&ticket->list);
+               if (!list_empty(&ticket->list)) {
+                       list_del_init(&ticket->list);
+                       btrfs_try_granting_tickets(fs_info, space_info);
+               }
+
                if (!ret)
                        ret = -ENOSPC;
        }
---

so I assume the correct fixup is to replace list_del_init with
remove_ticket.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 1/2] Btrfs: fix reclaim counter leak of space_info objects
  2020-04-08 17:18 ` David Sterba
@ 2020-04-08 17:44   ` Filipe Manana
  0 siblings, 0 replies; 4+ messages in thread
From: Filipe Manana @ 2020-04-08 17:44 UTC (permalink / raw)
  To: dsterba, Filipe Manana, linux-btrfs

On Wed, Apr 8, 2020 at 6:19 PM David Sterba <dsterba@suse.cz> wrote:
>
> On Tue, Apr 07, 2020 at 11:38:49AM +0100, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > Whenever we add a ticket to a space_info object we increment the object's
> > reclaim_size counter witht the ticket's bytes, and we decrement it with
> > the corresponding amount only when we are able to grant the requested
> > space to the ticket. When we are not able to grant the space to a ticket,
> > or when the ticket is removed due to a signal (e.g. an application has
> > received sigterm from the terminal) we never decrement the counter with
> > the corresponding bytes from the ticket. This leak can result in the
> > space reclaim code to later do much more work than necessary. So fix it
> > by decrementing the counter when those two cases happen as well.
> >
> > Fixes: db161806dc5615 ("btrfs: account ticket size at add/delete time")
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
>
> There's a minor conflict with Josef's patch "btrfs: run
> btrfs_try_granting_tickets if a priority ticket fails" so I'll apply
> yours to misc-5.7 as it's a regression fix.
>
> > @@ -1121,7 +1129,7 @@ static int handle_reserve_ticket(struct btrfs_fs_info *fs_info,
> >                * either the async reclaim job deletes the ticket from the list
> >                * or we delete it ourselves at wait_reserve_ticket().
> >                */
> > -             list_del_init(&ticket->list);
> > +             remove_ticket(space_info, ticket);
> >               if (!ret)
> >                       ret = -ENOSPC;
> >       }
>
> The conflicting hunk is:
>
> --- a/fs/btrfs/space-info.c
> +++ b/fs/btrfs/space-info.c
> @@ -1156,11 +1156,17 @@ static int handle_reserve_ticket(struct btrfs_fs_info *fs_info,
>         ret = ticket->error;
>         if (ticket->bytes || ticket->error) {
>                 /*
> -                * Need to delete here for priority tickets. For regular tickets
> -                * either the async reclaim job deletes the ticket from the list
> -                * or we delete it ourselves at wait_reserve_ticket().
> +                * We were a priority ticket, so we need to delete ourselves
> +                * from the list.  Because we could have other priority tickets
> +                * behind us that require less space, run
> +                * btrfs_try_granting_tickets() to see if their reservations can
> +                * now be made.
>                  */
> -               list_del_init(&ticket->list);
> +               if (!list_empty(&ticket->list)) {
> +                       list_del_init(&ticket->list);
> +                       btrfs_try_granting_tickets(fs_info, space_info);
> +               }
> +
>                 if (!ret)
>                         ret = -ENOSPC;
>         }
> ---
>
> so I assume the correct fixup is to replace list_del_init with
> remove_ticket.

Yes, correct.
Thanks.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-04-08 17:44 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-07 10:38 [PATCH 1/2] Btrfs: fix reclaim counter leak of space_info objects fdmanana
2020-04-07 11:33 ` Nikolay Borisov
2020-04-08 17:18 ` David Sterba
2020-04-08 17:44   ` Filipe Manana

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.