[PATCH 1/2] btrfs: try to satisfy metadata requests when every flush

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 1/2] btrfs: try to satisfy metadata requests when every flush_space() returns
@ 2016-09-21  6:59 Wang Xiaoguang
  2016-09-21  6:59 ` [PATCH 2/2] btrfs: try to write enough delalloc bytes when reclaiming metadata space Wang Xiaoguang
  2016-10-07 13:16 ` [PATCH 1/2] btrfs: try to satisfy metadata requests when every flush_space() returns Josef Bacik
  0 siblings, 2 replies; 11+ messages in thread
From: Wang Xiaoguang @ 2016-09-21  6:59 UTC (permalink / raw)
  To: linux-btrfs; +Cc: jbacik, dsterba

In flush_space()->shrink_delalloc(), if can_overcommit() returns true,
though no bytes added to space_info, we still may satisfy some requests.

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
---
 fs/btrfs/extent-tree.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 38c2df8..fdfc97f 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4960,6 +4960,43 @@ static void wake_all_tickets(struct list_head *head)
 }
 
 /*
+ * This function must be protected by btrfs_space_info's lock.
+ */
+static void try_to_wake_tickets(struct btrfs_root *root,
+				struct btrfs_space_info *space_info)
+{
+	struct reserve_ticket *ticket;
+	struct list_head *head = &space_info->priority_tickets;
+	enum btrfs_reserve_flush_enum flush = BTRFS_RESERVE_NO_FLUSH;
+	u64 used;
+
+again:
+	while (!list_empty(head)) {
+		ticket = list_first_entry(head, struct reserve_ticket,
+					  list);
+		used = space_info->bytes_used +
+			space_info->bytes_reserved + space_info->bytes_pinned +
+			space_info->bytes_readonly + space_info->bytes_may_use;
+
+		if (used + ticket->bytes <= space_info->total_bytes ||
+		    can_overcommit(root, space_info, ticket->bytes, flush)) {
+			space_info->bytes_may_use += ticket->bytes;
+			list_del_init(&ticket->list);
+			ticket->bytes = 0;
+			space_info->tickets_id++;
+			wake_up(&ticket->wait);
+		} else
+			return;
+	}
+
+	if (head == &space_info->priority_tickets) {
+		head = &space_info->tickets;
+		flush = BTRFS_RESERVE_FLUSH_ALL;
+		goto again;
+	}
+}
+
+/*
  * This is for normal flushers, we can wait all goddamned day if we want to.  We
  * will loop and continuously try to flush as long as we are making progress.
  * We count progress as clearing off tickets each time we have to loop.
@@ -4995,6 +5032,8 @@ static void btrfs_async_reclaim_metadata_space(struct work_struct *work)
 		ret = flush_space(fs_info->fs_root, space_info, to_reclaim,
 			    to_reclaim, flush_state);
 		spin_lock(&space_info->lock);
+		if (!ret)
+			try_to_wake_tickets(fs_info->fs_root, space_info);
 		if (list_empty(&space_info->tickets)) {
 			space_info->flush = 0;
 			spin_unlock(&space_info->lock);
-- 
2.7.4




^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/2] btrfs: try to write enough delalloc bytes when reclaiming metadata space
  2016-09-21  6:59 [PATCH 1/2] btrfs: try to satisfy metadata requests when every flush_space() returns Wang Xiaoguang
@ 2016-09-21  6:59 ` Wang Xiaoguang
  2016-09-22  9:25   ` [RFC 3/3] btrfs: make shrink_delalloc() try harder to reclaim " Wang Xiaoguang
  2016-10-07 13:17   ` [PATCH 2/2] btrfs: try to write enough delalloc bytes when reclaiming " Josef Bacik
  2016-10-07 13:16 ` [PATCH 1/2] btrfs: try to satisfy metadata requests when every flush_space() returns Josef Bacik
  1 sibling, 2 replies; 11+ messages in thread
From: Wang Xiaoguang @ 2016-09-21  6:59 UTC (permalink / raw)
  To: linux-btrfs; +Cc: jbacik, dsterba

Indeed as long as we have delalloc bytes and if we fail to reclaim
requested metadata space, we should write these delalloc bytes and
have one more try, this can fix some false enospc errors in some
extreme cases.

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
---
 fs/btrfs/extent-tree.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index fdfc97f..46c2a37 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4794,6 +4794,15 @@ static int may_commit_transaction(struct btrfs_root *root,
 {
 	struct btrfs_block_rsv *delayed_rsv = &root->fs_info->delayed_block_rsv;
 	struct btrfs_trans_handle *trans;
+	struct btrfs_fs_info *fs_info = root->fs_info;
+
+	/*
+	 * shrink_delalloc() may not write enough delalloc bytes, so here we
+	 * have a last try. Please don't remove these, because these can fix
+	 * some false enospc error in some extreme cases.
+	 */
+	btrfs_start_delalloc_roots(fs_info, 0, -1);
+	btrfs_wait_ordered_roots(fs_info, -1, 0, (u64)-1);
 
 	trans = (struct btrfs_trans_handle *)current->journal_info;
 	if (trans)
-- 
2.7.4




^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC 3/3] btrfs: make shrink_delalloc() try harder to reclaim metadata space
  2016-09-21  6:59 ` [PATCH 2/2] btrfs: try to write enough delalloc bytes when reclaiming metadata space Wang Xiaoguang
@ 2016-09-22  9:25   ` Wang Xiaoguang
  2016-10-07  6:27     ` Wang Xiaoguang
  2016-10-07 13:24     ` Josef Bacik
  2016-10-07 13:17   ` [PATCH 2/2] btrfs: try to write enough delalloc bytes when reclaiming " Josef Bacik
  1 sibling, 2 replies; 11+ messages in thread
From: Wang Xiaoguang @ 2016-09-22  9:25 UTC (permalink / raw)
  To: linux-btrfs; +Cc: jbacik, dsterba

Since commit b02441999efcc6152b87cd58e7970bb7843f76cf, we don't wait all
ordered extents, but I run into some enospc errors when doing large file
create and delete test, it's because shrink_delalloc() does not write
enough delalloc bytes and wait them finished:
    From: Miao Xie <miaox@cn.fujitsu.com>
    Date: Mon, 4 Nov 2013 23:13:25 +0800
    Subject: [PATCH] Btrfs: don't wait for the completion of all the ordered extents

    It is very likely that there are lots of ordered extents in the filesytem,
    if we wait for the completion of all of them when we want to reclaim some
    space for the metadata space reservation, we would be blocked for a long
    time. The performance would drop down suddenly for a long time.

But since Josef introduced "Btrfs: introduce ticketed enospc infrastructure",
shrink_delalloc() starts to be run asynchronously, then If we want to reclaim
metadata space, we can try harder, after all, false enospc error is not
acceptable.

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
---
 fs/btrfs/extent-tree.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 46c2a37..f7c420b 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4721,7 +4721,7 @@ static void shrink_delalloc(struct btrfs_root *root, u64 to_reclaim, u64 orig,
 		if (trans)
 			return;
 		if (wait_ordered)
-			btrfs_wait_ordered_roots(root->fs_info, items,
+			btrfs_wait_ordered_roots(root->fs_info, -1,
 						 0, (u64)-1);
 		return;
 	}
@@ -4775,6 +4775,14 @@ skip_async:
 		}
 		delalloc_bytes = percpu_counter_sum_positive(
 						&root->fs_info->delalloc_bytes);
+		if (loops == 2) {
+			/*
+			 * Try to write all current delalloc bytes and wait all
+			 * ordered extents to have a last try.
+			 */
+			to_reclaim = delalloc_bytes;
+			items = -1;
+		}
 	}
 }
 
-- 
2.7.4




^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [RFC 3/3] btrfs: make shrink_delalloc() try harder to reclaim metadata space
  2016-09-22  9:25   ` [RFC 3/3] btrfs: make shrink_delalloc() try harder to reclaim " Wang Xiaoguang
@ 2016-10-07  6:27     ` Wang Xiaoguang
  2016-10-07 13:24     ` Josef Bacik
  1 sibling, 0 replies; 11+ messages in thread
From: Wang Xiaoguang @ 2016-10-07  6:27 UTC (permalink / raw)
  To: linux-btrfs; +Cc: jbacik, dsterba

Hi,

Any comments about these 3 patches, thanks.

Regards,
Xiaoguang Wang

On 09/22/2016 05:25 PM, Wang Xiaoguang wrote:
> Since commit b02441999efcc6152b87cd58e7970bb7843f76cf, we don't wait all
> ordered extents, but I run into some enospc errors when doing large file
> create and delete test, it's because shrink_delalloc() does not write
> enough delalloc bytes and wait them finished:
>      From: Miao Xie <miaox@cn.fujitsu.com>
>      Date: Mon, 4 Nov 2013 23:13:25 +0800
>      Subject: [PATCH] Btrfs: don't wait for the completion of all the ordered extents
>
>      It is very likely that there are lots of ordered extents in the filesytem,
>      if we wait for the completion of all of them when we want to reclaim some
>      space for the metadata space reservation, we would be blocked for a long
>      time. The performance would drop down suddenly for a long time.
>
> But since Josef introduced "Btrfs: introduce ticketed enospc infrastructure",
> shrink_delalloc() starts to be run asynchronously, then If we want to reclaim
> metadata space, we can try harder, after all, false enospc error is not
> acceptable.
>
> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
> ---
>   fs/btrfs/extent-tree.c | 10 +++++++++-
>   1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 46c2a37..f7c420b 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -4721,7 +4721,7 @@ static void shrink_delalloc(struct btrfs_root *root, u64 to_reclaim, u64 orig,
>   		if (trans)
>   			return;
>   		if (wait_ordered)
> -			btrfs_wait_ordered_roots(root->fs_info, items,
> +			btrfs_wait_ordered_roots(root->fs_info, -1,
>   						 0, (u64)-1);
>   		return;
>   	}
> @@ -4775,6 +4775,14 @@ skip_async:
>   		}
>   		delalloc_bytes = percpu_counter_sum_positive(
>   						&root->fs_info->delalloc_bytes);
> +		if (loops == 2) {
> +			/*
> +			 * Try to write all current delalloc bytes and wait all
> +			 * ordered extents to have a last try.
> +			 */
> +			to_reclaim = delalloc_bytes;
> +			items = -1;
> +		}
>   	}
>   }
>   




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] btrfs: try to satisfy metadata requests when every flush_space() returns
  2016-09-21  6:59 [PATCH 1/2] btrfs: try to satisfy metadata requests when every flush_space() returns Wang Xiaoguang
  2016-09-21  6:59 ` [PATCH 2/2] btrfs: try to write enough delalloc bytes when reclaiming metadata space Wang Xiaoguang
@ 2016-10-07 13:16 ` Josef Bacik
  2016-10-10  8:58   ` Wang Xiaoguang
  2016-10-12  7:27   ` Wang Xiaoguang
  1 sibling, 2 replies; 11+ messages in thread
From: Josef Bacik @ 2016-10-07 13:16 UTC (permalink / raw)
  To: Wang Xiaoguang, linux-btrfs; +Cc: dsterba

On 09/21/2016 02:59 AM, Wang Xiaoguang wrote:
> In flush_space()->shrink_delalloc(), if can_overcommit() returns true,
> though no bytes added to space_info, we still may satisfy some requests.
>
> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
> ---
>  fs/btrfs/extent-tree.c | 39 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 39 insertions(+)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 38c2df8..fdfc97f 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -4960,6 +4960,43 @@ static void wake_all_tickets(struct list_head *head)
>  }
>
>  /*
> + * This function must be protected by btrfs_space_info's lock.
> + */
> +static void try_to_wake_tickets(struct btrfs_root *root,
> +				struct btrfs_space_info *space_info)
> +{
> +	struct reserve_ticket *ticket;
> +	struct list_head *head = &space_info->priority_tickets;
> +	enum btrfs_reserve_flush_enum flush = BTRFS_RESERVE_NO_FLUSH;
> +	u64 used;
> +
> +again:
> +	while (!list_empty(head)) {
> +		ticket = list_first_entry(head, struct reserve_ticket,
> +					  list);
> +		used = space_info->bytes_used +
> +			space_info->bytes_reserved + space_info->bytes_pinned +
> +			space_info->bytes_readonly + space_info->bytes_may_use;
> +
> +		if (used + ticket->bytes <= space_info->total_bytes ||
> +		    can_overcommit(root, space_info, ticket->bytes, flush)) {
> +			space_info->bytes_may_use += ticket->bytes;
> +			list_del_init(&ticket->list);
> +			ticket->bytes = 0;
> +			space_info->tickets_id++;
> +			wake_up(&ticket->wait);
> +		} else
> +			return;
> +	}
> +
> +	if (head == &space_info->priority_tickets) {
> +		head = &space_info->tickets;
> +		flush = BTRFS_RESERVE_FLUSH_ALL;
> +		goto again;
> +	}
> +}
> +
> +/*
>   * This is for normal flushers, we can wait all goddamned day if we want to.  We
>   * will loop and continuously try to flush as long as we are making progress.
>   * We count progress as clearing off tickets each time we have to loop.
> @@ -4995,6 +5032,8 @@ static void btrfs_async_reclaim_metadata_space(struct work_struct *work)
>  		ret = flush_space(fs_info->fs_root, space_info, to_reclaim,
>  			    to_reclaim, flush_state);
>  		spin_lock(&space_info->lock);
> +		if (!ret)
> +			try_to_wake_tickets(fs_info->fs_root, space_info);
>  		if (list_empty(&space_info->tickets)) {
>  			space_info->flush = 0;
>  			spin_unlock(&space_info->lock);
>

So first off I have no problems with this patch, it seems reasonable to me.

However I'm curious to see where it helped.  The only time can_overcommit() 
would suddenly start returning true where it wouldn't have before without 
actually adding space to the space_info would be when we've dropped an empty 
block group (or added a new drive) and suddenly fs_info->free_chunk_space is 
larger than it was.  Is that what you were observing?  Thanks,

Josef

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] btrfs: try to write enough delalloc bytes when reclaiming metadata space
  2016-09-21  6:59 ` [PATCH 2/2] btrfs: try to write enough delalloc bytes when reclaiming metadata space Wang Xiaoguang
  2016-09-22  9:25   ` [RFC 3/3] btrfs: make shrink_delalloc() try harder to reclaim " Wang Xiaoguang
@ 2016-10-07 13:17   ` Josef Bacik
  1 sibling, 0 replies; 11+ messages in thread
From: Josef Bacik @ 2016-10-07 13:17 UTC (permalink / raw)
  To: Wang Xiaoguang, linux-btrfs; +Cc: dsterba

On 09/21/2016 02:59 AM, Wang Xiaoguang wrote:
> Indeed as long as we have delalloc bytes and if we fail to reclaim
> requested metadata space, we should write these delalloc bytes and
> have one more try, this can fix some false enospc errors in some
> extreme cases.
>
> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
> ---
>  fs/btrfs/extent-tree.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index fdfc97f..46c2a37 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -4794,6 +4794,15 @@ static int may_commit_transaction(struct btrfs_root *root,
>  {
>  	struct btrfs_block_rsv *delayed_rsv = &root->fs_info->delayed_block_rsv;
>  	struct btrfs_trans_handle *trans;
> +	struct btrfs_fs_info *fs_info = root->fs_info;
> +
> +	/*
> +	 * shrink_delalloc() may not write enough delalloc bytes, so here we
> +	 * have a last try. Please don't remove these, because these can fix
> +	 * some false enospc error in some extreme cases.
> +	 */
> +	btrfs_start_delalloc_roots(fs_info, 0, -1);
> +	btrfs_wait_ordered_roots(fs_info, -1, 0, (u64)-1);
>
>  	trans = (struct btrfs_trans_handle *)current->journal_info;
>  	if (trans)
>

This will deadlock.

Josef

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC 3/3] btrfs: make shrink_delalloc() try harder to reclaim metadata space
  2016-09-22  9:25   ` [RFC 3/3] btrfs: make shrink_delalloc() try harder to reclaim " Wang Xiaoguang
  2016-10-07  6:27     ` Wang Xiaoguang
@ 2016-10-07 13:24     ` Josef Bacik
  2016-10-10  8:54       ` Wang Xiaoguang
  1 sibling, 1 reply; 11+ messages in thread
From: Josef Bacik @ 2016-10-07 13:24 UTC (permalink / raw)
  To: Wang Xiaoguang, linux-btrfs; +Cc: dsterba

On 09/22/2016 05:25 AM, Wang Xiaoguang wrote:
> Since commit b02441999efcc6152b87cd58e7970bb7843f76cf, we don't wait all
> ordered extents, but I run into some enospc errors when doing large file
> create and delete test, it's because shrink_delalloc() does not write
> enough delalloc bytes and wait them finished:
>     From: Miao Xie <miaox@cn.fujitsu.com>
>     Date: Mon, 4 Nov 2013 23:13:25 +0800
>     Subject: [PATCH] Btrfs: don't wait for the completion of all the ordered extents
>
>     It is very likely that there are lots of ordered extents in the filesytem,
>     if we wait for the completion of all of them when we want to reclaim some
>     space for the metadata space reservation, we would be blocked for a long
>     time. The performance would drop down suddenly for a long time.
>
> But since Josef introduced "Btrfs: introduce ticketed enospc infrastructure",
> shrink_delalloc() starts to be run asynchronously, then If we want to reclaim
> metadata space, we can try harder, after all, false enospc error is not
> acceptable.
>
> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
> ---
>  fs/btrfs/extent-tree.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 46c2a37..f7c420b 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -4721,7 +4721,7 @@ static void shrink_delalloc(struct btrfs_root *root, u64 to_reclaim, u64 orig,
>  		if (trans)
>  			return;
>  		if (wait_ordered)
> -			btrfs_wait_ordered_roots(root->fs_info, items,
> +			btrfs_wait_ordered_roots(root->fs_info, -1,
>  						 0, (u64)-1);
>  		return;
>  	}
> @@ -4775,6 +4775,14 @@ skip_async:
>  		}
>  		delalloc_bytes = percpu_counter_sum_positive(
>  						&root->fs_info->delalloc_bytes);
> +		if (loops == 2) {
> +			/*
> +			 * Try to write all current delalloc bytes and wait all
> +			 * ordered extents to have a last try.
> +			 */
> +			to_reclaim = delalloc_bytes;
> +			items = -1;
> +		}
>  	}
>  }
>
>

The problem is if the outstanding ordered extents aren't enough to actually 
return the space we need we end up flushing and waiting longer when we should 
have just committed the transaction.  Think for example if we are slowly writing 
to a few files and rapidly removing thousands of files.  In this case all of our 
space is tied up in pinned, so we'd be better off not waiting on ordered extents 
and instead committing the transaction.

I think instead what we should do is have a priority set, so instead of doing 
commit_cycles in btrfs_async_reclaim_metadata_space, we instead have priority, 
and set it to say 3.  Then we pass this priority down to all of the flushers, 
and use it as a multiplier in delalloc for the number of items we'll wait on. 
Once we hit priority 0 we wait for all the things.  This way we do the easy pass 
first and hope it works, if not we try harder the next time through, etc until 
we throw all caution to the wind and wait for anything we can find.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC 3/3] btrfs: make shrink_delalloc() try harder to reclaim metadata space
  2016-10-07 13:24     ` Josef Bacik
@ 2016-10-10  8:54       ` Wang Xiaoguang
  0 siblings, 0 replies; 11+ messages in thread
From: Wang Xiaoguang @ 2016-10-10  8:54 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs; +Cc: dsterba

Hi,

On 10/07/2016 09:24 PM, Josef Bacik wrote:
> On 09/22/2016 05:25 AM, Wang Xiaoguang wrote:
>> Since commit b02441999efcc6152b87cd58e7970bb7843f76cf, we don't wait all
>> ordered extents, but I run into some enospc errors when doing large file
>> create and delete test, it's because shrink_delalloc() does not write
>> enough delalloc bytes and wait them finished:
>>     From: Miao Xie <miaox@cn.fujitsu.com>
>>     Date: Mon, 4 Nov 2013 23:13:25 +0800
>>     Subject: [PATCH] Btrfs: don't wait for the completion of all the 
>> ordered extents
>>
>>     It is very likely that there are lots of ordered extents in the 
>> filesytem,
>>     if we wait for the completion of all of them when we want to 
>> reclaim some
>>     space for the metadata space reservation, we would be blocked for 
>> a long
>>     time. The performance would drop down suddenly for a long time.
>>
>> But since Josef introduced "Btrfs: introduce ticketed enospc 
>> infrastructure",
>> shrink_delalloc() starts to be run asynchronously, then If we want to 
>> reclaim
>> metadata space, we can try harder, after all, false enospc error is not
>> acceptable.
>>
>> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
>> ---
>>  fs/btrfs/extent-tree.c | 10 +++++++++-
>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index 46c2a37..f7c420b 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -4721,7 +4721,7 @@ static void shrink_delalloc(struct btrfs_root 
>> *root, u64 to_reclaim, u64 orig,
>>          if (trans)
>>              return;
>>          if (wait_ordered)
>> -            btrfs_wait_ordered_roots(root->fs_info, items,
>> +            btrfs_wait_ordered_roots(root->fs_info, -1,
>>                           0, (u64)-1);
>>          return;
>>      }
>> @@ -4775,6 +4775,14 @@ skip_async:
>>          }
>>          delalloc_bytes = percpu_counter_sum_positive(
>> &root->fs_info->delalloc_bytes);
>> +        if (loops == 2) {
>> +            /*
>> +             * Try to write all current delalloc bytes and wait all
>> +             * ordered extents to have a last try.
>> +             */
>> +            to_reclaim = delalloc_bytes;
>> +            items = -1;
>> +        }
>>      }
>>  }
>>
>>
>
> The problem is if the outstanding ordered extents aren't enough to 
> actually return the space we need we end up flushing and waiting 
> longer when we should have just committed the transaction.  Think for 
> example if we are slowly writing to a few files and rapidly removing 
> thousands of files.  In this case all of our space is tied up in 
> pinned, so we'd be better off not waiting on ordered extents and 
> instead committing the transaction.
Yes, I see, writing ordered extents are involved in disk writes, which 
are much slow.

>
> I think instead what we should do is have a priority set, so instead 
> of doing commit_cycles in btrfs_async_reclaim_metadata_space, we 
> instead have priority, and set it to say 3.  Then we pass this 
> priority down to all of the flushers, and use it as a multiplier in 
> delalloc for the number of items we'll wait on. Once we hit priority 0 
> we wait for all the things.  This way we do the easy pass first and 
> hope it works, if not we try harder the next time through, etc until 
> we throw all caution to the wind and wait for anything we can find.  
> Thanks,
OK, thanks for your suggestions, I'll try to write a better version, thanks.

Regards,
Xiaoguang Wang
>
> Josef
>
>




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] btrfs: try to satisfy metadata requests when every flush_space() returns
  2016-10-07 13:16 ` [PATCH 1/2] btrfs: try to satisfy metadata requests when every flush_space() returns Josef Bacik
@ 2016-10-10  8:58   ` Wang Xiaoguang
  2016-10-12  7:27   ` Wang Xiaoguang
  1 sibling, 0 replies; 11+ messages in thread
From: Wang Xiaoguang @ 2016-10-10  8:58 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs; +Cc: dsterba

hi,

On 10/07/2016 09:16 PM, Josef Bacik wrote:
> On 09/21/2016 02:59 AM, Wang Xiaoguang wrote:
>> In flush_space()->shrink_delalloc(), if can_overcommit() returns true,
>> though no bytes added to space_info, we still may satisfy some requests.
>>
>> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
>> ---
>>  fs/btrfs/extent-tree.c | 39 +++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 39 insertions(+)
>>
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index 38c2df8..fdfc97f 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -4960,6 +4960,43 @@ static void wake_all_tickets(struct list_head 
>> *head)
>>  }
>>
>>  /*
>> + * This function must be protected by btrfs_space_info's lock.
>> + */
>> +static void try_to_wake_tickets(struct btrfs_root *root,
>> +                struct btrfs_space_info *space_info)
>> +{
>> +    struct reserve_ticket *ticket;
>> +    struct list_head *head = &space_info->priority_tickets;
>> +    enum btrfs_reserve_flush_enum flush = BTRFS_RESERVE_NO_FLUSH;
>> +    u64 used;
>> +
>> +again:
>> +    while (!list_empty(head)) {
>> +        ticket = list_first_entry(head, struct reserve_ticket,
>> +                      list);
>> +        used = space_info->bytes_used +
>> +            space_info->bytes_reserved + space_info->bytes_pinned +
>> +            space_info->bytes_readonly + space_info->bytes_may_use;
>> +
>> +        if (used + ticket->bytes <= space_info->total_bytes ||
>> +            can_overcommit(root, space_info, ticket->bytes, flush)) {
>> +            space_info->bytes_may_use += ticket->bytes;
>> +            list_del_init(&ticket->list);
>> +            ticket->bytes = 0;
>> +            space_info->tickets_id++;
>> +            wake_up(&ticket->wait);
>> +        } else
>> +            return;
>> +    }
>> +
>> +    if (head == &space_info->priority_tickets) {
>> +        head = &space_info->tickets;
>> +        flush = BTRFS_RESERVE_FLUSH_ALL;
>> +        goto again;
>> +    }
>> +}
>> +
>> +/*
>>   * This is for normal flushers, we can wait all goddamned day if we 
>> want to.  We
>>   * will loop and continuously try to flush as long as we are making 
>> progress.
>>   * We count progress as clearing off tickets each time we have to loop.
>> @@ -4995,6 +5032,8 @@ static void 
>> btrfs_async_reclaim_metadata_space(struct work_struct *work)
>>          ret = flush_space(fs_info->fs_root, space_info, to_reclaim,
>>                  to_reclaim, flush_state);
>>          spin_lock(&space_info->lock);
>> +        if (!ret)
>> +            try_to_wake_tickets(fs_info->fs_root, space_info);
>>          if (list_empty(&space_info->tickets)) {
>>              space_info->flush = 0;
>>              spin_unlock(&space_info->lock);
>>
>
> So first off I have no problems with this patch, it seems reasonable 
> to me.
>
> However I'm curious to see where it helped.  The only time 
> can_overcommit() would suddenly start returning true where it wouldn't 
> have before without actually adding space to the space_info would be 
> when we've dropped an empty block group (or added a new drive) and 
> suddenly fs_info->free_chunk_space is larger than it was.  Is that 
> what you were observing?  Thanks,
Indeed, I'm not sure :)
I just found can_overcommit() sometimes can help, I'll run some tests 
and inform you what's happening later.

Regards,
Xiaoguang Wang

>
> Josef
>
>




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] btrfs: try to satisfy metadata requests when every flush_space() returns
  2016-10-07 13:16 ` [PATCH 1/2] btrfs: try to satisfy metadata requests when every flush_space() returns Josef Bacik
  2016-10-10  8:58   ` Wang Xiaoguang
@ 2016-10-12  7:27   ` Wang Xiaoguang
  2016-10-12 17:08     ` Josef Bacik
  1 sibling, 1 reply; 11+ messages in thread
From: Wang Xiaoguang @ 2016-10-12  7:27 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs; +Cc: dsterba

hi,

On 10/07/2016 09:16 PM, Josef Bacik wrote:
> On 09/21/2016 02:59 AM, Wang Xiaoguang wrote:
>> In flush_space()->shrink_delalloc(), if can_overcommit() returns true,
>> though no bytes added to space_info, we still may satisfy some requests.
>>
>> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
>> ---
>>  fs/btrfs/extent-tree.c | 39 +++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 39 insertions(+)
>>
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index 38c2df8..fdfc97f 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -4960,6 +4960,43 @@ static void wake_all_tickets(struct list_head 
>> *head)
>>  }
>>
>>  /*
>> + * This function must be protected by btrfs_space_info's lock.
>> + */
>> +static void try_to_wake_tickets(struct btrfs_root *root,
>> +                struct btrfs_space_info *space_info)
>> +{
>> +    struct reserve_ticket *ticket;
>> +    struct list_head *head = &space_info->priority_tickets;
>> +    enum btrfs_reserve_flush_enum flush = BTRFS_RESERVE_NO_FLUSH;
>> +    u64 used;
>> +
>> +again:
>> +    while (!list_empty(head)) {
>> +        ticket = list_first_entry(head, struct reserve_ticket,
>> +                      list);
>> +        used = space_info->bytes_used +
>> +            space_info->bytes_reserved + space_info->bytes_pinned +
>> +            space_info->bytes_readonly + space_info->bytes_may_use;
>> +
>> +        if (used + ticket->bytes <= space_info->total_bytes ||
>> +            can_overcommit(root, space_info, ticket->bytes, flush)) {
>> +            space_info->bytes_may_use += ticket->bytes;
>> +            list_del_init(&ticket->list);
>> +            ticket->bytes = 0;
>> +            space_info->tickets_id++;
>> +            wake_up(&ticket->wait);
>> +        } else
>> +            return;
>> +    }
>> +
>> +    if (head == &space_info->priority_tickets) {
>> +        head = &space_info->tickets;
>> +        flush = BTRFS_RESERVE_FLUSH_ALL;
>> +        goto again;
>> +    }
>> +}
>> +
>> +/*
>>   * This is for normal flushers, we can wait all goddamned day if we 
>> want to.  We
>>   * will loop and continuously try to flush as long as we are making 
>> progress.
>>   * We count progress as clearing off tickets each time we have to loop.
>> @@ -4995,6 +5032,8 @@ static void 
>> btrfs_async_reclaim_metadata_space(struct work_struct *work)
>>          ret = flush_space(fs_info->fs_root, space_info, to_reclaim,
>>                  to_reclaim, flush_state);
>>          spin_lock(&space_info->lock);
>> +        if (!ret)
>> +            try_to_wake_tickets(fs_info->fs_root, space_info);
>>          if (list_empty(&space_info->tickets)) {
>>              space_info->flush = 0;
>>              spin_unlock(&space_info->lock);
>>
>
> So first off I have no problems with this patch, it seems reasonable 
> to me.
>
> However I'm curious to see where it helped.  The only time 
> can_overcommit() would suddenly start returning true where it wouldn't 
> have before without actually adding space to the space_info would be 
> when we've dropped an empty block group (or added a new drive) and 
> suddenly fs_info->free_chunk_space is larger than it was.  Is that 
> what you were observing?  Thanks,
I think you're right. I don't add new drive when doing my big files 
create and delete test, so
it maybe "remove useless chunks" causing can_overcommit() returns true, 
but I don't know
how to observe this in codes, sorry. And this patch can really help to 
fix my enospc issue :)

Meanwhile, in shrink_delalloc(), if can_overcommit() returns true, 
shrink_delalloc()
will not shrink delalloc bytes any more, in this case we should check 
whether
some tickcts' requests can overcommit, if some can, we can satisfy them 
timely and directly.

I notice original btrfs codes before your patch "Btrfs: introduce 
ticketed enospc infrastructure"
will have over_commit try when every flush_space() returns, so here we 
may also need to do it, thanks.

Regards,
Xiaoguang Wang

>
> Josef
>
>




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] btrfs: try to satisfy metadata requests when every flush_space() returns
  2016-10-12  7:27   ` Wang Xiaoguang
@ 2016-10-12 17:08     ` Josef Bacik
  0 siblings, 0 replies; 11+ messages in thread
From: Josef Bacik @ 2016-10-12 17:08 UTC (permalink / raw)
  To: Wang Xiaoguang, linux-btrfs; +Cc: dsterba

On 10/12/2016 03:27 AM, Wang Xiaoguang wrote:
> hi,
>
> On 10/07/2016 09:16 PM, Josef Bacik wrote:
>> On 09/21/2016 02:59 AM, Wang Xiaoguang wrote:
>>> In flush_space()->shrink_delalloc(), if can_overcommit() returns true,
>>> though no bytes added to space_info, we still may satisfy some requests.
>>>
>>> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
>>> ---
>>>  fs/btrfs/extent-tree.c | 39 +++++++++++++++++++++++++++++++++++++++
>>>  1 file changed, 39 insertions(+)
>>>
>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>> index 38c2df8..fdfc97f 100644
>>> --- a/fs/btrfs/extent-tree.c
>>> +++ b/fs/btrfs/extent-tree.c
>>> @@ -4960,6 +4960,43 @@ static void wake_all_tickets(struct list_head *head)
>>>  }
>>>
>>>  /*
>>> + * This function must be protected by btrfs_space_info's lock.
>>> + */
>>> +static void try_to_wake_tickets(struct btrfs_root *root,
>>> +                struct btrfs_space_info *space_info)
>>> +{
>>> +    struct reserve_ticket *ticket;
>>> +    struct list_head *head = &space_info->priority_tickets;
>>> +    enum btrfs_reserve_flush_enum flush = BTRFS_RESERVE_NO_FLUSH;
>>> +    u64 used;
>>> +
>>> +again:
>>> +    while (!list_empty(head)) {
>>> +        ticket = list_first_entry(head, struct reserve_ticket,
>>> +                      list);
>>> +        used = space_info->bytes_used +
>>> +            space_info->bytes_reserved + space_info->bytes_pinned +
>>> +            space_info->bytes_readonly + space_info->bytes_may_use;
>>> +
>>> +        if (used + ticket->bytes <= space_info->total_bytes ||
>>> +            can_overcommit(root, space_info, ticket->bytes, flush)) {
>>> +            space_info->bytes_may_use += ticket->bytes;
>>> +            list_del_init(&ticket->list);
>>> +            ticket->bytes = 0;
>>> +            space_info->tickets_id++;
>>> +            wake_up(&ticket->wait);
>>> +        } else
>>> +            return;
>>> +    }
>>> +
>>> +    if (head == &space_info->priority_tickets) {
>>> +        head = &space_info->tickets;
>>> +        flush = BTRFS_RESERVE_FLUSH_ALL;
>>> +        goto again;
>>> +    }
>>> +}
>>> +
>>> +/*
>>>   * This is for normal flushers, we can wait all goddamned day if we want
>>> to.  We
>>>   * will loop and continuously try to flush as long as we are making progress.
>>>   * We count progress as clearing off tickets each time we have to loop.
>>> @@ -4995,6 +5032,8 @@ static void btrfs_async_reclaim_metadata_space(struct
>>> work_struct *work)
>>>          ret = flush_space(fs_info->fs_root, space_info, to_reclaim,
>>>                  to_reclaim, flush_state);
>>>          spin_lock(&space_info->lock);
>>> +        if (!ret)
>>> +            try_to_wake_tickets(fs_info->fs_root, space_info);
>>>          if (list_empty(&space_info->tickets)) {
>>>              space_info->flush = 0;
>>>              spin_unlock(&space_info->lock);
>>>
>>
>> So first off I have no problems with this patch, it seems reasonable to me.
>>
>> However I'm curious to see where it helped.  The only time can_overcommit()
>> would suddenly start returning true where it wouldn't have before without
>> actually adding space to the space_info would be when we've dropped an empty
>> block group (or added a new drive) and suddenly fs_info->free_chunk_space is
>> larger than it was.  Is that what you were observing?  Thanks,
> I think you're right. I don't add new drive when doing my big files create and
> delete test, so
> it maybe "remove useless chunks" causing can_overcommit() returns true, but I
> don't know
> how to observe this in codes, sorry. And this patch can really help to fix my
> enospc issue :)
>
> Meanwhile, in shrink_delalloc(), if can_overcommit() returns true,
> shrink_delalloc()
> will not shrink delalloc bytes any more, in this case we should check whether
> some tickcts' requests can overcommit, if some can, we can satisfy them timely
> and directly.
>
> I notice original btrfs codes before your patch "Btrfs: introduce ticketed
> enospc infrastructure"
> will have over_commit try when every flush_space() returns, so here we may also
> need to do it, thanks.
>

Oh so you are deleting big files, that could free up block groups which would 
make can_overcommit() suddenly start returning true.  That makes sense.  You can add

Reviewed-by: Josef Bacik <jbacik@fb.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2016-10-12 17:10 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-21  6:59 [PATCH 1/2] btrfs: try to satisfy metadata requests when every flush_space() returns Wang Xiaoguang
2016-09-21  6:59 ` [PATCH 2/2] btrfs: try to write enough delalloc bytes when reclaiming metadata space Wang Xiaoguang
2016-09-22  9:25   ` [RFC 3/3] btrfs: make shrink_delalloc() try harder to reclaim " Wang Xiaoguang
2016-10-07  6:27     ` Wang Xiaoguang
2016-10-07 13:24     ` Josef Bacik
2016-10-10  8:54       ` Wang Xiaoguang
2016-10-07 13:17   ` [PATCH 2/2] btrfs: try to write enough delalloc bytes when reclaiming " Josef Bacik
2016-10-07 13:16 ` [PATCH 1/2] btrfs: try to satisfy metadata requests when every flush_space() returns Josef Bacik
2016-10-10  8:58   ` Wang Xiaoguang
2016-10-12  7:27   ` Wang Xiaoguang
2016-10-12 17:08     ` Josef Bacik

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.