All of lore.kernel.org
 help / color / mirror / Atom feed
From: Josef Bacik <jbacik@fb.com>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>, <linux-btrfs@vger.kernel.org>
Cc: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Subject: Re: [PATCH v8 19/27] btrfs: try more times to alloc metadata reserve space
Date: Fri, 22 Apr 2016 14:06:15 -0400	[thread overview]
Message-ID: <571A6817.8080904@fb.com> (raw)
In-Reply-To: <1458610552-9845-20-git-send-email-quwenruo@cn.fujitsu.com>

On 03/21/2016 09:35 PM, Qu Wenruo wrote:
> From: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
>
> In btrfs_delalloc_reserve_metadata(), the number of metadata bytes we try
> to reserve is calculated by the difference between outstanding_extents and
> reserved_extents.
>
> When reserve_metadata_bytes() fails to reserve desited metadata space,
> it has already done some reclaim work, such as write ordered extents.
>
> In that case, outstanding_extents and reserved_extents may already
> changed, and we may reserve enough metadata space then.
>
> So this patch will try to call reserve_metadata_bytes() at most 3 times
> to ensure we really run out of space.
>
> Such false ENOSPC is mainly caused by small file extents and time
> consuming delalloc functions, which mainly affects in-band
> de-duplication. (Compress should also be affected, but LZO/zlib is
> faster than SHA256, so still harder to trigger than dedupe).
>
> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
> ---
>   fs/btrfs/extent-tree.c | 25 ++++++++++++++++++++++---
>   1 file changed, 22 insertions(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index dabd721..016d2ec 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -2421,7 +2421,7 @@ static int run_one_delayed_ref(struct btrfs_trans_handle *trans,
>   				 * a new extent is revered, then deleted
>   				 * in one tran, and inc/dec get merged to 0.
>   				 *
> -				 * In this case, we need to remove its dedup
> +				 * In this case, we need to remove its dedupe
>   				 * hash.
>   				 */
>   				btrfs_dedupe_del(trans, fs_info, node->bytenr);
> @@ -5675,6 +5675,7 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes)
>   	bool delalloc_lock = true;
>   	u64 to_free = 0;
>   	unsigned dropped;
> +	int loops = 0;
>
>   	/* If we are a free space inode we need to not flush since we will be in
>   	 * the middle of a transaction commit.  We also don't need the delalloc
> @@ -5690,11 +5691,12 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes)
>   	    btrfs_transaction_in_commit(root->fs_info))
>   		schedule_timeout(1);
>
> +	num_bytes = ALIGN(num_bytes, root->sectorsize);
> +
> +again:
>   	if (delalloc_lock)
>   		mutex_lock(&BTRFS_I(inode)->delalloc_mutex);
>
> -	num_bytes = ALIGN(num_bytes, root->sectorsize);
> -
>   	spin_lock(&BTRFS_I(inode)->lock);
>   	nr_extents = (unsigned)div64_u64(num_bytes +
>   					 BTRFS_MAX_EXTENT_SIZE - 1,
> @@ -5815,6 +5817,23 @@ out_fail:
>   	}
>   	if (delalloc_lock)
>   		mutex_unlock(&BTRFS_I(inode)->delalloc_mutex);
> +	/*
> +	 * The number of metadata bytes is calculated by the difference
> +	 * between outstanding_extents and reserved_extents. Sometimes though
> +	 * reserve_metadata_bytes() fails to reserve the wanted metadata bytes,
> +	 * indeed it has already done some work to reclaim metadata space, hence
> +	 * both outstanding_extents and reserved_extents would have changed and
> +	 * the bytes we try to reserve would also has changed(may be smaller).
> +	 * So here we try to reserve again. This is much useful for online
> +	 * dedupe, which will easily eat almost all meta space.
> +	 *
> +	 * XXX: Indeed here 3 is arbitrarily choosed, it's a good workaround for
> +	 * online dedupe, later we should find a better method to avoid dedupe
> +	 * enospc issue.
> +	 */
> +	if (unlikely(ret == -ENOSPC && loops++ < 3))
> +		goto again;
> +
>   	return ret;
>   }
>
>

NAK, we aren't going to just arbitrarily retry to make our metadata 
reservation.  Dropping reserved metadata space by completing ordered 
extents should free enough to make our current reservation, and in fact 
this only accounts for the disparity, so should be an accurate count 
most of the time.  I can see a case for detecting that the disparity no 
longer exists and retrying in that case (we free enough ordered extents 
that we are no longer trying to reserve ours + overflow but now only 
ours) and retry in _that specific case_, but we need to limit it to this 
case only.  Thanks,

Josef

  reply	other threads:[~2016-04-22 18:06 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-03-22  1:35 [PATCH v8 00/27][For 4.7] Btrfs: Add inband (write time) de-duplication framework Qu Wenruo
2016-03-22  1:35 ` [PATCH v8 01/27] btrfs: dedupe: Introduce dedupe framework and its header Qu Wenruo
2016-03-22  1:35 ` [PATCH v8 02/27] btrfs: dedupe: Introduce function to initialize dedupe info Qu Wenruo
2016-03-22  1:35 ` [PATCH v8 03/27] btrfs: dedupe: Introduce function to add hash into in-memory tree Qu Wenruo
2016-03-22  1:35 ` [PATCH v8 04/27] btrfs: dedupe: Introduce function to remove hash from " Qu Wenruo
2016-03-22  1:35 ` [PATCH v8 05/27] btrfs: delayed-ref: Add support for increasing data ref under spinlock Qu Wenruo
2016-03-22  1:35 ` [PATCH v8 06/27] btrfs: dedupe: Introduce function to search for an existing hash Qu Wenruo
2016-03-22  1:35 ` [PATCH v8 07/27] btrfs: dedupe: Implement btrfs_dedupe_calc_hash interface Qu Wenruo
2016-03-22  1:35 ` [PATCH v8 08/27] btrfs: ordered-extent: Add support for dedupe Qu Wenruo
2016-03-22  1:35 ` [PATCH v8 09/27] btrfs: dedupe: Inband in-memory only de-duplication implement Qu Wenruo
2016-03-22  1:35 ` [PATCH v8 10/27] btrfs: dedupe: Add basic tree structure for on-disk dedupe method Qu Wenruo
2016-03-24 20:58   ` Chris Mason
2016-03-25  1:59     ` Qu Wenruo
2016-03-25 15:11       ` Chris Mason
2016-03-26 13:11         ` Qu Wenruo
2016-03-28 14:09           ` Chris Mason
2016-03-29  1:47             ` Qu Wenruo
2016-03-22  1:35 ` [PATCH v8 11/27] btrfs: dedupe: Introduce interfaces to resume and cleanup dedupe info Qu Wenruo
2016-03-29 17:31   ` Alex Lyakas
2016-03-30  0:26     ` Qu Wenruo
2016-03-22  1:35 ` [PATCH v8 12/27] btrfs: dedupe: Add support for on-disk hash search Qu Wenruo
2016-03-22  1:35 ` [PATCH v8 13/27] btrfs: dedupe: Add support to delete hash for on-disk backend Qu Wenruo
2016-03-22  1:35 ` [PATCH v8 14/27] btrfs: dedupe: Add support for adding " Qu Wenruo
2016-03-22  1:35 ` [PATCH v8 15/27] btrfs: dedupe: Add ioctl for inband dedupelication Qu Wenruo
2016-03-22  2:29   ` kbuild test robot
2016-03-22  2:48   ` kbuild test robot
2016-03-22  1:35 ` [PATCH v8 16/27] btrfs: dedupe: add an inode nodedupe flag Qu Wenruo
2016-03-22  1:35 ` [PATCH v8 17/27] btrfs: dedupe: add a property handler for online dedupe Qu Wenruo
2016-03-22  1:35 ` [PATCH v8 18/27] btrfs: dedupe: add per-file online dedupe control Qu Wenruo
2016-03-22  1:35 ` [PATCH v8 19/27] btrfs: try more times to alloc metadata reserve space Qu Wenruo
2016-04-22 18:06   ` Josef Bacik [this message]
2016-04-25  0:54     ` Qu Wenruo
2016-04-25 14:05       ` Josef Bacik
2016-04-26  0:50         ` Qu Wenruo
2016-03-22  1:35 ` [PATCH v8 20/27] btrfs: dedupe: Fix a bug when running inband dedupe with balance Qu Wenruo
2016-03-22  1:35 ` [PATCH v8 21/27] btrfs: Fix a memory leak in inband dedupe hash Qu Wenruo
2016-03-22  1:35 ` [PATCH v8 22/27] btrfs: dedupe: Fix metadata balance error when dedupe is enabled Qu Wenruo
2016-03-22  1:35 ` [PATCH v8 23/27] btrfs: dedupe: Avoid submit IO for hash hit extent Qu Wenruo
2016-03-22  1:35 ` [PATCH v8 24/27] btrfs: dedupe: Preparation for compress-dedupe co-work Qu Wenruo
2016-03-22  1:35 ` [PATCH v8 25/27] btrfs: dedupe: Add support for compression and dedpue Qu Wenruo
2016-03-24 20:35   ` Chris Mason
2016-03-25  1:44     ` Qu Wenruo
2016-03-25 15:12       ` Chris Mason
2016-03-22  1:35 ` [PATCH v8 26/27] btrfs: relocation: Enhance error handling to avoid BUG_ON Qu Wenruo
2016-03-22  1:35 ` [PATCH v8 27/27] btrfs: dedupe: Fix a space cache delalloc bytes underflow bug Qu Wenruo
2016-03-22 13:38 ` [PATCH v8 00/27][For 4.7] Btrfs: Add inband (write time) de-duplication framework David Sterba
2016-03-23  2:25   ` Qu Wenruo
2016-03-24 13:42     ` David Sterba
2016-03-25  1:38       ` Qu Wenruo
2016-04-04 16:55         ` David Sterba
2016-04-05  3:08           ` Qu Wenruo
2016-04-20  2:02             ` Qu Wenruo
2016-04-20 19:14               ` Chris Mason
2016-04-06  3:47           ` Nicholas D Steeves
2016-04-06  5:22             ` Qu Wenruo
2016-04-22 22:14               ` Nicholas D Steeves
2016-04-25  1:25                 ` Qu Wenruo
2016-03-29 17:22 ` Alex Lyakas
2016-03-30  0:34   ` Qu Wenruo
2016-03-30 10:36     ` Alex Lyakas
2016-04-03  8:22     ` Alex Lyakas
2016-04-05  3:51       ` Qu Wenruo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=571A6817.8080904@fb.com \
    --to=jbacik@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=quwenruo@cn.fujitsu.com \
    --cc=wangxg.fnst@cn.fujitsu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.