linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Hans van Kranenburg <Hans.van.Kranenburg@mendix.com>
To: Sasha Levin <sashal@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"stable@vger.kernel.org" <stable@vger.kernel.org>
Cc: David Sterba <dsterba@suse.com>,
	"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH AUTOSEL 4.20 072/117] btrfs: alloc_chunk: fix more DUP stripe size handling
Date: Tue, 8 Jan 2019 23:52:02 +0000	[thread overview]
Message-ID: <783ccf1f-65df-d388-079c-9449d4319c27@mendix.com> (raw)
In-Reply-To: <20190108192628.121270-72-sashal@kernel.org>

Hi Sasha,

On 1/8/19 8:25 PM, Sasha Levin wrote:
> From: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
> 
> [ Upstream commit baf92114c7e6dd6124aa3d506e4bc4b694da3bc3 ]
> 
> Commit 92e222df7b "btrfs: alloc_chunk: fix DUP stripe size handling"
> fixed calculating the stripe_size for a new DUP chunk.

That one also ended up as:

4.14-stable
0136bd7238b2cb8238426af4183ed0b02165c3f9

4.9-stable
8890bae03f4dba1c2292e5445682b556af4e8f1b

4.4-stable
97c3e46ef53748278286fc09dcc30b138d6677c4

3.16.57-rc1
f68f46284a199f6837c1d5b94a6ae979a2cc463c

While hitting the failure condition without adding "crafting" steps to
make it exactly match the scenario is unlikely, it might be good if we
just go all the way back with this regression fix?

Thanks,
Hans

> However, the same calculation reappears a bit later, and that one was
> not changed yet. The resulting bug that is exposed is that the newly
> allocated device extents ('stripes') can have a few MiB overlap with the
> next thing stored after them, which is another device extent or the end
> of the disk.
> 
> The scenario in which this can happen is:
> * The block device for the filesystem is less than 10GiB in size.
> * The amount of contiguous free unallocated disk space chosen to use for
>   chunk allocation is 20% of the total device size, or a few MiB more or
>   less.
> 
> An example:
> - The filesystem device is 7880MiB (max_chunk_size gets set to 788MiB)
> - There's 1578MiB unallocated raw disk space left in one contiguous
>   piece.
> 
> In this case stripe_size is first calculated as 789MiB, (half of
> 1578MiB).
> 
> Since 789MiB (stripe_size * data_stripes) > 788MiB (max_chunk_size), we
> enter the if block. Now stripe_size value is immediately overwritten
> while calculating an adjusted value based on max_chunk_size, which ends
> up as 788MiB.
> 
> Next, the value is rounded up to a 16MiB boundary, 800MiB, which is
> actually more than the value we had before. However, the last comparison
> fails to detect this, because it's comparing the value with the total
> amount of free space, which is about twice the size of stripe_size.
> 
> In the example above, this means that the resulting raw disk space being
> allocated is 1600MiB, while only a gap of 1578MiB has been found. The
> second device extent object for this DUP chunk will overlap for 22MiB
> with whatever comes next.
> 
> The underlying problem here is that the stripe_size is reused all the
> time for different things. So, when entering the code in the if block,
> stripe_size is immediately overwritten with something else. If later we
> decide we want to have the previous value back, then the logic to
> compute it was copy pasted in again.
> 
> With this change, the value in stripe_size is not unnecessarily
> destroyed, so the duplicated calculation is not needed any more.
> 
> Signed-off-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
> Signed-off-by: David Sterba <dsterba@suse.com>
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> ---
>  fs/btrfs/volumes.c | 16 +++++++---------
>  1 file changed, 7 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index a567ee0bf060..1797a82eb7df 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -4768,19 +4768,17 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>  	/*
>  	 * Use the number of data stripes to figure out how big this chunk
>  	 * is really going to be in terms of logical address space,
> -	 * and compare that answer with the max chunk size
> +	 * and compare that answer with the max chunk size. If it's higher,
> +	 * we try to reduce stripe_size.
>  	 */
>  	if (stripe_size * data_stripes > max_chunk_size) {
> -		stripe_size = div_u64(max_chunk_size, data_stripes);
> -
> -		/* bump the answer up to a 16MB boundary */
> -		stripe_size = round_up(stripe_size, SZ_16M);
> -
>  		/*
> -		 * But don't go higher than the limits we found while searching
> -		 * for free extents
> +		 * Reduce stripe_size, round it up to a 16MB boundary again and
> +		 * then use it, unless it ends up being even bigger than the
> +		 * previous value we had already.
>  		 */
> -		stripe_size = min(devices_info[ndevs - 1].max_avail,
> +		stripe_size = min(round_up(div_u64(max_chunk_size,
> +						   data_stripes), SZ_16M),
>  				  stripe_size);
>  	}
>  
>

  reply	other threads:[~2019-01-08 23:52 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20190108192628.121270-1-sashal@kernel.org>
2019-01-08 19:25 ` [PATCH AUTOSEL 4.20 071/117] btrfs: volumes: Make sure there is no overlap of dev extents at mount time Sasha Levin
2019-01-08 19:25 ` [PATCH AUTOSEL 4.20 072/117] btrfs: alloc_chunk: fix more DUP stripe size handling Sasha Levin
2019-01-08 23:52   ` Hans van Kranenburg [this message]
2019-01-23 14:37     ` Sasha Levin
2019-01-23 15:54       ` Hans van Kranenburg
2019-01-23 17:41         ` David Sterba
2019-01-23 18:18         ` Sasha Levin
2019-01-23 19:32           ` Hans van Kranenburg
2019-11-19 15:23             ` Ben Hutchings
2019-01-08 19:25 ` [PATCH AUTOSEL 4.20 073/117] btrfs: fix use-after-free due to race between replace start and cancel Sasha Levin
2019-01-08 19:25 ` [PATCH AUTOSEL 4.20 074/117] Btrfs: fix deadlock when enabling quotas due to concurrent snapshot creation Sasha Levin
2019-01-08 19:25 ` [PATCH AUTOSEL 4.20 075/117] Btrfs: fix access to available allocation bits when starting balance Sasha Levin
2019-01-08 19:25 ` [PATCH AUTOSEL 4.20 076/117] btrfs: improve error handling of btrfs_add_link Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=783ccf1f-65df-d388-079c-9449d4319c27@mendix.com \
    --to=hans.van.kranenburg@mendix.com \
    --cc=dsterba@suse.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sashal@kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).