linux-btrfs.vger.kernel.org archive mirror
From: Filipe Manana <fdmanana@gmail.com>
To: Qu Wenruo <wqu@suse.com>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>,
	Filipe Manana <fdmanana@suse.com>
Subject: Re: [PATCH v3] btrfs: trim: fix underflow in trim length to prevent access beyond device boundary
Date: Fri, 31 Jul 2020 11:40:30 +0100
Message-ID: <CAL3q7H4asjD-9Vq-MtsJ+q3EHU9RBuF_nv1tVRQXMExGnSRbDA@mail.gmail.com>
In-Reply-To: <9b441c78-b919-dbe6-0fab-a89c6d011703@suse.com>

On Fri, Jul 31, 2020 at 11:21 AM Qu Wenruo <wqu@suse.com> wrote:
>
>
>
> On 2020/7/31 6:05 PM, Filipe Manana wrote:
> > On Fri, Jul 31, 2020 at 10:49 AM Qu Wenruo <wqu@suse.com> wrote:
> >>
> >> [BUG]
> >> The following script can trigger many accesses beyond the device boundary:
> >>
> >>   mkfs.btrfs -f $dev -b 10G
> >>   mount $dev $mnt
> >>   fstrim $mnt
> >>   btrfs filesystem resize 1:-1G $mnt
> >>   fstrim $mnt
> >>
> >> [CAUSE]
> >> Since commit 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to
> >> find_first_clear_extent_bit"), we try to avoid trimming ranges that are
> >> already trimmed.
> >>
> >> So we check device->alloc_state by finding the first range which has
> >> neither CHUNK_TRIMMED nor CHUNK_ALLOCATED set.
> >>
> >> But if we shrink the device, those bits are not cleared, so we can
> >> easily get a range that starts beyond the shrunk device size.
> >>
> >> As a result, the returned @start and @end are both beyond the device
> >> size. We then call "end = min(end, device->total_bytes - 1);", making
> >> @end smaller than the device size.
> >>
> >> Finally we compute "len = end - start + 1", which underflows and leads
> >> to the beyond-device-boundary access.
> >>
> >> [FIX]
> >> This patch will fix the problem in two ways:
> >> - Clear CHUNK_TRIMMED | CHUNK_ALLOCATED bits when shrinking device
> >>   This is the root fix
> >>
> >> - Add an extra safety net when trimming free device extents
> >>   We check and warn if the returned range is already beyond the current
> >>   device size.
> >>
> >> Link: https://github.com/kdave/btrfs-progs/issues/282
> >> Fixes: 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit")
> >> Signed-off-by: Qu Wenruo <wqu@suse.com>
> >> Reviewed-by: Filipe Manana <fdmanana@suse.com>
> >> ---
> >> Changelog:
> >> v2:
> >> - Add proper fixes tag
> >> - Add extra warning for beyond device end case
> >> - Add graceful exit for already trimmed case
> >> v3:
> >> - Don't return EUCLEAN for beyond boundary access
> >> - Rephrase the warning message for beyond boundary access
> >> ---
> >>  fs/btrfs/extent-tree.c | 21 +++++++++++++++++++++
> >>  fs/btrfs/volumes.c     | 12 ++++++++++++
> >>  2 files changed, 33 insertions(+)
> >>
> >> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> >> index fa7d83051587..7c5e0961c93b 100644
> >> --- a/fs/btrfs/extent-tree.c
> >> +++ b/fs/btrfs/extent-tree.c
> >> @@ -33,6 +33,7 @@
> >>  #include "delalloc-space.h"
> >>  #include "block-group.h"
> >>  #include "discard.h"
> >> +#include "rcu-string.h"
> >>
> >>  #undef SCRAMBLE_DELAYED_REFS
> >>
> >> @@ -5669,6 +5670,26 @@ static int btrfs_trim_free_extents(struct btrfs_device *device, u64 *trimmed)
> >>                                             &start, &end,
> >>                                             CHUNK_TRIMMED | CHUNK_ALLOCATED);
> >>
> >> +               /* CHUNK_* bits not cleared properly */
> >> +               if (start > device->total_bytes) {
> >> +                       WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
> >> +                       btrfs_warn_in_rcu(fs_info,
> >> +"ignoring attempt to trim beyond device size: offset %llu length %llu device %s device size %llu",
> >> +                                         start, end - start + 1,
> >> +                                         rcu_str_deref(device->name),
> >> +                                         device->total_bytes);
> >> +                       mutex_unlock(&fs_info->chunk_mutex);
> >> +                       ret = 0;
> >> +                       break;
> >> +               }
> >> +
> >> +               /* The remaining part has already been trimmed */
> >> +               if (start == device->total_bytes) {
> >> +                       mutex_unlock(&fs_info->chunk_mutex);
> >> +                       ret = 0;
> >> +                       break;
> >> +               }
> >
> > Sorry I missed this earlier, but why is this a special case? Couldn't
> > this be merged into the previous check?
> > Why is an offset matching the ending of the device not considered unexpected?
>
> Take this example:
>                 0               1G              2G
> device 1:       |///////////////|               |
> |//| = Allocated space
> |  | = Free space.
>
> After one fstrim, [1G, 2G) gets trimmed.
> So in the alloc_state we have:
>                 0               1G              2G
> device 1:       |               |***************|
> |**| = CHUNK_TRIMMED bits set
>
> Here we just focus on the unallocated space, ignoring the block group parts.
>
> Then we run fstrim again.
> We call find_first_clear_extent_bit(start == 1G), and get the result
> start == 2G, end == U64_MAX.
>
> In that case, start == device->total_bytes, which is completely
> valid.

Ok. But this can happen without shrinking the device before, and it
seems we already handle it, or is the existing handling buggy?
If it is, it should be replaced or updated.

Thanks.

>
> >
> > I also don't understand the comment, what is the remaining part?
>
> "Remaining" means the unallocated space from the @start of
> find_first_clear_extent_bit().
>
> Any better suggestion?
>
> Thanks,
> Qu
>
> >
> > Thanks.
> >
> >> +
> >>                 /* Ensure we skip the reserved area in the first 1M */
> >>                 start = max_t(u64, start, SZ_1M);
> >>
> >> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> >> index d7670e2a9f39..4e51ef68ea72 100644
> >> --- a/fs/btrfs/volumes.c
> >> +++ b/fs/btrfs/volumes.c
> >> @@ -4720,6 +4720,18 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
> >>         }
> >>
> >>         mutex_lock(&fs_info->chunk_mutex);
> >> +       /*
> >> +        * Also clear any CHUNK_TRIMMED and CHUNK_ALLOCATED bits beyond the
> >> +        * current device boundary.
> >> +        * This shouldn't fail, as alloc_state should only utilize those two
> >> +        * bits, thus we shouldn't alloc new memory for clearing the status.
> >> +        *
> >> +        * So here we just do an ASSERT() to catch future behavior change.
> >> +        */
> >> +       ret = clear_extent_bits(&device->alloc_state, new_size, (u64)-1,
> >> +                               CHUNK_TRIMMED | CHUNK_ALLOCATED);
> >> +       ASSERT(!ret);
> >> +
> >>         btrfs_device_set_disk_total_bytes(device, new_size);
> >>         if (list_empty(&device->post_commit_list))
> >>                 list_add_tail(&device->post_commit_list,
> >> --
> >> 2.28.0
> >>
> >
> >
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”
