linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: dsterba@suse.cz, Qu Wenruo <wqu@suse.com>,
	linux-btrfs@vger.kernel.org, Filipe Manana <fdmanana@suse.com>
Subject: Re: [PATCH v4] btrfs: trim: fix underflow in trim length to prevent access beyond device boundary
Date: Tue, 11 Aug 2020 15:42:31 +0800	[thread overview]
Message-ID: <8796039b-d0b1-a129-3b01-a260a051154b@gmx.com> (raw)
In-Reply-To: <20200811072234.GK2026@twin.jikos.cz>



On 2020/8/11 下午3:22, David Sterba wrote:
> On Sat, Aug 01, 2020 at 07:35:26AM +0800, Qu Wenruo wrote:
>>
>>
>> On 2020/7/31 下午10:08, David Sterba wrote:
>>> On Fri, Jul 31, 2020 at 07:29:11PM +0800, Qu Wenruo wrote:
>>>> --- a/fs/btrfs/volumes.c
>>>> +++ b/fs/btrfs/volumes.c
>>>> @@ -4720,6 +4720,18 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
>>>>  	}
>>>>
>>>>  	mutex_lock(&fs_info->chunk_mutex);
>>>> +	/*
>>>> +	 * Also clear any CHUNK_TRIMMED and CHUNK_ALLOCATED bits beyond the
>>>> +	 * current device boundary.
>>>> +	 * This shouldn't fail, as alloc_state should only utilize those two
>>>> +	 * bits, thus we shouldn't alloc new memory for clearing the status.
>>>
>>> If this fails or not depends on implementation details of
>>> clear_extent_bits and this comment will get out of sync eventually, so I
>>> don't think it should be that specific.
>>>
>>> If the new_size is somewhere in the middle of an existing state, it'll
>>> need to be split anyway, no?
>>
>> Nope. Because in alloc_state we only have two bits utilized,
>> CHUNK_TRIMMED and CHUNK_ALLOCATED.
>>
>> Thus what we're doing is to clear all utilized bits.
>
> Which is true for now, adding a new bit would change that.

Yep, that's also why I had the comment here.

>
>>>
>>> alloc_state |-----+++++|
>>> clear             |------------------------- ... (u64)-1|
>>>
>>> So we'd need to keep the state "-" and unset bits only from "+", and
>>> this will require a split.
>>
>> In this case, we would only reduce the the size of the existing status,
>> or just remove it completely.
>
> I haven't found the 'only reduce the size' in the code, thre's always
> some split. The case in __clear_extent_bit is
>
>  773          *     | ---- desired range ---- |
>  774          *  | state | or
>  775          *  | ------------- state -------------- |
>
> the case on line 774 and followed by split_state.

You're right, we still need to allocate memory in this case.
And it's the BUG_ON() and the preallocated state saving us from the
memory failure.

>
>>> But I still have doubts about just clearing the range, why are there any
>>> device->alloc_state entries at all after device is shrunk?
>>
>> Because the alloc_state is mostly only utilized by trim facility, thus
>> existing functions won't bother clearing/setting it.
>>
>> In this particular case, previous fstrim run would set the CHUNK_TRIMMED
>> bit for all unallocated range (except the super reserve).
>> Then shrink doesn't clear the exceed range, and cause problem.
>
> So the unallocated range on a device is also represented in the
> alloc_state tree?

If the unallocated range has been trimmed, then it has an state, with
CHUNK_TRIMMED bit set.

>
>> Thus clearing the bit in btrfs_shrink_device() makes sense.
>>
>>> Using
>>> clear_extent_bits here is not wrong if we look at the end result of
>>> clearing the range, but otherwise it leaves some state information
>>> and allocated memory behind.
>>>
>> Not that complex case, just plain not fully considered corner case.
>
> So what to do about that? I expect the alloc_state tree to represent the
> device accurately and don't want to leave known issues unfixed.
>
The patch still stand as it is.

The reproducer is still the same:
- fstrim
  This marks the unallocated range of one device with CHUNK_TRIMMED bit
  And the range starts from the last dev extent end, to the end of the
device.

- shrink device
  We need to remove the CHUNK_* bits, or problems will happen in next step

- fstrim again
  Due to the bad check, we could underflow the length of the range, leads to
  access beyond boundary.

The patch fixes it in two locations.
When shrink, we clear the CHUNK_* bits, as these bits makes no sense for
range beyond device boundary.

When fstrim, we do extra check to ensure we don't underflow/overflow
anything.

Is there anything unclear that needs extra comments?

Thanks,
Qu

  reply	other threads:[~2020-08-11  7:42 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-31 11:29 [PATCH v4] btrfs: trim: fix underflow in trim length to prevent access beyond device boundary Qu Wenruo
2020-07-31 14:08 ` David Sterba
2020-07-31 23:35   ` Qu Wenruo
2020-08-11  7:22     ` David Sterba
2020-08-11  7:42       ` Qu Wenruo [this message]
2020-08-12  6:10         ` David Sterba
2020-08-12  6:33           ` Qu Wenruo
2020-08-12  6:37             ` David Sterba
2020-08-11  8:41 ` Nikolay Borisov
2020-08-11  8:46   ` Qu Wenruo
2020-08-11 10:24     ` Filipe Manana
2020-08-12  6:14       ` David Sterba
2020-08-12  6:43 ` [PATCH v5] " David Sterba
2020-08-12  6:57   ` Qu Wenruo
2020-08-12 11:14   ` Qu Wenruo
2020-08-12 11:24     ` Nikolay Borisov
2020-08-12 11:26       ` Qu Wenruo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8796039b-d0b1-a129-3b01-a260a051154b@gmx.com \
    --to=quwenruo.btrfs@gmx.com \
    --cc=dsterba@suse.cz \
    --cc=fdmanana@suse.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=wqu@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).