All of lore.kernel.org
 help / color / mirror / Atom feed
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Ritesh Harjani <ritesh.list@gmail.com>
Cc: Neal Gompa <ngompa13@gmail.com>, Qu Wenruo <wqu@suse.com>,
	Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH v3 00/13] btrfs: support read-write for subpage metadata
Date: Fri, 2 Apr 2021 16:52:27 +0800	[thread overview]
Message-ID: <a58abc5a-ea2c-3936-4bb1-9b1c5d4e0f77@gmx.com> (raw)
In-Reply-To: <20210402084652.b7a4mj2mntxu2xi5@riteshh-domain>



On 2021/4/2 下午4:46, Ritesh Harjani wrote:
> On 21/04/02 04:36PM, Qu Wenruo wrote:
>>
>>
>> On 2021/4/2 下午4:33, Ritesh Harjani wrote:
>>> On 21/03/29 10:01AM, Qu Wenruo wrote:
>>>>
>>>>
>>>> On 2021/3/29 上午4:02, Ritesh Harjani wrote:
>>>>> On 21/03/25 09:16PM, Qu Wenruo wrote:
>>>>>>
>>>>>>
>>>>>> On 2021/3/25 下午8:20, Neal Gompa wrote:
>>>>>>> On Thu, Mar 25, 2021 at 3:17 AM Qu Wenruo <wqu@suse.com> wrote:
>>>>>>>>
>>>>>>>> This patchset can be fetched from the following github repo, along with
>>>>>>>> the full subpage RW support:
>>>>>>>> https://github.com/adam900710/linux/tree/subpage
>>>>>>>>
>>>>>>>> This patchset is for metadata read write support.
>>>>>>>>
>>>>>>>> [FULL RW TEST]
>>>>>>>> Since the data write path is not included in this patchset, we can't
>>>>>>>> really test the patchset itself, but anyone can grab the patch from
>>>>>>>> github repo and do fstests/generic tests.
>>>>>>>>
>>>>>>>> But at least the full RW patchset can pass -g generic/quick -x defrag
>>>>>>>> for now.
>>>>>>>>
>>>>>>>> There are some known issues:
>>>>>>>>
>>>>>>>> - Defrag behavior change
>>>>>>>>       Since current defrag is doing per-page defrag, to support subpage
>>>>>>>>       defrag, we need some change in the loop.
>>>>>>>>       E.g. if a page has both hole and regular extents in it, then defrag
>>>>>>>>       will rewrite the full 64K page.
>>>>>>>>
>>>>>>>>       Thus for now, defrag related failure is expected.
>>>>>>>>       But this should only cause behavior difference, no crash nor hang is
>>>>>>>>       expected.
>>>>>>>>
>>>>>>>> - No compression support yet
>>>>>>>>       There are at least 2 known bugs if forcing compression for subpage
>>>>>>>>       * Some hard coded PAGE_SIZE screwing up space rsv
>>>>>>>>       * Subpage ASSERT() triggered
>>>>>>>>         This is because some compression code is unlocking locked_page by
>>>>>>>>         calling extent_clear_unlock_delalloc() with locked_page == NULL.
>>>>>>>>       So for now compression is also disabled.
>>>>>>>>
>>>>>>>> - Inode nbytes mismatch
>>>>>>>>       Still debugging.
>>>>>>>>       The fastest way to trigger is fsx using the following parameters:
>>>>>>>>
>>>>>>>>         fsx -l 262144 -o 65536 -S 30073 -N 256 -R -W $mnt/file > /tmp/fsx
>>>>>>>>
>>>>>>>>       Which would cause inode nbytes differs from expected value and
>>>>>>>>       triggers btrfs check error.
>>>>>>>>
>>>>>>>> [DIFFERENCE AGAINST REGULAR SECTORSIZE]
>>>>>>>> The metadata part in fact has more new code than data part, as it has
>>>>>>>> some different behaviors compared to the regular sector size handling:
>>>>>>>>
>>>>>>>> - No more page locking
>>>>>>>>       Now metadata read/write relies on extent io tree locking, other than
>>>>>>>>       page locking.
>>>>>>>>       This is to allow behaviors like read lock one eb while also try to
>>>>>>>>       read lock another eb in the same page.
>>>>>>>>       We can't rely on page lock as now we have multiple extent buffers in
>>>>>>>>       the same page.
>>>>>>>>
>>>>>>>> - Page status update
>>>>>>>>       Now we use subpage wrappers to handle page status update.
>>>>>>>>
>>>>>>>> - How to submit dirty extent buffers
>>>>>>>>       Instead of just grabbing extent buffer from page::private, we need to
>>>>>>>>       iterate all dirty extent buffers in the page and submit them.
>>>>>>>>
>>>>>>>> [CHANGELOG]
>>>>>>>> v2:
>>>>>>>> - Rebased to latest misc-next
>>>>>>>>       No conflicts at all.
>>>>>>>>
>>>>>>>> - Add new sysfs interface to grab supported RO/RW sectorsize
>>>>>>>>       This will allow mkfs.btrfs to detect unmountable fs better.
>>>>>>>>
>>>>>>>> - Use newer naming schema for each patch
>>>>>>>>       No more "extent_io:" or "inode:" schema anymore.
>>>>>>>>
>>>>>>>> - Move two pure cleanups to the series
>>>>>>>>       Patch 2~3, originally in RW part.
>>>>>>>>
>>>>>>>> - Fix one uninitialized variable
>>>>>>>>       Patch 6.
>>>>>>>>
>>>>>>>> v3:
>>>>>>>> - Rename the sysfs to supported_sectorsizes
>>>>>>>>
>>>>>>>> - Rebased to latest misc-next branch
>>>>>>>>       This removes 2 cleanup patches.
>>>>>>>>
>>>>>>>> - Add new overview comment for subpage metadata
>>>>>>>>
>>>>>>>> Qu Wenruo (13):
>>>>>>>>       btrfs: add sysfs interface for supported sectorsize
>>>>>>>>       btrfs: use min() to replace open-code in btrfs_invalidatepage()
>>>>>>>>       btrfs: remove unnecessary variable shadowing in btrfs_invalidatepage()
>>>>>>>>       btrfs: refactor how we iterate ordered extent in
>>>>>>>>         btrfs_invalidatepage()
>>>>>>>>       btrfs: introduce helpers for subpage dirty status
>>>>>>>>       btrfs: introduce helpers for subpage writeback status
>>>>>>>>       btrfs: allow btree_set_page_dirty() to do more sanity check on subpage
>>>>>>>>         metadata
>>>>>>>>       btrfs: support subpage metadata csum calculation at write time
>>>>>>>>       btrfs: make alloc_extent_buffer() check subpage dirty bitmap
>>>>>>>>       btrfs: make the page uptodate assert to be subpage compatible
>>>>>>>>       btrfs: make set/clear_extent_buffer_dirty() to be subpage compatible
>>>>>>>>       btrfs: make set_btree_ioerr() accept extent buffer and to be subpage
>>>>>>>>         compatible
>>>>>>>>       btrfs: add subpage overview comments
>>>>>>>>
>>>>>>>>      fs/btrfs/disk-io.c   | 143 ++++++++++++++++++++++++++++++++++---------
>>>>>>>>      fs/btrfs/extent_io.c | 127 ++++++++++++++++++++++++++++----------
>>>>>>>>      fs/btrfs/inode.c     | 128 ++++++++++++++++++++++----------------
>>>>>>>>      fs/btrfs/subpage.c   | 127 ++++++++++++++++++++++++++++++++++++++
>>>>>>>>      fs/btrfs/subpage.h   |  17 +++++
>>>>>>>>      fs/btrfs/sysfs.c     |  15 +++++
>>>>>>>>      6 files changed, 441 insertions(+), 116 deletions(-)
>>>>>>>>
>>>>>>>> --
>>>>>>>> 2.30.1
>>>>>>>>
>>>>>>>
>>>>>>> Why wouldn't we just integrate full read-write support with the
>>>>>>> caveats as described now? It seems to be relatively reasonable to do
>>>>>>> that, and this patch set is essentially unusable without the rest of
>>>>>>> it that does enable full read-write support.
>>>>>>
>>>>>> The metadata part is much more stable than data path (almost not touched
>>>>>> for several months), and the metadata part already has some difference
>>>>>> in its behavior, which needs review.
>>>>>>
>>>>>> You point makes some sense, but I still don't believe pushing a super
>>>>>> large patchset does any help for the review.
>>>>>>
>>>>>> If you want to test, you can grab the branch from the github repo.
>>>>>> If you want to review, the mails are all here for review.
>>>>>>
>>>>>> In fact, we used to have subpage support sent as a big patchset from IBM
>>>>>> guys, but the result is only some preparation patches get merged, and
>>>>>> nothing more.
>>>>>>
>>>>>> Using this multi-series method, we're already doing better work and
>>>>>> received more testing (to ensure regular sectorsize is not affected at
>>>>>> least).
>>>>>
>>>>> Hi Qu Wenruo,
>>>>>
>>>>> Sorry about chiming in late on this. I don't have any strong objection on either
>>>>> approach. Although sometime back when I tested your RW support git tree on
>>>>> Power, the unmount patch itself was crashing. I didn't debug it that time
>>>>> (this was a month back or so), so I also didn't bother testing xfstests on Power.
>>>>>
>>>>> But we do have an interest in making sure this patch series work on bs < ps
>>>>> on Power platform. I can try helping with testing, reviewing (to best of my
>>>>> knowledge) and fixing anything is possible :)
>>>>
>>>> That's great!
>>>>
>>>> One of my biggest problem here is, I don't have good enough testing
>>>> environment.
>>>>
>>>> Although SUSE has internal clouds for ARM64/PPC64, but due to the
>>>> f**king Great Firewall, it's super slow to access, no to mention doing
>>>> proper debugging.
>>>>
>>>> Currently I'm using two ARM SBCs, RK3399 and A311D based, to do the test.
>>>> But their computing power is far from ideal, only generic/quick can
>>>> finish in hours.
>>>>
>>>> Thus real world Power could definitely help.
>>>>>
>>>>> Let me try and pull your tree and test it on Power. Please let me know if there
>>>>> is anything needs to be taken care apart from your github tree and btrfs-progs
>>>>> branch with bs < ps support.
>>>>
>>>> If you're going to test the branch, here are some small notes:
>>>>
>>>> - Need to use latest btrfs-progs
>>>>     As it fixes a false alert on crossing 64K page boundary.
>>>>
>>>> - Need to slightly modify btrfs-progs to avoid false alerts
>>>>     For subpage case, mkfs.btrfs will output a warning, but that warning
>>>>     is outputted into stderr, which will screw up generic test groups.
>>>>     It's recommended to apply the following diff:
>>>>
>>>> diff --git a/common/fsfeatures.c b/common/fsfeatures.c
>>>> index 569208a9..21976554 100644
>>>> --- a/common/fsfeatures.c
>>>> +++ b/common/fsfeatures.c
>>>> @@ -341,8 +341,8 @@ int btrfs_check_sectorsize(u32 sectorsize)
>>>>                   return -EINVAL;
>>>>           }
>>>>           if (page_size != sectorsize)
>>>> -               warning(
>>>> -"the filesystem may not be mountable, sectorsize %u doesn't match page
>>>> size %u",
>>>> +               printf(
>>>> +"the filesystem may not be mountable, sectorsize %u doesn't match page
>>>> size %u\n",
>>>>                           sectorsize, page_size);
>>>>           return 0;
>>>>    }
>>>>
>>>> - Xfstest/btrfs group will crash at btrfs/143
>>>>     Still investigating, but you can ignore btrfs group for now.
>>>>
>>>> - Very rare hang
>>>>     There is a very low change to hang, with "bad ordered accounting"
>>>>     dmesg.
>>>>     If you can hit, please let me know.
>>>>     I had something idea to fix it, but not yet in the branch.
>>>>
>>>> - btrfs inode nbytes mismatch
>>>>     Investigating, as it will make btrfs-check to report error.
>>>>
>>>> The last two bugs are the final show blocker, I'll give you extra
>>>> updates when those are fixed.
>>>
>>> Thanks Qu Wenruo, for above info.
>>> I cloned below git tree as mentioned in your git log to test for RW on Power.
>>> However, I still see that RW mount for bs < ps is disabled for in open_ctree()
>>> https://github.com/adam900710/linux/tree/subpage
>>>
>>> I see below code present in this tree.
>>>            /* For 4K sector size support, it's only read-only */
>>>            if (PAGE_SIZE == SZ_64K && sectorsize == SZ_4K) {
>>>                    if (!sb_rdonly(sb) || btrfs_super_log_root(disk_super)) {
>>>                            btrfs_err(fs_info,
>>>            "subpage sectorsize %u only supported read-only for page size %lu",
>>>                                    sectorsize, PAGE_SIZE);
>>>                            err = -EINVAL;
>>>                            goto fail_alloc;
>>>                    }
>>>            }
>>>
>>> Could you pls point me to the tree I can use for bs < ps testing on Power?
>>> Sorry if I missed something.
>>
>> Sorry, I updated the branch to my current development progress, it's now
>> at the ordered extent rework part, without the remaining subpage
>> functionality at all.
>>
>> You may want to grab this tree instead:
>> https://github.com/adam900710/linux/tree/subpage_old
>>
>> But please keep in mind that, you may get random hang, and certain
>> generic test case, especially generic/075 can corrupt the inode nbytes
>> and leaving all later test cases using TEST_DEV to report error on fsck.
>>
>
> Thanks for quick response. Sure, I will exclude generic/075 from the test
> for now.

Not only generic/075, but all tests running fsx may cause inode nbytes
corruption.

Thus I'd recommend either modify btrfs-check to ignore it, or re-mkfs on
TEST_DEV after each test case.

Thanks,
Qu

>
> -ritesh
>

  reply	other threads:[~2021-04-02  8:52 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-25  7:14 [PATCH v3 00/13] btrfs: support read-write for subpage metadata Qu Wenruo
2021-03-25  7:14 ` [PATCH v3 01/13] btrfs: add sysfs interface for supported sectorsize Qu Wenruo
2021-03-25 14:41   ` Anand Jain
2021-03-29 18:20     ` David Sterba
2021-04-01 22:32       ` Anand Jain
2021-04-01 17:56   ` David Sterba
2021-03-25  7:14 ` [PATCH v3 02/13] btrfs: use min() to replace open-code in btrfs_invalidatepage() Qu Wenruo
2021-03-25  7:14 ` [PATCH v3 03/13] btrfs: remove unnecessary variable shadowing " Qu Wenruo
2021-03-25  7:14 ` [PATCH v3 04/13] btrfs: refactor how we iterate ordered extent " Qu Wenruo
2021-04-02  1:15   ` Anand Jain
2021-04-02  3:33     ` Qu Wenruo
2021-03-25  7:14 ` [PATCH v3 05/13] btrfs: introduce helpers for subpage dirty status Qu Wenruo
2021-04-01 18:11   ` David Sterba
2021-03-25  7:14 ` [PATCH v3 06/13] btrfs: introduce helpers for subpage writeback status Qu Wenruo
2021-03-25  7:14 ` [PATCH v3 07/13] btrfs: allow btree_set_page_dirty() to do more sanity check on subpage metadata Qu Wenruo
2021-03-25  7:14 ` [PATCH v3 08/13] btrfs: support subpage metadata csum calculation at write time Qu Wenruo
2021-03-25  7:14 ` [PATCH v3 09/13] btrfs: make alloc_extent_buffer() check subpage dirty bitmap Qu Wenruo
2021-03-25  7:14 ` [PATCH v3 10/13] btrfs: make the page uptodate assert to be subpage compatible Qu Wenruo
2021-03-25  7:14 ` [PATCH v3 11/13] btrfs: make set/clear_extent_buffer_dirty() " Qu Wenruo
2021-03-25  7:14 ` [PATCH v3 12/13] btrfs: make set_btree_ioerr() accept extent buffer and " Qu Wenruo
2021-03-25  7:14 ` [PATCH v3 13/13] btrfs: add subpage overview comments Qu Wenruo
2021-03-25 12:20 ` [PATCH v3 00/13] btrfs: support read-write for subpage metadata Neal Gompa
2021-03-25 13:16   ` Qu Wenruo
2021-03-28 20:02     ` Ritesh Harjani
2021-03-29  2:01       ` Qu Wenruo
2021-04-02  1:39         ` Anand Jain
2021-04-02  3:26           ` Qu Wenruo
2021-04-02  8:33         ` Ritesh Harjani
2021-04-02  8:36           ` Qu Wenruo
2021-04-02  8:46             ` Ritesh Harjani
2021-04-02  8:52               ` Qu Wenruo [this message]
2021-04-12 11:33                 ` Qu Wenruo
2021-04-15  3:44                   ` riteshh
2021-04-15 14:52                     ` riteshh
2021-04-15 23:19                       ` Qu Wenruo
2021-04-15 23:34                         ` Qu Wenruo
2021-04-16  1:34                           ` Qu Wenruo
2021-04-16  5:50                             ` riteshh
2021-04-16  6:14                               ` Qu Wenruo
2021-04-16 16:52                                 ` riteshh
2021-04-19  5:59                                   ` riteshh
2021-04-19  6:16                                     ` Qu Wenruo
2021-04-19  7:04                                       ` riteshh
2021-04-19  7:19                                       ` Qu Wenruo
2021-04-19 13:24                                         ` Qu Wenruo
2021-04-21  7:03                                           ` riteshh
2021-04-21  7:15                                             ` Qu Wenruo
2021-04-21  7:30                                             ` riteshh
2021-04-21  8:26                                               ` Qu Wenruo
2021-04-21 11:13                                                 ` riteshh
2021-04-21 11:42                                                   ` Qu Wenruo
2021-04-21 12:15                                                     ` riteshh
2021-03-29 18:53 ` David Sterba
2021-04-01  5:36   ` Qu Wenruo
2021-04-01 17:55     ` David Sterba
2021-04-02  1:27     ` Anand Jain
2021-04-03 11:08 ` David Sterba
2021-04-05  6:14   ` Qu Wenruo
2021-04-06  2:31     ` Anand Jain
2021-04-06 19:20       ` David Sterba
2021-04-06 23:59       ` Qu Wenruo
2021-04-06 19:13     ` David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a58abc5a-ea2c-3936-4bb1-9b1c5d4e0f77@gmx.com \
    --to=quwenruo.btrfs@gmx.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=ngompa13@gmail.com \
    --cc=ritesh.list@gmail.com \
    --cc=wqu@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.