From: Qu Wenruo <wqu@suse.com>
To: riteshh <riteshh@linux.ibm.com>, Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: Ritesh Harjani <ritesh.list@gmail.com>,
	Neal Gompa <ngompa13@gmail.com>,
	Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH v3 00/13] btrfs: support read-write for subpage metadata
Date: Fri, 16 Apr 2021 14:14:04 +0800
Message-ID: <a5478e5e-9be4-bc32-d5e1-eaaa3f2b63a9@suse.com>
In-Reply-To: <20210416055036.v4siyzsnmf32bx4y@riteshh-domain>



On 2021/4/16 1:50 PM, riteshh wrote:
> On 21/04/16 09:34AM, Qu Wenruo wrote:
>>
>>
>> On 2021/4/16 7:34 AM, Qu Wenruo wrote:
>>>
>>>
>>> On 2021/4/16 7:19 AM, Qu Wenruo wrote:
>>>>
>>>>
>>>> On 2021/4/15 10:52 PM, riteshh wrote:
>>>>> On 21/04/15 09:14AM, riteshh wrote:
>>>>>> On 21/04/12 07:33PM, Qu Wenruo wrote:
>>>>>>> Good news, you can fetch the subpage branch for better test results.
>>>>>>>
>>>>>>> Now the branch should pass all generic tests, except defrag and known
>>>>>>> failures.
>>>>>>> And no more random crash during the tests.
>>>>>>
>>>>>> Thanks, let me test it on PPC64 box.
>>>>>
>>>>> I still see some failures with the patch series.
>>>>> However, the one blocking my testing is tests/generic/095,
>>>>> where I hit a kernel BUG with the signature below.
>>>>
>>>> That's pretty different from my tests, as I haven't seen such a
>>>> BUG_ON() for a while.
>>>>
>>>>
>>>>>
>>>>> Please let me know if this is a known failure.
>>>>>
>>>>> <xfstests config>
>>>>> #:~/work-tools/xfstests$ sudo ./check -g auto
>>>>> SECTION       -- btrfs_4k
>>>>> FSTYP         -- btrfs
>>>>> PLATFORM      -- Linux/ppc64le qemu 5.12.0-rc7-02316-g3490dae50c0 #73
>>>>> SMP Thu Apr 15 07:29:23 CDT 2021
>>>>> MKFS_OPTIONS  -- -f -s 4096 -n 4096 /dev/loop3
>>>>
>>>> I see you're using -n 4096, not the default -n 16K, let me see if I can
>>>> reproduce that.
>>>>
>>>> But from the backtrace, that doesn't look like the cause, as it
>>>> happens in the data path, which means it's only related to sectorsize.
>>>>
>>>>> MOUNT_OPTIONS -- /dev/loop3 /mnt1/scratch
>>>>>
>>>>>
>>>>> <kernel logs>
>>>>> [ 6057.560580] BTRFS warning (device loop3): read-write for sector
>>>>> size 4096 with page size 65536 is experimental
>>>>> [ 6057.861383] run fstests generic/095 at 2021-04-15 14:12:10
>>>>> [ 6058.345127] BTRFS info (device loop2): disk space caching is enabled
>>>>> [ 6058.348910] BTRFS info (device loop2): has skinny extents
>>>>> [ 6058.351930] BTRFS warning (device loop2): read-write for sector
>>>>> size 4096 with page size 65536 is experimental
>>>>> [ 6059.896382] BTRFS: device fsid 43ec9cdf-c124-4460-ad93-933bfd5ddbbd
>>>>> devid 1 transid 5 /dev/loop3 scanned by mkfs.btrfs (739641)
>>>>> [ 6060.225107] BTRFS info (device loop3): disk space caching is enabled
>>>>> [ 6060.226213] BTRFS info (device loop3): has skinny extents
>>>>> [ 6060.227084] BTRFS warning (device loop3): read-write for sector
>>>>> size 4096 with page size 65536 is experimental
>>>>> [ 6060.234537] BTRFS info (device loop3): checking UUID tree
>>>>> [ 6061.375902] assertion failed: PagePrivate(page) && page->private,
>>>>> in fs/btrfs/subpage.c:171
>>>>> [ 6061.378296] ------------[ cut here ]------------
>>>>> [ 6061.379422] kernel BUG at fs/btrfs/ctree.h:3403!
>>>>> cpu 0x5: Vector: 700 (Program Check) at [c0000000260d7490]
>>>>>       pc: c000000000a9370c: assertfail.constprop.11+0x34/0x48
>>>>>       lr: c000000000a93708: assertfail.constprop.11+0x30/0x48
>>>>>       sp: c0000000260d7730
>>>>>      msr: 800000000282b033
>>>>>     current = 0xc0000000260c0080
>>>>>     paca    = 0xc00000003fff8a00   irqmask: 0x03   irq_happened: 0x01
>>>>>       pid   = 739712, comm = fio
>>>>> kernel BUG at fs/btrfs/ctree.h:3403!
>>>>> Linux version 5.12.0-rc7-02316-g3490dae50c0 (riteshh@xxxx) (gcc
>>>>> (Ubuntu 8.4.0-1ubuntu1~18.04) 8.4.0, GNU ld (GNU Binutils for Ubuntu)
>>>>> 2.30) #73 SMP Thu Apr 15 07:29:23 CDT 2021
>>>>> enter ? for help
>>>>> [c0000000260d7790] c000000000a90280
>>>>> btrfs_subpage_assert.isra.9+0x70/0x110
>>>>> [c0000000260d77b0] c000000000a91064
>>>>> btrfs_subpage_set_uptodate+0x54/0x110
>>>>> [c0000000260d7800] c0000000009c6d0c btrfs_dirty_pages+0x1bc/0x2c0
>>>>
>>>> This is very strange, as in btrfs_dirty_pages() the pages passed in
>>>> are already prepared by prepare_pages(), which means all of them
>>>> should have Private set.
>>>>
>>>> Can you reproduce the bug reliably?
> 
> Yes, almost reliably on my PPC box.
> 
>>>
>>> OK, I got it reproduced.
>>>
>>> The BUG_ON() doesn't trigger reliably, but it can be reproduced.
>>> The test gets skipped on all my boards as it requires the fio tool,
>>> thus I never triggered it in any previous run.
>>>
>>> I'll take a look into the case.
>>
>> This exposed an interesting race window in btrfs_buffered_write():
>>          Writer                    |             fadvise
>> ----------------------------------+-------------------------------
>> btrfs_buffered_write()            |
>> |- prepare_pages()                |
>> |  |- Now all pages involved get  |
>> |     Private set                 |
>> |                                 | btrfs_releasepage()
>> |                                 | |- Clear page Private
>> |- lock_extent()                  |
>> |  |- This would prevent          |
>> |     btrfs_releasepage() from    |
>> |     clearing the page Private   |
>> |                                 |
>> |- btrfs_dirty_pages()            |
>>     |- Will trigger the BUG_ON()  |
> 
> 
> Sorry about the silly query, but help me understand how the above race is possible.
> Won't prepare_pages() lock all the pages first? The same locked-page requirement
> should apply to btrfs_releasepage() too, no?

A releasepage() caller can easily get a page locked and then release it.

For call sites like btrfs_invalidatepage(), the page is already locked.

btrfs_releasepage() will not try to release the page if the extent is
locked (any extent range inside the page has the EXTENT_LOCKED bit).
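
To make the ordering concrete, here is a rough userspace toy model of
the interleaving from the diagram. It is only a sketch, not kernel
code: every model_* name is made up for illustration, and a single
bool stands in for PagePrivate/page->private and for the extent lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_page {
	bool private_set;   /* stands in for PagePrivate + page->private */
	bool extent_locked; /* stands in for a locked extent range in the page */
};

/* Writer, step 1: attach private data, as prepare_pages() is said to do. */
static void model_prepare(struct model_page *p)
{
	p->private_set = true;
}

/* Release path: refuses to touch the page once the extent is locked. */
static bool model_release(struct model_page *p)
{
	if (p->extent_locked)
		return false;
	p->private_set = false;
	return true;
}

/* Writer, step 2: lock the extent range covering the page. */
static void model_lock_extent(struct model_page *p)
{
	p->extent_locked = true;
}

/*
 * Writer, step 3: returns whether private data is present at the point
 * where the subpage assertion would run. With the fix, private data is
 * re-attached here, under the extent lock.
 */
static bool model_dirty(struct model_page *p, bool with_fix)
{
	assert(p->extent_locked);
	if (with_fix && !p->private_set)
		p->private_set = true;
	return p->private_set;
}

/* Replay the racy interleaving: prepare, release, lock, dirty. */
static bool model_replay_race(bool with_fix)
{
	struct model_page p = { false, false };

	model_prepare(&p);
	(void)model_release(&p); /* the release sneaks in before the lock */
	model_lock_extent(&p);
	return model_dirty(&p, with_fix);
}
```

In this model, model_replay_race(false) ends with private data missing
(the state that trips the assertion), while model_replay_race(true)
does not, because nothing can release the page after the lock is taken.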

> 
> I see only two paths which could result in btrfs_releasepage():
> 1. one via try_to_release_page() -> releasepage()

This is the racing one, called from fadvise() to release pages.

> 2. the writeback path calling btrfs_writepage() or btrfs_writepages(),
> 	which may result in a call to btrfs_invalidatepage()

Not this one.

> 
> Although I am not sure which one this is racing with.
> 
>>
>> This only happens for subpage, because subpage introduces a new ASSERT()
>> to do an extra check.
>>
>> Strictly speaking, the regular sector size case should also report
>> this problem.
>> But the regular sector size case doesn't really care about page Private,
>> as it just sets page->private to a constant value, unlike the subpage
>> case, which stores an important value there.
>>
>> The fix will just re-set page Private and the needed structures in
>> btrfs_dirty_pages(), with the extent locked, so that btrfs_releasepage()
>> can no longer release it.
> 
> With the above fix I see a different issue, with the signature below.
> 
> [  130.272410] BTRFS warning (device loop2): read-write for sector size 4096 with page size 65536 is experimental
> [  130.387470] run fstests generic/095 at 2021-04-16 05:04:09
> [  132.042532] BTRFS: device fsid 642daee0-165a-4271-b6f3-728f215c5348 devid 1 transid 5 /dev/loop3 scanned by mkfs.btrfs (5226)
> [  132.146892] BTRFS info (device loop3): disk space caching is enabled
> [  132.147831] BTRFS info (device loop3): has skinny extents
> [  132.148491] BTRFS warning (device loop3): read-write for sector size 4096 with page size 65536 is experimental
> [  132.158228] BTRFS info (device loop3): checking UUID tree
> [  133.931695] BUG: spinlock bad magic on CPU#4, swapper/4/0
> [  133.932874] BUG: Unable to handle kernel data access on write at 0x6b6b6b6b6b6b725b

That looks like some poisoned memory.
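
For reference, 0x6b is the value of POISON_FREE in
include/linux/poison.h, the byte the slab allocator writes over freed
objects when poisoning is enabled. Assuming that is enabled in this
kernel config, a small helper (hypothetical names, userspace only)
shows how the faulting address breaks down:

```c
#include <stdint.h>

/* Value of POISON_FREE in include/linux/poison.h. */
#define POISON_FREE_BYTE 0x6bU

/* Count how many of the most significant bytes equal the poison byte. */
static int leading_poison_bytes(uint64_t addr)
{
	int n = 0;

	for (int shift = 56; shift >= 0; shift -= 8) {
		if (((addr >> shift) & 0xff) != POISON_FREE_BYTE)
			break;
		n++;
	}
	return n;
}

/* Distance of the address from a pointer made entirely of poison bytes. */
static uint64_t offset_from_poison(uint64_t addr)
{
	return addr - 0x6b6b6b6b6b6b6b6bULL;
}
```

For 0x6b6b6b6b6b6b725b the top six bytes are all 0x6b and the offset
is 0x6f0, which is what one would expect if a pointer was read from
freed memory and then dereferenced at a small offset. That is only a
reading of the address, not a confirmed diagnosis.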

I ran generic/095 128 times locally on my Arm board while working on
the fix, and could no longer reproduce the crash.

And this call site is even harder to race with, as in endio context the
page still has PageWriteback set until the last bio in the page finishes.

This means btrfs_releasepage() will not even try to release the page,
while btrfs_invalidatepage() will wait for the page to finish its
writeback before doing anything.
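
As a simplified illustration of that guard (again a userspace toy with
made-up model_* names, not the real subpage code), assume a 64K page
with 4K sectors tracked by a 16-bit writeback bitmap:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: a 64K page holds 16 sectors of 4K each. */
struct model_subpage {
	uint16_t writeback_bitmap; /* one bit per sector under writeback */
	bool page_writeback;       /* stands in for PageWriteback */
	bool private_set;          /* stands in for the attached private data */
};

/* Bitmask covering nr sectors starting at first (first + nr <= 16). */
static uint16_t model_mask(int first, int nr)
{
	return (uint16_t)(((1u << nr) - 1u) << first);
}

/* Mark a sector range as under writeback; the page flag follows. */
static void model_set_writeback(struct model_subpage *sp, int first, int nr)
{
	sp->writeback_bitmap |= model_mask(first, nr);
	sp->page_writeback = true;
}

/* endio side: the page flag is cleared only once the last range is done. */
static void model_clear_writeback(struct model_subpage *sp, int first, int nr)
{
	sp->writeback_bitmap &= (uint16_t)~model_mask(first, nr);
	if (sp->writeback_bitmap == 0)
		sp->page_writeback = false;
}

/* Release path: never touches a page that is still under writeback. */
static bool model_try_release(struct model_subpage *sp)
{
	if (sp->page_writeback)
		return false;
	sp->private_set = false;
	return true;
}
```

With two bios each covering half the page, model_try_release() keeps
failing until the second model_clear_writeback() call, so in this model
the private data cannot go away while endio is still running.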

So this is very strange to me.

Can you reproduce it reliably on your side? Or is something specific to
Power involved here? (IIRC some page flag operations are not atomic;
maybe that is related?)

Thanks,
Qu
> [  133.934432] Faulting instruction address: 0xc000000000283654
> cpu 0x4: Vector: 380 (Data SLB Access) at [c000000007937160]
>      pc: c000000000283654: spin_dump+0x70/0xbc
>      lr: c000000000283638: spin_dump+0x54/0xbc
>      sp: c000000007937400
>     msr: 8000000000001033
>     dar: 6b6b6b6b6b6b725b
>    current = 0xc000000007913300
>    paca    = 0xc00000003fff9c00   irqmask: 0x03   irq_happened: 0x05
>      pid   = 0, comm = swapper/4
> Linux version 5.12.0-rc7-02317-g61d9ec0f765 (riteshh@ltctulc6a-p1) (gcc (Ubuntu 8.4.0-1ubuntu1~18.04) 8.4.0, GNU ld (GNU Binutils for Ubuntu) 2.30) #74 SMP Thu Apr 15 23:52:56 CDT 2021
> enter ? for help
> [c000000007937470] c000000000283078 do_raw_spin_unlock+0x88/0x230
> [c0000000079374a0] c0000000012b1e14 _raw_spin_unlock_irqrestore+0x44/0x90
> [c0000000079374d0] c000000000a918dc btrfs_subpage_clear_writeback+0xac/0xe0
> [c000000007937530] c0000000009e0458 end_bio_extent_writepage+0x158/0x270
> [c0000000079375f0] c000000000b6fd14 bio_endio+0x254/0x270
> [c000000007937630] c0000000009fc0f0 btrfs_end_bio+0x1a0/0x200
> [c000000007937670] c000000000b6fd14 bio_endio+0x254/0x270
> [c0000000079376b0] c000000000b781fc blk_update_request+0x46c/0x670
> [c000000007937760] c000000000b8b394 blk_mq_end_request+0x34/0x1d0
> [c0000000079377a0] c000000000d82d1c lo_complete_rq+0x11c/0x140
> [c0000000079377d0] c000000000b880a4 blk_complete_reqs+0x84/0xb0
> [c000000007937800] c0000000012b2ca4 __do_softirq+0x334/0x680
> [c000000007937910] c0000000001dd878 irq_exit+0x148/0x1d0
> [c000000007937940] c000000000016f4c do_IRQ+0x20c/0x240
> [c0000000079379d0] c000000000009240 hardware_interrupt_common_virt+0x1b0/0x1c0
> 
> 
> 
> 
>>
>> The fix has already been added to the GitHub branch,
>> which now has it as HEAD.
>>
>> I hope this won't damage your confidence in the patchset.
>>
>> Thanks for the report!
>> Qu
>>
>>>
>>> Thanks for the report,
>>> Qu
>>>>
>>>> BTW, are you running the latest branch, with this commit at the top?
> 
> Yes. Below branch.
> https://github.com/adam900710/linux/commits/subpage
> 
> -ritesh
> 
>>>>
>>>> commit 3490dae50c01cec04364e5288f43ae9ac9eca2c9
>>>> Author: Qu Wenruo <wqu@suse.com>
>>>> Date:   Mon Feb 22 14:19:38 2021 +0800
>>>>
>>>>      btrfs: allow read-write for 4K sectorsize on 64K page size systems
>>>>
>>>> I ask because I was updating the patchset until the last minute.
>>>>
>>>> Thanks,
>>>> Qu
>>>>
>>>>> [c0000000260d7880] c0000000009c7298 btrfs_buffered_write+0x488/0x7f0
>>>>> [c0000000260d79d0] c0000000009cbeb4 btrfs_file_write_iter+0x314/0x520
>>>>> [c0000000260d7a50] c00000000055fd84 do_iter_readv_writev+0x1b4/0x260
>>>>> [c0000000260d7ac0] c00000000056114c do_iter_write+0xdc/0x2c0
>>>>> [c0000000260d7b10] c0000000005c2d2c iter_file_splice_write+0x2ec/0x510
>>>>> [c0000000260d7c30] c0000000005c1ba0 do_splice_from+0x50/0x70
>>>>> [c0000000260d7c50] c0000000005c37e8 do_splice+0x5a8/0x910
>>>>> [c0000000260d7cd0] c0000000005c3ce0 sys_splice+0x190/0x300
>>>>> [c0000000260d7d60] c000000000039ba4 system_call_exception+0x384/0x3d0
>>>>> [c0000000260d7e10] c00000000000d45c system_call_common+0xec/0x278
>>>>> --- Exception: c00 (System Call) at 00007ffff72ef170
>>>>>
>>>>>
>>>>> -ritesh
>>>>>
> 

