All of lore.kernel.org
 help / color / mirror / Atom feed
From: Qu Wenruo <wqu@suse.com>
To: riteshh <riteshh@linux.ibm.com>, Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: Ritesh Harjani <ritesh.list@gmail.com>,
	Neal Gompa <ngompa13@gmail.com>,
	Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH v3 00/13] btrfs: support read-write for subpage metadata
Date: Wed, 21 Apr 2021 16:26:49 +0800	[thread overview]
Message-ID: <ca952f24-d0cd-fda7-c9c5-85eba3e7d04a@suse.com> (raw)
In-Reply-To: <20210421073020.fwu7a5t4ihl7gzao@riteshh-domain>



On 2021/4/21 下午3:30, riteshh wrote:
> On 21/04/21 12:33PM, riteshh wrote:
>> On 21/04/19 09:24PM, Qu Wenruo wrote:
>>> [...]
>>>>>
>>>>> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
>>>>> index 45ec3f5ef839..49f78d643392 100644
>>>>> --- a/fs/btrfs/file.c
>>>>> +++ b/fs/btrfs/file.c
>>>>> @@ -1341,7 +1341,17 @@ static int prepare_uptodate_page(struct inode
>>>>> *inode,
>>>>>                          unlock_page(page);
>>>>>                          return -EIO;
>>>>>                  }
>>>>> -               if (page->mapping != inode->i_mapping) {
>>>>> +
>>>>> +               /*
>>>>> +                * Since btrfs_readpage() will get the page unlocked, we
>>>>> have
>>>>> +                * a window where fadvice() can try to release the page.
>>>>> +                * Here we check both inode mapping and PagePrivate() to
>>>>> +                * make sure the page is not released.
>>>>> +                *
>>>>> +                * The priavte flag check is essential for subpage as we
>>>>> need
>>>>> +                * to store extra bitmap using page->private.
>>>>> +                */
>>>>> +               if (page->mapping != inode->i_mapping ||
>>>>> PagePrivate(page)) {
>>>>    ^ Obviously it should be !PagePrivate(page).
>>>
>>> Hi Ritesh,
>>>
>>> Mind to have another try on generic/095?
>>>
>>> This time the branch is updated with the following commit at top:
>>>
>>> commit d700b16dced6f2e2b47e1ca5588a92216ce84dfb (HEAD -> subpage,
>>> github/subpage)
>>> Author: Qu Wenruo <wqu@suse.com>
>>> Date:   Mon Apr 19 13:41:31 2021 +0800
>>>
>>>      btrfs: fix a crash caused by race between prepare_pages() and
>>>      btrfs_releasepage()
>>>
>>> The fix uses the PagePrivate() check to avoid the problem, and passes
>>> several generic/auto loops without any sign of crash.
>>>
>>> But considering I always have difficult in reproducing the bug with previous
>>> improper fix, your verification would be very helpful.
>>>
>>
>> Hi Qu,
>>
>> Thanks for the patch. I did try above patch but even with this I could still
>> reproduce the issue.
>>
>> 1. I think the original problem could be due to below logs.
>> 	[   79.079641] run fstests generic/095 at 2021-04-21 06:46:23
>> 	<...>
>> 	[   83.634710] Page cache invalidation failure on direct I/O.  Possible data corruption due to collision with buffered I/O!
>>
>> Meaning, there might be a race here between DIO and buffered IO.
>> So from DIO path we call invalidate_inode_pages2_range(). Somehow this maybe
>> causing call of btrfs_releasepage().
>>
>> Now from code, invalidate_inode_pages2_range() can be called from both
>> __iomap_dio_rw() and from iomap_dio_complete(). So it is not clear as to from
>> where this might be triggering this bug.
> 
> I think I got one of the problem.
> 1. we use page->private pointer as btrfs_subpage struct which also happens to
>     hold spinlock within it.
> 
>     Now in btrfs_subpage_clear_writeback()
>     -> we take this spinlock  spin_lock_irqsave(&subpage->lock, flags);
>     -> we call end_page_writeback(page);
>     		  -> this may end up waking up invalidate_inode_pages2_range()
> 		  which is waiting for writeback to complete.
> 			  -> this then may also call btrfs_releasepage() on the
> 			  same page and also free the subpage structure.

This indeeds looks like a problem.

This really means we need to have such a small race window below:
(btrfs_invalidatepage() doesn't seem to be possible to race considering
  how much work needed to be done in that function)

	Thread 1		|		Thread 2
--------------------------------+------------------------------------
  end_bio_extent_writepage()	| btrfs_releasepage()
  |- spin_lock_irqsave()		| |
  |- end_page_writeback()	| |
  |				| |- if (PageWriteback() ||...)
  |				| |- clear_page_extent_mapped()
  |- spin_unlock_irqrestore().

It looks like my arm boards are not fast enough to trigger the race.

Although it can be fixed by doing the same thing as dirty bit, by 
checking the bitmap first and then call end_page_writeback() with 
spinlock unlocked.

Would you please try the following fix? (based on the latest branch, 
which already has the previous fixes included).

I'm also running the tests on all my arm boards to make sure it doesn't 
cause extra problem, so far so good, but my board is far from fast, thus 
not yet 100% tested.

Thanks,
Qu

diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index 696485ab68a2..c5abf9745c10 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -420,13 +420,16 @@ void btrfs_subpage_clear_writeback(const struct 
btrfs_fs_info *fs_info,
  {
         struct btrfs_subpage *subpage = (struct btrfs_subpage 
*)page->private;
         u16 tmp = btrfs_subpage_calc_bitmap(fs_info, page, start, len);
+       bool finished = false;
         unsigned long flags;

         spin_lock_irqsave(&subpage->lock, flags);
         subpage->writeback_bitmap &= ~tmp;
         if (subpage->writeback_bitmap == 0)
-               end_page_writeback(page);
+               finished = true;
         spin_unlock_irqrestore(&subpage->lock, flags);
+       if (finished)
+               end_page_writeback(page);
  }

  void btrfs_subpage_set_ordered(const struct btrfs_fs_info *fs_info,

> 
>     -> then we call spin_unlock => here the btrfs_subpage structure got freed
>     but we still accessed and hence causing spinlock bug corruption
> 
> <below call traces were observed without any fixes>
> <i.e. tree contained patches till "btrfs: reject raid5/6 fs for subpage">
> [   79.079641] run fstests generic/095 at 2021-04-21 06:46:23
> [   81.118576] BTRFS: device fsid 0450e360-e0ea-4cff-9f84-3c6064437ef6 devid 1 transid 5 /dev/loop3 scanned by mkfs.btrfs (4669)
> [   81.208410] BTRFS info (device loop3): disk space caching is enabled
> [   81.209219] BTRFS info (device loop3): has skinny extents
> [   81.209849] BTRFS warning (device loop3): read-write for sector size 4096 with page size 65536 is experimental
> [   81.219579] BTRFS info (device loop3): checking UUID tree
> [   83.634710] Page cache invalidation failure on direct I/O.  Possible data corruption due to collision with buffered I/O!
> [   83.639921] File: /mnt1/scratch/file1 PID: 221 Comm: kworker/30:1
> [   85.130349] fio (4720) used greatest stack depth: 7808 bytes left
> [   87.022500] BUG: spinlock bad magic on CPU#26, swapper/26/0
> [   87.023457] BUG: Unable to handle kernel data access on write at 0x6b6b6b6b6b6b725b
> [   87.024776] Faulting instruction address: 0xc000000000283654
> cpu 0x1a: Vector: 380 (Data SLB Access) at [c000000007af7160]
>      pc: c000000000283654: spin_dump+0x70/0xbc
>      lr: c000000000283638: spin_dump+0x54/0xbc
>      sp: c000000007af7400
>     msr: 8000000000009033
>     dar: 6b6b6b6b6b6b725b
>    current = 0xc000000007ab9800
>    paca    = 0xc00000003ffc9a00   irqmask: 0x03   irq_happened: 0x01
>      pid   = 0, comm = swapper/26
> Linux version 5.12.0-rc7-02317-gee3f9a64895 (riteshh@ltctulc6a-p1) (gcc (Ubuntu 8.4.0-1ubuntu1~18.04) 8.4.0, GNU ld (GNU Binutils for Ubuntu) 2.30) #78 SMP Wed Apr 21 01:10:41 CDT 2021
> enter ? for help
> [c000000007af7470] c000000000283078 do_raw_spin_unlock+0x88/0x230
> [c000000007af74a0] c0000000012b1e34 _raw_spin_unlock_irqrestore+0x44/0x90
> [c000000007af74d0] c000000000a918fc btrfs_subpage_clear_writeback+0xac/0xe0
> [c000000007af7530] c0000000009e0478 end_bio_extent_writepage+0x158/0x270
> [c000000007af75f0] c000000000b6fd34 bio_endio+0x254/0x270
> [c000000007af7630] c0000000009fc110 btrfs_end_bio+0x1a0/0x200
> [c000000007af7670] c000000000b6fd34 bio_endio+0x254/0x270
> [c000000007af76b0] c000000000b7821c blk_update_request+0x46c/0x670
> [c000000007af7760] c000000000b8b3b4 blk_mq_end_request+0x34/0x1d0
> [c000000007af77a0] c000000000d82d3c lo_complete_rq+0x11c/0x140
> [c000000007af77d0] c000000000b880c4 blk_complete_reqs+0x84/0xb0
> [c000000007af7800] c0000000012b2cc4 __do_softirq+0x334/0x680
> [c000000007af7910] c0000000001dd878 irq_exit+0x148/0x1d0
> [c000000007af7940] c000000000016f4c do_IRQ+0x20c/0x240
> [c000000007af79d0] c000000000009240 hardware_interrupt_common_virt+0x1b0/0x1c0
> 
> -ritesh
> 


  reply	other threads:[~2021-04-21  8:27 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-25  7:14 [PATCH v3 00/13] btrfs: support read-write for subpage metadata Qu Wenruo
2021-03-25  7:14 ` [PATCH v3 01/13] btrfs: add sysfs interface for supported sectorsize Qu Wenruo
2021-03-25 14:41   ` Anand Jain
2021-03-29 18:20     ` David Sterba
2021-04-01 22:32       ` Anand Jain
2021-04-01 17:56   ` David Sterba
2021-03-25  7:14 ` [PATCH v3 02/13] btrfs: use min() to replace open-code in btrfs_invalidatepage() Qu Wenruo
2021-03-25  7:14 ` [PATCH v3 03/13] btrfs: remove unnecessary variable shadowing " Qu Wenruo
2021-03-25  7:14 ` [PATCH v3 04/13] btrfs: refactor how we iterate ordered extent " Qu Wenruo
2021-04-02  1:15   ` Anand Jain
2021-04-02  3:33     ` Qu Wenruo
2021-03-25  7:14 ` [PATCH v3 05/13] btrfs: introduce helpers for subpage dirty status Qu Wenruo
2021-04-01 18:11   ` David Sterba
2021-03-25  7:14 ` [PATCH v3 06/13] btrfs: introduce helpers for subpage writeback status Qu Wenruo
2021-03-25  7:14 ` [PATCH v3 07/13] btrfs: allow btree_set_page_dirty() to do more sanity check on subpage metadata Qu Wenruo
2021-03-25  7:14 ` [PATCH v3 08/13] btrfs: support subpage metadata csum calculation at write time Qu Wenruo
2021-03-25  7:14 ` [PATCH v3 09/13] btrfs: make alloc_extent_buffer() check subpage dirty bitmap Qu Wenruo
2021-03-25  7:14 ` [PATCH v3 10/13] btrfs: make the page uptodate assert to be subpage compatible Qu Wenruo
2021-03-25  7:14 ` [PATCH v3 11/13] btrfs: make set/clear_extent_buffer_dirty() " Qu Wenruo
2021-03-25  7:14 ` [PATCH v3 12/13] btrfs: make set_btree_ioerr() accept extent buffer and " Qu Wenruo
2021-03-25  7:14 ` [PATCH v3 13/13] btrfs: add subpage overview comments Qu Wenruo
2021-03-25 12:20 ` [PATCH v3 00/13] btrfs: support read-write for subpage metadata Neal Gompa
2021-03-25 13:16   ` Qu Wenruo
2021-03-28 20:02     ` Ritesh Harjani
2021-03-29  2:01       ` Qu Wenruo
2021-04-02  1:39         ` Anand Jain
2021-04-02  3:26           ` Qu Wenruo
2021-04-02  8:33         ` Ritesh Harjani
2021-04-02  8:36           ` Qu Wenruo
2021-04-02  8:46             ` Ritesh Harjani
2021-04-02  8:52               ` Qu Wenruo
2021-04-12 11:33                 ` Qu Wenruo
2021-04-15  3:44                   ` riteshh
2021-04-15 14:52                     ` riteshh
2021-04-15 23:19                       ` Qu Wenruo
2021-04-15 23:34                         ` Qu Wenruo
2021-04-16  1:34                           ` Qu Wenruo
2021-04-16  5:50                             ` riteshh
2021-04-16  6:14                               ` Qu Wenruo
2021-04-16 16:52                                 ` riteshh
2021-04-19  5:59                                   ` riteshh
2021-04-19  6:16                                     ` Qu Wenruo
2021-04-19  7:04                                       ` riteshh
2021-04-19  7:19                                       ` Qu Wenruo
2021-04-19 13:24                                         ` Qu Wenruo
2021-04-21  7:03                                           ` riteshh
2021-04-21  7:15                                             ` Qu Wenruo
2021-04-21  7:30                                             ` riteshh
2021-04-21  8:26                                               ` Qu Wenruo [this message]
2021-04-21 11:13                                                 ` riteshh
2021-04-21 11:42                                                   ` Qu Wenruo
2021-04-21 12:15                                                     ` riteshh
2021-03-29 18:53 ` David Sterba
2021-04-01  5:36   ` Qu Wenruo
2021-04-01 17:55     ` David Sterba
2021-04-02  1:27     ` Anand Jain
2021-04-03 11:08 ` David Sterba
2021-04-05  6:14   ` Qu Wenruo
2021-04-06  2:31     ` Anand Jain
2021-04-06 19:20       ` David Sterba
2021-04-06 23:59       ` Qu Wenruo
2021-04-06 19:13     ` David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ca952f24-d0cd-fda7-c9c5-85eba3e7d04a@suse.com \
    --to=wqu@suse.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=ngompa13@gmail.com \
    --cc=quwenruo.btrfs@gmx.com \
    --cc=ritesh.list@gmail.com \
    --cc=riteshh@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.