linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Hugh Dickins <hughd@google.com>
To: Song Liu <song@kernel.org>
Cc: Rongwei Wang <rongwei.wang@linux.alibaba.com>,
	Matthew Wilcox <willy@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linux MM <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	William Kucharski <william.kucharski@oracle.com>,
	Hugh Dickins <hughd@google.com>
Subject: Re: [PATCH v2 1/2] mm, thp: check page mapping when truncating page cache
Date: Wed, 29 Sep 2021 22:24:44 -0700 (PDT)	[thread overview]
Message-ID: <3d264ed9-f8fd-60d4-7125-f9f745ebeb52@google.com> (raw)
In-Reply-To: <CAPhsuW4_-ju9QgB7J4imrhQvH6ZqoOkVtVOVX11Yk_ZRakwQ+A@mail.gmail.com>

On Wed, 29 Sep 2021, Song Liu wrote:
> On Wed, Sep 29, 2021 at 6:54 PM Rongwei Wang
> <rongwei.wang@linux.alibaba.com> wrote:
> > On 9/30/21 7:41 AM, Song Liu wrote:
> > > On Wed, Sep 29, 2021 at 10:56 AM Matthew Wilcox <willy@infradead.org> wrote:
> > >>
> > > [...]
> > >>> Now, I am able to crash the system on
> > >>>      find_lock_entries () {
> > >>>       ...
> > >>>         VM_BUG_ON_PAGE(page->index != xas.xa_index, page);
> > >>>      }
> > >>> I guess it is related. I will test more.
> > >>
> > >> That's a bogus VM_BUG_ON.  I have a patch in my tree to delete it.
> > >> Andrew has it too, but for some reason, he hasn't sent it on to Linus.

I think Andrew has held back from sending it on to Linus because I pointed
out that the example Matthew cited (shmem and swap cache) was wrong, and
could not explain it: we wanted to understand what syzbot had actually
hit before sending on.

Would this bug be a good explanation for it?

In the meantime, independently, I was looking at fuse_try_move_page(),
which appears to do the old splice thievery that got disabled from splice
itself, stealing a page from one mapping to put into another.  I suspect
that could result in a page->index != xas.xa_index too (outside page lock).

> > >>
> > >> +++ b/mm/filemap.c
> > >> @@ -2093,7 +2093,6 @@ unsigned find_lock_entries(struct address_space *mapping, pgoff_t start,
> > >>                  if (!xa_is_value(page)) {
> > >>                          if (page->index < start)
> > >>                                  goto put;
> > >> -                       VM_BUG_ON_PAGE(page->index != xas.xa_index, page);
> > >>                          if (page->index + thp_nr_pages(page) - 1 > end)
> > >>                                  goto put;
> > >>                          if (!trylock_page(page))
> > >
> > > Yes, after removing this line, I am able to see the same bug.
> > >
> > > Here is my finding so far:
> > >
> > > The issue is NOT caused by concurrent khugepaged:collapse_file() and
> > > truncate_pagecache(inode, 0). With some printks, we can see a clear
> > > time gap (>2 second )  between collapse_file() finishes, and
> > > truncate_pagecache() (which crashes soon). Therefore, my earlier
> > > suggestion that adds deny_write_access() to collapse_file() does NOT
> > > work.
> > >
> > > The crash is actually caused by concurrent truncate_pagecache(inode, 0).
> > > If I change the number of write thread in stress_madvise_dso.c to one,
> > > (IOW, one thread_read and one thread_write), I cannot reproduce the
> > > crash anymore.
> > Whether CONFIG_DEBUG_VM is enabled in your vm?
> >
> > I think the second possibility mentioned above will been found if you
> > enable CONFIG_DEBUG_VM:
> >
> > 1) multiple writers truncate the same page cache concurrently;
> > 2) collapse_file rolls back when writer truncates the page cache;
> >
> > The following log will be print after enable CONFIG_DEBUG_VM:
> >
> > [22216.789904]  do_idle+0xb4/0x104
> > [22216.789906]  cpu_startup_entry+0x34/0x9c
> > [22216.790144] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS
> > 0.0.0 02/06/2015
> > [22216.790553]  secondary_start_kernel+0x104/0x180
> > [22216.790778] Call trace:
> > [22216.791300] Code: d4210000 b0006161 910d4021 94013b45 (d4210000)
> > [22216.791662]  dump_backtrace+0x0/0x1ec
> > [22216.791664]  show_stack+0x24/0x30
> > [22216.791956] ---[ end trace dc769a61c1af087b ]---
> > [22216.792295]  dump_stack+0xd0/0x128
> > [22216.792299]  bad_page+0xe4/0x110
> > [22216.792579] Kernel panic - not syncing: Oops - BUG: Fatal exception
> > in interrupt
> > [22216.792937]  check_free_page_bad+0x84/0x90
> > [22216.792940]  free_pcp_prepare+0x1fc/0x21c
> > [22216.793253] SMP: stopping secondary CPUs
> > [22216.793525]  free_unref_page+0x2c/0xec
> > [22216.805537]  __put_page+0x60/0x70
> > [22216.805931]  collapse_file+0xdc8/0x12f0
> > [22216.806385]  khugepaged_scan_file+0x2dc/0x37c
> > [22216.806900]  khugepaged_scan_mm_slot+0x2e0/0x380
> > [22216.807450]  khugepaged_do_scan+0x2dc/0x2fc
> > [22216.807946]  khugepaged+0x38/0x100
> > [22216.808342]  kthread+0x11c/0x120
> > [22216.808735] Kernel Offset: disabled
> > [22216.809153] CPU features: 0x0040002,62208238
> > [22216.809681] Memory Limit: none
> > [22216.813477] Starting crashdump kernel...
> >
> > So I think the race also exists between collapse_file and
> > truncate_pagecache.
> 
> I do have CONFIG_DEBUG_VM, but I haven't hit this issue yet.

Sorry, it's taken me a long time to get into this bug(s).

I haven't tried to reproduce it, but I do think that Rongwei will
be proved right.

In doing a lockless lookup of the page cache, we tend to imagine
that the THP head page will be encountered first, and the special
treatment for the head will do the right thing to skip the tails.

But when racing against collapse_file()'s rewind, I agree with
Rongwei that it is possible for truncation (or page cache cleanup
in this instance) to collect a pagevec which starts with a PageTail
some way into the compound page.

shmem_undo_range() (which shmem uses rather than truncate_pagecache())
would not call truncate_inode_page() on a THP tail: if it encounters a
tail, it will try to split the THP, and not call truncate_inode_page()
if it cannot.  Unless I'm inventing the memory, I now think that I did
encounter this race between truncate and collapse on huge shmem, and
had to re-craft my shmem_punch_compound() to handle it correctly.

If it is just a matter of collapse_file() rewind, I suppose we could
reverse the direction in which that proceeds; but I'm not convinced
that would be enough, and don't think we need such a "big" change.

Aside from the above page->index mischeck in find_lock_entries(),
I now think this bug needs nothing more than simply removing the
VM_BUG_ON_PAGE(PageTail(page), page) from truncate_inode_page().

(You could say move its page->mapping check before its PageTail
check; but the PageTail check would then never catch anything.)

Rongwei's patch went in the right direction, but why add N lines
if deleting one is good?  And for a while I thought that idea of
using filemap_invalidate_lock() was very good; but now think
the page lock we already have is better, and solve both races.

Hugh

  reply	other threads:[~2021-09-30  5:25 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-06 12:11 [PATCH 0/2] mm, thp: fix file-backed THP race in collapse_file Rongwei Wang
2021-09-06 12:11 ` [PATCH 1/2] mm, thp: check page mapping when truncating page cache Rongwei Wang
2021-09-07  2:49   ` Yu Xu
2021-09-07 18:08   ` Yang Shi
     [not found]     ` <38AF4DC8-5E6F-4568-B2E3-0434BD847BC9@linux.alibaba.com>
2021-09-08 21:48       ` Yang Shi
2021-09-13 14:49   ` [mm, thp] 20753096b6: BUG:unable_to_handle_page_fault_for_address kernel test robot
2021-09-06 12:12 ` [PATCH 2/2] mm, thp: bail out early in collapse_file for writeback page Rongwei Wang
2021-09-07 16:56   ` Yang Shi
     [not found]     ` <44BE85B4-692C-41E8-B5A0-C1E0B0272ACD@linux.alibaba.com>
2021-09-08 21:51       ` Yang Shi
2021-09-22  7:06 ` [PATCH v2 0/2] mm, thp: fix file-backed THP race in collapse_file and truncate pagecache Rongwei Wang
2021-09-22  7:06 ` [PATCH v2 1/2] mm, thp: check page mapping when truncating page cache Rongwei Wang
2021-09-22 11:37   ` Matthew Wilcox
2021-09-22 17:04     ` Rongwei Wang
2021-09-24  2:43       ` Andrew Morton
2021-09-24  3:08         ` Yang Shi
2021-09-24  3:35         ` Rongwei Wang
2021-09-24  7:12         ` Rongwei Wang
2021-09-27 22:24           ` Song Liu
2021-09-28 12:06             ` Matthew Wilcox
2021-09-28 16:59               ` Song Liu
2021-09-28 16:20             ` Rongwei Wang
2021-09-29  7:14               ` Song Liu
2021-09-29  7:50                 ` Rongwei Wang
2021-09-29 16:59                   ` Song Liu
2021-09-29 17:55                     ` Matthew Wilcox
2021-09-29 23:41                       ` Song Liu
2021-09-30  0:00                         ` Matthew Wilcox
2021-09-30  0:41                           ` Song Liu
2021-09-30  2:14                             ` Rongwei Wang
2021-10-04 17:26                             ` Rongwei Wang
2021-10-04 19:05                               ` Matthew Wilcox
2021-10-05  1:58                                 ` Rongwei Wang
2021-10-04 20:26                               ` Song Liu
2021-10-05  2:58                               ` Hugh Dickins
2021-10-05  3:07                                 ` Matthew Wilcox
2021-10-05  9:03                                 ` Rongwei Wang
2021-09-30  1:54                         ` Rongwei Wang
2021-09-30  3:26                           ` Song Liu
2021-09-30  5:24                             ` Hugh Dickins [this message]
2021-09-30 15:28                               ` Matthew Wilcox
2021-09-30 16:49                                 ` Hugh Dickins
2021-09-30 17:39                                   ` Yang Shi
2021-10-02 17:08                                     ` Matthew Wilcox
2021-10-04 18:28                                       ` Yang Shi
2021-10-04 19:31                                         ` Matthew Wilcox
2021-10-05  2:26                                           ` Hugh Dickins
2021-10-02  2:22                                   ` Rongwei Wang
2021-09-22  7:06 ` [PATCH v2 2/2] mm, thp: bail out early in collapse_file for writeback page Rongwei Wang
2021-10-06  2:18 ` [PATCH v3 v3 0/2] mm, thp: fix file-backed THP race in collapse_file and truncate pagecache Rongwei Wang
2021-10-06  2:18   ` [PATCH v3 v3 1/2] mm, thp: lock filemap when truncating page cache Rongwei Wang
2021-10-06  2:18   ` [PATCH v3 v3 2/2] mm, thp: bail out early in collapse_file for writeback page Rongwei Wang
2021-10-06  2:41     ` Matthew Wilcox
2021-10-06  8:39       ` Rongwei Wang
2021-10-06 17:58     ` Yang Shi
2021-10-11  2:22 ` [PATCH v4 0/2] mm, thp: fix file-backed THP race in collapse_file and truncate pagecache Rongwei Wang
2021-10-11  2:22   ` [PATCH v4 1/2] mm, thp: lock filemap when truncating page cache Rongwei Wang
2021-10-13  7:55     ` Rongwei Wang
2021-10-11  2:22   ` [PATCH v4 2/2] mm, thp: bail out early in collapse_file for writeback page Rongwei Wang
2021-10-11  3:08     ` Matthew Wilcox
2021-10-11  3:22       ` Rongwei Wang
2021-10-11  5:08     ` [PATCH v4 RESEND " Rongwei Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3d264ed9-f8fd-60d4-7125-f9f745ebeb52@google.com \
    --to=hughd@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=rongwei.wang@linux.alibaba.com \
    --cc=song@kernel.org \
    --cc=william.kucharski@oracle.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).