From: Rongwei Wang <rongwei.wang@linux.alibaba.com>
To: Song Liu <song@kernel.org>, Matthew Wilcox <willy@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Linux MM <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	William Kucharski <william.kucharski@oracle.com>,
	Hugh Dickins <hughd@google.com>
Subject: Re: [PATCH v2 1/2] mm, thp: check page mapping when truncating page cache
Date: Thu, 30 Sep 2021 09:54:04 +0800
Message-ID: <67906bf5-4de9-8433-3d70-cc8fc5cc2347@linux.alibaba.com>
In-Reply-To: <CAPhsuW6_2_LxQRrs7xF3omgO22+6goDR=bEjKGRopaS-pHJB2Q@mail.gmail.com>



On 9/30/21 7:41 AM, Song Liu wrote:
> On Wed, Sep 29, 2021 at 10:56 AM Matthew Wilcox <willy@infradead.org> wrote:
>>
> [...]
>>> Now, I am able to crash the system on
>>>      find_lock_entries () {
>>>       ...
>>>         VM_BUG_ON_PAGE(page->index != xas.xa_index, page);
>>>      }
>>> I guess it is related. I will test more.
>>
>> That's a bogus VM_BUG_ON.  I have a patch in my tree to delete it.
>> Andrew has it too, but for some reason, he hasn't sent it on to Linus.
>>
>> +++ b/mm/filemap.c
>> @@ -2093,7 +2093,6 @@ unsigned find_lock_entries(struct address_space *mapping, pgoff_t start,
>>                  if (!xa_is_value(page)) {
>>                          if (page->index < start)
>>                                  goto put;
>> -                       VM_BUG_ON_PAGE(page->index != xas.xa_index, page);
>>                          if (page->index + thp_nr_pages(page) - 1 > end)
>>                                  goto put;
>>                          if (!trylock_page(page))
> 
> Yes, after removing this line, I am able to see the same bug.
> 
> Here is my finding so far:
> 
> The issue is NOT caused by concurrent khugepaged:collapse_file() and
> truncate_pagecache(inode, 0). With some printks, we can see a clear
> time gap (>2 seconds) between when collapse_file() finishes and when
> truncate_pagecache() runs (which crashes soon after). Therefore, my earlier
> suggestion that adds deny_write_access() to collapse_file() does NOT
> work.
> 
> The crash is actually caused by concurrent truncate_pagecache(inode, 0).
> If I change the number of write threads in stress_madvise_dso.c to one,
> (IOW, one thread_read and one thread_write), I cannot reproduce the
> crash anymore.

Is CONFIG_DEBUG_VM enabled in your VM?

I think the second of the two possibilities below will also be found if
you enable CONFIG_DEBUG_VM:

1) multiple writers truncate the same page cache concurrently;
2) collapse_file rolls back while a writer truncates the page cache.

The following log is printed after enabling CONFIG_DEBUG_VM:

[22216.789904]  do_idle+0xb4/0x104
[22216.789906]  cpu_startup_entry+0x34/0x9c
[22216.790144] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 0.0.0 02/06/2015
[22216.790553]  secondary_start_kernel+0x104/0x180
[22216.790778] Call trace:
[22216.791300] Code: d4210000 b0006161 910d4021 94013b45 (d4210000)
[22216.791662]  dump_backtrace+0x0/0x1ec
[22216.791664]  show_stack+0x24/0x30
[22216.791956] ---[ end trace dc769a61c1af087b ]---
[22216.792295]  dump_stack+0xd0/0x128
[22216.792299]  bad_page+0xe4/0x110
[22216.792579] Kernel panic - not syncing: Oops - BUG: Fatal exception in interrupt
[22216.792937]  check_free_page_bad+0x84/0x90
[22216.792940]  free_pcp_prepare+0x1fc/0x21c
[22216.793253] SMP: stopping secondary CPUs
[22216.793525]  free_unref_page+0x2c/0xec
[22216.805537]  __put_page+0x60/0x70
[22216.805931]  collapse_file+0xdc8/0x12f0
[22216.806385]  khugepaged_scan_file+0x2dc/0x37c
[22216.806900]  khugepaged_scan_mm_slot+0x2e0/0x380
[22216.807450]  khugepaged_do_scan+0x2dc/0x2fc
[22216.807946]  khugepaged+0x38/0x100
[22216.808342]  kthread+0x11c/0x120
[22216.808735] Kernel Offset: disabled
[22216.809153] CPU features: 0x0040002,62208238
[22216.809681] Memory Limit: none
[22216.813477] Starting crashdump kernel...

So I think the race also exists between collapse_file and 
truncate_pagecache.
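
For reference, the access pattern under discussion can be sketched in
userspace roughly as follows. This is NOT stress_madvise_dso.c; every
name here (run_stress, reader_thread, writer_thread, MAX_WRITERS) is
hypothetical. It assumes that, on a kernel with read-only file THP
enabled, opening the file for write is what ends up calling
truncate_pagecache(inode, 0), and that an executable mapping with
MADV_HUGEPAGE makes the file a candidate for collapse_file(). Whether it
hits either race depends on kernel version, config and timing.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define MAX_WRITERS 16	/* arbitrary cap for this sketch */

struct stress_args {
	const char *path;	/* file to hammer, e.g. a large DSO (assumption) */
	int iterations;		/* loop count per thread */
};

/* Reader: map the file read+exec, request huge pages, touch every page. */
static void *reader_thread(void *p)
{
	const struct stress_args *a = p;

	for (int i = 0; i < a->iterations; i++) {
		struct stat st;
		int fd = open(a->path, O_RDONLY);

		if (fd < 0)
			continue;
		if (fstat(fd, &st) == 0 && st.st_size > 0) {
			size_t len = (size_t)st.st_size;
			void *m = mmap(NULL, len, PROT_READ | PROT_EXEC,
				       MAP_PRIVATE, fd, 0);

			if (m != MAP_FAILED) {
				volatile const char *c = m;
#ifdef MADV_HUGEPAGE
				/* Hint only; the return value is ignored. */
				madvise(m, len, MADV_HUGEPAGE);
#endif
				for (size_t off = 0; off < len; off += 4096)
					(void)c[off];
				munmap(m, len);
			}
		}
		close(fd);
	}
	return NULL;
}

/* Writer: only open for write and close; no data is written, no O_TRUNC. */
static void *writer_thread(void *p)
{
	const struct stress_args *a = p;

	for (int i = 0; i < a->iterations; i++) {
		int fd = open(a->path, O_RDWR);

		if (fd >= 0)
			close(fd);
	}
	return NULL;
}

/* Start one reader and n_writers writers; 0 on success, -1 on error. */
int run_stress(const char *path, int n_writers, int iterations)
{
	struct stress_args args = { path, iterations };
	pthread_t tids[MAX_WRITERS + 1];
	int started = 0, ret = 0;

	assert(path != NULL);
	if (n_writers < 1 || n_writers > MAX_WRITERS)
		return -1;

	if (pthread_create(&tids[started], NULL, reader_thread, &args) == 0)
		started++;
	else
		ret = -1;
	for (int i = 0; ret == 0 && i < n_writers; i++) {
		if (pthread_create(&tids[started], NULL, writer_thread, &args) == 0)
			started++;
		else
			ret = -1;
	}
	for (int i = 0; i < started; i++)
		pthread_join(tids[i], NULL);
	return ret;
}
```

With n_writers == 1 only case 2) above can apply; with n_writers >= 2
case 1) becomes possible as well, which matches the observation that the
crash went away with a single write thread.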

> 
> I think this means we cannot fix this issue in collapse_file(), because it
> finishes long before the crash.
> 
> Thanks,
> Song
> 
