From: Hugh Dickins <hughd@google.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: Yang Shi <shy828301@gmail.com>, Hao Sun <sunhao.th@gmail.com>,
Hugh Dickins <hughd@google.com>, Song Liu <song@kernel.org>,
Rongwei Wang <rongwei.wang@linux.alibaba.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linux MM <linux-mm@kvack.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
William Kucharski <william.kucharski@oracle.com>
Subject: Re: [PATCH v2 1/2] mm, thp: check page mapping when truncating page cache
Date: Mon, 4 Oct 2021 19:26:23 -0700 (PDT)
Message-ID: <a07564a3-b2fc-9ffe-3ace-3f276075ea5c@google.com>
In-Reply-To: <YVtWhVNFhLbA9+Tl@casper.infradead.org>
On Mon, 4 Oct 2021, Matthew Wilcox wrote:
> On Mon, Oct 04, 2021 at 11:28:45AM -0700, Yang Shi wrote:
> > On Sat, Oct 2, 2021 at 10:09 AM Matthew Wilcox <willy@infradead.org> wrote:
> > > On Thu, Sep 30, 2021 at 10:39:14AM -0700, Yang Shi wrote:
> > > > On Thu, Sep 30, 2021 at 9:49 AM Hugh Dickins <hughd@google.com> wrote:
> > > > > I assume you're thinking of one of the fuzzer blkdev ones:
> > > > > https://lore.kernel.org/linux-mm/CACkBjsbtF_peC7N_4mRfHML_BeiPe+O9DahTfr84puSG_J9rcQ@mail.gmail.com/
> > > > > or
> > > > > https://lore.kernel.org/lkml/CACkBjsYwLYLRmX8GpsDpMthagWOjWWrNxqY6ZLNQVr6yx+f5vA@mail.gmail.com/
> > > > >
> > > > > I haven't started on those ones yet: yes, I imagine one or both of those
> > > > > will need a further fix (S_ISREG() check somewhere if we're lucky; but
> > > > > could well be nastier); but for the bug in this thread, I expect
> > > >
> > > > Makes sense to me. We should be able to check S_ISREG() in khugepaged,
> > > > if it is not a regular file, just bail out. Sounds not that nasty to
> > > > me AFAIU.
> > >
> > > I don't see why we should have an S_ISREG() check. I agree it's not the
> > > intended usecase, but it ought to work fine. Unless there's something
> > > I'm missing?
> >
> > Check out this bug report:
> > https://lore.kernel.org/lkml/CACkBjsYwLYLRmX8GpsDpMthagWOjWWrNxqY6ZLNQVr6yx+f5vA@mail.gmail.com/
> > and the patch from me:
> > https://lore.kernel.org/linux-mm/20210917205731.262693-1-shy828301@gmail.com/
> >
> > I don't think we handle buffers correctly for file THP, right? My
> > patch is ad hoc, so I thought Hugh's suggestion makes some sense to
> > me. Why do we have THP collapsed for unintended usecase in the first
> > place?
>
> OK, I've done some more digging. I think what's going on with this
> report is userspace opens the block device RO, causes the page cache to
> be loaded with data, then khugepaged comes in and creates THPs.

Yes.

> What confuses me is that these THPs have private data attached to them.
> I don't know how that happens. If it's block device specific, then
> yes, something like your S_ISREG() idea should work fine. Otherwise,
> we might need to track down another problem.

Agreed, the file THP is created without PagePrivate, so the puzzle was
why the read-only cached page would later become page_has_private().
The C repro shows that it issues a BLKRRPART ioctl (0x125f), as well as
a BTRFS_IOC_ADD_DEV ioctl which might not be relevant. I didn't follow
BLKRRPART all the way down, but I imagine it has to attach buffer-heads
to re-read the partition table, which would explain it.
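
For reference, the ioctl number from the repro can be decoded by hand.
This is a minimal sketch that assumes only the generic Linux ioctl
encoding, where bits 0-7 hold the command number and bits 8-15 the type
byte on every architecture (higher size/direction bits vary by
architecture and are ignored). ioc_nr() and ioc_type() are hypothetical
stand-ins for the _IOC_NR()/_IOC_TYPE() macros in the uapi ioctl headers.

```c
#include <assert.h>

/*
 * Generic ioctl request layout assumed here: command number in bits 0-7,
 * type ("magic") byte in bits 8-15. Size and direction bits above them
 * are architecture-dependent and are not decoded.
 */
#define IOC_NR_BITS	8u
#define IOC_TYPE_BITS	8u
#define IOC_NR_SHIFT	0u
#define IOC_TYPE_SHIFT	(IOC_NR_SHIFT + IOC_NR_BITS)

/* Hypothetical helper: command number within its ioctl group. */
static unsigned int ioc_nr(unsigned int cmd)
{
	return (cmd >> IOC_NR_SHIFT) & ((1u << IOC_NR_BITS) - 1u);
}

/* Hypothetical helper: type byte naming the ioctl group. */
static unsigned int ioc_type(unsigned int cmd)
{
	return (cmd >> IOC_TYPE_SHIFT) & ((1u << IOC_TYPE_BITS) - 1u);
}
```

With these helpers, 0x125f decodes to type 0x12 (the group used by the
block-device ioctls in <linux/fs.h>) and number 95, matching the
BLKRRPART definition _IO(0x12,95).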

Aside from that particular ioctl, it seems a good idea to insist on
S_ISREG just to shrink the attack surface: as Yang Shi says, executable
THP on a block device was never an intended use case, and not one
anyone is likely to miss! And that fuzzer appears to delight in
tormenting /dev/nullb0, so let's just seal off that avenue.

You're right to have some doubt as to whether there might be other
ways for buffer-heads to get attached, even on a read-only regular
file; but no way has sprung to my mind, and READ_ONLY_THP_FOR_FS has
survived well in its intended usage. So I think we should proceed on
the assumption that no further bugs remain, and fix any that turn up.

I wasn't able to reproduce the problem with the repro; doing so would
take many hours. But here's the untested S_ISREG patch I came up with.
Sorry, I've mixed something else in: while moving the alignment check
to clarify the conditions, I was alarmed to see that shmem with
!shmem_huge_enabled was falling through to the READ_ONLY_THP_FOR_FS
check and giving unexpected huge pages. I fixed that, though I later
found there's a separate shmem_huge_enabled() check which should
exclude it anyway.

--- 5.15-rc4/mm/khugepaged.c	2021-09-12 17:39:21.943438422 -0700
+++ linux/khugepaged.c	2021-10-03 20:41:13.194822795 -0700
@@ -445,22 +445,25 @@ static bool hugepage_vma_check(struct vm
 	if (!transhuge_vma_enabled(vma, vm_flags))
 		return false;
 
+	if (vma->vm_file && !IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) -
+				vma->vm_pgoff, HPAGE_PMD_NR))
+		return false;
+
 	/* Enabled via shmem mount options or sysfs settings. */
-	if (shmem_file(vma->vm_file) && shmem_huge_enabled(vma)) {
-		return IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
-				HPAGE_PMD_NR);
-	}
+	if (shmem_file(vma->vm_file))
+		return shmem_huge_enabled(vma);
 
 	/* THP settings require madvise. */
 	if (!(vm_flags & VM_HUGEPAGE) && !khugepaged_always())
 		return false;
 
 	/* Read-only file mappings need to be aligned for THP to work. */
-	if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && vma->vm_file &&
-	    !inode_is_open_for_write(vma->vm_file->f_inode) &&
-	    (vm_flags & VM_EXEC)) {
-		return IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
-				HPAGE_PMD_NR);
+	if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
+	    (vm_flags & VM_EXEC) && vma->vm_file) {
+		struct inode *inode = vma->vm_file->f_inode;
+
+		return !inode_is_open_for_write(inode) &&
+			S_ISREG(inode->i_mode);
 	}
 
 	if (!vma->anon_vma || vma->vm_ops)
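
To make the reordered conditions easier to follow, here is a minimal
user-space model of the file-backed part of hugepage_vma_check() as
patched above. It is a sketch, not kernel code: struct vma_model,
file_thp_aligned() and file_vma_collapsible() are hypothetical names; it
assumes CONFIG_READ_ONLY_THP_FOR_FS=y, 4 KiB pages and an HPAGE_PMD_NR
of 512; and it leaves out the transhuge_vma_enabled(), VM_HUGEPAGE /
"always" and anonymous-VMA checks.

```c
#define _DEFAULT_SOURCE /* expose S_IFREG/S_IFBLK under strict -std= modes */
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Assumed geometry: 4 KiB base pages, 2 MiB PMD-sized THPs (e.g. x86-64). */
#define PAGE_SHIFT		12
#define HPAGE_PMD_NR		512UL
#define IS_ALIGNED(x, a)	(((x) & ((a) - 1)) == 0)

/* Hypothetical stand-in for the VMA/inode state the real check consults. */
struct vma_model {
	unsigned long vm_start;		/* first virtual address of the VMA */
	unsigned long vm_pgoff;		/* file offset of vm_start, in pages */
	bool has_file;			/* vma->vm_file != NULL */
	bool is_shmem;			/* shmem_file(vma->vm_file) */
	bool shmem_huge_enabled;	/* shmem_huge_enabled(vma) */
	bool vm_exec;			/* vm_flags & VM_EXEC */
	bool open_for_write;		/* inode_is_open_for_write(inode) */
	mode_t i_mode;			/* inode->i_mode */
};

/*
 * A PMD mapping of file pages needs the virtual page index and the file
 * page offset to be congruent modulo HPAGE_PMD_NR. Unsigned wraparound in
 * the subtraction is harmless: the modulus of unsigned long arithmetic is
 * a multiple of HPAGE_PMD_NR.
 */
static bool file_thp_aligned(const struct vma_model *v)
{
	return IS_ALIGNED((v->vm_start >> PAGE_SHIFT) - v->vm_pgoff,
			  HPAGE_PMD_NR);
}

/* File-backed part of the patched decision, in the patch's order. */
static bool file_vma_collapsible(const struct vma_model *v)
{
	/* Alignment is now checked once, up front, for every file VMA. */
	if (v->has_file && !file_thp_aligned(v))
		return false;

	/* shmem decides for itself and no longer falls through below. */
	if (v->is_shmem)
		return v->shmem_huge_enabled;

	/* Read-only executable mappings: regular files only. */
	if (v->vm_exec && v->has_file)
		return !v->open_for_write && S_ISREG(v->i_mode);

	/* File VMAs normally have vm_ops, so the real tail rejects them too. */
	return false;
}
```

For example, a VMA starting at 0x401000 with vm_pgoff 1 passes the
alignment test (0x401 - 1 = 1024 = 2 x 512), yet the same VMA backed by
a block-device inode (S_IFBLK) is rejected by the S_ISREG() condition.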