linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Hugh Dickins <hughd@google.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: Yang Shi <shy828301@gmail.com>, Hao Sun <sunhao.th@gmail.com>,
	Hugh Dickins <hughd@google.com>, Song Liu <song@kernel.org>,
	Rongwei Wang <rongwei.wang@linux.alibaba.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linux MM <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	William Kucharski <william.kucharski@oracle.com>
Subject: Re: [PATCH v2 1/2] mm, thp: check page mapping when truncating page cache
Date: Mon, 4 Oct 2021 19:26:23 -0700 (PDT)	[thread overview]
Message-ID: <a07564a3-b2fc-9ffe-3ace-3f276075ea5c@google.com> (raw)
In-Reply-To: <YVtWhVNFhLbA9+Tl@casper.infradead.org>

On Mon, 4 Oct 2021, Matthew Wilcox wrote:
> On Mon, Oct 04, 2021 at 11:28:45AM -0700, Yang Shi wrote:
> > On Sat, Oct 2, 2021 at 10:09 AM Matthew Wilcox <willy@infradead.org> wrote:
> > > On Thu, Sep 30, 2021 at 10:39:14AM -0700, Yang Shi wrote:
> > > > On Thu, Sep 30, 2021 at 9:49 AM Hugh Dickins <hughd@google.com> wrote:
> > > > > I assume you're thinking of one of the fuzzer blkdev ones:
> > > > > https://lore.kernel.org/linux-mm/CACkBjsbtF_peC7N_4mRfHML_BeiPe+O9DahTfr84puSG_J9rcQ@mail.gmail.com/
> > > > > or
> > > > > https://lore.kernel.org/lkml/CACkBjsYwLYLRmX8GpsDpMthagWOjWWrNxqY6ZLNQVr6yx+f5vA@mail.gmail.com/
> > > > >
> > > > > I haven't started on those ones yet: yes, I imagine one or both of those
> > > > > will need a further fix (S_ISREG() check somewhere if we're lucky; but
> > > > > could well be nastier); but for the bug in this thread, I expect
> > > >
> > > > Makes sense to me. We should be able to check S_ISREG() in khugepaged,
> > > > if it is not a regular file, just bail out. Sounds not that nasty to
> > > > me AFAIU.
> > >
> > > I don't see why we should have an S_ISREG() check.  I agree it's not the
> > > intended usecase, but it ought to work fine.  Unless there's something
> > > I'm missing?
> > 
> > Check out this bug report:
> > https://lore.kernel.org/lkml/CACkBjsYwLYLRmX8GpsDpMthagWOjWWrNxqY6ZLNQVr6yx+f5vA@mail.gmail.com/
> > and the patch from me:
> > https://lore.kernel.org/linux-mm/20210917205731.262693-1-shy828301@gmail.com/
> > 
> > I don't think we handle buffers correctly for file THP, right? My
> > patch is ad hoc, so I thought Hugh's suggestion makes some sense to
> > me. Why do we have THP collapsed for unintended usecase in the first
> > place?
> 
> OK, I've done some more digging.  I think what's going on with this
> report is userspace opens the block device RO, causes the page cache to
> be loaded with data, then khugepaged comes in and creates THPs.

Yes.

> What confuses me is that these THPs have private data attached to them.
> I don't know how that happens.  If it's block device specific, then
> yes, something like your S_ISREG() idea should work fine.  Otherwise,
> we might need to track down another problem.

Agreed, the file THP is created without PagePrivate, so the puzzle was
why the read-only cached page would later become page_has_private().

The C repro showed that it uses (a BTRFS_IOC_ADD_DEV ioctl which might
not be relevant and) a BLKRRPART ioctl 0x125f: I didn't follow BLKRRPART
all the way down, but imagine it has to attach buffer-heads to re-read
the partition table.  Which would explain it.

Aside from that particular ioctl, it seems a good idea to insist on
S_ISREG just to shrink the attack surface: as Yang Shi says, executable
THP on block device was never an intended usecase, and not a usecase
anyone is likely to miss! And that fuzzer appears to delight in
tormenting /dev/nullb0, so let's just seal off that avenue.

You're right to have some doubt, as to whether there might be other
ways for buffer-heads to get attached, even on a read-only regular
file; but no way has sprung to my mind, and READ_ONLY_THP_FOR_FS has
survived well in its intended usage: so I think we should proceed on
the assumption that no further bugs remain - then fix them when found.

I wasn't able to reproduce the problem with the repro, would need to
waste many hours to do so.  But here's the untested S_ISREG patch I
came up with.  Sorry, I've mixed something else in: in moving the
alignment part to clarify the conditions, I was alarmed to see that
shmem with !shmem_huge_enabled was falling through to THP_FOR_FS to
give unexpected huge pages: fixed that, though later found there's
a separate shmem_huge_enabled() check which should exclude it.

--- 5.15-rc4/mm/khugepaged.c	2021-09-12 17:39:21.943438422 -0700
+++ linux/khugepaged.c	2021-10-03 20:41:13.194822795 -0700
@@ -445,22 +445,25 @@ static bool hugepage_vma_check(struct vm
 	if (!transhuge_vma_enabled(vma, vm_flags))
 		return false;
 
+	if (vma->vm_file && !IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) -
+					vma->vm_pgoff, HPAGE_PMD_NR))
+		return false;
+
 	/* Enabled via shmem mount options or sysfs settings. */
-	if (shmem_file(vma->vm_file) && shmem_huge_enabled(vma)) {
-		return IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
-				HPAGE_PMD_NR);
-	}
+	if (shmem_file(vma->vm_file))
+		return shmem_huge_enabled(vma);
 
 	/* THP settings require madvise. */
 	if (!(vm_flags & VM_HUGEPAGE) && !khugepaged_always())
 		return false;
 
 	/* Read-only file mappings need to be aligned for THP to work. */
-	if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && vma->vm_file &&
-	    !inode_is_open_for_write(vma->vm_file->f_inode) &&
-	    (vm_flags & VM_EXEC)) {
-		return IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
-				HPAGE_PMD_NR);
+	if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
+	    (vm_flags & VM_EXEC) && vma->vm_file) {
+		struct inode *inode = vma->vm_file->f_inode;
+
+	        return !inode_is_open_for_write(inode) &&
+			S_ISREG(inode->i_mode);
 	}
 
 	if (!vma->anon_vma || vma->vm_ops)

  reply	other threads:[~2021-10-05  2:26 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-06 12:11 [PATCH 0/2] mm, thp: fix file-backed THP race in collapse_file Rongwei Wang
2021-09-06 12:11 ` [PATCH 1/2] mm, thp: check page mapping when truncating page cache Rongwei Wang
2021-09-07  2:49   ` Yu Xu
2021-09-07 18:08   ` Yang Shi
     [not found]     ` <38AF4DC8-5E6F-4568-B2E3-0434BD847BC9@linux.alibaba.com>
2021-09-08 21:48       ` Yang Shi
2021-09-13 14:49   ` [mm, thp] 20753096b6: BUG:unable_to_handle_page_fault_for_address kernel test robot
2021-09-06 12:12 ` [PATCH 2/2] mm, thp: bail out early in collapse_file for writeback page Rongwei Wang
2021-09-07 16:56   ` Yang Shi
     [not found]     ` <44BE85B4-692C-41E8-B5A0-C1E0B0272ACD@linux.alibaba.com>
2021-09-08 21:51       ` Yang Shi
2021-09-22  7:06 ` [PATCH v2 0/2] mm, thp: fix file-backed THP race in collapse_file and truncate pagecache Rongwei Wang
2021-09-22  7:06 ` [PATCH v2 1/2] mm, thp: check page mapping when truncating page cache Rongwei Wang
2021-09-22 11:37   ` Matthew Wilcox
2021-09-22 17:04     ` Rongwei Wang
2021-09-24  2:43       ` Andrew Morton
2021-09-24  3:08         ` Yang Shi
2021-09-24  3:35         ` Rongwei Wang
2021-09-24  7:12         ` Rongwei Wang
2021-09-27 22:24           ` Song Liu
2021-09-28 12:06             ` Matthew Wilcox
2021-09-28 16:59               ` Song Liu
2021-09-28 16:20             ` Rongwei Wang
2021-09-29  7:14               ` Song Liu
2021-09-29  7:50                 ` Rongwei Wang
2021-09-29 16:59                   ` Song Liu
2021-09-29 17:55                     ` Matthew Wilcox
2021-09-29 23:41                       ` Song Liu
2021-09-30  0:00                         ` Matthew Wilcox
2021-09-30  0:41                           ` Song Liu
2021-09-30  2:14                             ` Rongwei Wang
2021-10-04 17:26                             ` Rongwei Wang
2021-10-04 19:05                               ` Matthew Wilcox
2021-10-05  1:58                                 ` Rongwei Wang
2021-10-04 20:26                               ` Song Liu
2021-10-05  2:58                               ` Hugh Dickins
2021-10-05  3:07                                 ` Matthew Wilcox
2021-10-05  9:03                                 ` Rongwei Wang
2021-09-30  1:54                         ` Rongwei Wang
2021-09-30  3:26                           ` Song Liu
2021-09-30  5:24                             ` Hugh Dickins
2021-09-30 15:28                               ` Matthew Wilcox
2021-09-30 16:49                                 ` Hugh Dickins
2021-09-30 17:39                                   ` Yang Shi
2021-10-02 17:08                                     ` Matthew Wilcox
2021-10-04 18:28                                       ` Yang Shi
2021-10-04 19:31                                         ` Matthew Wilcox
2021-10-05  2:26                                           ` Hugh Dickins [this message]
2021-10-02  2:22                                   ` Rongwei Wang
2021-09-22  7:06 ` [PATCH v2 2/2] mm, thp: bail out early in collapse_file for writeback page Rongwei Wang
2021-10-06  2:18 ` [PATCH v3 v3 0/2] mm, thp: fix file-backed THP race in collapse_file and truncate pagecache Rongwei Wang
2021-10-06  2:18   ` [PATCH v3 v3 1/2] mm, thp: lock filemap when truncating page cache Rongwei Wang
2021-10-06  2:18   ` [PATCH v3 v3 2/2] mm, thp: bail out early in collapse_file for writeback page Rongwei Wang
2021-10-06  2:41     ` Matthew Wilcox
2021-10-06  8:39       ` Rongwei Wang
2021-10-06 17:58     ` Yang Shi
2021-10-11  2:22 ` [PATCH v4 0/2] mm, thp: fix file-backed THP race in collapse_file and truncate pagecache Rongwei Wang
2021-10-11  2:22   ` [PATCH v4 1/2] mm, thp: lock filemap when truncating page cache Rongwei Wang
2021-10-13  7:55     ` Rongwei Wang
2021-10-11  2:22   ` [PATCH v4 2/2] mm, thp: bail out early in collapse_file for writeback page Rongwei Wang
2021-10-11  3:08     ` Matthew Wilcox
2021-10-11  3:22       ` Rongwei Wang
2021-10-11  5:08     ` [PATCH v4 RESEND " Rongwei Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a07564a3-b2fc-9ffe-3ace-3f276075ea5c@google.com \
    --to=hughd@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=rongwei.wang@linux.alibaba.com \
    --cc=shy828301@gmail.com \
    --cc=song@kernel.org \
    --cc=sunhao.th@gmail.com \
    --cc=william.kucharski@oracle.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).