linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Hugh Dickins <hughd@google.com>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Hugh Dickins <hughd@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Alistair Popple <apopple@nvidia.com>,
	Jason Gunthorpe <jgg@nvidia.com>,
	Ralph Campbell <rcampbell@nvidia.com>,
	Christoph Hellwig <hch@lst.de>, Yang Shi <shy828301@gmail.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Shakeel Butt <shakeelb@google.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 2/4] mm/rmap: fix old bug: munlocking THP missed other mlocks
Date: Thu, 8 Jul 2021 19:50:26 -0700 (PDT)	[thread overview]
Message-ID: <6c4d46aa-4d73-76a7-bcce-a09024768f63@google.com> (raw)
In-Reply-To: <20210708135811.775drqgwkwc76vcb@box.shutemov.name>

On Thu, 8 Jul 2021, Kirill A. Shutemov wrote:
> On Wed, Jul 07, 2021 at 01:08:53PM -0700, Hugh Dickins wrote:
> > The kernel recovers in due course from missing Mlocked pages: but there
> > was no point in calling page_mlock() (formerly known as try_to_munlock())
> > on a THP, because nothing got done even when it was found to be mapped in
> > another VM_LOCKED vma.
> > 
> > It's true that we need to be careful: Mlocked accounting of pte-mapped
> > THPs is too difficult (so consistently avoided); but Mlocked accounting
> > of only-pmd-mapped THPs is supposed to work, even when multiple mappings
> > are mlocked and munlocked or munmapped.  Refine the tests.
> 
> Well, that's true that it should be fine to mlock only-pmd-mapped THPs,
> but the refined check doesn't gurantee that the page is not mapped with
> PTEs. !PageDoubleMap(page) only guarantees that the page in not mapped
> with both PMDs and PTEs at the same time. For anon pages, we clear the
> flag when the last PMD mapping is gone and only PTEs left.
> 
> Do I miss some detail here? Maybe we exclude anon pages here somehow?
> I don't see it.

Yes, you're right, Kirill: thanks a lot for catching that.
PageDoubleMap: certainly not my favourite page flag!

And now that I've seen follow_trans_huge_pmd(), its comments, and its
goto skip_mlock on a PageAnon with compound_mapcount != 1, the right
fix for page_mlock() seems to be to skip over Anon THP altogether.

Here's a v2 of just this patch (others remain good): what do you think?

[PATCH v2 2/4] mm/rmap: fix old bug: munlocking THP missed other mlocks

The kernel recovers in due course from missing Mlocked pages: but there
was no point in calling page_mlock() (formerly known as try_to_munlock())
on a THP, because nothing got done even when it was found to be mapped in
another VM_LOCKED vma.

It's true that we need to be careful: Mlocked accounting of pte-mapped
THPs is too difficult (so consistently avoided); but Mlocked accounting
of only-pmd-mapped file THPs is supposed to work, even when multiple
mappings are mlocked and munlocked or munmapped.  Refine the tests.

Many thanks to Kirill for reminding that PageDoubleMap cannot be relied on
to warn of pte mappings in the Anon THP case; and a scan of subpages does
not seem appropriate here.  Note how follow_trans_huge_pmd() does not even
mark an Anon THP as mlocked when compound_mapcount != 1: multiple mlocking
of Anon THP is avoided, so simply return from page_mlock() in this case.

I said the kernel recovers: but would page reclaim be likely to split THP
before rediscovering that it's VM_LOCKED?  Apparently so.  I have worked
on a fix for that, but it's a different issue, and not something to rush.
Whereas page_mlock_one() could not be reviewed without fixing this first.

Fixes: 9a73f61bdb8a ("thp, mlock: do not mlock PTE-mapped file huge pages")
Signed-off-by: Hugh Dickins <hughd@google.com>
---
 mm/rmap.c | 42 +++++++++++++++++++++++++-----------------
 1 file changed, 25 insertions(+), 17 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 746013e282c3..f1d4edf9c696 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1440,20 +1440,20 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 		/*
 		 * If the page is mlock()d, we cannot swap it out.
 		 */
-		if (!(flags & TTU_IGNORE_MLOCK)) {
-			if (vma->vm_flags & VM_LOCKED) {
-				/* PTE-mapped THP are never mlocked */
-				if (!PageTransCompound(page)) {
-					/*
-					 * Holding pte lock, we do *not* need
-					 * mmap_lock here
-					 */
-					mlock_vma_page(page);
-				}
-				ret = false;
-				page_vma_mapped_walk_done(&pvmw);
-				break;
-			}
+		if (!(flags & TTU_IGNORE_MLOCK) &&
+		    (vma->vm_flags & VM_LOCKED)) {
+			/*
+			 * PTE-mapped THP are never marked as mlocked: so do
+			 * not set it on a DoubleMap THP, nor on an Anon THP
+			 * (which may still be PTE-mapped after DoubleMap was
+			 * cleared).  But stop unmapping even in those cases.
+			 */
+			if (!PageTransCompound(page) || (PageHead(page) &&
+			     !PageDoubleMap(page) && !PageAnon(page)))
+				mlock_vma_page(page);
+			page_vma_mapped_walk_done(&pvmw);
+			ret = false;
+			break;
 		}
 
 		/* Unexpected PMD-mapped THP? */
@@ -1984,9 +1984,13 @@ static bool page_mlock_one(struct page *page, struct vm_area_struct *vma,
 		 * munlock_vma_pages_range().
 		 */
 		if (vma->vm_flags & VM_LOCKED) {
-			/* PTE-mapped THP are never mlocked */
-			if (!PageTransCompound(page))
-				mlock_vma_page(page);
+			/*
+			 * PTE-mapped THP are never marked as mlocked; but
+			 * this function is never called on a DoubleMap THP,
+			 * nor on an Anon THP (which may still be PTE-mapped
+			 * after DoubleMap was cleared).
+			 */
+			mlock_vma_page(page);
 			page_vma_mapped_walk_done(&pvmw);
 		}
 
@@ -2020,6 +2024,10 @@ void page_mlock(struct page *page)
 	VM_BUG_ON_PAGE(!PageLocked(page) || PageLRU(page), page);
 	VM_BUG_ON_PAGE(PageCompound(page) && PageDoubleMap(page), page);
 
+	/* Anon THP are only marked as mlocked when singly mapped */
+	if (PageTransCompound(page) && PageAnon(page))
+		return;
+
 	rmap_walk(page, &rwc);
 }
 
-- 
2.26.2


  reply	other threads:[~2021-07-09  2:50 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-07 20:06 [PATCH 1/4] mm/rmap: fix comments left over from recent changes Hugh Dickins
2021-07-07 20:08 ` [PATCH 2/4] mm/rmap: fix old bug: munlocking THP missed other mlocks Hugh Dickins
2021-07-07 20:53   ` Shakeel Butt
2021-07-08 13:58   ` Kirill A. Shutemov
2021-07-09  2:50     ` Hugh Dickins [this message]
2021-07-09 10:56       ` Kirill A. Shutemov
2021-07-07 20:11 ` [PATCH 3/4] mm/rmap: fix new bug: premature return from page_mlock_one() Hugh Dickins
2021-07-07 20:22   ` Shakeel Butt
2021-07-07 23:14   ` Alistair Popple
2021-07-07 20:13 ` [PATCH 4/4] mm/rmap: try_to_migrate() skip zone_device !device_private Hugh Dickins
2021-07-07 20:54   ` Shakeel Butt
2021-07-07 23:25   ` Alistair Popple
2021-07-07 20:22 ` [PATCH 1/4] mm/rmap: fix comments left over from recent changes Shakeel Butt
2021-07-07 21:26 ` Yang Shi
2021-07-07 21:29   ` Yang Shi
2021-07-07 23:34 ` Alistair Popple

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6c4d46aa-4d73-76a7-bcce-a09024768f63@google.com \
    --to=hughd@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=apopple@nvidia.com \
    --cc=hch@lst.de \
    --cc=jgg@nvidia.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=rcampbell@nvidia.com \
    --cc=shakeelb@google.com \
    --cc=shy828301@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).