linux-mm.kvack.org archive mirror
From: Jann Horn <jannh@google.com>
To: Andrew Morton <akpm@linux-foundation.org>, linux-mm@kvack.org
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Zach O'Keefe <zokeefe@google.com>,
	linux-kernel@vger.kernel.org,
	David Hildenbrand <david@redhat.com>,
	Yang Shi <shy828301@gmail.com>
Subject: [PATCH] mm/khugepaged: Fix ->anon_vma race
Date: Wed, 11 Jan 2023 14:33:51 +0100
Message-ID: <20230111133351.807024-1-jannh@google.com>

If an ->anon_vma is attached to the VMA, collapse_and_free_pmd() requires
it to be locked. retract_page_tables() bails out if an ->anon_vma is
attached, but does this check before holding the mmap lock (as the comment
above the check explains).

If we racily merge an existing ->anon_vma (shared with a child process)
from a neighboring VMA, subsequent rmap traversals on pages belonging to
the child will be able to see the page tables that we are concurrently
removing while assuming that nothing else can access them.

Repeat the ->anon_vma check once we hold the mmap lock to ensure that there
really is no concurrent page table access.

Reported-by: Zach O'Keefe <zokeefe@google.com>
Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages")
Cc: stable@vger.kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
---
zokeefe@ pointed out to me that the current code (after my last round of patches)
can hit a lockdep assert by racing, and after staring at it a bit I've
convinced myself that this is a real, preexisting bug.
I haven't written a reproducer for it, though. One way to hit it might be
something along the lines of:

 - set up a process A with a private-file-mapping VMA V1
 - let A fork() to create process B, thereby copying V1 in A to V1' in B
 - let B extend the end of V1'
 - let B put some anon pages into the extended part of V1'
 - let A map a new private-file-mapping VMA V2 directly behind V1, without
   an anon_vma
[race begins here]
  - in A's thread 1: begin retract_page_tables() on V2, run through first
    ->anon_vma check
  - in A's thread 2: run __anon_vma_prepare() on V2 and ensure that it
    merges the anon_vma of V1 (which implies V1 and V2 must be mapping the
    same file at compatible offsets)
  - in B: trigger rmap traversal on anon page in V1'
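
For readers who want to see the ordering in isolation: the fix boils down to
"unlocked check, take the lock, check again". Below is a minimal userspace
sketch of that pattern, not kernel code. struct fake_vma, try_retract(),
attach_anon_vma() and the race_hook callback are hypothetical names made up
for illustration; the atomic_flag only stands in for mmap_write_trylock(),
and the hook deterministically simulates another thread attaching an
anon_vma in the window between the first check and the lock acquisition.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
enum scan_result { SCAN_RETRACTED, SCAN_PAGE_ANON, SCAN_LOCK_BUSY };

struct fake_vma {
	atomic_flag mmap_lock;     /* models the mmap lock taken in write mode */
	_Atomic(void *) anon_vma;  /* may be populated by a concurrent thread */
	bool page_table_present;
};

/* Called between the unlocked check and the trylock to simulate a racer. */
typedef void (*race_hook)(struct fake_vma *vma);

/* Simulated racer: attaches a dummy anon_vma to the VMA. */
void attach_anon_vma(struct fake_vma *vma)
{
	static int dummy_anon_vma;

	atomic_store(&vma->anon_vma, &dummy_anon_vma);
}

enum scan_result try_retract(struct fake_vma *vma, race_hook hook)
{
	enum scan_result result;

	/* First check without the lock: a cheap filter, but racy. */
	if (atomic_load_explicit(&vma->anon_vma, memory_order_relaxed))
		return SCAN_PAGE_ANON;

	/* Window in which another thread can still attach an anon_vma. */
	if (hook)
		hook(vma);

	/* Trylock: test_and_set returns true if the flag was already set. */
	if (atomic_flag_test_and_set_explicit(&vma->mmap_lock,
					      memory_order_acquire))
		return SCAN_LOCK_BUSY;

	/*
	 * Re-check under the lock. In the kernel, the write-mode mmap lock
	 * excludes __anon_vma_prepare(), which runs under the read lock;
	 * this model only demonstrates the order of the checks.
	 */
	if (atomic_load_explicit(&vma->anon_vma, memory_order_relaxed)) {
		result = SCAN_PAGE_ANON;
	} else {
		vma->page_table_present = false; /* "retract" the page table */
		result = SCAN_RETRACTED;
	}

	atomic_flag_clear_explicit(&vma->mmap_lock, memory_order_release);
	return result;
}
```

Calling try_retract() with a NULL hook retracts the page table; passing
attach_anon_vma as the hook makes the second check bail out with
SCAN_PAGE_ANON and leaves the page table in place. Without the second check,
the hooked case would go on to tear down the page table.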

 mm/khugepaged.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 5cb401aa2b9d..0bfed37f3a3b 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1644,7 +1644,7 @@ static int retract_page_tables(struct address_space *mapping, pgoff_t pgoff,
 		 * has higher cost too. It would also probably require locking
 		 * the anon_vma.
 		 */
-		if (vma->anon_vma) {
+		if (READ_ONCE(vma->anon_vma)) {
 			result = SCAN_PAGE_ANON;
 			goto next;
 		}
@@ -1672,6 +1672,18 @@ static int retract_page_tables(struct address_space *mapping, pgoff_t pgoff,
 		result = SCAN_PTE_MAPPED_HUGEPAGE;
 		if ((cc->is_khugepaged || is_target) &&
 		    mmap_write_trylock(mm)) {
+			/*
+			 * Re-check whether we have an ->anon_vma, because
+			 * collapse_and_free_pmd() requires that either no
+			 * ->anon_vma exists or the anon_vma is locked.
+			 * We already checked ->anon_vma above, but that check
+			 * is racy because ->anon_vma can be populated under the
+			 * mmap lock in read mode.
+			 */
+			if (vma->anon_vma) {
+				result = SCAN_PAGE_ANON;
+				goto unlock_next;
+			}
 			/*
 			 * When a vma is registered with uffd-wp, we can't
 			 * recycle the pmd pgtable because there can be pte

base-commit: 7dd4b804e08041ff56c88bdd8da742d14b17ed25
-- 
2.39.0.314.g84b9a713c41-goog
