From: Andrew Morton <akpm@linux-foundation.org>
To: akpm@linux-foundation.org, apopple@nvidia.com, hughd@google.com,
	kirill.shutemov@linux.intel.com, linux-mm@kvack.org,
	mm-commits@vger.kernel.org, peterx@redhat.com,
	rcampbell@nvidia.com, shy828301@gmail.com,
	stable@vger.kernel.org, torvalds@linux-foundation.org,
	wangyugui@e16-tech.com, will@kernel.org, willy@infradead.org,
	ziy@nvidia.com
Subject: [patch 09/24] mm/thp: fix page_vma_mapped_walk() if THP mapped by ptes
Date: Thu, 24 Jun 2021 18:39:26 -0700
Message-ID: <20210625013926.7ZTZ9B0S5%akpm@linux-foundation.org>
In-Reply-To: <20210624183838.ac3161ca4a43989665ac8b2f@linux-foundation.org>

From: Hugh Dickins <hughd@google.com>
Subject: mm/thp: fix page_vma_mapped_walk() if THP mapped by ptes

Running certain tests with a DEBUG_VM kernel would crash within hours, on
the total_mapcount BUG() in split_huge_page_to_list(), while trying to
free up some memory by punching a hole in a shmem huge page: split's
try_to_unmap() was unable to find all the mappings of the page (which, on
a !DEBUG_VM kernel, would then keep the huge page pinned in memory).

Crash dumps showed two tail pages of a shmem huge page remained mapped by
pte: ptes in a non-huge-aligned vma of a gVisor process, at the end of a
long unmapped range; and no page table had yet been allocated for the head
of the huge page to be mapped into.
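Running the address arithmetic shows how a shmem huge page can end up mapped by ptes that straddle two page tables. The sketch below is a user-space model, not kernel code. It assumes 4 KiB base pages and 2 MiB (512-page) huge pages, as on x86_64, and the model_* names are hypothetical. It applies the linear file-index-to-address formula that rmap uses and reports whether the huge page would cross a PMD boundary in a given VMA.

```c
#include <stdbool.h>

/* Assumed geometry (x86_64-like): 4 KiB pages, 2 MiB PMD-sized THPs. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_PMD_SIZE   (1UL << 21)
#define MODEL_HPAGE_NR   512UL   /* base pages per huge page */

/* Hypothetical helper: the virtual address at which file page 'pgoff'
 * appears in a VMA starting at 'vm_start' that maps the file from page
 * 'vm_pgoff'. Unsigned wrap-around makes pgoff < vm_pgoff yield an
 * address below vm_start, i.e. outside the VMA. */
static unsigned long model_vma_address(unsigned long vm_start,
				       unsigned long vm_pgoff,
				       unsigned long pgoff)
{
	return vm_start + ((pgoff - vm_pgoff) << MODEL_PAGE_SHIFT);
}

/* Hypothetical helper: true if the huge page whose head has file index
 * 'head_pgoff' would straddle a PMD boundary in this VMA, so it could
 * only be mapped by ptes spread over two page tables. */
static bool model_thp_straddles_pmd(unsigned long vm_start,
				    unsigned long vm_pgoff,
				    unsigned long head_pgoff)
{
	unsigned long first = model_vma_address(vm_start, vm_pgoff, head_pgoff);
	unsigned long last = first + (MODEL_HPAGE_NR << MODEL_PAGE_SHIFT) - 1;

	/* Compare the PMD-aligned regions holding the first and last byte. */
	return (first & ~(MODEL_PMD_SIZE - 1)) != (last & ~(MODEL_PMD_SIZE - 1));
}
```

As an example, take vm_start = 0x40001000 and vm_pgoff = 510. The head (file index 0) would sit at 0x3fe03000, below the VMA. Tail pages 510 and 511 would land at 0x40001000 and 0x40002000, inside the next PMD range.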

Although designed to handle these odd misaligned huge-page-mapped-by-pte
cases, page_vma_mapped_walk() falls short by returning false prematurely
when !pmd_present or !pud_present or !p4d_present or !pgd_present: there
are cases when a huge page may span the boundary, with ptes present in the
next.
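A toy two-level model can show the difference between the old early return and the new loop. It is hypothetical and runs in user space only. Each PMD slot is either absent (NULL) or points at a table of 512 pte values holding pfns, with 0 meaning "none". With give_up_on_hole set, the model behaves like the old code. Otherwise it advances to the next PMD boundary, as the patched page_vma_mapped_walk() does, and keeps looking until end.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy geometry, not the kernel's page table layout. */
#define TOY_PTRS_PER_PTE 512UL
#define TOY_NR_PMDS      4UL
#define TOY_PAGE_SIZE    4096UL
#define TOY_PMD_SIZE     (TOY_PTRS_PER_PTE * TOY_PAGE_SIZE)

/* Each slot is NULL ("!pmd_present") or a table of pfns (0 = no pte). */
struct toy_mm {
	const unsigned long *pmd[TOY_NR_PMDS];
};

/* Scan [addr, end) for a pte mapping a pfn in [pfn, pfn + nr). */
static bool toy_walk(const struct toy_mm *mm, unsigned long addr,
		     unsigned long end, unsigned long pfn, unsigned long nr,
		     bool give_up_on_hole)
{
	while (addr < end) {
		unsigned long idx = addr / TOY_PMD_SIZE;
		const unsigned long *table;
		unsigned long pte;

		if (idx >= TOY_NR_PMDS)
			break;
		table = mm->pmd[idx];
		if (!table) {
			if (give_up_on_hole)
				return false;	/* old behaviour: premature false */
			/* patched behaviour: step to the next PMD boundary */
			addr = (addr + TOY_PMD_SIZE) & ~(TOY_PMD_SIZE - 1);
			continue;
		}
		pte = table[(addr / TOY_PAGE_SIZE) % TOY_PTRS_PER_PTE];
		if (pte && pte >= pfn && pte < pfn + nr)
			return true;
		addr += TOY_PAGE_SIZE;
	}
	return false;
}
```

In this model the head's PMD slot is absent and two tail pages are mapped at the start of the next page table. The old-style walk returns false, while the patched-style walk finds the tail pages.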

Restructure page_vma_mapped_walk() as a loop to continue in these cases,
while keeping its layout much as before.  Add a step_forward() helper to
advance pvmw->address across those boundaries: originally I tried to use
mm's standard p?d_addr_end() macros, but hit the same crash 512 times less
often: because of the way redundant levels are folded together, but folded
differently in different configurations, it was just too difficult to use
them correctly; and step_forward() is simpler anyway.
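step_forward() rounds pvmw->address up to the next size-aligned boundary strictly above it. If that addition wraps to 0, it substitutes ULONG_MAX, so the "while (pvmw->address < end)" loop still terminates at the top of the address space. The arithmetic can be exercised directly with a user-space copy. This copy uses a hypothetical one-field stand-in for struct page_vma_mapped_walk and adds a power-of-two assert that the kernel helper does not have.

```c
#include <assert.h>
#include <limits.h>	/* ULONG_MAX */

/* Hypothetical stand-in: only the field step_forward() touches. */
struct pvmw_model {
	unsigned long address;
};

/* Same arithmetic as the patch's step_forward(). 'size' must be a power
 * of two (PGDIR_SIZE, P4D_SIZE, PUD_SIZE or PMD_SIZE in the kernel). */
static void step_forward_model(struct pvmw_model *pvmw, unsigned long size)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	pvmw->address = (pvmw->address + size) & ~(size - 1);
	if (!pvmw->address)		/* wrapped past the top */
		pvmw->address = ULONG_MAX;
}
```

An already aligned address advances by a full size. With a 2 MiB size, 0x400000 steps to 0x600000, so each call makes progress.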

Link: https://lkml.kernel.org/r/fedb8632-1798-de42-f39e-873551d5bc81@google.com
Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Wang Yugui <wangyugui@e16-tech.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/page_vma_mapped.c |   34 +++++++++++++++++++++++++---------
 1 file changed, 25 insertions(+), 9 deletions(-)

--- a/mm/page_vma_mapped.c~mm-thp-fix-page_vma_mapped_walk-if-thp-mapped-by-ptes
+++ a/mm/page_vma_mapped.c
@@ -116,6 +116,13 @@ static bool check_pte(struct page_vma_ma
 	return pfn_is_match(pvmw->page, pfn);
 }
 
+static void step_forward(struct page_vma_mapped_walk *pvmw, unsigned long size)
+{
+	pvmw->address = (pvmw->address + size) & ~(size - 1);
+	if (!pvmw->address)
+		pvmw->address = ULONG_MAX;
+}
+
 /**
  * page_vma_mapped_walk - check if @pvmw->page is mapped in @pvmw->vma at
  * @pvmw->address
@@ -183,16 +190,22 @@ bool page_vma_mapped_walk(struct page_vm
 	if (pvmw->pte)
 		goto next_pte;
 restart:
-	{
+	do {
 		pgd = pgd_offset(mm, pvmw->address);
-		if (!pgd_present(*pgd))
-			return false;
+		if (!pgd_present(*pgd)) {
+			step_forward(pvmw, PGDIR_SIZE);
+			continue;
+		}
 		p4d = p4d_offset(pgd, pvmw->address);
-		if (!p4d_present(*p4d))
-			return false;
+		if (!p4d_present(*p4d)) {
+			step_forward(pvmw, P4D_SIZE);
+			continue;
+		}
 		pud = pud_offset(p4d, pvmw->address);
-		if (!pud_present(*pud))
-			return false;
+		if (!pud_present(*pud)) {
+			step_forward(pvmw, PUD_SIZE);
+			continue;
+		}
 
 		pvmw->pmd = pmd_offset(pud, pvmw->address);
 		/*
@@ -239,7 +252,8 @@ restart:
 
 				spin_unlock(ptl);
 			}
-			return false;
+			step_forward(pvmw, PMD_SIZE);
+			continue;
 		}
 		if (!map_pte(pvmw))
 			goto next_pte;
@@ -269,7 +283,9 @@ next_pte:
 			spin_lock(pvmw->ptl);
 		}
 		goto this_pte;
-	}
+	} while (pvmw->address < end);
+
+	return false;
 }
 
 /**
_
