Linux-mm Archive on lore.kernel.org
 help / color / Atom feed
From: Hugh Dickins <hughd@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	 Andrea Arcangeli <aarcange@redhat.com>,
	Song Liu <songliubraving@fb.com>,
	 linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH] khugepaged: retract_page_tables() remember to test exit
Date: Sun, 2 Aug 2020 12:16:53 -0700 (PDT)
Message-ID: <alpine.LSU.2.11.2008021215400.27773@eggly.anvils> (raw)
In-Reply-To: <alpine.LSU.2.11.2008021204390.27773@eggly.anvils>

Only once have I seen this scenario (and forgot even to notice what
forced the eventual crash): a sequence of "BUG: Bad page map" alerts
from vm_normal_page(), from zap_pte_range() servicing exit_mmap();
pmd:00000000, pte values corresponding to data in physical page 0.

The pte mappings being zapped in this case were supposed to be from a
huge page of ext4 text (but could as well have been shmem): my belief
is that it was racing with collapse_file()'s retract_page_tables(),
found *pmd pointing to a page table, locked it, but *pmd had become
0 by the time start_pte was decided.

In most cases, that possibility is excluded by holding mmap lock;
but exit_mmap() proceeds without mmap lock.  Most of what's run by
khugepaged checks khugepaged_test_exit() after acquiring mmap lock:
khugepaged_collapse_pte_mapped_thps() and hugepage_vma_revalidate()
do so, for example.  But retract_page_tables() did not: fix that
(using an mm variable instead of vma->vm_mm repeatedly).

Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org # v4.8+
---

 mm/khugepaged.c |   24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

--- 5.8-rc7/mm/khugepaged.c	2020-07-26 16:58:02.189038680 -0700
+++ linux/mm/khugepaged.c	2020-08-02 10:53:37.892660983 -0700
@@ -1538,6 +1538,7 @@ out:
 static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
 {
 	struct vm_area_struct *vma;
+	struct mm_struct *mm;
 	unsigned long addr;
 	pmd_t *pmd, _pmd;
 
@@ -1566,7 +1567,8 @@ static void retract_page_tables(struct a
 			continue;
 		if (vma->vm_end < addr + HPAGE_PMD_SIZE)
 			continue;
-		pmd = mm_find_pmd(vma->vm_mm, addr);
+		mm = vma->vm_mm;
+		pmd = mm_find_pmd(mm, addr);
 		if (!pmd)
 			continue;
 		/*
@@ -1576,17 +1578,19 @@ static void retract_page_tables(struct a
 		 * mmap_lock while holding page lock. Fault path does it in
 		 * reverse order. Trylock is a way to avoid deadlock.
 		 */
-		if (mmap_write_trylock(vma->vm_mm)) {
-			spinlock_t *ptl = pmd_lock(vma->vm_mm, pmd);
-			/* assume page table is clear */
-			_pmd = pmdp_collapse_flush(vma, addr, pmd);
-			spin_unlock(ptl);
-			mmap_write_unlock(vma->vm_mm);
-			mm_dec_nr_ptes(vma->vm_mm);
-			pte_free(vma->vm_mm, pmd_pgtable(_pmd));
+		if (mmap_write_trylock(mm)) {
+			if (!khugepaged_test_exit(mm)) {
+				spinlock_t *ptl = pmd_lock(mm, pmd);
+				/* assume page table is clear */
+				_pmd = pmdp_collapse_flush(vma, addr, pmd);
+				spin_unlock(ptl);
+				mm_dec_nr_ptes(mm);
+				pte_free(mm, pmd_pgtable(_pmd));
+			}
+			mmap_write_unlock(mm);
 		} else {
 			/* Try again later */
-			khugepaged_add_pte_mapped_thp(vma->vm_mm, addr);
+			khugepaged_add_pte_mapped_thp(mm, addr);
 		}
 	}
 	i_mmap_unlock_write(mapping);


  parent reply index

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-02 19:12 [PATCH] khugepaged: collapse_pte_mapped_thp() flush the right range Hugh Dickins
2020-08-02 19:15 ` [PATCH] khugepaged: collapse_pte_mapped_thp() protect the pmd lock Hugh Dickins
2020-08-02 21:23   ` Kirill A. Shutemov
2020-08-02 19:16 ` Hugh Dickins [this message]
2020-08-02 21:44   ` [PATCH] khugepaged: retract_page_tables() remember to test exit Kirill A. Shutemov
2020-08-03  0:35     ` Hugh Dickins
2020-08-03  8:59       ` Kirill A. Shutemov
2020-08-02 19:18 ` [PATCH] khugepaged: khugepaged_test_exit() check mmget_still_valid() Hugh Dickins
2020-08-02 21:07 ` [PATCH] khugepaged: collapse_pte_mapped_thp() flush the right range Kirill A. Shutemov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LSU.2.11.2008021215400.27773@eggly.anvils \
    --to=hughd@google.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=songliubraving@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-mm Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \
		linux-mm@kvack.org
	public-inbox-index linux-mm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kvack.linux-mm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git