linux-mm.kvack.org archive mirror
* [PATCH] khugepaged: collapse_pte_mapped_thp() flush the right range
@ 2020-08-02 19:12 Hugh Dickins
  2020-08-02 19:15 ` [PATCH] khugepaged: collapse_pte_mapped_thp() protect the pmd lock Hugh Dickins
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Hugh Dickins @ 2020-08-02 19:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Andrea Arcangeli, Song Liu, linux-kernel, linux-mm

pmdp_collapse_flush() should be given the start address at which the huge
page is mapped, haddr: it was given addr, which at that point has been
used as a local variable, incremented to the end address of the extent.

Found by source inspection while chasing a hugepage locking bug, which
I then could not explain by this. At first I thought this was very bad;
then saw that all of the page translations that were not flushed would
actually still point to the right pages afterwards, so harmless; then
realized that I know nothing of how different architectures and models
cache intermediate paging structures, so maybe it matters after all -
particularly since the page table concerned is immediately freed.

Much easier to fix than to think about.
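
For context, a minimal sketch of the pte-walk loop in
collapse_pte_mapped_thp() (simplified, not the exact upstream code) shows
how addr ends up past the mapping by the time the pmd is collapsed:

	/* step 2: walk the ptes covering the huge page */
	for (i = 0, addr = haddr, pte = start_pte;
	     i < HPAGE_PMD_NR; i++, addr += PAGE_SIZE, pte++) {
		/* ... check that each pte maps the expected subpage ... */
	}
	/* here addr == haddr + HPAGE_PMD_SIZE, one extent past the start */

	/* step 4: collapse pmd - must flush from the start of the extent */
	_pmd = pmdp_collapse_flush(vma, haddr, pmd);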

Fixes: 27e1f8273113 ("khugepaged: enable collapse pmd for pte-mapped THP")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org # v5.4+
---

 mm/khugepaged.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- 5.8-rc7/mm/khugepaged.c	2020-07-26 16:58:02.189038680 -0700
+++ linux/mm/khugepaged.c	2020-08-02 10:48:59.890925896 -0700
@@ -1502,7 +1502,7 @@ void collapse_pte_mapped_thp(struct mm_s
 
 	/* step 4: collapse pmd */
 	ptl = pmd_lock(vma->vm_mm, pmd);
-	_pmd = pmdp_collapse_flush(vma, addr, pmd);
+	_pmd = pmdp_collapse_flush(vma, haddr, pmd);
 	spin_unlock(ptl);
 	mm_dec_nr_ptes(mm);
 	pte_free(mm, pmd_pgtable(_pmd));



* [PATCH] khugepaged: collapse_pte_mapped_thp() protect the pmd lock
  2020-08-02 19:12 [PATCH] khugepaged: collapse_pte_mapped_thp() flush the right range Hugh Dickins
@ 2020-08-02 19:15 ` Hugh Dickins
  2020-08-02 21:23   ` Kirill A. Shutemov
  2020-08-02 19:16 ` [PATCH] khugepaged: retract_page_tables() remember to test exit Hugh Dickins
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Hugh Dickins @ 2020-08-02 19:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Andrea Arcangeli, Song Liu, linux-kernel, linux-mm

When retract_page_tables() removes a page table to make way for a huge
pmd, it holds huge page lock, i_mmap_lock_write, mmap_write_trylock and
pmd lock; but when collapse_pte_mapped_thp() does the same (to handle
the case when the original mmap_write_trylock had failed), only
mmap_write_trylock and pmd lock are held.

That's not enough.  One machine has twice crashed under load, with
"BUG: spinlock bad magic" and GPF on 6b6b6b6b6b6b6b6b.  Examining the
second crash, page_vma_mapped_walk_done()'s spin_unlock of pvmw->ptl
(serving page_referenced() on a file THP, that had found a page table
at *pmd) discovers that the page table page and its lock have already
been freed by the time it comes to unlock.

Follow the example of retract_page_tables(), but we only need one of
huge page lock or i_mmap_lock_write to secure against this: because it's
the narrower lock, and because it simplifies collapse_pte_mapped_thp()
to know the hpage earlier, choose to rely on huge page lock here.
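
In outline, the flow after this patch (a simplified sketch; the full diff
is below) takes the huge page lock up front and holds it across the pmd
collapse, so rmap walkers serialize on it:

	hpage = find_lock_page(vma->vm_file->f_mapping,
			       linear_page_index(vma, haddr));
	if (!hpage)
		return;
	/* ... steps 1-3: verify and clear the pte mappings ... */
	ptl = pmd_lock(vma->vm_mm, pmd);
	_pmd = pmdp_collapse_flush(vma, haddr, pmd);
	spin_unlock(ptl);
	/* ... free the page table ... */
	unlock_page(hpage);
	put_page(hpage);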

Fixes: 27e1f8273113 ("khugepaged: enable collapse pmd for pte-mapped THP")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org # v5.4+
---

 mm/khugepaged.c |   44 +++++++++++++++++++-------------------------
 1 file changed, 19 insertions(+), 25 deletions(-)

--- 5.8-rc7/mm/khugepaged.c	2020-07-26 16:58:02.189038680 -0700
+++ linux/mm/khugepaged.c	2020-08-02 10:51:02.127688808 -0700
@@ -1412,7 +1412,7 @@ void collapse_pte_mapped_thp(struct mm_s
 {
 	unsigned long haddr = addr & HPAGE_PMD_MASK;
 	struct vm_area_struct *vma = find_vma(mm, haddr);
-	struct page *hpage = NULL;
+	struct page *hpage;
 	pte_t *start_pte, *pte;
 	pmd_t *pmd, _pmd;
 	spinlock_t *ptl;
@@ -1432,9 +1432,17 @@ void collapse_pte_mapped_thp(struct mm_s
 	if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE))
 		return;
 
+	hpage = find_lock_page(vma->vm_file->f_mapping,
+			       linear_page_index(vma, haddr));
+	if (!hpage)
+		return;
+
+	if (!PageHead(hpage))
+		goto drop_hpage;
+
 	pmd = mm_find_pmd(mm, haddr);
 	if (!pmd)
-		return;
+		goto drop_hpage;
 
 	start_pte = pte_offset_map_lock(mm, pmd, haddr, &ptl);
 
@@ -1453,30 +1461,11 @@ void collapse_pte_mapped_thp(struct mm_s
 
 		page = vm_normal_page(vma, addr, *pte);
 
-		if (!page || !PageCompound(page))
-			goto abort;
-
-		if (!hpage) {
-			hpage = compound_head(page);
-			/*
-			 * The mapping of the THP should not change.
-			 *
-			 * Note that uprobe, debugger, or MAP_PRIVATE may
-			 * change the page table, but the new page will
-			 * not pass PageCompound() check.
-			 */
-			if (WARN_ON(hpage->mapping != vma->vm_file->f_mapping))
-				goto abort;
-		}
-
 		/*
-		 * Confirm the page maps to the correct subpage.
-		 *
-		 * Note that uprobe, debugger, or MAP_PRIVATE may change
-		 * the page table, but the new page will not pass
-		 * PageCompound() check.
+		 * Note that uprobe, debugger, or MAP_PRIVATE may change the
+		 * page table, but the new page will not be a subpage of hpage.
 		 */
-		if (WARN_ON(hpage + i != page))
+		if (hpage + i != page)
 			goto abort;
 		count++;
 	}
@@ -1495,7 +1484,7 @@ void collapse_pte_mapped_thp(struct mm_s
 	pte_unmap_unlock(start_pte, ptl);
 
 	/* step 3: set proper refcount and mm_counters. */
-	if (hpage) {
+	if (count) {
 		page_ref_sub(hpage, count);
 		add_mm_counter(vma->vm_mm, mm_counter_file(hpage), -count);
 	}
@@ -1506,10 +1495,15 @@ void collapse_pte_mapped_thp(struct mm_s
 	spin_unlock(ptl);
 	mm_dec_nr_ptes(mm);
 	pte_free(mm, pmd_pgtable(_pmd));
+
+drop_hpage:
+	unlock_page(hpage);
+	put_page(hpage);
 	return;
 
 abort:
 	pte_unmap_unlock(start_pte, ptl);
+	goto drop_hpage;
 }
 
 static int khugepaged_collapse_pte_mapped_thps(struct mm_slot *mm_slot)



* [PATCH] khugepaged: retract_page_tables() remember to test exit
  2020-08-02 19:12 [PATCH] khugepaged: collapse_pte_mapped_thp() flush the right range Hugh Dickins
  2020-08-02 19:15 ` [PATCH] khugepaged: collapse_pte_mapped_thp() protect the pmd lock Hugh Dickins
@ 2020-08-02 19:16 ` Hugh Dickins
  2020-08-02 21:44   ` Kirill A. Shutemov
  2020-08-02 19:18 ` [PATCH] khugepaged: khugepaged_test_exit() check mmget_still_valid() Hugh Dickins
  2020-08-02 21:07 ` [PATCH] khugepaged: collapse_pte_mapped_thp() flush the right range Kirill A. Shutemov
  3 siblings, 1 reply; 11+ messages in thread
From: Hugh Dickins @ 2020-08-02 19:16 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Andrea Arcangeli, Song Liu, linux-kernel, linux-mm

Only once have I seen this scenario (and forgot even to notice what
forced the eventual crash): a sequence of "BUG: Bad page map" alerts
from vm_normal_page(), from zap_pte_range() servicing exit_mmap();
pmd:00000000, pte values corresponding to data in physical page 0.

The pte mappings being zapped in this case were supposed to be from a
huge page of ext4 text (but could as well have been shmem): my belief
is that it was racing with collapse_file()'s retract_page_tables(),
found *pmd pointing to a page table, locked it, but *pmd had become
0 by the time start_pte was decided.

In most cases, that possibility is excluded by holding mmap lock;
but exit_mmap() proceeds without mmap lock.  Most of what's run by
khugepaged checks khugepaged_test_exit() after acquiring mmap lock:
khugepaged_collapse_pte_mapped_thps() and hugepage_vma_revalidate()
do so, for example.  But retract_page_tables() did not: fix that
(using an mm variable instead of vma->vm_mm repeatedly).
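
In outline, the change (simplified from the diff below) re-checks for exit
once the mmap lock has been obtained, before touching the page table:

	if (mmap_write_trylock(mm)) {
		if (!khugepaged_test_exit(mm)) {
			/* safe: exit_mmap() cannot be racing with us */
			spinlock_t *ptl = pmd_lock(mm, pmd);
			_pmd = pmdp_collapse_flush(vma, addr, pmd);
			spin_unlock(ptl);
			mm_dec_nr_ptes(mm);
			pte_free(mm, pmd_pgtable(_pmd));
		}
		mmap_write_unlock(mm);
	}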

Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org # v4.8+
---

 mm/khugepaged.c |   24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

--- 5.8-rc7/mm/khugepaged.c	2020-07-26 16:58:02.189038680 -0700
+++ linux/mm/khugepaged.c	2020-08-02 10:53:37.892660983 -0700
@@ -1538,6 +1538,7 @@ out:
 static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
 {
 	struct vm_area_struct *vma;
+	struct mm_struct *mm;
 	unsigned long addr;
 	pmd_t *pmd, _pmd;
 
@@ -1566,7 +1567,8 @@ static void retract_page_tables(struct a
 			continue;
 		if (vma->vm_end < addr + HPAGE_PMD_SIZE)
 			continue;
-		pmd = mm_find_pmd(vma->vm_mm, addr);
+		mm = vma->vm_mm;
+		pmd = mm_find_pmd(mm, addr);
 		if (!pmd)
 			continue;
 		/*
@@ -1576,17 +1578,19 @@ static void retract_page_tables(struct a
 		 * mmap_lock while holding page lock. Fault path does it in
 		 * reverse order. Trylock is a way to avoid deadlock.
 		 */
-		if (mmap_write_trylock(vma->vm_mm)) {
-			spinlock_t *ptl = pmd_lock(vma->vm_mm, pmd);
-			/* assume page table is clear */
-			_pmd = pmdp_collapse_flush(vma, addr, pmd);
-			spin_unlock(ptl);
-			mmap_write_unlock(vma->vm_mm);
-			mm_dec_nr_ptes(vma->vm_mm);
-			pte_free(vma->vm_mm, pmd_pgtable(_pmd));
+		if (mmap_write_trylock(mm)) {
+			if (!khugepaged_test_exit(mm)) {
+				spinlock_t *ptl = pmd_lock(mm, pmd);
+				/* assume page table is clear */
+				_pmd = pmdp_collapse_flush(vma, addr, pmd);
+				spin_unlock(ptl);
+				mm_dec_nr_ptes(mm);
+				pte_free(mm, pmd_pgtable(_pmd));
+			}
+			mmap_write_unlock(mm);
 		} else {
 			/* Try again later */
-			khugepaged_add_pte_mapped_thp(vma->vm_mm, addr);
+			khugepaged_add_pte_mapped_thp(mm, addr);
 		}
 	}
 	i_mmap_unlock_write(mapping);



* [PATCH] khugepaged: khugepaged_test_exit() check mmget_still_valid()
  2020-08-02 19:12 [PATCH] khugepaged: collapse_pte_mapped_thp() flush the right range Hugh Dickins
  2020-08-02 19:15 ` [PATCH] khugepaged: collapse_pte_mapped_thp() protect the pmd lock Hugh Dickins
  2020-08-02 19:16 ` [PATCH] khugepaged: retract_page_tables() remember to test exit Hugh Dickins
@ 2020-08-02 19:18 ` Hugh Dickins
  2020-08-14 22:13   ` [PATCH] khugepaged: adjust VM_BUG_ON_MM() in __khugepaged_enter() Hugh Dickins
  2020-08-02 21:07 ` [PATCH] khugepaged: collapse_pte_mapped_thp() flush the right range Kirill A. Shutemov
  3 siblings, 1 reply; 11+ messages in thread
From: Hugh Dickins @ 2020-08-02 19:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Andrea Arcangeli, Song Liu, linux-kernel, linux-mm

Move collapse_huge_page()'s mmget_still_valid() check into
khugepaged_test_exit() itself.  collapse_huge_page() is used for anon
THP only, and earned its mmget_still_valid() check because it inserts
a huge pmd entry in place of the page table's pmd entry; whereas
collapse_file()'s retract_page_tables() or collapse_pte_mapped_thp()
merely clears the page table's pmd entry.  But core dumping without
mmap lock must have been as open to mistaking a racily cleared pmd
entry for a page table at physical page 0, as exit_mmap() was.  And
we certainly have no interest in mapping as a THP once dumping core.
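
For reference, mmget_still_valid() at this point is roughly the following
(quoted from memory of include/linux/sched/mm.h, so treat as approximate):
it simply checks that core dumping has not begun on the mm.

	static inline bool mmget_still_valid(struct mm_struct *mm)
	{
		return likely(!mm->core_state);
	}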

Fixes: 59ea6d06cfa9 ("coredump: fix race condition between collapse_huge_page() and core dumping")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org # v4.8+
---
Note on stable backporting: the coredump fix was backported as far as
v3.16, but this extension is only needed where shmem or file THP is
supported.

 mm/khugepaged.c |    5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

--- 5.8-rc7/mm/khugepaged.c	2020-07-26 16:58:02.189038680 -0700
+++ linux/mm/khugepaged.c	2020-08-02 10:56:11.105617241 -0700
@@ -431,7 +431,7 @@ static void insert_to_mm_slots_hash(stru
 
 static inline int khugepaged_test_exit(struct mm_struct *mm)
 {
-	return atomic_read(&mm->mm_users) == 0;
+	return atomic_read(&mm->mm_users) == 0 || !mmget_still_valid(mm);
 }
 
 static bool hugepage_vma_check(struct vm_area_struct *vma,
@@ -1100,9 +1100,6 @@ static void collapse_huge_page(struct mm
 	 * handled by the anon_vma lock + PG_lock.
 	 */
 	mmap_write_lock(mm);
-	result = SCAN_ANY_PROCESS;
-	if (!mmget_still_valid(mm))
-		goto out;
 	result = hugepage_vma_revalidate(mm, address, &vma);
 	if (result)
 		goto out;



* Re: [PATCH] khugepaged: collapse_pte_mapped_thp() flush the right range
  2020-08-02 19:12 [PATCH] khugepaged: collapse_pte_mapped_thp() flush the right range Hugh Dickins
                   ` (2 preceding siblings ...)
  2020-08-02 19:18 ` [PATCH] khugepaged: khugepaged_test_exit() check mmget_still_valid() Hugh Dickins
@ 2020-08-02 21:07 ` Kirill A. Shutemov
  3 siblings, 0 replies; 11+ messages in thread
From: Kirill A. Shutemov @ 2020-08-02 21:07 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Kirill A. Shutemov, Andrea Arcangeli, Song Liu,
	linux-kernel, linux-mm

On Sun, Aug 02, 2020 at 12:12:42PM -0700, Hugh Dickins wrote:
> pmdp_collapse_flush() should be given the start address at which the huge
> page is mapped, haddr: it was given addr, which at that point has been
> used as a local variable, incremented to the end address of the extent.
> 
> Found by source inspection while chasing a hugepage locking bug, which
> I then could not explain by this. At first I thought this was very bad;
> then saw that all of the page translations that were not flushed would
> actually still point to the right pages afterwards, so harmless; then
> realized that I know nothing of how different architectures and models
> cache intermediate paging structures, so maybe it matters after all -
> particularly since the page table concerned is immediately freed.
> 
> Much easier to fix than to think about.
> 
> Fixes: 27e1f8273113 ("khugepaged: enable collapse pmd for pte-mapped THP")
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Cc: stable@vger.kernel.org # v5.4+

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov



* Re: [PATCH] khugepaged: collapse_pte_mapped_thp() protect the pmd lock
  2020-08-02 19:15 ` [PATCH] khugepaged: collapse_pte_mapped_thp() protect the pmd lock Hugh Dickins
@ 2020-08-02 21:23   ` Kirill A. Shutemov
  0 siblings, 0 replies; 11+ messages in thread
From: Kirill A. Shutemov @ 2020-08-02 21:23 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Kirill A. Shutemov, Andrea Arcangeli, Song Liu,
	linux-kernel, linux-mm

On Sun, Aug 02, 2020 at 12:15:24PM -0700, Hugh Dickins wrote:
> When retract_page_tables() removes a page table to make way for a huge
> pmd, it holds huge page lock, i_mmap_lock_write, mmap_write_trylock and
> pmd lock; but when collapse_pte_mapped_thp() does the same (to handle
> the case when the original mmap_write_trylock had failed), only
> mmap_write_trylock and pmd lock are held.
> 
> That's not enough.  One machine has twice crashed under load, with
> "BUG: spinlock bad magic" and GPF on 6b6b6b6b6b6b6b6b.  Examining the
> second crash, page_vma_mapped_walk_done()'s spin_unlock of pvmw->ptl
> (serving page_referenced() on a file THP, that had found a page table
> at *pmd) discovers that the page table page and its lock have already
> been freed by the time it comes to unlock.
> 
> Follow the example of retract_page_tables(), but we only need one of
> huge page lock or i_mmap_lock_write to secure against this: because it's
> the narrower lock, and because it simplifies collapse_pte_mapped_thp()
> to know the hpage earlier, choose to rely on huge page lock here.
> 
> Fixes: 27e1f8273113 ("khugepaged: enable collapse pmd for pte-mapped THP")
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Cc: stable@vger.kernel.org # v5.4+

We could avoid the page cache lookup by locking the page on the first
valid PTE and recheck page->mapping, but this way is cleaner.

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov



* Re: [PATCH] khugepaged: retract_page_tables() remember to test exit
  2020-08-02 19:16 ` [PATCH] khugepaged: retract_page_tables() remember to test exit Hugh Dickins
@ 2020-08-02 21:44   ` Kirill A. Shutemov
  2020-08-03  0:35     ` Hugh Dickins
  0 siblings, 1 reply; 11+ messages in thread
From: Kirill A. Shutemov @ 2020-08-02 21:44 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Kirill A. Shutemov, Andrea Arcangeli, Song Liu,
	linux-kernel, linux-mm

On Sun, Aug 02, 2020 at 12:16:53PM -0700, Hugh Dickins wrote:
> Only once have I seen this scenario (and forgot even to notice what
> forced the eventual crash): a sequence of "BUG: Bad page map" alerts
> from vm_normal_page(), from zap_pte_range() servicing exit_mmap();
> pmd:00000000, pte values corresponding to data in physical page 0.
> 
> The pte mappings being zapped in this case were supposed to be from a
> huge page of ext4 text (but could as well have been shmem): my belief
> is that it was racing with collapse_file()'s retract_page_tables(),
> found *pmd pointing to a page table, locked it, but *pmd had become
> 0 by the time start_pte was decided.
> 
> In most cases, that possibility is excluded by holding mmap lock;
> but exit_mmap() proceeds without mmap lock.  Most of what's run by
> khugepaged checks khugepaged_test_exit() after acquiring mmap lock:
> khugepaged_collapse_pte_mapped_thps() and hugepage_vma_revalidate()
> do so, for example.  But retract_page_tables() did not: fix that
> (using an mm variable instead of vma->vm_mm repeatedly).

Hm. I'm not sure I follow. vma->vm_mm has to be valid as long as we hold
i_mmap lock, no? Unlinking a VMA requires it.

-- 
 Kirill A. Shutemov



* Re: [PATCH] khugepaged: retract_page_tables() remember to test exit
  2020-08-02 21:44   ` Kirill A. Shutemov
@ 2020-08-03  0:35     ` Hugh Dickins
  2020-08-03  8:59       ` Kirill A. Shutemov
  0 siblings, 1 reply; 11+ messages in thread
From: Hugh Dickins @ 2020-08-03  0:35 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Hugh Dickins, Andrew Morton, Kirill A. Shutemov,
	Andrea Arcangeli, Song Liu, linux-kernel, linux-mm

On Mon, 3 Aug 2020, Kirill A. Shutemov wrote:
> On Sun, Aug 02, 2020 at 12:16:53PM -0700, Hugh Dickins wrote:
> > Only once have I seen this scenario (and forgot even to notice what
> > forced the eventual crash): a sequence of "BUG: Bad page map" alerts
> > from vm_normal_page(), from zap_pte_range() servicing exit_mmap();
> > pmd:00000000, pte values corresponding to data in physical page 0.
> > 
> > The pte mappings being zapped in this case were supposed to be from a
> > huge page of ext4 text (but could as well have been shmem): my belief
> > is that it was racing with collapse_file()'s retract_page_tables(),
> > found *pmd pointing to a page table, locked it, but *pmd had become
> > 0 by the time start_pte was decided.
> > 
> > In most cases, that possibility is excluded by holding mmap lock;
> > but exit_mmap() proceeds without mmap lock.  Most of what's run by
> > khugepaged checks khugepaged_test_exit() after acquiring mmap lock:
> > khugepaged_collapse_pte_mapped_thps() and hugepage_vma_revalidate()
> > do so, for example.  But retract_page_tables() did not: fix that
> > (using an mm variable instead of vma->vm_mm repeatedly).
> 
> Hm. I'm not sure I follow. vma->vm_mm has to be valid as long as we hold
> i_mmap lock, no? Unlinking a VMA requires it.

Ah, my wording is misleading, yes.  That comment
"(using an mm variable instead of vma->vm_mm repeatedly)"
was nothing more than a note, that the patch is bigger than it could be,
because I decided to use an mm variable, instead of vma->vm_mm repeatedly.
But it looks as if I'm saying there used to be a need for READ_ONCE() or
something, and by using the mm variable I was fixing the problem.

No, sorry: delete that line now the point is made: the mm variable is
just a patch detail, it's not important.

The fix (as the subject suggested) is for retract_page_tables() to check
khugepaged_test_exit(), after acquiring mmap lock, before doing anything
to the page table.  Getting the mmap lock serializes with __mmput(),
which briefly takes and drops it in __khugepaged_exit(); then the
khugepaged_test_exit() check on mm_users makes sure we don't touch the
page table once exit_mmap() might reach it, since exit_mmap() will be
proceeding without mmap lock, not expecting anyone to be racing with it.
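
For reference, the serializing take-and-drop in __khugepaged_exit() looks
roughly like this (simplified sketch, not the full function):

	/* __khugepaged_exit(): wait for khugepaged to finish with this mm */
	mmap_write_lock(mm);
	mmap_write_unlock(mm);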

(I devised that protocol for ksmd, then Andrea adopted it for khugepaged:
back then it was important for these daemons to have a hold on the mm,
without an actual reference to mm_users, because that would prevent the
OOM killer from reaching exit_mmap().  Nowadays with the OOM reaper, it's
probably less crucial to avoid mm_users, but I think still worthwhile.)

Thanks a lot for looking at these patches so quickly,
Hugh



* Re: [PATCH] khugepaged: retract_page_tables() remember to test exit
  2020-08-03  0:35     ` Hugh Dickins
@ 2020-08-03  8:59       ` Kirill A. Shutemov
  0 siblings, 0 replies; 11+ messages in thread
From: Kirill A. Shutemov @ 2020-08-03  8:59 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Kirill A. Shutemov, Andrea Arcangeli, Song Liu,
	linux-kernel, linux-mm

On Sun, Aug 02, 2020 at 05:35:23PM -0700, Hugh Dickins wrote:
> On Mon, 3 Aug 2020, Kirill A. Shutemov wrote:
> > On Sun, Aug 02, 2020 at 12:16:53PM -0700, Hugh Dickins wrote:
> > > Only once have I seen this scenario (and forgot even to notice what
> > > forced the eventual crash): a sequence of "BUG: Bad page map" alerts
> > > from vm_normal_page(), from zap_pte_range() servicing exit_mmap();
> > > pmd:00000000, pte values corresponding to data in physical page 0.
> > > 
> > > The pte mappings being zapped in this case were supposed to be from a
> > > huge page of ext4 text (but could as well have been shmem): my belief
> > > is that it was racing with collapse_file()'s retract_page_tables(),
> > > found *pmd pointing to a page table, locked it, but *pmd had become
> > > 0 by the time start_pte was decided.
> > > 
> > > In most cases, that possibility is excluded by holding mmap lock;
> > > but exit_mmap() proceeds without mmap lock.  Most of what's run by
> > > khugepaged checks khugepaged_test_exit() after acquiring mmap lock:
> > > khugepaged_collapse_pte_mapped_thps() and hugepage_vma_revalidate()
> > > do so, for example.  But retract_page_tables() did not: fix that
> > > (using an mm variable instead of vma->vm_mm repeatedly).
> > 
> > Hm. I'm not sure I follow. vma->vm_mm has to be valid as long as we hold
> > i_mmap lock, no? Unlinking a VMA requires it.
> 
> Ah, my wording is misleading, yes.  That comment
> "(using an mm variable instead of vma->vm_mm repeatedly)"
> was nothing more than a note, that the patch is bigger than it could be,
> because I decided to use an mm variable, instead of vma->vm_mm repeatedly.
> But it looks as if I'm saying there used to be a need for READ_ONCE() or
> something, and by using the mm variable I was fixing the problem.
> 
> No, sorry: delete that line now the point is made: the mm variable is
> just a patch detail, it's not important.
> 
> The fix (as the subject suggested) is for retract_page_tables() to check
> khugepaged_test_exit(), after acquiring mmap lock, before doing anything
> to the page table.  Getting the mmap lock serializes with __mmput(),
> which briefly takes and drops it in __khugepaged_exit(); then the
> khugepaged_test_exit() check on mm_users makes sure we don't touch the
> page table once exit_mmap() might reach it, since exit_mmap() will be
> proceeding without mmap lock, not expecting anyone to be racing with it.

Okay, makes sense.

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov



* [PATCH] khugepaged: adjust VM_BUG_ON_MM() in __khugepaged_enter()
  2020-08-02 19:18 ` [PATCH] khugepaged: khugepaged_test_exit() check mmget_still_valid() Hugh Dickins
@ 2020-08-14 22:13   ` Hugh Dickins
  2020-08-17 20:50     ` Yang Shi
  0 siblings, 1 reply; 11+ messages in thread
From: Hugh Dickins @ 2020-08-14 22:13 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Kirill A. Shutemov, Andrea Arcangeli, Song Liu,
	Mike Kravetz, Eric Dumazet, linux-kernel, linux-mm

syzbot crashes on the VM_BUG_ON_MM(khugepaged_test_exit(mm), mm) in
__khugepaged_enter(): yes, when one thread is about to dump core, has set
core_state, and is waiting for others, another might do something calling
__khugepaged_enter(), which now crashes because I lumped the core_state
test (known as "mmget_still_valid") into khugepaged_test_exit().  I still
think it's best to lump them together, so just in this exceptional case,
check mm->mm_users directly instead of khugepaged_test_exit().

Reported-by: syzbot <syzkaller@googlegroups.com>
Fixes: bbe98f9cadff ("khugepaged: khugepaged_test_exit() check mmget_still_valid()")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org # v4.8+
---

 mm/khugepaged.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- v5.9-rc/mm/khugepaged.c	2020-08-12 19:46:50.867196579 -0700
+++ linux/mm/khugepaged.c	2020-08-14 14:24:32.739457309 -0700
@@ -466,7 +466,7 @@ int __khugepaged_enter(struct mm_struct
 		return -ENOMEM;
 
 	/* __khugepaged_exit() must not run from under us */
-	VM_BUG_ON_MM(khugepaged_test_exit(mm), mm);
+	VM_BUG_ON_MM(atomic_read(&mm->mm_users) == 0, mm);
 	if (unlikely(test_and_set_bit(MMF_VM_HUGEPAGE, &mm->flags))) {
 		free_mm_slot(mm_slot);
 		return 0;



* Re: [PATCH] khugepaged: adjust VM_BUG_ON_MM() in __khugepaged_enter()
  2020-08-14 22:13   ` [PATCH] khugepaged: adjust VM_BUG_ON_MM() in __khugepaged_enter() Hugh Dickins
@ 2020-08-17 20:50     ` Yang Shi
  0 siblings, 0 replies; 11+ messages in thread
From: Yang Shi @ 2020-08-17 20:50 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Kirill A. Shutemov, Andrea Arcangeli, Song Liu,
	Mike Kravetz, Eric Dumazet, Linux Kernel Mailing List, Linux MM

On Fri, Aug 14, 2020 at 3:13 PM Hugh Dickins <hughd@google.com> wrote:
>
> syzbot crashes on the VM_BUG_ON_MM(khugepaged_test_exit(mm), mm) in
> __khugepaged_enter(): yes, when one thread is about to dump core, has set
> core_state, and is waiting for others, another might do something calling
> __khugepaged_enter(), which now crashes because I lumped the core_state
> test (known as "mmget_still_valid") into khugepaged_test_exit().  I still
> think it's best to lump them together, so just in this exceptional case,
> check mm->mm_users directly instead of khugepaged_test_exit().
>
> Reported-by: syzbot <syzkaller@googlegroups.com>
> Fixes: bbe98f9cadff ("khugepaged: khugepaged_test_exit() check mmget_still_valid()")
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Cc: stable@vger.kernel.org # v4.8+

Acked-by: Yang Shi <shy828301@gmail.com>

> ---
>
>  mm/khugepaged.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> --- v5.9-rc/mm/khugepaged.c     2020-08-12 19:46:50.867196579 -0700
> +++ linux/mm/khugepaged.c       2020-08-14 14:24:32.739457309 -0700
> @@ -466,7 +466,7 @@ int __khugepaged_enter(struct mm_struct
>                 return -ENOMEM;
>
>         /* __khugepaged_exit() must not run from under us */
> -       VM_BUG_ON_MM(khugepaged_test_exit(mm), mm);
> +       VM_BUG_ON_MM(atomic_read(&mm->mm_users) == 0, mm);
>         if (unlikely(test_and_set_bit(MMF_VM_HUGEPAGE, &mm->flags))) {
>                 free_mm_slot(mm_slot);
>                 return 0;
>



