* [PATCH] khugepaged: collapse_pte_mapped_thp() flush the right range
@ 2020-08-02 19:12 Hugh Dickins
  2020-08-02 19:15 ` [PATCH] khugepaged: collapse_pte_mapped_thp() protect the pmd lock Hugh Dickins
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Hugh Dickins @ 2020-08-02 19:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Andrea Arcangeli, Song Liu, linux-kernel, linux-mm

pmdp_collapse_flush() should be given the start address at which the
huge page is mapped, haddr: it was instead given addr, which by that
point had been reused as a local loop variable and incremented to the
end address of the extent.

Found by source inspection while chasing a hugepage locking bug, which
I then could not explain by this. At first I thought this was very bad;
then saw that all of the page translations that were not flushed would
actually still point to the right pages afterwards, so harmless; then
realized that I know nothing of how different architectures and models
cache intermediate paging structures, so maybe it matters after all -
particularly since the page table concerned is immediately freed.

Much easier to fix than to think about.

Fixes: 27e1f8273113 ("khugepaged: enable collapse pmd for pte-mapped THP")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org # v5.4+
---

 mm/khugepaged.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- 5.8-rc7/mm/khugepaged.c	2020-07-26 16:58:02.189038680 -0700
+++ linux/mm/khugepaged.c	2020-08-02 10:48:59.890925896 -0700
@@ -1502,7 +1502,7 @@ void collapse_pte_mapped_thp(struct mm_s
 
 	/* step 4: collapse pmd */
 	ptl = pmd_lock(vma->vm_mm, pmd);
-	_pmd = pmdp_collapse_flush(vma, addr, pmd);
+	_pmd = pmdp_collapse_flush(vma, haddr, pmd);
 	spin_unlock(ptl);
 	mm_dec_nr_ptes(mm);
 	pte_free(mm, pmd_pgtable(_pmd));
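
A minimal userspace sketch of the pitfall, with illustrative names only
(flush_range() stands in for pmdp_collapse_flush(), and the empty loop
models the per-pte walk that reuses addr as its cursor):

#include <stdio.h>

#define PAGE_SIZE       (4UL << 10)
#define HPAGE_PMD_SIZE  (2UL << 20)     /* assume 2MB huge pages */

static void flush_range(unsigned long start)
{
        printf("flush huge extent starting at %#lx\n", start);
}

int main(void)
{
        unsigned long haddr = 0x200000; /* huge-page-aligned start */
        unsigned long addr;

        for (addr = haddr; addr < haddr + HPAGE_PMD_SIZE; addr += PAGE_SIZE)
                ;       /* per-pte work advances the cursor */

        flush_range(addr);      /* the bug: addr == haddr + HPAGE_PMD_SIZE */
        flush_range(haddr);     /* the fix: flush from the saved start */
        return 0;
}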

* [PATCH] khugepaged: collapse_pte_mapped_thp() protect the pmd lock
  2020-08-02 19:12 [PATCH] khugepaged: collapse_pte_mapped_thp() flush the right range Hugh Dickins
@ 2020-08-02 19:15 ` Hugh Dickins
  2020-08-02 21:23   ` Kirill A. Shutemov
  2020-08-02 19:16 ` [PATCH] khugepaged: retract_page_tables() remember to test exit Hugh Dickins
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: Hugh Dickins @ 2020-08-02 19:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Andrea Arcangeli, Song Liu, linux-kernel, linux-mm

When retract_page_tables() removes a page table to make way for a huge
pmd, it holds huge page lock, i_mmap_lock_write, mmap_write_trylock and
pmd lock; but when collapse_pte_mapped_thp() does the same (to handle
the case when the original mmap_write_trylock had failed), only
mmap_write_trylock and pmd lock are held.

That's not enough.  One machine has twice crashed under load, with
"BUG: spinlock bad magic" and GPF on 6b6b6b6b6b6b6b6b.  Examining the
second crash, page_vma_mapped_walk_done()'s spin_unlock of pvmw->ptl
(serving page_referenced() on a file THP, that had found a page table
at *pmd) discovers that the page table page and its lock have already
been freed by the time it comes to unlock.

Follow the example of retract_page_tables(), but we only need one of
huge page lock or i_mmap_lock_write to secure against this: because it's
the narrower lock, and because it simplifies collapse_pte_mapped_thp()
to know the hpage earlier, choose to rely on huge page lock here.

Fixes: 27e1f8273113 ("khugepaged: enable collapse pmd for pte-mapped THP")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org # v5.4+
---

 mm/khugepaged.c |   44 +++++++++++++++++++-------------------------
 1 file changed, 19 insertions(+), 25 deletions(-)

--- 5.8-rc7/mm/khugepaged.c	2020-07-26 16:58:02.189038680 -0700
+++ linux/mm/khugepaged.c	2020-08-02 10:51:02.127688808 -0700
@@ -1412,7 +1412,7 @@ void collapse_pte_mapped_thp(struct mm_s
 {
 	unsigned long haddr = addr & HPAGE_PMD_MASK;
 	struct vm_area_struct *vma = find_vma(mm, haddr);
-	struct page *hpage = NULL;
+	struct page *hpage;
 	pte_t *start_pte, *pte;
 	pmd_t *pmd, _pmd;
 	spinlock_t *ptl;
@@ -1432,9 +1432,17 @@ void collapse_pte_mapped_thp(struct mm_s
 	if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE))
 		return;
 
+	hpage = find_lock_page(vma->vm_file->f_mapping,
+			       linear_page_index(vma, haddr));
+	if (!hpage)
+		return;
+
+	if (!PageHead(hpage))
+		goto drop_hpage;
+
 	pmd = mm_find_pmd(mm, haddr);
 	if (!pmd)
-		return;
+		goto drop_hpage;
 
 	start_pte = pte_offset_map_lock(mm, pmd, haddr, &ptl);
 
@@ -1453,30 +1461,11 @@ void collapse_pte_mapped_thp(struct mm_s
 
 		page = vm_normal_page(vma, addr, *pte);
 
-		if (!page || !PageCompound(page))
-			goto abort;
-
-		if (!hpage) {
-			hpage = compound_head(page);
-			/*
-			 * The mapping of the THP should not change.
-			 *
-			 * Note that uprobe, debugger, or MAP_PRIVATE may
-			 * change the page table, but the new page will
-			 * not pass PageCompound() check.
-			 */
-			if (WARN_ON(hpage->mapping != vma->vm_file->f_mapping))
-				goto abort;
-		}
-
 		/*
-		 * Confirm the page maps to the correct subpage.
-		 *
-		 * Note that uprobe, debugger, or MAP_PRIVATE may change
-		 * the page table, but the new page will not pass
-		 * PageCompound() check.
+		 * Note that uprobe, debugger, or MAP_PRIVATE may change the
+		 * page table, but the new page will not be a subpage of hpage.
 		 */
-		if (WARN_ON(hpage + i != page))
+		if (hpage + i != page)
 			goto abort;
 		count++;
 	}
@@ -1495,7 +1484,7 @@ void collapse_pte_mapped_thp(struct mm_s
 	pte_unmap_unlock(start_pte, ptl);
 
 	/* step 3: set proper refcount and mm_counters. */
-	if (hpage) {
+	if (count) {
 		page_ref_sub(hpage, count);
 		add_mm_counter(vma->vm_mm, mm_counter_file(hpage), -count);
 	}
@@ -1506,10 +1495,15 @@ void collapse_pte_mapped_thp(struct mm_s
 	spin_unlock(ptl);
 	mm_dec_nr_ptes(mm);
 	pte_free(mm, pmd_pgtable(_pmd));
+
+drop_hpage:
+	unlock_page(hpage);
+	put_page(hpage);
 	return;
 
 abort:
 	pte_unmap_unlock(start_pte, ptl);
+	goto drop_hpage;
 }
 
 static int khugepaged_collapse_pte_mapped_thps(struct mm_slot *mm_slot)
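
A userspace model of what the added locking buys, assuming pthreads and
illustrative names (hpage_lock stands in for the huge page lock that
find_lock_page() takes above): the freeing side and any walker take the
same lock, and the walker rechecks validity under it, so it can never
find the "page table" freed out from under a lock it is holding.
Compile with -pthread:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static pthread_mutex_t hpage_lock = PTHREAD_MUTEX_INITIALIZER;
static int *page_table;                 /* NULL once collapsed and freed */

static void *collapse_side(void *arg)   /* models collapse_pte_mapped_thp() */
{
        (void)arg;
        pthread_mutex_lock(&hpage_lock);
        free(page_table);               /* pte_free() */
        page_table = NULL;              /* pmd cleared */
        pthread_mutex_unlock(&hpage_lock);
        return NULL;
}

static void *walker_side(void *arg)     /* models an rmap walk of the THP */
{
        (void)arg;
        pthread_mutex_lock(&hpage_lock);
        if (page_table)                 /* recheck under the shared lock */
                printf("pte 0 = %d\n", page_table[0]);
        else
                printf("page table already collapsed\n");
        pthread_mutex_unlock(&hpage_lock);
        return NULL;
}

int main(void)
{
        pthread_t a, b;

        page_table = calloc(512, sizeof(*page_table));
        pthread_create(&a, NULL, collapse_side, NULL);
        pthread_create(&b, NULL, walker_side, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
}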


* [PATCH] khugepaged: retract_page_tables() remember to test exit
  2020-08-02 19:12 [PATCH] khugepaged: collapse_pte_mapped_thp() flush the right range Hugh Dickins
  2020-08-02 19:15 ` [PATCH] khugepaged: collapse_pte_mapped_thp() protect the pmd lock Hugh Dickins
@ 2020-08-02 19:16 ` Hugh Dickins
  2020-08-02 21:44   ` Kirill A. Shutemov
  2020-08-02 19:18 ` [PATCH] khugepaged: khugepaged_test_exit() check mmget_still_valid() Hugh Dickins
  2020-08-02 21:07 ` [PATCH] khugepaged: collapse_pte_mapped_thp() flush the right range Kirill A. Shutemov
  3 siblings, 1 reply; 18+ messages in thread
From: Hugh Dickins @ 2020-08-02 19:16 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Andrea Arcangeli, Song Liu, linux-kernel, linux-mm

Only once have I seen this scenario (and forgot even to notice what
forced the eventual crash): a sequence of "BUG: Bad page map" alerts
from vm_normal_page(), from zap_pte_range() servicing exit_mmap();
pmd:00000000, pte values corresponding to data in physical page 0.

The pte mappings being zapped in this case were supposed to be from a
huge page of ext4 text (but could as well have been shmem): my belief
is that it was racing with collapse_file()'s retract_page_tables(),
found *pmd pointing to a page table, locked it, but *pmd had become
0 by the time start_pte was decided.

In most cases, that possibility is excluded by holding mmap lock;
but exit_mmap() proceeds without mmap lock.  Most of what's run by
khugepaged checks khugepaged_test_exit() after acquiring mmap lock:
khugepaged_collapse_pte_mapped_thps() and hugepage_vma_revalidate()
do so, for example.  But retract_page_tables() did not: fix that
(using an mm variable instead of vma->vm_mm repeatedly).

Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org # v4.8+
---

 mm/khugepaged.c |   24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

--- 5.8-rc7/mm/khugepaged.c	2020-07-26 16:58:02.189038680 -0700
+++ linux/mm/khugepaged.c	2020-08-02 10:53:37.892660983 -0700
@@ -1538,6 +1538,7 @@ out:
 static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
 {
 	struct vm_area_struct *vma;
+	struct mm_struct *mm;
 	unsigned long addr;
 	pmd_t *pmd, _pmd;
 
@@ -1566,7 +1567,8 @@ static void retract_page_tables(struct a
 			continue;
 		if (vma->vm_end < addr + HPAGE_PMD_SIZE)
 			continue;
-		pmd = mm_find_pmd(vma->vm_mm, addr);
+		mm = vma->vm_mm;
+		pmd = mm_find_pmd(mm, addr);
 		if (!pmd)
 			continue;
 		/*
@@ -1576,17 +1578,19 @@ static void retract_page_tables(struct a
 		 * mmap_lock while holding page lock. Fault path does it in
 		 * reverse order. Trylock is a way to avoid deadlock.
 		 */
-		if (mmap_write_trylock(vma->vm_mm)) {
-			spinlock_t *ptl = pmd_lock(vma->vm_mm, pmd);
-			/* assume page table is clear */
-			_pmd = pmdp_collapse_flush(vma, addr, pmd);
-			spin_unlock(ptl);
-			mmap_write_unlock(vma->vm_mm);
-			mm_dec_nr_ptes(vma->vm_mm);
-			pte_free(vma->vm_mm, pmd_pgtable(_pmd));
+		if (mmap_write_trylock(mm)) {
+			if (!khugepaged_test_exit(mm)) {
+				spinlock_t *ptl = pmd_lock(mm, pmd);
+				/* assume page table is clear */
+				_pmd = pmdp_collapse_flush(vma, addr, pmd);
+				spin_unlock(ptl);
+				mm_dec_nr_ptes(mm);
+				pte_free(mm, pmd_pgtable(_pmd));
+			}
+			mmap_write_unlock(mm);
 		} else {
 			/* Try again later */
-			khugepaged_add_pte_mapped_thp(vma->vm_mm, addr);
+			khugepaged_add_pte_mapped_thp(mm, addr);
 		}
 	}
 	i_mmap_unlock_write(mapping);
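
A self-contained sketch of the rule this patch adds, with illustrative
stand-ins (a bool models the mmap rwsem, test_exit() models
khugepaged_test_exit()): once the trylock succeeds, re-test for exit
before touching the page table, because exit_mmap() proceeds without
the mmap lock:

#include <stdbool.h>
#include <stdio.h>

struct mm {
        int mm_users;
        bool mmap_locked;       /* stands in for mmap_write_trylock() */
};

static bool trylock(struct mm *mm)
{
        if (mm->mmap_locked)
                return false;
        mm->mmap_locked = true;
        return true;
}

static bool test_exit(const struct mm *mm)
{
        return mm->mm_users == 0;
}

static void retract(struct mm *mm)
{
        if (!trylock(mm))
                return;                 /* try again later */
        if (!test_exit(mm))
                printf("safe: clear pmd and free the page table\n");
        else
                printf("exiting: leave it to exit_mmap()\n");
        mm->mmap_locked = false;
}

int main(void)
{
        struct mm live = { .mm_users = 1 };
        struct mm dying = { .mm_users = 0 };

        retract(&live);
        retract(&dying);
        return 0;
}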

* [PATCH] khugepaged: khugepaged_test_exit() check mmget_still_valid()
  2020-08-02 19:12 [PATCH] khugepaged: collapse_pte_mapped_thp() flush the right range Hugh Dickins
  2020-08-02 19:15 ` [PATCH] khugepaged: collapse_pte_mapped_thp() protect the pmd lock Hugh Dickins
  2020-08-02 19:16 ` [PATCH] khugepaged: retract_page_tables() remember to test exit Hugh Dickins
@ 2020-08-02 19:18 ` Hugh Dickins
  2020-08-14 22:13   ` [PATCH] khugepaged: adjust VM_BUG_ON_MM() in __khugepaged_enter() Hugh Dickins
  2020-08-02 21:07 ` [PATCH] khugepaged: collapse_pte_mapped_thp() flush the right range Kirill A. Shutemov
  3 siblings, 1 reply; 18+ messages in thread
From: Hugh Dickins @ 2020-08-02 19:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Andrea Arcangeli, Song Liu, linux-kernel, linux-mm

Move collapse_huge_page()'s mmget_still_valid() check into
khugepaged_test_exit() itself.  collapse_huge_page() is used for anon
THP only, and earned its mmget_still_valid() check because it inserts
a huge pmd entry in place of the page table's pmd entry; whereas
collapse_file()'s retract_page_tables() or collapse_pte_mapped_thp()
merely clears the page table's pmd entry.  But core dumping without
mmap lock must have been as open to mistaking a racily cleared pmd
entry for a page table at physical page 0, as exit_mmap() was.  And
we certainly have no interest in mapping as a THP once dumping core.

Fixes: 59ea6d06cfa9 ("coredump: fix race condition between collapse_huge_page() and core dumping")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org # v4.8+
---
Note on stable backporting: the coredump fix was backported as far as
v3.16, but this extension is only needed where shmem or file THP is
supported.

 mm/khugepaged.c |    5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

--- 5.8-rc7/mm/khugepaged.c	2020-07-26 16:58:02.189038680 -0700
+++ linux/mm/khugepaged.c	2020-08-02 10:56:11.105617241 -0700
@@ -431,7 +431,7 @@ static void insert_to_mm_slots_hash(stru
 
 static inline int khugepaged_test_exit(struct mm_struct *mm)
 {
-	return atomic_read(&mm->mm_users) == 0;
+	return atomic_read(&mm->mm_users) == 0 || !mmget_still_valid(mm);
 }
 
 static bool hugepage_vma_check(struct vm_area_struct *vma,
@@ -1100,9 +1100,6 @@ static void collapse_huge_page(struct mm
 	 * handled by the anon_vma lock + PG_lock.
 	 */
 	mmap_write_lock(mm);
-	result = SCAN_ANY_PROCESS;
-	if (!mmget_still_valid(mm))
-		goto out;
 	result = hugepage_vma_revalidate(mm, address, &vma);
 	if (result)
 		goto out;
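
For reference, mmget_still_valid() at this point (as defined in the
v5.8 tree) is a one-line check that core dumping has not begun, so the
combined test reads as "exited, or now dumping core":

static inline bool mmget_still_valid(struct mm_struct *mm)
{
        return likely(!mm->core_state);
}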

* Re: [PATCH] khugepaged: collapse_pte_mapped_thp() flush the right range
  2020-08-02 19:12 [PATCH] khugepaged: collapse_pte_mapped_thp() flush the right range Hugh Dickins
                   ` (2 preceding siblings ...)
  2020-08-02 19:18 ` [PATCH] khugepaged: khugepaged_test_exit() check mmget_still_valid() Hugh Dickins
@ 2020-08-02 21:07 ` Kirill A. Shutemov
  3 siblings, 0 replies; 18+ messages in thread
From: Kirill A. Shutemov @ 2020-08-02 21:07 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Kirill A. Shutemov, Andrea Arcangeli, Song Liu,
	linux-kernel, linux-mm

On Sun, Aug 02, 2020 at 12:12:42PM -0700, Hugh Dickins wrote:
> pmdp_collapse_flush() should be given the start address at which the
> huge page is mapped, haddr: it was instead given addr, which by that
> point had been reused as a local loop variable and incremented to the
> end address of the extent.
> 
> Found by source inspection while chasing a hugepage locking bug, which
> I then could not explain by this. At first I thought this was very bad;
> then saw that all of the page translations that were not flushed would
> actually still point to the right pages afterwards, so harmless; then
> realized that I know nothing of how different architectures and models
> cache intermediate paging structures, so maybe it matters after all -
> particularly since the page table concerned is immediately freed.
> 
> Much easier to fix than to think about.
> 
> Fixes: 27e1f8273113 ("khugepaged: enable collapse pmd for pte-mapped THP")
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Cc: stable@vger.kernel.org # v5.4+

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov

* Re: [PATCH] khugepaged: collapse_pte_mapped_thp() protect the pmd lock
  2020-08-02 19:15 ` [PATCH] khugepaged: collapse_pte_mapped_thp() protect the pmd lock Hugh Dickins
@ 2020-08-02 21:23   ` Kirill A. Shutemov
  0 siblings, 0 replies; 18+ messages in thread
From: Kirill A. Shutemov @ 2020-08-02 21:23 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Kirill A. Shutemov, Andrea Arcangeli, Song Liu,
	linux-kernel, linux-mm

On Sun, Aug 02, 2020 at 12:15:24PM -0700, Hugh Dickins wrote:
> When retract_page_tables() removes a page table to make way for a huge
> pmd, it holds huge page lock, i_mmap_lock_write, mmap_write_trylock and
> pmd lock; but when collapse_pte_mapped_thp() does the same (to handle
> the case when the original mmap_write_trylock had failed), only
> mmap_write_trylock and pmd lock are held.
> 
> That's not enough.  One machine has twice crashed under load, with
> "BUG: spinlock bad magic" and GPF on 6b6b6b6b6b6b6b6b.  Examining the
> second crash, page_vma_mapped_walk_done()'s spin_unlock of pvmw->ptl
> (serving page_referenced() on a file THP, that had found a page table
> at *pmd) discovers that the page table page and its lock have already
> been freed by the time it comes to unlock.
> 
> Follow the example of retract_page_tables(), but we only need one of
> huge page lock or i_mmap_lock_write to secure against this: because it's
> the narrower lock, and because it simplifies collapse_pte_mapped_thp()
> to know the hpage earlier, choose to rely on huge page lock here.
> 
> Fixes: 27e1f8273113 ("khugepaged: enable collapse pmd for pte-mapped THP")
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Cc: stable@vger.kernel.org # v5.4+

We could avoid the page cache lookup by locking the page on the first
valid PTE and recheck page->mapping, but this way is cleaner.

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov

* Re: [PATCH] khugepaged: retract_page_tables() remember to test exit
  2020-08-02 19:16 ` [PATCH] khugepaged: retract_page_tables() remember to test exit Hugh Dickins
@ 2020-08-02 21:44   ` Kirill A. Shutemov
  2020-08-03  0:35     ` Hugh Dickins
  0 siblings, 1 reply; 18+ messages in thread
From: Kirill A. Shutemov @ 2020-08-02 21:44 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Kirill A. Shutemov, Andrea Arcangeli, Song Liu,
	linux-kernel, linux-mm

On Sun, Aug 02, 2020 at 12:16:53PM -0700, Hugh Dickins wrote:
> Only once have I seen this scenario (and forgot even to notice what
> forced the eventual crash): a sequence of "BUG: Bad page map" alerts
> from vm_normal_page(), from zap_pte_range() servicing exit_mmap();
> pmd:00000000, pte values corresponding to data in physical page 0.
> 
> The pte mappings being zapped in this case were supposed to be from a
> huge page of ext4 text (but could as well have been shmem): my belief
> is that it was racing with collapse_file()'s retract_page_tables(),
> found *pmd pointing to a page table, locked it, but *pmd had become
> 0 by the time start_pte was decided.
> 
> In most cases, that possibility is excluded by holding mmap lock;
> but exit_mmap() proceeds without mmap lock.  Most of what's run by
> khugepaged checks khugepaged_test_exit() after acquiring mmap lock:
> khugepaged_collapse_pte_mapped_thps() and hugepage_vma_revalidate()
> do so, for example.  But retract_page_tables() did not: fix that
> (using an mm variable instead of vma->vm_mm repeatedly).

Hm. I'm not sure I follow. vma->vm_mm has to be valid as long as we hold
i_mmap lock, no? Unlinking a VMA requires it.

-- 
 Kirill A. Shutemov

* Re: [PATCH] khugepaged: retract_page_tables() remember to test exit
  2020-08-02 21:44   ` Kirill A. Shutemov
@ 2020-08-03  0:35     ` Hugh Dickins
  2020-08-03  8:59       ` Kirill A. Shutemov
  0 siblings, 1 reply; 18+ messages in thread
From: Hugh Dickins @ 2020-08-03  0:35 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Hugh Dickins, Andrew Morton, Kirill A. Shutemov,
	Andrea Arcangeli, Song Liu, linux-kernel, linux-mm

On Mon, 3 Aug 2020, Kirill A. Shutemov wrote:
> On Sun, Aug 02, 2020 at 12:16:53PM -0700, Hugh Dickins wrote:
> > Only once have I seen this scenario (and forgot even to notice what
> > forced the eventual crash): a sequence of "BUG: Bad page map" alerts
> > from vm_normal_page(), from zap_pte_range() servicing exit_mmap();
> > pmd:00000000, pte values corresponding to data in physical page 0.
> > 
> > The pte mappings being zapped in this case were supposed to be from a
> > huge page of ext4 text (but could as well have been shmem): my belief
> > is that it was racing with collapse_file()'s retract_page_tables(),
> > found *pmd pointing to a page table, locked it, but *pmd had become
> > 0 by the time start_pte was decided.
> > 
> > In most cases, that possibility is excluded by holding mmap lock;
> > but exit_mmap() proceeds without mmap lock.  Most of what's run by
> > khugepaged checks khugepaged_test_exit() after acquiring mmap lock:
> > khugepaged_collapse_pte_mapped_thps() and hugepage_vma_revalidate()
> > do so, for example.  But retract_page_tables() did not: fix that
> > (using an mm variable instead of vma->vm_mm repeatedly).
> 
> Hm. I'm not sure I follow. vma->vm_mm has to be valid as long as we hold
> i_mmap lock, no? Unlinking a VMA requires it.

Ah, my wording is misleading, yes.  That comment
"(using an mm variable instead of vma->vm_mm repeatedly)"
was nothing more than a note, that the patch is bigger than it could be,
because I decided to use an mm variable, instead of vma->vm_mm repeatedly.
But it looks as if I'm saying there used to be a need for READ_ONCE() or
something, and by using the mm variable I was fixing the problem.

No, sorry: delete that line now the point is made: the mm variable is
just a patch detail, it's not important.

The fix (as the subject suggested) is for retract_page_tables() to check
khugepaged_test_exit(), after acquiring mmap lock, before doing anything
to the page table.  Getting the mmap lock serializes with __mmput(),
which briefly takes and drops it in __khugepaged_exit(); then the
khugepaged_test_exit() check on mm_users makes sure we don't touch the
page table once exit_mmap() might reach it, since exit_mmap() will be
proceeding without mmap lock, not expecting anyone to be racing with it.

(I devised that protocol for ksmd, then Andrea adopted it for khugepaged:
back then it was important for these daemons to have a hold on the mm,
without an actual reference to mm_users, because that would prevent the
OOM killer from reaching exit_mmap().  Nowadays with the OOM reaper, it's
probably less crucial to avoid mm_users, but I think still worthwhile.)

Thanks a lot for looking at these patches so quickly,
Hugh
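
A userspace model of that protocol, assuming pthreads and C11 atomics
(all names are illustrative stand-ins for the kernel's): the exiting
side drops mm_users, then takes and drops the "mmap lock" before
tearing down, so a daemon that acquires the lock and still sees
mm_users above zero knows exit_mmap() cannot have begun.  Compile with
-pthread:

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

struct mm {
        pthread_mutex_t mmap_lock;
        atomic_int mm_users;
        int *page_table;
};

static void *exit_side(void *p)         /* models the final mmput() */
{
        struct mm *mm = p;

        atomic_store(&mm->mm_users, 0);         /* mm_users reaches zero */
        pthread_mutex_lock(&mm->mmap_lock);     /* __khugepaged_exit() briefly */
        pthread_mutex_unlock(&mm->mmap_lock);   /* takes and drops mmap lock */
        free(mm->page_table);                   /* exit_mmap(), lock not held */
        mm->page_table = NULL;
        return NULL;
}

static void *daemon_side(void *p)       /* models retract_page_tables() */
{
        struct mm *mm = p;

        pthread_mutex_lock(&mm->mmap_lock);
        if (atomic_load(&mm->mm_users) == 0)    /* khugepaged_test_exit() */
                printf("exit under way: don't touch the page table\n");
        else
                printf("pte 0 = %d\n", mm->page_table[0]);
        pthread_mutex_unlock(&mm->mmap_lock);
        return NULL;
}

int main(void)
{
        struct mm mm = { .mmap_lock = PTHREAD_MUTEX_INITIALIZER };
        pthread_t a, b;

        mm.page_table = calloc(512, sizeof(*mm.page_table));
        atomic_store(&mm.mm_users, 1);
        pthread_create(&a, NULL, daemon_side, &mm);
        pthread_create(&b, NULL, exit_side, &mm);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
}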

* Re: [PATCH] khugepaged: retract_page_tables() remember to test exit
  2020-08-03  0:35     ` Hugh Dickins
@ 2020-08-03  8:59       ` Kirill A. Shutemov
  0 siblings, 0 replies; 18+ messages in thread
From: Kirill A. Shutemov @ 2020-08-03  8:59 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Kirill A. Shutemov, Andrea Arcangeli, Song Liu,
	linux-kernel, linux-mm

On Sun, Aug 02, 2020 at 05:35:23PM -0700, Hugh Dickins wrote:
> On Mon, 3 Aug 2020, Kirill A. Shutemov wrote:
> > On Sun, Aug 02, 2020 at 12:16:53PM -0700, Hugh Dickins wrote:
> > > Only once have I seen this scenario (and forgot even to notice what
> > > forced the eventual crash): a sequence of "BUG: Bad page map" alerts
> > > from vm_normal_page(), from zap_pte_range() servicing exit_mmap();
> > > pmd:00000000, pte values corresponding to data in physical page 0.
> > > 
> > > The pte mappings being zapped in this case were supposed to be from a
> > > huge page of ext4 text (but could as well have been shmem): my belief
> > > is that it was racing with collapse_file()'s retract_page_tables(),
> > > found *pmd pointing to a page table, locked it, but *pmd had become
> > > 0 by the time start_pte was decided.
> > > 
> > > In most cases, that possibility is excluded by holding mmap lock;
> > > but exit_mmap() proceeds without mmap lock.  Most of what's run by
> > > khugepaged checks khugepaged_test_exit() after acquiring mmap lock:
> > > khugepaged_collapse_pte_mapped_thps() and hugepage_vma_revalidate()
> > > do so, for example.  But retract_page_tables() did not: fix that
> > > (using an mm variable instead of vma->vm_mm repeatedly).
> > 
> > Hm. I'm not sure I follow. vma->vm_mm has to be valid as long as we hold
> > i_mmap lock, no? Unlinking a VMA requires it.
> 
> Ah, my wording is misleading, yes.  That comment
> "(using an mm variable instead of vma->vm_mm repeatedly)"
> was nothing more than a note, that the patch is bigger than it could be,
> because I decided to use an mm variable, instead of vma->vm_mm repeatedly.
> But it looks as if I'm saying there used to be a need for READ_ONCE() or
> something, and by using the mm variable I was fixing the problem.
> 
> No, sorry: delete that line now the point is made: the mm variable is
> just a patch detail, it's not important.
> 
> The fix (as the subject suggested) is for retract_page_tables() to check
> khugepaged_test_exit(), after acquiring mmap lock, before doing anything
> to the page table.  Getting the mmap lock serializes with __mmput(),
> which briefly takes and drops it in __khugepaged_exit(); then the
> khugepaged_test_exit() check on mm_users makes sure we don't touch the
> page table once exit_mmap() might reach it, since exit_mmap() will be
> proceeding without mmap lock, not expecting anyone to be racing with it.

Okay, makes sense.

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov

* [PATCH] khugepaged: adjust VM_BUG_ON_MM() in __khugepaged_enter()
  2020-08-02 19:18 ` [PATCH] khugepaged: khugepaged_test_exit() check mmget_still_valid() Hugh Dickins
@ 2020-08-14 22:13   ` Hugh Dickins
  2020-08-17 20:50     ` Yang Shi
  0 siblings, 1 reply; 18+ messages in thread
From: Hugh Dickins @ 2020-08-14 22:13 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Kirill A. Shutemov, Andrea Arcangeli, Song Liu,
	Mike Kravetz, Eric Dumazet, linux-kernel, linux-mm

syzbot crashes on the VM_BUG_ON_MM(khugepaged_test_exit(mm), mm) in
__khugepaged_enter(): yes, when one thread is about to dump core, has set
core_state, and is waiting for others, another might do something calling
__khugepaged_enter(), which now crashes because I lumped the core_state
test (known as "mmget_still_valid") into khugepaged_test_exit().  I still
think it's best to lump them together, so just in this exceptional case,
check mm->mm_users directly instead of khugepaged_test_exit().

Reported-by: syzbot <syzkaller@googlegroups.com>
Fixes: bbe98f9cadff ("khugepaged: khugepaged_test_exit() check mmget_still_valid()")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org # v4.8+
---

 mm/khugepaged.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- v5.9-rc/mm/khugepaged.c	2020-08-12 19:46:50.867196579 -0700
+++ linux/mm/khugepaged.c	2020-08-14 14:24:32.739457309 -0700
@@ -466,7 +466,7 @@ int __khugepaged_enter(struct mm_struct
 		return -ENOMEM;
 
 	/* __khugepaged_exit() must not run from under us */
-	VM_BUG_ON_MM(khugepaged_test_exit(mm), mm);
+	VM_BUG_ON_MM(atomic_read(&mm->mm_users) == 0, mm);
 	if (unlikely(test_and_set_bit(MMF_VM_HUGEPAGE, &mm->flags))) {
 		free_mm_slot(mm_slot);
 		return 0;
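
A self-contained model of why the assertion changes, assuming (per
v5.8) that mmget_still_valid(mm) is simply a check of !mm->core_state:
during a core dump mm_users is still above zero, so
khugepaged_test_exit() is true while "__khugepaged_exit() already ran"
is false, and only the direct mm_users check belongs in the VM_BUG_ON:

#include <stdbool.h>
#include <stdio.h>

struct mm {
        int mm_users;
        void *core_state;       /* non-NULL while dumping core */
};

static bool mmget_still_valid(const struct mm *mm)
{
        return mm->core_state == NULL;
}

static bool khugepaged_test_exit(const struct mm *mm)
{
        return mm->mm_users == 0 || !mmget_still_valid(mm);
}

int main(void)
{
        struct mm dumping = { .mm_users = 2, .core_state = &dumping };
        struct mm exited = { .mm_users = 0, .core_state = NULL };

        /* the old VM_BUG_ON condition would fire here, wrongly: */
        printf("dumping core: test_exit=%d mm_users==0: %d\n",
               khugepaged_test_exit(&dumping), dumping.mm_users == 0);
        printf("fully exited: test_exit=%d mm_users==0: %d\n",
               khugepaged_test_exit(&exited), exited.mm_users == 0);
        return 0;
}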

* Re: [PATCH] khugepaged: adjust VM_BUG_ON_MM() in __khugepaged_enter()
  2020-08-14 22:13   ` [PATCH] khugepaged: adjust VM_BUG_ON_MM() in __khugepaged_enter() Hugh Dickins
@ 2020-08-17 20:50     ` Yang Shi
  0 siblings, 0 replies; 18+ messages in thread
From: Yang Shi @ 2020-08-17 20:50 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Kirill A. Shutemov, Andrea Arcangeli, Song Liu,
	Mike Kravetz, Eric Dumazet, Linux Kernel Mailing List, Linux MM

On Fri, Aug 14, 2020 at 3:13 PM Hugh Dickins <hughd@google.com> wrote:
>
> syzbot crashes on the VM_BUG_ON_MM(khugepaged_test_exit(mm), mm) in
> __khugepaged_enter(): yes, when one thread is about to dump core, has set
> core_state, and is waiting for others, another might do something calling
> __khugepaged_enter(), which now crashes because I lumped the core_state
> test (known as "mmget_still_valid") into khugepaged_test_exit().  I still
> think it's best to lump them together, so just in this exceptional case,
> check mm->mm_users directly instead of khugepaged_test_exit().
>
> Reported-by: syzbot <syzkaller@googlegroups.com>
> Fixes: bbe98f9cadff ("khugepaged: khugepaged_test_exit() check mmget_still_valid()")
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Cc: stable@vger.kernel.org # v4.8+

Acked-by: Yang Shi <shy828301@gmail.com>

> ---
>
>  mm/khugepaged.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> --- v5.9-rc/mm/khugepaged.c     2020-08-12 19:46:50.867196579 -0700
> +++ linux/mm/khugepaged.c       2020-08-14 14:24:32.739457309 -0700
> @@ -466,7 +466,7 @@ int __khugepaged_enter(struct mm_struct
>                 return -ENOMEM;
>
>         /* __khugepaged_exit() must not run from under us */
> -       VM_BUG_ON_MM(khugepaged_test_exit(mm), mm);
> +       VM_BUG_ON_MM(atomic_read(&mm->mm_users) == 0, mm);
>         if (unlikely(test_and_set_bit(MMF_VM_HUGEPAGE, &mm->flags))) {
>                 free_mm_slot(mm_slot);
>                 return 0;
>
