* [PATCH v2 0/3] page table check fixes and cleanups
@ 2022-01-20 19:12 Pasha Tatashin
  2022-01-20 19:12 ` [PATCH v2 1/3] mm/debug_vm_pgtable: remove pte entry from the page table Pasha Tatashin
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Pasha Tatashin @ 2022-01-20 19:12 UTC (permalink / raw)
  To: pasha.tatashin, linux-kernel, linux-mm, akpm, rientjes, pjt,
	weixugc, gthelen, mingo, will, rppt, dave.hansen, hpa,
	aneesh.kumar, jirislaby, songmuchun, qydwhotmail, hughd, ziy,
	anshuman.khandual

Changelog:
v2:	- Addressed simplification comments from Wei Xu
	- Added Reviewed-by/Tested-by tags from Zi Yan and Wei Xu


Two fixes:

  mm/debug_vm_pgtable: remove pte entry from the page table
	- Remove the pte entry from the page table at the end of the debug_vm_pgtable pte test.

  mm/page_table_check: check entries at pud and pmd levels
	- Check the pmd and pud levels in page_table_check for regular entries, not
	  only for huge pages, when entries are replaced or cleared.
	  repro.c: https://gist.github.com/soleen/fdcd501d5df103976245fe84e9535087
	  config: https://gist.github.com/soleen/8a56f923c2fea9ce9c75b4e2517d4162
	  qemu_script: https://gist.github.com/soleen/f4be4795826b7ab1a51ae659582e179c
	  base image:
	  https://storage.googleapis.com/syzkaller/wheezy.img
	  https://storage.googleapis.com/syzkaller/wheezy.img.key

Small cleanup:
  mm/page_table_check: use unsigned long for page counters

Previous versions:
v1: https://lore.kernel.org/all/20220120042513.1648831-1-pasha.tatashin@soleen.com

Pasha Tatashin (3):
  mm/debug_vm_pgtable: remove pte entry from the page table
  mm/page_table_check: check entries at pud and pmd levels
  mm/page_table_check: use unsigned long for page counters

 mm/debug_vm_pgtable.c |  2 ++
 mm/page_table_check.c | 68 +++++++++++++++++++++++++------------------
 2 files changed, 42 insertions(+), 28 deletions(-)

-- 
2.34.1.703.g22d0c6ccf7-goog


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2 1/3] mm/debug_vm_pgtable: remove pte entry from the page table
  2022-01-20 19:12 [PATCH v2 0/3] page table check fixes and cleanups Pasha Tatashin
@ 2022-01-20 19:12 ` Pasha Tatashin
  2022-01-21  3:37   ` Anshuman Khandual
  2022-01-20 19:12 ` [PATCH v2 2/3] mm/page_table_check: check entries at pud and pmd levels Pasha Tatashin
  2022-01-20 19:12 ` [PATCH v2 3/3] mm/page_table_check: use unsigned long for page counters Pasha Tatashin
  2 siblings, 1 reply; 8+ messages in thread
From: Pasha Tatashin @ 2022-01-20 19:12 UTC (permalink / raw)
  To: pasha.tatashin, linux-kernel, linux-mm, akpm, rientjes, pjt,
	weixugc, gthelen, mingo, will, rppt, dave.hansen, hpa,
	aneesh.kumar, jirislaby, songmuchun, qydwhotmail, hughd, ziy,
	anshuman.khandual

The pte entry that is used in pte_advanced_tests() is never removed from
the page table at the end of the test.

The issue is detected by page_table_check. To reproduce it, compile the
kernel with the following configs:

CONFIG_DEBUG_VM_PGTABLE=y
CONFIG_PAGE_TABLE_CHECK=y
CONFIG_PAGE_TABLE_CHECK_ENFORCED=y

During boot, the following BUG is printed:

[    7.483050][    T1] debug_vm_pgtable: [debug_vm_pgtable         ]:
Validating architecture page tabs
[    7.490930][    T1] ------------[ cut here ]------------
[    7.494926][    T1] kernel BUG at mm/page_table_check.c:194!
[    7.499172][    T1] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
[    7.503610][    T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.16.0+
[    7.508600][    T1] Hardware name: QEMU Standard PC (i440FX + PIIX,
...

The entry should be properly removed from the page table before the page
is released to the free list.

Fixes: a5c3b9ffb0f4 ("mm/debug_vm_pgtable: add tests validating advanced arch page table helpers")

Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Tested-by: Zi Yan <ziy@nvidia.com>
---
 mm/debug_vm_pgtable.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index a7ac97c76762..db2abd9e415b 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -171,6 +171,8 @@ static void __init pte_advanced_tests(struct pgtable_debug_args *args)
 	ptep_test_and_clear_young(args->vma, args->vaddr, args->ptep);
 	pte = ptep_get(args->ptep);
 	WARN_ON(pte_young(pte));
+
+	ptep_get_and_clear_full(args->mm, args->vaddr, args->ptep, 1);
 }
 
 static void __init pte_savedwrite_tests(struct pgtable_debug_args *args)
-- 
2.34.1.703.g22d0c6ccf7-goog



* [PATCH v2 2/3] mm/page_table_check: check entries at pud and pmd levels
  2022-01-20 19:12 [PATCH v2 0/3] page table check fixes and cleanups Pasha Tatashin
  2022-01-20 19:12 ` [PATCH v2 1/3] mm/debug_vm_pgtable: remove pte entry from the page table Pasha Tatashin
@ 2022-01-20 19:12 ` Pasha Tatashin
  2022-01-20 19:19   ` Wei Xu
  2022-01-20 19:12 ` [PATCH v2 3/3] mm/page_table_check: use unsigned long for page counters Pasha Tatashin
  2 siblings, 1 reply; 8+ messages in thread
From: Pasha Tatashin @ 2022-01-20 19:12 UTC (permalink / raw)
  To: pasha.tatashin, linux-kernel, linux-mm, akpm, rientjes, pjt,
	weixugc, gthelen, mingo, will, rppt, dave.hansen, hpa,
	aneesh.kumar, jirislaby, songmuchun, qydwhotmail, hughd, ziy,
	anshuman.khandual

syzbot detected a case where the page table counters were not properly
updated.

syzkaller login:  ------------[ cut here ]------------
kernel BUG at mm/page_table_check.c:162!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 3099 Comm: pasha Not tainted 5.16.0+ #48
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIO4
RIP: 0010:__page_table_check_zero+0x159/0x1a0
Code: 7d 3a b2 ff 45 39 f5 74 2a e8 43 38 b2 ff 4d 85 e4 01
RSP: 0018:ffff888010667418 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000
RDX: ffff88800cea8680 RSI: ffffffff81becaf9 RDI: 0000000003
RBP: ffff888010667450 R08: 0000000000000001 R09: 0000000000
R10: ffffffff81becaab R11: 0000000000000001 R12: ffff888008
R13: 0000000000000001 R14: 0000000000000200 R15: dffffc0000
FS:  0000000000000000(0000) GS:ffff888035e00000(0000) knlG0
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffd875cad00 CR3: 00000000094ce000 CR4: 0000000000
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000
Call Trace:
 <TASK>
 free_pcp_prepare+0x3be/0xaa0
 free_unref_page+0x1c/0x650
 ? trace_hardirqs_on+0x6a/0x1d0
 free_compound_page+0xec/0x130
 free_transhuge_page+0x1be/0x260
 __put_compound_page+0x90/0xd0
 release_pages+0x54c/0x1060
 ? filemap_remove_folio+0x161/0x210
 ? lock_downgrade+0x720/0x720
 ? __put_page+0x150/0x150
 ? filemap_free_folio+0x164/0x350
 __pagevec_release+0x7c/0x110
 shmem_undo_range+0x85e/0x1250
...

The repro involved a huge page that is split when a uprobe event
temporarily replaces one of its pages. Later, the huge page is collapsed
again, but the counters are off, as the PTE level was not properly
updated.

Make sure that not only huge pages but also small pages are accounted
when an entry is set or cleared.

Fixes: df4e817b7108 ("mm: page table check")

Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
 mm/page_table_check.c | 60 ++++++++++++++++++++++++++-----------------
 1 file changed, 36 insertions(+), 24 deletions(-)

diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 7504e7caa2a1..877d967742bc 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -145,6 +145,30 @@ static void page_table_check_set(struct mm_struct *mm, unsigned long addr,
 	}
 }
 
+static void pte_clear_level(struct mm_struct *mm, unsigned long addr,
+			    pte_t *ptep)
+{
+	unsigned long i;
+
+	for (i = 0; i < PTRS_PER_PTE; i++) {
+		__page_table_check_pte_clear(mm, addr, *ptep);
+		addr += PAGE_SIZE;
+		ptep++;
+	}
+}
+
+static void pmd_clear_level(struct mm_struct *mm, unsigned long addr,
+			    pmd_t *pmdp)
+{
+	unsigned long i;
+
+	for (i = 0; i < PTRS_PER_PMD; i++) {
+		__page_table_check_pmd_clear(mm, addr, *pmdp);
+		addr += PMD_PAGE_SIZE;
+		pmdp++;
+	}
+}
+
 /*
  * page is on free list, or is being allocated, verify that counters are zeroes
  * crash if they are not.
@@ -186,6 +210,11 @@ void __page_table_check_pmd_clear(struct mm_struct *mm, unsigned long addr,
 	if (pmd_user_accessible_page(pmd)) {
 		page_table_check_clear(mm, addr, pmd_pfn(pmd),
 				       PMD_PAGE_SIZE >> PAGE_SHIFT);
+	} else if (!pmd_bad(pmd) && !pmd_leaf(pmd)) {
+		pte_t *ptep = pte_offset_map(&pmd, addr);
+
+		pte_clear_level(mm, addr, ptep);
+		pte_unmap(ptep);
 	}
 }
 EXPORT_SYMBOL(__page_table_check_pmd_clear);
@@ -199,6 +228,10 @@ void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr,
 	if (pud_user_accessible_page(pud)) {
 		page_table_check_clear(mm, addr, pud_pfn(pud),
 				       PUD_PAGE_SIZE >> PAGE_SHIFT);
+	} else if (!pud_bad(pud) && !pud_leaf(pud)) {
+		pmd_t *pmdp = pmd_offset(&pud, addr);
+
+		pmd_clear_level(mm, addr, pmdp);
 	}
 }
 EXPORT_SYMBOL(__page_table_check_pud_clear);
@@ -206,17 +239,10 @@ EXPORT_SYMBOL(__page_table_check_pud_clear);
 void __page_table_check_pte_set(struct mm_struct *mm, unsigned long addr,
 				pte_t *ptep, pte_t pte)
 {
-	pte_t old_pte;
-
 	if (&init_mm == mm)
 		return;
 
-	old_pte = *ptep;
-	if (pte_user_accessible_page(old_pte)) {
-		page_table_check_clear(mm, addr, pte_pfn(old_pte),
-				       PAGE_SIZE >> PAGE_SHIFT);
-	}
-
+	__page_table_check_pte_clear(mm, addr, *ptep);
 	if (pte_user_accessible_page(pte)) {
 		page_table_check_set(mm, addr, pte_pfn(pte),
 				     PAGE_SIZE >> PAGE_SHIFT,
@@ -228,17 +254,10 @@ EXPORT_SYMBOL(__page_table_check_pte_set);
 void __page_table_check_pmd_set(struct mm_struct *mm, unsigned long addr,
 				pmd_t *pmdp, pmd_t pmd)
 {
-	pmd_t old_pmd;
-
 	if (&init_mm == mm)
 		return;
 
-	old_pmd = *pmdp;
-	if (pmd_user_accessible_page(old_pmd)) {
-		page_table_check_clear(mm, addr, pmd_pfn(old_pmd),
-				       PMD_PAGE_SIZE >> PAGE_SHIFT);
-	}
-
+	__page_table_check_pmd_clear(mm, addr, *pmdp);
 	if (pmd_user_accessible_page(pmd)) {
 		page_table_check_set(mm, addr, pmd_pfn(pmd),
 				     PMD_PAGE_SIZE >> PAGE_SHIFT,
@@ -250,17 +269,10 @@ EXPORT_SYMBOL(__page_table_check_pmd_set);
 void __page_table_check_pud_set(struct mm_struct *mm, unsigned long addr,
 				pud_t *pudp, pud_t pud)
 {
-	pud_t old_pud;
-
 	if (&init_mm == mm)
 		return;
 
-	old_pud = *pudp;
-	if (pud_user_accessible_page(old_pud)) {
-		page_table_check_clear(mm, addr, pud_pfn(old_pud),
-				       PUD_PAGE_SIZE >> PAGE_SHIFT);
-	}
-
+	__page_table_check_pud_clear(mm, addr, *pudp);
 	if (pud_user_accessible_page(pud)) {
 		page_table_check_set(mm, addr, pud_pfn(pud),
 				     PUD_PAGE_SIZE >> PAGE_SHIFT,
-- 
2.34.1.703.g22d0c6ccf7-goog



* [PATCH v2 3/3] mm/page_table_check: use unsigned long for page counters
  2022-01-20 19:12 [PATCH v2 0/3] page table check fixes and cleanups Pasha Tatashin
  2022-01-20 19:12 ` [PATCH v2 1/3] mm/debug_vm_pgtable: remove pte entry from the page table Pasha Tatashin
  2022-01-20 19:12 ` [PATCH v2 2/3] mm/page_table_check: check entries at pud and pmd levels Pasha Tatashin
@ 2022-01-20 19:12 ` Pasha Tatashin
  2 siblings, 0 replies; 8+ messages in thread
From: Pasha Tatashin @ 2022-01-20 19:12 UTC (permalink / raw)
  To: pasha.tatashin, linux-kernel, linux-mm, akpm, rientjes, pjt,
	weixugc, gthelen, mingo, will, rppt, dave.hansen, hpa,
	aneesh.kumar, jirislaby, songmuchun, qydwhotmail, hughd, ziy,
	anshuman.khandual

For consistency, use "unsigned long" for all page counters.

Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Wei Xu <weixugc@google.com>
---
 mm/page_table_check.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 877d967742bc..f1db4de8bed2 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -86,8 +86,8 @@ static void page_table_check_clear(struct mm_struct *mm, unsigned long addr,
 {
 	struct page_ext *page_ext;
 	struct page *page;
+	unsigned long i;
 	bool anon;
-	int i;
 
 	if (!pfn_valid(pfn))
 		return;
@@ -121,8 +121,8 @@ static void page_table_check_set(struct mm_struct *mm, unsigned long addr,
 {
 	struct page_ext *page_ext;
 	struct page *page;
+	unsigned long i;
 	bool anon;
-	int i;
 
 	if (!pfn_valid(pfn))
 		return;
@@ -176,10 +176,10 @@ static void pmd_clear_level(struct mm_struct *mm, unsigned long addr,
 void __page_table_check_zero(struct page *page, unsigned int order)
 {
 	struct page_ext *page_ext = lookup_page_ext(page);
-	int i;
+	unsigned long i;
 
 	BUG_ON(!page_ext);
-	for (i = 0; i < (1 << order); i++) {
+	for (i = 0; i < (1ul << order); i++) {
 		struct page_table_check *ptc = get_page_table_check(page_ext);
 
 		BUG_ON(atomic_read(&ptc->anon_map_count));
-- 
2.34.1.703.g22d0c6ccf7-goog



* Re: [PATCH v2 2/3] mm/page_table_check: check entries at pud and pmd levels
  2022-01-20 19:12 ` [PATCH v2 2/3] mm/page_table_check: check entries at pud and pmd levels Pasha Tatashin
@ 2022-01-20 19:19   ` Wei Xu
  2022-01-21 19:32     ` Pasha Tatashin
  0 siblings, 1 reply; 8+ messages in thread
From: Wei Xu @ 2022-01-20 19:19 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: Linux Kernel Mailing List, Linux MM, Andrew Morton,
	David Rientjes, Paul Turner, Greg Thelen, mingo, will, rppt,
	Dave Hansen, hpa, aneesh.kumar, jirislaby, songmuchun,
	qydwhotmail, Hugh Dickins, Zi Yan, anshuman.khandual

On Thu, Jan 20, 2022 at 11:12 AM Pasha Tatashin
<pasha.tatashin@soleen.com> wrote:
>
> syzbot detected a case where the page table counters were not properly
> updated.
>
> syzkaller login:  ------------[ cut here ]------------
> kernel BUG at mm/page_table_check.c:162!
> invalid opcode: 0000 [#1] PREEMPT SMP KASAN
> CPU: 0 PID: 3099 Comm: pasha Not tainted 5.16.0+ #48
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIO4
> RIP: 0010:__page_table_check_zero+0x159/0x1a0
> Code: 7d 3a b2 ff 45 39 f5 74 2a e8 43 38 b2 ff 4d 85 e4 01
> RSP: 0018:ffff888010667418 EFLAGS: 00010293
> RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000
> RDX: ffff88800cea8680 RSI: ffffffff81becaf9 RDI: 0000000003
> RBP: ffff888010667450 R08: 0000000000000001 R09: 0000000000
> R10: ffffffff81becaab R11: 0000000000000001 R12: ffff888008
> R13: 0000000000000001 R14: 0000000000000200 R15: dffffc0000
> FS:  0000000000000000(0000) GS:ffff888035e00000(0000) knlG0
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007ffd875cad00 CR3: 00000000094ce000 CR4: 0000000000
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000
> Call Trace:
>  <TASK>
>  free_pcp_prepare+0x3be/0xaa0
>  free_unref_page+0x1c/0x650
>  ? trace_hardirqs_on+0x6a/0x1d0
>  free_compound_page+0xec/0x130
>  free_transhuge_page+0x1be/0x260
>  __put_compound_page+0x90/0xd0
>  release_pages+0x54c/0x1060
>  ? filemap_remove_folio+0x161/0x210
>  ? lock_downgrade+0x720/0x720
>  ? __put_page+0x150/0x150
>  ? filemap_free_folio+0x164/0x350
>  __pagevec_release+0x7c/0x110
>  shmem_undo_range+0x85e/0x1250
> ...
>
> The repro involved having a huge page that is split due to uprobe event
> temporarily replacing one of the pages in the huge page. Later the huge
> page was combined again, but the counters were off, as the PTE level
> was not properly updated.
>
> Make sure that not only huge page but also small pages are updated when
> a new entry is set or cleared.
>
> Fixes: df4e817b7108 ("mm: page table check")
>
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> ---
>  mm/page_table_check.c | 60 ++++++++++++++++++++++++++-----------------
>  1 file changed, 36 insertions(+), 24 deletions(-)
>
> diff --git a/mm/page_table_check.c b/mm/page_table_check.c
> index 7504e7caa2a1..877d967742bc 100644
> --- a/mm/page_table_check.c
> +++ b/mm/page_table_check.c
> @@ -145,6 +145,30 @@ static void page_table_check_set(struct mm_struct *mm, unsigned long addr,
>         }
>  }
>
> +static void pte_clear_level(struct mm_struct *mm, unsigned long addr,
> +                           pte_t *ptep)
> +{
> +       unsigned long i;
> +
> +       for (i = 0; i < PTRS_PER_PTE; i++) {
> +               __page_table_check_pte_clear(mm, addr, *ptep);
> +               addr += PAGE_SIZE;
> +               ptep++;
> +       }
> +}
> +
> +static void pmd_clear_level(struct mm_struct *mm, unsigned long addr,
> +                           pmd_t *pmdp)
> +{
> +       unsigned long i;
> +
> +       for (i = 0; i < PTRS_PER_PMD; i++) {
> +               __page_table_check_pmd_clear(mm, addr, *pmdp);
> +               addr += PMD_PAGE_SIZE;
> +               pmdp++;
> +       }
> +}
> +
>  /*
>   * page is on free list, or is being allocated, verify that counters are zeroes
>   * crash if they are not.
> @@ -186,6 +210,11 @@ void __page_table_check_pmd_clear(struct mm_struct *mm, unsigned long addr,
>         if (pmd_user_accessible_page(pmd)) {
>                 page_table_check_clear(mm, addr, pmd_pfn(pmd),
>                                        PMD_PAGE_SIZE >> PAGE_SHIFT);
> +       } else if (!pmd_bad(pmd) && !pmd_leaf(pmd)) {
> +               pte_t *ptep = pte_offset_map(&pmd, addr);
> +
> +               pte_clear_level(mm, addr, ptep);
> +               pte_unmap(ptep);
>         }
>  }
>  EXPORT_SYMBOL(__page_table_check_pmd_clear);
> @@ -199,6 +228,10 @@ void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr,
>         if (pud_user_accessible_page(pud)) {
>                 page_table_check_clear(mm, addr, pud_pfn(pud),
>                                        PUD_PAGE_SIZE >> PAGE_SHIFT);
> +       } else if (!pud_bad(pud) && !pud_leaf(pud)) {
> +               pmd_t *pmdp = pmd_offset(&pud, addr);
> +
> +               pmd_clear_level(mm, addr, pmdp);
>         }
>  }
>  EXPORT_SYMBOL(__page_table_check_pud_clear);
> @@ -206,17 +239,10 @@ EXPORT_SYMBOL(__page_table_check_pud_clear);
>  void __page_table_check_pte_set(struct mm_struct *mm, unsigned long addr,
>                                 pte_t *ptep, pte_t pte)
>  {
> -       pte_t old_pte;
> -
>         if (&init_mm == mm)
>                 return;
>
> -       old_pte = *ptep;
> -       if (pte_user_accessible_page(old_pte)) {
> -               page_table_check_clear(mm, addr, pte_pfn(old_pte),
> -                                      PAGE_SIZE >> PAGE_SHIFT);
> -       }
> -
> +       __page_table_check_pte_clear(mm, addr, *ptep);
>         if (pte_user_accessible_page(pte)) {
>                 page_table_check_set(mm, addr, pte_pfn(pte),
>                                      PAGE_SIZE >> PAGE_SHIFT,
> @@ -228,17 +254,10 @@ EXPORT_SYMBOL(__page_table_check_pte_set);
>  void __page_table_check_pmd_set(struct mm_struct *mm, unsigned long addr,
>                                 pmd_t *pmdp, pmd_t pmd)
>  {
> -       pmd_t old_pmd;
> -
>         if (&init_mm == mm)
>                 return;
>
> -       old_pmd = *pmdp;
> -       if (pmd_user_accessible_page(old_pmd)) {
> -               page_table_check_clear(mm, addr, pmd_pfn(old_pmd),
> -                                      PMD_PAGE_SIZE >> PAGE_SHIFT);
> -       }
> -
> +       __page_table_check_pmd_clear(mm, addr, *pmdp);
>         if (pmd_user_accessible_page(pmd)) {
>                 page_table_check_set(mm, addr, pmd_pfn(pmd),
>                                      PMD_PAGE_SIZE >> PAGE_SHIFT,
> @@ -250,17 +269,10 @@ EXPORT_SYMBOL(__page_table_check_pmd_set);
>  void __page_table_check_pud_set(struct mm_struct *mm, unsigned long addr,
>                                 pud_t *pudp, pud_t pud)
>  {
> -       pud_t old_pud;
> -
>         if (&init_mm == mm)
>                 return;
>
> -       old_pud = *pudp;
> -       if (pud_user_accessible_page(old_pud)) {
> -               page_table_check_clear(mm, addr, pud_pfn(old_pud),
> -                                      PUD_PAGE_SIZE >> PAGE_SHIFT);
> -       }
> -
> +       __page_table_check_pud_clear(mm, addr, *pudp);
>         if (pud_user_accessible_page(pud)) {
>                 page_table_check_set(mm, addr, pud_pfn(pud),
>                                      PUD_PAGE_SIZE >> PAGE_SHIFT,
> --
> 2.34.1.703.g22d0c6ccf7-goog
>

Reviewed-by: Wei Xu <weixugc@google.com>


* Re: [PATCH v2 1/3] mm/debug_vm_pgtable: remove pte entry from the page table
  2022-01-20 19:12 ` [PATCH v2 1/3] mm/debug_vm_pgtable: remove pte entry from the page table Pasha Tatashin
@ 2022-01-21  3:37   ` Anshuman Khandual
  2022-01-21 14:03     ` Pasha Tatashin
  0 siblings, 1 reply; 8+ messages in thread
From: Anshuman Khandual @ 2022-01-21  3:37 UTC (permalink / raw)
  To: Pasha Tatashin, linux-kernel, linux-mm, akpm, rientjes, pjt,
	weixugc, gthelen, mingo, will, rppt, dave.hansen, hpa,
	aneesh.kumar, jirislaby, songmuchun, qydwhotmail, hughd, ziy



On 1/21/22 12:42 AM, Pasha Tatashin wrote:
> The pte entry that is used in pte_advanced_tests() is never removed from
> the page table at the end of the test.
> 
> The issue is detected by page_table_check, to repro compile kernel with
> the following configs:
> 
> CONFIG_DEBUG_VM_PGTABLE=y
> CONFIG_PAGE_TABLE_CHECK=y
> CONFIG_PAGE_TABLE_CHECK_ENFORCED=y

Assuming this is on latest mainline.

I could enable PAGE_TABLE_CHECK on arm64 after some hacks; it did not build
on the platform otherwise. But enabling DEBUG_VM_PGTABLE afterwards did not
trigger the problems mentioned below. Is the problem x86-specific?

> 
> During the boot the following BUG is printed:
> 
> [    7.483050][    T1] debug_vm_pgtable: [debug_vm_pgtable         ]:
> Validating architecture page tabs
> [    7.490930][    T1] ------------[ cut here ]------------
> [    7.494926][    T1] kernel BUG at mm/page_table_check.c:194!

Which BUG() is this? mm/page_table_check.c:194 on latest mainline ..

void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr,
                                  pud_t pud) <----

> [    7.499172][    T1] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
> [    7.503610][    T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.16.0+
> [    7.508600][    T1] Hardware name: QEMU Standard PC (i440FX + PIIX,
> ...
> 
> The entry should be properly removed from the page table before the page
> is released to the free list.
> 
> Fixes: a5c3b9ffb0f4 ("mm/debug_vm_pgtable: add tests validating advanced arch page table helpers")
I am not sure whether this really fixes an existing problem.

> 
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Tested-by: Zi Yan <ziy@nvidia.com>
> ---
>  mm/debug_vm_pgtable.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
> index a7ac97c76762..db2abd9e415b 100644
> --- a/mm/debug_vm_pgtable.c
> +++ b/mm/debug_vm_pgtable.c
> @@ -171,6 +171,8 @@ static void __init pte_advanced_tests(struct pgtable_debug_args *args)
>  	ptep_test_and_clear_young(args->vma, args->vaddr, args->ptep);
>  	pte = ptep_get(args->ptep);
>  	WARN_ON(pte_young(pte));
> +
> +	ptep_get_and_clear_full(args->mm, args->vaddr, args->ptep, 1);
>  }

Although I don't see any problem on arm64 after this change.

>  
>  static void __init pte_savedwrite_tests(struct pgtable_debug_args *args)
>


* Re: [PATCH v2 1/3] mm/debug_vm_pgtable: remove pte entry from the page table
  2022-01-21  3:37   ` Anshuman Khandual
@ 2022-01-21 14:03     ` Pasha Tatashin
  0 siblings, 0 replies; 8+ messages in thread
From: Pasha Tatashin @ 2022-01-21 14:03 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: LKML, linux-mm, Andrew Morton, David Rientjes, Paul Turner,
	Wei Xu, Greg Thelen, Ingo Molnar, Will Deacon, Mike Rapoport,
	Dave Hansen, H. Peter Anvin, Aneesh Kumar K.V, Jiri Slaby,
	Muchun Song, Fusion Future, Hugh Dickins, Zi Yan

Hi Anshuman,

Thanks for looking into this. See my replies below.

> > CONFIG_DEBUG_VM_PGTABLE=y
> > CONFIG_PAGE_TABLE_CHECK=y
> > CONFIG_PAGE_TABLE_CHECK_ENFORCED=y
>
> Assuming this is on latest mainline.
>
> I could enable PAGE_TABLE_CHECK on arm64 after some hacks. It did not build
> on the platform otherwise. But enabling DEBUG_VM_PGTABLE afterwards did not
> create below mentioned problems. Is the problem x86 specific ?

This is not an x86-specific problem, but page_table_check does not have
support for other arches yet. The arm64 support is on my todo list.
The patch for arm64 would look something like this:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d283d422c6c4f0264fe8ecf5ae80036bf73f4594

>
> >
> > During the boot the following BUG is printed:
> >
> > [    7.483050][    T1] debug_vm_pgtable: [debug_vm_pgtable         ]:
> > Validating architecture page tabs
> > [    7.490930][    T1] ------------[ cut here ]------------
> > [    7.494926][    T1] kernel BUG at mm/page_table_check.c:194!
>
> Which BUG() is this ? mm/page_table_check.c:194 on latest mainline ..
>
> void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr,
>                                   pud_t pud) <----

It turns out I pasted the backtrace from a modified kernel. Here is a
snippet of the backtrace from mainline:
[    2.276826] ------------[ cut here ]------------
[    2.280426] kernel BUG at mm/page_table_check.c:162!
[    2.284118] invalid opcode: 0000 [#1] PREEMPT SMP PTI
...

Which corresponds to:
152 void __page_table_check_zero(struct page *page, unsigned int order)
153 {
154         struct page_ext *page_ext = lookup_page_ext(page);
155         int i;
156
157         BUG_ON(!page_ext);
158         for (i = 0; i < (1 << order); i++) {
159                 struct page_table_check *ptc =
get_page_table_check(page_ext);
160
161                 BUG_ON(atomic_read(&ptc->anon_map_count));
162                 BUG_ON(atomic_read(&ptc->file_map_count));

I will update the bug log with the mainline backtrace.

>
> > [    7.499172][    T1] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
> > [    7.503610][    T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.16.0+
> > [    7.508600][    T1] Hardware name: QEMU Standard PC (i440FX + PIIX,
> > ...
> >
> > The entry should be properly removed from the page table before the page
> > is released to the free list.
> >
> > Fixes: a5c3b9ffb0f4 ("mm/debug_vm_pgtable: add tests validating advanced arch page table helpers")
> I am not sure whether this really fixes an existing problem.

What is detected is that a page that potentially has a PTE entry in a
user page table was put on a free list. It is not an issue for this
test, but would be an issue if it happened elsewhere.

>
> >
> > Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> > Reviewed-by: Zi Yan <ziy@nvidia.com>
> > Tested-by: Zi Yan <ziy@nvidia.com>
> > ---
> >  mm/debug_vm_pgtable.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
> > index a7ac97c76762..db2abd9e415b 100644
> > --- a/mm/debug_vm_pgtable.c
> > +++ b/mm/debug_vm_pgtable.c
> > @@ -171,6 +171,8 @@ static void __init pte_advanced_tests(struct pgtable_debug_args *args)
> >       ptep_test_and_clear_young(args->vma, args->vaddr, args->ptep);
> >       pte = ptep_get(args->ptep);
> >       WARN_ON(pte_young(pte));
> > +
> > +     ptep_get_and_clear_full(args->mm, args->vaddr, args->ptep, 1);
> >  }
>
> Although I dont see any problem on arm64 after this change.

This is because page_table_check does not support anything besides x86
at the moment.

>
> >
> >  static void __init pte_savedwrite_tests(struct pgtable_debug_args *args)
> >

Pasha


* Re: [PATCH v2 2/3] mm/page_table_check: check entries at pud and pmd levels
  2022-01-20 19:19   ` Wei Xu
@ 2022-01-21 19:32     ` Pasha Tatashin
  0 siblings, 0 replies; 8+ messages in thread
From: Pasha Tatashin @ 2022-01-21 19:32 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Wei Xu, Linux Kernel Mailing List, Linux MM, David Rientjes,
	Paul Turner, Greg Thelen, Ingo Molnar, Will Deacon,
	Mike Rapoport, Dave Hansen, H. Peter Anvin, Aneesh Kumar K.V,
	Jiri Slaby, Muchun Song, Fusion Future, Hugh Dickins, Zi Yan,
	Anshuman Khandual

On Thu, Jan 20, 2022 at 2:19 PM Wei Xu <weixugc@google.com> wrote:
>
> On Thu, Jan 20, 2022 at 11:12 AM Pasha Tatashin
> <pasha.tatashin@soleen.com> wrote:
> >
> > syzbot detected a case where the page table counters were not properly
> > updated.
> >
> > syzkaller login:  ------------[ cut here ]------------
> > kernel BUG at mm/page_table_check.c:162!
> > invalid opcode: 0000 [#1] PREEMPT SMP KASAN
> > CPU: 0 PID: 3099 Comm: pasha Not tainted 5.16.0+ #48
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIO4
> > RIP: 0010:__page_table_check_zero+0x159/0x1a0
> > Code: 7d 3a b2 ff 45 39 f5 74 2a e8 43 38 b2 ff 4d 85 e4 01
> > RSP: 0018:ffff888010667418 EFLAGS: 00010293
> > RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000
> > RDX: ffff88800cea8680 RSI: ffffffff81becaf9 RDI: 0000000003
> > RBP: ffff888010667450 R08: 0000000000000001 R09: 0000000000
> > R10: ffffffff81becaab R11: 0000000000000001 R12: ffff888008
> > R13: 0000000000000001 R14: 0000000000000200 R15: dffffc0000
> > FS:  0000000000000000(0000) GS:ffff888035e00000(0000) knlG0
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00007ffd875cad00 CR3: 00000000094ce000 CR4: 0000000000
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000
> > Call Trace:
> >  <TASK>
> >  free_pcp_prepare+0x3be/0xaa0
> >  free_unref_page+0x1c/0x650
> >  ? trace_hardirqs_on+0x6a/0x1d0
> >  free_compound_page+0xec/0x130
> >  free_transhuge_page+0x1be/0x260
> >  __put_compound_page+0x90/0xd0
> >  release_pages+0x54c/0x1060
> >  ? filemap_remove_folio+0x161/0x210
> >  ? lock_downgrade+0x720/0x720
> >  ? __put_page+0x150/0x150
> >  ? filemap_free_folio+0x164/0x350
> >  __pagevec_release+0x7c/0x110
> >  shmem_undo_range+0x85e/0x1250
> > ...
> >
> > The repro involved having a huge page that is split due to uprobe event
> > temporarily replacing one of the pages in the huge page. Later the huge
> > page was combined again, but the counters were off, as the PTE level
> > was not properly updated.
> >
> > Make sure that not only huge page but also small pages are updated when
> > a new entry is set or cleared.
> >
> > Fixes: df4e817b7108 ("mm: page table check")
> >
> > Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> > ---
> >  mm/page_table_check.c | 60 ++++++++++++++++++++++++++-----------------
> >  1 file changed, 36 insertions(+), 24 deletions(-)
> >
> > diff --git a/mm/page_table_check.c b/mm/page_table_check.c
> > index 7504e7caa2a1..877d967742bc 100644
> > --- a/mm/page_table_check.c
> > +++ b/mm/page_table_check.c
> > @@ -145,6 +145,30 @@ static void page_table_check_set(struct mm_struct *mm, unsigned long addr,
> >         }
> >  }
> >
> > +static void pte_clear_level(struct mm_struct *mm, unsigned long addr,
> > +                           pte_t *ptep)
> > +{
> > +       unsigned long i;
> > +
> > +       for (i = 0; i < PTRS_PER_PTE; i++) {
> > +               __page_table_check_pte_clear(mm, addr, *ptep);
> > +               addr += PAGE_SIZE;
> > +               ptep++;
> > +       }
> > +}
> > +
> > +static void pmd_clear_level(struct mm_struct *mm, unsigned long addr,
> > +                           pmd_t *pmdp)
> > +{
> > +       unsigned long i;
> > +
> > +       for (i = 0; i < PTRS_PER_PMD; i++) {
> > +               __page_table_check_pmd_clear(mm, addr, *pmdp);
> > +               addr += PMD_PAGE_SIZE;
> > +               pmdp++;
> > +       }
> > +}
> > +
> >  /*
> >   * page is on free list, or is being allocated, verify that counters are zeroes
> >   * crash if they are not.
> > @@ -186,6 +210,11 @@ void __page_table_check_pmd_clear(struct mm_struct *mm, unsigned long addr,
> >         if (pmd_user_accessible_page(pmd)) {
> >                 page_table_check_clear(mm, addr, pmd_pfn(pmd),
> >                                        PMD_PAGE_SIZE >> PAGE_SHIFT);
> > +       } else if (!pmd_bad(pmd) && !pmd_leaf(pmd)) {
> > +               pte_t *ptep = pte_offset_map(&pmd, addr);
> > +
> > +               pte_clear_level(mm, addr, ptep);
> > +               pte_unmap(ptep);
> >         }
> >  }
> >  EXPORT_SYMBOL(__page_table_check_pmd_clear);
> > @@ -199,6 +228,10 @@ void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr,
> >         if (pud_user_accessible_page(pud)) {
> >                 page_table_check_clear(mm, addr, pud_pfn(pud),
> >                                        PUD_PAGE_SIZE >> PAGE_SHIFT);
> > +       } else if (!pud_bad(pud) && !pud_leaf(pud)) {
> > +               pmd_t *pmdp = pmd_offset(&pud, addr);
> > +
> > +               pmd_clear_level(mm, addr, pmdp);
> >         }
> >  }
> >  EXPORT_SYMBOL(__page_table_check_pud_clear);
> > @@ -206,17 +239,10 @@ EXPORT_SYMBOL(__page_table_check_pud_clear);
> >  void __page_table_check_pte_set(struct mm_struct *mm, unsigned long addr,
> >                                 pte_t *ptep, pte_t pte)
> >  {
> > -       pte_t old_pte;
> > -
> >         if (&init_mm == mm)
> >                 return;
> >
> > -       old_pte = *ptep;
> > -       if (pte_user_accessible_page(old_pte)) {
> > -               page_table_check_clear(mm, addr, pte_pfn(old_pte),
> > -                                      PAGE_SIZE >> PAGE_SHIFT);
> > -       }
> > -
> > +       __page_table_check_pte_clear(mm, addr, *ptep);
> >         if (pte_user_accessible_page(pte)) {
> >                 page_table_check_set(mm, addr, pte_pfn(pte),
> >                                      PAGE_SIZE >> PAGE_SHIFT,
> > @@ -228,17 +254,10 @@ EXPORT_SYMBOL(__page_table_check_pte_set);
> >  void __page_table_check_pmd_set(struct mm_struct *mm, unsigned long addr,
> >                                 pmd_t *pmdp, pmd_t pmd)
> >  {
> > -       pmd_t old_pmd;
> > -
> >         if (&init_mm == mm)
> >                 return;
> >
> > -       old_pmd = *pmdp;
> > -       if (pmd_user_accessible_page(old_pmd)) {
> > -               page_table_check_clear(mm, addr, pmd_pfn(old_pmd),
> > -                                      PMD_PAGE_SIZE >> PAGE_SHIFT);
> > -       }
> > -
> > +       __page_table_check_pmd_clear(mm, addr, *pmdp);
> >         if (pmd_user_accessible_page(pmd)) {
> >                 page_table_check_set(mm, addr, pmd_pfn(pmd),
> >                                      PMD_PAGE_SIZE >> PAGE_SHIFT,
> > @@ -250,17 +269,10 @@ EXPORT_SYMBOL(__page_table_check_pmd_set);
> >  void __page_table_check_pud_set(struct mm_struct *mm, unsigned long addr,
> >                                 pud_t *pudp, pud_t pud)
> >  {
> > -       pud_t old_pud;
> > -
> >         if (&init_mm == mm)
> >                 return;
> >
> > -       old_pud = *pudp;
> > -       if (pud_user_accessible_page(old_pud)) {
> > -               page_table_check_clear(mm, addr, pud_pfn(old_pud),
> > -                                      PUD_PAGE_SIZE >> PAGE_SHIFT);
> > -       }
> > -
> > +       __page_table_check_pud_clear(mm, addr, *pudp);
> >         if (pud_user_accessible_page(pud)) {
> >                 page_table_check_set(mm, addr, pud_pfn(pud),
> >                                      PUD_PAGE_SIZE >> PAGE_SHIFT,
> > --
> > 2.34.1.703.g22d0c6ccf7-goog
> >
>
> Reviewed-by: Wei Xu <weixugc@google.com>

Andrew, please hold off on taking this patch. During testing, I have
found that the proposed fix does not work with khugepaged: the PTEs get
cleared twice. The PMD entry is removed first, which descends and
accounts the PTE level, but the PTEs are cleared again later in
__collapse_huge_page_copy(). I am working on a more complete fix.

Thanks,
Pasha


end of thread, other threads:[~2022-01-21 19:33 UTC | newest]

Thread overview: 8+ messages
-- links below jump to the message on this page --
2022-01-20 19:12 [PATCH v2 0/3] page table check fixes and cleanups Pasha Tatashin
2022-01-20 19:12 ` [PATCH v2 1/3] mm/debug_vm_pgtable: remove pte entry from the page table Pasha Tatashin
2022-01-21  3:37   ` Anshuman Khandual
2022-01-21 14:03     ` Pasha Tatashin
2022-01-20 19:12 ` [PATCH v2 2/3] mm/page_table_check: check entries at pud and pmd levels Pasha Tatashin
2022-01-20 19:19   ` Wei Xu
2022-01-21 19:32     ` Pasha Tatashin
2022-01-20 19:12 ` [PATCH v2 3/3] mm/page_table_check: use unsigned long for page counters Pasha Tatashin
