All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] Add PUD and kernel PTE level pagetable account
@ 2022-07-06  8:59 ` Baolin Wang
  0 siblings, 0 replies; 30+ messages in thread
From: Baolin Wang @ 2022-07-06  8:59 UTC (permalink / raw)
  To: akpm
  Cc: rppt, willy, will, aneesh.kumar, npiggin, peterz,
	catalin.marinas, chenhuacai, kernel, tsbogend, dave.hansen, luto,
	tglx, mingo, bp, hpa, arnd, guoren, monstr, jonas,
	stefan.kristiansson, shorne, baolin.wang, x86, linux-arch,
	linux-arm-kernel, loongarch, linux-mips, linux-csky, openrisc,
	linux-mm, linux-kernel

Hi,

Now we will miss to account the PUD level pagetable and kernel PTE level
pagetable, as well as missing to set the PG_table flags for these pagetable
pages, which will get an inaccurate pagetable accounting, and miss
PageTable() validation in some cases. So this patch set introduces new
helpers to help to account PUD and kernel PTE pagetable pages.

Note there are still some architectures specific pagetable allocation
that need to account the pagetable pages, which need more investigation
and cleanup in future.

Changes from RFC v3:
 - Rebased on 20220706 linux-next.
 - Introduce new pgtable_pud_page_ctor/dtor() and rename the helpers.
 - Change back to use inc_lruvec_page_state()/dec_lruvec_page_state().
 - Update some commit message.
link: https://lore.kernel.org/all/cover.1656586863.git.baolin.wang@linux.alibaba.com/

Changes from RFC v2:
 - Convert to use mod_lruvec_page_state() for non-order-0 case.
 - Rename the helpers.
 - Update some commit messages.
 - Remove unnecessary __GFP_HIGHMEM clear.
link: https://lore.kernel.org/all/cover.1655887440.git.baolin.wang@linux.alibaba.com/

Changes from RFC v1:
 - Update some commit message.
 - Add missing pgtable_clear_and_dec() on X86 arch.
 - Use __free_page() to free pagetable which can avoid duplicated virt_to_page().
link: https://lore.kernel.org/all/cover.1654271618.git.baolin.wang@linux.alibaba.com/

Baolin Wang (3):
  mm: Factor out the pagetable pages account into new helper function
  mm: Add PUD level pagetable account
  mm: Add kernel PTE level pagetable pages account

 arch/arm64/include/asm/tlb.h         |  5 ++++-
 arch/csky/include/asm/pgalloc.h      |  2 +-
 arch/loongarch/include/asm/pgalloc.h | 12 +++++++++---
 arch/microblaze/mm/pgtable.c         |  2 +-
 arch/mips/include/asm/pgalloc.h      | 12 +++++++++---
 arch/openrisc/mm/ioremap.c           |  2 +-
 arch/x86/mm/pgtable.c                |  7 +++++--
 include/asm-generic/pgalloc.h        | 26 ++++++++++++++++++++++----
 include/linux/mm.h                   | 34 ++++++++++++++++++++++++++--------
 9 files changed, 78 insertions(+), 24 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 0/3] Add PUD and kernel PTE level pagetable account
@ 2022-07-06  8:59 ` Baolin Wang
  0 siblings, 0 replies; 30+ messages in thread
From: Baolin Wang @ 2022-07-06  8:59 UTC (permalink / raw)
  To: akpm
  Cc: x86, baolin.wang, peterz, catalin.marinas, dave.hansen,
	linux-kernel, linux-mm, guoren, linux-csky, hpa, kernel, will,
	tglx, jonas, aneesh.kumar, chenhuacai, willy, rppt, mingo,
	linux-arch, arnd, npiggin, loongarch, openrisc, bp, luto,
	linux-arm-kernel, monstr, tsbogend, linux-mips

Hi,

Now we will miss to account the PUD level pagetable and kernel PTE level
pagetable, as well as missing to set the PG_table flags for these pagetable
pages, which will get an inaccurate pagetable accounting, and miss
PageTable() validation in some cases. So this patch set introduces new
helpers to help to account PUD and kernel PTE pagetable pages.

Note there are still some architectures specific pagetable allocation
that need to account the pagetable pages, which need more investigation
and cleanup in future.

Changes from RFC v3:
 - Rebased on 20220706 linux-next.
 - Introduce new pgtable_pud_page_ctor/dtor() and rename the helpers.
 - Change back to use inc_lruvec_page_state()/dec_lruvec_page_state().
 - Update some commit message.
link: https://lore.kernel.org/all/cover.1656586863.git.baolin.wang@linux.alibaba.com/

Changes from RFC v2:
 - Convert to use mod_lruvec_page_state() for non-order-0 case.
 - Rename the helpers.
 - Update some commit messages.
 - Remove unnecessary __GFP_HIGHMEM clear.
link: https://lore.kernel.org/all/cover.1655887440.git.baolin.wang@linux.alibaba.com/

Changes from RFC v1:
 - Update some commit message.
 - Add missing pgtable_clear_and_dec() on X86 arch.
 - Use __free_page() to free pagetable which can avoid duplicated virt_to_page().
link: https://lore.kernel.org/all/cover.1654271618.git.baolin.wang@linux.alibaba.com/

Baolin Wang (3):
  mm: Factor out the pagetable pages account into new helper function
  mm: Add PUD level pagetable account
  mm: Add kernel PTE level pagetable pages account

 arch/arm64/include/asm/tlb.h         |  5 ++++-
 arch/csky/include/asm/pgalloc.h      |  2 +-
 arch/loongarch/include/asm/pgalloc.h | 12 +++++++++---
 arch/microblaze/mm/pgtable.c         |  2 +-
 arch/mips/include/asm/pgalloc.h      | 12 +++++++++---
 arch/openrisc/mm/ioremap.c           |  2 +-
 arch/x86/mm/pgtable.c                |  7 +++++--
 include/asm-generic/pgalloc.h        | 26 ++++++++++++++++++++++----
 include/linux/mm.h                   | 34 ++++++++++++++++++++++++++--------
 9 files changed, 78 insertions(+), 24 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 0/3] Add PUD and kernel PTE level pagetable account
@ 2022-07-06  8:59 ` Baolin Wang
  0 siblings, 0 replies; 30+ messages in thread
From: Baolin Wang @ 2022-07-06  8:59 UTC (permalink / raw)
  To: akpm
  Cc: rppt, willy, will, aneesh.kumar, npiggin, peterz,
	catalin.marinas, chenhuacai, kernel, tsbogend, dave.hansen, luto,
	tglx, mingo, bp, hpa, arnd, guoren, monstr, jonas,
	stefan.kristiansson, shorne, baolin.wang, x86, linux-arch,
	linux-arm-kernel, loongarch, linux-mips, linux-csky, openrisc,
	linux-mm, linux-kernel

Hi,

Now we will miss to account the PUD level pagetable and kernel PTE level
pagetable, as well as missing to set the PG_table flags for these pagetable
pages, which will get an inaccurate pagetable accounting, and miss
PageTable() validation in some cases. So this patch set introduces new
helpers to help to account PUD and kernel PTE pagetable pages.

Note there are still some architectures specific pagetable allocation
that need to account the pagetable pages, which need more investigation
and cleanup in future.

Changes from RFC v3:
 - Rebased on 20220706 linux-next.
 - Introduce new pgtable_pud_page_ctor/dtor() and rename the helpers.
 - Change back to use inc_lruvec_page_state()/dec_lruvec_page_state().
 - Update some commit message.
link: https://lore.kernel.org/all/cover.1656586863.git.baolin.wang@linux.alibaba.com/

Changes from RFC v2:
 - Convert to use mod_lruvec_page_state() for non-order-0 case.
 - Rename the helpers.
 - Update some commit messages.
 - Remove unnecessary __GFP_HIGHMEM clear.
link: https://lore.kernel.org/all/cover.1655887440.git.baolin.wang@linux.alibaba.com/

Changes from RFC v1:
 - Update some commit message.
 - Add missing pgtable_clear_and_dec() on X86 arch.
 - Use __free_page() to free pagetable which can avoid duplicated virt_to_page().
link: https://lore.kernel.org/all/cover.1654271618.git.baolin.wang@linux.alibaba.com/

Baolin Wang (3):
  mm: Factor out the pagetable pages account into new helper function
  mm: Add PUD level pagetable account
  mm: Add kernel PTE level pagetable pages account

 arch/arm64/include/asm/tlb.h         |  5 ++++-
 arch/csky/include/asm/pgalloc.h      |  2 +-
 arch/loongarch/include/asm/pgalloc.h | 12 +++++++++---
 arch/microblaze/mm/pgtable.c         |  2 +-
 arch/mips/include/asm/pgalloc.h      | 12 +++++++++---
 arch/openrisc/mm/ioremap.c           |  2 +-
 arch/x86/mm/pgtable.c                |  7 +++++--
 include/asm-generic/pgalloc.h        | 26 ++++++++++++++++++++++----
 include/linux/mm.h                   | 34 ++++++++++++++++++++++++++--------
 9 files changed, 78 insertions(+), 24 deletions(-)

-- 
1.8.3.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/3] mm: Factor out the pagetable pages account into new helper function
  2022-07-06  8:59 ` Baolin Wang
  (?)
@ 2022-07-06  8:59   ` Baolin Wang
  -1 siblings, 0 replies; 30+ messages in thread
From: Baolin Wang @ 2022-07-06  8:59 UTC (permalink / raw)
  To: akpm
  Cc: rppt, willy, will, aneesh.kumar, npiggin, peterz,
	catalin.marinas, chenhuacai, kernel, tsbogend, dave.hansen, luto,
	tglx, mingo, bp, hpa, arnd, guoren, monstr, jonas,
	stefan.kristiansson, shorne, baolin.wang, x86, linux-arch,
	linux-arm-kernel, loongarch, linux-mips, linux-csky, openrisc,
	linux-mm, linux-kernel

Factor out the pagetable pages account and pagetable setting into new
helper functions to avoid duplicated code. Meanwhile these helper
functions also will be used to account pagetable pages which do not
need split pagetable lock.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 include/linux/mm.h | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index d084ce5..7894bc5 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2352,20 +2352,30 @@ static inline void pgtable_init(void)
 	pgtable_cache_init();
 }
 
+static inline void page_set_pgtable(struct page *page)
+{
+	__SetPageTable(page);
+	inc_lruvec_page_state(page, NR_PAGETABLE);
+}
+
+static inline void page_clear_pgtable(struct page *page)
+{
+	__ClearPageTable(page);
+	dec_lruvec_page_state(page, NR_PAGETABLE);
+}
+
 static inline bool pgtable_pte_page_ctor(struct page *page)
 {
 	if (!ptlock_init(page))
 		return false;
-	__SetPageTable(page);
-	inc_lruvec_page_state(page, NR_PAGETABLE);
+	page_set_pgtable(page);
 	return true;
 }
 
 static inline void pgtable_pte_page_dtor(struct page *page)
 {
 	ptlock_free(page);
-	__ClearPageTable(page);
-	dec_lruvec_page_state(page, NR_PAGETABLE);
+	page_clear_pgtable(page);
 }
 
 #define pte_offset_map_lock(mm, pmd, address, ptlp)	\
@@ -2451,16 +2461,14 @@ static inline bool pgtable_pmd_page_ctor(struct page *page)
 {
 	if (!pmd_ptlock_init(page))
 		return false;
-	__SetPageTable(page);
-	inc_lruvec_page_state(page, NR_PAGETABLE);
+	page_set_pgtable(page);
 	return true;
 }
 
 static inline void pgtable_pmd_page_dtor(struct page *page)
 {
 	pmd_ptlock_free(page);
-	__ClearPageTable(page);
-	dec_lruvec_page_state(page, NR_PAGETABLE);
+	page_clear_pgtable(page);
 }
 
 /*
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 1/3] mm: Factor out the pagetable pages account into new helper function
@ 2022-07-06  8:59   ` Baolin Wang
  0 siblings, 0 replies; 30+ messages in thread
From: Baolin Wang @ 2022-07-06  8:59 UTC (permalink / raw)
  To: akpm
  Cc: x86, baolin.wang, peterz, catalin.marinas, dave.hansen,
	linux-kernel, linux-mm, guoren, linux-csky, hpa, kernel, will,
	tglx, jonas, aneesh.kumar, chenhuacai, willy, rppt, mingo,
	linux-arch, arnd, npiggin, loongarch, openrisc, bp, luto,
	linux-arm-kernel, monstr, tsbogend, linux-mips

Factor out the pagetable pages account and pagetable setting into new
helper functions to avoid duplicated code. Meanwhile these helper
functions also will be used to account pagetable pages which do not
need split pagetable lock.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 include/linux/mm.h | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index d084ce5..7894bc5 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2352,20 +2352,30 @@ static inline void pgtable_init(void)
 	pgtable_cache_init();
 }
 
+static inline void page_set_pgtable(struct page *page)
+{
+	__SetPageTable(page);
+	inc_lruvec_page_state(page, NR_PAGETABLE);
+}
+
+static inline void page_clear_pgtable(struct page *page)
+{
+	__ClearPageTable(page);
+	dec_lruvec_page_state(page, NR_PAGETABLE);
+}
+
 static inline bool pgtable_pte_page_ctor(struct page *page)
 {
 	if (!ptlock_init(page))
 		return false;
-	__SetPageTable(page);
-	inc_lruvec_page_state(page, NR_PAGETABLE);
+	page_set_pgtable(page);
 	return true;
 }
 
 static inline void pgtable_pte_page_dtor(struct page *page)
 {
 	ptlock_free(page);
-	__ClearPageTable(page);
-	dec_lruvec_page_state(page, NR_PAGETABLE);
+	page_clear_pgtable(page);
 }
 
 #define pte_offset_map_lock(mm, pmd, address, ptlp)	\
@@ -2451,16 +2461,14 @@ static inline bool pgtable_pmd_page_ctor(struct page *page)
 {
 	if (!pmd_ptlock_init(page))
 		return false;
-	__SetPageTable(page);
-	inc_lruvec_page_state(page, NR_PAGETABLE);
+	page_set_pgtable(page);
 	return true;
 }
 
 static inline void pgtable_pmd_page_dtor(struct page *page)
 {
 	pmd_ptlock_free(page);
-	__ClearPageTable(page);
-	dec_lruvec_page_state(page, NR_PAGETABLE);
+	page_clear_pgtable(page);
 }
 
 /*
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 1/3] mm: Factor out the pagetable pages account into new helper function
@ 2022-07-06  8:59   ` Baolin Wang
  0 siblings, 0 replies; 30+ messages in thread
From: Baolin Wang @ 2022-07-06  8:59 UTC (permalink / raw)
  To: akpm
  Cc: rppt, willy, will, aneesh.kumar, npiggin, peterz,
	catalin.marinas, chenhuacai, kernel, tsbogend, dave.hansen, luto,
	tglx, mingo, bp, hpa, arnd, guoren, monstr, jonas,
	stefan.kristiansson, shorne, baolin.wang, x86, linux-arch,
	linux-arm-kernel, loongarch, linux-mips, linux-csky, openrisc,
	linux-mm, linux-kernel

Factor out the pagetable pages account and pagetable setting into new
helper functions to avoid duplicated code. Meanwhile these helper
functions also will be used to account pagetable pages which do not
need split pagetable lock.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 include/linux/mm.h | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index d084ce5..7894bc5 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2352,20 +2352,30 @@ static inline void pgtable_init(void)
 	pgtable_cache_init();
 }
 
+static inline void page_set_pgtable(struct page *page)
+{
+	__SetPageTable(page);
+	inc_lruvec_page_state(page, NR_PAGETABLE);
+}
+
+static inline void page_clear_pgtable(struct page *page)
+{
+	__ClearPageTable(page);
+	dec_lruvec_page_state(page, NR_PAGETABLE);
+}
+
 static inline bool pgtable_pte_page_ctor(struct page *page)
 {
 	if (!ptlock_init(page))
 		return false;
-	__SetPageTable(page);
-	inc_lruvec_page_state(page, NR_PAGETABLE);
+	page_set_pgtable(page);
 	return true;
 }
 
 static inline void pgtable_pte_page_dtor(struct page *page)
 {
 	ptlock_free(page);
-	__ClearPageTable(page);
-	dec_lruvec_page_state(page, NR_PAGETABLE);
+	page_clear_pgtable(page);
 }
 
 #define pte_offset_map_lock(mm, pmd, address, ptlp)	\
@@ -2451,16 +2461,14 @@ static inline bool pgtable_pmd_page_ctor(struct page *page)
 {
 	if (!pmd_ptlock_init(page))
 		return false;
-	__SetPageTable(page);
-	inc_lruvec_page_state(page, NR_PAGETABLE);
+	page_set_pgtable(page);
 	return true;
 }
 
 static inline void pgtable_pmd_page_dtor(struct page *page)
 {
 	pmd_ptlock_free(page);
-	__ClearPageTable(page);
-	dec_lruvec_page_state(page, NR_PAGETABLE);
+	page_clear_pgtable(page);
 }
 
 /*
-- 
1.8.3.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 2/3] mm: Add PUD level pagetable account
  2022-07-06  8:59 ` Baolin Wang
  (?)
@ 2022-07-06  8:59   ` Baolin Wang
  -1 siblings, 0 replies; 30+ messages in thread
From: Baolin Wang @ 2022-07-06  8:59 UTC (permalink / raw)
  To: akpm
  Cc: rppt, willy, will, aneesh.kumar, npiggin, peterz,
	catalin.marinas, chenhuacai, kernel, tsbogend, dave.hansen, luto,
	tglx, mingo, bp, hpa, arnd, guoren, monstr, jonas,
	stefan.kristiansson, shorne, baolin.wang, x86, linux-arch,
	linux-arm-kernel, loongarch, linux-mips, linux-csky, openrisc,
	linux-mm, linux-kernel

Now the PUD level ptes are always protected by mm->page_table_lock,
which means no split pagetable lock needed. So the generic PUD level
pagetable pages allocation will not call pgtable_pte_page_ctor/dtor(),
that means we will miss to account PUD level pagetable pages.

So introducing pgtable_pud_page_ctor/dtor(), which are just wrappers
of page_{set,clear}_pgtable() to help to get an accurate PUD pagetable
accounting, when allocating or freeing PUD level pagetable pages.

Moreover this patch will also mark the PUD level pagetable with PG_table
flag, which will help to do sanity validation in unpoison_memory() and
get more accurate pagetable accounting by /proc/kpageflags interface.

Meanwhile converting the architectures with using generic PUD pagatable
allocation to add corresponding pgtable_pud_page_ctor() or
pgtable_pud_page_dtor() to account PUD level pagetable.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 arch/arm64/include/asm/tlb.h         |  5 ++++-
 arch/loongarch/include/asm/pgalloc.h | 12 +++++++++---
 arch/mips/include/asm/pgalloc.h      | 12 +++++++++---
 arch/x86/mm/pgtable.c                |  5 ++++-
 include/asm-generic/pgalloc.h        | 12 ++++++++++--
 include/linux/mm.h                   | 10 ++++++++++
 6 files changed, 46 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
index c995d1f..6665f33 100644
--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -94,7 +94,10 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
 static inline void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pudp,
 				  unsigned long addr)
 {
-	tlb_remove_table(tlb, virt_to_page(pudp));
+	struct page *page = virt_to_page(pudp);
+
+	pgtable_pud_page_dtor(page);
+	tlb_remove_table(tlb, page);
 }
 #endif
 
diff --git a/arch/loongarch/include/asm/pgalloc.h b/arch/loongarch/include/asm/pgalloc.h
index 4bfeb3c..8138101 100644
--- a/arch/loongarch/include/asm/pgalloc.h
+++ b/arch/loongarch/include/asm/pgalloc.h
@@ -89,10 +89,16 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
 static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long address)
 {
 	pud_t *pud;
+	struct page *page;
+
+	page = alloc_page(GFP_KERNEL);
+	if (!page)
+		return NULL;
+
+	pgtable_pud_page_ctor(page);
+	pud = (pud_t *)page_address(page);
+	pud_init((unsigned long)pud, (unsigned long)invalid_pmd_table);
 
-	pud = (pud_t *) __get_free_page(GFP_KERNEL);
-	if (pud)
-		pud_init((unsigned long)pud, (unsigned long)invalid_pmd_table);
 	return pud;
 }
 
diff --git a/arch/mips/include/asm/pgalloc.h b/arch/mips/include/asm/pgalloc.h
index 7960357..5da5880 100644
--- a/arch/mips/include/asm/pgalloc.h
+++ b/arch/mips/include/asm/pgalloc.h
@@ -89,11 +89,17 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
 
 static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long address)
 {
+	struct page *page;
 	pud_t *pud;
 
-	pud = (pud_t *) __get_free_pages(GFP_KERNEL, PUD_TABLE_ORDER);
-	if (pud)
-		pud_init((unsigned long)pud, (unsigned long)invalid_pmd_table);
+	page = alloc_pages(GFP_KERNEL, PUD_TABLE_ORDER);
+	if (!page)
+		return NULL;
+
+	pgtable_pud_page_ctor(page);
+	pud = (pud_t *)page_address(page);
+	pud_init((unsigned long)pud, (unsigned long)invalid_pmd_table);
+
 	return pud;
 }
 
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index a932d77..ea39670 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -76,8 +76,11 @@ void ___pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
 #if CONFIG_PGTABLE_LEVELS > 3
 void ___pud_free_tlb(struct mmu_gather *tlb, pud_t *pud)
 {
+	struct page *page = virt_to_page(pud);
+
+	pgtable_pud_page_dtor(page);
 	paravirt_release_pud(__pa(pud) >> PAGE_SHIFT);
-	paravirt_tlb_remove_table(tlb, virt_to_page(pud));
+	paravirt_tlb_remove_table(tlb, page);
 }
 
 #if CONFIG_PGTABLE_LEVELS > 4
diff --git a/include/asm-generic/pgalloc.h b/include/asm-generic/pgalloc.h
index 977bea1..8ce8d7c 100644
--- a/include/asm-generic/pgalloc.h
+++ b/include/asm-generic/pgalloc.h
@@ -149,11 +149,16 @@ static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd)
 
 static inline pud_t *__pud_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
+	struct page *page;
 	gfp_t gfp = GFP_PGTABLE_USER;
 
 	if (mm == &init_mm)
 		gfp = GFP_PGTABLE_KERNEL;
-	return (pud_t *)get_zeroed_page(gfp);
+	page = alloc_pages(gfp, 0);
+	if (!page)
+		return NULL;
+	pgtable_pud_page_ctor(page);
+	return (pud_t *)page_address(page);
 }
 
 #ifndef __HAVE_ARCH_PUD_ALLOC_ONE
@@ -174,8 +179,11 @@ static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
 
 static inline void __pud_free(struct mm_struct *mm, pud_t *pud)
 {
+	struct page *page = virt_to_page(pud);
+
 	BUG_ON((unsigned long)pud & (PAGE_SIZE-1));
-	free_page((unsigned long)pud);
+	pgtable_pud_page_dtor(page);
+	__free_page(page);
 }
 
 #ifndef __HAVE_ARCH_PUD_FREE
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7894bc5..54ed6f7 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2364,6 +2364,16 @@ static inline void page_clear_pgtable(struct page *page)
 	dec_lruvec_page_state(page, NR_PAGETABLE);
 }
 
+static inline void pgtable_pud_page_ctor(struct page *page)
+{
+	page_set_pgtable(page);
+}
+
+static inline void pgtable_pud_page_dtor(struct page *page)
+{
+	page_clear_pgtable(page);
+}
+
 static inline bool pgtable_pte_page_ctor(struct page *page)
 {
 	if (!ptlock_init(page))
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 2/3] mm: Add PUD level pagetable account
@ 2022-07-06  8:59   ` Baolin Wang
  0 siblings, 0 replies; 30+ messages in thread
From: Baolin Wang @ 2022-07-06  8:59 UTC (permalink / raw)
  To: akpm
  Cc: x86, baolin.wang, peterz, catalin.marinas, dave.hansen,
	linux-kernel, linux-mm, guoren, linux-csky, hpa, kernel, will,
	tglx, jonas, aneesh.kumar, chenhuacai, willy, rppt, mingo,
	linux-arch, arnd, npiggin, loongarch, openrisc, bp, luto,
	linux-arm-kernel, monstr, tsbogend, linux-mips

Now the PUD level ptes are always protected by mm->page_table_lock,
which means no split pagetable lock needed. So the generic PUD level
pagetable pages allocation will not call pgtable_pte_page_ctor/dtor(),
that means we will miss to account PUD level pagetable pages.

So introducing pgtable_pud_page_ctor/dtor(), which are just wrappers
of page_{set,clear}_pgtable() to help to get an accurate PUD pagetable
accounting, when allocating or freeing PUD level pagetable pages.

Moreover this patch will also mark the PUD level pagetable with PG_table
flag, which will help to do sanity validation in unpoison_memory() and
get more accurate pagetable accounting by /proc/kpageflags interface.

Meanwhile converting the architectures with using generic PUD pagatable
allocation to add corresponding pgtable_pud_page_ctor() or
pgtable_pud_page_dtor() to account PUD level pagetable.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 arch/arm64/include/asm/tlb.h         |  5 ++++-
 arch/loongarch/include/asm/pgalloc.h | 12 +++++++++---
 arch/mips/include/asm/pgalloc.h      | 12 +++++++++---
 arch/x86/mm/pgtable.c                |  5 ++++-
 include/asm-generic/pgalloc.h        | 12 ++++++++++--
 include/linux/mm.h                   | 10 ++++++++++
 6 files changed, 46 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
index c995d1f..6665f33 100644
--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -94,7 +94,10 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
 static inline void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pudp,
 				  unsigned long addr)
 {
-	tlb_remove_table(tlb, virt_to_page(pudp));
+	struct page *page = virt_to_page(pudp);
+
+	pgtable_pud_page_dtor(page);
+	tlb_remove_table(tlb, page);
 }
 #endif
 
diff --git a/arch/loongarch/include/asm/pgalloc.h b/arch/loongarch/include/asm/pgalloc.h
index 4bfeb3c..8138101 100644
--- a/arch/loongarch/include/asm/pgalloc.h
+++ b/arch/loongarch/include/asm/pgalloc.h
@@ -89,10 +89,16 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
 static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long address)
 {
 	pud_t *pud;
+	struct page *page;
+
+	page = alloc_page(GFP_KERNEL);
+	if (!page)
+		return NULL;
+
+	pgtable_pud_page_ctor(page);
+	pud = (pud_t *)page_address(page);
+	pud_init((unsigned long)pud, (unsigned long)invalid_pmd_table);
 
-	pud = (pud_t *) __get_free_page(GFP_KERNEL);
-	if (pud)
-		pud_init((unsigned long)pud, (unsigned long)invalid_pmd_table);
 	return pud;
 }
 
diff --git a/arch/mips/include/asm/pgalloc.h b/arch/mips/include/asm/pgalloc.h
index 7960357..5da5880 100644
--- a/arch/mips/include/asm/pgalloc.h
+++ b/arch/mips/include/asm/pgalloc.h
@@ -89,11 +89,17 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
 
 static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long address)
 {
+	struct page *page;
 	pud_t *pud;
 
-	pud = (pud_t *) __get_free_pages(GFP_KERNEL, PUD_TABLE_ORDER);
-	if (pud)
-		pud_init((unsigned long)pud, (unsigned long)invalid_pmd_table);
+	page = alloc_pages(GFP_KERNEL, PUD_TABLE_ORDER);
+	if (!page)
+		return NULL;
+
+	pgtable_pud_page_ctor(page);
+	pud = (pud_t *)page_address(page);
+	pud_init((unsigned long)pud, (unsigned long)invalid_pmd_table);
+
 	return pud;
 }
 
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index a932d77..ea39670 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -76,8 +76,11 @@ void ___pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
 #if CONFIG_PGTABLE_LEVELS > 3
 void ___pud_free_tlb(struct mmu_gather *tlb, pud_t *pud)
 {
+	struct page *page = virt_to_page(pud);
+
+	pgtable_pud_page_dtor(page);
 	paravirt_release_pud(__pa(pud) >> PAGE_SHIFT);
-	paravirt_tlb_remove_table(tlb, virt_to_page(pud));
+	paravirt_tlb_remove_table(tlb, page);
 }
 
 #if CONFIG_PGTABLE_LEVELS > 4
diff --git a/include/asm-generic/pgalloc.h b/include/asm-generic/pgalloc.h
index 977bea1..8ce8d7c 100644
--- a/include/asm-generic/pgalloc.h
+++ b/include/asm-generic/pgalloc.h
@@ -149,11 +149,16 @@ static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd)
 
 static inline pud_t *__pud_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
+	struct page *page;
 	gfp_t gfp = GFP_PGTABLE_USER;
 
 	if (mm == &init_mm)
 		gfp = GFP_PGTABLE_KERNEL;
-	return (pud_t *)get_zeroed_page(gfp);
+	page = alloc_pages(gfp, 0);
+	if (!page)
+		return NULL;
+	pgtable_pud_page_ctor(page);
+	return (pud_t *)page_address(page);
 }
 
 #ifndef __HAVE_ARCH_PUD_ALLOC_ONE
@@ -174,8 +179,11 @@ static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
 
 static inline void __pud_free(struct mm_struct *mm, pud_t *pud)
 {
+	struct page *page = virt_to_page(pud);
+
 	BUG_ON((unsigned long)pud & (PAGE_SIZE-1));
-	free_page((unsigned long)pud);
+	pgtable_pud_page_dtor(page);
+	__free_page(page);
 }
 
 #ifndef __HAVE_ARCH_PUD_FREE
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7894bc5..54ed6f7 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2364,6 +2364,16 @@ static inline void page_clear_pgtable(struct page *page)
 	dec_lruvec_page_state(page, NR_PAGETABLE);
 }
 
+static inline void pgtable_pud_page_ctor(struct page *page)
+{
+	page_set_pgtable(page);
+}
+
+static inline void pgtable_pud_page_dtor(struct page *page)
+{
+	page_clear_pgtable(page);
+}
+
 static inline bool pgtable_pte_page_ctor(struct page *page)
 {
 	if (!ptlock_init(page))
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 2/3] mm: Add PUD level pagetable account
@ 2022-07-06  8:59   ` Baolin Wang
  0 siblings, 0 replies; 30+ messages in thread
From: Baolin Wang @ 2022-07-06  8:59 UTC (permalink / raw)
  To: akpm
  Cc: rppt, willy, will, aneesh.kumar, npiggin, peterz,
	catalin.marinas, chenhuacai, kernel, tsbogend, dave.hansen, luto,
	tglx, mingo, bp, hpa, arnd, guoren, monstr, jonas,
	stefan.kristiansson, shorne, baolin.wang, x86, linux-arch,
	linux-arm-kernel, loongarch, linux-mips, linux-csky, openrisc,
	linux-mm, linux-kernel

Now the PUD level ptes are always protected by mm->page_table_lock,
which means no split pagetable lock needed. So the generic PUD level
pagetable pages allocation will not call pgtable_pte_page_ctor/dtor(),
that means we will miss to account PUD level pagetable pages.

So introducing pgtable_pud_page_ctor/dtor(), which are just wrappers
of page_{set,clear}_pgtable() to help to get an accurate PUD pagetable
accounting, when allocating or freeing PUD level pagetable pages.

Moreover this patch will also mark the PUD level pagetable with PG_table
flag, which will help to do sanity validation in unpoison_memory() and
get more accurate pagetable accounting by /proc/kpageflags interface.

Meanwhile converting the architectures with using generic PUD pagatable
allocation to add corresponding pgtable_pud_page_ctor() or
pgtable_pud_page_dtor() to account PUD level pagetable.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 arch/arm64/include/asm/tlb.h         |  5 ++++-
 arch/loongarch/include/asm/pgalloc.h | 12 +++++++++---
 arch/mips/include/asm/pgalloc.h      | 12 +++++++++---
 arch/x86/mm/pgtable.c                |  5 ++++-
 include/asm-generic/pgalloc.h        | 12 ++++++++++--
 include/linux/mm.h                   | 10 ++++++++++
 6 files changed, 46 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
index c995d1f..6665f33 100644
--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -94,7 +94,10 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
 static inline void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pudp,
 				  unsigned long addr)
 {
-	tlb_remove_table(tlb, virt_to_page(pudp));
+	struct page *page = virt_to_page(pudp);
+
+	pgtable_pud_page_dtor(page);
+	tlb_remove_table(tlb, page);
 }
 #endif
 
diff --git a/arch/loongarch/include/asm/pgalloc.h b/arch/loongarch/include/asm/pgalloc.h
index 4bfeb3c..8138101 100644
--- a/arch/loongarch/include/asm/pgalloc.h
+++ b/arch/loongarch/include/asm/pgalloc.h
@@ -89,10 +89,16 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
 static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long address)
 {
 	pud_t *pud;
+	struct page *page;
+
+	page = alloc_page(GFP_KERNEL);
+	if (!page)
+		return NULL;
+
+	pgtable_pud_page_ctor(page);
+	pud = (pud_t *)page_address(page);
+	pud_init((unsigned long)pud, (unsigned long)invalid_pmd_table);
 
-	pud = (pud_t *) __get_free_page(GFP_KERNEL);
-	if (pud)
-		pud_init((unsigned long)pud, (unsigned long)invalid_pmd_table);
 	return pud;
 }
 
diff --git a/arch/mips/include/asm/pgalloc.h b/arch/mips/include/asm/pgalloc.h
index 7960357..5da5880 100644
--- a/arch/mips/include/asm/pgalloc.h
+++ b/arch/mips/include/asm/pgalloc.h
@@ -89,11 +89,17 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
 
 static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long address)
 {
+	struct page *page;
 	pud_t *pud;
 
-	pud = (pud_t *) __get_free_pages(GFP_KERNEL, PUD_TABLE_ORDER);
-	if (pud)
-		pud_init((unsigned long)pud, (unsigned long)invalid_pmd_table);
+	page = alloc_pages(GFP_KERNEL, PUD_TABLE_ORDER);
+	if (!page)
+		return NULL;
+
+	pgtable_pud_page_ctor(page);
+	pud = (pud_t *)page_address(page);
+	pud_init((unsigned long)pud, (unsigned long)invalid_pmd_table);
+
 	return pud;
 }
 
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index a932d77..ea39670 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -76,8 +76,11 @@ void ___pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
 #if CONFIG_PGTABLE_LEVELS > 3
 void ___pud_free_tlb(struct mmu_gather *tlb, pud_t *pud)
 {
+	struct page *page = virt_to_page(pud);
+
+	pgtable_pud_page_dtor(page);
 	paravirt_release_pud(__pa(pud) >> PAGE_SHIFT);
-	paravirt_tlb_remove_table(tlb, virt_to_page(pud));
+	paravirt_tlb_remove_table(tlb, page);
 }
 
 #if CONFIG_PGTABLE_LEVELS > 4
diff --git a/include/asm-generic/pgalloc.h b/include/asm-generic/pgalloc.h
index 977bea1..8ce8d7c 100644
--- a/include/asm-generic/pgalloc.h
+++ b/include/asm-generic/pgalloc.h
@@ -149,11 +149,16 @@ static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd)
 
 static inline pud_t *__pud_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
+	struct page *page;
 	gfp_t gfp = GFP_PGTABLE_USER;
 
 	if (mm == &init_mm)
 		gfp = GFP_PGTABLE_KERNEL;
-	return (pud_t *)get_zeroed_page(gfp);
+	page = alloc_pages(gfp, 0);
+	if (!page)
+		return NULL;
+	pgtable_pud_page_ctor(page);
+	return (pud_t *)page_address(page);
 }
 
 #ifndef __HAVE_ARCH_PUD_ALLOC_ONE
@@ -174,8 +179,11 @@ static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
 
 static inline void __pud_free(struct mm_struct *mm, pud_t *pud)
 {
+	struct page *page = virt_to_page(pud);
+
 	BUG_ON((unsigned long)pud & (PAGE_SIZE-1));
-	free_page((unsigned long)pud);
+	pgtable_pud_page_dtor(page);
+	__free_page(page);
 }
 
 #ifndef __HAVE_ARCH_PUD_FREE
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7894bc5..54ed6f7 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2364,6 +2364,16 @@ static inline void page_clear_pgtable(struct page *page)
 	dec_lruvec_page_state(page, NR_PAGETABLE);
 }
 
+static inline void pgtable_pud_page_ctor(struct page *page)
+{
+	page_set_pgtable(page);
+}
+
+static inline void pgtable_pud_page_dtor(struct page *page)
+{
+	page_clear_pgtable(page);
+}
+
 static inline bool pgtable_pte_page_ctor(struct page *page)
 {
 	if (!ptlock_init(page))
-- 
1.8.3.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 3/3] mm: Add kernel PTE level pagetable pages account
  2022-07-06  8:59 ` Baolin Wang
  (?)
@ 2022-07-06  8:59   ` Baolin Wang
  -1 siblings, 0 replies; 30+ messages in thread
From: Baolin Wang @ 2022-07-06  8:59 UTC (permalink / raw)
  To: akpm
  Cc: rppt, willy, will, aneesh.kumar, npiggin, peterz,
	catalin.marinas, chenhuacai, kernel, tsbogend, dave.hansen, luto,
	tglx, mingo, bp, hpa, arnd, guoren, monstr, jonas,
	stefan.kristiansson, shorne, baolin.wang, x86, linux-arch,
	linux-arm-kernel, loongarch, linux-mips, linux-csky, openrisc,
	linux-mm, linux-kernel

Now the kernel PTE level ptes are always protected by mm->page_table_lock
instead of split pagetable lock, so the kernel PTE level pagetable pages
are not accounted. Especially the vmalloc()/vmap() can consume lots of
kernel pagetable, so to get an accurate pagetable accounting, calling new
helpers page_{set,clear}_pgtable() when allocating or freeing a kernel
PTE level pagetable page.

Meanwhile converting architectures to use corresponding generic PTE pagetable
allocation and freeing functions.

Note this patch only adds accounting to the page tables allocated after boot.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
---
 arch/csky/include/asm/pgalloc.h |  2 +-
 arch/microblaze/mm/pgtable.c    |  2 +-
 arch/openrisc/mm/ioremap.c      |  2 +-
 arch/x86/mm/pgtable.c           |  2 +-
 include/asm-generic/pgalloc.h   | 14 ++++++++++++--
 5 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/arch/csky/include/asm/pgalloc.h b/arch/csky/include/asm/pgalloc.h
index 7d57e5d..56f8d25 100644
--- a/arch/csky/include/asm/pgalloc.h
+++ b/arch/csky/include/asm/pgalloc.h
@@ -29,7 +29,7 @@ static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
 	pte_t *pte;
 	unsigned long i;
 
-	pte = (pte_t *) __get_free_page(GFP_KERNEL);
+	pte = __pte_alloc_one_kernel(mm);
 	if (!pte)
 		return NULL;
 
diff --git a/arch/microblaze/mm/pgtable.c b/arch/microblaze/mm/pgtable.c
index 9f73265..e96dd1b 100644
--- a/arch/microblaze/mm/pgtable.c
+++ b/arch/microblaze/mm/pgtable.c
@@ -245,7 +245,7 @@ unsigned long iopa(unsigned long addr)
 __ref pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
 {
 	if (mem_init_done)
-		return (pte_t *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+		return __pte_alloc_one_kernel(mm);
 	else
 		return memblock_alloc_try_nid(PAGE_SIZE, PAGE_SIZE,
 					      MEMBLOCK_LOW_LIMIT,
diff --git a/arch/openrisc/mm/ioremap.c b/arch/openrisc/mm/ioremap.c
index daae13a..3453acc 100644
--- a/arch/openrisc/mm/ioremap.c
+++ b/arch/openrisc/mm/ioremap.c
@@ -118,7 +118,7 @@ pte_t __ref *pte_alloc_one_kernel(struct mm_struct *mm)
 	pte_t *pte;
 
 	if (likely(mem_init_done)) {
-		pte = (pte_t *)get_zeroed_page(GFP_KERNEL);
+		pte = __pte_alloc_one_kernel(mm);
 	} else {
 		pte = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
 		if (!pte)
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index ea39670..20f3076 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -858,7 +858,7 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 	/* INVLPG to clear all paging-structure caches */
 	flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1);
 
-	free_page((unsigned long)pte);
+	pte_free_kernel(NULL, pte);
 
 	return 1;
 }
diff --git a/include/asm-generic/pgalloc.h b/include/asm-generic/pgalloc.h
index 8ce8d7c..cd8420f 100644
--- a/include/asm-generic/pgalloc.h
+++ b/include/asm-generic/pgalloc.h
@@ -18,7 +18,14 @@
  */
 static inline pte_t *__pte_alloc_one_kernel(struct mm_struct *mm)
 {
-	return (pte_t *)__get_free_page(GFP_PGTABLE_KERNEL);
+	struct page *page;
+	gfp_t gfp = GFP_PGTABLE_KERNEL;
+
+	page = alloc_pages(gfp, 0);
+	if (!page)
+		return NULL;
+	page_set_pgtable(page);
+	return (pte_t *)page_address(page);
 }
 
 #ifndef __HAVE_ARCH_PTE_ALLOC_ONE_KERNEL
@@ -41,7 +48,10 @@ static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
  */
 static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
 {
-	free_page((unsigned long)pte);
+	struct page *page = virt_to_page(pte);
+
+	page_clear_pgtable(page);
+	__free_page(page);
 }
 
 /**
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 3/3] mm: Add kernel PTE level pagetable pages account
@ 2022-07-06  8:59   ` Baolin Wang
  0 siblings, 0 replies; 30+ messages in thread
From: Baolin Wang @ 2022-07-06  8:59 UTC (permalink / raw)
  To: akpm
  Cc: x86, baolin.wang, peterz, catalin.marinas, dave.hansen,
	linux-kernel, linux-mm, guoren, linux-csky, hpa, kernel, will,
	tglx, jonas, aneesh.kumar, chenhuacai, willy, rppt, mingo,
	linux-arch, arnd, npiggin, loongarch, openrisc, bp, luto,
	linux-arm-kernel, monstr, tsbogend, linux-mips

Now the kernel PTE level ptes are always protected by mm->page_table_lock
instead of split pagetable lock, so the kernel PTE level pagetable pages
are not accounted. Especially the vmalloc()/vmap() can consume lots of
kernel pagetable, so to get an accurate pagetable accounting, calling new
helpers page_{set,clear}_pgtable() when allocating or freeing a kernel
PTE level pagetable page.

Meanwhile converting architectures to use corresponding generic PTE pagetable
allocation and freeing functions.

Note this patch only adds accounting to the page tables allocated after boot.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
---
 arch/csky/include/asm/pgalloc.h |  2 +-
 arch/microblaze/mm/pgtable.c    |  2 +-
 arch/openrisc/mm/ioremap.c      |  2 +-
 arch/x86/mm/pgtable.c           |  2 +-
 include/asm-generic/pgalloc.h   | 14 ++++++++++++--
 5 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/arch/csky/include/asm/pgalloc.h b/arch/csky/include/asm/pgalloc.h
index 7d57e5d..56f8d25 100644
--- a/arch/csky/include/asm/pgalloc.h
+++ b/arch/csky/include/asm/pgalloc.h
@@ -29,7 +29,7 @@ static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
 	pte_t *pte;
 	unsigned long i;
 
-	pte = (pte_t *) __get_free_page(GFP_KERNEL);
+	pte = __pte_alloc_one_kernel(mm);
 	if (!pte)
 		return NULL;
 
diff --git a/arch/microblaze/mm/pgtable.c b/arch/microblaze/mm/pgtable.c
index 9f73265..e96dd1b 100644
--- a/arch/microblaze/mm/pgtable.c
+++ b/arch/microblaze/mm/pgtable.c
@@ -245,7 +245,7 @@ unsigned long iopa(unsigned long addr)
 __ref pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
 {
 	if (mem_init_done)
-		return (pte_t *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+		return __pte_alloc_one_kernel(mm);
 	else
 		return memblock_alloc_try_nid(PAGE_SIZE, PAGE_SIZE,
 					      MEMBLOCK_LOW_LIMIT,
diff --git a/arch/openrisc/mm/ioremap.c b/arch/openrisc/mm/ioremap.c
index daae13a..3453acc 100644
--- a/arch/openrisc/mm/ioremap.c
+++ b/arch/openrisc/mm/ioremap.c
@@ -118,7 +118,7 @@ pte_t __ref *pte_alloc_one_kernel(struct mm_struct *mm)
 	pte_t *pte;
 
 	if (likely(mem_init_done)) {
-		pte = (pte_t *)get_zeroed_page(GFP_KERNEL);
+		pte = __pte_alloc_one_kernel(mm);
 	} else {
 		pte = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
 		if (!pte)
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index ea39670..20f3076 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -858,7 +858,7 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 	/* INVLPG to clear all paging-structure caches */
 	flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1);
 
-	free_page((unsigned long)pte);
+	pte_free_kernel(NULL, pte);
 
 	return 1;
 }
diff --git a/include/asm-generic/pgalloc.h b/include/asm-generic/pgalloc.h
index 8ce8d7c..cd8420f 100644
--- a/include/asm-generic/pgalloc.h
+++ b/include/asm-generic/pgalloc.h
@@ -18,7 +18,14 @@
  */
 static inline pte_t *__pte_alloc_one_kernel(struct mm_struct *mm)
 {
-	return (pte_t *)__get_free_page(GFP_PGTABLE_KERNEL);
+	struct page *page;
+	gfp_t gfp = GFP_PGTABLE_KERNEL;
+
+	page = alloc_pages(gfp, 0);
+	if (!page)
+		return NULL;
+	page_set_pgtable(page);
+	return (pte_t *)page_address(page);
 }
 
 #ifndef __HAVE_ARCH_PTE_ALLOC_ONE_KERNEL
@@ -41,7 +48,10 @@ static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
  */
 static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
 {
-	free_page((unsigned long)pte);
+	struct page *page = virt_to_page(pte);
+
+	page_clear_pgtable(page);
+	__free_page(page);
 }
 
 /**
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 3/3] mm: Add kernel PTE level pagetable pages account
@ 2022-07-06  8:59   ` Baolin Wang
  0 siblings, 0 replies; 30+ messages in thread
From: Baolin Wang @ 2022-07-06  8:59 UTC (permalink / raw)
  To: akpm
  Cc: rppt, willy, will, aneesh.kumar, npiggin, peterz,
	catalin.marinas, chenhuacai, kernel, tsbogend, dave.hansen, luto,
	tglx, mingo, bp, hpa, arnd, guoren, monstr, jonas,
	stefan.kristiansson, shorne, baolin.wang, x86, linux-arch,
	linux-arm-kernel, loongarch, linux-mips, linux-csky, openrisc,
	linux-mm, linux-kernel

Now the kernel PTE level ptes are always protected by mm->page_table_lock
instead of split pagetable lock, so the kernel PTE level pagetable pages
are not accounted. Especially the vmalloc()/vmap() can consume lots of
kernel pagetable, so to get an accurate pagetable accounting, calling new
helpers page_{set,clear}_pgtable() when allocating or freeing a kernel
PTE level pagetable page.

Meanwhile converting architectures to use corresponding generic PTE pagetable
allocation and freeing functions.

Note this patch only adds accounting to the page tables allocated after boot.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
---
 arch/csky/include/asm/pgalloc.h |  2 +-
 arch/microblaze/mm/pgtable.c    |  2 +-
 arch/openrisc/mm/ioremap.c      |  2 +-
 arch/x86/mm/pgtable.c           |  2 +-
 include/asm-generic/pgalloc.h   | 14 ++++++++++++--
 5 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/arch/csky/include/asm/pgalloc.h b/arch/csky/include/asm/pgalloc.h
index 7d57e5d..56f8d25 100644
--- a/arch/csky/include/asm/pgalloc.h
+++ b/arch/csky/include/asm/pgalloc.h
@@ -29,7 +29,7 @@ static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
 	pte_t *pte;
 	unsigned long i;
 
-	pte = (pte_t *) __get_free_page(GFP_KERNEL);
+	pte = __pte_alloc_one_kernel(mm);
 	if (!pte)
 		return NULL;
 
diff --git a/arch/microblaze/mm/pgtable.c b/arch/microblaze/mm/pgtable.c
index 9f73265..e96dd1b 100644
--- a/arch/microblaze/mm/pgtable.c
+++ b/arch/microblaze/mm/pgtable.c
@@ -245,7 +245,7 @@ unsigned long iopa(unsigned long addr)
 __ref pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
 {
 	if (mem_init_done)
-		return (pte_t *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+		return __pte_alloc_one_kernel(mm);
 	else
 		return memblock_alloc_try_nid(PAGE_SIZE, PAGE_SIZE,
 					      MEMBLOCK_LOW_LIMIT,
diff --git a/arch/openrisc/mm/ioremap.c b/arch/openrisc/mm/ioremap.c
index daae13a..3453acc 100644
--- a/arch/openrisc/mm/ioremap.c
+++ b/arch/openrisc/mm/ioremap.c
@@ -118,7 +118,7 @@ pte_t __ref *pte_alloc_one_kernel(struct mm_struct *mm)
 	pte_t *pte;
 
 	if (likely(mem_init_done)) {
-		pte = (pte_t *)get_zeroed_page(GFP_KERNEL);
+		pte = __pte_alloc_one_kernel(mm);
 	} else {
 		pte = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
 		if (!pte)
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index ea39670..20f3076 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -858,7 +858,7 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 	/* INVLPG to clear all paging-structure caches */
 	flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1);
 
-	free_page((unsigned long)pte);
+	pte_free_kernel(NULL, pte);
 
 	return 1;
 }
diff --git a/include/asm-generic/pgalloc.h b/include/asm-generic/pgalloc.h
index 8ce8d7c..cd8420f 100644
--- a/include/asm-generic/pgalloc.h
+++ b/include/asm-generic/pgalloc.h
@@ -18,7 +18,14 @@
  */
 static inline pte_t *__pte_alloc_one_kernel(struct mm_struct *mm)
 {
-	return (pte_t *)__get_free_page(GFP_PGTABLE_KERNEL);
+	struct page *page;
+	gfp_t gfp = GFP_PGTABLE_KERNEL;
+
+	page = alloc_pages(gfp, 0);
+	if (!page)
+		return NULL;
+	page_set_pgtable(page);
+	return (pte_t *)page_address(page);
 }
 
 #ifndef __HAVE_ARCH_PTE_ALLOC_ONE_KERNEL
@@ -41,7 +48,10 @@ static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
  */
 static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
 {
-	free_page((unsigned long)pte);
+	struct page *page = virt_to_page(pte);
+
+	page_clear_pgtable(page);
+	__free_page(page);
 }
 
 /**
-- 
1.8.3.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/3] mm: Add kernel PTE level pagetable pages account
  2022-07-06  8:59   ` Baolin Wang
  (?)
@ 2022-07-06 15:45     ` Matthew Wilcox
  -1 siblings, 0 replies; 30+ messages in thread
From: Matthew Wilcox @ 2022-07-06 15:45 UTC (permalink / raw)
  To: Baolin Wang
  Cc: akpm, rppt, will, aneesh.kumar, npiggin, peterz, catalin.marinas,
	chenhuacai, kernel, tsbogend, dave.hansen, luto, tglx, mingo, bp,
	hpa, arnd, guoren, monstr, jonas, stefan.kristiansson, shorne,
	x86, linux-arch, linux-arm-kernel, loongarch, linux-mips,
	linux-csky, openrisc, linux-mm, linux-kernel

On Wed, Jul 06, 2022 at 04:59:17PM +0800, Baolin Wang wrote:
> Now the kernel PTE level ptes are always protected by mm->page_table_lock
> instead of split pagetable lock, so the kernel PTE level pagetable pages
> are not accounted. Especially the vmalloc()/vmap() can consume lots of
> kernel pagetable, so to get an accurate pagetable accounting, calling new
> helpers page_{set,clear}_pgtable() when allocating or freeing a kernel
> PTE level pagetable page.
> 
> Meanwhile converting architectures to use corresponding generic PTE pagetable
> allocation and freeing functions.
> 
> Note this patch only adds accounting to the page tables allocated after boot.
> 
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> Reported-by: kernel test robot <oliver.sang@intel.com>

What does this Reported-by: even mean?  the kernel test robot told you
that the page tables weren't being accounted?

I don't understand why we want to start accounting kernel page tables.
an we have a *discussion* about that with a sensible thread name instead
of just trying to sneak it in as patch 3/3?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/3] mm: Add kernel PTE level pagetable pages account
@ 2022-07-06 15:45     ` Matthew Wilcox
  0 siblings, 0 replies; 30+ messages in thread
From: Matthew Wilcox @ 2022-07-06 15:45 UTC (permalink / raw)
  To: Baolin Wang
  Cc: x86, loongarch, peterz, catalin.marinas, dave.hansen,
	linux-kernel, linux-mm, guoren, hpa, kernel, will, jonas,
	aneesh.kumar, chenhuacai, linux-csky, rppt, mingo, linux-arch,
	arnd, npiggin, openrisc, bp, luto, tglx, linux-arm-kernel,
	monstr, tsbogend, linux-mips, akpm

On Wed, Jul 06, 2022 at 04:59:17PM +0800, Baolin Wang wrote:
> Now the kernel PTE level ptes are always protected by mm->page_table_lock
> instead of split pagetable lock, so the kernel PTE level pagetable pages
> are not accounted. Especially the vmalloc()/vmap() can consume lots of
> kernel pagetable, so to get an accurate pagetable accounting, calling new
> helpers page_{set,clear}_pgtable() when allocating or freeing a kernel
> PTE level pagetable page.
> 
> Meanwhile converting architectures to use corresponding generic PTE pagetable
> allocation and freeing functions.
> 
> Note this patch only adds accounting to the page tables allocated after boot.
> 
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> Reported-by: kernel test robot <oliver.sang@intel.com>

What does this Reported-by: even mean?  the kernel test robot told you
that the page tables weren't being accounted?

I don't understand why we want to start accounting kernel page tables.
an we have a *discussion* about that with a sensible thread name instead
of just trying to sneak it in as patch 3/3?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/3] mm: Add kernel PTE level pagetable pages account
@ 2022-07-06 15:45     ` Matthew Wilcox
  0 siblings, 0 replies; 30+ messages in thread
From: Matthew Wilcox @ 2022-07-06 15:45 UTC (permalink / raw)
  To: Baolin Wang
  Cc: akpm, rppt, will, aneesh.kumar, npiggin, peterz, catalin.marinas,
	chenhuacai, kernel, tsbogend, dave.hansen, luto, tglx, mingo, bp,
	hpa, arnd, guoren, monstr, jonas, stefan.kristiansson, shorne,
	x86, linux-arch, linux-arm-kernel, loongarch, linux-mips,
	linux-csky, openrisc, linux-mm, linux-kernel

On Wed, Jul 06, 2022 at 04:59:17PM +0800, Baolin Wang wrote:
> Now the kernel PTE level ptes are always protected by mm->page_table_lock
> instead of split pagetable lock, so the kernel PTE level pagetable pages
> are not accounted. Especially the vmalloc()/vmap() can consume lots of
> kernel pagetable, so to get an accurate pagetable accounting, calling new
> helpers page_{set,clear}_pgtable() when allocating or freeing a kernel
> PTE level pagetable page.
> 
> Meanwhile converting architectures to use corresponding generic PTE pagetable
> allocation and freeing functions.
> 
> Note this patch only adds accounting to the page tables allocated after boot.
> 
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> Reported-by: kernel test robot <oliver.sang@intel.com>

What does this Reported-by: even mean?  the kernel test robot told you
that the page tables weren't being accounted?

I don't understand why we want to start accounting kernel page tables.
an we have a *discussion* about that with a sensible thread name instead
of just trying to sneak it in as patch 3/3?

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/3] Add PUD and kernel PTE level pagetable account
  2022-07-06  8:59 ` Baolin Wang
  (?)
@ 2022-07-06 15:48   ` Dave Hansen
  -1 siblings, 0 replies; 30+ messages in thread
From: Dave Hansen @ 2022-07-06 15:48 UTC (permalink / raw)
  To: Baolin Wang, akpm
  Cc: rppt, willy, will, aneesh.kumar, npiggin, peterz,
	catalin.marinas, chenhuacai, kernel, tsbogend, dave.hansen, luto,
	tglx, mingo, bp, hpa, arnd, guoren, monstr, jonas,
	stefan.kristiansson, shorne, x86, linux-arch, linux-arm-kernel,
	loongarch, linux-mips, linux-csky, openrisc, linux-mm,
	linux-kernel

On 7/6/22 01:59, Baolin Wang wrote:
> Now we will miss to account the PUD level pagetable and kernel PTE level
> pagetable, as well as missing to set the PG_table flags for these pagetable
> pages, which will get an inaccurate pagetable accounting, and miss
> PageTable() validation in some cases. So this patch set introduces new
> helpers to help to account PUD and kernel PTE pagetable pages.

Could you explain the motivation for this series a bit more?  Is there a
real-world problem that this fixes?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/3] Add PUD and kernel PTE level pagetable account
@ 2022-07-06 15:48   ` Dave Hansen
  0 siblings, 0 replies; 30+ messages in thread
From: Dave Hansen @ 2022-07-06 15:48 UTC (permalink / raw)
  To: Baolin Wang, akpm
  Cc: x86, loongarch, peterz, catalin.marinas, dave.hansen,
	linux-kernel, linux-mm, guoren, linux-csky, hpa, kernel, will,
	tglx, jonas, aneesh.kumar, chenhuacai, willy, rppt, mingo,
	linux-arch, arnd, npiggin, openrisc, bp, luto, linux-arm-kernel,
	monstr, tsbogend, linux-mips

On 7/6/22 01:59, Baolin Wang wrote:
> Now we will miss to account the PUD level pagetable and kernel PTE level
> pagetable, as well as missing to set the PG_table flags for these pagetable
> pages, which will get an inaccurate pagetable accounting, and miss
> PageTable() validation in some cases. So this patch set introduces new
> helpers to help to account PUD and kernel PTE pagetable pages.

Could you explain the motivation for this series a bit more?  Is there a
real-world problem that this fixes?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/3] Add PUD and kernel PTE level pagetable account
@ 2022-07-06 15:48   ` Dave Hansen
  0 siblings, 0 replies; 30+ messages in thread
From: Dave Hansen @ 2022-07-06 15:48 UTC (permalink / raw)
  To: Baolin Wang, akpm
  Cc: rppt, willy, will, aneesh.kumar, npiggin, peterz,
	catalin.marinas, chenhuacai, kernel, tsbogend, dave.hansen, luto,
	tglx, mingo, bp, hpa, arnd, guoren, monstr, jonas,
	stefan.kristiansson, shorne, x86, linux-arch, linux-arm-kernel,
	loongarch, linux-mips, linux-csky, openrisc, linux-mm,
	linux-kernel

On 7/6/22 01:59, Baolin Wang wrote:
> Now we will miss to account the PUD level pagetable and kernel PTE level
> pagetable, as well as missing to set the PG_table flags for these pagetable
> pages, which will get an inaccurate pagetable accounting, and miss
> PageTable() validation in some cases. So this patch set introduces new
> helpers to help to account PUD and kernel PTE pagetable pages.

Could you explain the motivation for this series a bit more?  Is there a
real-world problem that this fixes?

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/3] Add PUD and kernel PTE level pagetable account
  2022-07-06 15:48   ` Dave Hansen
  (?)
@ 2022-07-07 11:32     ` Baolin Wang
  -1 siblings, 0 replies; 30+ messages in thread
From: Baolin Wang @ 2022-07-07 11:32 UTC (permalink / raw)
  To: Dave Hansen, akpm
  Cc: rppt, willy, will, aneesh.kumar, npiggin, peterz,
	catalin.marinas, chenhuacai, kernel, tsbogend, dave.hansen, luto,
	tglx, mingo, bp, hpa, arnd, guoren, monstr, jonas,
	stefan.kristiansson, shorne, x86, linux-arch, linux-arm-kernel,
	loongarch, linux-mips, linux-csky, openrisc, linux-mm,
	linux-kernel



On 7/6/2022 11:48 PM, Dave Hansen wrote:
> On 7/6/22 01:59, Baolin Wang wrote:
>> Now we will miss to account the PUD level pagetable and kernel PTE level
>> pagetable, as well as missing to set the PG_table flags for these pagetable
>> pages, which will get an inaccurate pagetable accounting, and miss
>> PageTable() validation in some cases. So this patch set introduces new
>> helpers to help to account PUD and kernel PTE pagetable pages.
> 
> Could you explain the motivation for this series a bit more?  Is there a
> real-world problem that this fixes?

Not fix real problem. The motivation is that making the pagetable 
accounting more accurate, which helps us to analyse the consumption of 
the pagetable pages in some cases, and maybe help to do some empty 
pagetable reclaiming in future.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/3] Add PUD and kernel PTE level pagetable account
@ 2022-07-07 11:32     ` Baolin Wang
  0 siblings, 0 replies; 30+ messages in thread
From: Baolin Wang @ 2022-07-07 11:32 UTC (permalink / raw)
  To: Dave Hansen, akpm
  Cc: x86, loongarch, peterz, catalin.marinas, dave.hansen,
	linux-kernel, linux-mm, guoren, linux-csky, hpa, kernel, will,
	tglx, jonas, aneesh.kumar, chenhuacai, willy, rppt, mingo,
	linux-arch, arnd, npiggin, openrisc, bp, luto, linux-arm-kernel,
	monstr, tsbogend, linux-mips



On 7/6/2022 11:48 PM, Dave Hansen wrote:
> On 7/6/22 01:59, Baolin Wang wrote:
>> Now we will miss to account the PUD level pagetable and kernel PTE level
>> pagetable, as well as missing to set the PG_table flags for these pagetable
>> pages, which will get an inaccurate pagetable accounting, and miss
>> PageTable() validation in some cases. So this patch set introduces new
>> helpers to help to account PUD and kernel PTE pagetable pages.
> 
> Could you explain the motivation for this series a bit more?  Is there a
> real-world problem that this fixes?

Not fix real problem. The motivation is that making the pagetable 
accounting more accurate, which helps us to analyse the consumption of 
the pagetable pages in some cases, and maybe help to do some empty 
pagetable reclaiming in future.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/3] Add PUD and kernel PTE level pagetable account
@ 2022-07-07 11:32     ` Baolin Wang
  0 siblings, 0 replies; 30+ messages in thread
From: Baolin Wang @ 2022-07-07 11:32 UTC (permalink / raw)
  To: Dave Hansen, akpm
  Cc: rppt, willy, will, aneesh.kumar, npiggin, peterz,
	catalin.marinas, chenhuacai, kernel, tsbogend, dave.hansen, luto,
	tglx, mingo, bp, hpa, arnd, guoren, monstr, jonas,
	stefan.kristiansson, shorne, x86, linux-arch, linux-arm-kernel,
	loongarch, linux-mips, linux-csky, openrisc, linux-mm,
	linux-kernel



On 7/6/2022 11:48 PM, Dave Hansen wrote:
> On 7/6/22 01:59, Baolin Wang wrote:
>> Now we will miss to account the PUD level pagetable and kernel PTE level
>> pagetable, as well as missing to set the PG_table flags for these pagetable
>> pages, which will get an inaccurate pagetable accounting, and miss
>> PageTable() validation in some cases. So this patch set introduces new
>> helpers to help to account PUD and kernel PTE pagetable pages.
> 
> Could you explain the motivation for this series a bit more?  Is there a
> real-world problem that this fixes?

Not fix real problem. The motivation is that making the pagetable 
accounting more accurate, which helps us to analyse the consumption of 
the pagetable pages in some cases, and maybe help to do some empty 
pagetable reclaiming in future.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/3] mm: Add kernel PTE level pagetable pages account
  2022-07-06 15:45     ` Matthew Wilcox
  (?)
@ 2022-07-07 11:45       ` Baolin Wang
  -1 siblings, 0 replies; 30+ messages in thread
From: Baolin Wang @ 2022-07-07 11:45 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: akpm, rppt, will, aneesh.kumar, npiggin, peterz, catalin.marinas,
	chenhuacai, kernel, tsbogend, dave.hansen, luto, tglx, mingo, bp,
	hpa, arnd, guoren, monstr, jonas, stefan.kristiansson, shorne,
	x86, linux-arch, linux-arm-kernel, loongarch, linux-mips,
	linux-csky, openrisc, linux-mm, linux-kernel



On 7/6/2022 11:45 PM, Matthew Wilcox wrote:
> On Wed, Jul 06, 2022 at 04:59:17PM +0800, Baolin Wang wrote:
>> Now the kernel PTE level ptes are always protected by mm->page_table_lock
>> instead of split pagetable lock, so the kernel PTE level pagetable pages
>> are not accounted. Especially the vmalloc()/vmap() can consume lots of
>> kernel pagetable, so to get an accurate pagetable accounting, calling new
>> helpers page_{set,clear}_pgtable() when allocating or freeing a kernel
>> PTE level pagetable page.
>>
>> Meanwhile converting architectures to use corresponding generic PTE pagetable
>> allocation and freeing functions.
>>
>> Note this patch only adds accounting to the page tables allocated after boot.
>>
>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> Reported-by: kernel test robot <oliver.sang@intel.com>
> 
> What does this Reported-by: even mean?  the kernel test robot told you
> that the page tables weren't being accounted?

I fixed an issue reported by this robot. OK, I can remove the tag.

> I don't understand why we want to start accounting kernel page tables.
> an we have a *discussion* about that with a sensible thread name instead
> of just trying to sneak it in as patch 3/3?

I think I have replied to you in below link [1]. The reason is we should 
keep consistent with PMD or PUD pagetable allocation.

[1] 
https://lore.kernel.org/all/68a5286b-7ff3-2c4e-1ab2-305e7860a2f3@linux.alibaba.com/

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/3] mm: Add kernel PTE level pagetable pages account
@ 2022-07-07 11:45       ` Baolin Wang
  0 siblings, 0 replies; 30+ messages in thread
From: Baolin Wang @ 2022-07-07 11:45 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: x86, loongarch, peterz, catalin.marinas, dave.hansen,
	linux-kernel, linux-mm, guoren, hpa, kernel, will, jonas,
	aneesh.kumar, chenhuacai, linux-csky, rppt, mingo, linux-arch,
	arnd, npiggin, openrisc, bp, luto, tglx, linux-arm-kernel,
	monstr, tsbogend, linux-mips, akpm



On 7/6/2022 11:45 PM, Matthew Wilcox wrote:
> On Wed, Jul 06, 2022 at 04:59:17PM +0800, Baolin Wang wrote:
>> Now the kernel PTE level ptes are always protected by mm->page_table_lock
>> instead of split pagetable lock, so the kernel PTE level pagetable pages
>> are not accounted. Especially the vmalloc()/vmap() can consume lots of
>> kernel pagetable, so to get an accurate pagetable accounting, calling new
>> helpers page_{set,clear}_pgtable() when allocating or freeing a kernel
>> PTE level pagetable page.
>>
>> Meanwhile converting architectures to use corresponding generic PTE pagetable
>> allocation and freeing functions.
>>
>> Note this patch only adds accounting to the page tables allocated after boot.
>>
>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> Reported-by: kernel test robot <oliver.sang@intel.com>
> 
> What does this Reported-by: even mean?  the kernel test robot told you
> that the page tables weren't being accounted?

I fixed an issue reported by this robot. OK, I can remove the tag.

> I don't understand why we want to start accounting kernel page tables.
> an we have a *discussion* about that with a sensible thread name instead
> of just trying to sneak it in as patch 3/3?

I think I have replied to you in below link [1]. The reason is we should 
keep consistent with PMD or PUD pagetable allocation.

[1] 
https://lore.kernel.org/all/68a5286b-7ff3-2c4e-1ab2-305e7860a2f3@linux.alibaba.com/

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/3] mm: Add kernel PTE level pagetable pages account
@ 2022-07-07 11:45       ` Baolin Wang
  0 siblings, 0 replies; 30+ messages in thread
From: Baolin Wang @ 2022-07-07 11:45 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: akpm, rppt, will, aneesh.kumar, npiggin, peterz, catalin.marinas,
	chenhuacai, kernel, tsbogend, dave.hansen, luto, tglx, mingo, bp,
	hpa, arnd, guoren, monstr, jonas, stefan.kristiansson, shorne,
	x86, linux-arch, linux-arm-kernel, loongarch, linux-mips,
	linux-csky, openrisc, linux-mm, linux-kernel



On 7/6/2022 11:45 PM, Matthew Wilcox wrote:
> On Wed, Jul 06, 2022 at 04:59:17PM +0800, Baolin Wang wrote:
>> Now the kernel PTE level ptes are always protected by mm->page_table_lock
>> instead of split pagetable lock, so the kernel PTE level pagetable pages
>> are not accounted. Especially the vmalloc()/vmap() can consume lots of
>> kernel pagetable, so to get an accurate pagetable accounting, calling new
>> helpers page_{set,clear}_pgtable() when allocating or freeing a kernel
>> PTE level pagetable page.
>>
>> Meanwhile converting architectures to use corresponding generic PTE pagetable
>> allocation and freeing functions.
>>
>> Note this patch only adds accounting to the page tables allocated after boot.
>>
>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> Reported-by: kernel test robot <oliver.sang@intel.com>
> 
> What does this Reported-by: even mean?  the kernel test robot told you
> that the page tables weren't being accounted?

I fixed an issue reported by this robot. OK, I can remove the tag.

> I don't understand why we want to start accounting kernel page tables.
> an we have a *discussion* about that with a sensible thread name instead
> of just trying to sneak it in as patch 3/3?

I think I have replied to you in below link [1]. The reason is we should 
keep consistent with PMD or PUD pagetable allocation.

[1] 
https://lore.kernel.org/all/68a5286b-7ff3-2c4e-1ab2-305e7860a2f3@linux.alibaba.com/

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/3] Add PUD and kernel PTE level pagetable account
  2022-07-07 11:32     ` Baolin Wang
  (?)
@ 2022-07-07 14:44       ` Dave Hansen
  -1 siblings, 0 replies; 30+ messages in thread
From: Dave Hansen @ 2022-07-07 14:44 UTC (permalink / raw)
  To: Baolin Wang, akpm
  Cc: rppt, willy, will, aneesh.kumar, npiggin, peterz,
	catalin.marinas, chenhuacai, kernel, tsbogend, dave.hansen, luto,
	tglx, mingo, bp, hpa, arnd, guoren, monstr, jonas,
	stefan.kristiansson, shorne, x86, linux-arch, linux-arm-kernel,
	loongarch, linux-mips, linux-csky, openrisc, linux-mm,
	linux-kernel

On 7/7/22 04:32, Baolin Wang wrote:
> On 7/6/2022 11:48 PM, Dave Hansen wrote:
>> On 7/6/22 01:59, Baolin Wang wrote:
>>> Now we will miss to account the PUD level pagetable and kernel PTE level
>>> pagetable, as well as missing to set the PG_table flags for these
>>> pagetable
>>> pages, which will get an inaccurate pagetable accounting, and miss
>>> PageTable() validation in some cases. So this patch set introduces new
>>> helpers to help to account PUD and kernel PTE pagetable pages.
>>
>> Could you explain the motivation for this series a bit more?  Is there a
>> real-world problem that this fixes?
> 
> Not fix real problem. The motivation is that making the pagetable
> accounting more accurate, which helps us to analyse the consumption of
> the pagetable pages in some cases, and maybe help to do some empty
> pagetable reclaiming in future.

This accounting isn't free.  It costs storage (and also parts of
cachelines) in each mm and CPU time to maintain it, plus maintainer
eyeballs to maintain.  PUD pages are also fundamentally (on x86 at
least) 0.0004% of the overhead of PTE and 0.2% of the overhead of PMD
pages unless someone is using gigantic hugetlbfs mappings.

Even with 1G gigantic pages, you would need a quarter of a million
(well, 262144 or 512*512) mappings of one 1G page to consume 1G of
memory on PUD pages.

That just doesn't seem like something anyone is likely to actually do in
practice.  That makes the benefits of the PUD portion of this series
rather unclear in the real world.

As for the kernel page tables, I'm not really aware of them causing any
problems.  We have a pretty good idea how much space they consume from
the DirectMap* entries in meminfo:

	DirectMap4k:     2262720 kB
	DirectMap2M:    40507392 kB
	DirectMap1G:    24117248 kB

as well as our page table debugging infrastructure.  I haven't found
myself dying for more specific info on them.

So, nothing in this series seems like a *BAD* idea, but I'm not sure in
the end it solves more problems than it creates.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/3] Add PUD and kernel PTE level pagetable account
@ 2022-07-07 14:44       ` Dave Hansen
  0 siblings, 0 replies; 30+ messages in thread
From: Dave Hansen @ 2022-07-07 14:44 UTC (permalink / raw)
  To: Baolin Wang, akpm
  Cc: x86, loongarch, peterz, catalin.marinas, dave.hansen,
	linux-kernel, linux-mm, guoren, linux-csky, hpa, kernel, will,
	tglx, jonas, aneesh.kumar, chenhuacai, willy, rppt, mingo,
	linux-arch, arnd, npiggin, openrisc, bp, luto, linux-arm-kernel,
	monstr, tsbogend, linux-mips

On 7/7/22 04:32, Baolin Wang wrote:
> On 7/6/2022 11:48 PM, Dave Hansen wrote:
>> On 7/6/22 01:59, Baolin Wang wrote:
>>> Now we will miss to account the PUD level pagetable and kernel PTE level
>>> pagetable, as well as missing to set the PG_table flags for these
>>> pagetable
>>> pages, which will get an inaccurate pagetable accounting, and miss
>>> PageTable() validation in some cases. So this patch set introduces new
>>> helpers to help to account PUD and kernel PTE pagetable pages.
>>
>> Could you explain the motivation for this series a bit more?  Is there a
>> real-world problem that this fixes?
> 
> Not fix real problem. The motivation is that making the pagetable
> accounting more accurate, which helps us to analyse the consumption of
> the pagetable pages in some cases, and maybe help to do some empty
> pagetable reclaiming in future.

This accounting isn't free.  It costs storage (and also parts of
cachelines) in each mm and CPU time to maintain it, plus maintainer
eyeballs to maintain.  PUD pages are also fundamentally (on x86 at
least) 0.0004% of the overhead of PTE and 0.2% of the overhead of PMD
pages unless someone is using gigantic hugetlbfs mappings.

Even with 1G gigantic pages, you would need a quarter of a million
(well, 262144 or 512*512) mappings of one 1G page to consume 1G of
memory on PUD pages.

That just doesn't seem like something anyone is likely to actually do in
practice.  That makes the benefits of the PUD portion of this series
rather unclear in the real world.

As for the kernel page tables, I'm not really aware of them causing any
problems.  We have a pretty good idea how much space they consume from
the DirectMap* entries in meminfo:

	DirectMap4k:     2262720 kB
	DirectMap2M:    40507392 kB
	DirectMap1G:    24117248 kB

as well as our page table debugging infrastructure.  I haven't found
myself dying for more specific info on them.

So, nothing in this series seems like a *BAD* idea, but I'm not sure in
the end it solves more problems than it creates.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/3] Add PUD and kernel PTE level pagetable account
@ 2022-07-07 14:44       ` Dave Hansen
  0 siblings, 0 replies; 30+ messages in thread
From: Dave Hansen @ 2022-07-07 14:44 UTC (permalink / raw)
  To: Baolin Wang, akpm
  Cc: rppt, willy, will, aneesh.kumar, npiggin, peterz,
	catalin.marinas, chenhuacai, kernel, tsbogend, dave.hansen, luto,
	tglx, mingo, bp, hpa, arnd, guoren, monstr, jonas,
	stefan.kristiansson, shorne, x86, linux-arch, linux-arm-kernel,
	loongarch, linux-mips, linux-csky, openrisc, linux-mm,
	linux-kernel

On 7/7/22 04:32, Baolin Wang wrote:
> On 7/6/2022 11:48 PM, Dave Hansen wrote:
>> On 7/6/22 01:59, Baolin Wang wrote:
>>> Now we will miss to account the PUD level pagetable and kernel PTE level
>>> pagetable, as well as missing to set the PG_table flags for these
>>> pagetable
>>> pages, which will get an inaccurate pagetable accounting, and miss
>>> PageTable() validation in some cases. So this patch set introduces new
>>> helpers to help to account PUD and kernel PTE pagetable pages.
>>
>> Could you explain the motivation for this series a bit more?  Is there a
>> real-world problem that this fixes?
> 
> Not fix real problem. The motivation is that making the pagetable
> accounting more accurate, which helps us to analyse the consumption of
> the pagetable pages in some cases, and maybe help to do some empty
> pagetable reclaiming in future.

This accounting isn't free.  It costs storage (and also parts of
cachelines) in each mm and CPU time to maintain it, plus maintainer
eyeballs to maintain.  PUD pages are also fundamentally (on x86 at
least) 0.0004% of the overhead of PTE and 0.2% of the overhead of PMD
pages unless someone is using gigantic hugetlbfs mappings.

Even with 1G gigantic pages, you would need a quarter of a million
(well, 262144 or 512*512) mappings of one 1G page to consume 1G of
memory on PUD pages.

That just doesn't seem like something anyone is likely to actually do in
practice.  That makes the benefits of the PUD portion of this series
rather unclear in the real world.

As for the kernel page tables, I'm not really aware of them causing any
problems.  We have a pretty good idea how much space they consume from
the DirectMap* entries in meminfo:

	DirectMap4k:     2262720 kB
	DirectMap2M:    40507392 kB
	DirectMap1G:    24117248 kB

as well as our page table debugging infrastructure.  I haven't found
myself dying for more specific info on them.

So, nothing in this series seems like a *BAD* idea, but I'm not sure in
the end it solves more problems than it creates.


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/3] Add PUD and kernel PTE level pagetable account
  2022-07-07 14:44       ` Dave Hansen
  (?)
@ 2022-07-10 11:19         ` Baolin Wang
  -1 siblings, 0 replies; 30+ messages in thread
From: Baolin Wang @ 2022-07-10 11:19 UTC (permalink / raw)
  To: Dave Hansen, akpm
  Cc: rppt, willy, will, aneesh.kumar, npiggin, peterz,
	catalin.marinas, chenhuacai, kernel, tsbogend, dave.hansen, luto,
	tglx, mingo, bp, hpa, arnd, guoren, monstr, jonas,
	stefan.kristiansson, shorne, x86, linux-arch, linux-arm-kernel,
	loongarch, linux-mips, linux-csky, openrisc, linux-mm,
	linux-kernel



On 7/7/2022 10:44 PM, Dave Hansen wrote:
> On 7/7/22 04:32, Baolin Wang wrote:
>> On 7/6/2022 11:48 PM, Dave Hansen wrote:
>>> On 7/6/22 01:59, Baolin Wang wrote:
>>>> Now we will miss to account the PUD level pagetable and kernel PTE level
>>>> pagetable, as well as missing to set the PG_table flags for these
>>>> pagetable
>>>> pages, which will get an inaccurate pagetable accounting, and miss
>>>> PageTable() validation in some cases. So this patch set introduces new
>>>> helpers to help to account PUD and kernel PTE pagetable pages.
>>>
>>> Could you explain the motivation for this series a bit more?  Is there a
>>> real-world problem that this fixes?
>>
>> Not fix real problem. The motivation is that making the pagetable
>> accounting more accurate, which helps us to analyse the consumption of
>> the pagetable pages in some cases, and maybe help to do some empty
>> pagetable reclaiming in future.
> 
> This accounting isn't free.  It costs storage (and also parts of
> cachelines) in each mm and CPU time to maintain it, plus maintainer
> eyeballs to maintain.  PUD pages are also fundamentally (on x86 at
> least) 0.0004% of the overhead of PTE and 0.2% of the overhead of PMD
> pages unless someone is using gigantic hugetlbfs mappings.

Yes, agree. However I think the performence influence of this patch is 
small from some testing I did (like mysql, no obvious performance 
influence). Moreover the pagetable accounting gap is about 1% from below 
testing data.

Without this patchset, the pagetable consumption is about 110M with 
mysql testing.
              flags      page-count       MB  symbolic-flags 
          long-symbolic-flags
0x0000000004000000           28232      110 
__________________________g__________________      pgtable

With this patchset, and the consumption is about 111M.
              flags      page-count       MB  symbolic-flags 
          long-symbolic-flags
0x0000000004000000           28459      111 
__________________________g__________________      pgtable


> Even with 1G gigantic pages, you would need a quarter of a million
> (well, 262144 or 512*512) mappings of one 1G page to consume 1G of
> memory on PUD pages.
> 
> That just doesn't seem like something anyone is likely to actually do in
> practice.  That makes the benefits of the PUD portion of this series
> rather unclear in the real world.
> 
> As for the kernel page tables, I'm not really aware of them causing any
> problems.  We have a pretty good idea how much space they consume from
> the DirectMap* entries in meminfo:
> 
> 	DirectMap4k:     2262720 kB
> 	DirectMap2M:    40507392 kB
> 	DirectMap1G:    24117248 kB

However these statistics are arch-specific information, which only 
available on x86, s390 and powerpc.

> as well as our page table debugging infrastructure.  I haven't found
> myself dying for more specific info on them.
> 
> So, nothing in this series seems like a *BAD* idea, but I'm not sure in
> the end it solves more problems than it creates.

Thanks for your input.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/3] Add PUD and kernel PTE level pagetable account
@ 2022-07-10 11:19         ` Baolin Wang
  0 siblings, 0 replies; 30+ messages in thread
From: Baolin Wang @ 2022-07-10 11:19 UTC (permalink / raw)
  To: Dave Hansen, akpm
  Cc: x86, loongarch, peterz, catalin.marinas, dave.hansen,
	linux-kernel, linux-mm, guoren, linux-csky, hpa, kernel, will,
	tglx, jonas, aneesh.kumar, chenhuacai, willy, rppt, mingo,
	linux-arch, arnd, npiggin, openrisc, bp, luto, linux-arm-kernel,
	monstr, tsbogend, linux-mips



On 7/7/2022 10:44 PM, Dave Hansen wrote:
> On 7/7/22 04:32, Baolin Wang wrote:
>> On 7/6/2022 11:48 PM, Dave Hansen wrote:
>>> On 7/6/22 01:59, Baolin Wang wrote:
>>>> Now we will miss to account the PUD level pagetable and kernel PTE level
>>>> pagetable, as well as missing to set the PG_table flags for these
>>>> pagetable
>>>> pages, which will get an inaccurate pagetable accounting, and miss
>>>> PageTable() validation in some cases. So this patch set introduces new
>>>> helpers to help to account PUD and kernel PTE pagetable pages.
>>>
>>> Could you explain the motivation for this series a bit more?  Is there a
>>> real-world problem that this fixes?
>>
>> Not fix real problem. The motivation is that making the pagetable
>> accounting more accurate, which helps us to analyse the consumption of
>> the pagetable pages in some cases, and maybe help to do some empty
>> pagetable reclaiming in future.
> 
> This accounting isn't free.  It costs storage (and also parts of
> cachelines) in each mm and CPU time to maintain it, plus maintainer
> eyeballs to maintain.  PUD pages are also fundamentally (on x86 at
> least) 0.0004% of the overhead of PTE and 0.2% of the overhead of PMD
> pages unless someone is using gigantic hugetlbfs mappings.

Yes, agree. However I think the performence influence of this patch is 
small from some testing I did (like mysql, no obvious performance 
influence). Moreover the pagetable accounting gap is about 1% from below 
testing data.

Without this patchset, the pagetable consumption is about 110M with 
mysql testing.
              flags      page-count       MB  symbolic-flags 
          long-symbolic-flags
0x0000000004000000           28232      110 
__________________________g__________________      pgtable

With this patchset, and the consumption is about 111M.
              flags      page-count       MB  symbolic-flags 
          long-symbolic-flags
0x0000000004000000           28459      111 
__________________________g__________________      pgtable


> Even with 1G gigantic pages, you would need a quarter of a million
> (well, 262144 or 512*512) mappings of one 1G page to consume 1G of
> memory on PUD pages.
> 
> That just doesn't seem like something anyone is likely to actually do in
> practice.  That makes the benefits of the PUD portion of this series
> rather unclear in the real world.
> 
> As for the kernel page tables, I'm not really aware of them causing any
> problems.  We have a pretty good idea how much space they consume from
> the DirectMap* entries in meminfo:
> 
> 	DirectMap4k:     2262720 kB
> 	DirectMap2M:    40507392 kB
> 	DirectMap1G:    24117248 kB

However these statistics are arch-specific information, which only 
available on x86, s390 and powerpc.

> as well as our page table debugging infrastructure.  I haven't found
> myself dying for more specific info on them.
> 
> So, nothing in this series seems like a *BAD* idea, but I'm not sure in
> the end it solves more problems than it creates.

Thanks for your input.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/3] Add PUD and kernel PTE level pagetable account
@ 2022-07-10 11:19         ` Baolin Wang
  0 siblings, 0 replies; 30+ messages in thread
From: Baolin Wang @ 2022-07-10 11:19 UTC (permalink / raw)
  To: Dave Hansen, akpm
  Cc: rppt, willy, will, aneesh.kumar, npiggin, peterz,
	catalin.marinas, chenhuacai, kernel, tsbogend, dave.hansen, luto,
	tglx, mingo, bp, hpa, arnd, guoren, monstr, jonas,
	stefan.kristiansson, shorne, x86, linux-arch, linux-arm-kernel,
	loongarch, linux-mips, linux-csky, openrisc, linux-mm,
	linux-kernel



On 7/7/2022 10:44 PM, Dave Hansen wrote:
> On 7/7/22 04:32, Baolin Wang wrote:
>> On 7/6/2022 11:48 PM, Dave Hansen wrote:
>>> On 7/6/22 01:59, Baolin Wang wrote:
>>>> Now we will miss to account the PUD level pagetable and kernel PTE level
>>>> pagetable, as well as missing to set the PG_table flags for these
>>>> pagetable
>>>> pages, which will get an inaccurate pagetable accounting, and miss
>>>> PageTable() validation in some cases. So this patch set introduces new
>>>> helpers to help to account PUD and kernel PTE pagetable pages.
>>>
>>> Could you explain the motivation for this series a bit more?  Is there a
>>> real-world problem that this fixes?
>>
>> Not fix real problem. The motivation is that making the pagetable
>> accounting more accurate, which helps us to analyse the consumption of
>> the pagetable pages in some cases, and maybe help to do some empty
>> pagetable reclaiming in future.
> 
> This accounting isn't free.  It costs storage (and also parts of
> cachelines) in each mm and CPU time to maintain it, plus maintainer
> eyeballs to maintain.  PUD pages are also fundamentally (on x86 at
> least) 0.0004% of the overhead of PTE and 0.2% of the overhead of PMD
> pages unless someone is using gigantic hugetlbfs mappings.

Yes, agree. However I think the performence influence of this patch is 
small from some testing I did (like mysql, no obvious performance 
influence). Moreover the pagetable accounting gap is about 1% from below 
testing data.

Without this patchset, the pagetable consumption is about 110M with 
mysql testing.
              flags      page-count       MB  symbolic-flags 
          long-symbolic-flags
0x0000000004000000           28232      110 
__________________________g__________________      pgtable

With this patchset, and the consumption is about 111M.
              flags      page-count       MB  symbolic-flags 
          long-symbolic-flags
0x0000000004000000           28459      111 
__________________________g__________________      pgtable


> Even with 1G gigantic pages, you would need a quarter of a million
> (well, 262144 or 512*512) mappings of one 1G page to consume 1G of
> memory on PUD pages.
> 
> That just doesn't seem like something anyone is likely to actually do in
> practice.  That makes the benefits of the PUD portion of this series
> rather unclear in the real world.
> 
> As for the kernel page tables, I'm not really aware of them causing any
> problems.  We have a pretty good idea how much space they consume from
> the DirectMap* entries in meminfo:
> 
> 	DirectMap4k:     2262720 kB
> 	DirectMap2M:    40507392 kB
> 	DirectMap1G:    24117248 kB

However these statistics are arch-specific information, which only 
available on x86, s390 and powerpc.

> as well as our page table debugging infrastructure.  I haven't found
> myself dying for more specific info on them.
> 
> So, nothing in this series seems like a *BAD* idea, but I'm not sure in
> the end it solves more problems than it creates.

Thanks for your input.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2022-07-10 11:35 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-06  8:59 [PATCH 0/3] Add PUD and kernel PTE level pagetable account Baolin Wang
2022-07-06  8:59 ` Baolin Wang
2022-07-06  8:59 ` Baolin Wang
2022-07-06  8:59 ` [PATCH 1/3] mm: Factor out the pagetable pages account into new helper function Baolin Wang
2022-07-06  8:59   ` Baolin Wang
2022-07-06  8:59   ` Baolin Wang
2022-07-06  8:59 ` [PATCH 2/3] mm: Add PUD level pagetable account Baolin Wang
2022-07-06  8:59   ` Baolin Wang
2022-07-06  8:59   ` Baolin Wang
2022-07-06  8:59 ` [PATCH 3/3] mm: Add kernel PTE level pagetable pages account Baolin Wang
2022-07-06  8:59   ` Baolin Wang
2022-07-06  8:59   ` Baolin Wang
2022-07-06 15:45   ` Matthew Wilcox
2022-07-06 15:45     ` Matthew Wilcox
2022-07-06 15:45     ` Matthew Wilcox
2022-07-07 11:45     ` Baolin Wang
2022-07-07 11:45       ` Baolin Wang
2022-07-07 11:45       ` Baolin Wang
2022-07-06 15:48 ` [PATCH 0/3] Add PUD and kernel PTE level pagetable account Dave Hansen
2022-07-06 15:48   ` Dave Hansen
2022-07-06 15:48   ` Dave Hansen
2022-07-07 11:32   ` Baolin Wang
2022-07-07 11:32     ` Baolin Wang
2022-07-07 11:32     ` Baolin Wang
2022-07-07 14:44     ` Dave Hansen
2022-07-07 14:44       ` Dave Hansen
2022-07-07 14:44       ` Dave Hansen
2022-07-10 11:19       ` Baolin Wang
2022-07-10 11:19         ` Baolin Wang
2022-07-10 11:19         ` Baolin Wang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.