All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH v3 0/3] Add PUD and kernel PTE level pagetable account
@ 2022-06-30 11:11 Baolin Wang
  2022-06-30 11:11 ` [RFC PATCH v3 1/3] mm: Factor out the pagetable pages account into new helper function Baolin Wang
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Baolin Wang @ 2022-06-30 11:11 UTC (permalink / raw)
  To: akpm; +Cc: rppt, willy, baolin.wang, linux-mm, linux-kernel

Hi,

Now we will miss to account the PUD level pagetable and kernel PTE level
pagetable, as well as missing to set the PG_table flags for these pagetable
pages, which will get an inaccurate pagetable accounting, and miss
PageTable() validation in some cases. So this patch set introduces 2 new
helpers to help to account PUD and kernel PTE pagetable pages.

Note there are still some architectures specific pagetable allocation
that need to account the pagetable pages, which need more investigation
and cleanup in future. Now I only send patches to mm mailist, and if no
objections from mm people, then I will send to the related arch's maillist
to get more review. Thanks.

Changes from RFC v2:
 - Convert to use mod_lruvec_page_state() for non-order-0 case.
 - Rename the helpers.
 - Update some commit messages.
 - Remove unnecessary __GFP_HIGHMEM clear.

Note: I still keep using alloc_pages() to allocate page in this version,
which follows pmd_alloc_one() and saves unnecessary conversion pointed
out by Matthew.

Changes from RFC v1:
 - Update some commit message.
 - Add missing pgtable_clear_and_dec() on X86 arch.
 - Use __free_page() to free pagetable which can avoid duplicated virt_to_page().

Baolin Wang (3):
  mm: Factor out the pagetable pages account into new helper function
  mm: Add PUD level pagetable account
  mm: Add kernel PTE level pagetable pages account

 arch/arm64/include/asm/tlb.h         |  5 ++++-
 arch/csky/include/asm/pgalloc.h      |  2 +-
 arch/loongarch/include/asm/pgalloc.h | 11 ++++++++---
 arch/microblaze/mm/pgtable.c         |  2 +-
 arch/mips/include/asm/pgalloc.h      | 11 ++++++++---
 arch/openrisc/mm/ioremap.c           |  2 +-
 arch/s390/include/asm/tlb.h          |  1 +
 arch/x86/mm/pgtable.c                | 10 ++++++++--
 include/asm-generic/pgalloc.h        | 26 ++++++++++++++++++++++----
 include/linux/mm.h                   | 24 ++++++++++++++++--------
 10 files changed, 70 insertions(+), 24 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [RFC PATCH v3 1/3] mm: Factor out the pagetable pages account into new helper function
  2022-06-30 11:11 [RFC PATCH v3 0/3] Add PUD and kernel PTE level pagetable account Baolin Wang
@ 2022-06-30 11:11 ` Baolin Wang
  2022-06-30 14:11   ` Mike Rapoport
  2022-06-30 11:11 ` [RFC PATCH v3 2/3] mm: Add PUD level pagetable account Baolin Wang
  2022-06-30 11:11 ` [RFC PATCH v3 3/3] mm: Add kernel PTE level pagetable pages account Baolin Wang
  2 siblings, 1 reply; 18+ messages in thread
From: Baolin Wang @ 2022-06-30 11:11 UTC (permalink / raw)
  To: akpm; +Cc: rppt, willy, baolin.wang, linux-mm, linux-kernel

Factor out the pagetable pages account into new helper functions to avoid
duplicated code. Meanwhile these helper functions also will be used to
account pagetable pages which do not need split pagetale lock.

Meanwhile convert to use mod_lruvec_page_state() in case of non-order-0
page table allocation.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 include/linux/mm.h | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index a2270e3..3be6d2c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2353,20 +2353,30 @@ static inline void pgtable_init(void)
 	pgtable_cache_init();
 }
 
+static inline void pgtable_page_inc(struct page *page)
+{
+	__SetPageTable(page);
+	mod_lruvec_page_state(page, NR_PAGETABLE, compound_nr(page));
+}
+
+static inline void pgtable_page_dec(struct page *page)
+{
+	__ClearPageTable(page);
+	mod_lruvec_page_state(page, NR_PAGETABLE, -compound_nr(page));
+}
+
 static inline bool pgtable_pte_page_ctor(struct page *page)
 {
 	if (!ptlock_init(page))
 		return false;
-	__SetPageTable(page);
-	inc_lruvec_page_state(page, NR_PAGETABLE);
+	pgtable_page_inc(page);
 	return true;
 }
 
 static inline void pgtable_pte_page_dtor(struct page *page)
 {
 	ptlock_free(page);
-	__ClearPageTable(page);
-	dec_lruvec_page_state(page, NR_PAGETABLE);
+	pgtable_page_dec(page);
 }
 
 #define pte_offset_map_lock(mm, pmd, address, ptlp)	\
@@ -2452,16 +2462,14 @@ static inline bool pgtable_pmd_page_ctor(struct page *page)
 {
 	if (!pmd_ptlock_init(page))
 		return false;
-	__SetPageTable(page);
-	inc_lruvec_page_state(page, NR_PAGETABLE);
+	pgtable_page_inc(page);
 	return true;
 }
 
 static inline void pgtable_pmd_page_dtor(struct page *page)
 {
 	pmd_ptlock_free(page);
-	__ClearPageTable(page);
-	dec_lruvec_page_state(page, NR_PAGETABLE);
+	pgtable_page_dec(page);
 }
 
 /*
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH v3 2/3] mm: Add PUD level pagetable account
  2022-06-30 11:11 [RFC PATCH v3 0/3] Add PUD and kernel PTE level pagetable account Baolin Wang
  2022-06-30 11:11 ` [RFC PATCH v3 1/3] mm: Factor out the pagetable pages account into new helper function Baolin Wang
@ 2022-06-30 11:11 ` Baolin Wang
  2022-06-30 14:17   ` Mike Rapoport
  2022-07-03  3:47   ` Matthew Wilcox
  2022-06-30 11:11 ` [RFC PATCH v3 3/3] mm: Add kernel PTE level pagetable pages account Baolin Wang
  2 siblings, 2 replies; 18+ messages in thread
From: Baolin Wang @ 2022-06-30 11:11 UTC (permalink / raw)
  To: akpm; +Cc: rppt, willy, baolin.wang, linux-mm, linux-kernel

Now the PUD level ptes are always protected by mm->page_table_lock,
which means no split pagetable lock needed. So the generic PUD level
pagetable pages allocation will not call pgtable_pte_page_ctor/dtor(),
that means we will miss to account PUD level pagetable pages.

Adding pagetable account by calling pgtable_page_inc() or
pgtable_page_dec() when allocating or freeing PUD level pagetable
pages to help to get an accurate pagetable accounting.

Moreover this patch will also mark the PUD level pagetable with PG_table
flag, which will help to do sanity validation in unpoison_memory() and
get more accurate pagetable accounting by /proc/kpageflags interface.

Meanwhile converting the architectures with using generic PUD pagatable
allocation to add corresponding pgtable_page_inc() or pgtable_page_dec()
to account PUD level pagetable.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 arch/arm64/include/asm/tlb.h         |  5 ++++-
 arch/loongarch/include/asm/pgalloc.h | 11 ++++++++---
 arch/mips/include/asm/pgalloc.h      | 11 ++++++++---
 arch/s390/include/asm/tlb.h          |  1 +
 arch/x86/mm/pgtable.c                |  5 ++++-
 include/asm-generic/pgalloc.h        | 12 ++++++++++--
 6 files changed, 35 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
index c995d1f..1772df9 100644
--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -94,7 +94,10 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
 static inline void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pudp,
 				  unsigned long addr)
 {
-	tlb_remove_table(tlb, virt_to_page(pudp));
+	struct page *page = virt_to_page(pudp);
+
+	pgtable_page_dec(page);
+	tlb_remove_table(tlb, page);
 }
 #endif
 
diff --git a/arch/loongarch/include/asm/pgalloc.h b/arch/loongarch/include/asm/pgalloc.h
index b0a57b2..19bfe14 100644
--- a/arch/loongarch/include/asm/pgalloc.h
+++ b/arch/loongarch/include/asm/pgalloc.h
@@ -89,10 +89,15 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
 static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long address)
 {
 	pud_t *pud;
+	struct page *page;
 
-	pud = (pud_t *) __get_free_pages(GFP_KERNEL, PUD_ORDER);
-	if (pud)
-		pud_init((unsigned long)pud, (unsigned long)invalid_pmd_table);
+	page = alloc_pages(GFP_KERNEL, PUD_ORDER);
+	if (!page)
+		return NULL;
+
+	pgtable_page_inc(page);
+	pud = (pud_t *)page_address(page);
+	pud_init((unsigned long)pud, (unsigned long)invalid_pmd_table);
 	return pud;
 }
 
diff --git a/arch/mips/include/asm/pgalloc.h b/arch/mips/include/asm/pgalloc.h
index 867e9c3..990f614 100644
--- a/arch/mips/include/asm/pgalloc.h
+++ b/arch/mips/include/asm/pgalloc.h
@@ -89,11 +89,16 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
 
 static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long address)
 {
+	struct page *page;
 	pud_t *pud;
 
-	pud = (pud_t *) __get_free_pages(GFP_KERNEL, PUD_ORDER);
-	if (pud)
-		pud_init((unsigned long)pud, (unsigned long)invalid_pmd_table);
+	page = alloc_pages(GFP_KERNEL, PUD_ORDER);
+	if (!page)
+		return NULL;
+
+	pgtable_page_inc(page);
+	pud = (pud_t *)page_address(page);
+	pud_init((unsigned long)pud, (unsigned long)invalid_pmd_table);
 	return pud;
 }
 
diff --git a/arch/s390/include/asm/tlb.h b/arch/s390/include/asm/tlb.h
index fe6407f..744e2d7 100644
--- a/arch/s390/include/asm/tlb.h
+++ b/arch/s390/include/asm/tlb.h
@@ -125,6 +125,7 @@ static inline void pud_free_tlb(struct mmu_gather *tlb, pud_t *pud,
 {
 	if (mm_pud_folded(tlb->mm))
 		return;
+	pgtable_page_dec(virt_to_page(pud));
 	tlb->mm->context.flush_mm = 1;
 	tlb->freed_tables = 1;
 	tlb->cleared_p4ds = 1;
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index a932d77..5e46e31 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -76,8 +76,11 @@ void ___pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
 #if CONFIG_PGTABLE_LEVELS > 3
 void ___pud_free_tlb(struct mmu_gather *tlb, pud_t *pud)
 {
+	struct page *page = virt_to_page(pud);
+
+	pgtable_page_dec(page);
 	paravirt_release_pud(__pa(pud) >> PAGE_SHIFT);
-	paravirt_tlb_remove_table(tlb, virt_to_page(pud));
+	paravirt_tlb_remove_table(tlb, page);
 }
 
 #if CONFIG_PGTABLE_LEVELS > 4
diff --git a/include/asm-generic/pgalloc.h b/include/asm-generic/pgalloc.h
index 977bea1..11350f7 100644
--- a/include/asm-generic/pgalloc.h
+++ b/include/asm-generic/pgalloc.h
@@ -149,11 +149,16 @@ static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd)
 
 static inline pud_t *__pud_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
+	struct page *page;
 	gfp_t gfp = GFP_PGTABLE_USER;
 
 	if (mm == &init_mm)
 		gfp = GFP_PGTABLE_KERNEL;
-	return (pud_t *)get_zeroed_page(gfp);
+	page = alloc_pages(gfp, 0);
+	if (!page)
+		return NULL;
+	pgtable_page_inc(page);
+	return (pud_t *)page_address(page);
 }
 
 #ifndef __HAVE_ARCH_PUD_ALLOC_ONE
@@ -174,8 +179,11 @@ static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
 
 static inline void __pud_free(struct mm_struct *mm, pud_t *pud)
 {
+	struct page *page = virt_to_page(pud);
+
 	BUG_ON((unsigned long)pud & (PAGE_SIZE-1));
-	free_page((unsigned long)pud);
+	pgtable_page_dec(page);
+	__free_page(page);
 }
 
 #ifndef __HAVE_ARCH_PUD_FREE
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH v3 3/3] mm: Add kernel PTE level pagetable pages account
  2022-06-30 11:11 [RFC PATCH v3 0/3] Add PUD and kernel PTE level pagetable account Baolin Wang
  2022-06-30 11:11 ` [RFC PATCH v3 1/3] mm: Factor out the pagetable pages account into new helper function Baolin Wang
  2022-06-30 11:11 ` [RFC PATCH v3 2/3] mm: Add PUD level pagetable account Baolin Wang
@ 2022-06-30 11:11 ` Baolin Wang
  2 siblings, 0 replies; 18+ messages in thread
From: Baolin Wang @ 2022-06-30 11:11 UTC (permalink / raw)
  To: akpm; +Cc: rppt, willy, baolin.wang, linux-mm, linux-kernel

Now the kernel PTE level ptes are always protected by mm->page_table_lock
instead of split pagetable lock, so the kernel PTE level pagetable pages
are not accounted. Especially the vmalloc()/vmap() can consume lots of
kernel pagetable, so to get an accurate pagetable accounting, calling new
helpers pgtable_page_inc()/pgtable_page_dec() when allocating or freeing a
kernel PTE level pagetable page.

Meanwhile converting architectures to use corresponding generic PTE pagetable
allocation and freeing functions.

Note this patch only adds accounting to the page tables allocated after boot.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
---
 arch/csky/include/asm/pgalloc.h |  2 +-
 arch/microblaze/mm/pgtable.c    |  2 +-
 arch/openrisc/mm/ioremap.c      |  2 +-
 arch/x86/mm/pgtable.c           |  5 ++++-
 include/asm-generic/pgalloc.h   | 14 ++++++++++++--
 5 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/arch/csky/include/asm/pgalloc.h b/arch/csky/include/asm/pgalloc.h
index bbbd069..2443226 100644
--- a/arch/csky/include/asm/pgalloc.h
+++ b/arch/csky/include/asm/pgalloc.h
@@ -29,7 +29,7 @@ static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
 	pte_t *pte;
 	unsigned long i;
 
-	pte = (pte_t *) __get_free_page(GFP_KERNEL);
+	pte = __pte_alloc_one_kernel(mm);
 	if (!pte)
 		return NULL;
 
diff --git a/arch/microblaze/mm/pgtable.c b/arch/microblaze/mm/pgtable.c
index 9f73265..e96dd1b 100644
--- a/arch/microblaze/mm/pgtable.c
+++ b/arch/microblaze/mm/pgtable.c
@@ -245,7 +245,7 @@ unsigned long iopa(unsigned long addr)
 __ref pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
 {
 	if (mem_init_done)
-		return (pte_t *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+		return __pte_alloc_one_kernel(mm);
 	else
 		return memblock_alloc_try_nid(PAGE_SIZE, PAGE_SIZE,
 					      MEMBLOCK_LOW_LIMIT,
diff --git a/arch/openrisc/mm/ioremap.c b/arch/openrisc/mm/ioremap.c
index daae13a..3453acc 100644
--- a/arch/openrisc/mm/ioremap.c
+++ b/arch/openrisc/mm/ioremap.c
@@ -118,7 +118,7 @@ pte_t __ref *pte_alloc_one_kernel(struct mm_struct *mm)
 	pte_t *pte;
 
 	if (likely(mem_init_done)) {
-		pte = (pte_t *)get_zeroed_page(GFP_KERNEL);
+		pte = __pte_alloc_one_kernel(mm);
 	} else {
 		pte = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
 		if (!pte)
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 5e46e31..645868b 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -851,6 +851,7 @@ int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 {
 	pte_t *pte;
+	struct page *page;
 
 	pte = (pte_t *)pmd_page_vaddr(*pmd);
 	pmd_clear(pmd);
@@ -858,7 +859,9 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 	/* INVLPG to clear all paging-structure caches */
 	flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1);
 
-	free_page((unsigned long)pte);
+	page = virt_to_page(pte);
+	pgtable_page_dec(page);
+	__free_page(page);
 
 	return 1;
 }
diff --git a/include/asm-generic/pgalloc.h b/include/asm-generic/pgalloc.h
index 11350f7..e1a6771 100644
--- a/include/asm-generic/pgalloc.h
+++ b/include/asm-generic/pgalloc.h
@@ -18,7 +18,14 @@
  */
 static inline pte_t *__pte_alloc_one_kernel(struct mm_struct *mm)
 {
-	return (pte_t *)__get_free_page(GFP_PGTABLE_KERNEL);
+	struct page *page;
+	gfp_t gfp = GFP_PGTABLE_KERNEL;
+
+	page = alloc_pages(gfp, 0);
+	if (!page)
+		return NULL;
+	pgtable_page_inc(page);
+	return (pte_t *)page_address(page);
 }
 
 #ifndef __HAVE_ARCH_PTE_ALLOC_ONE_KERNEL
@@ -41,7 +48,10 @@ static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
  */
 static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
 {
-	free_page((unsigned long)pte);
+	struct page *page = virt_to_page(pte);
+
+	pgtable_page_dec(page);
+	__free_page(page);
 }
 
 /**
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v3 1/3] mm: Factor out the pagetable pages account into new helper function
  2022-06-30 11:11 ` [RFC PATCH v3 1/3] mm: Factor out the pagetable pages account into new helper function Baolin Wang
@ 2022-06-30 14:11   ` Mike Rapoport
  2022-07-01  8:00     ` Baolin Wang
  0 siblings, 1 reply; 18+ messages in thread
From: Mike Rapoport @ 2022-06-30 14:11 UTC (permalink / raw)
  To: Baolin Wang; +Cc: akpm, willy, linux-mm, linux-kernel

On Thu, Jun 30, 2022 at 07:11:14PM +0800, Baolin Wang wrote:
> Factor out the pagetable pages account into new helper functions to avoid
> duplicated code. Meanwhile these helper functions also will be used to
> account pagetable pages which do not need split pagetale lock.
> 
> Meanwhile convert to use mod_lruvec_page_state() in case of non-order-0
> page table allocation.

These are *very* rare. I think only parisc may have non-order-0 pmd and pud
tables.
With that, I'd suggest making use of compound_nr() build time opt-in.
 
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
>  include/linux/mm.h | 24 ++++++++++++++++--------
>  1 file changed, 16 insertions(+), 8 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index a2270e3..3be6d2c 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2353,20 +2353,30 @@ static inline void pgtable_init(void)
>  	pgtable_cache_init();
>  }
>  
> +static inline void pgtable_page_inc(struct page *page)
> +{
> +	__SetPageTable(page);
> +	mod_lruvec_page_state(page, NR_PAGETABLE, compound_nr(page));
> +}
> +
> +static inline void pgtable_page_dec(struct page *page)
> +{
> +	__ClearPageTable(page);
> +	mod_lruvec_page_state(page, NR_PAGETABLE, -compound_nr(page));
> +}
> +
>  static inline bool pgtable_pte_page_ctor(struct page *page)
>  {
>  	if (!ptlock_init(page))
>  		return false;
> -	__SetPageTable(page);
> -	inc_lruvec_page_state(page, NR_PAGETABLE);
> +	pgtable_page_inc(page);
>  	return true;
>  }
>  
>  static inline void pgtable_pte_page_dtor(struct page *page)
>  {
>  	ptlock_free(page);
> -	__ClearPageTable(page);
> -	dec_lruvec_page_state(page, NR_PAGETABLE);
> +	pgtable_page_dec(page);
>  }
>  
>  #define pte_offset_map_lock(mm, pmd, address, ptlp)	\
> @@ -2452,16 +2462,14 @@ static inline bool pgtable_pmd_page_ctor(struct page *page)
>  {
>  	if (!pmd_ptlock_init(page))
>  		return false;
> -	__SetPageTable(page);
> -	inc_lruvec_page_state(page, NR_PAGETABLE);
> +	pgtable_page_inc(page);
>  	return true;
>  }
>  
>  static inline void pgtable_pmd_page_dtor(struct page *page)
>  {
>  	pmd_ptlock_free(page);
> -	__ClearPageTable(page);
> -	dec_lruvec_page_state(page, NR_PAGETABLE);
> +	pgtable_page_dec(page);
>  }
>  
>  /*
> -- 
> 1.8.3.1
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v3 2/3] mm: Add PUD level pagetable account
  2022-06-30 11:11 ` [RFC PATCH v3 2/3] mm: Add PUD level pagetable account Baolin Wang
@ 2022-06-30 14:17   ` Mike Rapoport
  2022-07-01  8:04     ` Baolin Wang
  2022-07-03  3:47   ` Matthew Wilcox
  1 sibling, 1 reply; 18+ messages in thread
From: Mike Rapoport @ 2022-06-30 14:17 UTC (permalink / raw)
  To: Baolin Wang; +Cc: akpm, willy, linux-mm, linux-kernel

On Thu, Jun 30, 2022 at 07:11:15PM +0800, Baolin Wang wrote:
> Now the PUD level ptes are always protected by mm->page_table_lock,
> which means no split pagetable lock needed. So the generic PUD level
> pagetable pages allocation will not call pgtable_pte_page_ctor/dtor(),
> that means we will miss to account PUD level pagetable pages.
> 
> Adding pagetable account by calling pgtable_page_inc() or
> pgtable_page_dec() when allocating or freeing PUD level pagetable
> pages to help to get an accurate pagetable accounting.
> 
> Moreover this patch will also mark the PUD level pagetable with PG_table
> flag, which will help to do sanity validation in unpoison_memory() and
> get more accurate pagetable accounting by /proc/kpageflags interface.
> 
> Meanwhile converting the architectures with using generic PUD pagatable
> allocation to add corresponding pgtable_page_inc() or pgtable_page_dec()
> to account PUD level pagetable.
> 
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
>  arch/arm64/include/asm/tlb.h         |  5 ++++-
>  arch/loongarch/include/asm/pgalloc.h | 11 ++++++++---
>  arch/mips/include/asm/pgalloc.h      | 11 ++++++++---
>  arch/s390/include/asm/tlb.h          |  1 +
>  arch/x86/mm/pgtable.c                |  5 ++++-
>  include/asm-generic/pgalloc.h        | 12 ++++++++++--
>  6 files changed, 35 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
> index c995d1f..1772df9 100644
> --- a/arch/arm64/include/asm/tlb.h
> +++ b/arch/arm64/include/asm/tlb.h
> @@ -94,7 +94,10 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
>  static inline void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pudp,
>  				  unsigned long addr)
>  {
> -	tlb_remove_table(tlb, virt_to_page(pudp));
> +	struct page *page = virt_to_page(pudp);
> +
> +	pgtable_page_dec(page);

Using pgtable_pud_page_ctor() and pgtable_pud_page_dtor() would be
consistent with what we currently have for PTEs and PMDs.

This applies to all the additions of pgtable_page_dec() and
pgtable_page_inc().

> +	tlb_remove_table(tlb, page);
>  }
>  #endif
>  
> diff --git a/arch/loongarch/include/asm/pgalloc.h b/arch/loongarch/include/asm/pgalloc.h
> index b0a57b2..19bfe14 100644
> --- a/arch/loongarch/include/asm/pgalloc.h
> +++ b/arch/loongarch/include/asm/pgalloc.h
> @@ -89,10 +89,15 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
>  static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long address)
>  {
>  	pud_t *pud;
> +	struct page *page;
>  
> -	pud = (pud_t *) __get_free_pages(GFP_KERNEL, PUD_ORDER);
> -	if (pud)
> -		pud_init((unsigned long)pud, (unsigned long)invalid_pmd_table);
> +	page = alloc_pages(GFP_KERNEL, PUD_ORDER);
> +	if (!page)
> +		return NULL;
> +
> +	pgtable_page_inc(page);
> +	pud = (pud_t *)page_address(page);
> +	pud_init((unsigned long)pud, (unsigned long)invalid_pmd_table);
>  	return pud;
>  }
>  
> diff --git a/arch/mips/include/asm/pgalloc.h b/arch/mips/include/asm/pgalloc.h
> index 867e9c3..990f614 100644
> --- a/arch/mips/include/asm/pgalloc.h
> +++ b/arch/mips/include/asm/pgalloc.h
> @@ -89,11 +89,16 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
>  
>  static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long address)
>  {
> +	struct page *page;
>  	pud_t *pud;
>  
> -	pud = (pud_t *) __get_free_pages(GFP_KERNEL, PUD_ORDER);
> -	if (pud)
> -		pud_init((unsigned long)pud, (unsigned long)invalid_pmd_table);
> +	page = alloc_pages(GFP_KERNEL, PUD_ORDER);
> +	if (!page)
> +		return NULL;
> +
> +	pgtable_page_inc(page);
> +	pud = (pud_t *)page_address(page);
> +	pud_init((unsigned long)pud, (unsigned long)invalid_pmd_table);
>  	return pud;
>  }
>  
> diff --git a/arch/s390/include/asm/tlb.h b/arch/s390/include/asm/tlb.h
> index fe6407f..744e2d7 100644
> --- a/arch/s390/include/asm/tlb.h
> +++ b/arch/s390/include/asm/tlb.h
> @@ -125,6 +125,7 @@ static inline void pud_free_tlb(struct mmu_gather *tlb, pud_t *pud,
>  {
>  	if (mm_pud_folded(tlb->mm))
>  		return;
> +	pgtable_page_dec(virt_to_page(pud));
>  	tlb->mm->context.flush_mm = 1;
>  	tlb->freed_tables = 1;
>  	tlb->cleared_p4ds = 1;
> diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
> index a932d77..5e46e31 100644
> --- a/arch/x86/mm/pgtable.c
> +++ b/arch/x86/mm/pgtable.c
> @@ -76,8 +76,11 @@ void ___pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
>  #if CONFIG_PGTABLE_LEVELS > 3
>  void ___pud_free_tlb(struct mmu_gather *tlb, pud_t *pud)
>  {
> +	struct page *page = virt_to_page(pud);
> +
> +	pgtable_page_dec(page);
>  	paravirt_release_pud(__pa(pud) >> PAGE_SHIFT);
> -	paravirt_tlb_remove_table(tlb, virt_to_page(pud));
> +	paravirt_tlb_remove_table(tlb, page);
>  }
>  
>  #if CONFIG_PGTABLE_LEVELS > 4
> diff --git a/include/asm-generic/pgalloc.h b/include/asm-generic/pgalloc.h
> index 977bea1..11350f7 100644
> --- a/include/asm-generic/pgalloc.h
> +++ b/include/asm-generic/pgalloc.h
> @@ -149,11 +149,16 @@ static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd)
>  
>  static inline pud_t *__pud_alloc_one(struct mm_struct *mm, unsigned long addr)
>  {
> +	struct page *page;
>  	gfp_t gfp = GFP_PGTABLE_USER;
>  
>  	if (mm == &init_mm)
>  		gfp = GFP_PGTABLE_KERNEL;
> -	return (pud_t *)get_zeroed_page(gfp);
> +	page = alloc_pages(gfp, 0);
> +	if (!page)
> +		return NULL;
> +	pgtable_page_inc(page);
> +	return (pud_t *)page_address(page);
>  }
>  
>  #ifndef __HAVE_ARCH_PUD_ALLOC_ONE
> @@ -174,8 +179,11 @@ static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
>  
>  static inline void __pud_free(struct mm_struct *mm, pud_t *pud)
>  {
> +	struct page *page = virt_to_page(pud);
> +
>  	BUG_ON((unsigned long)pud & (PAGE_SIZE-1));
> -	free_page((unsigned long)pud);
> +	pgtable_page_dec(page);
> +	__free_page(page);
>  }
>  
>  #ifndef __HAVE_ARCH_PUD_FREE
> -- 
> 1.8.3.1
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v3 1/3] mm: Factor out the pagetable pages account into new helper function
  2022-06-30 14:11   ` Mike Rapoport
@ 2022-07-01  8:00     ` Baolin Wang
  2022-07-02 10:15       ` Mike Rapoport
  0 siblings, 1 reply; 18+ messages in thread
From: Baolin Wang @ 2022-07-01  8:00 UTC (permalink / raw)
  To: Mike Rapoport; +Cc: akpm, willy, linux-mm, linux-kernel



On 6/30/2022 10:11 PM, Mike Rapoport wrote:
> On Thu, Jun 30, 2022 at 07:11:14PM +0800, Baolin Wang wrote:
>> Factor out the pagetable pages account into new helper functions to avoid
>> duplicated code. Meanwhile these helper functions also will be used to
>> account pagetable pages which do not need split pagetale lock.
>>
>> Meanwhile convert to use mod_lruvec_page_state() in case of non-order-0
>> page table allocation.
> 
> These are *very* rare. I think only parisc may have non-order-0 pmd and pud
> tables.

s390 also has non-order-0 page table allocation, but they both do not 
use the generic page table allocation now.

> With that, I'd suggest making use of compound_nr() build time opt-in.

After more thinking, I'd prefer to change back to use 
inc_lruvec_page_state()/dec_lruvec_page_state(), since now no 
architecures will need non-order-0 page table allocation.

After this patchset, I plan to convert parisc and s390 to use generic 
pagetable allocation, then I will add non-order-0 page table allocation 
support. Like Matthew suggested, maybe I need change the API to pass the 
number of pages.

>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> ---
>>   include/linux/mm.h | 24 ++++++++++++++++--------
>>   1 file changed, 16 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index a2270e3..3be6d2c 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -2353,20 +2353,30 @@ static inline void pgtable_init(void)
>>   	pgtable_cache_init();
>>   }
>>   
>> +static inline void pgtable_page_inc(struct page *page)
>> +{
>> +	__SetPageTable(page);
>> +	mod_lruvec_page_state(page, NR_PAGETABLE, compound_nr(page));
>> +}
>> +
>> +static inline void pgtable_page_dec(struct page *page)
>> +{
>> +	__ClearPageTable(page);
>> +	mod_lruvec_page_state(page, NR_PAGETABLE, -compound_nr(page));
>> +}
>> +
>>   static inline bool pgtable_pte_page_ctor(struct page *page)
>>   {
>>   	if (!ptlock_init(page))
>>   		return false;
>> -	__SetPageTable(page);
>> -	inc_lruvec_page_state(page, NR_PAGETABLE);
>> +	pgtable_page_inc(page);
>>   	return true;
>>   }
>>   
>>   static inline void pgtable_pte_page_dtor(struct page *page)
>>   {
>>   	ptlock_free(page);
>> -	__ClearPageTable(page);
>> -	dec_lruvec_page_state(page, NR_PAGETABLE);
>> +	pgtable_page_dec(page);
>>   }
>>   
>>   #define pte_offset_map_lock(mm, pmd, address, ptlp)	\
>> @@ -2452,16 +2462,14 @@ static inline bool pgtable_pmd_page_ctor(struct page *page)
>>   {
>>   	if (!pmd_ptlock_init(page))
>>   		return false;
>> -	__SetPageTable(page);
>> -	inc_lruvec_page_state(page, NR_PAGETABLE);
>> +	pgtable_page_inc(page);
>>   	return true;
>>   }
>>   
>>   static inline void pgtable_pmd_page_dtor(struct page *page)
>>   {
>>   	pmd_ptlock_free(page);
>> -	__ClearPageTable(page);
>> -	dec_lruvec_page_state(page, NR_PAGETABLE);
>> +	pgtable_page_dec(page);
>>   }
>>   
>>   /*
>> -- 
>> 1.8.3.1
>>
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v3 2/3] mm: Add PUD level pagetable account
  2022-06-30 14:17   ` Mike Rapoport
@ 2022-07-01  8:04     ` Baolin Wang
  2022-07-03  3:40       ` Matthew Wilcox
  0 siblings, 1 reply; 18+ messages in thread
From: Baolin Wang @ 2022-07-01  8:04 UTC (permalink / raw)
  To: Mike Rapoport; +Cc: akpm, willy, linux-mm, linux-kernel



On 6/30/2022 10:17 PM, Mike Rapoport wrote:
> On Thu, Jun 30, 2022 at 07:11:15PM +0800, Baolin Wang wrote:
>> Now the PUD level ptes are always protected by mm->page_table_lock,
>> which means no split pagetable lock needed. So the generic PUD level
>> pagetable pages allocation will not call pgtable_pte_page_ctor/dtor(),
>> that means we will miss to account PUD level pagetable pages.
>>
>> Adding pagetable account by calling pgtable_page_inc() or
>> pgtable_page_dec() when allocating or freeing PUD level pagetable
>> pages to help to get an accurate pagetable accounting.
>>
>> Moreover this patch will also mark the PUD level pagetable with PG_table
>> flag, which will help to do sanity validation in unpoison_memory() and
>> get more accurate pagetable accounting by /proc/kpageflags interface.
>>
>> Meanwhile converting the architectures with using generic PUD pagatable
>> allocation to add corresponding pgtable_page_inc() or pgtable_page_dec()
>> to account PUD level pagetable.
>>
>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> ---
>>   arch/arm64/include/asm/tlb.h         |  5 ++++-
>>   arch/loongarch/include/asm/pgalloc.h | 11 ++++++++---
>>   arch/mips/include/asm/pgalloc.h      | 11 ++++++++---
>>   arch/s390/include/asm/tlb.h          |  1 +
>>   arch/x86/mm/pgtable.c                |  5 ++++-
>>   include/asm-generic/pgalloc.h        | 12 ++++++++++--
>>   6 files changed, 35 insertions(+), 10 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
>> index c995d1f..1772df9 100644
>> --- a/arch/arm64/include/asm/tlb.h
>> +++ b/arch/arm64/include/asm/tlb.h
>> @@ -94,7 +94,10 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
>>   static inline void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pudp,
>>   				  unsigned long addr)
>>   {
>> -	tlb_remove_table(tlb, virt_to_page(pudp));
>> +	struct page *page = virt_to_page(pudp);
>> +
>> +	pgtable_page_dec(page);
> 
> Using pgtable_pud_page_ctor() and pgtable_pud_page_dtor() would be
> consistent with what we currently have for PTEs and PMDs.
> 
> This applies to all the additions of pgtable_page_dec() and
> pgtable_page_inc().

OK. I can add pgtable_pud_page_ctor() and pgtable_pud_page_dtor() 
helpers to keep consistent, which are just wrappers of 
pgtable_page_inc() and pgtable_page_dec().

> 
>> +	tlb_remove_table(tlb, page);
>>   }
>>   #endif
>>   
>> diff --git a/arch/loongarch/include/asm/pgalloc.h b/arch/loongarch/include/asm/pgalloc.h
>> index b0a57b2..19bfe14 100644
>> --- a/arch/loongarch/include/asm/pgalloc.h
>> +++ b/arch/loongarch/include/asm/pgalloc.h
>> @@ -89,10 +89,15 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
>>   static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long address)
>>   {
>>   	pud_t *pud;
>> +	struct page *page;
>>   
>> -	pud = (pud_t *) __get_free_pages(GFP_KERNEL, PUD_ORDER);
>> -	if (pud)
>> -		pud_init((unsigned long)pud, (unsigned long)invalid_pmd_table);
>> +	page = alloc_pages(GFP_KERNEL, PUD_ORDER);
>> +	if (!page)
>> +		return NULL;
>> +
>> +	pgtable_page_inc(page);
>> +	pud = (pud_t *)page_address(page);
>> +	pud_init((unsigned long)pud, (unsigned long)invalid_pmd_table);
>>   	return pud;
>>   }
>>   
>> diff --git a/arch/mips/include/asm/pgalloc.h b/arch/mips/include/asm/pgalloc.h
>> index 867e9c3..990f614 100644
>> --- a/arch/mips/include/asm/pgalloc.h
>> +++ b/arch/mips/include/asm/pgalloc.h
>> @@ -89,11 +89,16 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
>>   
>>   static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long address)
>>   {
>> +	struct page *page;
>>   	pud_t *pud;
>>   
>> -	pud = (pud_t *) __get_free_pages(GFP_KERNEL, PUD_ORDER);
>> -	if (pud)
>> -		pud_init((unsigned long)pud, (unsigned long)invalid_pmd_table);
>> +	page = alloc_pages(GFP_KERNEL, PUD_ORDER);
>> +	if (!page)
>> +		return NULL;
>> +
>> +	pgtable_page_inc(page);
>> +	pud = (pud_t *)page_address(page);
>> +	pud_init((unsigned long)pud, (unsigned long)invalid_pmd_table);
>>   	return pud;
>>   }
>>   
>> diff --git a/arch/s390/include/asm/tlb.h b/arch/s390/include/asm/tlb.h
>> index fe6407f..744e2d7 100644
>> --- a/arch/s390/include/asm/tlb.h
>> +++ b/arch/s390/include/asm/tlb.h
>> @@ -125,6 +125,7 @@ static inline void pud_free_tlb(struct mmu_gather *tlb, pud_t *pud,
>>   {
>>   	if (mm_pud_folded(tlb->mm))
>>   		return;
>> +	pgtable_page_dec(virt_to_page(pud));
>>   	tlb->mm->context.flush_mm = 1;
>>   	tlb->freed_tables = 1;
>>   	tlb->cleared_p4ds = 1;
>> diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
>> index a932d77..5e46e31 100644
>> --- a/arch/x86/mm/pgtable.c
>> +++ b/arch/x86/mm/pgtable.c
>> @@ -76,8 +76,11 @@ void ___pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
>>   #if CONFIG_PGTABLE_LEVELS > 3
>>   void ___pud_free_tlb(struct mmu_gather *tlb, pud_t *pud)
>>   {
>> +	struct page *page = virt_to_page(pud);
>> +
>> +	pgtable_page_dec(page);
>>   	paravirt_release_pud(__pa(pud) >> PAGE_SHIFT);
>> -	paravirt_tlb_remove_table(tlb, virt_to_page(pud));
>> +	paravirt_tlb_remove_table(tlb, page);
>>   }
>>   
>>   #if CONFIG_PGTABLE_LEVELS > 4
>> diff --git a/include/asm-generic/pgalloc.h b/include/asm-generic/pgalloc.h
>> index 977bea1..11350f7 100644
>> --- a/include/asm-generic/pgalloc.h
>> +++ b/include/asm-generic/pgalloc.h
>> @@ -149,11 +149,16 @@ static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd)
>>   
>>   static inline pud_t *__pud_alloc_one(struct mm_struct *mm, unsigned long addr)
>>   {
>> +	struct page *page;
>>   	gfp_t gfp = GFP_PGTABLE_USER;
>>   
>>   	if (mm == &init_mm)
>>   		gfp = GFP_PGTABLE_KERNEL;
>> -	return (pud_t *)get_zeroed_page(gfp);
>> +	page = alloc_pages(gfp, 0);
>> +	if (!page)
>> +		return NULL;
>> +	pgtable_page_inc(page);
>> +	return (pud_t *)page_address(page);
>>   }
>>   
>>   #ifndef __HAVE_ARCH_PUD_ALLOC_ONE
>> @@ -174,8 +179,11 @@ static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
>>   
>>   static inline void __pud_free(struct mm_struct *mm, pud_t *pud)
>>   {
>> +	struct page *page = virt_to_page(pud);
>> +
>>   	BUG_ON((unsigned long)pud & (PAGE_SIZE-1));
>> -	free_page((unsigned long)pud);
>> +	pgtable_page_dec(page);
>> +	__free_page(page);
>>   }
>>   
>>   #ifndef __HAVE_ARCH_PUD_FREE
>> -- 
>> 1.8.3.1
>>
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v3 1/3] mm: Factor out the pagetable pages account into new helper function
  2022-07-01  8:00     ` Baolin Wang
@ 2022-07-02 10:15       ` Mike Rapoport
  2022-07-03 13:48         ` Baolin Wang
  0 siblings, 1 reply; 18+ messages in thread
From: Mike Rapoport @ 2022-07-02 10:15 UTC (permalink / raw)
  To: Baolin Wang; +Cc: akpm, willy, linux-mm, linux-kernel

On Fri, Jul 01, 2022 at 04:00:59PM +0800, Baolin Wang wrote:
> 
> 
> On 6/30/2022 10:11 PM, Mike Rapoport wrote:
> > On Thu, Jun 30, 2022 at 07:11:14PM +0800, Baolin Wang wrote:
> > > Factor out the pagetable pages account into new helper functions to avoid
> > > duplicated code. Meanwhile these helper functions also will be used to
> > > account pagetable pages which do not need split pagetale lock.
> > > 
> > > Meanwhile convert to use mod_lruvec_page_state() in case of non-order-0
> > > page table allocation.
> > 
> > These are *very* rare. I think only parisc may have non-order-0 pmd and pud
> > tables.
> 
> s390 also has non-order-0 page table allocation, but they both do not use
> the generic page table allocation now.
> 
> > With that, I'd suggest making use of compound_nr() build time opt-in.
> 
> After more thinking, I'd prefer to change back to use
> inc_lruvec_page_state()/dec_lruvec_page_state(), since now no architecures
> will need non-order-0 page table allocation.
> 
> After this patchset, I plan to convert parisc and s390 to use generic
> pagetable allocation, then I will add non-order-0 page table allocation
> support. Like Matthew suggested, maybe I need change the API to pass the
> number of pages.

I think it would be simpler to add proper accounting to s390 and parisc
versions than make them use the generic allocation functions. Moreover, API
change to support these cases feels like unnecessary churn to me.
 
> > > Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> > > ---

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v3 2/3] mm: Add PUD level pagetable account
  2022-07-01  8:04     ` Baolin Wang
@ 2022-07-03  3:40       ` Matthew Wilcox
  2022-07-03 14:06         ` Baolin Wang
  0 siblings, 1 reply; 18+ messages in thread
From: Matthew Wilcox @ 2022-07-03  3:40 UTC (permalink / raw)
  To: Baolin Wang; +Cc: Mike Rapoport, akpm, linux-mm, linux-kernel

On Fri, Jul 01, 2022 at 04:04:21PM +0800, Baolin Wang wrote:
> > Using pgtable_pud_page_ctor() and pgtable_pud_page_dtor() would be
> > consistent with what we currently have for PTEs and PMDs.
> > 
> > This applies to all the additions of pgtable_page_dec() and
> > pgtable_page_inc().
> 
> OK. I can add pgtable_pud_page_ctor() and pgtable_pud_page_dtor() helpers to
> keep consistent, which are just wrappers of pgtable_page_inc() and
> pgtable_page_dec().

I think you misunderstand Mike.

Don't add pgtable_page_inc() and pgtable_page_dec().  Just add
pgtable_pud_page_ctor() and pgtable_pud_page_dtor().  At least, that
was what I said last time yo uposted these patches.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v3 2/3] mm: Add PUD level pagetable account
  2022-06-30 11:11 ` [RFC PATCH v3 2/3] mm: Add PUD level pagetable account Baolin Wang
  2022-06-30 14:17   ` Mike Rapoport
@ 2022-07-03  3:47   ` Matthew Wilcox
  2022-07-03 14:18     ` Mike Rapoport
  1 sibling, 1 reply; 18+ messages in thread
From: Matthew Wilcox @ 2022-07-03  3:47 UTC (permalink / raw)
  To: Baolin Wang
  Cc: akpm, rppt, linux-mm, linux-kernel, Huacai Chen, WANG Xuerui, loongarch

On Thu, Jun 30, 2022 at 07:11:15PM +0800, Baolin Wang wrote:
> +++ b/arch/loongarch/include/asm/pgalloc.h
> @@ -89,10 +89,15 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
>  static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long address)
>  {
>  	pud_t *pud;
> +	struct page *page;
>  
> -	pud = (pud_t *) __get_free_pages(GFP_KERNEL, PUD_ORDER);
> -	if (pud)
> -		pud_init((unsigned long)pud, (unsigned long)invalid_pmd_table);
> +	page = alloc_pages(GFP_KERNEL, PUD_ORDER);

Argh.  I just got rid of PMD_ORDER from PA-RISC.  Now Loongstupid has
reintroduced it, and worse introduced PUD_ORDER.  See commit
7bf82eb3873f.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v3 1/3] mm: Factor out the pagetable pages account into new helper function
  2022-07-02 10:15       ` Mike Rapoport
@ 2022-07-03 13:48         ` Baolin Wang
  0 siblings, 0 replies; 18+ messages in thread
From: Baolin Wang @ 2022-07-03 13:48 UTC (permalink / raw)
  To: Mike Rapoport; +Cc: akpm, willy, linux-mm, linux-kernel



On 7/2/2022 6:15 PM, Mike Rapoport wrote:
> On Fri, Jul 01, 2022 at 04:00:59PM +0800, Baolin Wang wrote:
>>
>>
>> On 6/30/2022 10:11 PM, Mike Rapoport wrote:
>>> On Thu, Jun 30, 2022 at 07:11:14PM +0800, Baolin Wang wrote:
>>>> Factor out the pagetable pages account into new helper functions to avoid
>>>> duplicated code. Meanwhile these helper functions also will be used to
>>>> account pagetable pages which do not need split pagetale lock.
>>>>
>>>> Meanwhile convert to use mod_lruvec_page_state() in case of non-order-0
>>>> page table allocation.
>>>
>>> These are *very* rare. I think only parisc may have non-order-0 pmd and pud
>>> tables.
>>
>> s390 also has non-order-0 page table allocation, but they both do not use
>> the generic page table allocation now.
>>
>>> With that, I'd suggest making use of compound_nr() build time opt-in.
>>
>> After more thinking, I'd prefer to change back to use
>> inc_lruvec_page_state()/dec_lruvec_page_state(), since now no architecures
>> will need non-order-0 page table allocation.
>>
>> After this patchset, I plan to convert parisc and s390 to use generic
>> pagetable allocation, then I will add non-order-0 page table allocation
>> support. Like Matthew suggested, maybe I need change the API to pass the
>> number of pages.
> 
> I think it would be simpler to add proper accounting to s390 and parisc
> versions than make them use the generic allocation functions. Moreover, API
> change to support these cases feels like unnecessary churn to me.

Got it. Thanks.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v3 2/3] mm: Add PUD level pagetable account
  2022-07-03  3:40       ` Matthew Wilcox
@ 2022-07-03 14:06         ` Baolin Wang
  2022-07-03 14:28           ` Mike Rapoport
  2022-07-03 14:52           ` Matthew Wilcox
  0 siblings, 2 replies; 18+ messages in thread
From: Baolin Wang @ 2022-07-03 14:06 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Mike Rapoport, akpm, linux-mm, linux-kernel



On 7/3/2022 11:40 AM, Matthew Wilcox wrote:
> On Fri, Jul 01, 2022 at 04:04:21PM +0800, Baolin Wang wrote:
>>> Using pgtable_pud_page_ctor() and pgtable_pud_page_dtor() would be
>>> consistent with what we currently have for PTEs and PMDs.
>>>
>>> This applies to all the additions of pgtable_page_dec() and
>>> pgtable_page_inc().
>>
>> OK. I can add pgtable_pud_page_ctor() and pgtable_pud_page_dtor() helpers to
>> keep consistent, which are just wrappers of pgtable_page_inc() and
>> pgtable_page_dec().
> 
> I think you misunderstand Mike.
> 
> Don't add pgtable_page_inc() and pgtable_page_dec().  Just add
> pgtable_pud_page_ctor() and pgtable_pud_page_dtor().  At least, that
> was what I said last time yo uposted these patches.

My concern is that I need another helpers for kernel page table 
allocation helpers, if only adding pgtable_pud_page_ctor() and 
pgtable_pud_page_dtor() like below:

static inline void pgtable_pud_page_ctor(struct page *page)
{
	__SetPageTable(page);
	inc_lruvec_page_state(page, NR_PAGETABLE);
}

static inline void pgtable_pud_page_dtor(struct page *page)
{
	__ClearPageTable(page);
	dec_lruvec_page_state(page, NR_PAGETABLE);
}

So for kernel pte page table allocation, I need another similar helpers 
like below. However they do the samething with 
pgtable_pud_page_ctor/pgtable_pud_page_dtor, so I am not sure this is 
good for adding these duplicate code.

static inline void pgtable_kernel_pte_page_ctor(struct page *page)
{
	__SetPageTable(page);
	inc_lruvec_page_state(page, NR_PAGETABLE);
}

static inline void pgtable_kernel_pte_page_dtor(struct page *page)
{
	__ClearPageTable(page);
	dec_lruvec_page_state(page, NR_PAGETABLE);
}

Instead adding a common helpers seems more readable to me, which can 
also simplify original pgtable_pmd_page_dtor()/pgtable_pmd_page_ctor(). 
Something like below.

static inline void pgtable_page_inc(struct page *page)
{
	__SetPageTable(page);
	inc_lruvec_page_state(page, NR_PAGETABLE);
}

static inline void pgtable_page_dec(struct page *page)
{
	__ClearPageTable(page);
	dec_lruvec_page_state(page, NR_PAGETABLE);
}

static inline void pgtable_pud_page_ctor(struct page *page)
{
	pgtable_page_inc(page);
}

static inline void pgtable_pud_page_dtor(struct page *page)
{
	pgtable_page_dec(page);
}

For kernel pte page table, we can just use 
pgtable_page_inc/pgtable_page_dec(), or adding 
pgtable_kernel_pte_page_ctor/pgtable_kernel_pte_page_dtor, which just 
wrappers of pgtable_page_inc() and pgtable_page_dec().

Matthew and Mike, how do you think? Thanks.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v3 2/3] mm: Add PUD level pagetable account
  2022-07-03  3:47   ` Matthew Wilcox
@ 2022-07-03 14:18     ` Mike Rapoport
  0 siblings, 0 replies; 18+ messages in thread
From: Mike Rapoport @ 2022-07-03 14:18 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Baolin Wang, akpm, linux-mm, linux-kernel, Huacai Chen,
	WANG Xuerui, loongarch

On Sun, Jul 03, 2022 at 04:47:07AM +0100, Matthew Wilcox wrote:
> On Thu, Jun 30, 2022 at 07:11:15PM +0800, Baolin Wang wrote:
> > +++ b/arch/loongarch/include/asm/pgalloc.h
> > @@ -89,10 +89,15 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
> >  static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long address)
> >  {
> >  	pud_t *pud;
> > +	struct page *page;
> >  
> > -	pud = (pud_t *) __get_free_pages(GFP_KERNEL, PUD_ORDER);
> > -	if (pud)
> > -		pud_init((unsigned long)pud, (unsigned long)invalid_pmd_table);
> > +	page = alloc_pages(GFP_KERNEL, PUD_ORDER);
> 
> Argh.  I just got rid of PMD_ORDER from PA-RISC.  Now Loongstupid has
> reintroduced it, and worse introduced PUD_ORDER.  See commit
> 7bf82eb3873f.

Let's try and deal with all those PxD_ORDERs for once.

https://lore.kernel.org/linux-arch/20220703141203.147893-1-rppt@kernel.org/

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v3 2/3] mm: Add PUD level pagetable account
  2022-07-03 14:06         ` Baolin Wang
@ 2022-07-03 14:28           ` Mike Rapoport
  2022-07-03 14:49             ` Baolin Wang
  2022-07-03 14:52           ` Matthew Wilcox
  1 sibling, 1 reply; 18+ messages in thread
From: Mike Rapoport @ 2022-07-03 14:28 UTC (permalink / raw)
  To: Baolin Wang; +Cc: Matthew Wilcox, akpm, linux-mm, linux-kernel

On Sun, Jul 03, 2022 at 10:06:32PM +0800, Baolin Wang wrote:
> 
> 
> On 7/3/2022 11:40 AM, Matthew Wilcox wrote:
> > On Fri, Jul 01, 2022 at 04:04:21PM +0800, Baolin Wang wrote:
> > > > Using pgtable_pud_page_ctor() and pgtable_pud_page_dtor() would be
> > > > consistent with what we currently have for PTEs and PMDs.
> > > > 
> > > > This applies to all the additions of pgtable_page_dec() and
> > > > pgtable_page_inc().
> > > 
> > > OK. I can add pgtable_pud_page_ctor() and pgtable_pud_page_dtor() helpers to
> > > keep consistent, which are just wrappers of pgtable_page_inc() and
> > > pgtable_page_dec().
> > 
> > I think you misunderstand Mike.
> > 
> > Don't add pgtable_page_inc() and pgtable_page_dec().  Just add
> > pgtable_pud_page_ctor() and pgtable_pud_page_dtor().  At least, that
> > was what I said last time you posted these patches.
> 
> My concern is that I need another helpers for kernel page table allocation
> helpers, if only adding pgtable_pud_page_ctor() and pgtable_pud_page_dtor()
> like below:
> 
> static inline void pgtable_pud_page_ctor(struct page *page)
> {
> 	__SetPageTable(page);
> 	inc_lruvec_page_state(page, NR_PAGETABLE);
> }
> 
> static inline void pgtable_pud_page_dtor(struct page *page)
> {
> 	__ClearPageTable(page);
> 	dec_lruvec_page_state(page, NR_PAGETABLE);
> }
> 
> So for kernel pte page table allocation, I need another similar helpers like
> below. However they do the samething with
> pgtable_pud_page_ctor/pgtable_pud_page_dtor, so I am not sure this is good
> for adding these duplicate code.
> 
> static inline void pgtable_kernel_pte_page_ctor(struct page *page)
> {
> 	__SetPageTable(page);
> 	inc_lruvec_page_state(page, NR_PAGETABLE);
> }
> 
> static inline void pgtable_kernel_pte_page_dtor(struct page *page)
> {
> 	__ClearPageTable(page);
> 	dec_lruvec_page_state(page, NR_PAGETABLE);
> }
> 
> Instead adding a common helpers seems more readable to me, which can also
> simplify original pgtable_pmd_page_dtor()/pgtable_pmd_page_ctor(). Something
> like below.
> 
> static inline void pgtable_page_inc(struct page *page)
> {
> 	__SetPageTable(page);
> 	inc_lruvec_page_state(page, NR_PAGETABLE);
> }
> 
> static inline void pgtable_page_dec(struct page *page)
> {
> 	__ClearPageTable(page);
> 	dec_lruvec_page_state(page, NR_PAGETABLE);
> }
> 
> static inline void pgtable_pud_page_ctor(struct page *page)
> {
> 	pgtable_page_inc(page);
> }
> 
> static inline void pgtable_pud_page_dtor(struct page *page)
> {
> 	pgtable_page_dec(page);
> }
> 
> For kernel pte page table, we can just use
> pgtable_page_inc/pgtable_page_dec(), or adding
> pgtable_kernel_pte_page_ctor/pgtable_kernel_pte_page_dtor, which just
> wrappers of pgtable_page_inc() and pgtable_page_dec().
> 
> Matthew and Mike, how do you think? Thanks.

I actually meant to add pgtable_pud_page_ctor/dtor() as a wrapper for the
new helper to keep pud tables allocation consistent with pmd and pte and
as a provision for the time we'll have per-page pud locks.

For the accounting of the kernel page tables a new helper does make sense
because there are no locks to initialize for the kernel page tables.

I can't say that I'm happy with the pgtable_page_inc/dec names, though.

Maybe page_{set,clear}_pgtable()?

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v3 2/3] mm: Add PUD level pagetable account
  2022-07-03 14:28           ` Mike Rapoport
@ 2022-07-03 14:49             ` Baolin Wang
  0 siblings, 0 replies; 18+ messages in thread
From: Baolin Wang @ 2022-07-03 14:49 UTC (permalink / raw)
  To: Mike Rapoport; +Cc: Matthew Wilcox, akpm, linux-mm, linux-kernel



On 7/3/2022 10:28 PM, Mike Rapoport wrote:
> On Sun, Jul 03, 2022 at 10:06:32PM +0800, Baolin Wang wrote:
>>
>>
>> On 7/3/2022 11:40 AM, Matthew Wilcox wrote:
>>> On Fri, Jul 01, 2022 at 04:04:21PM +0800, Baolin Wang wrote:
>>>>> Using pgtable_pud_page_ctor() and pgtable_pud_page_dtor() would be
>>>>> consistent with what we currently have for PTEs and PMDs.
>>>>>
>>>>> This applies to all the additions of pgtable_page_dec() and
>>>>> pgtable_page_inc().
>>>>
>>>> OK. I can add pgtable_pud_page_ctor() and pgtable_pud_page_dtor() helpers to
>>>> keep consistent, which are just wrappers of pgtable_page_inc() and
>>>> pgtable_page_dec().
>>>
>>> I think you misunderstand Mike.
>>>
>>> Don't add pgtable_page_inc() and pgtable_page_dec().  Just add
>>> pgtable_pud_page_ctor() and pgtable_pud_page_dtor().  At least, that
>>> was what I said last time you posted these patches.
>>
>> My concern is that I need another helpers for kernel page table allocation
>> helpers, if only adding pgtable_pud_page_ctor() and pgtable_pud_page_dtor()
>> like below:
>>
>> static inline void pgtable_pud_page_ctor(struct page *page)
>> {
>> 	__SetPageTable(page);
>> 	inc_lruvec_page_state(page, NR_PAGETABLE);
>> }
>>
>> static inline void pgtable_pud_page_dtor(struct page *page)
>> {
>> 	__ClearPageTable(page);
>> 	dec_lruvec_page_state(page, NR_PAGETABLE);
>> }
>>
>> So for kernel pte page table allocation, I need another similar helpers like
>> below. However they do the samething with
>> pgtable_pud_page_ctor/pgtable_pud_page_dtor, so I am not sure this is good
>> for adding these duplicate code.
>>
>> static inline void pgtable_kernel_pte_page_ctor(struct page *page)
>> {
>> 	__SetPageTable(page);
>> 	inc_lruvec_page_state(page, NR_PAGETABLE);
>> }
>>
>> static inline void pgtable_kernel_pte_page_dtor(struct page *page)
>> {
>> 	__ClearPageTable(page);
>> 	dec_lruvec_page_state(page, NR_PAGETABLE);
>> }
>>
>> Instead adding a common helpers seems more readable to me, which can also
>> simplify original pgtable_pmd_page_dtor()/pgtable_pmd_page_ctor(). Something
>> like below.
>>
>> static inline void pgtable_page_inc(struct page *page)
>> {
>> 	__SetPageTable(page);
>> 	inc_lruvec_page_state(page, NR_PAGETABLE);
>> }
>>
>> static inline void pgtable_page_dec(struct page *page)
>> {
>> 	__ClearPageTable(page);
>> 	dec_lruvec_page_state(page, NR_PAGETABLE);
>> }
>>
>> static inline void pgtable_pud_page_ctor(struct page *page)
>> {
>> 	pgtable_page_inc(page);
>> }
>>
>> static inline void pgtable_pud_page_dtor(struct page *page)
>> {
>> 	pgtable_page_dec(page);
>> }
>>
>> For kernel pte page table, we can just use
>> pgtable_page_inc/pgtable_page_dec(), or adding
>> pgtable_kernel_pte_page_ctor/pgtable_kernel_pte_page_dtor, which just
>> wrappers of pgtable_page_inc() and pgtable_page_dec().
>>
>> Matthew and Mike, how do you think? Thanks.
> 
> I actually meant to add pgtable_pud_page_ctor/dtor() as a wrapper for the
> new helper to keep pud tables allocation consistent with pmd and pte and
> as a provision for the time we'll have per-page pud locks.
> 
> For the accounting of the kernel page tables a new helper does make sense
> because there are no locks to initialize for the kernel page tables.

Thanks for clarification. That is also my thought.

> 
> I can't say that I'm happy with the pgtable_page_inc/dec names, though.
> 
> Maybe page_{set,clear}_pgtable()?

Sounds better than pgtable_page_inc/dec() for me. I will use them in 
next version if no other objections. Thanks.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v3 2/3] mm: Add PUD level pagetable account
  2022-07-03 14:06         ` Baolin Wang
  2022-07-03 14:28           ` Mike Rapoport
@ 2022-07-03 14:52           ` Matthew Wilcox
  2022-07-03 15:07             ` Baolin Wang
  1 sibling, 1 reply; 18+ messages in thread
From: Matthew Wilcox @ 2022-07-03 14:52 UTC (permalink / raw)
  To: Baolin Wang; +Cc: Mike Rapoport, akpm, linux-mm, linux-kernel

On Sun, Jul 03, 2022 at 10:06:32PM +0800, Baolin Wang wrote:
> So for kernel pte page table allocation, I need another similar helpers like
> below. However they do the samething with
> pgtable_pud_page_ctor/pgtable_pud_page_dtor, so I am not sure this is good
> for adding these duplicate code.

Why do we want to account kernel PTE page tables in NR_PAGETABLE?
I think that's confusing.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v3 2/3] mm: Add PUD level pagetable account
  2022-07-03 14:52           ` Matthew Wilcox
@ 2022-07-03 15:07             ` Baolin Wang
  0 siblings, 0 replies; 18+ messages in thread
From: Baolin Wang @ 2022-07-03 15:07 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Mike Rapoport, akpm, linux-mm, linux-kernel



On 7/3/2022 10:52 PM, Matthew Wilcox wrote:
> On Sun, Jul 03, 2022 at 10:06:32PM +0800, Baolin Wang wrote:
>> So for kernel pte page table allocation, I need another similar helpers like
>> below. However they do the samething with
>> pgtable_pud_page_ctor/pgtable_pud_page_dtor, so I am not sure this is good
>> for adding these duplicate code.
> 
> Why do we want to account kernel PTE page tables in NR_PAGETABLE?
> I think that's confusing.

Why this will confuse you? I think it is inconsistent that kernel PTE 
page tables are not accounted, because we will account PMD/PUD level 
page tables no matter they are userspace pagetable pages or kernel 
pagetable pages.

Moreover the the vmalloc()/vmap() can consume some kernel pagetable 
pages, which should be accounted.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2022-07-03 15:07 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-30 11:11 [RFC PATCH v3 0/3] Add PUD and kernel PTE level pagetable account Baolin Wang
2022-06-30 11:11 ` [RFC PATCH v3 1/3] mm: Factor out the pagetable pages account into new helper function Baolin Wang
2022-06-30 14:11   ` Mike Rapoport
2022-07-01  8:00     ` Baolin Wang
2022-07-02 10:15       ` Mike Rapoport
2022-07-03 13:48         ` Baolin Wang
2022-06-30 11:11 ` [RFC PATCH v3 2/3] mm: Add PUD level pagetable account Baolin Wang
2022-06-30 14:17   ` Mike Rapoport
2022-07-01  8:04     ` Baolin Wang
2022-07-03  3:40       ` Matthew Wilcox
2022-07-03 14:06         ` Baolin Wang
2022-07-03 14:28           ` Mike Rapoport
2022-07-03 14:49             ` Baolin Wang
2022-07-03 14:52           ` Matthew Wilcox
2022-07-03 15:07             ` Baolin Wang
2022-07-03  3:47   ` Matthew Wilcox
2022-07-03 14:18     ` Mike Rapoport
2022-06-30 11:11 ` [RFC PATCH v3 3/3] mm: Add kernel PTE level pagetable pages account Baolin Wang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.