* [RFC PATCH 0/3] Introduce new huge_ptep_get_access_flags() interface
@ 2022-05-08  8:58 ` Baolin Wang
  0 siblings, 0 replies; 27+ messages in thread
From: Baolin Wang @ 2022-05-08  8:58 UTC (permalink / raw)
  To: catalin.marinas, will, arnd, mike.kravetz, akpm, sj
  Cc: baolin.wang, linux-arm-kernel, linux-kernel, linux-arch,
	linux-fsdevel, linux-mm

Hi,

As Mike pointed out [1], huge_ptep_get() returns only one specific pte
value for a CONT-PTE or CONT-PMD size hugetlb page on ARM64, so it does not
take into account the dirty or young bits of the other subpages. Functions
that check the dirty or young flags of a hugetlb page can therefore miss
them for CONT-PTE/PMD size hugetlb pages. For example, gather_hugetlb_stats()
will report inaccurate dirty hugetlb page statistics, and DAMON's hugetlb
monitoring will report inaccurate access statistics.

One way to fix this is to define an ARM64 specific huge_ptep_get()
implementation that takes into account every subpage's dirty or young bits.
However, that implementation would need a new parameter to tell it how many
contiguous PTEs or PMDs make up the CONT-PTE/PMD size hugetlb page, which
means converting every caller of huge_ptep_get(), even though most callers
do not care about the dirty or young flags at all.

So instead of changing the prototype of huge_ptep_get(), this patch set
introduces a new huge_ptep_get_access_flags() interface and defines an ARM64
specific implementation that takes into account every subpage's dirty or
young bits for a CONT-PTE/PMD size hugetlb page. Only the functions that
care about the dirty or young flags of a hugetlb page need to switch to
huge_ptep_get_access_flags().
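
To illustrate the difference, here is a minimal user-space sketch, not
kernel code. The model_* names and the MODEL_PTE_* bit positions are
made up for this example and do not match the real arm64 PTE layout.
It only shows the idea of folding the dirty and young bits of every
entry in a contiguous range into the returned value:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits for this model; NOT the real arm64 PTE layout. */
#define MODEL_PTE_CONT  (1u << 0)
#define MODEL_PTE_DIRTY (1u << 1)
#define MODEL_PTE_YOUNG (1u << 2)

typedef uint32_t model_pte_t;

/* Models the behaviour described for huge_ptep_get(): first entry only. */
static model_pte_t model_get_first(const model_pte_t *ptep)
{
	return ptep[0];
}

/*
 * Models the proposed helper: start from the first entry and OR in the
 * dirty/young bits found in any entry of the contiguous range.
 * ncontig is supplied by the caller here; the patch derives it from the
 * hugetlb page size instead.
 */
static model_pte_t model_get_access_flags(const model_pte_t *ptep,
					  size_t ncontig)
{
	model_pte_t orig = ptep[0];
	size_t i;

	assert(ncontig > 0);

	/* A non-contiguous entry already describes the whole mapping. */
	if (!(orig & MODEL_PTE_CONT))
		return orig;

	for (i = 0; i < ncontig; i++)
		orig |= ptep[i] & (MODEL_PTE_DIRTY | MODEL_PTE_YOUNG);

	return orig;
}
```

With a range where only a later entry is dirty, model_get_first()
reports a clean page while model_get_access_flags() reports it dirty,
which is the kind of miss described above.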

[1] https://lore.kernel.org/linux-mm/85bd80b4-b4fd-0d3f-a2e5-149559f2f387@oracle.com/

Baolin Wang (3):
  arm64/hugetlb: Introduce new huge_ptep_get_access_flags() interface
  fs/proc/task_mmu: Change to use huge_ptep_get_access_flags()
  mm/damon/vaddr: Change to use huge_ptep_get_access_flags()

 arch/arm64/include/asm/hugetlb.h |  2 ++
 arch/arm64/mm/hugetlbpage.c      | 24 ++++++++++++++++++++++++
 fs/proc/task_mmu.c               |  3 ++-
 include/asm-generic/hugetlb.h    |  7 +++++++
 mm/damon/vaddr.c                 |  5 +++--
 5 files changed, 38 insertions(+), 3 deletions(-)

-- 
1.8.3.1



* [RFC PATCH 1/3] arm64/hugetlb: Introduce new huge_ptep_get_access_flags() interface
  2022-05-08  8:58 ` Baolin Wang
@ 2022-05-08  8:58   ` Baolin Wang
  -1 siblings, 0 replies; 27+ messages in thread
From: Baolin Wang @ 2022-05-08  8:58 UTC (permalink / raw)
  To: catalin.marinas, will, arnd, mike.kravetz, akpm, sj
  Cc: baolin.wang, linux-arm-kernel, linux-kernel, linux-arch,
	linux-fsdevel, linux-mm

We currently use huge_ptep_get() to get the pte value of a hugetlb page.
However, for a CONT-PTE or CONT-PMD size hugetlb page on ARM64, which
consists of several contiguous pte or pmd entries with the same page
table attributes, it returns only one specific pte value and does not
take into account the dirty or young bits of the other subpages.

So huge_ptep_get() is inconsistent with huge_ptep_get_and_clear(),
which already takes into account the dirty or young bits of every
subpage in a CONT-PTE/PMD size hugetlb page [1]. As a result, callers
of the current huge_ptep_get(), such as gather_hugetlb_stats(), can
miss dirty or young flags when collecting hugetlb page statistics.

Thus introduce a new huge_ptep_get_access_flags() interface and define
an ARM64 specific implementation that takes into account every subpage's
dirty or young bits for a CONT-PTE/PMD size hugetlb page, for use by
functions that need to check the dirty or young flags of a hugetlb page.
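
The generic fallback relies on the usual "arch defines a macro, generic
header fills in the rest" pattern. A stand-alone sketch of that pattern
follows. The DEMO_* macro and demo_* names are hypothetical and exist
only for this illustration; the real interface is the one in the diff
below:

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long demo_pte_t;

/*
 * An "arch header" would define this macro and declare its own version.
 * It is left undefined here, so the generic fallback below is used.
 */
/* #define DEMO_HAVE_ARCH_GET_ACCESS_FLAGS */

#ifdef DEMO_HAVE_ARCH_GET_ACCESS_FLAGS
/* Arch-provided implementation, defined elsewhere. */
demo_pte_t demo_get_access_flags(const demo_pte_t *ptep, unsigned long sz);
#endif

#ifndef DEMO_HAVE_ARCH_GET_ACCESS_FLAGS
/* Generic fallback: a single entry describes the whole huge page. */
static inline demo_pte_t demo_get_access_flags(const demo_pte_t *ptep,
					       unsigned long sz)
{
	assert(ptep != NULL);
	(void)sz;	/* the size is unused when there is only one entry */
	return *ptep;
}
#endif
```

Note that this pattern only helps architectures whose hugetlb header
actually pulls in the generic one; an architecture that provides all
of its hugetlb helpers itself would need its own definition.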

[1] https://lore.kernel.org/linux-mm/85bd80b4-b4fd-0d3f-a2e5-149559f2f387@oracle.com/

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 arch/arm64/include/asm/hugetlb.h |  2 ++
 arch/arm64/mm/hugetlbpage.c      | 24 ++++++++++++++++++++++++
 include/asm-generic/hugetlb.h    |  7 +++++++
 3 files changed, 33 insertions(+)

diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index 616b2ca..a473544 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -44,6 +44,8 @@ extern pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
 #define __HAVE_ARCH_HUGE_PTE_CLEAR
 extern void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
 			   pte_t *ptep, unsigned long sz);
+#define __HAVE_ARCH_HUGE_PTEP_GET_ACCESS_FLAGS
+extern pte_t huge_ptep_get_access_flags(pte_t *ptep, unsigned long sz);
 extern void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr,
 				 pte_t *ptep, pte_t pte, unsigned long sz);
 #define set_huge_swap_pte_at set_huge_swap_pte_at
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index ca8e65c..ce39699 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -158,6 +158,30 @@ static inline int num_contig_ptes(unsigned long size, size_t *pgsize)
 	return contig_ptes;
 }
 
+pte_t huge_ptep_get_access_flags(pte_t *ptep, unsigned long sz)
+{
+	int ncontig, i;
+	size_t pgsize;
+	pte_t orig_pte = ptep_get(ptep);
+
+	if (!pte_cont(orig_pte))
+		return orig_pte;
+
+	ncontig = num_contig_ptes(sz, &pgsize);
+
+	for (i = 0; i < ncontig; i++, ptep++) {
+		pte_t pte = ptep_get(ptep);
+
+		if (pte_dirty(pte))
+			orig_pte = pte_mkdirty(orig_pte);
+
+		if (pte_young(pte))
+			orig_pte = pte_mkyoung(orig_pte);
+	}
+
+	return orig_pte;
+}
+
 /*
  * Changing some bits of contiguous entries requires us to follow a
  * Break-Before-Make approach, breaking the whole contiguous set
diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h
index a57d667..bb77fb0 100644
--- a/include/asm-generic/hugetlb.h
+++ b/include/asm-generic/hugetlb.h
@@ -150,6 +150,13 @@ static inline pte_t huge_ptep_get(pte_t *ptep)
 }
 #endif
 
+#ifndef __HAVE_ARCH_HUGE_PTEP_GET_ACCESS_FLAGS
+static inline pte_t huge_ptep_get_access_flags(pte_t *ptep, unsigned long sz)
+{
+	return ptep_get(ptep);
+}
+#endif
+
 #ifndef __HAVE_ARCH_GIGANTIC_PAGE_RUNTIME_SUPPORTED
 static inline bool gigantic_page_runtime_supported(void)
 {
-- 
1.8.3.1



* [RFC PATCH 2/3] fs/proc/task_mmu: Change to use huge_ptep_get_access_flags()
  2022-05-08  8:58 ` Baolin Wang
@ 2022-05-08  8:58   ` Baolin Wang
  -1 siblings, 0 replies; 27+ messages in thread
From: Baolin Wang @ 2022-05-08  8:58 UTC (permalink / raw)
  To: catalin.marinas, will, arnd, mike.kravetz, akpm, sj
  Cc: baolin.wang, linux-arm-kernel, linux-kernel, linux-arch,
	linux-fsdevel, linux-mm

The ARM64 platform supports CONT-PTE/PMD size hugetlb pages, which
consist of several contiguous pte or pmd entries. However, the current
huge_ptep_get() returns only one specific pte value for a CONT-PTE or
CONT-PMD size hugetlb page and does not take into account the
subpages' dirty or young flags, so gather_hugetlb_stats() will miss
some dirty hugetlb statistics.

Thus change to use huge_ptep_get_access_flags(), which takes into
account the subpages' dirty or young flags of a CONT-PTE/PMD size
hugetlb page, to make the hugetlb statistics more accurate.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 fs/proc/task_mmu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index f9c9abb..3f224a7 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1880,7 +1880,8 @@ static int gather_pte_stats(pmd_t *pmd, unsigned long addr,
 static int gather_hugetlb_stats(pte_t *pte, unsigned long hmask,
 		unsigned long addr, unsigned long end, struct mm_walk *walk)
 {
-	pte_t huge_pte = huge_ptep_get(pte);
+	pte_t huge_pte = huge_ptep_get_access_flags(pte,
+				huge_page_size(hstate_vma(walk->vma)));
 	struct numa_maps *md;
 	struct page *page;
 
-- 
1.8.3.1



* [RFC PATCH 3/3] mm/damon/vaddr: Change to use huge_ptep_get_access_flags()
  2022-05-08  8:58 ` Baolin Wang
@ 2022-05-08  8:58   ` Baolin Wang
  -1 siblings, 0 replies; 27+ messages in thread
From: Baolin Wang @ 2022-05-08  8:58 UTC (permalink / raw)
  To: catalin.marinas, will, arnd, mike.kravetz, akpm, sj
  Cc: baolin.wang, linux-arm-kernel, linux-kernel, linux-arch,
	linux-fsdevel, linux-mm

The ARM64 platform supports CONT-PTE/PMD size hugetlb pages, which
consist of several contiguous pte or pmd entries. However, the current
huge_ptep_get() returns only one specific pte value for a CONT-PTE or
CONT-PMD size hugetlb page and does not take into account the
subpages' dirty or young flags. Missing young flags make the hugetlb
page monitoring inaccurate.

Thus change to use huge_ptep_get_access_flags(), which takes into
account the subpages' dirty or young flags of a CONT-PTE/PMD size
hugetlb page.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 mm/damon/vaddr.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c
index d6abf76..29459ed 100644
--- a/mm/damon/vaddr.c
+++ b/mm/damon/vaddr.c
@@ -400,7 +400,8 @@ static void damon_hugetlb_mkold(pte_t *pte, struct mm_struct *mm,
 				struct vm_area_struct *vma, unsigned long addr)
 {
 	bool referenced = false;
-	pte_t entry = huge_ptep_get(pte);
+	pte_t entry = huge_ptep_get_access_flags(pte,
+					huge_page_size(hstate_vma(vma)));
 	struct page *page = pte_page(entry);
 
 	get_page(page);
@@ -557,7 +558,7 @@ static int damon_young_hugetlb_entry(pte_t *pte, unsigned long hmask,
 	pte_t entry;
 
 	ptl = huge_pte_lock(h, walk->mm, pte);
-	entry = huge_ptep_get(pte);
+	entry = huge_ptep_get_access_flags(pte, huge_page_size(h));
 	if (!pte_present(entry))
 		goto out;
 
-- 
1.8.3.1



* Re: [RFC PATCH 3/3] mm/damon/vaddr: Change to use huge_ptep_get_access_flags()
  2022-05-08  8:58   ` Baolin Wang
  (?)
@ 2022-05-08 12:41   ` kernel test robot
  -1 siblings, 0 replies; 27+ messages in thread
From: kernel test robot @ 2022-05-08 12:41 UTC (permalink / raw)
  To: Baolin Wang; +Cc: llvm, kbuild-all

Hi Baolin,

[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on arm64/for-next/core]
[also build test ERROR on arnd-asm-generic/master hnaz-mm/master linus/master v5.18-rc5 next-20220506]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Baolin-Wang/Introduce-new-huge_ptep_get_access_flags-interface/20220508-170027
base:   https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/core
config: s390-randconfig-r044-20220508 (https://download.01.org/0day-ci/archive/20220508/202205082037.kql6rnHD-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project a385645b470e2d3a1534aae618ea56b31177639f)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install s390 cross compiling tool for clang build
        # apt-get install binutils-s390x-linux-gnu
        # https://github.com/intel-lab-lkp/linux/commit/4a6618c4db26ef143fd29f9ff2159fedd73ab733
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Baolin-Wang/Introduce-new-huge_ptep_get_access_flags-interface/20220508-170027
        git checkout 4a6618c4db26ef143fd29f9ff2159fedd73ab733
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=s390 SHELL=/bin/bash

If you fix the issue, kindly add the following tag as appropriate:
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> mm/damon/vaddr.c:402:16: error: call to undeclared function 'huge_ptep_get_access_flags'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
           pte_t entry = huge_ptep_get_access_flags(pte,
                         ^
   mm/damon/vaddr.c:402:16: note: did you mean 'huge_ptep_set_access_flags'?
   arch/s390/include/asm/hugetlb.h:59:19: note: 'huge_ptep_set_access_flags' declared here
   static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
                     ^
>> mm/damon/vaddr.c:402:8: error: initializing 'pte_t' with an expression of incompatible type 'int'
           pte_t entry = huge_ptep_get_access_flags(pte,
                 ^       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   mm/damon/vaddr.c:560:10: error: call to undeclared function 'huge_ptep_get_access_flags'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
           entry = huge_ptep_get_access_flags(pte, huge_page_size(h));
                   ^
>> mm/damon/vaddr.c:560:8: error: assigning to 'pte_t' from incompatible type 'int'
           entry = huge_ptep_get_access_flags(pte, huge_page_size(h));
                 ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   In file included from mm/damon/vaddr.c:763:
   In file included from mm/damon/vaddr-test.h:15:
   In file included from include/kunit/test.h:22:
   In file included from include/linux/module.h:19:
   In file included from include/linux/elf.h:6:
   In file included from arch/s390/include/asm/elf.h:160:
   include/linux/compat.h:424:22: warning: array index 3 is past the end of the array (which contains 1 element) [-Warray-bounds]
           case 4: v.sig[7] = (set->sig[3] >> 32); v.sig[6] = set->sig[3];
                               ^        ~
   arch/s390/include/asm/signal.h:22:9: note: array 'sig' declared here
           unsigned long sig[_NSIG_WORDS];
           ^
   In file included from mm/damon/vaddr.c:763:
   In file included from mm/damon/vaddr-test.h:15:
   In file included from include/kunit/test.h:22:
   In file included from include/linux/module.h:19:
   In file included from include/linux/elf.h:6:
   In file included from arch/s390/include/asm/elf.h:160:
   include/linux/compat.h:424:10: warning: array index 7 is past the end of the array (which contains 2 elements) [-Warray-bounds]
           case 4: v.sig[7] = (set->sig[3] >> 32); v.sig[6] = set->sig[3];
                   ^     ~
   include/linux/compat.h:131:2: note: array 'sig' declared here
           compat_sigset_word      sig[_COMPAT_NSIG_WORDS];
           ^
   include/linux/compat.h:424:42: warning: array index 6 is past the end of the array (which contains 2 elements) [-Warray-bounds]
           case 4: v.sig[7] = (set->sig[3] >> 32); v.sig[6] = set->sig[3];
                                                   ^     ~
   include/linux/compat.h:131:2: note: array 'sig' declared here
           compat_sigset_word      sig[_COMPAT_NSIG_WORDS];
           ^
   include/linux/compat.h:424:53: warning: array index 3 is past the end of the array (which contains 1 element) [-Warray-bounds]
           case 4: v.sig[7] = (set->sig[3] >> 32); v.sig[6] = set->sig[3];
                                                              ^        ~
   arch/s390/include/asm/signal.h:22:9: note: array 'sig' declared here
           unsigned long sig[_NSIG_WORDS];
           ^
   In file included from mm/damon/vaddr.c:763:
   In file included from mm/damon/vaddr-test.h:15:
   In file included from include/kunit/test.h:22:
   In file included from include/linux/module.h:19:
   In file included from include/linux/elf.h:6:
   In file included from arch/s390/include/asm/elf.h:160:
   include/linux/compat.h:426:22: warning: array index 2 is past the end of the array (which contains 1 element) [-Warray-bounds]
           case 3: v.sig[5] = (set->sig[2] >> 32); v.sig[4] = set->sig[2];
                               ^        ~
   arch/s390/include/asm/signal.h:22:9: note: array 'sig' declared here
           unsigned long sig[_NSIG_WORDS];
           ^
   In file included from mm/damon/vaddr.c:763:
   In file included from mm/damon/vaddr-test.h:15:
   In file included from include/kunit/test.h:22:
   In file included from include/linux/module.h:19:
   In file included from include/linux/elf.h:6:
   In file included from arch/s390/include/asm/elf.h:160:
   include/linux/compat.h:426:10: warning: array index 5 is past the end of the array (which contains 2 elements) [-Warray-bounds]
           case 3: v.sig[5] = (set->sig[2] >> 32); v.sig[4] = set->sig[2];
                   ^     ~
   include/linux/compat.h:131:2: note: array 'sig' declared here
           compat_sigset_word      sig[_COMPAT_NSIG_WORDS];
           ^
   include/linux/compat.h:426:42: warning: array index 4 is past the end of the array (which contains 2 elements) [-Warray-bounds]
           case 3: v.sig[5] = (set->sig[2] >> 32); v.sig[4] = set->sig[2];
                                                   ^     ~
   include/linux/compat.h:131:2: note: array 'sig' declared here
           compat_sigset_word      sig[_COMPAT_NSIG_WORDS];
           ^
   include/linux/compat.h:426:53: warning: array index 2 is past the end of the array (which contains 1 element) [-Warray-bounds]
           case 3: v.sig[5] = (set->sig[2] >> 32); v.sig[4] = set->sig[2];
                                                              ^        ~
   arch/s390/include/asm/signal.h:22:9: note: array 'sig' declared here
           unsigned long sig[_NSIG_WORDS];
           ^
   In file included from mm/damon/vaddr.c:763:
   In file included from mm/damon/vaddr-test.h:15:
   In file included from include/kunit/test.h:22:
   In file included from include/linux/module.h:19:
   In file included from include/linux/elf.h:6:
   In file included from arch/s390/include/asm/elf.h:160:
   include/linux/compat.h:428:22: warning: array index 1 is past the end of the array (which contains 1 element) [-Warray-bounds]
           case 2: v.sig[3] = (set->sig[1] >> 32); v.sig[2] = set->sig[1];
                               ^        ~
   arch/s390/include/asm/signal.h:22:9: note: array 'sig' declared here
           unsigned long sig[_NSIG_WORDS];
           ^
   In file included from mm/damon/vaddr.c:763:
   In file included from mm/damon/vaddr-test.h:15:
   In file included from include/kunit/test.h:22:
   In file included from include/linux/module.h:19:
   In file included from include/linux/elf.h:6:
   In file included from arch/s390/include/asm/elf.h:160:
   include/linux/compat.h:428:10: warning: array index 3 is past the end of the array (which contains 2 elements) [-Warray-bounds]
           case 2: v.sig[3] = (set->sig[1] >> 32); v.sig[2] = set->sig[1];
                   ^     ~
   include/linux/compat.h:131:2: note: array 'sig' declared here
           compat_sigset_word      sig[_COMPAT_NSIG_WORDS];
           ^
   include/linux/compat.h:428:42: warning: array index 2 is past the end of the array (which contains 2 elements) [-Warray-bounds]
           case 2: v.sig[3] = (set->sig[1] >> 32); v.sig[2] = set->sig[1];


vim +/huge_ptep_get_access_flags +402 mm/damon/vaddr.c

   396	
   397	#ifdef CONFIG_HUGETLB_PAGE
   398	static void damon_hugetlb_mkold(pte_t *pte, struct mm_struct *mm,
   399					struct vm_area_struct *vma, unsigned long addr)
   400	{
   401		bool referenced = false;
 > 402		pte_t entry = huge_ptep_get_access_flags(pte,
   403						huge_page_size(hstate_vma(vma)));
   404		struct page *page = pte_page(entry);
   405	
   406		get_page(page);
   407	
   408		if (pte_young(entry)) {
   409			referenced = true;
   410			entry = pte_mkold(entry);
   411			huge_ptep_set_access_flags(vma, addr, pte, entry,
   412						   vma->vm_flags & VM_WRITE);
   413		}
   414	
   415	#ifdef CONFIG_MMU_NOTIFIER
   416		if (mmu_notifier_clear_young(mm, addr,
   417					     addr + huge_page_size(hstate_vma(vma))))
   418			referenced = true;
   419	#endif /* CONFIG_MMU_NOTIFIER */
   420	
   421		if (referenced)
   422			set_page_young(page);
   423	
   424		set_page_idle(page);
   425		put_page(page);
   426	}
   427	
   428	static int damon_mkold_hugetlb_entry(pte_t *pte, unsigned long hmask,
   429					     unsigned long addr, unsigned long end,
   430					     struct mm_walk *walk)
   431	{
   432		struct hstate *h = hstate_vma(walk->vma);
   433		spinlock_t *ptl;
   434		pte_t entry;
   435	
   436		ptl = huge_pte_lock(h, walk->mm, pte);
   437		entry = huge_ptep_get(pte);
   438		if (!pte_present(entry))
   439			goto out;
   440	
   441		damon_hugetlb_mkold(pte, walk->mm, walk->vma, addr);
   442	
   443	out:
   444		spin_unlock(ptl);
   445		return 0;
   446	}
   447	#else
   448	#define damon_mkold_hugetlb_entry NULL
   449	#endif /* CONFIG_HUGETLB_PAGE */
   450	
   451	static const struct mm_walk_ops damon_mkold_ops = {
   452		.pmd_entry = damon_mkold_pmd_entry,
   453		.hugetlb_entry = damon_mkold_hugetlb_entry,
   454	};
   455	
   456	static void damon_va_mkold(struct mm_struct *mm, unsigned long addr)
   457	{
   458		mmap_read_lock(mm);
   459		walk_page_range(mm, addr, addr + 1, &damon_mkold_ops, NULL);
   460		mmap_read_unlock(mm);
   461	}
   462	
   463	/*
   464	 * Functions for the access checking of the regions
   465	 */
   466	
   467	static void __damon_va_prepare_access_check(struct damon_ctx *ctx,
   468				struct mm_struct *mm, struct damon_region *r)
   469	{
   470		r->sampling_addr = damon_rand(r->ar.start, r->ar.end);
   471	
   472		damon_va_mkold(mm, r->sampling_addr);
   473	}
   474	
   475	static void damon_va_prepare_access_checks(struct damon_ctx *ctx)
   476	{
   477		struct damon_target *t;
   478		struct mm_struct *mm;
   479		struct damon_region *r;
   480	
   481		damon_for_each_target(t, ctx) {
   482			mm = damon_get_mm(t);
   483			if (!mm)
   484				continue;
   485			damon_for_each_region(r, t)
   486				__damon_va_prepare_access_check(ctx, mm, r);
   487			mmput(mm);
   488		}
   489	}
   490	
   491	struct damon_young_walk_private {
   492		unsigned long *page_sz;
   493		bool young;
   494	};
   495	
   496	static int damon_young_pmd_entry(pmd_t *pmd, unsigned long addr,
   497			unsigned long next, struct mm_walk *walk)
   498	{
   499		pte_t *pte;
   500		spinlock_t *ptl;
   501		struct page *page;
   502		struct damon_young_walk_private *priv = walk->private;
   503	
   504	#ifdef CONFIG_TRANSPARENT_HUGEPAGE
   505		if (pmd_huge(*pmd)) {
   506			ptl = pmd_lock(walk->mm, pmd);
   507			if (!pmd_huge(*pmd)) {
   508				spin_unlock(ptl);
   509				goto regular_page;
   510			}
   511			page = damon_get_page(pmd_pfn(*pmd));
   512			if (!page)
   513				goto huge_out;
   514			if (pmd_young(*pmd) || !page_is_idle(page) ||
   515						mmu_notifier_test_young(walk->mm,
   516							addr)) {
   517				*priv->page_sz = ((1UL) << HPAGE_PMD_SHIFT);
   518				priv->young = true;
   519			}
   520			put_page(page);
   521	huge_out:
   522			spin_unlock(ptl);
   523			return 0;
   524		}
   525	
   526	regular_page:
   527	#endif	/* CONFIG_TRANSPARENT_HUGEPAGE */
   528	
   529		if (pmd_none(*pmd) || unlikely(pmd_bad(*pmd)))
   530			return -EINVAL;
   531		pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
   532		if (!pte_present(*pte))
   533			goto out;
   534		page = damon_get_page(pte_pfn(*pte));
   535		if (!page)
   536			goto out;
   537		if (pte_young(*pte) || !page_is_idle(page) ||
   538				mmu_notifier_test_young(walk->mm, addr)) {
   539			*priv->page_sz = PAGE_SIZE;
   540			priv->young = true;
   541		}
   542		put_page(page);
   543	out:
   544		pte_unmap_unlock(pte, ptl);
   545		return 0;
   546	}
   547	
   548	#ifdef CONFIG_HUGETLB_PAGE
   549	static int damon_young_hugetlb_entry(pte_t *pte, unsigned long hmask,
   550					     unsigned long addr, unsigned long end,
   551					     struct mm_walk *walk)
   552	{
   553		struct damon_young_walk_private *priv = walk->private;
   554		struct hstate *h = hstate_vma(walk->vma);
   555		struct page *page;
   556		spinlock_t *ptl;
   557		pte_t entry;
   558	
   559		ptl = huge_pte_lock(h, walk->mm, pte);
 > 560		entry = huge_ptep_get_access_flags(pte, huge_page_size(h));
   561		if (!pte_present(entry))
   562			goto out;
   563	
   564		page = pte_page(entry);
   565		get_page(page);
   566	
   567		if (pte_young(entry) || !page_is_idle(page) ||
   568		    mmu_notifier_test_young(walk->mm, addr)) {
   569			*priv->page_sz = huge_page_size(h);
   570			priv->young = true;
   571		}
   572	
   573		put_page(page);
   574	
   575	out:
   576		spin_unlock(ptl);
   577		return 0;
   578	}
   579	#else
   580	#define damon_young_hugetlb_entry NULL
   581	#endif /* CONFIG_HUGETLB_PAGE */
   582	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 1/3] arm64/hugetlb: Introduce new huge_ptep_get_access_flags() interface
  2022-05-08  8:58   ` Baolin Wang
@ 2022-05-08 13:14     ` nh26223
  -1 siblings, 0 replies; 27+ messages in thread
From: nh26223 @ 2022-05-08 13:14 UTC (permalink / raw)
  To: catalin.marinas, will, arnd, mike.kravetz, akpm, sj, Baolin Wang
  Cc: baolin.wang, linux-arm-kernel, linux-kernel, linux-arch,
	linux-fsdevel, linux-mm

On Sunday, May 8, 2022, 4:58:52 PM CST, Baolin Wang wrote:
> Now we use huge_ptep_get() to get the pte value of a hugetlb page,
> however it will only return one specific pte value for the CONT-PTE
> or CONT-PMD size hugetlb on ARM64 system, which can contain several
> continuous pte or pmd entries with same page table attributes. And it
> will not take into account the subpages' dirty or young bits of a
> CONT-PTE/PMD size hugetlb page.
> 
> So the huge_ptep_get() is inconsistent with huge_ptep_get_and_clear(),
> which already takes account the dirty or young bits for any subpages
> in this CONT-PTE/PMD size hugetlb [1]. Meanwhile we can miss dirty or
> young flags statistics for hugetlb pages with current huge_ptep_get(),
> such as the gather_hugetlb_stats() function.
> 
> Thus introduce a new huge_ptep_get_access_flags() interface and define
> an ARM64 specific implementation, that will take into account any subpages'
> dirty or young bits for CONT-PTE/PMD size hugetlb page, for those functions
> that want to check the dirty and young flags of a hugetlb page.
> 
> [1]
> https://lore.kernel.org/linux-mm/85bd80b4-b4fd-0d3f-a2e5-149559f2f387@oracle.com/
> 
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
>  arch/arm64/include/asm/hugetlb.h |  2 ++
>  arch/arm64/mm/hugetlbpage.c      | 24 ++++++++++++++++++++++++
>  include/asm-generic/hugetlb.h    |  7 +++++++
>  3 files changed, 33 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
> index 616b2ca..a473544 100644
> --- a/arch/arm64/include/asm/hugetlb.h
> +++ b/arch/arm64/include/asm/hugetlb.h
> @@ -44,6 +44,8 @@ extern pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
>  #define __HAVE_ARCH_HUGE_PTE_CLEAR
>  extern void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
>  			   pte_t *ptep, unsigned long sz);
> +#define __HAVE_ARCH_HUGE_PTEP_GET_ACCESS_FLAGS
> +extern pte_t huge_ptep_get_access_flags(pte_t *ptep, unsigned long sz);
>  extern void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr,
>  				 pte_t *ptep, pte_t pte, unsigned long sz);
>  #define set_huge_swap_pte_at set_huge_swap_pte_at
> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> index ca8e65c..ce39699 100644
> --- a/arch/arm64/mm/hugetlbpage.c
> +++ b/arch/arm64/mm/hugetlbpage.c
> @@ -158,6 +158,30 @@ static inline int num_contig_ptes(unsigned long size, size_t *pgsize)
>  	return contig_ptes;
>  }
> 
> +pte_t huge_ptep_get_access_flags(pte_t *ptep, unsigned long sz)
To me, the function name suggests that it returns the access flags of the PTE.

> +{
> +	int ncontig, i;
> +	size_t pgsize;
> +	pte_t orig_pte = ptep_get(ptep);
> +
> +	if (!pte_cont(orig_pte))
> +		return orig_pte;
> +
> +	ncontig = num_contig_ptes(sz, &pgsize);
> +
> +	for (i = 0; i < ncontig; i++, ptep++) {
> +		pte_t pte = ptep_get(ptep);
> +
> +		if (pte_dirty(pte))
> +			orig_pte = pte_mkdirty(orig_pte);
> +
> +		if (pte_young(pte))
> +			orig_pte = pte_mkyoung(orig_pte);
> +	}
> +
> +	return orig_pte;
> +}
Not sure whether it's worth changing this to:

        bool dirty = false, young = false;

        for (i = 0; i < ncontig; i++, ptep++) {
                pte_t pte = ptep_get(ptep);

                if (pte_dirty(pte))
                        dirty = true;

                if (pte_young(pte))
                        young = true;

                if (dirty && young)
                        break;
        }

        if (dirty)
                orig_pte = pte_mkdirty(orig_pte);

        if (young)
                orig_pte = pte_mkyoung(orig_pte);

        return orig_pte;
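
For anyone who wants to try the early-exit idea outside the kernel, here is a rough userspace sketch. It is only a model: pte_t is replaced by a plain uint64_t, and the MODEL_PTE_* bit positions and the model_fold_access_flags() name are made up for illustration; they are not arm64's real layout or any existing kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only; not arm64's layout. */
#define MODEL_PTE_YOUNG  (UINT64_C(1) << 0)
#define MODEL_PTE_DIRTY  (UINT64_C(1) << 1)

/*
 * Fold the young/dirty bits of ncontig entries into a copy of the first
 * entry, stopping as soon as both bits have been seen.
 */
static uint64_t model_fold_access_flags(const uint64_t *ptes, size_t ncontig)
{
	bool dirty = false, young = false;
	uint64_t orig;
	size_t i;

	assert(ptes != NULL && ncontig > 0);
	orig = ptes[0];

	for (i = 0; i < ncontig; i++) {
		if (ptes[i] & MODEL_PTE_DIRTY)
			dirty = true;
		if (ptes[i] & MODEL_PTE_YOUNG)
			young = true;
		if (dirty && young)
			break;	/* later entries cannot change the result */
	}

	if (dirty)
		orig |= MODEL_PTE_DIRTY;
	if (young)
		orig |= MODEL_PTE_YOUNG;

	return orig;
}
```

The result is the same as OR-ing the bits into orig_pte on every iteration; the only difference is that the scan can stop before visiting all entries.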

> +
>  /*
>   * Changing some bits of contiguous entries requires us to follow a
>   * Break-Before-Make approach, breaking the whole contiguous set
> diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h
> index a57d667..bb77fb0 100644
> --- a/include/asm-generic/hugetlb.h
> +++ b/include/asm-generic/hugetlb.h
> @@ -150,6 +150,13 @@ static inline pte_t huge_ptep_get(pte_t *ptep)
>  }
>  #endif
> 
> +#ifndef __HAVE_ARCH_HUGE_PTEP_GET_ACCESS_FLAGS
> +static inline pte_t huge_ptep_get_access_flags(pte_t *ptep, unsigned long sz)
> +{
> +	return ptep_get(ptep);
Should this be the following instead?
	return huge_ptep_get(ptep);


Regards
Yin, Fengwei

> +}
> +#endif
> +
>  #ifndef __HAVE_ARCH_GIGANTIC_PAGE_RUNTIME_SUPPORTED
>  static inline bool gigantic_page_runtime_supported(void)
>  {






^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 0/3] Introduce new huge_ptep_get_access_flags() interface
  2022-05-08  8:58 ` Baolin Wang
@ 2022-05-08 15:26   ` Muchun Song
  -1 siblings, 0 replies; 27+ messages in thread
From: Muchun Song @ 2022-05-08 15:26 UTC (permalink / raw)
  To: Baolin Wang
  Cc: catalin.marinas, will, arnd, mike.kravetz, akpm, sj,
	linux-arm-kernel, linux-kernel, linux-arch, linux-fsdevel,
	linux-mm

On Sun, May 08, 2022 at 04:58:51PM +0800, Baolin Wang wrote:
> Hi,
> 
> As Mike pointed out [1], the huge_ptep_get() will only return one specific
> pte value for the CONT-PTE or CONT-PMD size hugetlb on ARM64 system, which
> will not take into account the subpages' dirty or young bits of a CONT-PTE/PMD
> size hugetlb page. That will make us miss dirty or young flags of a CONT-PTE/PMD
> size hugetlb page for those functions that want to check the dirty or
> young flags of a hugetlb page. For example, the gather_hugetlb_stats() will
> get inaccurate dirty hugetlb page statistics, and the DAMON for hugetlb monitoring
> will also get inaccurate access statistics.
> 
> To fix this issue, one approach is that we can define an ARM64 specific huge_ptep_get()
> implementation, which will take into account any subpages' dirty or young bits.

IIUC, we could get the page size by page_size(pte_page(pte)).
So, how about the following implementation of huge_ptep_get()?
Does this work for you?

pte_t huge_ptep_get(pte_t *ptep)
{
	int ncontig, i;
	size_t pgsize;
	pte_t orig_pte = ptep_get(ptep);

	if (!pte_present(orig_pte) || !pte_cont(orig_pte))
		return orig_pte;

	ncontig = num_contig_ptes(page_size(pte_page(orig_pte)), &pgsize);

	for (i = 0; i < ncontig; i++, ptep++) {
		pte_t pte = ptep_get(ptep);

		if (pte_dirty(pte))
			orig_pte = pte_mkdirty(orig_pte);

		if (pte_young(pte))
			orig_pte = pte_mkyoung(orig_pte);
	}

	return orig_pte;
}
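
To make the size-to-count step concrete, here is a small userspace model of the lookup that num_contig_ptes() performs. It is a sketch under assumptions: the MODEL_* constants assume a 4K translation granule (16 contiguous PTEs for 64 KiB, 16 contiguous PMDs for 32 MiB), other granules use different values, and model_num_contig() is a made-up name, not the kernel function.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sizes for a 4K translation granule; other granules differ. */
#define MODEL_PAGE_SIZE      (1UL << 12)				/* 4 KiB */
#define MODEL_CONT_PTES      16
#define MODEL_CONT_PTE_SIZE  (MODEL_CONT_PTES * MODEL_PAGE_SIZE)	/* 64 KiB */
#define MODEL_PMD_SIZE       (1UL << 21)				/* 2 MiB */
#define MODEL_CONT_PMDS      16
#define MODEL_CONT_PMD_SIZE  (MODEL_CONT_PMDS * MODEL_PMD_SIZE)	/* 32 MiB */

/*
 * Map a huge page size to the number of page table entries backing it,
 * and report the size covered by each entry through *pgsize.
 * Returns 0 for a size this model does not know about.
 */
static int model_num_contig(unsigned long size, size_t *pgsize)
{
	assert(pgsize != NULL);
	*pgsize = size;

	switch (size) {
	case MODEL_PMD_SIZE:
		return 1;		/* one block entry */
	case MODEL_CONT_PMD_SIZE:
		*pgsize = MODEL_PMD_SIZE;
		return MODEL_CONT_PMDS;
	case MODEL_CONT_PTE_SIZE:
		*pgsize = MODEL_PAGE_SIZE;
		return MODEL_CONT_PTES;
	default:
		return 0;
	}
}
```

With a lookup like this, the number of entries to scan follows from the page size alone, which is why deriving the size from the page would avoid adding a parameter.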

> However we should add a new parameter for ARM64 specific huge_ptep_get() to check
> how many continuous PTEs or PMDs in this CONT-PTE/PMD size hugetlb, that means we
> should convert all the places using huge_ptep_get(), meanwhile most places using
> huge_ptep_get() did not care about the dirty or young flags at all.
> 
> So instead of changing the prototype of huge_ptep_get(), this patch set introduces
> a new huge_ptep_get_access_flags() interface and define an ARM64 specific implementation,
> that will take into account any subpages' dirty or young bits for CONT-PTE/PMD size
> hugetlb page. And we can only change to use huge_ptep_get_access_flags() for those
> functions that care about the dirty or young flags of a hugetlb page.
> 
> [1] https://lore.kernel.org/linux-mm/85bd80b4-b4fd-0d3f-a2e5-149559f2f387@oracle.com/
> 
> Baolin Wang (3):
>   arm64/hugetlb: Introduce new huge_ptep_get_access_flags() interface
>   fs/proc/task_mmu: Change to use huge_ptep_get_access_flags()
>   mm/damon/vaddr: Change to use huge_ptep_get_access_flags()
> 
>  arch/arm64/include/asm/hugetlb.h |  2 ++
>  arch/arm64/mm/hugetlbpage.c      | 24 ++++++++++++++++++++++++
>  fs/proc/task_mmu.c               |  3 ++-
>  include/asm-generic/hugetlb.h    |  7 +++++++
>  mm/damon/vaddr.c                 |  5 +++--
>  5 files changed, 38 insertions(+), 3 deletions(-)
> 
> -- 
> 1.8.3.1
> 
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 0/3] Introduce new huge_ptep_get_access_flags() interface
  2022-05-08  8:58 ` Baolin Wang
@ 2022-05-08 17:08   ` Matthew Wilcox
  -1 siblings, 0 replies; 27+ messages in thread
From: Matthew Wilcox @ 2022-05-08 17:08 UTC (permalink / raw)
  To: Baolin Wang
  Cc: catalin.marinas, will, arnd, mike.kravetz, akpm, sj,
	linux-arm-kernel, linux-kernel, linux-arch, linux-fsdevel,
	linux-mm

On Sun, May 08, 2022 at 04:58:51PM +0800, Baolin Wang wrote:
> As Mike pointed out [1], the huge_ptep_get() will only return one specific
> pte value for the CONT-PTE or CONT-PMD size hugetlb on ARM64 system, which
> will not take into account the subpages' dirty or young bits of a CONT-PTE/PMD
> size hugetlb page. That will make us miss dirty or young flags of a CONT-PTE/PMD
> size hugetlb page for those functions that want to check the dirty or
> young flags of a hugetlb page. For example, the gather_hugetlb_stats() will
> get inaccurate dirty hugetlb page statistics, and the DAMON for hugetlb monitoring
> will also get inaccurate access statistics.
> 
> To fix this issue, one approach is that we can define an ARM64 specific huge_ptep_get()
> implementation, which will take into account any subpages' dirty or young bits.
> However we should add a new parameter for ARM64 specific huge_ptep_get() to check
> how many continuous PTEs or PMDs in this CONT-PTE/PMD size hugetlb, that means we
> should convert all the places using huge_ptep_get(), meanwhile most places using
> huge_ptep_get() did not care about the dirty or young flags at all.
> 
> So instead of changing the prototype of huge_ptep_get(), this patch set introduces
> a new huge_ptep_get_access_flags() interface and define an ARM64 specific implementation,
> that will take into account any subpages' dirty or young bits for CONT-PTE/PMD size
> hugetlb page. And we can only change to use huge_ptep_get_access_flags() for those
> functions that care about the dirty or young flags of a hugetlb page.

I question whether this is the right approach.  I understand that
different hardware implementations have different requirements here,
but at least one that I'm aware of (AMD Zen 2/3) requires that all
PTEs that are part of a contig PTE must have identical A/D bits.  Now,
you could say that's irrelevant because it's x86 and we don't currently
support contPTE on x86, but I wouldn't be surprised to see that other
hardware has the same requirement.

So what if we make that a Linux requirement?  Setting a contPTE dirty or
accessed becomes a bit more expensive (although still one/two cachelines,
so not really much more expensive than a single write).  Then there's no
need to change the "get" side of things because they're always identical.

It does mean that we can't take advantage of hardware setting A/D bits,
unless hardware can be persuaded to behave this way.  I don't have any
ARM specs in front of me to check.

I don't have a hard objection to your approach, I just want to discuss
other possibilities.
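
As a strawman for the "make it a Linux requirement" idea, here is a userspace sketch of what the two sides could look like. Everything in it is hypothetical: entries are plain uint64_t values, the MODEL_PTE_* bits are invented, and model_cont_mark() / model_cont_get() are illustration names only. It also ignores hardware-managed A/D updates and any break-before-make requirements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define MODEL_PTE_YOUNG  (UINT64_C(1) << 0)
#define MODEL_PTE_DIRTY  (UINT64_C(1) << 1)
#define MODEL_AD_MASK    (MODEL_PTE_YOUNG | MODEL_PTE_DIRTY)

/* "Set" side: apply the same A/D bits to every entry of the range. */
static void model_cont_mark(uint64_t *ptes, size_t ncontig, uint64_t ad)
{
	size_t i;

	assert(ptes != NULL && (ad & ~MODEL_AD_MASK) == 0);

	for (i = 0; i < ncontig; i++)
		ptes[i] |= ad;
}

/* "Get" side: with the invariant held, the first entry speaks for all. */
static uint64_t model_cont_get(const uint64_t *ptes)
{
	assert(ptes != NULL);
	return ptes[0];
}
```

The cost moves to the writer, which touches every entry of the range, while readers stay a single load.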

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 1/3] arm64/hugetlb: Introduce new huge_ptep_get_access_flags() interface
  2022-05-08 13:14     ` nh26223
@ 2022-05-09  1:19       ` Baolin Wang
  -1 siblings, 0 replies; 27+ messages in thread
From: Baolin Wang @ 2022-05-09  1:19 UTC (permalink / raw)
  To: nh26223, catalin.marinas, will, arnd, mike.kravetz, akpm, sj
  Cc: linux-arm-kernel, linux-kernel, linux-arch, linux-fsdevel, linux-mm



On 5/8/2022 9:14 PM, nh26223@qq.com wrote:
> On Sunday, May 8, 2022, 4:58:52 PM CST, Baolin Wang wrote:
>> Now we use huge_ptep_get() to get the pte value of a hugetlb page,
>> however it will only return one specific pte value for the CONT-PTE
>> or CONT-PMD size hugetlb on ARM64 system, which can contain several
>> continuous pte or pmd entries with same page table attributes. And it
>> will not take into account the subpages' dirty or young bits of a
>> CONT-PTE/PMD size hugetlb page.
>>
>> So the huge_ptep_get() is inconsistent with huge_ptep_get_and_clear(),
>> which already takes account the dirty or young bits for any subpages
>> in this CONT-PTE/PMD size hugetlb [1]. Meanwhile we can miss dirty or
>> young flags statistics for hugetlb pages with current huge_ptep_get(),
>> such as the gather_hugetlb_stats() function.
>>
>> Thus introduce a new huge_ptep_get_access_flags() interface and define
>> an ARM64 specific implementation, that will take into account any subpages'
>> dirty or young bits for CONT-PTE/PMD size hugetlb page, for those functions
>> that want to check the dirty and young flags of a hugetlb page.
>>
>> [1]
>> https://lore.kernel.org/linux-mm/85bd80b4-b4fd-0d3f-a2e5-149559f2f387@oracle.com/
>>
>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> ---
>>   arch/arm64/include/asm/hugetlb.h |  2 ++
>>   arch/arm64/mm/hugetlbpage.c      | 24 ++++++++++++++++++++++++
>>   include/asm-generic/hugetlb.h    |  7 +++++++
>>   3 files changed, 33 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
>> index 616b2ca..a473544 100644
>> --- a/arch/arm64/include/asm/hugetlb.h
>> +++ b/arch/arm64/include/asm/hugetlb.h
>> @@ -44,6 +44,8 @@ extern pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
>>   #define __HAVE_ARCH_HUGE_PTE_CLEAR
>>   extern void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
>>   			   pte_t *ptep, unsigned long sz);
>> +#define __HAVE_ARCH_HUGE_PTEP_GET_ACCESS_FLAGS
>> +extern pte_t huge_ptep_get_access_flags(pte_t *ptep, unsigned long sz);
>>   extern void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr,
>>   				 pte_t *ptep, pte_t pte, unsigned long sz);
>>   #define set_huge_swap_pte_at set_huge_swap_pte_at
>> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
>> index ca8e65c..ce39699 100644
>> --- a/arch/arm64/mm/hugetlbpage.c
>> +++ b/arch/arm64/mm/hugetlbpage.c
>> @@ -158,6 +158,30 @@ static inline int num_contig_ptes(unsigned long size, size_t *pgsize)
>>   	return contig_ptes;
>>   }
>>
>> +pte_t huge_ptep_get_access_flags(pte_t *ptep, unsigned long sz)
> The function name looks to me that it returns access flags of PTE.

Yes, it's not a good name. That's why this is an RFC patch set, to gather 
more suggestions :)

Maybe huge_ptep_get_with_access_flags()? Or do you have a better idea?

> 
>> +{
>> +	int ncontig, i;
>> +	size_t pgsize;
>> +	pte_t orig_pte = ptep_get(ptep);
>> +
>> +	if (!pte_cont(orig_pte))
>> +		return orig_pte;
>> +
>> +	ncontig = num_contig_ptes(sz, &pgsize);
>> +
>> +	for (i = 0; i < ncontig; i++, ptep++) {
>> +		pte_t pte = ptep_get(ptep);
>> +
>> +		if (pte_dirty(pte))
>> +			orig_pte = pte_mkdirty(orig_pte);
>> +
>> +		if (pte_young(pte))
>> +			orig_pte = pte_mkyoung(orig_pte);
>> +	}
>> +
>> +	return orig_pte;
>> +}
> Not sure whether it's worthy being changed to:
> 
>          bool dirty = false, young = false;
> 
>          for (i = 0; i < ncontig; i++, ptep++) {
>                  pte_t pte = ptep_get(ptep);
> 
>                  if (pte_dirty(pte))
>                          dirty = true;
> 
>                  if (pte_young(pte))
>                          young = true;
> 
>                  if (dirty && young)
>                          break;
>          }
> 
>          if (dirty)
>                  orig_pte = pte_mkdirty(orig_pte);
> 
>          if (young)
>                  orig_pte = pte_mkyoung(orig_pte);
> 
>          return orig_pte;

I followed the same logic as get_clear_flush(), which I think is more 
readable. Yes, your approach can save some cycles; I can switch to it in 
the next version if the arm64 maintainers have no objection.

>> +
>>   /*
>>    * Changing some bits of contiguous entries requires us to follow a
>>    * Break-Before-Make approach, breaking the whole contiguous set
>> diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h
>> index a57d667..bb77fb0 100644
>> --- a/include/asm-generic/hugetlb.h
>> +++ b/include/asm-generic/hugetlb.h
>> @@ -150,6 +150,13 @@ static inline pte_t huge_ptep_get(pte_t *ptep)
>>   }
>>   #endif
>>
>> +#ifndef __HAVE_ARCH_HUGE_PTEP_GET_ACCESS_FLAGS
>> +static inline pte_t huge_ptep_get_access_flags(pte_t *ptep, unsigned long sz)
>> +{
>> +	return ptep_get(ptep);
> Should be:
> 	return huge_ptep_get(ptep) ?

I don't think so. Without an arch-specific definition, 
huge_ptep_get_access_flags() should behave the same as the generic 
huge_ptep_get(), which is just ptep_get(). Thanks for your comments.

#ifndef __HAVE_ARCH_HUGE_PTEP_GET
static inline pte_t huge_ptep_get(pte_t *ptep)
{
         return ptep_get(ptep);
}
#endif

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 0/3] Introduce new huge_ptep_get_access_flags() interface
  2022-05-08 15:26   ` Muchun Song
@ 2022-05-09  1:34     ` Baolin Wang
  -1 siblings, 0 replies; 27+ messages in thread
From: Baolin Wang @ 2022-05-09  1:34 UTC (permalink / raw)
  To: Muchun Song
  Cc: catalin.marinas, will, arnd, mike.kravetz, akpm, sj,
	linux-arm-kernel, linux-kernel, linux-arch, linux-fsdevel,
	linux-mm



On 5/8/2022 11:26 PM, Muchun Song wrote:
> On Sun, May 08, 2022 at 04:58:51PM +0800, Baolin Wang wrote:
>> Hi,
>>
>> As Mike pointed out [1], the huge_ptep_get() will only return one specific
>> pte value for the CONT-PTE or CONT-PMD size hugetlb on ARM64 system, which
>> will not take into account the subpages' dirty or young bits of a CONT-PTE/PMD
>> size hugetlb page. That will make us miss dirty or young flags of a CONT-PTE/PMD
>> size hugetlb page for those functions that want to check the dirty or
>> young flags of a hugetlb page. For example, the gather_hugetlb_stats() will
>> get inaccurate dirty hugetlb page statistics, and the DAMON for hugetlb monitoring
>> will also get inaccurate access statistics.
>>
>> To fix this issue, one approach is that we can define an ARM64 specific huge_ptep_get()
>> implementation, which will take into account any subpages' dirty or young bits.
> 
> IIUC, we could get the page size by page_size(pte_page(pte)).
> So, how about the following implementation of huge_ptep_get()?
> Does this work for you?
> 
> pte_t huge_ptep_get(pte_t *ptep)
> {
> 	int ncontig, i;
> 	size_t pgsize;
> 	pte_t orig_pte = ptep_get(ptep);
> 
> 	if (!pte_present(orig_pte) || !pte_cont(orig_pte))
> 		return orig_pte;
> 
> 	ncontig = num_contig_ptes(page_size(pte_page(orig_pte)), &pgsize);
> 
> 	for (i = 0; i < ncontig; i++, ptep++) {
> 		pte_t pte = ptep_get(ptep);
> 
> 		if (pte_dirty(pte))
> 			orig_pte = pte_mkdirty(orig_pte);
> 
> 		if (pte_young(pte))
> 			orig_pte = pte_mkyoung(orig_pte);
> 	}
> 
> 	return orig_pte;
> }

Thanks for your suggestion. I think this works for me, and it looks more
straightforward in case some callers of huge_ptep_get() come to care
about the young or dirty bits in the future.

My only concern is that every caller of huge_ptep_get() will then go
through setting the contPTE dirty or accessed bit, even though most
callers do not care about the dirty and accessed bits, so it becomes a
bit more expensive for them? Matthew also mentioned this in his comments.
Anyway, I still think your suggestion is straightforward, and I can
switch to it in the next version if there are no other objections.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 0/3] Introduce new huge_ptep_get_access_flags() interface
  2022-05-08 17:08   ` Matthew Wilcox
@ 2022-05-09  1:53     ` Baolin Wang
  -1 siblings, 0 replies; 27+ messages in thread
From: Baolin Wang @ 2022-05-09  1:53 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: catalin.marinas, will, arnd, mike.kravetz, akpm, sj,
	linux-arm-kernel, linux-kernel, linux-arch, linux-fsdevel,
	linux-mm



On 5/9/2022 1:08 AM, Matthew Wilcox wrote:
> On Sun, May 08, 2022 at 04:58:51PM +0800, Baolin Wang wrote:
>> As Mike pointed out [1], the huge_ptep_get() will only return one specific
>> pte value for the CONT-PTE or CONT-PMD size hugetlb on ARM64 system, which
>> will not take into account the subpages' dirty or young bits of a CONT-PTE/PMD
>> size hugetlb page. That will make us miss dirty or young flags of a CONT-PTE/PMD
>> size hugetlb page for those functions that want to check the dirty or
>> young flags of a hugetlb page. For example, the gather_hugetlb_stats() will
>> get inaccurate dirty hugetlb page statistics, and the DAMON for hugetlb monitoring
>> will also get inaccurate access statistics.
>>
>> To fix this issue, one approach is that we can define an ARM64 specific huge_ptep_get()
>> implementation, which will take into account any subpages' dirty or young bits.
>> However we should add a new parameter for ARM64 specific huge_ptep_get() to check
>> how many continuous PTEs or PMDs in this CONT-PTE/PMD size hugetlb, that means we
>> should convert all the places using huge_ptep_get(), meanwhile most places using
>> huge_ptep_get() did not care about the dirty or young flags at all.
>>
>> So instead of changing the prototype of huge_ptep_get(), this patch set introduces
>> a new huge_ptep_get_access_flags() interface and define an ARM64 specific implementation,
>> that will take into account any subpages' dirty or young bits for CONT-PTE/PMD size
>> hugetlb page. And we can only change to use huge_ptep_get_access_flags() for those
>> functions that care about the dirty or young flags of a hugetlb page.
> 
> I question whether this is the right approach.  I understand that
> different hardware implementations have different requirements here,
> but at least one that I'm aware of (AMD Zen 2/3) requires that all
> PTEs that are part of a contig PTE must have identical A/D bits.  Now,
> you could say that's irrelevant because it's x86 and we don't currently
> support contPTE on x86, but I wouldn't be surprised to see that other
> hardware has the same requirement.

Yes, so on x86 we can use the default huge_ptep_get(). But on ARM64,
unfortunately, the A/D bits of the entries in a contig PTE are independent;
that's why we want an ARM64-specific huge_ptep_get().

> So what if we make that a Linux requirement?  Setting a contPTE dirty or
> accessed becomes a bit more expensive (although still one/two cachelines,
> so not really much more expensive than a single write).  Then there's no
> need to change the "get" side of things because they're always identical.
> 
> It does mean that we can't take advantage of hardware setting A/D bits,
> unless hardware can be persuaded to behave this way.  I don't have any
> ARM specs in front of me to check.

I wish the hardware kept the entries of a contPTE always identical.
However, as I said, the hardware sets the A/D bits of each entry in a
contig-PTE size hugetlb page independently, so they are not always
identical.

From my testing: when I monitored a contig-PTE size hugetlb page with
DAMON and only modified subpages of that hugetlb page, DAMON did not
report any accesses, although there actually were some.
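
As a rough illustration of why such accesses can be missed, the following userspace model contrasts a read that looks only at the first entry with one that aggregates the young/dirty bits of the whole set. Everything here is an assumption for the sketch: model_pte_t, the MODEL_PTE_* bits, the set size of 16 and the model_* helpers are made up and are not the kernel's types or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit layout and set size for this model only. */
#define MODEL_PTE_YOUNG (UINT32_C(1) << 0)
#define MODEL_PTE_DIRTY (UINT32_C(1) << 1)
#define MODEL_NCONTIG   16	/* assumed entries per contiguous set */

typedef uint32_t model_pte_t;

/* Models the generic read: only the first entry of the set is consulted. */
static model_pte_t model_get_head(const model_pte_t *ptep)
{
	return ptep[0];
}

/* Models an aggregated read: OR the young/dirty bits of every entry in. */
static model_pte_t model_get_folded(const model_pte_t *ptep, size_t ncontig)
{
	model_pte_t pte = ptep[0];
	size_t i;

	for (i = 1; i < ncontig; i++)
		pte |= ptep[i] & (MODEL_PTE_YOUNG | MODEL_PTE_DIRTY);
	return pte;
}

/* Pretend hardware marked only entry idx young, then try one read style. */
static bool model_access_seen(size_t idx, bool folded)
{
	model_pte_t ptes[MODEL_NCONTIG] = { 0 };
	model_pte_t pte;

	assert(idx < MODEL_NCONTIG);
	ptes[idx] = MODEL_PTE_YOUNG;
	pte = folded ? model_get_folded(ptes, MODEL_NCONTIG)
		     : model_get_head(ptes);
	return (pte & MODEL_PTE_YOUNG) != 0;
}
```

In this model, an access recorded on entry 5 is invisible to the head-only read and visible to the aggregated one; an access recorded on entry 0 is visible to both.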

So I think an ARM64 specific huge_ptep_get() implementation seems the 
right way as Muchun suggested?

Thanks.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 0/3] Introduce new huge_ptep_get_access_flags() interface
  2022-05-08 17:08   ` Matthew Wilcox
@ 2022-05-09  2:54     ` Muchun Song
  -1 siblings, 0 replies; 27+ messages in thread
From: Muchun Song @ 2022-05-09  2:54 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Baolin Wang, catalin.marinas, will, arnd, mike.kravetz, akpm, sj,
	linux-arm-kernel, linux-kernel, linux-arch, linux-fsdevel,
	linux-mm

On Sun, May 08, 2022 at 06:08:18PM +0100, Matthew Wilcox wrote:
> On Sun, May 08, 2022 at 04:58:51PM +0800, Baolin Wang wrote:
> > As Mike pointed out [1], the huge_ptep_get() will only return one specific
> > pte value for the CONT-PTE or CONT-PMD size hugetlb on ARM64 system, which
> > will not take into account the subpages' dirty or young bits of a CONT-PTE/PMD
> > size hugetlb page. That will make us miss dirty or young flags of a CONT-PTE/PMD
> > size hugetlb page for those functions that want to check the dirty or
> > young flags of a hugetlb page. For example, the gather_hugetlb_stats() will
> > get inaccurate dirty hugetlb page statistics, and the DAMON for hugetlb monitoring
> > will also get inaccurate access statistics.
> > 
> > To fix this issue, one approach is that we can define an ARM64 specific huge_ptep_get()
> > implementation, which will take into account any subpages' dirty or young bits.
> > However we should add a new parameter for ARM64 specific huge_ptep_get() to check
> > how many continuous PTEs or PMDs in this CONT-PTE/PMD size hugetlb, that means we
> > should convert all the places using huge_ptep_get(), meanwhile most places using
> > huge_ptep_get() did not care about the dirty or young flags at all.
> > 
> > So instead of changing the prototype of huge_ptep_get(), this patch set introduces
> > a new huge_ptep_get_access_flags() interface and define an ARM64 specific implementation,
> > that will take into account any subpages' dirty or young bits for CONT-PTE/PMD size
> > hugetlb page. And we can only change to use huge_ptep_get_access_flags() for those
> > functions that care about the dirty or young flags of a hugetlb page.
> 
> I question whether this is the right approach.  I understand that
> different hardware implementations have different requirements here,
> but at least one that I'm aware of (AMD Zen 2/3) requires that all
> PTEs that are part of a contig PTE must have identical A/D bits.  Now,
> you could say that's irrelevant because it's x86 and we don't currently
> support contPTE on x86, but I wouldn't be surprised to see that other
> hardware has the same requirement.
> 
> So what if we make that a Linux requirement?  Setting a contPTE dirty or
> accessed becomes a bit more expensive (although still one/two cachelines,
> so not really much more expensive than a single write).  Then there's no
> need to change the "get" side of things because they're always identical.
> 
> It does mean that we can't take advantage of hardware setting A/D bits,
> unless hardware can be persuaded to behave this way.  I don't have any
> ARM specs in front of me to check.
>

I have looked at the comments in get_clear_flush() (in arch/arm64/mm/hugetlbpage.c).
That says:

	/*
	 * If HW_AFDBM is enabled, then the HW could turn on
	 * the dirty or accessed bit for any page in the set,
	 * so check them all.
	 */

Unfortunately, this means the A/D bits are not guaranteed to be identical
across all subpages.

Thanks.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 1/3] arm64/hugetlb: Introduce new huge_ptep_get_access_flags() interface
  2022-05-09  1:19       ` Baolin Wang
@ 2022-05-09  4:10         ` nh26223
  -1 siblings, 0 replies; 27+ messages in thread
From: nh26223 @ 2022-05-09  4:10 UTC (permalink / raw)
  To: nh26223, catalin.marinas, will, arnd, mike.kravetz, akpm, sj,
	Baolin Wang
  Cc: linux-arm-kernel, linux-kernel, linux-arch, linux-fsdevel, linux-mm

----------------8<---------------
> >> 
> >> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> >> index ca8e65c..ce39699 100644
> >> --- a/arch/arm64/mm/hugetlbpage.c
> >> +++ b/arch/arm64/mm/hugetlbpage.c
> >> @@ -158,6 +158,30 @@ static inline int num_contig_ptes(unsigned long size, size_t *pgsize)
> >>   	return contig_ptes;
> >>   }
> >> 
> >> +pte_t huge_ptep_get_access_flags(pte_t *ptep, unsigned long sz)
> > 
> > The function name looks to me that it returns access flags of PTE.
> 
> Yes, not a good name. That's why this is a RFC patch set to get more
> suggestion :)
> 
> Maybe huge_ptep_get_with_access_flags()? or do you have some better idea?
I don't have a better idea either. "Naming is hard". :)

> >> diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h
> >> index a57d667..bb77fb0 100644
> >> --- a/include/asm-generic/hugetlb.h
> >> +++ b/include/asm-generic/hugetlb.h
> >> @@ -150,6 +150,13 @@ static inline pte_t huge_ptep_get(pte_t *ptep)
> >> 
> >>   }
> >>   #endif
> >> 
> >> +#ifndef __HAVE_ARCH_HUGE_PTEP_GET_ACCESS_FLAGS
> >> +static inline pte_t huge_ptep_get_access_flags(pte_t *ptep, unsigned long sz)
> >> +{
> >> +	return ptep_get(ptep);
> > 
> > Should be:
> > 	return huge_ptep_get(ptep) ?
> 
> I don't think so. If no ARCH-specific definition, the
> huge_ptep_get_access_flags() implementation should be same as
> huge_ptep_get(). Thanks for your comments.
If __HAVE_ARCH_HUGE_PTEP_GET is not defined, huge_ptep_get() is the same
as ptep_get().

Or is it impossible for an architecture to define
__HAVE_ARCH_HUGE_PTEP_GET without also defining
__HAVE_ARCH_HUGE_PTEP_GET_ACCESS_FLAGS?


Regards
Yin, Fengwei

> 
> #ifndef __HAVE_ARCH_HUGE_PTEP_GET
> static inline pte_t huge_ptep_get(pte_t *ptep)
> {
>          return ptep_get(ptep);
> }
> #endif

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 1/3] arm64/hugetlb: Introduce new huge_ptep_get_access_flags() interface
  2022-05-09  4:10         ` nh26223
@ 2022-05-09  4:19           ` Baolin Wang
  -1 siblings, 0 replies; 27+ messages in thread
From: Baolin Wang @ 2022-05-09  4:19 UTC (permalink / raw)
  To: nh26223, catalin.marinas, will, arnd, mike.kravetz, akpm, sj
  Cc: linux-arm-kernel, linux-kernel, linux-arch, linux-fsdevel,
	linux-mm, Muchun Song



On 5/9/2022 12:10 PM, nh26223@qq.com wrote:
> ----------------8<---------------
>>>>
>>>> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
>>>> index ca8e65c..ce39699 100644
>>>> --- a/arch/arm64/mm/hugetlbpage.c
>>>> +++ b/arch/arm64/mm/hugetlbpage.c
>>>> @@ -158,6 +158,30 @@ static inline int num_contig_ptes(unsigned long size, size_t *pgsize)
>>>>    	return contig_ptes;
>>>>    }
>>>>
>>>> +pte_t huge_ptep_get_access_flags(pte_t *ptep, unsigned long sz)
>>>
>>> To me, the function name suggests that it returns the access flags
>>> of the PTE.
>>
>> Yes, not a good name. That's why this is an RFC patch set, to get more
>> suggestions :)
>>
>> Maybe huge_ptep_get_with_access_flags()? Or do you have a better idea?
> I don't have one either. "Naming is hard". :)
> 
>>>> diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h
>>>> index a57d667..bb77fb0 100644
>>>> --- a/include/asm-generic/hugetlb.h
>>>> +++ b/include/asm-generic/hugetlb.h
>>>> @@ -150,6 +150,13 @@ static inline pte_t huge_ptep_get(pte_t *ptep)
>>>>  }
>>>>  #endif
>>>>
>>>> +#ifndef __HAVE_ARCH_HUGE_PTEP_GET_ACCESS_FLAGS
>>>> +static inline pte_t huge_ptep_get_access_flags(pte_t *ptep, unsigned long sz)
>>>> +{
>>>> +	return ptep_get(ptep);
>>>
>>> Should be:
>>> 	return huge_ptep_get(ptep) ?
>>
>> I don't think so. If there is no arch-specific definition, the
>> huge_ptep_get_access_flags() implementation should be the same as
>> huge_ptep_get(). Thanks for your comments.
> If __HAVE_ARCH_HUGE_PTEP_GET is not defined, huge_ptep_get() is the same
> as ptep_get().
> 
> Or is it impossible for an arch to define __HAVE_ARCH_HUGE_PTEP_GET
> without also defining __HAVE_ARCH_HUGE_PTEP_GET_ACCESS_FLAGS?

Yes, I was wrong; it should be huge_ptep_get(). Thanks for pointing out
the issue :)
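
To illustrate the point, here is a minimal userspace sketch (not kernel
code, and not the patch itself) of the case raised above: an arch that
defines __HAVE_ARCH_HUGE_PTEP_GET but not
__HAVE_ARCH_HUGE_PTEP_GET_ACCESS_FLAGS. The pte_t layout, the
PTE_DEMO_DIRTY bit, the body of the arch override and check_fallback()
are all toy assumptions made up for the demo. Only the shape of the
generic fallback follows the discussion: delegating to huge_ptep_get()
picks up the arch override, whereas calling ptep_get() directly would
skip it.

```c
#include <assert.h>

/* Toy stand-in for the kernel's pte_t; assumed for this demo only. */
typedef struct { unsigned long val; } pte_t;

/* Hypothetical flag bit, not a real arm64 PTE bit. */
#define PTE_DEMO_DIRTY 0x1UL

/* Plain read of the entry, like the generic ptep_get(). */
static inline pte_t ptep_get(pte_t *ptep)
{
	return *ptep;
}

/*
 * Pretend the arch overrides huge_ptep_get() but does NOT provide
 * huge_ptep_get_access_flags(). The fixup below is illustrative only.
 */
#define __HAVE_ARCH_HUGE_PTEP_GET
static inline pte_t huge_ptep_get(pte_t *ptep)
{
	pte_t pte = ptep_get(ptep);

	pte.val |= PTE_DEMO_DIRTY;	/* arch-specific adjustment (made up) */
	return pte;
}

/* Generic fallback, written the way the review suggests. */
#ifndef __HAVE_ARCH_HUGE_PTEP_GET_ACCESS_FLAGS
static inline pte_t huge_ptep_get_access_flags(pte_t *ptep, unsigned long sz)
{
	(void)sz;			/* unused in the generic version */
	return huge_ptep_get(ptep);	/* honours any arch override */
}
#endif

/* Self-check: the fallback must agree with huge_ptep_get(). */
static inline void check_fallback(void)
{
	pte_t entry = { 0 };

	assert(huge_ptep_get_access_flags(&entry, 0).val ==
	       huge_ptep_get(&entry).val);
	/* A bare ptep_get() would have missed the arch adjustment. */
	assert(ptep_get(&entry).val !=
	       huge_ptep_get_access_flags(&entry, 0).val);
}
```

Calling check_fallback() from a small main() exercises both assertions.
If the fallback returned ptep_get(ptep) instead, the first assertion
would fail for this toy arch.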

PS: I think I will follow Muchun's suggestion in the next version, so
there is no need to add a new interface.

^ permalink raw reply	[flat|nested] 27+ messages in thread


end of thread, other threads:[~2022-05-09  4:26 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-08  8:58 [RFC PATCH 0/3] Introduce new huge_ptep_get_access_flags() interface Baolin Wang
2022-05-08  8:58 ` Baolin Wang
2022-05-08  8:58 ` [RFC PATCH 1/3] arm64/hugetlb: " Baolin Wang
2022-05-08  8:58   ` Baolin Wang
2022-05-08 13:14   ` nh26223
2022-05-08 13:14     ` nh26223
2022-05-09  1:19     ` Baolin Wang
2022-05-09  1:19       ` Baolin Wang
2022-05-09  4:10       ` nh26223
2022-05-09  4:10         ` nh26223
2022-05-09  4:19         ` Baolin Wang
2022-05-09  4:19           ` Baolin Wang
2022-05-08  8:58 ` [RFC PATCH 2/3] fs/proc/task_mmu: Change to use huge_ptep_get_access_flags() Baolin Wang
2022-05-08  8:58   ` Baolin Wang
2022-05-08  8:58 ` [RFC PATCH 3/3] mm/damon/vaddr: " Baolin Wang
2022-05-08  8:58   ` Baolin Wang
2022-05-08 12:41   ` kernel test robot
2022-05-08 15:26 ` [RFC PATCH 0/3] Introduce new huge_ptep_get_access_flags() interface Muchun Song
2022-05-08 15:26   ` Muchun Song
2022-05-09  1:34   ` Baolin Wang
2022-05-09  1:34     ` Baolin Wang
2022-05-08 17:08 ` Matthew Wilcox
2022-05-08 17:08   ` Matthew Wilcox
2022-05-09  1:53   ` Baolin Wang
2022-05-09  1:53     ` Baolin Wang
2022-05-09  2:54   ` Muchun Song
2022-05-09  2:54     ` Muchun Song
