* [RFC V2 0/2] arm64/mm: Enable THP migration
@ 2020-06-15 13:15 Anshuman Khandual
  2020-06-15 13:15 ` [RFC V2 1/2] arm64/mm: Change THP helpers per generic memory semantics Anshuman Khandual
  2020-06-15 13:15 ` [RFC V2 2/2] arm64/mm: Enable THP migration Anshuman Khandual
  0 siblings, 2 replies; 7+ messages in thread
From: Anshuman Khandual @ 2020-06-15 13:15 UTC (permalink / raw)
  To: linux-mm, linux-arm-kernel
  Cc: catalin.marinas, will, mark.rutland, ziy, Anshuman Khandual,
	Marc Zyngier, Suzuki Poulose, linux-kernel

This series enables THP migration on arm64 via ARCH_ENABLE_THP_MIGRATION.
But first it modifies all existing THP helpers, such as pmd_present() and
pmd_trans_huge(), to follow the expected generic memory semantics, as
concluded from a previous discussion here.

https://lkml.org/lkml/2018/10/9/220

This series is based on v5.8-rc1.

Changes in RFC V2:

- Used PMD_TABLE_BIT to represent splitting PMD state per Catalin

RFC V1: (https://patchwork.kernel.org/project/linux-mm/list/?series=138797)

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Suzuki Poulose <suzuki.poulose@arm.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org

Anshuman Khandual (2):
  arm64/mm: Change THP helpers per generic memory semantics
  arm64/mm: Enable THP migration

 arch/arm64/Kconfig               |   4 ++
 arch/arm64/include/asm/pgtable.h | 102 +++++++++++++++++++++++++++----
 arch/arm64/mm/hugetlbpage.c      |   2 +-
 arch/arm64/mm/mmu.c              |  20 ++++++
 4 files changed, 116 insertions(+), 12 deletions(-)

-- 
2.20.1



^ permalink raw reply	[flat|nested] 7+ messages in thread

* [RFC V2 1/2] arm64/mm: Change THP helpers per generic memory semantics
  2020-06-15 13:15 [RFC V2 0/2] arm64/mm: Enable THP migration Anshuman Khandual
@ 2020-06-15 13:15 ` Anshuman Khandual
  2020-07-02 12:11   ` Catalin Marinas
  2020-06-15 13:15 ` [RFC V2 2/2] arm64/mm: Enable THP migration Anshuman Khandual
  1 sibling, 1 reply; 7+ messages in thread
From: Anshuman Khandual @ 2020-06-15 13:15 UTC (permalink / raw)
  To: linux-mm, linux-arm-kernel
  Cc: catalin.marinas, will, mark.rutland, ziy, Anshuman Khandual,
	Marc Zyngier, Suzuki Poulose, linux-kernel

pmd_present() and pmd_trans_huge() are expected to behave in the following
manner across the various states of a PMD entry. This table is derived from
a previous discussion on this topic [1] and the available THP documentation [2].

pmd_present(pmd):

- True if PMD has a mapped huge page, i.e. pmd_page(pmd) is valid
- False if PMD does not have a mapped huge page, i.e. pmd_page(pmd) is invalid

pmd_trans_huge(pmd):

- True if PMD maps a transparent huge page, including one under split

-------------------------------------------------------------------------
|	PMD states	|	pmd_present	|	pmd_trans_huge	|
-------------------------------------------------------------------------
|	Mapped		|	Yes		|	Yes		|
-------------------------------------------------------------------------
|	Splitting	|	Yes		|	Yes		|
-------------------------------------------------------------------------
|	Migration/Swap	|	No		|	No		|
-------------------------------------------------------------------------
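
As an illustrative sketch (not part of this series), this is roughly how
generic mm code is expected to tell these states apart. example_pmd_walk()
is a hypothetical name; the helpers are the generic v5.8 ones:

	static void example_pmd_walk(pmd_t *pmdp)
	{
		pmd_t pmde = READ_ONCE(*pmdp);

		if (pmd_trans_huge(pmde)) {
			/* Mapped or splitting THP: pmd_page(pmde) is valid */
			struct page *page = pmd_page(pmde);

			(void)page;
		} else if (!pmd_present(pmde)) {
			/* Migration/swap entry: decode it, never dereference */
			swp_entry_t entry = pmd_to_swp_entry(pmde);

			(void)entry;
		}
	}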

Current Problem:-

The PMD is invalidated with pmdp_invalidate() before it is split. This
clears PMD_SECT_VALID as below.

PMD Split -> pmdp_invalidate() -> pmd_mkinvalid() -> Clears PMD_SECT_VALID

Once PMD_SECT_VALID is cleared, pmd_present() returns false. A separate
page table bit, apart from PMD_SECT_VALID, is therefore needed to keep
pmd_present() returning true during the THP split process. To comply with
the above semantics, pmd_trans_huge() should also check pmd_present()
first, before testing for an actual transparent huge page mapping.

PMD_TYPE_SECT would have been the natural choice here, but it shares its
bit position with PMD_SECT_VALID, which gets cleared during THP
invalidation. Hence it cannot back pmd_present() after pmdp_invalidate().

Proposed Solution:-

PMD_TABLE_BIT can be set during pmdp_invalidate(), which makes
pmd_present() return true as required. PMD_TABLE_BIT then needs to be
cleared when the PMD gets mapped again.
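
For illustration only, a minimal sketch of the intended transitions across
a THP split under this scheme, assuming vma, addr, mm and pmdp are in
scope (the sequence itself is hypothetical):

	pmd_t old;

	old = pmdp_invalidate(vma, addr, pmdp);	/* *pmdp becomes 0b10: sets
						   PMD_TABLE_BIT, clears
						   PMD_SECT_VALID */

	/* Invalid for hardware, but still present and a THP for software */
	VM_BUG_ON(!pmd_present(READ_ONCE(*pmdp)));
	VM_BUG_ON(!pmd_trans_huge(READ_ONCE(*pmdp)));

	old = pmd_wrprotect(old);		/* transform the returned value */
	set_pmd_at(mm, addr, pmdp, old);	/* written back as a normal huge
						   mapping (0b01) unless it is a
						   migration entry */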

This patch also replaces pmd_present() with pte_present() in
huge_pte_offset(), just to preserve that function's existing semantics.

[1]: https://lore.kernel.org/linux-mm/20181017020930.GN30832@redhat.com/
[2]: https://www.kernel.org/doc/Documentation/vm/transhuge.txt

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Suzuki Poulose <suzuki.poulose@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 arch/arm64/include/asm/pgtable.h | 92 ++++++++++++++++++++++++++++----
 arch/arm64/mm/hugetlbpage.c      |  2 +-
 arch/arm64/mm/mmu.c              | 20 +++++++
 3 files changed, 102 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 6dbd267ab931..560be593a8dc 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -353,15 +353,92 @@ static inline int pmd_protnone(pmd_t pmd)
 }
 #endif
 
+#define pmd_table(pmd)	((pmd_val(pmd) & PMD_TYPE_MASK) ==  PMD_TYPE_TABLE)
+#define pmd_sect(pmd)	((pmd_val(pmd) & PMD_TYPE_MASK) ==  PMD_TYPE_SECT)
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
 /*
- * THP definitions.
+ * PMD Level Encoding (THP Enabled)
+ *
+ * 0b00 - Not valid	Not present	NA
+ * 0b10 - Not valid	Present		Huge  (Splitting)
+ * 0b01 - Valid		Present		Huge  (Mapped)
+ * 0b11 - Valid		Present		Table (Mapped)
  */
+static inline pmd_t pmd_mksplitting(pmd_t pmd)
+{
+	unsigned long val = pmd_val(pmd);
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-#define pmd_trans_huge(pmd)	(pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT))
+	return __pmd((val & ~PMD_TYPE_MASK) | PMD_TABLE_BIT);
+}
+
+static inline pmd_t pmd_clrsplitting(pmd_t pmd)
+{
+	unsigned long val = pmd_val(pmd);
+
+	return __pmd((val & ~PMD_TYPE_MASK) | PMD_TYPE_SECT);
+}
+
+static inline bool pmd_splitting(pmd_t pmd)
+{
+	unsigned long val = pmd_val(pmd);
+
+	if ((val & PMD_TYPE_MASK) == PMD_TABLE_BIT)
+		return true;
+	return false;
+}
+
+static inline bool pmd_mapped(pmd_t pmd)
+{
+	return pmd_sect(pmd);
+}
+
+static inline pmd_t pmd_mkinvalid(pmd_t pmd)
+{
+	/*
+	 * Invalidation should not have been invoked on
+	 * a PMD table entry. Just warn here otherwise.
+	 */
+	WARN_ON(pmd_table(pmd));
+	return pmd_mksplitting(pmd);
+}
+
+static inline int pmd_present(pmd_t pmd);
+
+static inline int pmd_trans_huge(pmd_t pmd)
+{
+	if (!pmd_present(pmd))
+		return 0;
+
+	if (!pmd_val(pmd))
+		return 0;
+
+	if (pmd_mapped(pmd))
+		return 1;
+
+	if (pmd_splitting(pmd))
+		return 1;
+	return 0;
+}
+
+void set_pmd_at(struct mm_struct *mm, unsigned long addr,
+		pmd_t *pmdp, pmd_t pmd);
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
-#define pmd_present(pmd)	pte_present(pmd_pte(pmd))
+static inline int pmd_present(pmd_t pmd)
+{
+	pte_t pte = pmd_pte(pmd);
+
+	if (pte_present(pte))
+		return 1;
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	if (pmd_splitting(pmd))
+		return 1;
+#endif
+	return 0;
+}
+
 #define pmd_dirty(pmd)		pte_dirty(pmd_pte(pmd))
 #define pmd_young(pmd)		pte_young(pmd_pte(pmd))
 #define pmd_valid(pmd)		pte_valid(pmd_pte(pmd))
@@ -371,7 +448,6 @@ static inline int pmd_protnone(pmd_t pmd)
 #define pmd_mkclean(pmd)	pte_pmd(pte_mkclean(pmd_pte(pmd)))
 #define pmd_mkdirty(pmd)	pte_pmd(pte_mkdirty(pmd_pte(pmd)))
 #define pmd_mkyoung(pmd)	pte_pmd(pte_mkyoung(pmd_pte(pmd)))
-#define pmd_mkinvalid(pmd)	(__pmd(pmd_val(pmd) & ~PMD_SECT_VALID))
 
 #define pmd_thp_or_huge(pmd)	(pmd_huge(pmd) || pmd_trans_huge(pmd))
 
@@ -404,8 +480,6 @@ static inline pmd_t pmd_mkdevmap(pmd_t pmd)
 #define pud_pfn(pud)		((__pud_to_phys(pud) & PUD_MASK) >> PAGE_SHIFT)
 #define pfn_pud(pfn,prot)	__pud(__phys_to_pud_val((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot))
 
-#define set_pmd_at(mm, addr, pmdp, pmd)	set_pte_at(mm, addr, (pte_t *)pmdp, pmd_pte(pmd))
-
 #define __p4d_to_phys(p4d)	__pte_to_phys(p4d_pte(p4d))
 #define __phys_to_p4d_val(phys)	__phys_to_pte_val(phys)
 
@@ -448,10 +522,6 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 
 #define pmd_bad(pmd)		(!(pmd_val(pmd) & PMD_TABLE_BIT))
 
-#define pmd_table(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
-				 PMD_TYPE_TABLE)
-#define pmd_sect(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
-				 PMD_TYPE_SECT)
 #define pmd_leaf(pmd)		pmd_sect(pmd)
 
 #if defined(CONFIG_ARM64_64K_PAGES) || CONFIG_PGTABLE_LEVELS < 3
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 0a52ce46f020..79a8d5c8d4f4 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -294,7 +294,7 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
 	if (!(sz == PMD_SIZE || sz == CONT_PMD_SIZE) &&
 	    pmd_none(pmd))
 		return NULL;
-	if (pmd_huge(pmd) || !pmd_present(pmd))
+	if (pmd_huge(pmd) || !pte_present(pmd_pte(pmd)))
 		return (pte_t *)pmdp;
 
 	if (sz == CONT_PTE_SIZE)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 990929c8837e..337519031115 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -22,6 +22,8 @@
 #include <linux/io.h>
 #include <linux/mm.h>
 #include <linux/vmalloc.h>
+#include <linux/swap.h>
+#include <linux/swapops.h>
 
 #include <asm/barrier.h>
 #include <asm/cputype.h>
@@ -1483,3 +1485,21 @@ static int __init prevent_bootmem_remove_init(void)
 }
 device_initcall(prevent_bootmem_remove_init);
 #endif
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+void set_pmd_at(struct mm_struct *mm, unsigned long addr,
+		pmd_t *pmdp, pmd_t pmd)
+{
+	/*
+	 * PMD migration entries need to retain the splitting PMD
+	 * representation created with pmdp_invalidate(). But any
+	 * non-migration entry that might have been invalidated
+	 * previously still needs to become a normal huge page.
+	 * Hence selectively clear splitting state here.
+	 */
+	if (!is_migration_entry(pmd_to_swp_entry(pmd)))
+		pmd = pmd_clrsplitting(pmd);
+
+	set_pte_at(mm, addr, (pte_t *)pmdp, pmd_pte(pmd));
+}
+#endif
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [RFC V2 2/2] arm64/mm: Enable THP migration
  2020-06-15 13:15 [RFC V2 0/2] arm64/mm: Enable THP migration Anshuman Khandual
  2020-06-15 13:15 ` [RFC V2 1/2] arm64/mm: Change THP helpers per generic memory semantics Anshuman Khandual
@ 2020-06-15 13:15 ` Anshuman Khandual
  1 sibling, 0 replies; 7+ messages in thread
From: Anshuman Khandual @ 2020-06-15 13:15 UTC (permalink / raw)
  To: linux-mm, linux-arm-kernel
  Cc: catalin.marinas, will, mark.rutland, ziy, Anshuman Khandual,
	Marc Zyngier, Suzuki Poulose, linux-kernel

In certain page migration situations, a THP can be migrated without being
split into its constituent subpages. This saves the time required to split
a THP and put it back together afterwards. It also preserves the wider
address range translation covered by a single TLB entry, reducing future
page fault costs.

The previous patch changed the platform THP helpers to match generic
memory semantics, clearing the path for THP migration support. This patch
adds the two remaining THP helpers required to create PMD migration swap
entries, and then enables THP migration via ARCH_ENABLE_THP_MIGRATION.
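
As a hedged illustration of where the new macros get used - a simplified
sketch in the style of set_pmd_migration_entry() and remove_migration_pmd()
from v5.8 mm/huge_memory.c, assuming page, pmdval, mm, address and pmdp
are in scope:

	/* Encode: replace a mapped THP with a PMD migration entry */
	swp_entry_t entry = make_migration_entry(page, pmd_write(pmdval));
	pmd_t pmdswp = swp_entry_to_pmd(entry);	/* ends up in __swp_entry_to_pmd() */

	set_pmd_at(mm, address, pmdp, pmdswp);

	/* Decode: after migration, recover the entry and target page */
	entry = pmd_to_swp_entry(*pmdp);	/* ends up in __pmd_to_swp_entry() */
	page = migration_entry_to_page(entry);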

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Suzuki Poulose <suzuki.poulose@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 arch/arm64/Kconfig               |  4 ++++
 arch/arm64/include/asm/pgtable.h | 10 ++++++++++
 2 files changed, 14 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 31380da53689..01d432dc813e 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1874,6 +1874,10 @@ config ARCH_ENABLE_HUGEPAGE_MIGRATION
 	def_bool y
 	depends on HUGETLB_PAGE && MIGRATION
 
+config ARCH_ENABLE_THP_MIGRATION
+	def_bool y
+	depends on TRANSPARENT_HUGEPAGE
+
 menu "Power management options"
 
 source "kernel/power/Kconfig"
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 560be593a8dc..892c30129cd2 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -907,6 +907,16 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
 #define __pte_to_swp_entry(pte)	((swp_entry_t) { pte_val(pte) })
 #define __swp_entry_to_pte(swp)	((pte_t) { (swp).val })
 
+#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
+#define __pmd_to_swp_entry(pmd)		((swp_entry_t) { pmd_val(pmd) })
+
+/*
+ * pmd_present() must return false for a swap PMD entry.
+ * Just explicitly clear PMD_TABLE_BIT while converting.
+ */
+#define __swp_entry_to_pmd(swp)		__pmd((swp).val & ~PMD_TABLE_BIT)
+#endif /* CONFIG_ARCH_ENABLE_THP_MIGRATION */
+
 /*
  * Ensure that there are not more swap files than can be encoded in the kernel
  * PTEs.
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [RFC V2 1/2] arm64/mm: Change THP helpers per generic memory semantics
  2020-06-15 13:15 ` [RFC V2 1/2] arm64/mm: Change THP helpers per generic memory semantics Anshuman Khandual
@ 2020-07-02 12:11   ` Catalin Marinas
  2020-07-06  3:57     ` Anshuman Khandual
  0 siblings, 1 reply; 7+ messages in thread
From: Catalin Marinas @ 2020-07-02 12:11 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: linux-mm, linux-arm-kernel, will, mark.rutland, ziy,
	Marc Zyngier, Suzuki Poulose, linux-kernel

Hi Anshuman,

On Mon, Jun 15, 2020 at 06:45:17PM +0530, Anshuman Khandual wrote:
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -353,15 +353,92 @@ static inline int pmd_protnone(pmd_t pmd)
>  }
>  #endif
>  
> +#define pmd_table(pmd)	((pmd_val(pmd) & PMD_TYPE_MASK) ==  PMD_TYPE_TABLE)
> +#define pmd_sect(pmd)	((pmd_val(pmd) & PMD_TYPE_MASK) ==  PMD_TYPE_SECT)
> +
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  /*
> - * THP definitions.
> + * PMD Level Encoding (THP Enabled)
> + *
> + * 0b00 - Not valid	Not present	NA
> + * 0b10 - Not valid	Present		Huge  (Splitting)
> + * 0b01 - Valid		Present		Huge  (Mapped)
> + * 0b11 - Valid		Present		Table (Mapped)
>   */

I wonder whether it would be easier to read if we add a dedicated
PMD_SPLITTING bit, only when bit 0 is cleared. This bit can be high (say
59), it doesn't really matter as the entry is not valid.

The only doubt I have is that pmd_mkinvalid() is used in other contexts
when it's not necessarily splitting a pmd (search for the
pmdp_invalidate() calls). So maybe a better name like PMD_PRESENT with a
comment that pmd_to_page() is valid (i.e. no migration or swap entry).
Feel free to suggest a better name.

> +static inline pmd_t pmd_mksplitting(pmd_t pmd)
> +{
> +	unsigned long val = pmd_val(pmd);
>  
> -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> -#define pmd_trans_huge(pmd)	(pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT))
> +	return __pmd((val & ~PMD_TYPE_MASK) | PMD_TABLE_BIT);
> +}
> +
> +static inline pmd_t pmd_clrsplitting(pmd_t pmd)
> +{
> +	unsigned long val = pmd_val(pmd);
> +
> +	return __pmd((val & ~PMD_TYPE_MASK) | PMD_TYPE_SECT);
> +}
> +
> +static inline bool pmd_splitting(pmd_t pmd)
> +{
> +	unsigned long val = pmd_val(pmd);
> +
> +	if ((val & PMD_TYPE_MASK) == PMD_TABLE_BIT)
> +		return true;
> +	return false;
> +}
> +
> +static inline bool pmd_mapped(pmd_t pmd)
> +{
> +	return pmd_sect(pmd);
> +}
> +
> +static inline pmd_t pmd_mkinvalid(pmd_t pmd)
> +{
> +	/*
> +	 * Invalidation should not have been invoked on
> +	 * a PMD table entry. Just warn here otherwise.
> +	 */
> +	WARN_ON(pmd_table(pmd));
> +	return pmd_mksplitting(pmd);
> +}

And here we wouldn't need to worry about table checks.

> +static inline int pmd_present(pmd_t pmd);
> +
> +static inline int pmd_trans_huge(pmd_t pmd)
> +{
> +	if (!pmd_present(pmd))
> +		return 0;
> +
> +	if (!pmd_val(pmd))
> +		return 0;
> +
> +	if (pmd_mapped(pmd))
> +		return 1;
> +
> +	if (pmd_splitting(pmd))
> +		return 1;
> +	return 0;

Doesn't your new pmd_present() already check for splitting? I think
checking for bit 0 and the new PMD_PRESENT. That would be similar to
what we do with PTE_PROT_NONE. Actually, you could use the same bit for
both.

> +}
> +
> +void set_pmd_at(struct mm_struct *mm, unsigned long addr,
> +		pmd_t *pmdp, pmd_t pmd);
>  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>  
> -#define pmd_present(pmd)	pte_present(pmd_pte(pmd))
> +static inline int pmd_present(pmd_t pmd)
> +{
> +	pte_t pte = pmd_pte(pmd);
> +
> +	if (pte_present(pte))
> +		return 1;
> +
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +	if (pmd_splitting(pmd))
> +		return 1;
> +#endif
> +	return 0;
> +}

[...]

> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 990929c8837e..337519031115 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -22,6 +22,8 @@
>  #include <linux/io.h>
>  #include <linux/mm.h>
>  #include <linux/vmalloc.h>
> +#include <linux/swap.h>
> +#include <linux/swapops.h>
>  
>  #include <asm/barrier.h>
>  #include <asm/cputype.h>
> @@ -1483,3 +1485,21 @@ static int __init prevent_bootmem_remove_init(void)
>  }
>  device_initcall(prevent_bootmem_remove_init);
>  #endif
> +
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +void set_pmd_at(struct mm_struct *mm, unsigned long addr,
> +		pmd_t *pmdp, pmd_t pmd)
> +{
> +	/*
> +	 * PMD migration entries need to retain the splitting PMD
> +	 * representation created with pmdp_invalidate(). But any
> +	 * non-migration entry that might have been invalidated
> +	 * previously still needs to become a normal huge page.
> +	 * Hence selectively clear splitting state here.
> +	 */
> +	if (!is_migration_entry(pmd_to_swp_entry(pmd)))
> +		pmd = pmd_clrsplitting(pmd);
> +
> +	set_pte_at(mm, addr, (pte_t *)pmdp, pmd_pte(pmd));
> +}
> +#endif

So a pmdp_invalidate() returns the old pmd. Do we ever need to rebuild a
pmd based on the actual bits in the new invalidated pmdp? Wondering how
the table bit ends up here that we need to pmd_clrsplitting().

-- 
Catalin


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC V2 1/2] arm64/mm: Change THP helpers per generic memory semantics
  2020-07-02 12:11   ` Catalin Marinas
@ 2020-07-06  3:57     ` Anshuman Khandual
  2020-07-07 17:44       ` Catalin Marinas
  0 siblings, 1 reply; 7+ messages in thread
From: Anshuman Khandual @ 2020-07-06  3:57 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-mm, linux-arm-kernel, will, mark.rutland, ziy,
	Marc Zyngier, Suzuki Poulose, linux-kernel



On 07/02/2020 05:41 PM, Catalin Marinas wrote:
> Hi Anshuman,

Hi Catalin,

> 
> On Mon, Jun 15, 2020 at 06:45:17PM +0530, Anshuman Khandual wrote:
>> --- a/arch/arm64/include/asm/pgtable.h
>> +++ b/arch/arm64/include/asm/pgtable.h
>> @@ -353,15 +353,92 @@ static inline int pmd_protnone(pmd_t pmd)
>>  }
>>  #endif
>>  
>> +#define pmd_table(pmd)	((pmd_val(pmd) & PMD_TYPE_MASK) ==  PMD_TYPE_TABLE)
>> +#define pmd_sect(pmd)	((pmd_val(pmd) & PMD_TYPE_MASK) ==  PMD_TYPE_SECT)
>> +
>> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>  /*
>> - * THP definitions.
>> + * PMD Level Encoding (THP Enabled)
>> + *
>> + * 0b00 - Not valid	Not present	NA
>> + * 0b10 - Not valid	Present		Huge  (Splitting)
>> + * 0b01 - Valid		Present		Huge  (Mapped)
>> + * 0b11 - Valid		Present		Table (Mapped)
>>   */
> 
> I wonder whether it would be easier to read if we add a dedicated
> PMD_SPLITTING bit, only when bit 0 is cleared. This bit can be high (say
> 59), it doesn't really matter as the entry is not valid.

Could make (PMD[1:0] = 0b10) be represented as PMD_SPLITTING just for
better readability. But if possible, IMHO it is more efficient and less
fragile to use HW-defined PTE attribute bit positions, including the SW
usable ones, than the reserved bits for a PMD state representation.

Earlier proposal used PTE_SPECIAL (bit 56) instead. Using PMD_TABLE_BIT
helps save bit 56 for later. Thinking about it again, wouldn't these
unused higher bits [59..63] create any problem? For example, while
enabling THP swapping without split via ARCH_WANTS_THP_SWAP, or something
else later when these higher bits might be required. I am not sure, just
speculating.

But, do you see any particular problem with PMD_TABLE_BIT ?

> 
> The only doubt I have is that pmd_mkinvalid() is used in other contexts
> when it's not necessarily splitting a pmd (search for the
> pmdp_invalidate() calls). So maybe a better name like PMD_PRESENT with a
> comment that pmd_to_page() is valid (i.e. no migration or swap entry).
> Feel free to suggest a better name.

PMD_INVALID_PRESENT sounds better ?

> 
>> +static inline pmd_t pmd_mksplitting(pmd_t pmd)
>> +{
>> +	unsigned long val = pmd_val(pmd);
>>  
>> -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> -#define pmd_trans_huge(pmd)	(pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT))
>> +	return __pmd((val & ~PMD_TYPE_MASK) | PMD_TABLE_BIT);
>> +}
>> +
>> +static inline pmd_t pmd_clrsplitting(pmd_t pmd)
>> +{
>> +	unsigned long val = pmd_val(pmd);
>> +
>> +	return __pmd((val & ~PMD_TYPE_MASK) | PMD_TYPE_SECT);
>> +}
>> +
>> +static inline bool pmd_splitting(pmd_t pmd)
>> +{
>> +	unsigned long val = pmd_val(pmd);
>> +
>> +	if ((val & PMD_TYPE_MASK) == PMD_TABLE_BIT)
>> +		return true;
>> +	return false;
>> +}
>> +
>> +static inline bool pmd_mapped(pmd_t pmd)
>> +{
>> +	return pmd_sect(pmd);
>> +}
>> +
>> +static inline pmd_t pmd_mkinvalid(pmd_t pmd)
>> +{
>> +	/*
>> +	 * Invalidation should not have been invoked on
>> +	 * a PMD table entry. Just warn here otherwise.
>> +	 */
>> +	WARN_ON(pmd_table(pmd));
>> +	return pmd_mksplitting(pmd);
>> +}
> 
> And here we wouldn't need to worry about table checks.
> 
This is just a temporary sanity check validating the assumption
that pmdp_invalidate() would never be called on a table entry.
This can be dropped later on if required.

>> +static inline int pmd_present(pmd_t pmd);
>> +
>> +static inline int pmd_trans_huge(pmd_t pmd)
>> +{
>> +	if (!pmd_present(pmd))
>> +		return 0;
>> +
>> +	if (!pmd_val(pmd))
>> +		return 0;
>> +
>> +	if (pmd_mapped(pmd))
>> +		return 1;
>> +
>> +	if (pmd_splitting(pmd))
>> +		return 1;
>> +	return 0;
> 
> Doesn't your new pmd_present() already check for splitting? I think

I actually meant pte_present() here instead, my bad.

> checking for bit 0 and the new PMD_PRESENT. That would be similar to
> what we do with PTE_PROT_NONE. Actually, you could use the same bit for
> both.

IIUC PROT NONE is supported at PMD level as well. Hence with valid bit
cleared, there is a chance for misinterpretation between pmd_protnone()
and pmd_splitting() if the same bit (PTE_PROT_NONE) is used.

> 
>> +}
>> +
>> +void set_pmd_at(struct mm_struct *mm, unsigned long addr,
>> +		pmd_t *pmdp, pmd_t pmd);
>>  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>>  
>> -#define pmd_present(pmd)	pte_present(pmd_pte(pmd))
>> +static inline int pmd_present(pmd_t pmd)
>> +{
>> +	pte_t pte = pmd_pte(pmd);
>> +
>> +	if (pte_present(pte))
>> +		return 1;
>> +
>> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> +	if (pmd_splitting(pmd))
>> +		return 1;
>> +#endif
>> +	return 0;
>> +}
> 
> [...]
> 
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index 990929c8837e..337519031115 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -22,6 +22,8 @@
>>  #include <linux/io.h>
>>  #include <linux/mm.h>
>>  #include <linux/vmalloc.h>
>> +#include <linux/swap.h>
>> +#include <linux/swapops.h>
>>  
>>  #include <asm/barrier.h>
>>  #include <asm/cputype.h>
>> @@ -1483,3 +1485,21 @@ static int __init prevent_bootmem_remove_init(void)
>>  }
>>  device_initcall(prevent_bootmem_remove_init);
>>  #endif
>> +
>> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> +void set_pmd_at(struct mm_struct *mm, unsigned long addr,
>> +		pmd_t *pmdp, pmd_t pmd)
>> +{
>> +	/*
>> +	 * PMD migration entries need to retain the splitting PMD
>> +	 * representation created with pmdp_invalidate(). But any
>> +	 * non-migration entry that might have been invalidated
>> +	 * previously still needs to become a normal huge page.
>> +	 * Hence selectively clear splitting state here.
>> +	 */
>> +	if (!is_migration_entry(pmd_to_swp_entry(pmd)))
>> +		pmd = pmd_clrsplitting(pmd);
>> +
>> +	set_pte_at(mm, addr, (pte_t *)pmdp, pmd_pte(pmd));
>> +}
>> +#endif
> 
> So a pmdp_invalidate() returns the old pmd. Do we ever need to rebuild a
> pmd based on the actual bits in the new invalidated pmdp? Wondering how
> the table bit ends up here that we need to pmd_clrsplitting().

Yes, a pmd is always rebuilt via set_pmd_at() from the old value returned
by an earlier pmdp_invalidate(), which may since have been changed with
standard page table entry transformations. Basically, it will not be
created afresh from the pfn and VMA flags.

Some example here:

1. dax_entry_mkclean (fs/dax.c)

	pmd = pmdp_invalidate(vma, address, pmdp);
	pmd = pmd_wrprotect(pmd);
	pmd = pmd_mkclean(pmd);
	set_pmd_at(vma->vm_mm, address, pmdp, pmd);

2. clear_soft_dirty_pmd (fs/proc/task_mmu.c)

	old = pmdp_invalidate(vma, addr, pmdp);
	if (pmd_dirty(old))
		pmd = pmd_mkdirty(pmd);
	if (pmd_young(old))
		pmd = pmd_mkyoung(pmd);
	pmd = pmd_wrprotect(pmd);
	pmd = pmd_clear_soft_dirty(pmd);
	set_pmd_at(vma->vm_mm, addr, pmdp, pmd);

3. madvise_free_huge_pmd (mm/huge_memory.c)

	orig_pmd = *pmd;
	....
	pmdp_invalidate(vma, addr, pmd);
	orig_pmd = pmd_mkold(orig_pmd);
	orig_pmd = pmd_mkclean(orig_pmd);
        set_pmd_at(mm, addr, pmd, orig_pmd);

4. page_mkclean_one (mm/rmap.c)

	entry = pmdp_invalidate(vma, address, pmd);
	entry = pmd_wrprotect(entry);
	entry = pmd_mkclean(entry);
	set_pmd_at(vma->vm_mm, address, pmd, entry);

Any additional bit set in PMD via pmdp_invalidate() needs to be
cleared off in set_pmd_at(), unless it is a migration entry.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC V2 1/2] arm64/mm: Change THP helpers per generic memory semantics
  2020-07-06  3:57     ` Anshuman Khandual
@ 2020-07-07 17:44       ` Catalin Marinas
  2020-08-17  5:43         ` Anshuman Khandual
  0 siblings, 1 reply; 7+ messages in thread
From: Catalin Marinas @ 2020-07-07 17:44 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: linux-mm, linux-arm-kernel, will, mark.rutland, ziy,
	Marc Zyngier, Suzuki Poulose, linux-kernel

On Mon, Jul 06, 2020 at 09:27:04AM +0530, Anshuman Khandual wrote:
> On 07/02/2020 05:41 PM, Catalin Marinas wrote:
> > On Mon, Jun 15, 2020 at 06:45:17PM +0530, Anshuman Khandual wrote:
> >> --- a/arch/arm64/include/asm/pgtable.h
> >> +++ b/arch/arm64/include/asm/pgtable.h
> >> @@ -353,15 +353,92 @@ static inline int pmd_protnone(pmd_t pmd)
> >>  }
> >>  #endif
> >>  
> >> +#define pmd_table(pmd)	((pmd_val(pmd) & PMD_TYPE_MASK) ==  PMD_TYPE_TABLE)
> >> +#define pmd_sect(pmd)	((pmd_val(pmd) & PMD_TYPE_MASK) ==  PMD_TYPE_SECT)
> >> +
> >> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> >>  /*
> >> - * THP definitions.
> >> + * PMD Level Encoding (THP Enabled)
> >> + *
> >> + * 0b00 - Not valid	Not present	NA
> >> + * 0b10 - Not valid	Present		Huge  (Splitting)
> >> + * 0b01 - Valid		Present		Huge  (Mapped)
> >> + * 0b11 - Valid		Present		Table (Mapped)
> >>   */
> > 
> > I wonder whether it would be easier to read if we add a dedicated
> > PMD_SPLITTING bit, only when bit 0 is cleared. This bit can be high (say
> > 59), it doesn't really matter as the entry is not valid.
> 
> Could make (PMD[1:0] = 0b10) be represented as PMD_SPLITTING just for
> better readability. But if possible, IMHO it is more efficient and less
> fragile to use HW-defined PTE attribute bit positions, including the SW
> usable ones, than the reserved bits for a PMD state representation.
> 
> Earlier proposal used PTE_SPECIAL (bit 56) instead. Using PMD_TABLE_BIT
> helps save bit 56 for later. Thinking about it again, wouldn't these
> unused higher bits [59..63] create any problem? For example, while
> enabling THP swapping without split via ARCH_WANTS_THP_SWAP, or something
> else later when these higher bits might be required. I am not sure, just
> speculating.

The swap encoding goes to bit 57, so going higher shouldn't be an issue.

> But, do you see any particular problem with PMD_TABLE_BIT ?

No. Only that we have some precedent like PTE_PROT_NONE (bit 58) and
wondering whether we could use a high bit as well here. If we can get
them to overlap, it simplifies this patch further.

> > The only doubt I have is that pmd_mkinvalid() is used in other contexts
> > when it's not necessarily splitting a pmd (search for the
> > pmdp_invalidate() calls). So maybe a better name like PMD_PRESENT with a
> > comment that pmd_to_page() is valid (i.e. no migration or swap entry).
> > Feel free to suggest a better name.
> 
> PMD_INVALID_PRESENT sounds better ?

No strong opinion either way. Yours is clearer.

> >> +static inline pmd_t pmd_mksplitting(pmd_t pmd)
> >> +{
> >> +	unsigned long val = pmd_val(pmd);
> >>  
> >> -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> >> -#define pmd_trans_huge(pmd)	(pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT))
> >> +	return __pmd((val & ~PMD_TYPE_MASK) | PMD_TABLE_BIT);
> >> +}
> >> +
> >> +static inline pmd_t pmd_clrsplitting(pmd_t pmd)
> >> +{
> >> +	unsigned long val = pmd_val(pmd);
> >> +
> >> +	return __pmd((val & ~PMD_TYPE_MASK) | PMD_TYPE_SECT);
> >> +}
> >> +
> >> +static inline bool pmd_splitting(pmd_t pmd)
> >> +{
> >> +	unsigned long val = pmd_val(pmd);
> >> +
> >> +	if ((val & PMD_TYPE_MASK) == PMD_TABLE_BIT)
> >> +		return true;
> >> +	return false;
> >> +}
> >> +
> >> +static inline bool pmd_mapped(pmd_t pmd)
> >> +{
> >> +	return pmd_sect(pmd);
> >> +}
> >> +
> >> +static inline pmd_t pmd_mkinvalid(pmd_t pmd)
> >> +{
> >> +	/*
> >> +	 * Invalidation should not have been invoked on
> >> +	 * a PMD table entry. Just warn here otherwise.
> >> +	 */
> >> +	WARN_ON(pmd_table(pmd));
> >> +	return pmd_mksplitting(pmd);
> >> +}
> > 
> > And here we wouldn't need to worry about table checks.
> 
> This is just a temporary sanity check validating the assumption
> that pmdp_invalidate() would never be called on a table entry.
> This can be dropped later on if required.

You could use a VM_WARN_ON.

> >> +static inline int pmd_present(pmd_t pmd);
> >> +
> >> +static inline int pmd_trans_huge(pmd_t pmd)
> >> +{
> >> +	if (!pmd_present(pmd))
> >> +		return 0;
> >> +
> >> +	if (!pmd_val(pmd))
> >> +		return 0;
> >> +
> >> +	if (pmd_mapped(pmd))
> >> +		return 1;
> >> +
> >> +	if (pmd_splitting(pmd))
> >> +		return 1;
> >> +	return 0;
> > 
> > Doesn't your new pmd_present() already check for splitting? I think
> 
> I actually meant pte_present() here instead, my bad.
> 
> > checking for bit 0 and the new PMD_PRESENT. That would be similar to
> > what we do with PTE_PROT_NONE. Actually, you could use the same bit for
> > both.
> 
> IIUC PROT NONE is supported at PMD level as well. Hence with valid bit
> cleared, there is a chance for misinterpretation between pmd_protnone()
> and pmd_splitting() if the same bit (PTE_PROT_NONE) is used.

We can indeed have a PROT_NONE pmd but does it matter? All you need is
that pmdp_invalidate() returns the original (present pmd) and writes a
value that is still pmd_present() while invalid. You never modify the
new value again AFAICT (only the old one to rebuild the pmd).

It is indeed a problem if set_pmd_at() clears the new
PMD_INVALID_PRESENT bit but my understanding is that it doesn't need to
(see below).

> >> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> >> index 990929c8837e..337519031115 100644
> >> --- a/arch/arm64/mm/mmu.c
> >> +++ b/arch/arm64/mm/mmu.c
> >> @@ -22,6 +22,8 @@
> >>  #include <linux/io.h>
> >>  #include <linux/mm.h>
> >>  #include <linux/vmalloc.h>
> >> +#include <linux/swap.h>
> >> +#include <linux/swapops.h>
> >>  
> >>  #include <asm/barrier.h>
> >>  #include <asm/cputype.h>
> >> @@ -1483,3 +1485,21 @@ static int __init prevent_bootmem_remove_init(void)
> >>  }
> >>  device_initcall(prevent_bootmem_remove_init);
> >>  #endif
> >> +
> >> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> >> +void set_pmd_at(struct mm_struct *mm, unsigned long addr,
> >> +		pmd_t *pmdp, pmd_t pmd)
> >> +{
> >> +	/*
> >> +	 * PMD migration entries need to retain the splitting PMD
> >> +	 * representation created with pmdp_invalidate(). But any
> >> +	 * non-migration entry that might have been invalidated
> >> +	 * previously still needs to become a normal huge page.
> >> +	 * Hence selectively clear splitting state here.
> >> +	 */
> >> +	if (!is_migration_entry(pmd_to_swp_entry(pmd)))
> >> +		pmd = pmd_clrsplitting(pmd);
> >> +
> >> +	set_pte_at(mm, addr, (pte_t *)pmdp, pmd_pte(pmd));
> >> +}
> >> +#endif
> > 
> > So a pmdp_invalidate() returns the old pmd. Do we ever need to rebuild a
> > pmd based on the actual bits in the new invalidated pmdp? Wondering how
> > the table bit ends up here that we need to pmd_clrsplitting().
> 
> Yes, a pmd is always rebuilt via set_pmd_at() from the old value returned
> by an earlier pmdp_invalidate(), which may since have been changed with
> standard page table entry transformations. Basically, it will not be
> created afresh from the pfn and VMA flags.

My point is that pmdp_invalidate() is never called on an already invalid
pmd. A valid pmd should never have the PMD_INVALID_PRESENT bit set.
Therefore, set_pmd_at() does not need to clear any such bit as it wasn't
in the old value returned by pmdp_invalidate().

> Any additional bit set in PMD via pmdp_invalidate() needs to be
> cleared off in set_pmd_at(), unless it is a migration entry.

I'm not convinced we need to unless we nest pmdp_invalidate() calls
(have you seen any evidence of this?).

-- 
Catalin


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC V2 1/2] arm64/mm: Change THP helpers per generic memory semantics
  2020-07-07 17:44       ` Catalin Marinas
@ 2020-08-17  5:43         ` Anshuman Khandual
  0 siblings, 0 replies; 7+ messages in thread
From: Anshuman Khandual @ 2020-08-17  5:43 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-mm, linux-arm-kernel, will, mark.rutland, ziy,
	Marc Zyngier, Suzuki Poulose, linux-kernel



On 07/07/2020 11:14 PM, Catalin Marinas wrote:
> On Mon, Jul 06, 2020 at 09:27:04AM +0530, Anshuman Khandual wrote:
>> On 07/02/2020 05:41 PM, Catalin Marinas wrote:
>>> On Mon, Jun 15, 2020 at 06:45:17PM +0530, Anshuman Khandual wrote:
>>>> --- a/arch/arm64/include/asm/pgtable.h
>>>> +++ b/arch/arm64/include/asm/pgtable.h
>>>> @@ -353,15 +353,92 @@ static inline int pmd_protnone(pmd_t pmd)
>>>>  }
>>>>  #endif
>>>>  
>>>> +#define pmd_table(pmd)	((pmd_val(pmd) & PMD_TYPE_MASK) ==  PMD_TYPE_TABLE)
>>>> +#define pmd_sect(pmd)	((pmd_val(pmd) & PMD_TYPE_MASK) ==  PMD_TYPE_SECT)
>>>> +
>>>> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>>>  /*
>>>> - * THP definitions.
>>>> + * PMD Level Encoding (THP Enabled)
>>>> + *
>>>> + * 0b00 - Not valid	Not present	NA
>>>> + * 0b10 - Not valid	Present		Huge  (Splitting)
>>>> + * 0b01 - Valid		Present		Huge  (Mapped)
>>>> + * 0b11 - Valid		Present		Table (Mapped)
>>>>   */
>>>
>>> I wonder whether it would be easier to read if we add a dedicated
>>> PMD_SPLITTING bit, only when bit 0 is cleared. This bit can be high (say
>>> 59), it doesn't really matter as the entry is not valid.
>>
>> Could make (PMD[1:0] = 0b10) be represented as PMD_SPLITTING just for
>> better readability. But if possible, IMHO it is more efficient and less
>> fragile to use HW-defined PTE attribute bit positions, including the SW
>> usable ones, than the reserved bits for a PMD state representation.
>>
>> Earlier proposal used PTE_SPECIAL (bit 56) instead. Using PMD_TABLE_BIT
>> helps save bit 56 for later. Thinking about it again, wouldn't these
>> unused higher bits [59..63] create any problem? For example, while
>> enabling THP swapping without split via ARCH_WANTS_THP_SWAP, or something
>> else later when these higher bits might be required. I am not sure, just
>> speculating.
> 
> The swap encoding goes to bit 57, so going higher shouldn't be an issue.
> 
>> But, do you see any particular problem with PMD_TABLE_BIT ?
> 
> No. Only that we have some precedent like PTE_PROT_NONE (bit 58) and
> wondering whether we could use a high bit as well here. If we can get
> them to overlap, it simplifies this patch further.
> 
>>> The only doubt I have is that pmd_mkinvalid() is used in other contexts
>>> when it's not necessarily splitting a pmd (search for the
>>> pmdp_invalidate() calls). So maybe a better name like PMD_PRESENT with a
>>> comment that pmd_to_page() is valid (i.e. no migration or swap entry).
>>> Feel free to suggest a better name.
>>
>> PMD_INVALID_PRESENT sounds better ?
> 
> No strong opinion either way. Yours is clearer.
> 
>>>> +static inline pmd_t pmd_mksplitting(pmd_t pmd)
>>>> +{
>>>> +	unsigned long val = pmd_val(pmd);
>>>>  
>>>> -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>>> -#define pmd_trans_huge(pmd)	(pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT))
>>>> +	return __pmd((val & ~PMD_TYPE_MASK) | PMD_TABLE_BIT);
>>>> +}
>>>> +
>>>> +static inline pmd_t pmd_clrsplitting(pmd_t pmd)
>>>> +{
>>>> +	unsigned long val = pmd_val(pmd);
>>>> +
>>>> +	return __pmd((val & ~PMD_TYPE_MASK) | PMD_TYPE_SECT);
>>>> +}
>>>> +
>>>> +static inline bool pmd_splitting(pmd_t pmd)
>>>> +{
>>>> +	unsigned long val = pmd_val(pmd);
>>>> +
>>>> +	if ((val & PMD_TYPE_MASK) == PMD_TABLE_BIT)
>>>> +		return true;
>>>> +	return false;
>>>> +}
>>>> +
>>>> +static inline bool pmd_mapped(pmd_t pmd)
>>>> +{
>>>> +	return pmd_sect(pmd);
>>>> +}
>>>> +
>>>> +static inline pmd_t pmd_mkinvalid(pmd_t pmd)
>>>> +{
>>>> +	/*
>>>> +	 * Invalidation should not have been invoked on
>>>> +	 * a PMD table entry. Just warn here otherwise.
>>>> +	 */
>>>> +	WARN_ON(pmd_table(pmd));
>>>> +	return pmd_mksplitting(pmd);
>>>> +}
>>>
>>> And here we wouldn't need to worry about table checks.
>>
>> This is just a temporary sanity check validating the assumption
>> that pmdp_invalidate() would never be called on a table entry.
>> This can be dropped later on if required.
> 
> You could use a VM_WARN_ON.
> 
>>>> +static inline int pmd_present(pmd_t pmd);
>>>> +
>>>> +static inline int pmd_trans_huge(pmd_t pmd)
>>>> +{
>>>> +	if (!pmd_present(pmd))
>>>> +		return 0;
>>>> +
>>>> +	if (!pmd_val(pmd))
>>>> +		return 0;
>>>> +
>>>> +	if (pmd_mapped(pmd))
>>>> +		return 1;
>>>> +
>>>> +	if (pmd_splitting(pmd))
>>>> +		return 1;
>>>> +	return 0;
>>>
>>> Doesn't your new pmd_present() already check for splitting? I think
>>
>> I actually meant pte_present() here instead, my bad.
>>
>>> checking for bit 0 and the new PMD_PRESENT. That would be similar to
>>> what we do with PTE_PROT_NONE. Actually, you could use the same bit for
>>> both.
>>
>> IIUC PROT NONE is supported at PMD level as well. Hence with valid bit
>> cleared, there is a chance for misinterpretation between pmd_protnone()
>> and pmd_splitting() if the same bit (PTE_PROT_NONE) is used.
> 
> We can indeed have a PROT_NONE pmd but does it matter? All you need is
> that pmdp_invalidate() returns the original (present pmd) and writes a
> value that is still pmd_present() while invalid. You never modify the
> new value again AFAICT (only the old one to rebuild the pmd).

But while the PMD entry remains invalidated yet present, it would be
indistinguishable from pmd_protnone() if we chose to use the PROT_NONE
bit here to make pmd_present() return true. And an invalidated PMD entry
is not necessarily a pmd_protnone() entry.

> 
> It is indeed a problem if set_pmd_at() clears the new
> PMD_INVALID_PRESENT bit but my understanding is that it doesn't need to
> (see below).
> 
>>>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>>>> index 990929c8837e..337519031115 100644
>>>> --- a/arch/arm64/mm/mmu.c
>>>> +++ b/arch/arm64/mm/mmu.c
>>>> @@ -22,6 +22,8 @@
>>>>  #include <linux/io.h>
>>>>  #include <linux/mm.h>
>>>>  #include <linux/vmalloc.h>
>>>> +#include <linux/swap.h>
>>>> +#include <linux/swapops.h>
>>>>  
>>>>  #include <asm/barrier.h>
>>>>  #include <asm/cputype.h>
>>>> @@ -1483,3 +1485,21 @@ static int __init prevent_bootmem_remove_init(void)
>>>>  }
>>>>  device_initcall(prevent_bootmem_remove_init);
>>>>  #endif
>>>> +
>>>> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>>> +void set_pmd_at(struct mm_struct *mm, unsigned long addr,
>>>> +		pmd_t *pmdp, pmd_t pmd)
>>>> +{
>>>> +	/*
>>>> +	 * PMD migration entries need to retain the splitting PMD
>>>> +	 * representation created with pmdp_invalidate(). But any
>>>> +	 * non-migration entry that might have been invalidated
>>>> +	 * previously still needs to become a normal huge page.
>>>> +	 * Hence selectively clear splitting state here.
>>>> +	 */
>>>> +	if (!is_migration_entry(pmd_to_swp_entry(pmd)))
>>>> +		pmd = pmd_clrsplitting(pmd);
>>>> +
>>>> +	set_pte_at(mm, addr, (pte_t *)pmdp, pmd_pte(pmd));
>>>> +}
>>>> +#endif
>>>
>>> So a pmdp_invalidate() returns the old pmd. Do we ever need to rebuild a
>>> pmd based on the actual bits in the new invalidated pmdp? Wondering how
>>> the table bit ends up here that we need to pmd_clrsplitting().
>>
>> Yes, a pmd is always rebuilt via set_pmd_at() from the old value returned
>> by an earlier pmdp_invalidate(), which may since have been changed with
>> standard page table entry transformations. Basically, it will not be
>> created afresh from the pfn and VMA flags.
> 
> My point is that pmdp_invalidate() is never called on an already invalid
> pmd. A valid pmd should never have the PMD_INVALID_PRESENT bit set.
> Therefore, set_pmd_at() does not need to clear any such bit as it wasn't
> in the old value returned by pmdp_invalidate().
> 
>> Any additional bit set in PMD via pmdp_invalidate() needs to be
>> cleared off in set_pmd_at(), unless it is a migration entry.
> 
> I'm not convinced we need to unless we nest pmdp_invalidate() calls
> (have you seen any evidence of this?).

You are right, set_pmd_at() does not need to clear that extra bit. As you
had suggested earlier, using bit 59 as PMD_PRESENT_INVALID here does work.
Will send out the next version soon.
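
For context, a hedged sketch of what the agreed bit-59 scheme could look
like - an assumption drawn from this discussion, not the posted next
version:

	/* SW bit, only meaningful when PMD_SECT_VALID is clear */
	#define PMD_PRESENT_INVALID	(_AT(pteval_t, 1) << 59)

	static inline pmd_t pmd_mkinvalid(pmd_t pmd)
	{
		return __pmd((pmd_val(pmd) & ~PMD_SECT_VALID) | PMD_PRESENT_INVALID);
	}

	static inline int pmd_present(pmd_t pmd)
	{
		return pte_present(pmd_pte(pmd)) ||
		       !!(pmd_val(pmd) & PMD_PRESENT_INVALID);
	}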


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-08-17  5:44 UTC | newest]

Thread overview: 7+ messages
2020-06-15 13:15 [RFC V2 0/2] arm64/mm: Enable THP migration Anshuman Khandual
2020-06-15 13:15 ` [RFC V2 1/2] arm64/mm: Change THP helpers per generic memory semantics Anshuman Khandual
2020-07-02 12:11   ` Catalin Marinas
2020-07-06  3:57     ` Anshuman Khandual
2020-07-07 17:44       ` Catalin Marinas
2020-08-17  5:43         ` Anshuman Khandual
2020-06-15 13:15 ` [RFC V2 2/2] arm64/mm: Enable THP migration Anshuman Khandual
