linux-kernel.vger.kernel.org archive mirror
* [RFC 0/2] arm64/mm: Enable THP migration
@ 2019-06-27 12:48 Anshuman Khandual
  2019-06-27 12:48 ` [RFC 1/2] arm64/mm: Change THP helpers to comply with generic MM semantics Anshuman Khandual
  2019-06-27 12:48 ` [RFC 2/2] arm64/mm: Enable THP migration without split Anshuman Khandual
  0 siblings, 2 replies; 10+ messages in thread
From: Anshuman Khandual @ 2019-06-27 12:48 UTC (permalink / raw)
  To: linux-mm
  Cc: Anshuman Khandual, Catalin Marinas, Will Deacon, Mark Rutland,
	Marc Zyngier, Suzuki Poulose, linux-arm-kernel, linux-kernel

This series enables THP migration without split on arm64 by subscribing
to ARCH_ENABLE_THP_MIGRATION. Before that it modifies arm64 platform THP
helpers like pmd_present() and pmd_trans_huge() to comply with expected
generic MM semantics as concluded from a previous discussion [1].

Initial THP migration and stress tests look good for various THP sizes. I
will continue testing this further, but in the meantime I am looking for
some early reviews, feedback and suggestions on the approach.

This is based on linux-next tree (next-20190626).

Question:

Instead of directly using PTE_SPECIAL, would it be better to override the
same bit as PMD_SPLITTING and create its associated helpers to make this
clearer and more explicit?

[1] https://lkml.org/lkml/2018/10/9/220

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Suzuki Poulose <suzuki.poulose@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org

Anshuman Khandual (2):
  arm64/mm: Change THP helpers to comply with generic MM semantics
  arm64/mm: Enable THP migration without split

 arch/arm64/Kconfig               |  4 ++++
 arch/arm64/include/asm/pgtable.h | 32 +++++++++++++++++++++++++++++---
 2 files changed, 33 insertions(+), 3 deletions(-)

-- 
2.7.4


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [RFC 1/2] arm64/mm: Change THP helpers to comply with generic MM semantics
  2019-06-27 12:48 [RFC 0/2] arm64/mm: Enable THP migration Anshuman Khandual
@ 2019-06-27 12:48 ` Anshuman Khandual
  2019-06-27 15:31   ` Zi Yan
  2019-06-28 10:20   ` Catalin Marinas
  2019-06-27 12:48 ` [RFC 2/2] arm64/mm: Enable THP migration without split Anshuman Khandual
  1 sibling, 2 replies; 10+ messages in thread
From: Anshuman Khandual @ 2019-06-27 12:48 UTC (permalink / raw)
  To: linux-mm
  Cc: Anshuman Khandual, Catalin Marinas, Will Deacon, Mark Rutland,
	Marc Zyngier, Suzuki Poulose, linux-arm-kernel, linux-kernel

pmd_present() and pmd_trans_huge() are expected to behave in the following
manner during various phases of a given PMD. It is derived from a previous
detailed discussion on this topic [1] and present THP documentation [2].

pmd_present(pmd):

- Returns true if pmd refers to system RAM with a valid pmd_page(pmd)
- Returns false if pmd does not refer to system RAM - Invalid pmd_page(pmd)

pmd_trans_huge(pmd):

- Returns true if pmd refers to system RAM and is a trans huge mapping

-------------------------------------------------------------------------
|	PMD states	|	pmd_present	|	pmd_trans_huge	|
-------------------------------------------------------------------------
|	Mapped		|	Yes		|	Yes		|
-------------------------------------------------------------------------
|	Splitting	|	Yes		|	Yes		|
-------------------------------------------------------------------------
|	Migration/Swap	|	No		|	No		|
-------------------------------------------------------------------------

The problem:

PMD is first invalidated with pmdp_invalidate() before it is split. This
invalidation clears PMD_SECT_VALID as below.

PMD Split -> pmdp_invalidate() -> pmd_mknotpresent -> Clears PMD_SECT_VALID

Once PMD_SECT_VALID gets cleared, pmd_present() returns false on the PMD
entry. Another bit apart from PMD_SECT_VALID is needed to reaffirm
pmd_present() as true during the THP split process. To comply with the
above semantics, pmd_trans_huge() should also check pmd_present() first
before testing for an actual transparent huge mapping.

The solution:

Ideally PMD_TYPE_SECT should have been used here instead, but it shares a
bit position with PMD_SECT_VALID, which is cleared during THP invalidation.
Hence it will not be available for the pmd_present() check after
pmdp_invalidate().

PTE_SPECIAL never gets used for PMD mappings, i.e. there is no pmd_special().
Hence this bit can be set on the PMD entry during invalidation which can
help in making pmd_present() return true and in recognizing the fact that
it still points to memory.

This bit is transient. During the split process it will be overridden by a
page table page representing normal pages in place of the erstwhile huge
page. Other pmdp_invalidate() callers always write a fresh PMD value on
the entry, overriding this transient PTE_SPECIAL bit, which makes it safe.
In the past, the former pmd_[mk]splitting() helpers used PTE_SPECIAL in a
similar manner.

[1]: https://lkml.org/lkml/2018/10/17/231
[2]: https://www.kernel.org/doc/Documentation/vm/transhuge.txt

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Suzuki Poulose <suzuki.poulose@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 arch/arm64/include/asm/pgtable.h | 27 ++++++++++++++++++++++++---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 87a4b2d..90d4e24 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -368,15 +368,31 @@ static inline int pmd_protnone(pmd_t pmd)
 }
 #endif
 
+static inline int pmd_present(pmd_t pmd)
+{
+	if (pte_present(pmd_pte(pmd)))
+		return 1;
+
+	return pte_special(pmd_pte(pmd));
+}
+
 /*
  * THP definitions.
  */
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-#define pmd_trans_huge(pmd)	(pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT))
+static inline int pmd_trans_huge(pmd_t pmd)
+{
+	if (!pmd_val(pmd))
+		return 0;
+
+	if (!pmd_present(pmd))
+		return 0;
+
+	return !(pmd_val(pmd) & PMD_TABLE_BIT);
+}
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
-#define pmd_present(pmd)	pte_present(pmd_pte(pmd))
 #define pmd_dirty(pmd)		pte_dirty(pmd_pte(pmd))
 #define pmd_young(pmd)		pte_young(pmd_pte(pmd))
 #define pmd_valid(pmd)		pte_valid(pmd_pte(pmd))
@@ -386,7 +402,12 @@ static inline int pmd_protnone(pmd_t pmd)
 #define pmd_mkclean(pmd)	pte_pmd(pte_mkclean(pmd_pte(pmd)))
 #define pmd_mkdirty(pmd)	pte_pmd(pte_mkdirty(pmd_pte(pmd)))
 #define pmd_mkyoung(pmd)	pte_pmd(pte_mkyoung(pmd_pte(pmd)))
-#define pmd_mknotpresent(pmd)	(__pmd(pmd_val(pmd) & ~PMD_SECT_VALID))
+
+static inline pmd_t pmd_mknotpresent(pmd_t pmd)
+{
+	pmd = pte_pmd(pte_mkspecial(pmd_pte(pmd)));
+	return __pmd(pmd_val(pmd) & ~PMD_SECT_VALID);
+}
 
 #define pmd_thp_or_huge(pmd)	(pmd_huge(pmd) || pmd_trans_huge(pmd))
 
-- 
2.7.4



* [RFC 2/2] arm64/mm: Enable THP migration without split
  2019-06-27 12:48 [RFC 0/2] arm64/mm: Enable THP migration Anshuman Khandual
  2019-06-27 12:48 ` [RFC 1/2] arm64/mm: Change THP helpers to comply with generic MM semantics Anshuman Khandual
@ 2019-06-27 12:48 ` Anshuman Khandual
  1 sibling, 0 replies; 10+ messages in thread
From: Anshuman Khandual @ 2019-06-27 12:48 UTC (permalink / raw)
  To: linux-mm
  Cc: Anshuman Khandual, Catalin Marinas, Will Deacon, Mark Rutland,
	Marc Zyngier, Suzuki Poulose, linux-arm-kernel, linux-kernel

In certain page migration situations a THP can be migrated without being
split into its constituent normal pages. This saves the time required to
split the THP and put it back together later, and also preserves the wider
address range translation of the existing huge TLB entry, reducing future
page fault costs.

The previous patch changed the THP helper functions to comply with generic
MM semantics, clearing the path for THP migration support. Hence just
enable it.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Suzuki Poulose <suzuki.poulose@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 arch/arm64/Kconfig               | 4 ++++
 arch/arm64/include/asm/pgtable.h | 5 +++++
 2 files changed, 9 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 0758d89..27bd8c4 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1589,6 +1589,10 @@ config ARCH_ENABLE_HUGEPAGE_MIGRATION
 	def_bool y
 	depends on HUGETLB_PAGE && MIGRATION
 
+config ARCH_ENABLE_THP_MIGRATION
+	def_bool y
+	depends on TRANSPARENT_HUGEPAGE
+
 menu "Power management options"
 
 source "kernel/power/Kconfig"
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 90d4e24..4860573 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -851,6 +851,11 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
 #define __pte_to_swp_entry(pte)	((swp_entry_t) { pte_val(pte) })
 #define __swp_entry_to_pte(swp)	((pte_t) { (swp).val })
 
+#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
+#define __pmd_to_swp_entry(pmd)        ((swp_entry_t) { pmd_val(pmd) })
+#define __swp_entry_to_pmd(swp)        __pmd((swp).val)
+#endif
+
 /*
  * Ensure that there are not more swap files than can be encoded in the kernel
  * PTEs.
-- 
2.7.4



* Re: [RFC 1/2] arm64/mm: Change THP helpers to comply with generic MM semantics
  2019-06-27 12:48 ` [RFC 1/2] arm64/mm: Change THP helpers to comply with generic MM semantics Anshuman Khandual
@ 2019-06-27 15:31   ` Zi Yan
  2019-06-28  3:46     ` Anshuman Khandual
  2019-06-28 10:20   ` Catalin Marinas
  1 sibling, 1 reply; 10+ messages in thread
From: Zi Yan @ 2019-06-27 15:31 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: linux-mm, Catalin Marinas, Will Deacon, Mark Rutland,
	Marc Zyngier, Suzuki Poulose, linux-arm-kernel, linux-kernel


On 27 Jun 2019, at 8:48, Anshuman Khandual wrote:

> pmd_present() and pmd_trans_huge() are expected to behave in the following
> manner during various phases of a given PMD. It is derived from a previous
> detailed discussion on this topic [1] and present THP documentation [2].
>
> pmd_present(pmd):
>
> - Returns true if pmd refers to system RAM with a valid pmd_page(pmd)
> - Returns false if pmd does not refer to system RAM - Invalid pmd_page(pmd)
>
> pmd_trans_huge(pmd):
>
> - Returns true if pmd refers to system RAM and is a trans huge mapping
>
> -------------------------------------------------------------------------
> |	PMD states	|	pmd_present	|	pmd_trans_huge	|
> -------------------------------------------------------------------------
> |	Mapped		|	Yes		|	Yes		|
> -------------------------------------------------------------------------
> |	Splitting	|	Yes		|	Yes		|
> -------------------------------------------------------------------------
> |	Migration/Swap	|	No		|	No		|
> -------------------------------------------------------------------------
>
> The problem:
>
> PMD is first invalidated with pmdp_invalidate() before it is split. This
> invalidation clears PMD_SECT_VALID as below.
>
> PMD Split -> pmdp_invalidate() -> pmd_mknotpresent -> Clears PMD_SECT_VALID
>
> Once PMD_SECT_VALID gets cleared, pmd_present() returns false on the PMD
> entry. Another bit apart from PMD_SECT_VALID is needed to reaffirm
> pmd_present() as true during the THP split process. To comply with the
> above semantics, pmd_trans_huge() should also check pmd_present() first
> before testing for an actual transparent huge mapping.
>
> The solution:
>
> Ideally PMD_TYPE_SECT should have been used here instead, but it shares a
> bit position with PMD_SECT_VALID, which is cleared during THP invalidation.
> Hence it will not be available for the pmd_present() check after
> pmdp_invalidate().
>
> PTE_SPECIAL never gets used for PMD mappings, i.e. there is no pmd_special().
> Hence this bit can be set on the PMD entry during invalidation which can
> help in making pmd_present() return true and in recognizing the fact that
> it still points to memory.
>
> This bit is transient. During the split process it will be overridden by a
> page table page representing normal pages in place of the erstwhile huge
> page. Other pmdp_invalidate() callers always write a fresh PMD value on
> the entry, overriding this transient PTE_SPECIAL bit, which makes it safe.
> In the past, the former pmd_[mk]splitting() helpers used PTE_SPECIAL in a
> similar manner.
>
> [1]: https://lkml.org/lkml/2018/10/17/231

Just want to point out that the lkml.org link might not be stable.
This one would be better: https://lore.kernel.org/linux-mm/20181017020930.GN30832@redhat.com/


—
Best Regards,
Yan Zi



* Re: [RFC 1/2] arm64/mm: Change THP helpers to comply with generic MM semantics
  2019-06-27 15:31   ` Zi Yan
@ 2019-06-28  3:46     ` Anshuman Khandual
  0 siblings, 0 replies; 10+ messages in thread
From: Anshuman Khandual @ 2019-06-28  3:46 UTC (permalink / raw)
  To: Zi Yan
  Cc: linux-mm, Catalin Marinas, Will Deacon, Mark Rutland,
	Marc Zyngier, Suzuki Poulose, linux-arm-kernel, linux-kernel



On 06/27/2019 09:01 PM, Zi Yan wrote:
> On 27 Jun 2019, at 8:48, Anshuman Khandual wrote:
> 
>> pmd_present() and pmd_trans_huge() are expected to behave in the following
>> manner during various phases of a given PMD. It is derived from a previous
>> detailed discussion on this topic [1] and present THP documentation [2].
>>
>> pmd_present(pmd):
>>
>> - Returns true if pmd refers to system RAM with a valid pmd_page(pmd)
>> - Returns false if pmd does not refer to system RAM - Invalid pmd_page(pmd)
>>
>> pmd_trans_huge(pmd):
>>
>> - Returns true if pmd refers to system RAM and is a trans huge mapping
>>
>> -------------------------------------------------------------------------
>> |	PMD states	|	pmd_present	|	pmd_trans_huge	|
>> -------------------------------------------------------------------------
>> |	Mapped		|	Yes		|	Yes		|
>> -------------------------------------------------------------------------
>> |	Splitting	|	Yes		|	Yes		|
>> -------------------------------------------------------------------------
>> |	Migration/Swap	|	No		|	No		|
>> -------------------------------------------------------------------------
>>
>> The problem:
>>
>> PMD is first invalidated with pmdp_invalidate() before it is split. This
>> invalidation clears PMD_SECT_VALID as below.
>>
>> PMD Split -> pmdp_invalidate() -> pmd_mknotpresent -> Clears PMD_SECT_VALID
>>
>> Once PMD_SECT_VALID gets cleared, pmd_present() returns false on the PMD
>> entry. Another bit apart from PMD_SECT_VALID is needed to reaffirm
>> pmd_present() as true during the THP split process. To comply with the
>> above semantics, pmd_trans_huge() should also check pmd_present() first
>> before testing for an actual transparent huge mapping.
>>
>> The solution:
>>
>> Ideally PMD_TYPE_SECT should have been used here instead, but it shares a
>> bit position with PMD_SECT_VALID, which is cleared during THP invalidation.
>> Hence it will not be available for the pmd_present() check after
>> pmdp_invalidate().
>>
>> PTE_SPECIAL never gets used for PMD mappings, i.e. there is no pmd_special().
>> Hence this bit can be set on the PMD entry during invalidation which can
>> help in making pmd_present() return true and in recognizing the fact that
>> it still points to memory.
>>
>> This bit is transient. During the split process it will be overridden by a
>> page table page representing normal pages in place of the erstwhile huge
>> page. Other pmdp_invalidate() callers always write a fresh PMD value on
>> the entry, overriding this transient PTE_SPECIAL bit, which makes it safe.
>> In the past, the former pmd_[mk]splitting() helpers used PTE_SPECIAL in a
>> similar manner.
>>
>> [1]: https://lkml.org/lkml/2018/10/17/231
> 
> Just want to point out that the lkml.org link might not be stable.
> This one would be better: https://lore.kernel.org/linux-mm/20181017020930.GN30832@redhat.com/

Sure, will update the link in the commit message. Thanks!


* Re: [RFC 1/2] arm64/mm: Change THP helpers to comply with generic MM semantics
  2019-06-27 12:48 ` [RFC 1/2] arm64/mm: Change THP helpers to comply with generic MM semantics Anshuman Khandual
  2019-06-27 15:31   ` Zi Yan
@ 2019-06-28 10:20   ` Catalin Marinas
  2019-07-02  3:37     ` Anshuman Khandual
  1 sibling, 1 reply; 10+ messages in thread
From: Catalin Marinas @ 2019-06-28 10:20 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: linux-mm, Will Deacon, Mark Rutland, Marc Zyngier,
	Suzuki Poulose, linux-arm-kernel, linux-kernel

Hi Anshuman,

On Thu, Jun 27, 2019 at 06:18:15PM +0530, Anshuman Khandual wrote:
> pmd_present() and pmd_trans_huge() are expected to behave in the following
> manner during various phases of a given PMD. It is derived from a previous
> detailed discussion on this topic [1] and present THP documentation [2].
> 
> pmd_present(pmd):
> 
> - Returns true if pmd refers to system RAM with a valid pmd_page(pmd)
> - Returns false if pmd does not refer to system RAM - Invalid pmd_page(pmd)
> 
> pmd_trans_huge(pmd):
> 
> - Returns true if pmd refers to system RAM and is a trans huge mapping
> 
> -------------------------------------------------------------------------
> |	PMD states	|	pmd_present	|	pmd_trans_huge	|
> -------------------------------------------------------------------------
> |	Mapped		|	Yes		|	Yes		|
> -------------------------------------------------------------------------
> |	Splitting	|	Yes		|	Yes		|
> -------------------------------------------------------------------------
> |	Migration/Swap	|	No		|	No		|
> -------------------------------------------------------------------------

Before we actually start fixing this, I would strongly suggest that you
add a boot selftest (see lib/Kconfig.debug for other similar cases)
which checks the consistency of the page table macros w.r.t. the
expected mm semantics. Once the mm maintainers agreed with the
semantics, it will really help architecture maintainers in implementing
them correctly.

You wouldn't need actual page tables, just things like assertions on
pmd_trans_huge(pmd_mkhuge(pmd)) == true. You could go further and have
checks on pmdp_invalidate(&dummy_vma, dummy_addr, &dummy_pmd) with the
dummy_* variables on the stack.

> The problem:
> 
> PMD is first invalidated with pmdp_invalidate() before it is split. This
> invalidation clears PMD_SECT_VALID as below.
> 
> PMD Split -> pmdp_invalidate() -> pmd_mknotpresent -> Clears PMD_SECT_VALID
> 
> Once PMD_SECT_VALID gets cleared, pmd_present() returns false on the PMD
> entry.

I think that's an inconsistency in the expected semantics here. Do you
mean that pmd_present(pmd_mknotpresent(pmd)) should be true? If not, do
we need to implement our own pmdp_invalidate() or change the generic one
to set a "special" bit instead of just a pmd_mknotpresent?

> +static inline int pmd_present(pmd_t pmd)
> +{
> +	if (pte_present(pmd_pte(pmd)))
> +		return 1;
> +
> +	return pte_special(pmd_pte(pmd));
> +}
[...]
> +static inline pmd_t pmd_mknotpresent(pmd_t pmd)
> +{
> +	pmd = pte_pmd(pte_mkspecial(pmd_pte(pmd)));
> +	return __pmd(pmd_val(pmd) & ~PMD_SECT_VALID);
> +}

I'm not sure I agree with the semantics here where pmd_mknotpresent()
does not actually make pmd_present() == false.

-- 
Catalin


* Re: [RFC 1/2] arm64/mm: Change THP helpers to comply with generic MM semantics
  2019-06-28 10:20   ` Catalin Marinas
@ 2019-07-02  3:37     ` Anshuman Khandual
  2019-07-03 17:52       ` Catalin Marinas
  0 siblings, 1 reply; 10+ messages in thread
From: Anshuman Khandual @ 2019-07-02  3:37 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-mm, Will Deacon, Mark Rutland, Marc Zyngier,
	Suzuki Poulose, linux-arm-kernel, linux-kernel, Andrea Arcangeli


On 06/28/2019 03:50 PM, Catalin Marinas wrote:
> Hi Anshuman,

Hello Catalin,

> 
> On Thu, Jun 27, 2019 at 06:18:15PM +0530, Anshuman Khandual wrote:
>> pmd_present() and pmd_trans_huge() are expected to behave in the following
>> manner during various phases of a given PMD. It is derived from a previous
>> detailed discussion on this topic [1] and present THP documentation [2].
>>
>> pmd_present(pmd):
>>
>> - Returns true if pmd refers to system RAM with a valid pmd_page(pmd)
>> - Returns false if pmd does not refer to system RAM - Invalid pmd_page(pmd)
>>
>> pmd_trans_huge(pmd):
>>
>> - Returns true if pmd refers to system RAM and is a trans huge mapping
>>
>> -------------------------------------------------------------------------
>> |	PMD states	|	pmd_present	|	pmd_trans_huge	|
>> -------------------------------------------------------------------------
>> |	Mapped		|	Yes		|	Yes		|
>> -------------------------------------------------------------------------
>> |	Splitting	|	Yes		|	Yes		|
>> -------------------------------------------------------------------------
>> |	Migration/Swap	|	No		|	No		|
>> -------------------------------------------------------------------------
> 
> Before we actually start fixing this, I would strongly suggest that you
> add a boot selftest (see lib/Kconfig.debug for other similar cases)
> which checks the consistency of the page table macros w.r.t. the
> expected mm semantics. Once the mm maintainers agreed with the
> semantics, it will really help architecture maintainers in implementing
> them correctly.

Sure and it will help all architectures to be in sync wrt semantics.

> 
> You wouldn't need actual page tables, just things like assertions on
> pmd_trans_huge(pmd_mkhuge(pmd)) == true. You could go further and have
> checks on pmdp_invalidate(&dummy_vma, dummy_addr, &dummy_pmd) with the
> dummy_* variables on the stack.

Hmm. I guess macros which operate directly on a page table entry will be
okay but the ones which check specific VMA or MM states might be a bit
tricky. Try to emulate VMA/MM states while on the stack? But sure, I will
explore adding such a test.

> 
>> The problem:
>>
>> PMD is first invalidated with pmdp_invalidate() before it is split. This
>> invalidation clears PMD_SECT_VALID as below.
>>
>> PMD Split -> pmdp_invalidate() -> pmd_mknotpresent -> Clears PMD_SECT_VALID
>>
>> Once PMD_SECT_VALID gets cleared, pmd_present() returns false on the PMD
>> entry.
> 
> I think that's an inconsistency in the expected semantics here. Do you
> mean that pmd_present(pmd_mknotpresent(pmd)) should be true? If not, do

Actually that is true (more so if we are using the generic pmdp_invalidate()).
In general, pmd_present() needs to remain true on an entry invalidated with
pmdp_invalidate() to successfully represent a splitting THP. That is what
Andrea explained back on this thread (https://lkml.org/lkml/2018/10/17/231).

Extracting relevant sections from that thread -

"pmd_present never meant the real present bit in the pte was set, it just means
the pmd points to RAM. It means it doesn't point to swap or migration entry and
you can do pmd_to_page and it works fine."

"The clear of the real present bit during pmd (virtual) splitting is done with
pmdp_invalidate, that is created specifically to keeps pmd_trans_huge=true,
pmd_present=true despite the present bit is not set. So you could imagine
_PAGE_PSE as the real present bit."

pmd_present() and pmd_mknotpresent() are not exact inverses.

The problem is that all platforms using the generic pmdp_invalidate() call
pmd_mknotpresent(), which invariably removes the valid or present bit from
the entry. The point to note here is that pmd_mknotpresent() invalidates the
entry from the MMU's point of view, but pmd_present() does not check for an
MMU-valid PMD entry. Hence pmd_present(pmd_mknotpresent(pmd)) can still be
true.

In the absence of a positive section mapping bit on arm64, PTE_SPECIAL is
set temporarily to remember that this was a mapped PMD which got invalidated
recently but which still points to memory. Hence pmd_present() must evaluate
true.

pmd_mknotpresent() does not make pmd_present() false; it just invalidates the entry.

> we need to implement our own pmdp_invalidate() or change the generic one
> to set a "special" bit instead of just a pmd_mknotpresent?

arm64 could subscribe to __HAVE_ARCH_PMDP_INVALIDATE and implement its own
pmdp_invalidate() which, instead of calling pmd_mknotpresent(), operates on
the invalid and special bits directly. That is not going to alter the
relevant semantics here, but AFAICS it might be a bit better, as it saves
pmd_mknotpresent() from setting that special bit, which it is not supposed
to do.

AFAICS there is no compelling reason for the generic pmdp_invalidate() to
change either. It calls pmd_mknotpresent(), which invalidates the entry
through the valid or present bit, and platforms which have a dedicated huge
page bit can still test positive for pmd_present() after the invalidation.
It works for such platforms. A platform specific override is required only
when invalidation via pmd_mknotpresent() is not enough.

> 
>> +static inline int pmd_present(pmd_t pmd)
>> +{
>> +	if (pte_present(pmd_pte(pmd)))
>> +		return 1;
>> +
>> +	return pte_special(pmd_pte(pmd));
>> +}
> [...]
>> +static inline pmd_t pmd_mknotpresent(pmd_t pmd)
>> +{
>> +	pmd = pte_pmd(pte_mkspecial(pmd_pte(pmd)));
>> +	return __pmd(pmd_val(pmd) & ~PMD_SECT_VALID);
>> +}
> 
> I'm not sure I agree with the semantics here where pmd_mknotpresent()
> does not actually make pmd_present() == false.

As Andrea explained, pmd_present() does not check the validity of the PMD
entry from the MMU's perspective but the presence of a valid pmd_page(),
which still refers to a valid struct page in memory. This is irrespective
of whether the entry itself is valid for an MMU walk or not.

+ Cc: Andrea Arcangeli <aarcange@redhat.com>

I have added Andrea on this thread if he would like to add something.

- Anshuman


* Re: [RFC 1/2] arm64/mm: Change THP helpers to comply with generic MM semantics
  2019-07-02  3:37     ` Anshuman Khandual
@ 2019-07-03 17:52       ` Catalin Marinas
  2019-07-08  4:27         ` Anshuman Khandual
  0 siblings, 1 reply; 10+ messages in thread
From: Catalin Marinas @ 2019-07-03 17:52 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: Mark Rutland, Andrea Arcangeli, Suzuki Poulose, Marc Zyngier,
	linux-kernel, linux-mm, Will Deacon, linux-arm-kernel

On Tue, Jul 02, 2019 at 09:07:28AM +0530, Anshuman Khandual wrote:
> On 06/28/2019 03:50 PM, Catalin Marinas wrote:
> > On Thu, Jun 27, 2019 at 06:18:15PM +0530, Anshuman Khandual wrote:
> >> pmd_present() and pmd_trans_huge() are expected to behave in the following
> >> manner during various phases of a given PMD. It is derived from a previous
> >> detailed discussion on this topic [1] and present THP documentation [2].
> >>
> >> pmd_present(pmd):
> >>
> >> - Returns true if pmd refers to system RAM with a valid pmd_page(pmd)
> >> - Returns false if pmd does not refer to system RAM - Invalid pmd_page(pmd)
> >>
> >> pmd_trans_huge(pmd):
> >>
> >> - Returns true if pmd refers to system RAM and is a trans huge mapping
[...]
> > Before we actually start fixing this, I would strongly suggest that you
> > add a boot selftest (see lib/Kconfig.debug for other similar cases)
> > which checks the consistency of the page table macros w.r.t. the
> > expected mm semantics. Once the mm maintainers agreed with the
> > semantics, it will really help architecture maintainers in implementing
> > them correctly.
> 
> Sure and it will help all architectures to be in sync wrt semantics.
> 
> > You wouldn't need actual page tables, just things like assertions on
> > pmd_trans_huge(pmd_mkhuge(pmd)) == true. You could go further and have
> > checks on pmdp_invalidate(&dummy_vma, dummy_addr, &dummy_pmd) with the
> > dummy_* variables on the stack.
> 
> Hmm. I guess macros which operate directly on a page table entry will be
> okay but the ones which check specific VMA or MM states might be a bit
> tricky. Try to emulate VMA/MM states while on the stack? But sure, I will
> explore adding such a test.

You can pretend that the page table is on the stack. See the _pmd
variable in do_huge_pmd_wp_page_fallback() and
__split_huge_zero_page_pmd(). Similarly, the vma and even the mm can be
faked on the stack (see the arm64 tlb_flush()).

> >> The problem:
> >>
> >> PMD is first invalidated with pmdp_invalidate() before it is split. This
> >> invalidation clears PMD_SECT_VALID as below.
> >>
> >> PMD Split -> pmdp_invalidate() -> pmd_mknotpresent -> Clears PMD_SECT_VALID
> >>
> >> Once PMD_SECT_VALID gets cleared, pmd_present() returns false on the PMD
> >> entry.
> > 
> > I think that's an inconsistency in the expected semantics here. Do you
> > mean that pmd_present(pmd_mknotpresent(pmd)) should be true? If not, do
[...]
> pmd_present() and pmd_mknotpresent() are not exact inverses.

I find this very confusing (not your fault, just the semantics expected
by the core code). I can see that x86 is using _PAGE_PSE to make
pmd_present(pmd_mknotpresent()) == true. However, for pud that's not the
case (because it's not used for transhuge).

I'd rather have this renamed to pmd_mknotvalid().

> In the absence of a positive section mapping bit on arm64, PTE_SPECIAL is
> set temporarily to remember that this was a mapped PMD which got invalidated
> recently but which still points to memory. Hence pmd_present() must evaluate
> true.

I wonder if we can encode this safely for arm64 in the bottom two bits
of a pmd:

0b00 - not valid, not present
0b10 - not valid, present, huge
0b01 - valid, present, huge
0b11 - valid, table (not huge)

Do we ever call pmdp_invalidate() on a table entry? I don't think we do.

So a pmd_mknotvalid() would set bit 1 and I think swp_entry_to_pmd() would
have to clear it so that pmd_present() actually returns false for a swp
pmd entry.

> > we need to implement our own pmdp_invalidate() or change the generic one
> > to set a "special" bit instead of just a pmd_mknotpresent?
> 
> arm64 could subscribe to __HAVE_ARCH_PMDP_INVALIDATE and implement its own
> pmdp_invalidate() which, instead of calling pmd_mknotpresent(), operates on
> the invalid and special bits directly. That is not going to alter the
> relevant semantics here, but AFAICS it might be a bit better, as it saves
> pmd_mknotpresent() from setting that special bit, which it is not supposed
> to do.
> 
> AFAICS there is no compelling reason for the generic pmdp_invalidate() to
> change either. It calls pmd_mknotpresent(), which invalidates the entry
> through the valid or present bit, and platforms which have a dedicated huge
> page bit can still test positive for pmd_present() after the invalidation.
> It works for such platforms. A platform specific override is required only
> when invalidation via pmd_mknotpresent() is not enough.

I'd really like the mknotpresent to be renamed to mknotvalid and then we
can keep pmdp_invalidate unchanged (well, calling mknotvalid instead).

-- 
Catalin


* Re: [RFC 1/2] arm64/mm: Change THP helpers to comply with generic MM semantics
  2019-07-03 17:52       ` Catalin Marinas
@ 2019-07-08  4:27         ` Anshuman Khandual
  2020-04-01  8:14           ` Anshuman Khandual
  0 siblings, 1 reply; 10+ messages in thread
From: Anshuman Khandual @ 2019-07-08  4:27 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Mark Rutland, Andrea Arcangeli, Suzuki Poulose, Marc Zyngier,
	linux-kernel, linux-mm, Will Deacon, linux-arm-kernel



On 07/03/2019 11:22 PM, Catalin Marinas wrote:
> On Tue, Jul 02, 2019 at 09:07:28AM +0530, Anshuman Khandual wrote:
>> On 06/28/2019 03:50 PM, Catalin Marinas wrote:
>>> On Thu, Jun 27, 2019 at 06:18:15PM +0530, Anshuman Khandual wrote:
>>>> pmd_present() and pmd_trans_huge() are expected to behave in the following
>>>> manner during various phases of a given PMD. It is derived from a previous
>>>> detailed discussion on this topic [1] and present THP documentation [2].
>>>>
>>>> pmd_present(pmd):
>>>>
>>>> - Returns true if pmd refers to system RAM with a valid pmd_page(pmd)
>>>> - Returns false if pmd does not refer to system RAM - Invalid pmd_page(pmd)
>>>>
>>>> pmd_trans_huge(pmd):
>>>>
>>>> - Returns true if pmd refers to system RAM and is a trans huge mapping
> [...]
>>> Before we actually start fixing this, I would strongly suggest that you
>>> add a boot selftest (see lib/Kconfig.debug for other similar cases)
>>> which checks the consistency of the page table macros w.r.t. the
>>> expected mm semantics. Once the mm maintainers agreed with the
>>> semantics, it will really help architecture maintainers in implementing
>>> them correctly.
>>
>> Sure and it will help all architectures to be in sync wrt semantics.
>>
>>> You wouldn't need actual page tables, just things like assertions on
>>> pmd_trans_huge(pmd_mkhuge(pmd)) == true. You could go further and have
>>> checks on pmdp_invalidate(&dummy_vma, dummy_addr, &dummy_pmd) with the
>>> dummy_* variables on the stack.
>>
>> Hmm. I guess macros which operate directly on a page table entry will be
>> okay, but the ones which check on specific states for the VMA or MM might be a
>> bit tricky. Try to emulate VMA/MM states while on the stack? But sure, will
>> explore adding such a test.
> 
> You can pretend that the page table is on the stack. See the _pmd
> variable in do_huge_pmd_wp_page_fallback() and
> __split_huge_zero_page_pmd(). Similarly, the vma and even the mm can be
> faked on the stack (see the arm64 tlb_flush()).

Sure, will explore them and other similar examples. I am already working on a module
which will test various architecture page table accessors' semantics as expected by
generic MM. This should help us make sure that all architectures are on the same page.

> 
>>>> The problem:
>>>>
>>>> PMD is first invalidated with pmdp_invalidate() before it's splitting. This
>>>> invalidation clears PMD_SECT_VALID as below.
>>>>
>>>> PMD Split -> pmdp_invalidate() -> pmd_mknotpresent -> Clears PMD_SECT_VALID
>>>>
>>>> Once PMD_SECT_VALID gets cleared, it results in pmd_present() return false
>>>> on the PMD entry.
>>>
>>> I think that's an inconsistency in the expected semantics here. Do you
>>> mean that pmd_present(pmd_mknotpresent(pmd)) should be true? If not, do
> [...]
>> pmd_present() and pmd_mknotpresent() are not exact inverses.
> 
> I find this very confusing (not your fault, just the semantics expected
> by the core code). I can see that x86 is using _PAGE_PSE to make
> pmd_present(pmd_mknotpresent()) == true. However, for pud that's not the
> case (because it's not used for transhuge).
> 
> I'd rather have this renamed to pmd_mknotvalid().

Right, it makes sense to do the renaming even without considering this proposal.

> 
>> In the absence of a positive section mapping bit on arm64, PTE_SPECIAL is being set
>> temporarily to remember that it was a mapped PMD which got invalidated recently
>> but which still points to memory. Hence pmd_present() must evaluate to true.
> 
> I wonder if we can encode this safely for arm64 in the bottom two bits
> of a pmd :
> 
> 0b00 - not valid, not present
> 0b10 - not valid, present, huge
> 0b01 - valid, present, huge
> 0b11 - valid, table (not huge)
> 
> Do we ever call pmdp_invalidate() on a table entry? I don't think we do.
> 
> So a pmd_mknotvalid() would set bit 1, and I think swp_entry_to_pmd() would
> have to clear it so that pmd_present() actually returns false for a swp
> pmd entry.

All these make it riskier to collide with other core MM paths, as compared to
using an isolated SW bit like PTE_SPECIAL exclusively for this purpose. This
is in line with using PTE_PROTNONE. PTE_SPECIAL seems to be well away from the core
PMD path. Is there any particular concern about using PTE_SPECIAL? Nonetheless
I will evaluate the above proposal of using (0b10) to represent an invalid but
present huge PMD entry during splitting.

> 
>>> we need to implement our own pmdp_invalidate() or change the generic one
>>> to set a "special" bit instead of just a pmd_mknotpresent?
>>
>> arm64 can subscribe __HAVE_ARCH_PMDP_INVALIDATE and implement its own
>> pmdp_invalidate() in order to not call pmd_mknotpresent() and instead operate
>> on the invalid and special bits directly. But that is not going to alter the
>> relevant semantics here. AFAICS it might be a bit better, as it saves
>> pmd_mknotpresent() from setting the special bit, which it is not supposed to do.
>>
>> AFAICS there is no compelling reason for the generic pmdp_invalidate() to change
>> either. It calls pmd_mknotpresent(), which invalidates the entry through the valid
>> or present bit, and platforms which have a dedicated huge page bit can still test
>> positive for pmd_present() after its invalidation. It works for such platforms.
>> A platform specific override is required when invalidation via pmd_mknotpresent()
>> is not enough.
> 
> I'd really like the mknotpresent to be renamed to mknotvalid and then we
> can keep pmdp_invalidate unchanged (well, calling mknotvalid instead).
> 

This change really makes sense just from the perspective of fixing the generic
pmdp_invalidate(), as all it asks for is to invalidate the PMD entry, not to mark
it non-present, and calling pmd_mknotpresent() there is a bit misleading in that
sense.

But for arm64 I believe implementing an arch specific pmdp_invalidate() by
subscribing __HAVE_ARCH_PMDP_INVALIDATE is a bit better, because the implementation
needs more than just a PMD entry invalidation, whether with the above proposed 0b10
method or with PTE_SPECIAL. pmd_mknotvalid() should not do that additional stuff;
instead a platform specific pmdp_invalidate() can incorporate it after doing the
real invalidation, i.e. clearing bit 0 via pmd_mknotvalid().

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC 1/2] arm64/mm: Change THP helpers to comply with generic MM semantics
  2019-07-08  4:27         ` Anshuman Khandual
@ 2020-04-01  8:14           ` Anshuman Khandual
  0 siblings, 0 replies; 10+ messages in thread
From: Anshuman Khandual @ 2020-04-01  8:14 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Mark Rutland, Andrea Arcangeli, Suzuki Poulose, Marc Zyngier,
	linux-kernel, linux-mm, Will Deacon, linux-arm-kernel



On 07/08/2019 09:57 AM, Anshuman Khandual wrote:
> 
> On 07/03/2019 11:22 PM, Catalin Marinas wrote:
>> On Tue, Jul 02, 2019 at 09:07:28AM +0530, Anshuman Khandual wrote:
>>> On 06/28/2019 03:50 PM, Catalin Marinas wrote:
>>>> On Thu, Jun 27, 2019 at 06:18:15PM +0530, Anshuman Khandual wrote:
>>>>> pmd_present() and pmd_trans_huge() are expected to behave in the following
>>>>> manner during various phases of a given PMD. It is derived from a previous
>>>>> detailed discussion on this topic [1] and present THP documentation [2].
>>>>>
>>>>> pmd_present(pmd):
>>>>>
>>>>> - Returns true if pmd refers to system RAM with a valid pmd_page(pmd)
>>>>> - Returns false if pmd does not refer to system RAM - Invalid pmd_page(pmd)
>>>>>
>>>>> pmd_trans_huge(pmd):
>>>>>
>>>>> - Returns true if pmd refers to system RAM and is a trans huge mapping
>> [...]
>>>> Before we actually start fixing this, I would strongly suggest that you
>>>> add a boot selftest (see lib/Kconfig.debug for other similar cases)
>>>> which checks the consistency of the page table macros w.r.t. the
>>>> expected mm semantics. Once the mm maintainers agreed with the
>>>> semantics, it will really help architecture maintainers in implementing
>>>> them correctly.
>>> Sure and it will help all architectures to be in sync wrt semantics.
>>>
>>>> You wouldn't need actual page tables, just things like assertions on
>>>> pmd_trans_huge(pmd_mkhuge(pmd)) == true. You could go further and have
>>>> checks on pmdp_invalidate(&dummy_vma, dummy_addr, &dummy_pmd) with the
>>>> dummy_* variables on the stack.
>>> Hmm. I guess macros which operate directly on a page table entry will be
>>> okay, but the ones which check on specific states for the VMA or MM might be a
>>> bit tricky. Try to emulate VMA/MM states while on the stack? But sure, will
>>> explore adding such a test.
>> You can pretend that the page table is on the stack. See the _pmd
>> variable in do_huge_pmd_wp_page_fallback() and
>> __split_huge_zero_page_pmd(). Similarly, the vma and even the mm can be
>> faked on the stack (see the arm64 tlb_flush()).
> Sure, will explore them and other similar examples. I am already working on a module
> which will test various architecture page table accessors' semantics as expected by
> generic MM. This should help us make sure that all architectures are on the same page.
> 
>>>>> The problem:
>>>>>
>>>>> PMD is first invalidated with pmdp_invalidate() before it's splitting. This
>>>>> invalidation clears PMD_SECT_VALID as below.
>>>>>
>>>>> PMD Split -> pmdp_invalidate() -> pmd_mknotpresent -> Clears PMD_SECT_VALID
>>>>>
>>>>> Once PMD_SECT_VALID gets cleared, it results in pmd_present() return false
>>>>> on the PMD entry.
>>>> I think that's an inconsistency in the expected semantics here. Do you
>>>> mean that pmd_present(pmd_mknotpresent(pmd)) should be true? If not, do
>> [...]
>>> pmd_present() and pmd_mknotpresent() are not exact inverses.
>> I find this very confusing (not your fault, just the semantics expected
>> by the core code). I can see that x86 is using _PAGE_PSE to make
>> pmd_present(pmd_mknotpresent()) == true. However, for pud that's not the
>> case (because it's not used for transhuge).
>>
>> I'd rather have this renamed to pmd_mknotvalid().
> Right, it makes sense to do the renaming even without considering this proposal.
> 
>>> In the absence of a positive section mapping bit on arm64, PTE_SPECIAL is being set
>>> temporarily to remember that it was a mapped PMD which got invalidated recently
>>> but which still points to memory. Hence pmd_present() must evaluate to true.
>> I wonder if we can encode this safely for arm64 in the bottom two bits
>> of a pmd :
>>
>> 0b00 - not valid, not present
>> 0b10 - not valid, present, huge
>> 0b01 - valid, present, huge
>> 0b11 - valid, table (not huge)
>>
>> Do we ever call pmdp_invalidate() on a table entry? I don't think we do.
>>
>> So a pmd_mknotvalid() would set bit 1, and I think swp_entry_to_pmd() would
>> have to clear it so that pmd_present() actually returns false for a swp
>> pmd entry.
> All these make it riskier to collide with other core MM paths, as compared to
> using an isolated SW bit like PTE_SPECIAL exclusively for this purpose. This
> is in line with using PTE_PROTNONE. PTE_SPECIAL seems to be well away from the core
> PMD path. Is there any particular concern about using PTE_SPECIAL? Nonetheless
> I will evaluate the above proposal of using (0b10) to represent an invalid but
> present huge PMD entry during splitting.

Tried to implement the proposed encoding scheme from Catalin and it does
seem to work with (or even without) clearing PMD_TABLE_BIT during
__swp_entry_to_pmd(). It passes a basic memory stress test with THP enabled.

0b00 - not valid, not present
0b10 - not valid, present, huge		/* Invalidated splitting PMD */
0b01 - valid, present, huge		/* Valid mapped PMD */
0b11 - valid, table (not huge)

Will continue testing this change.

----------->
From 9593fb80eb41984de484fd151cd1140f4cbead7e Mon Sep 17 00:00:00 2001
From: Anshuman Khandual <anshuman.khandual@arm.com>
Date: Tue, 31 Mar 2020 11:38:53 +0100
Subject: [PATCH] arm64/mm: Change THP helpers to comply with generic MM
 semantics

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 arch/arm64/include/asm/pgtable.h | 74 ++++++++++++++++++++++++++++----
 arch/arm64/mm/hugetlbpage.c      |  2 +-
 2 files changed, 67 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 44883038dbe6..86c22a4fa427 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -348,15 +348,72 @@ static inline int pmd_protnone(pmd_t pmd)
 }
 #endif
 
+#define pmd_table(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
+				 PMD_TYPE_TABLE)
+#define pmd_sect(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
+				 PMD_TYPE_SECT)
+static inline pmd_t pmd_mksplitting(pmd_t pmd)
+{
+	unsigned long mask = PMD_TYPE_MASK;
+	unsigned long val = pmd_val(pmd);
+
+	val = (val & ~mask) | PMD_TABLE_BIT;
+
+	return __pmd(val);
+}
+
+static inline bool pmd_splitting(pmd_t pmd)
+{
+	unsigned long mask = PMD_TYPE_MASK;
+	unsigned long val = pmd_val(pmd);
+
+	if ((val & mask) == PMD_TABLE_BIT)
+		return true;
+
+	return false;
+}
+
+static inline bool pmd_mapped(pmd_t pmd)
+{
+	return pmd_sect(pmd);
+}
+
+static inline int pmd_present(pmd_t pmd)
+{
+	pte_t pte = pmd_pte(pmd);
+
+	if (pte_present(pte))
+		return 1;
+
+	if (pmd_splitting(pmd))
+		return 1;
+
+	return 0;
+}
+
 /*
  * THP definitions.
  */
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-#define pmd_trans_huge(pmd)	(pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT))
+static inline int pmd_trans_huge(pmd_t pmd)
+{
+	if (!pmd_present(pmd))
+		return 0;
+
+	if (!pmd_val(pmd))
+		return 0;
+
+	if (pmd_mapped(pmd))
+		return 1;
+
+	if (pmd_splitting(pmd))
+		return 1;
+
+	return 0;
+}
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
-#define pmd_present(pmd)	pte_present(pmd_pte(pmd))
 #define pmd_dirty(pmd)		pte_dirty(pmd_pte(pmd))
 #define pmd_young(pmd)		pte_young(pmd_pte(pmd))
 #define pmd_valid(pmd)		pte_valid(pmd_pte(pmd))
@@ -366,7 +423,12 @@ static inline int pmd_protnone(pmd_t pmd)
 #define pmd_mkclean(pmd)	pte_pmd(pte_mkclean(pmd_pte(pmd)))
 #define pmd_mkdirty(pmd)	pte_pmd(pte_mkdirty(pmd_pte(pmd)))
 #define pmd_mkyoung(pmd)	pte_pmd(pte_mkyoung(pmd_pte(pmd)))
-#define pmd_mknotvalid(pmd)	(__pmd(pmd_val(pmd) & ~PMD_SECT_VALID))
+
+static inline pmd_t pmd_mknotvalid(pmd_t pmd)
+{
+	BUG_ON(pmd_table(pmd));
+	return pmd_mksplitting(pmd);
+}
 
 #define pmd_thp_or_huge(pmd)	(pmd_huge(pmd) || pmd_trans_huge(pmd))
 
@@ -437,10 +499,6 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 
 #define pmd_bad(pmd)		(!(pmd_val(pmd) & PMD_TABLE_BIT))
 
-#define pmd_table(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
-				 PMD_TYPE_TABLE)
-#define pmd_sect(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
-				 PMD_TYPE_SECT)
 #define pmd_leaf(pmd)		pmd_sect(pmd)
 
 #if defined(CONFIG_ARM64_64K_PAGES) || CONFIG_PGTABLE_LEVELS < 3
@@ -834,7 +892,7 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
 
 #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
 #define __pmd_to_swp_entry(pmd)        ((swp_entry_t) { pmd_val(pmd) })
-#define __swp_entry_to_pmd(swp)        __pmd((swp).val)
+#define __swp_entry_to_pmd(swp)        __pmd((swp).val & ~PMD_TABLE_BIT)
 #endif
 
 /*
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index bbeb6a5a6ba6..056fe716b9f8 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -283,7 +283,7 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
 	if (!(sz == PMD_SIZE || sz == CONT_PMD_SIZE) &&
 	    pmd_none(pmd))
 		return NULL;
-	if (pmd_huge(pmd) || !pmd_present(pmd))
+	if (pmd_huge(pmd) || !pte_present(pmd_pte(pmd)))
 		return (pte_t *)pmdp;
 
 	if (sz == CONT_PTE_SIZE)
-- 
2.20.1




^ permalink raw reply related	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2020-04-01  8:14 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-27 12:48 [RFC 0/2] arm64/mm: Enable THP migration Anshuman Khandual
2019-06-27 12:48 ` [RFC 1/2] arm64/mm: Change THP helpers to comply with generic MM semantics Anshuman Khandual
2019-06-27 15:31   ` Zi Yan
2019-06-28  3:46     ` Anshuman Khandual
2019-06-28 10:20   ` Catalin Marinas
2019-07-02  3:37     ` Anshuman Khandual
2019-07-03 17:52       ` Catalin Marinas
2019-07-08  4:27         ` Anshuman Khandual
2020-04-01  8:14           ` Anshuman Khandual
2019-06-27 12:48 ` [RFC 2/2] arm64/mm: Enable THP migration without split Anshuman Khandual
