linux-mm.kvack.org archive mirror
* [PATCH v5 0/9] arm64: Enable contiguous pte hugepage support
@ 2017-08-02  9:48 Punit Agrawal
  2017-08-02  9:48 ` [PATCH v5 1/9] arm64: hugetlb: set_huge_pte_at Add WARN_ON on !pte_present Punit Agrawal
                   ` (8 more replies)
  0 siblings, 9 replies; 13+ messages in thread
From: Punit Agrawal @ 2017-08-02  9:48 UTC (permalink / raw)
  To: will.deacon, catalin.marinas
  Cc: Punit Agrawal, linux-mm, steve.capper, linux-arm-kernel, mark.rutland

Hi,

This series re-enables contiguous hugepage support for arm64. Contiguous
hugepages are useful on systems where the PMD hugepage size is too large
(512MB when using a 64k page granule), as they can be used to provide
more reasonably sized hugepages to the user.
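
For reference, the hugepage sizes involved (illustrative; assuming the
usual CONT_PTES/CONT_PMDS values for these granules):

  4k granule:  pmd = 2MB,   cont pte = 16 * 4k  = 64KB, cont pmd = 16 * 2MB   = 32MB
  64k granule: pmd = 512MB, cont pte = 32 * 64k = 2MB,  cont pmd = 32 * 512MB = 16GB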

Previous postings can be found at [0], [1], [4], [5].

The patches can be split as follows:

* Patches 1-3 and 9 are cleanups and improvements.

* Patch 4 addresses the break-before-make requirement of the
  architecture for contiguous hugepages.

* Patches 5-7 add support for handling swap entries for contiguous pte
  hugepages.

* Patch 8 enables contiguous hugepage support for arm64.

All the dependent series ([2], [3]) for enabling contiguous hugepage
support have been merged in the previous cycle. Additionally, a patch
to clarify the semantics of huge_pte_offset() in generic code[6] is
currently in the -mm tree.

The patches are based on v4.13-rc3. If there are no further comments,
please consider merging for the next release cycle.

Thanks,
Punit

[0] https://www.spinics.net/lists/arm-kernel/msg570422.html
[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2017-March/497027.html
[2] https://www.spinics.net/lists/arm-kernel/msg581657.html
[3] https://www.spinics.net/lists/arm-kernel/msg583342.html
[4] https://www.spinics.net/lists/arm-kernel/msg583367.html
[5] https://www.spinics.net/lists/arm-kernel/msg582758.html
[6] https://lkml.org/lkml/2017/7/25/536

Changes v4 -> v5
* Rebased on v4.13-rc3

Changes v3 -> v4
* Moved Patches 2 and 4 to [3] due to dependencies

Changes v2 -> v3
* Rebased on v4.12-rc2
* Included swap related fixes in this series
* Enabled contiguous pte hugepages

Changes v1 -> v2:
* Marked patch 2 for stable
* Fixed comment issues in patch 7
* Added tags
  
Punit Agrawal (4):
  arm64: hugetlb: Handle swap entries in huge_pte_offset() for
    contiguous hugepages
  arm64: hugetlb: Override huge_pte_clear() to support contiguous
    hugepages
  arm64: hugetlb: Override set_huge_swap_pte_at() to support contiguous
    hugepages
  arm64: Re-enable support for contiguous hugepages

Steve Capper (5):
  arm64: hugetlb: set_huge_pte_at Add WARN_ON on !pte_present
  arm64: hugetlb: Introduce pte_pgprot helper
  arm64: hugetlb: Spring clean huge pte accessors
  arm64: hugetlb: Add break-before-make logic for contiguous entries
  arm64: hugetlb: Cleanup setup_hugepagesz

 arch/arm64/include/asm/hugetlb.h |   9 +-
 arch/arm64/mm/hugetlbpage.c      | 287 ++++++++++++++++++++++++++++-----------
 2 files changed, 213 insertions(+), 83 deletions(-)

-- 
2.11.0


* [PATCH v5 1/9] arm64: hugetlb: set_huge_pte_at Add WARN_ON on !pte_present
  2017-08-02  9:48 [PATCH v5 0/9] arm64: Enable contiguous pte hugepage support Punit Agrawal
@ 2017-08-02  9:48 ` Punit Agrawal
  2017-08-02  9:48 ` [PATCH v5 2/9] arm64: hugetlb: Introduce pte_pgprot helper Punit Agrawal
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Punit Agrawal @ 2017-08-02  9:48 UTC (permalink / raw)
  To: will.deacon, catalin.marinas
  Cc: Steve Capper, linux-mm, linux-arm-kernel, mark.rutland,
	David Woods, Punit Agrawal

From: Steve Capper <steve.capper@arm.com>

This patch adds a WARN_ON to set_huge_pte_at() as the accessor assumes
that the entries being written are all present. (There are separate
accessors to clear huge ptes.)

We will need to handle the !pte_present case when memory offlining is
used on hugetlb pages. Swap and migration entries will be supplied to
set_huge_pte_at() in this case.

Cc: David Woods <dwoods@mellanox.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
---
 arch/arm64/mm/hugetlbpage.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 656e0ece2289..7b61e4833432 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -67,6 +67,12 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
 	unsigned long pfn;
 	pgprot_t hugeprot;
 
+	/*
+	 * Code needs to be expanded to handle huge swap and migration
+	 * entries. Needed for HUGETLB and MEMORY_FAILURE.
+	 */
+	WARN_ON(!pte_present(pte));
+
 	if (!pte_cont(pte)) {
 		set_pte_at(mm, addr, ptep, pte);
 		return;
-- 
2.11.0


* [PATCH v5 2/9] arm64: hugetlb: Introduce pte_pgprot helper
  2017-08-02  9:48 [PATCH v5 0/9] arm64: Enable contiguous pte hugepage support Punit Agrawal
  2017-08-02  9:48 ` [PATCH v5 1/9] arm64: hugetlb: set_huge_pte_at Add WARN_ON on !pte_present Punit Agrawal
@ 2017-08-02  9:48 ` Punit Agrawal
  2017-08-02  9:48 ` [PATCH v5 3/9] arm64: hugetlb: Spring clean huge pte accessors Punit Agrawal
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Punit Agrawal @ 2017-08-02  9:48 UTC (permalink / raw)
  To: will.deacon, catalin.marinas
  Cc: Steve Capper, linux-mm, linux-arm-kernel, mark.rutland,
	David Woods, Punit Agrawal

From: Steve Capper <steve.capper@arm.com>

Rather than XOR-ing pte bits in various places, introduce and use a
helper function.
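
(For the curious, a sketch of why the XOR trick works, with made-up
values:

	pte_val(pte)               = 0x0068000012345f53   /* pfn + attributes */
	pte_val(pfn_pte(pfn, 0))   = 0x0000000012345000   /* pfn bits only */
	XOR of the two             = 0x0068000000000f53   /* attributes only */

pfn_pte(pfn, __pgprot(0)) rebuilds just the address bits of the entry,
so XOR-ing it with the original pte value leaves only the attribute
bits.)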

Cc: David Woods <dwoods@mellanox.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
---
 arch/arm64/mm/hugetlbpage.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 7b61e4833432..cb84ca33bc6b 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -41,6 +41,16 @@ int pud_huge(pud_t pud)
 #endif
 }
 
+/*
+ * Select all bits except the pfn
+ */
+static inline pgprot_t pte_pgprot(pte_t pte)
+{
+	unsigned long pfn = pte_pfn(pte);
+
+	return __pgprot(pte_val(pfn_pte(pfn, __pgprot(0))) ^ pte_val(pte));
+}
+
 static int find_num_contig(struct mm_struct *mm, unsigned long addr,
 			   pte_t *ptep, size_t *pgsize)
 {
@@ -80,7 +90,7 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
 
 	ncontig = find_num_contig(mm, addr, ptep, &pgsize);
 	pfn = pte_pfn(pte);
-	hugeprot = __pgprot(pte_val(pfn_pte(pfn, __pgprot(0))) ^ pte_val(pte));
+	hugeprot = pte_pgprot(pte);
 	for (i = 0; i < ncontig; i++) {
 		pr_debug("%s: set pte %p to 0x%llx\n", __func__, ptep,
 			 pte_val(pfn_pte(pfn, hugeprot)));
@@ -223,9 +233,7 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 		size_t pgsize = 0;
 		unsigned long pfn = pte_pfn(pte);
 		/* Select all bits except the pfn */
-		pgprot_t hugeprot =
-			__pgprot(pte_val(pfn_pte(pfn, __pgprot(0))) ^
-				 pte_val(pte));
+		pgprot_t hugeprot = pte_pgprot(pte);
 
 		pfn = pte_pfn(pte);
 		ncontig = find_num_contig(vma->vm_mm, addr, ptep,
-- 
2.11.0


* [PATCH v5 3/9] arm64: hugetlb: Spring clean huge pte accessors
  2017-08-02  9:48 [PATCH v5 0/9] arm64: Enable contiguous pte hugepage support Punit Agrawal
  2017-08-02  9:48 ` [PATCH v5 1/9] arm64: hugetlb: set_huge_pte_at Add WARN_ON on !pte_present Punit Agrawal
  2017-08-02  9:48 ` [PATCH v5 2/9] arm64: hugetlb: Introduce pte_pgprot helper Punit Agrawal
@ 2017-08-02  9:48 ` Punit Agrawal
  2017-08-02  9:48 ` [PATCH v5 4/9] arm64: hugetlb: Add break-before-make logic for contiguous entries Punit Agrawal
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Punit Agrawal @ 2017-08-02  9:48 UTC (permalink / raw)
  To: will.deacon, catalin.marinas
  Cc: Steve Capper, linux-mm, linux-arm-kernel, mark.rutland,
	David Woods, Punit Agrawal

From: Steve Capper <steve.capper@arm.com>

This patch restructures the huge pte accessors without affecting their
functionality. Control flow is changed to reduce indentation, and the
for-loop post-expressions now carry the per-iteration variable updates.

It is then much easier to add break-before-make semantics in a subsequent
patch.

Cc: David Woods <dwoods@mellanox.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
---
 arch/arm64/mm/hugetlbpage.c | 119 ++++++++++++++++++++------------------------
 1 file changed, 54 insertions(+), 65 deletions(-)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index cb84ca33bc6b..08deed7c71f0 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -74,7 +74,7 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
 	size_t pgsize;
 	int i;
 	int ncontig;
-	unsigned long pfn;
+	unsigned long pfn, dpfn;
 	pgprot_t hugeprot;
 
 	/*
@@ -90,14 +90,13 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
 
 	ncontig = find_num_contig(mm, addr, ptep, &pgsize);
 	pfn = pte_pfn(pte);
+	dpfn = pgsize >> PAGE_SHIFT;
 	hugeprot = pte_pgprot(pte);
-	for (i = 0; i < ncontig; i++) {
+
+	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn) {
 		pr_debug("%s: set pte %p to 0x%llx\n", __func__, ptep,
 			 pte_val(pfn_pte(pfn, hugeprot)));
 		set_pte_at(mm, addr, ptep, pfn_pte(pfn, hugeprot));
-		ptep++;
-		pfn += pgsize >> PAGE_SHIFT;
-		addr += pgsize;
 	}
 }
 
@@ -195,91 +194,81 @@ pte_t arch_make_huge_pte(pte_t entry, struct vm_area_struct *vma,
 pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
 			      unsigned long addr, pte_t *ptep)
 {
-	pte_t pte;
-
-	if (pte_cont(*ptep)) {
-		int ncontig, i;
-		size_t pgsize;
-		bool is_dirty = false;
-
-		ncontig = find_num_contig(mm, addr, ptep, &pgsize);
-		/* save the 1st pte to return */
-		pte = ptep_get_and_clear(mm, addr, ptep);
-		for (i = 1, addr += pgsize; i < ncontig; ++i, addr += pgsize) {
-			/*
-			 * If HW_AFDBM is enabled, then the HW could
-			 * turn on the dirty bit for any of the page
-			 * in the set, so check them all.
-			 */
-			++ptep;
-			if (pte_dirty(ptep_get_and_clear(mm, addr, ptep)))
-				is_dirty = true;
-		}
-		if (is_dirty)
-			return pte_mkdirty(pte);
-		else
-			return pte;
-	} else {
+	int ncontig, i;
+	size_t pgsize;
+	pte_t orig_pte = huge_ptep_get(ptep);
+
+	if (!pte_cont(orig_pte))
 		return ptep_get_and_clear(mm, addr, ptep);
+
+	ncontig = find_num_contig(mm, addr, ptep, &pgsize);
+	for (i = 0; i < ncontig; i++, addr += pgsize, ptep++) {
+		/*
+		 * If HW_AFDBM is enabled, then the HW could
+		 * turn on the dirty bit for any of the page
+		 * in the set, so check them all.
+		 */
+		if (pte_dirty(ptep_get_and_clear(mm, addr, ptep)))
+			orig_pte = pte_mkdirty(orig_pte);
 	}
+
+	return orig_pte;
 }
 
 int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 			       unsigned long addr, pte_t *ptep,
 			       pte_t pte, int dirty)
 {
-	if (pte_cont(pte)) {
-		int ncontig, i, changed = 0;
-		size_t pgsize = 0;
-		unsigned long pfn = pte_pfn(pte);
-		/* Select all bits except the pfn */
-		pgprot_t hugeprot = pte_pgprot(pte);
-
-		pfn = pte_pfn(pte);
-		ncontig = find_num_contig(vma->vm_mm, addr, ptep,
-					  &pgsize);
-		for (i = 0; i < ncontig; ++i, ++ptep, addr += pgsize) {
-			changed |= ptep_set_access_flags(vma, addr, ptep,
-							pfn_pte(pfn,
-								hugeprot),
-							dirty);
-			pfn += pgsize >> PAGE_SHIFT;
-		}
-		return changed;
-	} else {
+	int ncontig, i, changed = 0;
+	size_t pgsize = 0;
+	unsigned long pfn = pte_pfn(pte), dpfn;
+	pgprot_t hugeprot;
+
+	if (!pte_cont(pte))
 		return ptep_set_access_flags(vma, addr, ptep, pte, dirty);
+
+	ncontig = find_num_contig(vma->vm_mm, addr, ptep, &pgsize);
+	dpfn = pgsize >> PAGE_SHIFT;
+	hugeprot = pte_pgprot(pte);
+
+	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn) {
+		changed |= ptep_set_access_flags(vma, addr, ptep,
+				pfn_pte(pfn, hugeprot), dirty);
 	}
+
+	return changed;
 }
 
 void huge_ptep_set_wrprotect(struct mm_struct *mm,
 			     unsigned long addr, pte_t *ptep)
 {
-	if (pte_cont(*ptep)) {
-		int ncontig, i;
-		size_t pgsize = 0;
+	int ncontig, i;
+	size_t pgsize;
 
-		ncontig = find_num_contig(mm, addr, ptep, &pgsize);
-		for (i = 0; i < ncontig; ++i, ++ptep, addr += pgsize)
-			ptep_set_wrprotect(mm, addr, ptep);
-	} else {
+	if (!pte_cont(*ptep)) {
 		ptep_set_wrprotect(mm, addr, ptep);
+		return;
 	}
+
+	ncontig = find_num_contig(mm, addr, ptep, &pgsize);
+	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize)
+		ptep_set_wrprotect(mm, addr, ptep);
 }
 
 void huge_ptep_clear_flush(struct vm_area_struct *vma,
 			   unsigned long addr, pte_t *ptep)
 {
-	if (pte_cont(*ptep)) {
-		int ncontig, i;
-		size_t pgsize = 0;
-
-		ncontig = find_num_contig(vma->vm_mm, addr, ptep,
-					  &pgsize);
-		for (i = 0; i < ncontig; ++i, ++ptep, addr += pgsize)
-			ptep_clear_flush(vma, addr, ptep);
-	} else {
+	int ncontig, i;
+	size_t pgsize;
+
+	if (!pte_cont(*ptep)) {
 		ptep_clear_flush(vma, addr, ptep);
+		return;
 	}
+
+	ncontig = find_num_contig(vma->vm_mm, addr, ptep, &pgsize);
+	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize)
+		ptep_clear_flush(vma, addr, ptep);
 }
 
 static __init int setup_hugepagesz(char *opt)
-- 
2.11.0


* [PATCH v5 4/9] arm64: hugetlb: Add break-before-make logic for contiguous entries
  2017-08-02  9:48 [PATCH v5 0/9] arm64: Enable contiguous pte hugepage support Punit Agrawal
                   ` (2 preceding siblings ...)
  2017-08-02  9:48 ` [PATCH v5 3/9] arm64: hugetlb: Spring clean huge pte accessors Punit Agrawal
@ 2017-08-02  9:48 ` Punit Agrawal
  2017-08-07 17:19   ` Catalin Marinas
  2017-08-02  9:49 ` [PATCH v5 5/9] arm64: hugetlb: Handle swap entries in huge_pte_offset() for contiguous hugepages Punit Agrawal
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 13+ messages in thread
From: Punit Agrawal @ 2017-08-02  9:48 UTC (permalink / raw)
  To: will.deacon, catalin.marinas
  Cc: Steve Capper, linux-mm, linux-arm-kernel, mark.rutland,
	David Woods, Punit Agrawal

From: Steve Capper <steve.capper@arm.com>

It has become apparent that one has to take special care when modifying
attributes of memory mappings that employ the contiguous bit.

Both the requirement and the architecturally correct "Break-Before-Make"
technique for updating contiguous entries are described in ARM DDI
0487A.k_iss10775, "Misprogramming of the Contiguous bit", page D4-1762.

The huge pte accessors currently replace the attributes of contiguous
pte entries in place and can thus, on certain platforms, lead to TLB
conflict aborts or even erroneous results returned from TLB lookups.

This patch adds a helper function get_clear_flush() that clears a
contiguous entry and returns the head pte (whilst taking care to
retain dirty bit information that could have been modified by DBM).
A TLB invalidation is then performed to ensure that there is no
possibility of multiple TLB entries being present for the same
region.
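
Schematically, the update sequence for a contiguous range becomes
(pseudo-code, not the exact helper):

	/* break */
	for each of the ncontig entries:
		ptep_get_and_clear();	/* invalidate; collect HW dirty bits */
	flush_tlb_range();		/* no stale or conflicting TLB entries */
	/* make */
	for each of the ncontig entries:
		set_pte_at(<new value>);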

Cc: David Woods <dwoods@mellanox.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
(Fixed indentation and cleaned up some comments)
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
---
 arch/arm64/mm/hugetlbpage.c | 81 +++++++++++++++++++++++++++++++++++----------
 1 file changed, 64 insertions(+), 17 deletions(-)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 08deed7c71f0..f2c976464f39 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -68,6 +68,47 @@ static int find_num_contig(struct mm_struct *mm, unsigned long addr,
 	return CONT_PTES;
 }
 
+/*
+ * Changing some bits of contiguous entries requires us to follow a
+ * Break-Before-Make approach, breaking the whole contiguous set
+ * before we can change any entries. See ARM DDI 0487A.k_iss10775,
+ * "Misprogramming of the Contiguous bit", page D4-1762.
+ *
+ * This helper performs the break step.
+ */
+static pte_t get_clear_flush(struct mm_struct *mm,
+			     unsigned long addr,
+			     pte_t *ptep,
+			     unsigned long pgsize,
+			     unsigned long ncontig)
+{
+	unsigned long i, saddr = addr;
+	struct vm_area_struct vma = { .vm_mm = mm };
+	pte_t orig_pte = huge_ptep_get(ptep);
+
+	/*
+	 * If we already have a faulting entry then we don't need
+	 * to break before make (there won't be a tlb entry cached).
+	 */
+	if (!pte_present(orig_pte))
+		return orig_pte;
+
+	for (i = 0; i < ncontig; i++, addr += pgsize, ptep++) {
+		pte_t pte = ptep_get_and_clear(mm, addr, ptep);
+
+		/*
+		 * If HW_AFDBM is enabled, then the HW could turn on
+		 * the dirty bit for any page in the set, so check
+		 * them all.  All hugetlb entries are already young.
+		 */
+		if (IS_ENABLED(CONFIG_ARM64_HW_AFDBM) && pte_dirty(pte))
+			orig_pte = pte_mkdirty(orig_pte);
+	}
+
+	flush_tlb_range(&vma, saddr, addr);
+	return orig_pte;
+}
+
 void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
 			    pte_t *ptep, pte_t pte)
 {
@@ -93,6 +134,8 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
 	dpfn = pgsize >> PAGE_SHIFT;
 	hugeprot = pte_pgprot(pte);
 
+	get_clear_flush(mm, addr, ptep, pgsize, ncontig);
+
 	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn) {
 		pr_debug("%s: set pte %p to 0x%llx\n", __func__, ptep,
 			 pte_val(pfn_pte(pfn, hugeprot)));
@@ -194,7 +237,7 @@ pte_t arch_make_huge_pte(pte_t entry, struct vm_area_struct *vma,
 pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
 			      unsigned long addr, pte_t *ptep)
 {
-	int ncontig, i;
+	int ncontig;
 	size_t pgsize;
 	pte_t orig_pte = huge_ptep_get(ptep);
 
@@ -202,17 +245,8 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
 		return ptep_get_and_clear(mm, addr, ptep);
 
 	ncontig = find_num_contig(mm, addr, ptep, &pgsize);
-	for (i = 0; i < ncontig; i++, addr += pgsize, ptep++) {
-		/*
-		 * If HW_AFDBM is enabled, then the HW could
-		 * turn on the dirty bit for any of the page
-		 * in the set, so check them all.
-		 */
-		if (pte_dirty(ptep_get_and_clear(mm, addr, ptep)))
-			orig_pte = pte_mkdirty(orig_pte);
-	}
 
-	return orig_pte;
+	return get_clear_flush(mm, addr, ptep, pgsize, ncontig);
 }
 
 int huge_ptep_set_access_flags(struct vm_area_struct *vma,
@@ -222,6 +256,7 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 	int ncontig, i, changed = 0;
 	size_t pgsize = 0;
 	unsigned long pfn = pte_pfn(pte), dpfn;
+	pte_t orig_pte;
 	pgprot_t hugeprot;
 
 	if (!pte_cont(pte))
@@ -231,10 +266,12 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 	dpfn = pgsize >> PAGE_SHIFT;
 	hugeprot = pte_pgprot(pte);
 
-	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn) {
-		changed |= ptep_set_access_flags(vma, addr, ptep,
-				pfn_pte(pfn, hugeprot), dirty);
-	}
+	orig_pte = get_clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig);
+	if (!pte_same(orig_pte, pte))
+		changed = 1;
+
+	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn)
+		set_pte_at(vma->vm_mm, addr, ptep, pfn_pte(pfn, hugeprot));
 
 	return changed;
 }
@@ -244,6 +281,9 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm,
 {
 	int ncontig, i;
 	size_t pgsize;
+	pte_t pte = pte_wrprotect(huge_ptep_get(ptep)), orig_pte;
+	unsigned long pfn = pte_pfn(pte), dpfn;
+	pgprot_t hugeprot;
 
 	if (!pte_cont(*ptep)) {
 		ptep_set_wrprotect(mm, addr, ptep);
@@ -251,8 +291,15 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm,
 	}
 
 	ncontig = find_num_contig(mm, addr, ptep, &pgsize);
-	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize)
-		ptep_set_wrprotect(mm, addr, ptep);
+	dpfn = pgsize >> PAGE_SHIFT;
+
+	orig_pte = get_clear_flush(mm, addr, ptep, pgsize, ncontig);
+	if (pte_dirty(orig_pte))
+		pte = pte_mkdirty(pte);
+
+	hugeprot = pte_pgprot(pte);
+	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn)
+		set_pte_at(mm, addr, ptep, pfn_pte(pfn, hugeprot));
 }
 
 void huge_ptep_clear_flush(struct vm_area_struct *vma,
-- 
2.11.0


* [PATCH v5 5/9] arm64: hugetlb: Handle swap entries in huge_pte_offset() for contiguous hugepages
  2017-08-02  9:48 [PATCH v5 0/9] arm64: Enable contiguous pte hugepage support Punit Agrawal
                   ` (3 preceding siblings ...)
  2017-08-02  9:48 ` [PATCH v5 4/9] arm64: hugetlb: Add break-before-make logic for contiguous entries Punit Agrawal
@ 2017-08-02  9:49 ` Punit Agrawal
  2017-08-02  9:49 ` [PATCH v5 6/9] arm64: hugetlb: Override huge_pte_clear() to support " Punit Agrawal
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Punit Agrawal @ 2017-08-02  9:49 UTC (permalink / raw)
  To: will.deacon, catalin.marinas
  Cc: Punit Agrawal, linux-mm, steve.capper, linux-arm-kernel,
	mark.rutland, David Woods

huge_pte_offset() was updated to correctly handle swap entries for
hugepages. With the addition of the size parameter, it is now possible
to disambiguate whether the request is for a regular hugepage or a
contiguous hugepage.

Fix huge_pte_offset() for contiguous hugepages by using the size to find
the correct page table entry.
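
For example, with a 64k granule a hypothetical caller selects the level
via the size argument (sizes shown for illustration):

	ptep = huge_pte_offset(mm, addr, PMD_SIZE);	 /* 512MB pmd hugepage */
	ptep = huge_pte_offset(mm, addr, CONT_PMD_SIZE); /* 16GB contiguous pmd range */
	ptep = huge_pte_offset(mm, addr, CONT_PTE_SIZE); /* 2MB contiguous pte range */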

Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Cc: David Woods <dwoods@mellanox.com>
---
 arch/arm64/mm/hugetlbpage.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index f2c976464f39..e9061704aec3 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -195,6 +195,7 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
 	pgd_t *pgd;
 	pud_t *pud;
 	pmd_t *pmd;
+	pte_t *pte;
 
 	pgd = pgd_offset(mm, addr);
 	pr_debug("%s: addr:0x%lx pgd:%p\n", __func__, addr, pgd);
@@ -202,19 +203,29 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
 		return NULL;
 
 	pud = pud_offset(pgd, addr);
-	if (pud_none(*pud))
+	if (pud_none(*pud) && sz != PUD_SIZE)
 		return NULL;
 	/* swap or huge page */
 	if (!pud_present(*pud) || pud_huge(*pud))
 		return (pte_t *)pud;
 	/* table; check the next level */
 
+	if (sz == CONT_PMD_SIZE)
+		addr &= CONT_PMD_MASK;
+
 	pmd = pmd_offset(pud, addr);
-	if (pmd_none(*pmd))
+	if (pmd_none(*pmd) &&
+	    !(sz == PMD_SIZE || sz == CONT_PMD_SIZE))
 		return NULL;
 	if (!pmd_present(*pmd) || pmd_huge(*pmd))
 		return (pte_t *)pmd;
 
+	if (sz == CONT_PTE_SIZE) {
+		pte = pte_offset_kernel(
+			pmd, (addr & CONT_PTE_MASK));
+		return pte;
+	}
+
 	return NULL;
 }
 
-- 
2.11.0


* [PATCH v5 6/9] arm64: hugetlb: Override huge_pte_clear() to support contiguous hugepages
  2017-08-02  9:48 [PATCH v5 0/9] arm64: Enable contiguous pte hugepage support Punit Agrawal
                   ` (4 preceding siblings ...)
  2017-08-02  9:49 ` [PATCH v5 5/9] arm64: hugetlb: Handle swap entries in huge_pte_offset() for contiguous hugepages Punit Agrawal
@ 2017-08-02  9:49 ` Punit Agrawal
  2017-08-02  9:49 ` [PATCH v5 7/9] arm64: hugetlb: Override set_huge_swap_pte_at() " Punit Agrawal
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Punit Agrawal @ 2017-08-02  9:49 UTC (permalink / raw)
  To: will.deacon, catalin.marinas
  Cc: Punit Agrawal, linux-mm, steve.capper, linux-arm-kernel,
	mark.rutland, David Woods

The default huge_pte_clear() implementation clears only a single page
table entry and so does not handle the contiguous hugepages supported
on arm64.

Fix this by overriding the default implementation to clear all the
entries associated with contiguous hugepages.
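
The new num_contig_ptes() helper below maps a hugepage size onto the
number of entries to clear and their stride; roughly, assuming a 64k
granule:

	PMD_SIZE      (512MB) -> 1 entry,          pgsize = PMD_SIZE
	CONT_PTE_SIZE (2MB)   -> CONT_PTES (32),   pgsize = PAGE_SIZE
	CONT_PMD_SIZE (16GB)  -> CONT_PMDS (32),   pgsize = PMD_SIZE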

Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Cc: David Woods <dwoods@mellanox.com>
---
 arch/arm64/include/asm/hugetlb.h |  6 +++++-
 arch/arm64/mm/hugetlbpage.c      | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index 793bd73b0d07..df8c0aea0917 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -18,7 +18,6 @@
 #ifndef __ASM_HUGETLB_H
 #define __ASM_HUGETLB_H
 
-#include <asm-generic/hugetlb.h>
 #include <asm/page.h>
 
 static inline pte_t huge_ptep_get(pte_t *ptep)
@@ -82,6 +81,11 @@ extern void huge_ptep_set_wrprotect(struct mm_struct *mm,
 				    unsigned long addr, pte_t *ptep);
 extern void huge_ptep_clear_flush(struct vm_area_struct *vma,
 				  unsigned long addr, pte_t *ptep);
+extern void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
+			   pte_t *ptep, unsigned long sz);
+#define huge_pte_clear huge_pte_clear
+
+#include <asm-generic/hugetlb.h>
 
 #ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE
 static inline bool gigantic_page_supported(void) { return true; }
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index e9061704aec3..240b2fd53266 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -68,6 +68,30 @@ static int find_num_contig(struct mm_struct *mm, unsigned long addr,
 	return CONT_PTES;
 }
 
+static inline int num_contig_ptes(unsigned long size, size_t *pgsize)
+{
+	int contig_ptes = 0;
+
+	*pgsize = size;
+
+	switch (size) {
+	case PUD_SIZE:
+	case PMD_SIZE:
+		contig_ptes = 1;
+		break;
+	case CONT_PMD_SIZE:
+		*pgsize = PMD_SIZE;
+		contig_ptes = CONT_PMDS;
+		break;
+	case CONT_PTE_SIZE:
+		*pgsize = PAGE_SIZE;
+		contig_ptes = CONT_PTES;
+		break;
+	}
+
+	return contig_ptes;
+}
+
 /*
  * Changing some bits of contiguous entries requires us to follow a
  * Break-Before-Make approach, breaking the whole contiguous set
@@ -245,6 +269,18 @@ pte_t arch_make_huge_pte(pte_t entry, struct vm_area_struct *vma,
 	return entry;
 }
 
+void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
+		    pte_t *ptep, unsigned long sz)
+{
+	int i, ncontig;
+	size_t pgsize;
+
+	ncontig = num_contig_ptes(sz, &pgsize);
+
+	for (i = 0; i < ncontig; i++, addr += pgsize, ptep++)
+		pte_clear(mm, addr, ptep);
+}
+
 pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
 			      unsigned long addr, pte_t *ptep)
 {
-- 
2.11.0


* [PATCH v5 7/9] arm64: hugetlb: Override set_huge_swap_pte_at() to support contiguous hugepages
  2017-08-02  9:48 [PATCH v5 0/9] arm64: Enable contiguous pte hugepage support Punit Agrawal
                   ` (5 preceding siblings ...)
  2017-08-02  9:49 ` [PATCH v5 6/9] arm64: hugetlb: Override huge_pte_clear() to support " Punit Agrawal
@ 2017-08-02  9:49 ` Punit Agrawal
  2017-08-02  9:49 ` [PATCH v5 8/9] arm64: Re-enable support for " Punit Agrawal
  2017-08-02  9:49 ` [PATCH v5 9/9] arm64: hugetlb: Cleanup setup_hugepagesz Punit Agrawal
  8 siblings, 0 replies; 13+ messages in thread
From: Punit Agrawal @ 2017-08-02  9:49 UTC (permalink / raw)
  To: will.deacon, catalin.marinas
  Cc: Punit Agrawal, linux-mm, steve.capper, linux-arm-kernel,
	mark.rutland, David Woods

The default implementation of set_huge_swap_pte_at() does not support
hugepages consisting of contiguous ptes. Override it to add support for
contiguous hugepages.
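
(The generic fallback boils down to a single set_huge_pte_at() call --
roughly, for a contiguous hugepage:

	set_huge_pte_at(mm, addr, ptep, pte);	/* writes only one slot */

leaving the remaining ncontig - 1 slots stale. Note that a swap entry
carries no pfn, so the override below writes the same value to every
slot.)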

Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Cc: David Woods <dwoods@mellanox.com>
---
 arch/arm64/include/asm/hugetlb.h |  3 +++
 arch/arm64/mm/hugetlbpage.c      | 12 ++++++++++++
 2 files changed, 15 insertions(+)

diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index df8c0aea0917..1dca41bea16a 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -84,6 +84,9 @@ extern void huge_ptep_clear_flush(struct vm_area_struct *vma,
 extern void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
 			   pte_t *ptep, unsigned long sz);
 #define huge_pte_clear huge_pte_clear
+extern void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr,
+				 pte_t *ptep, pte_t pte, unsigned long sz);
+#define set_huge_swap_pte_at set_huge_swap_pte_at
 
 #include <asm-generic/hugetlb.h>
 
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 240b2fd53266..3e78673b1bcb 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -167,6 +167,18 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
 	}
 }
 
+void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr,
+			  pte_t *ptep, pte_t pte, unsigned long sz)
+{
+	int i, ncontig;
+	size_t pgsize;
+
+	ncontig = num_contig_ptes(sz, &pgsize);
+
+	for (i = 0; i < ncontig; i++, ptep++)
+		set_pte(ptep, pte);
+}
+
 pte_t *huge_pte_alloc(struct mm_struct *mm,
 		      unsigned long addr, unsigned long sz)
 {
-- 
2.11.0


* [PATCH v5 8/9] arm64: Re-enable support for contiguous hugepages
  2017-08-02  9:48 [PATCH v5 0/9] arm64: Enable contiguous pte hugepage support Punit Agrawal
                   ` (6 preceding siblings ...)
  2017-08-02  9:49 ` [PATCH v5 7/9] arm64: hugetlb: Override set_huge_swap_pte_at() " Punit Agrawal
@ 2017-08-02  9:49 ` Punit Agrawal
  2017-08-02  9:49 ` [PATCH v5 9/9] arm64: hugetlb: Cleanup setup_hugepagesz Punit Agrawal
  8 siblings, 0 replies; 13+ messages in thread
From: Punit Agrawal @ 2017-08-02  9:49 UTC (permalink / raw)
  To: will.deacon, catalin.marinas
  Cc: Punit Agrawal, linux-mm, steve.capper, linux-arm-kernel, mark.rutland

also known as -

Revert "Revert "Revert "arm64: hugetlb: partial revert of 66b3923a1a0f"""

Now that our hugetlb implementation is compliant with the
break-before-make requirements of the architecture, and we have
addressed some of the issues in core code required for properly dealing
with hardware poisoning of contiguous hugepages, let's re-enable support
for contiguous hugepages.

This reverts commit 6ae979ab39a368c18ceb0424bf824d172d6ab56f.
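
With this in place, something like the following kernel command line
should work on a 64k granule kernel (sizes assume CONT_PTES == 32 and
CONT_PMDS == 32):

	hugepagesz=2M hugepages=512	# contiguous pte hugepages
	hugepagesz=16G hugepages=1	# contiguous pmd hugepages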

Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
---
 arch/arm64/mm/hugetlbpage.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 3e78673b1bcb..a10dfbd0ffd5 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -385,6 +385,10 @@ static __init int setup_hugepagesz(char *opt)
 		hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT);
 	} else if (ps == PUD_SIZE) {
 		hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT);
+	} else if (ps == (PAGE_SIZE * CONT_PTES)) {
+		hugetlb_add_hstate(CONT_PTE_SHIFT);
+	} else if (ps == (PMD_SIZE * CONT_PMDS)) {
+		hugetlb_add_hstate((PMD_SHIFT + CONT_PMD_SHIFT) - PAGE_SHIFT);
 	} else {
 		hugetlb_bad_size();
 		pr_err("hugepagesz: Unsupported page size %lu K\n", ps >> 10);
@@ -393,3 +397,13 @@ static __init int setup_hugepagesz(char *opt)
 	return 1;
 }
 __setup("hugepagesz=", setup_hugepagesz);
+
+#ifdef CONFIG_ARM64_64K_PAGES
+static __init int add_default_hugepagesz(void)
+{
+	if (size_to_hstate(CONT_PTES * PAGE_SIZE) == NULL)
+		hugetlb_add_hstate(CONT_PTE_SHIFT);
+	return 0;
+}
+arch_initcall(add_default_hugepagesz);
+#endif
-- 
2.11.0


* [PATCH v5 9/9] arm64: hugetlb: Cleanup setup_hugepagesz
  2017-08-02  9:48 [PATCH v5 0/9] arm64: Enable contiguous pte hugepage support Punit Agrawal
                   ` (7 preceding siblings ...)
  2017-08-02  9:49 ` [PATCH v5 8/9] arm64: Re-enable support for " Punit Agrawal
@ 2017-08-02  9:49 ` Punit Agrawal
  8 siblings, 0 replies; 13+ messages in thread
From: Punit Agrawal @ 2017-08-02  9:49 UTC (permalink / raw)
  To: will.deacon, catalin.marinas
  Cc: Steve Capper, linux-mm, linux-arm-kernel, mark.rutland,
	David Woods, stable, Punit Agrawal

From: Steve Capper <steve.capper@arm.com>

Replace a lot of if statements with switch and case labels to make it
much clearer which huge page sizes are supported.

Also, prevent PUD_SIZE from being used on systems not running with a
4KB PAGE_SIZE. Previously, if PUD_SIZE was supplied in these
circumstances, an unusable huge page size would be put into use.
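
As a quick sanity check, the ilog2() form below computes the same hstate
orders as the old expressions (4k granule shown, illustrative):

	ilog2(PMD_SIZE) - PAGE_SHIFT              = 21 - 12 = 9   (PMD_SHIFT - PAGE_SHIFT)
	ilog2(PUD_SIZE) - PAGE_SHIFT              = 30 - 12 = 18  (PUD_SHIFT - PAGE_SHIFT)
	ilog2(PAGE_SIZE * CONT_PTES) - PAGE_SHIFT = 16 - 12 = 4   (CONT_PTE_SHIFT)
	ilog2(PMD_SIZE * CONT_PMDS) - PAGE_SHIFT  = 25 - 12 = 13  (PMD_SHIFT + CONT_PMD_SHIFT - PAGE_SHIFT)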

Fixes: 084bd29810a5 ("ARM64: mm: HugeTLB support.")
Cc: David Woods <dwoods@mellanox.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
---
 arch/arm64/mm/hugetlbpage.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index a10dfbd0ffd5..b5dc7cca3f45 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -381,20 +381,20 @@ static __init int setup_hugepagesz(char *opt)
 {
 	unsigned long ps = memparse(opt, &opt);
 
-	if (ps == PMD_SIZE) {
-		hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT);
-	} else if (ps == PUD_SIZE) {
-		hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT);
-	} else if (ps == (PAGE_SIZE * CONT_PTES)) {
-		hugetlb_add_hstate(CONT_PTE_SHIFT);
-	} else if (ps == (PMD_SIZE * CONT_PMDS)) {
-		hugetlb_add_hstate((PMD_SHIFT + CONT_PMD_SHIFT) - PAGE_SHIFT);
-	} else {
-		hugetlb_bad_size();
-		pr_err("hugepagesz: Unsupported page size %lu K\n", ps >> 10);
-		return 0;
+	switch (ps) {
+#ifdef CONFIG_ARM64_4K_PAGES
+	case PUD_SIZE:
+#endif
+	case PMD_SIZE * CONT_PMDS:
+	case PMD_SIZE:
+	case PAGE_SIZE * CONT_PTES:
+		hugetlb_add_hstate(ilog2(ps) - PAGE_SHIFT);
+		return 1;
 	}
-	return 1;
+
+	hugetlb_bad_size();
+	pr_err("hugepagesz: Unsupported page size %lu K\n", ps >> 10);
+	return 0;
 }
 __setup("hugepagesz=", setup_hugepagesz);
 
-- 
2.11.0


* Re: [PATCH v5 4/9] arm64: hugetlb: Add break-before-make logic for contiguous entries
  2017-08-02  9:48 ` [PATCH v5 4/9] arm64: hugetlb: Add break-before-make logic for contiguous entries Punit Agrawal
@ 2017-08-07 17:19   ` Catalin Marinas
  2017-08-09 12:13     ` Steve Capper
  2017-08-09 13:29     ` Punit Agrawal
  0 siblings, 2 replies; 13+ messages in thread
From: Catalin Marinas @ 2017-08-07 17:19 UTC (permalink / raw)
  To: Punit Agrawal
  Cc: will.deacon, mark.rutland, David Woods, Steve Capper, linux-mm,
	linux-arm-kernel

On Wed, Aug 02, 2017 at 10:48:59AM +0100, Punit Agrawal wrote:
> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> index 08deed7c71f0..f2c976464f39 100644
> --- a/arch/arm64/mm/hugetlbpage.c
> +++ b/arch/arm64/mm/hugetlbpage.c
> @@ -68,6 +68,47 @@ static int find_num_contig(struct mm_struct *mm, unsigned long addr,
>  	return CONT_PTES;
>  }
>  
> +/*
> + * Changing some bits of contiguous entries requires us to follow a
> + * Break-Before-Make approach, breaking the whole contiguous set
> + * before we can change any entries. See ARM DDI 0487A.k_iss10775,
> + * "Misprogramming of the Contiguous bit", page D4-1762.
> + *
> + * This helper performs the break step.
> + */
> +static pte_t get_clear_flush(struct mm_struct *mm,
> +			     unsigned long addr,
> +			     pte_t *ptep,
> +			     unsigned long pgsize,
> +			     unsigned long ncontig)
> +{
> +	unsigned long i, saddr = addr;
> +	struct vm_area_struct vma = { .vm_mm = mm };
> +	pte_t orig_pte = huge_ptep_get(ptep);
> +
> +	/*
> +	 * If we already have a faulting entry then we don't need
> +	 * to break before make (there won't be a tlb entry cached).
> +	 */
> +	if (!pte_present(orig_pte))
> +		return orig_pte;
> +
> +	for (i = 0; i < ncontig; i++, addr += pgsize, ptep++) {
> +		pte_t pte = ptep_get_and_clear(mm, addr, ptep);
> +
> +		/*
> +		 * If HW_AFDBM is enabled, then the HW could turn on
> +		 * the dirty bit for any page in the set, so check
> +		 * them all.  All hugetlb entries are already young.
> +		 */
> +		if (IS_ENABLED(CONFIG_ARM64_HW_AFDBM) && pte_dirty(pte))
> +			orig_pte = pte_mkdirty(orig_pte);
> +	}
> +
> +	flush_tlb_range(&vma, saddr, addr);
> +	return orig_pte;
> +}
> +
>  void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
>  			    pte_t *ptep, pte_t pte)
>  {
> @@ -93,6 +134,8 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
>  	dpfn = pgsize >> PAGE_SHIFT;
>  	hugeprot = pte_pgprot(pte);
>  
> +	get_clear_flush(mm, addr, ptep, pgsize, ncontig);
> +
>  	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn) {
>  		pr_debug("%s: set pte %p to 0x%llx\n", __func__, ptep,
>  			 pte_val(pfn_pte(pfn, hugeprot)));

Is there any risk of the huge pte being accessed (from user space on
another CPU) in the short break-before-make window? Not that we can do
much about it but just checking.

BTW, it seems a bit overkill to use ptep_get_and_clear() (via
get_clear_flush) when we just want to zero the entries. Probably not
much overhead though.

> @@ -222,6 +256,7 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
>  	int ncontig, i, changed = 0;
>  	size_t pgsize = 0;
>  	unsigned long pfn = pte_pfn(pte), dpfn;
> +	pte_t orig_pte;
>  	pgprot_t hugeprot;
>  
>  	if (!pte_cont(pte))
> @@ -231,10 +266,12 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
>  	dpfn = pgsize >> PAGE_SHIFT;
>  	hugeprot = pte_pgprot(pte);
>  
> -	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn) {
> -		changed |= ptep_set_access_flags(vma, addr, ptep,
> -				pfn_pte(pfn, hugeprot), dirty);
> -	}
> +	orig_pte = get_clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig);
> +	if (!pte_same(orig_pte, pte))
> +		changed = 1;
> +
> +	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn)
> +		set_pte_at(vma->vm_mm, addr, ptep, pfn_pte(pfn, hugeprot));
>  
>  	return changed;
>  }

If hugeprot isn't dirty but orig_pte became dirty, it looks like we just
drop such information from the new pte.

Same comment here about the window. huge_ptep_set_access_flags() is
called on a present (huge) pte and we briefly make it invalid. Can the
mm subsystem cope with a fault on another CPU here? Same for the
huge_ptep_set_wrprotect() below.

> @@ -244,6 +281,9 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm,
>  {
>  	int ncontig, i;
>  	size_t pgsize;
> +	pte_t pte = pte_wrprotect(huge_ptep_get(ptep)), orig_pte;
> +	unsigned long pfn = pte_pfn(pte), dpfn;
> +	pgprot_t hugeprot;
>  
>  	if (!pte_cont(*ptep)) {
>  		ptep_set_wrprotect(mm, addr, ptep);
> @@ -251,8 +291,15 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm,
>  	}
>  
>  	ncontig = find_num_contig(mm, addr, ptep, &pgsize);
> -	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize)
> -		ptep_set_wrprotect(mm, addr, ptep);
> +	dpfn = pgsize >> PAGE_SHIFT;
> +
> +	orig_pte = get_clear_flush(mm, addr, ptep, pgsize, ncontig);
> +	if (pte_dirty(orig_pte))
> +		pte = pte_mkdirty(pte);
> +
> +	hugeprot = pte_pgprot(pte);
> +	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn)
> +		set_pte_at(mm, addr, ptep, pfn_pte(pfn, hugeprot));
>  }
>  
>  void huge_ptep_clear_flush(struct vm_area_struct *vma,

-- 
Catalin


* Re: [PATCH v5 4/9] arm64: hugetlb: Add break-before-make logic for contiguous entries
  2017-08-07 17:19   ` Catalin Marinas
@ 2017-08-09 12:13     ` Steve Capper
  2017-08-09 13:29     ` Punit Agrawal
  1 sibling, 0 replies; 13+ messages in thread
From: Steve Capper @ 2017-08-09 12:13 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Punit Agrawal, will.deacon, mark.rutland, David Woods,
	Steve Capper, linux-mm, linux-arm-kernel

Hi Catalin,

On Mon, Aug 07, 2017 at 06:19:17PM +0100, Catalin Marinas wrote:
> On Wed, Aug 02, 2017 at 10:48:59AM +0100, Punit Agrawal wrote:
> > diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> > index 08deed7c71f0..f2c976464f39 100644
> > --- a/arch/arm64/mm/hugetlbpage.c
> > +++ b/arch/arm64/mm/hugetlbpage.c
> > @@ -68,6 +68,47 @@ static int find_num_contig(struct mm_struct *mm, unsigned long addr,
> >  	return CONT_PTES;
> >  }
> >  
> > +/*
> > + * Changing some bits of contiguous entries requires us to follow a
> > + * Break-Before-Make approach, breaking the whole contiguous set
> > + * before we can change any entries. See ARM DDI 0487A.k_iss10775,
> > + * "Misprogramming of the Contiguous bit", page D4-1762.
> > + *
> > + * This helper performs the break step.
> > + */
> > +static pte_t get_clear_flush(struct mm_struct *mm,
> > +			     unsigned long addr,
> > +			     pte_t *ptep,
> > +			     unsigned long pgsize,
> > +			     unsigned long ncontig)
> > +{
> > +	unsigned long i, saddr = addr;
> > +	struct vm_area_struct vma = { .vm_mm = mm };
> > +	pte_t orig_pte = huge_ptep_get(ptep);
> > +
> > +	/*
> > +	 * If we already have a faulting entry then we don't need
> > +	 * to break before make (there won't be a tlb entry cached).
> > +	 */
> > +	if (!pte_present(orig_pte))
> > +		return orig_pte;
> > +
> > +	for (i = 0; i < ncontig; i++, addr += pgsize, ptep++) {
> > +		pte_t pte = ptep_get_and_clear(mm, addr, ptep);
> > +
> > +		/*
> > +		 * If HW_AFDBM is enabled, then the HW could turn on
> > +		 * the dirty bit for any page in the set, so check
> > +		 * them all.  All hugetlb entries are already young.
> > +		 */
> > +		if (IS_ENABLED(CONFIG_ARM64_HW_AFDBM) && pte_dirty(pte))
> > +			orig_pte = pte_mkdirty(orig_pte);
> > +	}
> > +
> > +	flush_tlb_range(&vma, saddr, addr);
> > +	return orig_pte;
> > +}
> > +
> >  void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
> >  			    pte_t *ptep, pte_t pte)
> >  {
> > @@ -93,6 +134,8 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
> >  	dpfn = pgsize >> PAGE_SHIFT;
> >  	hugeprot = pte_pgprot(pte);
> >  
> > +	get_clear_flush(mm, addr, ptep, pgsize, ncontig);
> > +
> >  	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn) {
> >  		pr_debug("%s: set pte %p to 0x%llx\n", __func__, ptep,
> >  			 pte_val(pfn_pte(pfn, hugeprot)));
> 
> Is there any risk of the huge pte being accessed (from user space on
> another CPU) in the short break-before-make window? Not that we can do
> much about it but just checking.

IIUC we're protected by the huge_pte_lock(.).

> 
> BTW, it seems a bit overkill to use ptep_get_and_clear() (via
> get_clear_flush) when we just want to zero the entries. Probably not
> much overhead though.
>

I think we need the TLB invalidate here to ensure there's zero possibility of
conflicting TLB entries being in play.
 
> > @@ -222,6 +256,7 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
> >  	int ncontig, i, changed = 0;
> >  	size_t pgsize = 0;
> >  	unsigned long pfn = pte_pfn(pte), dpfn;
> > +	pte_t orig_pte;
> >  	pgprot_t hugeprot;
> >  
> >  	if (!pte_cont(pte))
> > @@ -231,10 +266,12 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
> >  	dpfn = pgsize >> PAGE_SHIFT;
> >  	hugeprot = pte_pgprot(pte);
> >  
> > -	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn) {
> > -		changed |= ptep_set_access_flags(vma, addr, ptep,
> > -				pfn_pte(pfn, hugeprot), dirty);
> > -	}
> > +	orig_pte = get_clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig);
> > +	if (!pte_same(orig_pte, pte))
> > +		changed = 1;
> > +
> > +	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn)
> > +		set_pte_at(vma->vm_mm, addr, ptep, pfn_pte(pfn, hugeprot));
> >  
> >  	return changed;
> >  }
> 
> If hugeprot isn't dirty but orig_pte became dirty, it looks like we just
> drop such information from the new pte.

Ahhh... okay, I will have a think about this, thanks!

> 
> Same comment here about the window. huge_ptep_set_access_flags() is
> called on a present (huge) pte and we briefly make it invalid. Can the
> mm subsystem cope with a fault on another CPU here? Same for the
> huge_ptep_set_wrprotect() below.

I think the fault handler will wait for the huge_pte_lock(.).

> 
> > @@ -244,6 +281,9 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm,
> >  {
> >  	int ncontig, i;
> >  	size_t pgsize;
> > +	pte_t pte = pte_wrprotect(huge_ptep_get(ptep)), orig_pte;
> > +	unsigned long pfn = pte_pfn(pte), dpfn;
> > +	pgprot_t hugeprot;
> >  
> >  	if (!pte_cont(*ptep)) {
> >  		ptep_set_wrprotect(mm, addr, ptep);
> > @@ -251,8 +291,15 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm,
> >  	}
> >  
> >  	ncontig = find_num_contig(mm, addr, ptep, &pgsize);
> > -	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize)
> > -		ptep_set_wrprotect(mm, addr, ptep);
> > +	dpfn = pgsize >> PAGE_SHIFT;
> > +
> > +	orig_pte = get_clear_flush(mm, addr, ptep, pgsize, ncontig);
> > +	if (pte_dirty(orig_pte))
> > +		pte = pte_mkdirty(pte);
> > +
> > +	hugeprot = pte_pgprot(pte);
> > +	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn)
> > +		set_pte_at(mm, addr, ptep, pfn_pte(pfn, hugeprot));
> >  }
> >  
> >  void huge_ptep_clear_flush(struct vm_area_struct *vma,
> 

Cheers Catalin.


* Re: [PATCH v5 4/9] arm64: hugetlb: Add break-before-make logic for contiguous entries
  2017-08-07 17:19   ` Catalin Marinas
  2017-08-09 12:13     ` Steve Capper
@ 2017-08-09 13:29     ` Punit Agrawal
  1 sibling, 0 replies; 13+ messages in thread
From: Punit Agrawal @ 2017-08-09 13:29 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: will.deacon, mark.rutland, David Woods, Steve Capper, linux-mm,
	linux-arm-kernel

Catalin Marinas <catalin.marinas@arm.com> writes:

> On Wed, Aug 02, 2017 at 10:48:59AM +0100, Punit Agrawal wrote:
>> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
>> index 08deed7c71f0..f2c976464f39 100644
>> --- a/arch/arm64/mm/hugetlbpage.c
>> +++ b/arch/arm64/mm/hugetlbpage.c
>> @@ -68,6 +68,47 @@ static int find_num_contig(struct mm_struct *mm, unsigned long addr,
>>  	return CONT_PTES;
>>  }
>>  
>> +/*
>> + * Changing some bits of contiguous entries requires us to follow a
>> + * Break-Before-Make approach, breaking the whole contiguous set
>> + * before we can change any entries. See ARM DDI 0487A.k_iss10775,
>> + * "Misprogramming of the Contiguous bit", page D4-1762.
>> + *
>> + * This helper performs the break step.
>> + */
>> +static pte_t get_clear_flush(struct mm_struct *mm,
>> +			     unsigned long addr,
>> +			     pte_t *ptep,
>> +			     unsigned long pgsize,
>> +			     unsigned long ncontig)
>> +{
>> +	unsigned long i, saddr = addr;
>> +	struct vm_area_struct vma = { .vm_mm = mm };
>> +	pte_t orig_pte = huge_ptep_get(ptep);
>> +
>> +	/*
>> +	 * If we already have a faulting entry then we don't need
>> +	 * to break before make (there won't be a tlb entry cached).
>> +	 */
>> +	if (!pte_present(orig_pte))
>> +		return orig_pte;
>> +
>> +	for (i = 0; i < ncontig; i++, addr += pgsize, ptep++) {
>> +		pte_t pte = ptep_get_and_clear(mm, addr, ptep);
>> +
>> +		/*
>> +		 * If HW_AFDBM is enabled, then the HW could turn on
>> +		 * the dirty bit for any page in the set, so check
>> +		 * them all.  All hugetlb entries are already young.
>> +		 */
>> +		if (IS_ENABLED(CONFIG_ARM64_HW_AFDBM) && pte_dirty(pte))
>> +			orig_pte = pte_mkdirty(orig_pte);
>> +	}
>> +
>> +	flush_tlb_range(&vma, saddr, addr);
>> +	return orig_pte;
>> +}
>> +
>>  void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
>>  			    pte_t *ptep, pte_t pte)
>>  {
>> @@ -93,6 +134,8 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
>>  	dpfn = pgsize >> PAGE_SHIFT;
>>  	hugeprot = pte_pgprot(pte);
>>  
>> +	get_clear_flush(mm, addr, ptep, pgsize, ncontig);
>> +
>>  	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn) {
>>  		pr_debug("%s: set pte %p to 0x%llx\n", __func__, ptep,
>>  			 pte_val(pfn_pte(pfn, hugeprot)));
>
> Is there any risk of the huge pte being accessed (from user space on
> another CPU) in the short break-before-make window? Not that we can do
> much about it but just checking.

The calls to set_huge_pte_at() are protected by the page table lock. If
a fault is taken on another CPU we'll end up running the following call
sequence

hugetlb_fault()
--> hugetlb_no_page()

which, after acquiring the page table lock, checks whether the pte is
still none and backs out of the fault if it is not.
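
Schematically (simplified):

	CPU0: set_huge_pte_at()          CPU1: touches the same hugepage
	  huge_pte_lock()                  hugetlb_fault()
	  clear ncontig entries + TLBI       -> hugetlb_no_page()
	  write ncontig new entries             blocks on the page table lock
	  huge_pte_unlock()                     acquires the lock, sees a
	                                        present pte, backs out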

>
> BTW, it seems a bit overkill to use ptep_get_and_clear() (via
> get_clear_flush) when we just want to zero the entries. Probably not
> much overhead though.

We missed converting huge_ptep_clear_flush() to follow the
break-before-make requirement. I'll add a helper to zero out the entries
and flush the range, which can be used here and in
huge_ptep_clear_flush() as well.

>
>> @@ -222,6 +256,7 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
>>  	int ncontig, i, changed = 0;
>>  	size_t pgsize = 0;
>>  	unsigned long pfn = pte_pfn(pte), dpfn;
>> +	pte_t orig_pte;
>>  	pgprot_t hugeprot;
>>  
>>  	if (!pte_cont(pte))
>> @@ -231,10 +266,12 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
>>  	dpfn = pgsize >> PAGE_SHIFT;
>>  	hugeprot = pte_pgprot(pte);
>>  
>> -	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn) {
>> -		changed |= ptep_set_access_flags(vma, addr, ptep,
>> -				pfn_pte(pfn, hugeprot), dirty);
>> -	}
>> +	orig_pte = get_clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig);
>> +	if (!pte_same(orig_pte, pte))
>> +		changed = 1;
>> +
>> +	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn)
>> +		set_pte_at(vma->vm_mm, addr, ptep, pfn_pte(pfn, hugeprot));
>>  
>>  	return changed;
>>  }
>
> If hugeprot isn't dirty but orig_pte became dirty, it looks like we just
> drop such information from the new pte.

We can avoid this by deriving hugeprot from orig_pte instead of pte.
I'll update the patch to move the setting of hugeprot after the call
to get_clear_flush().
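
i.e. something along these lines (my sketch of the intended fix, not
the final patch):

	orig_pte = get_clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig);
	if (!pte_same(orig_pte, pte))
		changed = 1;

	/* Don't lose a dirty bit the HW may have set in the meantime. */
	if (pte_dirty(orig_pte))
		pte = pte_mkdirty(pte);

	hugeprot = pte_pgprot(pte);
	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn)
		set_pte_at(vma->vm_mm, addr, ptep, pfn_pte(pfn, hugeprot));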

>
> Same comment here about the window. huge_ptep_set_access_flags() is
> called on a present (huge) pte and we briefly make it invalid. Can the
> mm subsystem cope with a fault on another CPU here? Same for the
> huge_ptep_set_wrprotect() below.

I've checked through the code and can confirm that callers to both
huge_ptep_set_access_flags() and huge_ptep_set_wrprotect() hold the page
table lock. So we should be safe here.

I also checked get_user_pages_fast() (based on an offline discussion)
and can confirm that there are checks for p*d_none(), in which case the
slow path is taken.

I'll update the patches with the two changes discussed above and
re-post.

Thanks,
Punit

>
>> @@ -244,6 +281,9 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm,
>>  {
>>  	int ncontig, i;
>>  	size_t pgsize;
>> +	pte_t pte = pte_wrprotect(huge_ptep_get(ptep)), orig_pte;
>> +	unsigned long pfn = pte_pfn(pte), dpfn;
>> +	pgprot_t hugeprot;
>>  
>>  	if (!pte_cont(*ptep)) {
>>  		ptep_set_wrprotect(mm, addr, ptep);
>> @@ -251,8 +291,15 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm,
>>  	}
>>  
>>  	ncontig = find_num_contig(mm, addr, ptep, &pgsize);
>> -	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize)
>> -		ptep_set_wrprotect(mm, addr, ptep);
>> +	dpfn = pgsize >> PAGE_SHIFT;
>> +
>> +	orig_pte = get_clear_flush(mm, addr, ptep, pgsize, ncontig);
>> +	if (pte_dirty(orig_pte))
>> +		pte = pte_mkdirty(pte);
>> +
>> +	hugeprot = pte_pgprot(pte);
>> +	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn)
>> +		set_pte_at(mm, addr, ptep, pfn_pte(pfn, hugeprot));
>>  }
>>  
>>  void huge_ptep_clear_flush(struct vm_area_struct *vma,


Thread overview: 13 messages
2017-08-02  9:48 [PATCH v5 0/9] arm64: Enable contiguous pte hugepage support Punit Agrawal
2017-08-02  9:48 ` [PATCH v5 1/9] arm64: hugetlb: set_huge_pte_at Add WARN_ON on !pte_present Punit Agrawal
2017-08-02  9:48 ` [PATCH v5 2/9] arm64: hugetlb: Introduce pte_pgprot helper Punit Agrawal
2017-08-02  9:48 ` [PATCH v5 3/9] arm64: hugetlb: Spring clean huge pte accessors Punit Agrawal
2017-08-02  9:48 ` [PATCH v5 4/9] arm64: hugetlb: Add break-before-make logic for contiguous entries Punit Agrawal
2017-08-07 17:19   ` Catalin Marinas
2017-08-09 12:13     ` Steve Capper
2017-08-09 13:29     ` Punit Agrawal
2017-08-02  9:49 ` [PATCH v5 5/9] arm64: hugetlb: Handle swap entries in huge_pte_offset() for contiguous hugepages Punit Agrawal
2017-08-02  9:49 ` [PATCH v5 6/9] arm64: hugetlb: Override huge_pte_clear() to support " Punit Agrawal
2017-08-02  9:49 ` [PATCH v5 7/9] arm64: hugetlb: Override set_huge_swap_pte_at() " Punit Agrawal
2017-08-02  9:49 ` [PATCH v5 8/9] arm64: Re-enable support for " Punit Agrawal
2017-08-02  9:49 ` [PATCH v5 9/9] arm64: hugetlb: Cleanup setup_hugepagesz Punit Agrawal
