linux-arch.vger.kernel.org archive mirror
* [PATCH 0/3] mm: dirty/accessed pte optimisations
@ 2018-08-28 11:20 Nicholas Piggin
  2018-08-28 11:20 ` Nicholas Piggin
                   ` (3 more replies)
  0 siblings, 4 replies; 28+ messages in thread
From: Nicholas Piggin @ 2018-08-28 11:20 UTC (permalink / raw)
  To: linux-mm
  Cc: Nicholas Piggin, linux-arch, linux-kernel, linuxppc-dev,
	Andrew Morton, Linus Torvalds

Here are some patches that didn't get much comment last time. It
looks like x86 might benefit too, though, so that might get people
interested.

I improved the changelogs and added some comments, but made no real
logic changes.

I hope I didn't get the x86 numbers wrong. They're more significant
than I expected, so it could quite well be a problem with my test
(corrections welcome). Any data from other archs would be interesting
too.

Andrew, perhaps if there aren't objections these could go in mm for
a while.

Thanks,
Nick


Nicholas Piggin (3):
  mm/cow: don't bother write protecting already write-protected huge
    pages
  mm/cow: optimise pte dirty/accessed bits handling in fork
  mm: optimise pte dirty/accessed bit setting by demand based pte
    insertion

 mm/huge_memory.c | 24 +++++++++++++++---------
 mm/memory.c      | 18 ++++++++++--------
 mm/vmscan.c      |  8 ++++++++
 3 files changed, 33 insertions(+), 17 deletions(-)

-- 
2.18.0

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 1/3] mm/cow: don't bother write protecting already write-protected huge pages
  2018-08-28 11:20 [PATCH 0/3] mm: dirty/accessed pte optimisations Nicholas Piggin
  2018-08-28 11:20 ` Nicholas Piggin
@ 2018-08-28 11:20 ` Nicholas Piggin
  2018-08-28 11:20   ` Nicholas Piggin
  2018-08-28 11:20 ` [PATCH 2/3] mm/cow: optimise pte dirty/accessed bits handling in fork Nicholas Piggin
  2018-08-28 11:20 ` [PATCH 3/3] mm: optimise pte dirty/accessed bit setting by demand based pte insertion Nicholas Piggin
  3 siblings, 1 reply; 28+ messages in thread
From: Nicholas Piggin @ 2018-08-28 11:20 UTC (permalink / raw)
  To: linux-mm
  Cc: Nicholas Piggin, linux-arch, linux-kernel, linuxppc-dev,
	Andrew Morton, Linus Torvalds

This is the THP equivalent of 1b2de5d039c8 ("mm/cow: don't bother write
protecting already write-protected pages").

Explicit hugetlb pages don't get the same treatment because they don't
appear to have the right accessor functions.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 mm/huge_memory.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9592cbd8530a..d9bae12978ef 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -973,8 +973,11 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	mm_inc_nr_ptes(dst_mm);
 	pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);
 
-	pmdp_set_wrprotect(src_mm, addr, src_pmd);
-	pmd = pmd_mkold(pmd_wrprotect(pmd));
+	if (pmd_write(pmd)) {
+		pmdp_set_wrprotect(src_mm, addr, src_pmd);
+		pmd = pmd_wrprotect(pmd);
+	}
+	pmd = pmd_mkold(pmd);
 	set_pmd_at(dst_mm, addr, dst_pmd, pmd);
 
 	ret = 0;
@@ -1064,8 +1067,11 @@ int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		/* No huge zero pud yet */
 	}
 
-	pudp_set_wrprotect(src_mm, addr, src_pud);
-	pud = pud_mkold(pud_wrprotect(pud));
+	if (pud_write(pud)) {
+		pudp_set_wrprotect(src_mm, addr, src_pud);
+		pud = pud_wrprotect(pud);
+	}
+	pud = pud_mkold(pud);
 	set_pud_at(dst_mm, addr, dst_pud, pud);
 
 	ret = 0;
-- 
2.18.0

* [PATCH 2/3] mm/cow: optimise pte dirty/accessed bits handling in fork
  2018-08-28 11:20 [PATCH 0/3] mm: dirty/accessed pte optimisations Nicholas Piggin
  2018-08-28 11:20 ` Nicholas Piggin
  2018-08-28 11:20 ` [PATCH 1/3] mm/cow: don't bother write protecting already write-protected huge pages Nicholas Piggin
@ 2018-08-28 11:20 ` Nicholas Piggin
  2018-08-28 11:20   ` Nicholas Piggin
  2018-08-29 15:42   ` Linus Torvalds
  2018-08-28 11:20 ` [PATCH 3/3] mm: optimise pte dirty/accessed bit setting by demand based pte insertion Nicholas Piggin
  3 siblings, 2 replies; 28+ messages in thread
From: Nicholas Piggin @ 2018-08-28 11:20 UTC (permalink / raw)
  To: linux-mm
  Cc: Nicholas Piggin, linux-arch, linux-kernel, linuxppc-dev,
	Andrew Morton, Linus Torvalds

fork clears dirty/accessed bits from new ptes in the child. This logic
has existed since mapped page reclaim was done by scanning ptes, when
it may have been quite important. Today, with physical based pte
scanning, there is less reason to clear these bits. Dirty bits are all
tested and cleared together, and any dirty bit is the same as many
dirty bits. Any young bit is treated similarly to many young bits, but
not quite the same. A comment has been added where there is some
difference.

This eliminates a major source of the faults that powerpc/radix requires
to set dirty/accessed bits in ptes, speeding up a fork/exit
microbenchmark by about 5% on POWER9 (16600 -> 17500 fork/execs per
second).

Skylake appears to have a micro-fault overhead too. My test allocates
4GB of anonymous memory, reads each page, then forks, and times the
child reading a byte from each page. The first pass over the pages
takes about 1000 cycles per page, the second pass takes about 27
cycles (TLB miss). With no additional minor faults measured due to
either child pass, and the page array well exceeding TLB capacity, the
large cost must be caused by micro-faults from setting the accessed
bit.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 mm/huge_memory.c |  2 --
 mm/memory.c      | 10 +++++-----
 mm/vmscan.c      |  8 ++++++++
 3 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index d9bae12978ef..5fb1a43e12e0 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -977,7 +977,6 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		pmdp_set_wrprotect(src_mm, addr, src_pmd);
 		pmd = pmd_wrprotect(pmd);
 	}
-	pmd = pmd_mkold(pmd);
 	set_pmd_at(dst_mm, addr, dst_pmd, pmd);
 
 	ret = 0;
@@ -1071,7 +1070,6 @@ int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		pudp_set_wrprotect(src_mm, addr, src_pud);
 		pud = pud_wrprotect(pud);
 	}
-	pud = pud_mkold(pud);
 	set_pud_at(dst_mm, addr, dst_pud, pud);
 
 	ret = 0;
diff --git a/mm/memory.c b/mm/memory.c
index b616a69ad770..3d8bf8220bd0 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1038,12 +1038,12 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	}
 
 	/*
-	 * If it's a shared mapping, mark it clean in
-	 * the child
+	 * Child inherits dirty and young bits from parent. There is no
+	 * point clearing them because any cleaning or aging has to walk
+	 * all ptes anyway, and it will notice the bits set in the parent.
+	 * Leaving them set avoids stalls and even page faults on CPUs that
+	 * handle these bits in software.
 	 */
-	if (vm_flags & VM_SHARED)
-		pte = pte_mkclean(pte);
-	pte = pte_mkold(pte);
 
 	page = vm_normal_page(vma, addr, pte);
 	if (page) {
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 7e7d25504651..52fe64af3d80 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1021,6 +1021,14 @@ static enum page_references page_check_references(struct page *page,
 		 * to look twice if a mapped file page is used more
 		 * than once.
 		 *
+		 * fork() will set referenced bits in child ptes despite
+		 * not having been accessed, to avoid micro-faults of
+		 * setting accessed bits. This heuristic is not perfectly
+		 * accurate in other ways -- multiple map/unmap in the
+		 * same time window would be treated as multiple references
+		 * despite same number of actual memory accesses made by
+		 * the program.
+		 *
 		 * Mark it and spare it for another trip around the
 		 * inactive list.  Another page table reference will
 		 * lead to its activation.
-- 
2.18.0

* [PATCH 3/3] mm: optimise pte dirty/accessed bit setting by demand based pte insertion
  2018-08-28 11:20 [PATCH 0/3] mm: dirty/accessed pte optimisations Nicholas Piggin
                   ` (2 preceding siblings ...)
  2018-08-28 11:20 ` [PATCH 2/3] mm/cow: optimise pte dirty/accessed bits handling in fork Nicholas Piggin
@ 2018-08-28 11:20 ` Nicholas Piggin
  2018-08-28 11:20   ` Nicholas Piggin
  2018-09-05 14:29   ` Guenter Roeck
  3 siblings, 2 replies; 28+ messages in thread
From: Nicholas Piggin @ 2018-08-28 11:20 UTC (permalink / raw)
  To: linux-mm
  Cc: Nicholas Piggin, linux-arch, linux-kernel, linuxppc-dev,
	Andrew Morton, Linus Torvalds

Similarly to the previous patch, this tries to set dirty/accessed bits
in ptes up front, when they are inserted on demand, to avoid the access
cost of hardware setting them later.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 mm/huge_memory.c | 12 +++++++-----
 mm/memory.c      |  8 +++++---
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 5fb1a43e12e0..2c169041317f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1197,6 +1197,7 @@ static vm_fault_t do_huge_pmd_wp_page_fallback(struct vm_fault *vmf,
 	for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
 		pte_t entry;
 		entry = mk_pte(pages[i], vma->vm_page_prot);
+		entry = pte_mkyoung(entry);
 		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
 		memcg = (void *)page_private(pages[i]);
 		set_page_private(pages[i], 0);
@@ -2067,7 +2068,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 	struct page *page;
 	pgtable_t pgtable;
 	pmd_t old_pmd, _pmd;
-	bool young, write, soft_dirty, pmd_migration = false;
+	bool young, write, dirty, soft_dirty, pmd_migration = false;
 	unsigned long addr;
 	int i;
 
@@ -2145,8 +2146,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 		page = pmd_page(old_pmd);
 	VM_BUG_ON_PAGE(!page_count(page), page);
 	page_ref_add(page, HPAGE_PMD_NR - 1);
-	if (pmd_dirty(old_pmd))
-		SetPageDirty(page);
+	dirty = pmd_dirty(old_pmd);
 	write = pmd_write(old_pmd);
 	young = pmd_young(old_pmd);
 	soft_dirty = pmd_soft_dirty(old_pmd);
@@ -2176,8 +2176,10 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 			entry = maybe_mkwrite(entry, vma);
 			if (!write)
 				entry = pte_wrprotect(entry);
-			if (!young)
-				entry = pte_mkold(entry);
+			if (young)
+				entry = pte_mkyoung(entry);
+			if (dirty)
+				entry = pte_mkdirty(entry);
 			if (soft_dirty)
 				entry = pte_mksoft_dirty(entry);
 		}
diff --git a/mm/memory.c b/mm/memory.c
index 3d8bf8220bd0..d205ba69918c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1830,10 +1830,9 @@ static int insert_pfn(struct vm_area_struct *vma, unsigned long addr,
 		entry = pte_mkspecial(pfn_t_pte(pfn, prot));
 
 out_mkwrite:
-	if (mkwrite) {
-		entry = pte_mkyoung(entry);
+	entry = pte_mkyoung(entry);
+	if (mkwrite)
 		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
-	}
 
 	set_pte_at(mm, addr, pte, entry);
 	update_mmu_cache(vma, addr, pte); /* XXX: why not for insert_page? */
@@ -2560,6 +2559,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
 		}
 		flush_cache_page(vma, vmf->address, pte_pfn(vmf->orig_pte));
 		entry = mk_pte(new_page, vma->vm_page_prot);
+		entry = pte_mkyoung(entry);
 		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
 		/*
 		 * Clear the pte entry and flush it first, before updating the
@@ -3069,6 +3069,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
 	dec_mm_counter_fast(vma->vm_mm, MM_SWAPENTS);
 	pte = mk_pte(page, vma->vm_page_prot);
+	pte = pte_mkyoung(pte);
 	if ((vmf->flags & FAULT_FLAG_WRITE) && reuse_swap_page(page, NULL)) {
 		pte = maybe_mkwrite(pte_mkdirty(pte), vma);
 		vmf->flags &= ~FAULT_FLAG_WRITE;
@@ -3479,6 +3480,7 @@ vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
 
 	flush_icache_page(vma, page);
 	entry = mk_pte(page, vma->vm_page_prot);
+	entry = pte_mkyoung(entry);
 	if (write)
 		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
 	/* copy-on-write page */
-- 
2.18.0

* Re: [PATCH 2/3] mm/cow: optimise pte dirty/accessed bits handling in fork
  2018-08-28 11:20 ` [PATCH 2/3] mm/cow: optimise pte dirty/accessed bits handling in fork Nicholas Piggin
  2018-08-28 11:20   ` Nicholas Piggin
@ 2018-08-29 15:42   ` Linus Torvalds
  2018-08-29 15:42     ` Linus Torvalds
  2018-08-29 23:12     ` Nicholas Piggin
  1 sibling, 2 replies; 28+ messages in thread
From: Linus Torvalds @ 2018-08-29 15:42 UTC (permalink / raw)
  To: Nick Piggin
  Cc: linux-mm, linux-arch, Linux Kernel Mailing List, ppc-dev, Andrew Morton

On Tue, Aug 28, 2018 at 4:20 AM Nicholas Piggin <npiggin@gmail.com> wrote:
>
> fork clears dirty/accessed bits from new ptes in the child. This logic
> has existed since mapped page reclaim was done by scanning ptes when
> it may have been quite important. Today with physical based pte
> scanning, there is less reason to clear these bits.

Can you humor me, and make the dirty/accessed bit patches separate?

There is actually a difference wrt the dirty bit: if we unmap an area
with dirty pages, we have to do the special synchronous flush.

So a clean page in the virtual mapping is _literally_ cheaper to have.

> This eliminates a major source of faults powerpc/radix requires to set
> dirty/accessed bits in ptes, speeding up a fork/exit microbenchmark by
> about 5% on POWER9 (16600 -> 17500 fork/execs per second).

I don't think the dirty bit matters.

The accessed bit I think may be worth keeping, so by all means remove the mkold.

                  Linus

* Re: [PATCH 2/3] mm/cow: optimise pte dirty/accessed bits handling in fork
  2018-08-29 15:42   ` Linus Torvalds
  2018-08-29 15:42     ` Linus Torvalds
@ 2018-08-29 23:12     ` Nicholas Piggin
  2018-08-29 23:12       ` Nicholas Piggin
  2018-08-29 23:15       ` Linus Torvalds
  1 sibling, 2 replies; 28+ messages in thread
From: Nicholas Piggin @ 2018-08-29 23:12 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-mm, linux-arch, Linux Kernel Mailing List, ppc-dev, Andrew Morton

On Wed, 29 Aug 2018 08:42:09 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Tue, Aug 28, 2018 at 4:20 AM Nicholas Piggin <npiggin@gmail.com> wrote:
> >
> > fork clears dirty/accessed bits from new ptes in the child. This logic
> > has existed since mapped page reclaim was done by scanning ptes when
> > it may have been quite important. Today with physical based pte
> > scanning, there is less reason to clear these bits.  
> 
> Can you humor me, and make the dirty/accessed bit patches separate?

Yeah sure.

> There is actually a difference wrt the dirty bit: if we unmap an area
> with dirty pages, we have to do the special synchronous flush.
> 
> So a clean page in the virtual mapping is _literally_ cheaper to have.

Oh yeah true, that blasted thing. Good point.

Dirty micro fault seems to be the big one for my Skylake, takes 300
nanoseconds per access. Accessed takes about 100. (I think, have to
go over my benchmark a bit more carefully and re-test).

Dirty will happen less often though, particularly as most places we
do write to (stack, heap, etc) will be write protected for COW anyway,
I think. Worst case might be a big shared shm segment like a database
buffer cache, but those kinds of forks should happen very, very
infrequently, I would hope.

Yes maybe we can do that. I'll split them up and try to get some
numbers for them individually.

Thanks,
Nick

* Re: [PATCH 2/3] mm/cow: optimise pte dirty/accessed bits handling in fork
  2018-08-29 23:12     ` Nicholas Piggin
  2018-08-29 23:12       ` Nicholas Piggin
@ 2018-08-29 23:15       ` Linus Torvalds
  2018-08-29 23:15         ` Linus Torvalds
  2018-08-29 23:57         ` Nicholas Piggin
  1 sibling, 2 replies; 28+ messages in thread
From: Linus Torvalds @ 2018-08-29 23:15 UTC (permalink / raw)
  To: Nick Piggin
  Cc: linux-mm, linux-arch, Linux Kernel Mailing List, ppc-dev, Andrew Morton

On Wed, Aug 29, 2018 at 4:12 PM Nicholas Piggin <npiggin@gmail.com> wrote:
>
> Dirty micro fault seems to be the big one for my Skylake, takes 300
> nanoseconds per access. Accessed takes about 100. (I think, have to
> go over my benchmark a bit more carefully and re-test).

Yeah, but they only happen for shared areas after fork, which sounds
like it shouldn't be a big deal in most cases.

And I'm not entirely objecting to your patch per se, I just would want
to keep the accessed bit changes separate from the dirty bit ones.

*If* somebody has bisectable issues with it (performance or not), it
will then be clearer what the exact issue is.

            Linus

* Re: [PATCH 2/3] mm/cow: optimise pte dirty/accessed bits handling in fork
  2018-08-29 23:15       ` Linus Torvalds
  2018-08-29 23:15         ` Linus Torvalds
@ 2018-08-29 23:57         ` Nicholas Piggin
  2018-08-29 23:57           ` Nicholas Piggin
  1 sibling, 1 reply; 28+ messages in thread
From: Nicholas Piggin @ 2018-08-29 23:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-mm, linux-arch, Linux Kernel Mailing List, ppc-dev, Andrew Morton

On Wed, 29 Aug 2018 16:15:37 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Wed, Aug 29, 2018 at 4:12 PM Nicholas Piggin <npiggin@gmail.com> wrote:
> >
> > Dirty micro fault seems to be the big one for my Skylake, takes 300
> > nanoseconds per access. Accessed takes about 100. (I think, have to
> > go over my benchmark a bit more carefully and re-test).  
> 
> Yeah, but they only happen for shared areas after fork, which sounds
> like it shouldn't be a big deal in most cases.

You might be right there.

> 
> And I'm not entirely objecting to your patch per se, I just would want
> to keep the accessed bit changes separate from the dirty bit ones.
> 
> *If* somebody has bisectable issues with it (performance or not), it
> will then be clearer what the exact issue is.

Yeah, that makes a lot of sense. I'll do a bit more testing and send
Andrew a respin, at least with those split (and with a good comment on
the dirty bit vs unmap handling that you pointed out).

Thanks,
Nick

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/3] mm: optimise pte dirty/accessed bit setting by demand based pte insertion
  2018-08-28 11:20 ` [PATCH 3/3] mm: optimise pte dirty/accessed bit setting by demand based pte insertion Nicholas Piggin
  2018-08-28 11:20   ` Nicholas Piggin
@ 2018-09-05 14:29   ` Guenter Roeck
  2018-09-05 14:29     ` Guenter Roeck
                       ` (2 more replies)
  1 sibling, 3 replies; 28+ messages in thread
From: Guenter Roeck @ 2018-09-05 14:29 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: linux-mm, linux-arch, linux-kernel, linuxppc-dev, Andrew Morton,
	Linus Torvalds, Ley Foon Tan, nios2-dev

Hi,

On Tue, Aug 28, 2018 at 09:20:34PM +1000, Nicholas Piggin wrote:
> Similarly to the previous patch, this tries to optimise dirty/accessed
> bits in ptes to avoid access costs of hardware setting them.
> 

This patch results in silent nios2 boot failures, silent meaning that
the boot stalls.

...
Unpacking initramfs...
Freeing initrd memory: 2168K
workingset: timestamp_bits=30 max_order=15 bucket_order=0
jffs2: version 2.2. (NAND) © 2001-2006 Red Hat, Inc.
random: fast init done
random: crng init done

[no further activity until the qemu session is aborted]

Reverting the patch fixes the problem. Bisect log is attached.

Guenter

---
# bad: [387ac6229ecf6e012649d4fc409c5352655a4cf0] Add linux-next specific files for 20180905
# good: [57361846b52bc686112da6ca5368d11210796804] Linux 4.19-rc2
git bisect start 'HEAD' 'v4.19-rc2'
# good: [668570e8389bb076bea9b7531553e1362f5abd11] Merge remote-tracking branch 'net-next/master'
git bisect good 668570e8389bb076bea9b7531553e1362f5abd11
# good: [7f2f69ebf0bcf3e9bcff7d560ba92cee960a66a6] Merge remote-tracking branch 'battery/for-next'
git bisect good 7f2f69ebf0bcf3e9bcff7d560ba92cee960a66a6
# good: [c31458d3e03e3a2edeaab225a22eaf68c07c8290] Merge remote-tracking branch 'rpmsg/for-next'
git bisect good c31458d3e03e3a2edeaab225a22eaf68c07c8290
# good: [e0f43dcbe9af8ac72f39fe92c5d0ee1883546427] Merge remote-tracking branch 'nvdimm/libnvdimm-for-next'
git bisect good e0f43dcbe9af8ac72f39fe92c5d0ee1883546427
# bad: [f509e2c0f3cd11df238f0f1b5ba013fe726decdf] of: ignore sub-page memory regions
git bisect bad f509e2c0f3cd11df238f0f1b5ba013fe726decdf
# good: [2f7eebf30b87534f7e4c3982307579d9adc782a5] ocfs2: fix clusters leak in ocfs2_defrag_extent()
git bisect good 2f7eebf30b87534f7e4c3982307579d9adc782a5
# good: [119eb88c9dd23e305939ad748237100078e304a8] mm/swapfile.c: call free_swap_slot() in __swap_entry_free()
git bisect good 119eb88c9dd23e305939ad748237100078e304a8
# good: [21d64d37adf3ab20b4c3a1951018e84bf815c887] mm: remove vm_insert_pfn()
git bisect good 21d64d37adf3ab20b4c3a1951018e84bf815c887
# good: [90cd1a69010844e9dbfc43279d681d798812b962] cramfs: convert to use vmf_insert_mixed
git bisect good 90cd1a69010844e9dbfc43279d681d798812b962
# good: [c7dd91289b4bb4c400a8a71953511991815f8e6f] mm/cow: optimise pte dirty/accessed bits handling in fork
git bisect good c7dd91289b4bb4c400a8a71953511991815f8e6f
# bad: [87d74ae75700a39effcb8c9ed8a8445e719ac369] hexagon: switch to NO_BOOTMEM
git bisect bad 87d74ae75700a39effcb8c9ed8a8445e719ac369
# bad: [3d1d5b26ac5b4d4193dc618a50cd88de1fb0d360] mm: optimise pte dirty/accessed bit setting by demand based pte insertion
git bisect bad 3d1d5b26ac5b4d4193dc618a50cd88de1fb0d360
# first bad commit: [3d1d5b26ac5b4d4193dc618a50cd88de1fb0d360] mm: optimise pte dirty/accessed bit setting by demand based pte insertion

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/3] mm: optimise pte dirty/accessed bit setting by demand based pte insertion
  2018-09-05 14:29   ` Guenter Roeck
  2018-09-05 14:29     ` Guenter Roeck
@ 2018-09-05 22:18     ` Nicholas Piggin
  2018-09-05 22:18       ` Nicholas Piggin
  2018-09-06  0:36       ` Guenter Roeck
  2018-09-17 17:53     ` Nicholas Piggin
  2 siblings, 2 replies; 28+ messages in thread
From: Nicholas Piggin @ 2018-09-05 22:18 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: linux-mm, linux-arch, linux-kernel, linuxppc-dev, Andrew Morton,
	Linus Torvalds, Ley Foon Tan, nios2-dev

On Wed, 5 Sep 2018 07:29:51 -0700
Guenter Roeck <linux@roeck-us.net> wrote:

> Hi,
> 
> On Tue, Aug 28, 2018 at 09:20:34PM +1000, Nicholas Piggin wrote:
> > Similarly to the previous patch, this tries to optimise dirty/accessed
> > bits in ptes to avoid access costs of hardware setting them.
> >   
> 
> This patch results in silent nios2 boot failures, silent meaning that
> the boot stalls.
> 
> ...
> Unpacking initramfs...
> Freeing initrd memory: 2168K
> workingset: timestamp_bits=30 max_order=15 bucket_order=0
> jffs2: version 2.2. (NAND) © 2001-2006 Red Hat, Inc.
> random: fast init done
> random: crng init done
> 
> [no further activity until the qemu session is aborted]
> 
> Reverting the patch fixes the problem. Bisect log is attached.

Thanks for bisecting it, I'll try to reproduce. Just qemu with no
obscure options? Interesting that it's hit nios2 but apparently not
other archs (yet).

Thanks,
Nick

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/3] mm: optimise pte dirty/accessed bit setting by demand based pte insertion
  2018-09-05 22:18     ` Nicholas Piggin
  2018-09-05 22:18       ` Nicholas Piggin
@ 2018-09-06  0:36       ` Guenter Roeck
  2018-09-06  0:36         ` Guenter Roeck
  1 sibling, 1 reply; 28+ messages in thread
From: Guenter Roeck @ 2018-09-06  0:36 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: linux-mm, linux-arch, linux-kernel, linuxppc-dev, Andrew Morton,
	Linus Torvalds, Ley Foon Tan, nios2-dev

On 09/05/2018 03:18 PM, Nicholas Piggin wrote:
> On Wed, 5 Sep 2018 07:29:51 -0700
> Guenter Roeck <linux@roeck-us.net> wrote:
> 
>> Hi,
>>
>> On Tue, Aug 28, 2018 at 09:20:34PM +1000, Nicholas Piggin wrote:
>>> Similarly to the previous patch, this tries to optimise dirty/accessed
>>> bits in ptes to avoid access costs of hardware setting them.
>>>    
>>
>> This patch results in silent nios2 boot failures, silent meaning that
>> the boot stalls.
>>
>> ...
>> Unpacking initramfs...
>> Freeing initrd memory: 2168K
>> workingset: timestamp_bits=30 max_order=15 bucket_order=0
>> jffs2: version 2.2. (NAND) © 2001-2006 Red Hat, Inc.
>> random: fast init done
>> random: crng init done
>>
>> [no further activity until the qemu session is aborted]
>>
>> Reverting the patch fixes the problem. Bisect log is attached.
> 
> Thanks for bisecting it, I'll try to reproduce. Just qemu with no
> obscure options? Interesting that it's hit nios2 but apparently not
> other archs (yet).
> 

Nothing special. See https://github.com/groeck/linux-build-test/tree/master/rootfs/nios2/.

Guenter

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/3] mm: optimise pte dirty/accessed bit setting by demand based pte insertion
  2018-09-05 14:29   ` Guenter Roeck
  2018-09-05 14:29     ` Guenter Roeck
  2018-09-05 22:18     ` Nicholas Piggin
@ 2018-09-17 17:53     ` Nicholas Piggin
  2018-09-17 17:53       ` Nicholas Piggin
  2018-09-21  8:42       ` Ley Foon Tan
  2 siblings, 2 replies; 28+ messages in thread
From: Nicholas Piggin @ 2018-09-17 17:53 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: linux-mm, linux-arch, linux-kernel, linuxppc-dev, Andrew Morton,
	Linus Torvalds, Ley Foon Tan, nios2-dev

On Wed, 5 Sep 2018 07:29:51 -0700
Guenter Roeck <linux@roeck-us.net> wrote:

> Hi,
> 
> On Tue, Aug 28, 2018 at 09:20:34PM +1000, Nicholas Piggin wrote:
> > Similarly to the previous patch, this tries to optimise dirty/accessed
> > bits in ptes to avoid access costs of hardware setting them.
> >   
> 
> This patch results in silent nios2 boot failures, silent meaning that
> the boot stalls.

Okay, I just got back to looking at this. The reason for the hang is,
I think, a bug in the nios2 TLB code, but maybe other archs have similar
issues.

In the case of a missing / !present Linux pte, nios2 installs a TLB
entry with no permissions via its fast TLB exception handler (software
TLB fill). It then relies on that entry causing a TLB permission
exception in a slower handler, which calls handle_mm_fault to set the
Linux pte and flushes the old TLB entry. After that, the fast exception
handler will find the new Linux pte.

With this patch, nios2 has a case where handle_mm_fault does not flush
the old TLB entry, which results in the TLB permission exception
continually being retried.

What happens without this patch is that fault paths like do_read_fault
install a Linux pte with the young bit clear and return. That causes
nios2 to fault again, but this time it goes down to the bottom of
handle_pte_fault, to the access flags update with the young bit set. The
young bit is seen to be different, so ptep_set_access_flags does a TLB
flush, and that finally allows the fast TLB handler to fire and pick up
the new Linux pte.

With this patch, the young bit is set in the first handle_mm_fault, so
the second handle_mm_fault no longer sees the ptes as different and
does not flush the TLB. The spurious fault handler also does not flush
the entry unless FAULT_FLAG_WRITE is set.
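
The difference on the second (retry) fault can be sketched with a small
user-space toy. This is not kernel code: the toy_* names, the bit values
and the flush counter are made up for illustration, and the dirty bit
and write-protect handling are left out. It only models the behaviour
described above: the access flags update flushes when the pte value
changes, and the spurious-fault path flushes only for write faults.

```c
#include <stdbool.h>

#define TOY_PTE_PRESENT 0x1u	/* hypothetical bit layout */
#define TOY_PTE_YOUNG   0x2u

struct toy_mmu {
	unsigned int pte;		/* the Linux pte for one address */
	unsigned int tlb_flushes;	/* TLB flushes issued so far */
};

/* Write the new pte and flush only if the value actually changes. */
static bool toy_set_access_flags(struct toy_mmu *mmu, unsigned int entry)
{
	bool changed = (mmu->pte != entry);

	if (changed) {
		mmu->pte = entry;
		mmu->tlb_flushes++;
	}
	return changed;
}

/*
 * Retry fault on an already-present pte. Returns how many TLB flushes
 * that fault performed.
 */
static unsigned int toy_retry_fault(unsigned int installed_pte, bool write)
{
	struct toy_mmu mmu = { installed_pte, 0 };
	unsigned int entry = installed_pte | TOY_PTE_YOUNG;

	/* Unchanged pte: only a write fault gets the spurious-fault flush. */
	if (!toy_set_access_flags(&mmu, entry) && write)
		mmu.tlb_flushes++;

	return mmu.tlb_flushes;
}
```

In this toy, a read retry on a pte installed with the young bit clear
does one flush, while a read retry on a pte installed with the young bit
already set does none, which is the case that leaves the stale
no-permission TLB entry in place.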

What nios2 should do is invalidate the TLB entry in update_mmu_cache.
What it *really* should do is install the new TLB entry; I have some
patches to make that work in qemu that I can submit. But I would like to
try getting these dirty/accessed bit optimisations into 4.20, so I will
send a simple patch that just does the TLB invalidate, which could go in
Andrew's git tree.
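
To illustrate why an invalidate in update_mmu_cache would be enough,
here is a user-space toy of the whole retry loop for a single read
access. It is a sketch under assumptions, not the nios2 code: all toy_*
names are hypothetical, the fast handler is assumed to derive the TLB
permissions only from whether the pte is present, and only read faults
are modelled.

```c
#include <stdbool.h>

/* Toy state for one virtual address; not the real nios2 structures. */
struct toy_state {
	bool pte_present;	/* Linux pte installed */
	bool pte_young;		/* accessed bit in the Linux pte */
	bool tlb_valid;		/* a TLB entry exists for the address */
	bool tlb_readable;	/* that entry grants read permission */
};

/* Fast handler: software TLB fill straight from the Linux pte. */
static void toy_fast_tlb_fill(struct toy_state *s)
{
	s->tlb_valid = true;
	s->tlb_readable = s->pte_present;
}

/* Slow handler: stands in for handle_mm_fault on a read fault. */
static void toy_handle_mm_fault(struct toy_state *s, bool young_on_insert,
				bool umc_invalidates)
{
	if (!s->pte_present) {
		/* First fault, e.g. do_read_fault: install the pte. */
		s->pte_present = true;
		s->pte_young = young_on_insert;
		/* update_mmu_cache runs here; the fix invalidates the TLB. */
		if (umc_invalidates)
			s->tlb_valid = false;
	} else if (!s->pte_young) {
		/* Access flags update changed the pte, so the TLB is flushed. */
		s->pte_young = true;
		s->tlb_valid = false;
	}
	/* Otherwise nothing changed and a read fault flushes nothing. */
}

/*
 * Returns the number of permission faults taken before the read
 * succeeds, or -1 if it has not succeeded after max_faults (livelock).
 */
static int toy_read_access(bool young_on_insert, bool umc_invalidates,
			   int max_faults)
{
	struct toy_state s = { false, false, false, false };
	int faults = 0;

	for (;;) {
		if (!s.tlb_valid)
			toy_fast_tlb_fill(&s);
		if (s.tlb_readable)
			return faults;
		if (faults == max_faults)
			return -1;
		faults++;
		toy_handle_mm_fault(&s, young_on_insert, umc_invalidates);
	}
}
```

In this model, toy_read_access(false, false, 16) returns 2 (the old
behaviour: two faults, then success), toy_read_access(true, false, 16)
returns -1 (young bit set on insert, nothing ever flushes the stale
entry), and toy_read_access(true, true, 16) returns 1 (the invalidate in
update_mmu_cache lets the next fast fill pick up the new pte).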

Is that agreeable with the nios2 maintainers?

Thanks,
Nick

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/3] mm: optimise pte dirty/accessed bit setting by demand based pte insertion
  2018-09-17 17:53     ` Nicholas Piggin
  2018-09-17 17:53       ` Nicholas Piggin
@ 2018-09-21  8:42       ` Ley Foon Tan
  2018-09-21  8:42         ` Ley Foon Tan
  2018-09-23  9:23         ` Nicholas Piggin
  1 sibling, 2 replies; 28+ messages in thread
From: Ley Foon Tan @ 2018-09-21  8:42 UTC (permalink / raw)
  To: Nicholas Piggin, Guenter Roeck
  Cc: linux-mm, linux-arch, linux-kernel, linuxppc-dev, Andrew Morton,
	Linus Torvalds, Ley Foon Tan, nios2-dev

On Tue, 2018-09-18 at 03:53 +1000, Nicholas Piggin wrote:
> On Wed, 5 Sep 2018 07:29:51 -0700
> Guenter Roeck <linux@roeck-us.net> wrote:
> 
> > Hi,
> > 
> > On Tue, Aug 28, 2018 at 09:20:34PM +1000, Nicholas Piggin wrote:
> > > Similarly to the previous patch, this tries to optimise dirty/accessed
> > > bits in ptes to avoid access costs of hardware setting them.
> > > 
> > 
> > This patch results in silent nios2 boot failures, silent meaning that
> > the boot stalls.
> 
> Okay I just got back to looking at this. The reason for the hang is
> I think a bug in the nios2 TLB code, but maybe other archs have similar
> issues.
> 
> In case of a missing / !present Linux pte, nios2 installs a TLB entry
> with no permissions via its fast TLB exception handler (software TLB
> fill). Then it relies on that causing a TLB permission exception in a
> slower handler that calls handle_mm_fault to set the Linux pte and
> flushes the old TLB. Then the fast exception handler will find the new
> Linux pte.
> 
> With this patch, nios2 has a case where handle_mm_fault does not flush
> the old TLB, which results in the TLB permission exception continually
> being retried.
> 
> What happens now is that fault paths like do_read_fault will install a
> Linux pte with the young bit clear and return. That will cause nios2 to
> fault again but this time go down the bottom of handle_pte_fault and to
> the access flags update with the young bit set. The young bit is seen to
> be different, so that causes ptep_set_access_flags to do a TLB flush and
> that finally allows the fast TLB handler to fire and pick up the new
> Linux pte.
> 
> With this patch, the young bit is set in the first handle_mm_fault, so
> the second handle_mm_fault no longer sees the ptes are different and
> does not flush the TLB. The spurious fault handler also does not flush
> them unless FAULT_FLAG_WRITE is set.
> 
> What nios2 should do is invalidate the TLB in update_mmu_cache. What it
> *really* should do is install the new TLB entry, I have some patches to
> make that work in qemu I can submit. But I would like to try getting
> these dirty/accessed bit optimisations in 4.20, so I will send a simple
> patch to just do the TLB invalidate that could go in Andrew's git tree.
> 
> Is that agreeable with the nios2 maintainers?
> 
> Thanks,
> Nick
> 
Hi

Do you have patches to test?

Regards
Ley Foon

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/3] mm: optimise pte dirty/accessed bit setting by demand based pte insertion
  2018-09-21  8:42       ` Ley Foon Tan
  2018-09-21  8:42         ` Ley Foon Tan
@ 2018-09-23  9:23         ` Nicholas Piggin
  2018-09-23  9:23           ` Nicholas Piggin
  1 sibling, 1 reply; 28+ messages in thread
From: Nicholas Piggin @ 2018-09-23  9:23 UTC (permalink / raw)
  To: Ley Foon Tan
  Cc: Guenter Roeck, linux-mm, linux-arch, linux-kernel, linuxppc-dev,
	Andrew Morton, Linus Torvalds, Ley Foon Tan, nios2-dev

On Fri, 21 Sep 2018 16:42:05 +0800
Ley Foon Tan <ley.foon.tan@intel.com> wrote:

> On Tue, 2018-09-18 at 03:53 +1000, Nicholas Piggin wrote:
> > On Wed, 5 Sep 2018 07:29:51 -0700
> > Guenter Roeck <linux@roeck-us.net> wrote:
> > 
> > > Hi,
> > > 
> > > On Tue, Aug 28, 2018 at 09:20:34PM +1000, Nicholas Piggin wrote:
> > > > Similarly to the previous patch, this tries to optimise dirty/accessed
> > > > bits in ptes to avoid access costs of hardware setting them.
> > > > 
> > > 
> > > This patch results in silent nios2 boot failures, silent meaning that
> > > the boot stalls.
> > 
> > Okay I just got back to looking at this. The reason for the hang is
> > I think a bug in the nios2 TLB code, but maybe other archs have similar
> > issues.
> > 
> > In case of a missing / !present Linux pte, nios2 installs a TLB entry
> > with no permissions via its fast TLB exception handler (software TLB
> > fill). Then it relies on that causing a TLB permission exception in a
> > slower handler that calls handle_mm_fault to set the Linux pte and
> > flushes the old TLB. Then the fast exception handler will find the new
> > Linux pte.
> > 
> > With this patch, nios2 has a case where handle_mm_fault does not flush
> > the old TLB, which results in the TLB permission exception continually
> > being retried.
> > 
> > What happens now is that fault paths like do_read_fault will install a
> > Linux pte with the young bit clear and return. That will cause nios2 to
> > fault again but this time go down the bottom of handle_pte_fault and to
> > the access flags update with the young bit set. The young bit is seen to
> > be different, so that causes ptep_set_access_flags to do a TLB flush and
> > that finally allows the fast TLB handler to fire and pick up the new
> > Linux pte.
> > 
> > With this patch, the young bit is set in the first handle_mm_fault, so
> > the second handle_mm_fault no longer sees the ptes are different and
> > does not flush the TLB. The spurious fault handler also does not flush
> > them unless FAULT_FLAG_WRITE is set.
> > 
> > What nios2 should do is invalidate the TLB in update_mmu_cache. What it
> > *really* should do is install the new TLB entry, I have some patches to
> > make that work in qemu I can submit. But I would like to try getting
> > these dirty/accessed bit optimisations in 4.20, so I will send a simple
> > patch to just do the TLB invalidate that could go in Andrew's git tree.
> > 
> > Is that agreeable with the nios2 maintainers?
> > 
> > Thanks,
> > Nick
> > 
> Hi
> 
> Do you have patches to test?

I've been working on some, it has taken longer than I expected, I'll
hopefully have something to send out by tomorrow.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2018-09-23 15:20 UTC | newest]

Thread overview: 28+ messages
2018-08-28 11:20 [PATCH 0/3] mm: dirty/accessed pte optimisations Nicholas Piggin
2018-08-28 11:20 ` Nicholas Piggin
2018-08-28 11:20 ` [PATCH 1/3] mm/cow: don't bother write protectig already write-protected huge pages Nicholas Piggin
2018-08-28 11:20   ` Nicholas Piggin
2018-08-28 11:20 ` [PATCH 2/3] mm/cow: optimise pte dirty/accessed bits handling in fork Nicholas Piggin
2018-08-28 11:20   ` Nicholas Piggin
2018-08-29 15:42   ` Linus Torvalds
2018-08-29 15:42     ` Linus Torvalds
2018-08-29 23:12     ` Nicholas Piggin
2018-08-29 23:12       ` Nicholas Piggin
2018-08-29 23:15       ` Linus Torvalds
2018-08-29 23:15         ` Linus Torvalds
2018-08-29 23:57         ` Nicholas Piggin
2018-08-29 23:57           ` Nicholas Piggin
2018-08-28 11:20 ` [PATCH 3/3] mm: optimise pte dirty/accessed bit setting by demand based pte insertion Nicholas Piggin
2018-08-28 11:20   ` Nicholas Piggin
2018-09-05 14:29   ` Guenter Roeck
2018-09-05 14:29     ` Guenter Roeck
2018-09-05 22:18     ` Nicholas Piggin
2018-09-05 22:18       ` Nicholas Piggin
2018-09-06  0:36       ` Guenter Roeck
2018-09-06  0:36         ` Guenter Roeck
2018-09-17 17:53     ` Nicholas Piggin
2018-09-17 17:53       ` Nicholas Piggin
2018-09-21  8:42       ` Ley Foon Tan
2018-09-21  8:42         ` Ley Foon Tan
2018-09-23  9:23         ` Nicholas Piggin
2018-09-23  9:23           ` Nicholas Piggin