From: Nicholas Piggin <npiggin@gmail.com>
To: linux-mm@kvack.org
Cc: Nicholas Piggin <npiggin@gmail.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Bibo Mao <maobibo@loongson.cn>
Subject: [PATCH v3 2/3] mm/cow: optimise pte accessed bit handling in fork
Date: Sun, 20 Dec 2020 14:55:34 +1000
Message-ID: <20201220045535.848591-3-npiggin@gmail.com>
In-Reply-To: <20201220045535.848591-1-npiggin@gmail.com>

fork clears the dirty/accessed bits from new ptes in the child. This
logic dates from when mapped page reclaim was done by scanning ptes,
when it may have been quite important. Today, with physical based pte
scanning, there is less reason to clear these bits, so this patch
avoids clearing the accessed bit in the child.

A single accessed bit is treated much the same as many. The difference
today is that more than one referenced bit causes the page to be
activated, while a single bit causes it to be kept. This patch causes
pages shared by fork(2) to be more readily activated, but this
heuristic is very fuzzy anyway -- a page can be accessed by multiple
threads via a single pte and be just as important as one that is
accessed via multiple ptes, for example. In the end I don't believe
fork(2) is a significant enough driver of page reclaim behaviour for
this to matter much.

This and the following change eliminate a major source of faults that
powerpc/radix requires to set dirty/accessed bits in ptes, speeding
up a fork/exit microbenchmark by about 5% on POWER9 (16600 -> 17500
fork/execs per second).

Skylake appears to have a micro-fault overhead too. The test allocates
4GB of anonymous memory, reads each page, then forks, and times the
child reading a byte from each page. The child's first pass over the
pages takes about 1000 cycles per page; the second pass takes about 27
cycles per page (TLB miss). With no additional minor faults measured
due to either child pass, and the page array well exceeding TLB
capacity, the large cost of the first pass must be micro-faults caused
by setting the accessed bit.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
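Note (not part of the commit message): the activation rule described
in the changelog can be sketched as a toy userspace model. This is
only an illustration under stated assumptions, not the kernel's
page_check_references(). The toy_* names are hypothetical, the only
input modelled is the number of ptes with the accessed bit set, and
treating zero referenced ptes as "reclaim" is a simplification.

```c
/* Toy model only; these names are hypothetical, not kernel symbols. */
enum toy_decision { TOY_RECLAIM, TOY_KEEP, TOY_ACTIVATE };

/*
 * Decision as described in the changelog: more than one referenced
 * pte activates the page, exactly one keeps it. The zero case is an
 * assumption made for this sketch.
 */
static enum toy_decision toy_check_references(int referenced_ptes)
{
	if (referenced_ptes > 1)
		return TOY_ACTIVATE;
	if (referenced_ptes == 1)
		return TOY_KEEP;
	return TOY_RECLAIM;
}

/*
 * Number of accessed bits seen across the parent and child ptes that
 * map one page right after fork. The parent's pte keeps its accessed
 * bit; the child's copy keeps it only if fork does not clear it
 * (child_inherits != 0 models the behaviour after this patch).
 */
static int toy_referenced_after_fork(int parent_accessed, int child_inherits)
{
	return parent_accessed + (parent_accessed && child_inherits);
}
```

With child_inherits = 0 (old behaviour) a page touched only by the
parent counts one referenced pte and is kept; with child_inherits = 1
(this patch) it counts two and is activated.
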
 mm/huge_memory.c | 2 --
 mm/memory.c      | 1 -
 mm/vmscan.c      | 5 +++++
 3 files changed, 5 insertions(+), 3 deletions(-)
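
Note (not part of the commit message): a minimal sketch of the kind of
test described in the changelog for Skylake, under several
assumptions. It is not the program used for the numbers quoted there.
The helper names are hypothetical, time is taken with clock_gettime()
in nanoseconds rather than in cycles, the size is a parameter instead
of a fixed 4GB, and the parent writes one byte to each page before
reading so that the pages are populated.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Read one byte from each page; return the elapsed nanoseconds. */
static int64_t read_pass_ns(const volatile unsigned char *buf,
			    size_t npages, size_t pagesz)
{
	struct timespec t0, t1;
	unsigned char sink = 0;
	size_t i;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < npages; i++)
		sink += buf[i * pagesz];
	clock_gettime(CLOCK_MONOTONIC, &t1);
	(void)sink;
	return (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000LL +
	       (t1.tv_nsec - t0.tv_nsec);
}

/*
 * Hypothetical helper: populate npages of anonymous memory in the
 * parent, fork, and time two read passes in the child. Returns 0 on
 * success with the two pass times stored in ns[0] and ns[1].
 */
static int fork_read_bench(size_t npages, int64_t ns[2])
{
	size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
	size_t len = npages * pagesz;
	unsigned char *buf;
	int fds[2], status = 0;
	ssize_t got;
	pid_t pid;
	size_t i;

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;
	if (pipe(fds) < 0) {
		munmap(buf, len);
		return -1;
	}

	/* Parent: populate every page (assumption), then read each one. */
	for (i = 0; i < npages; i++)
		buf[i * pagesz] = 1;
	(void)read_pass_ns(buf, npages, pagesz);

	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		munmap(buf, len);
		return -1;
	}
	if (pid == 0) {
		/* Child: time the first and second pass over the pages. */
		int64_t res[2];

		res[0] = read_pass_ns(buf, npages, pagesz);
		res[1] = read_pass_ns(buf, npages, pagesz);
		_exit(write(fds[1], res, sizeof(res)) ==
		      (ssize_t)sizeof(res) ? 0 : 1);
	}

	/* Parent: collect the child's two timings through the pipe. */
	close(fds[1]);
	got = read(fds[0], ns, 2 * sizeof(ns[0]));
	close(fds[0]);
	waitpid(pid, &status, 0);
	munmap(buf, len);

	if (got != (ssize_t)(2 * sizeof(ns[0])))
		return -1;
	return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Dividing ns[0] and ns[1] by npages gives a per-page cost for the
child's first and second pass. For the comparison in the changelog to
be meaningful, npages should be large enough that the array well
exceeds TLB capacity.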

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 87da60c583a9..f2ca0326b5af 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1115,7 +1115,6 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		pmdp_set_wrprotect(src_mm, addr, src_pmd);
 		pmd = pmd_wrprotect(pmd);
 	}
-	pmd = pmd_mkold(pmd);
 	set_pmd_at(dst_mm, addr, dst_pmd, pmd);
 
 	ret = 0;
@@ -1225,7 +1224,6 @@ int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		pudp_set_wrprotect(src_mm, addr, src_pud);
 		pud = pud_mkold(pud_wrprotect(pud));
 	}
-	pud = pud_mkold(pud);
 	set_pud_at(dst_mm, addr, dst_pud, pud);
 
 	ret = 0;
diff --git a/mm/memory.c b/mm/memory.c
index 990e5d704c08..dd1f364d8ca3 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -886,7 +886,6 @@ copy_present_pte(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
 	 */
 	if (vm_flags & VM_SHARED)
 		pte = pte_mkclean(pte);
-	pte = pte_mkold(pte);
 
 	/*
 	 * Make sure the _PAGE_UFFD_WP bit is cleared if the new VMA
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 257cba79a96d..604ead623842 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1012,6 +1012,11 @@ static enum page_references page_check_references(struct page *page,
 		 * Note: the mark is set for activated pages as well
 		 * so that recently deactivated but used pages are
 		 * quickly recovered.
+		 *
+		 * Note: fork() will copy referenced bit from parent
+		 * to child ptes, despite not having been accessed by
+		 * the child. This is to avoid micro-faults on initial
+		 * access.
 		 */
 		SetPageReferenced(page);
 
-- 
2.23.0


