linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 05/25] mm/arm: Use mm_fault_accounting()
       [not found] <20200615221607.7764-1-peterx@redhat.com>
@ 2020-06-15 22:15 ` Peter Xu
  2020-06-15 22:15 ` [PATCH 06/25] mm/arm64: " Peter Xu
  2020-06-16 18:55 ` [PATCH 00/25] mm: Page fault accounting cleanups Linus Torvalds
  2 siblings, 0 replies; 9+ messages in thread
From: Peter Xu @ 2020-06-15 22:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrea Arcangeli, Will Deacon, Russell King, peterx,
	linux-arm-kernel, Andrew Morton, Linus Torvalds, Gerald Schaefer

Use the new mm_fault_accounting() helper for page fault accounting.

Avoid doing page fault accounting multiple times if the page fault is retried.
Meanwhile, take the page fault as a major fault as long as any of the retried
page fault is a major fault.

CC: Russell King <linux@armlinux.org.uk>
CC: Will Deacon <will@kernel.org>
CC: linux-arm-kernel@lists.infradead.org
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 arch/arm/mm/fault.c | 21 ++++-----------------
 1 file changed, 4 insertions(+), 17 deletions(-)

diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index 2dd5c41cbb8d..92d4436e74da 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -240,7 +240,7 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 	struct task_struct *tsk;
 	struct mm_struct *mm;
 	int sig, code;
-	vm_fault_t fault;
+	vm_fault_t fault, major = 0;
 	unsigned int flags = FAULT_FLAG_DEFAULT;
 
 	if (kprobe_page_fault(regs, fsr))
@@ -290,6 +290,7 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 	}
 
 	fault = __do_page_fault(mm, addr, fsr, flags, tsk);
+	major |= fault & VM_FAULT_MAJOR;
 
 	/* If we need to retry but a fatal signal is pending, handle the
 	 * signal first. We do not need to release the mmap_sem because
@@ -301,23 +302,7 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 		return 0;
 	}
 
-	/*
-	 * Major/minor page fault accounting is only done on the
-	 * initial attempt. If we go through a retry, it is extremely
-	 * likely that the page will be found in page cache at that point.
-	 */
-
-	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);
 	if (!(fault & VM_FAULT_ERROR) && flags & FAULT_FLAG_ALLOW_RETRY) {
-		if (fault & VM_FAULT_MAJOR) {
-			tsk->maj_flt++;
-			perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1,
-					regs, addr);
-		} else {
-			tsk->min_flt++;
-			perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1,
-					regs, addr);
-		}
 		if (fault & VM_FAULT_RETRY) {
 			flags |= FAULT_FLAG_TRIED;
 			goto retry;
@@ -326,6 +311,8 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 
 	up_read(&mm->mmap_sem);
 
+	mm_fault_accounting(tsk, regs, addr, major);
+
 	/*
 	 * Handle the "normal" case first - VM_FAULT_MAJOR
 	 */
-- 
2.26.2


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 06/25] mm/arm64: Use mm_fault_accounting()
       [not found] <20200615221607.7764-1-peterx@redhat.com>
  2020-06-15 22:15 ` [PATCH 05/25] mm/arm: Use mm_fault_accounting() Peter Xu
@ 2020-06-15 22:15 ` Peter Xu
  2020-06-16  7:43   ` Will Deacon
  2020-06-16 18:55 ` [PATCH 00/25] mm: Page fault accounting cleanups Linus Torvalds
  2 siblings, 1 reply; 9+ messages in thread
From: Peter Xu @ 2020-06-15 22:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrea Arcangeli, Will Deacon, Catalin Marinas, peterx,
	linux-arm-kernel, Andrew Morton, Linus Torvalds, Gerald Schaefer

Use the new mm_fault_accounting() helper for page fault accounting.

CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will@kernel.org>
CC: linux-arm-kernel@lists.infradead.org
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 arch/arm64/mm/fault.c | 17 ++---------------
 1 file changed, 2 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index c9cedc0432d2..09af7d7a60ec 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -484,8 +484,6 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
 					 addr, esr, regs);
 	}
 
-	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);
-
 	/*
 	 * As per x86, we may deadlock here. However, since the kernel only
 	 * validly references user space from well defined areas of the code,
@@ -535,20 +533,9 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
 			      VM_FAULT_BADACCESS)))) {
 		/*
 		 * Major/minor page fault accounting is only done
-		 * once. If we go through a retry, it is extremely
-		 * likely that the page will be found in page cache at
-		 * that point.
+		 * once.
 		 */
-		if (major) {
-			current->maj_flt++;
-			perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs,
-				      addr);
-		} else {
-			current->min_flt++;
-			perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs,
-				      addr);
-		}
-
+		mm_fault_accounting(current, regs, address, major);
 		return 0;
 	}
 
-- 
2.26.2


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 06/25] mm/arm64: Use mm_fault_accounting()
  2020-06-15 22:15 ` [PATCH 06/25] mm/arm64: " Peter Xu
@ 2020-06-16  7:43   ` Will Deacon
  2020-06-16 15:59     ` Peter Xu
  0 siblings, 1 reply; 9+ messages in thread
From: Will Deacon @ 2020-06-16  7:43 UTC (permalink / raw)
  To: Peter Xu
  Cc: Andrea Arcangeli, Catalin Marinas, linux-kernel,
	linux-arm-kernel, Andrew Morton, Linus Torvalds, Gerald Schaefer

On Mon, Jun 15, 2020 at 06:15:48PM -0400, Peter Xu wrote:
> Use the new mm_fault_accounting() helper for page fault accounting.
> 
> CC: Catalin Marinas <catalin.marinas@arm.com>
> CC: Will Deacon <will@kernel.org>
> CC: linux-arm-kernel@lists.infradead.org
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  arch/arm64/mm/fault.c | 17 ++---------------
>  1 file changed, 2 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index c9cedc0432d2..09af7d7a60ec 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -484,8 +484,6 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
>  					 addr, esr, regs);
>  	}
>  
> -	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);
> -
>  	/*
>  	 * As per x86, we may deadlock here. However, since the kernel only
>  	 * validly references user space from well defined areas of the code,
> @@ -535,20 +533,9 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
>  			      VM_FAULT_BADACCESS)))) {
>  		/*
>  		 * Major/minor page fault accounting is only done
> -		 * once. If we go through a retry, it is extremely
> -		 * likely that the page will be found in page cache at
> -		 * that point.
> +		 * once.
>  		 */
> -		if (major) {
> -			current->maj_flt++;
> -			perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs,
> -				      addr);
> -		} else {
> -			current->min_flt++;
> -			perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs,
> -				      addr);
> -		}
> -
> +		mm_fault_accounting(current, regs, address, major);

Please can you explain why it's ok to move the PERF_COUNT_SW_PAGE_FAULTS
update like this? Seems like a user-visible change to me, so some
justification would really help.

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 06/25] mm/arm64: Use mm_fault_accounting()
  2020-06-16  7:43   ` Will Deacon
@ 2020-06-16 15:59     ` Peter Xu
  0 siblings, 0 replies; 9+ messages in thread
From: Peter Xu @ 2020-06-16 15:59 UTC (permalink / raw)
  To: Will Deacon
  Cc: Andrea Arcangeli, Catalin Marinas, linux-kernel,
	linux-arm-kernel, Andrew Morton, Linus Torvalds, Gerald Schaefer

Hi, Will,

On Tue, Jun 16, 2020 at 08:43:08AM +0100, Will Deacon wrote:
> Please can you explain why it's ok to move the PERF_COUNT_SW_PAGE_FAULTS
> update like this? Seems like a user-visible change to me, so some
> justification would really help.

Indeed this could be a functional change for PERF_COUNT_SW_PAGE_FAULTS on some
archs, e.g., for arm64, PERF_COUNT_SW_PAGE_FAULTS previously will also contain
accounting of severe errors where we go into the "no_context" path.  However if
you see the other archs, it's not always true, for example, the xtensa arch
only accounts the correctly handled faults (arch/xtensa/mm/fault.c).

After I thought about this, I don't think it's extremely useful (or please
correct me if I missed something important) to use PERF_COUNT_SW_PAGE_FAULTS
for fatal error accountings among all those correct ones.  After all they are
really extremely rare cases, and even if we got a sigbus for a process, we'll
normally got something dumped in dmesg so if we really want to capture the
error cases there should always be a better way (because by following things
like dmesg we can not only know how many error faults triggered, but also on
the details of the errors).

IOW, my understanding of users of PERF_COUNT_SW_PAGE_FAULTS is that they want
to trap normal/correct page faults, not really care about rare errors.

Then when I went back to think PERF_COUNT_SW_PAGE_FAULTS, it's really about:

  A=PERF_COUNT_SW_PAGE_FAULTS 
  B=PERF_COUNT_SW_PAGE_FAULTS_MAJ
  C=PERF_COUNT_SW_PAGE_FAULTS_MIN

And:

  A=B+C

If that's the case (which is simple and clear), it's really helpful too that we
unify this definition across all the architectures, then it'll also be easier
for us to provide some helper like mm_fault_accounting() so that the accounting
can be managed in the general code rather than in arch-specific ways.

Thanks,

-- 
Peter Xu


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 00/25] mm: Page fault accounting cleanups
       [not found] <20200615221607.7764-1-peterx@redhat.com>
  2020-06-15 22:15 ` [PATCH 05/25] mm/arm: Use mm_fault_accounting() Peter Xu
  2020-06-15 22:15 ` [PATCH 06/25] mm/arm64: " Peter Xu
@ 2020-06-16 18:55 ` Linus Torvalds
  2020-06-16 21:03   ` Peter Xu
  2020-06-17  0:55   ` Michael Ellerman
  2 siblings, 2 replies; 9+ messages in thread
From: Linus Torvalds @ 2020-06-16 18:55 UTC (permalink / raw)
  To: Peter Xu
  Cc: Andrea Arcangeli, linux-arch, linux-s390, Catalin Marinas,
	Linux Kernel Mailing List, openrisc, Linux ARM, Andrew Morton,
	Will Deacon, Alexander Gordeev, Gerald Schaefer

[-- Attachment #1: Type: text/plain, Size: 2788 bytes --]

On Mon, Jun 15, 2020 at 3:16 PM Peter Xu <peterx@redhat.com> wrote:
>
> This series tries to address all of them by introducing mm_fault_accounting()
> first, so that we move all the page fault accounting into the common code base,
> then call it properly from arch pf handlers just like handle_mm_fault().

Hmm.

So having looked at this a bit more, I'd actually like to go even
further, and just get rid of the per-architecture code _entirely_.

Here's a straw-man patch to the generic code - the idea is mostly laid
out in the comment that I'm just quoting here directly too:

        /*
         * Do accounting in the common code, to avoid unnecessary
         * architecture differences or duplicated code.
         *
         * We arbitrarily make the rules be:
         *
         *  - faults that never even got here (because the address
         *    wasn't valid). That includes arch_vma_access_permitted()
         *    failing above.
         *
         *    So this is expressly not a "this many hardware page
         *    faults" counter. Use the hw profiling for that.
         *
         *  - incomplete faults (ie RETRY) do not count (see above).
         *    They will only count once completed.
         *
         *  - the fault counts as a "major" fault when the final
         *    successful fault is VM_FAULT_MAJOR, or if it was a
         *    retry (which implies that we couldn't handle it
         *    immediately previously).
         *
         *  - if the fault is done for GUP, regs wil be NULL and
         *    no accounting will be done (but you _could_ pass in
         *    your own regs and it would be accounted to the thread
         *    doing the fault, not to the target!)
         */

the code itself in the patch is

 (a) pretty trivial and self-evident

 (b) INCOMPLETE

that (b) is worth noting: this patch won't compile on its own. It
intentionally leaves all the users without the new 'regs' argument,
because you obviously simply need to remove all the code that
currently tries to do any accounting.

Comments?

This is a bigger change, but I think it might be worth it to _really_
consolidate the major/minor logic.

One detail worth noting: I do wonder if we should put the

    perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);

just in the arch code at the top of the fault handling, and consider
it entirely unrelated to the major/minor fault handling. The
major/minor faults fundamnetally are about successes. But the plain
PERF_COUNT_SW_PAGE_FAULTS could be about things that fail, including
things that never even get to this point at all.

I'm not convinced it's useful to have three SW events that are defined
to be A=B+C.

But this does *not* do that part. It' sreally just an RFC patch.

                    Linus

[-- Attachment #2: patch --]
[-- Type: application/octet-stream, Size: 2918 bytes --]

 include/linux/mm.h |  3 ++-
 mm/memory.c        | 46 +++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index dc7b87310c10..e82a604339c0 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1648,7 +1648,8 @@ int invalidate_inode_page(struct page *page);
 
 #ifdef CONFIG_MMU
 extern vm_fault_t handle_mm_fault(struct vm_area_struct *vma,
-			unsigned long address, unsigned int flags);
+			unsigned long address, unsigned int flags,
+			struct pt_regs *);
 extern int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
 			    unsigned long address, unsigned int fault_flags,
 			    bool *unlocked);
diff --git a/mm/memory.c b/mm/memory.c
index dc7f3543b1fd..f15056151fb1 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -72,6 +72,7 @@
 #include <linux/oom.h>
 #include <linux/numa.h>
 
+#include <linux/perf_event.h>
 #include <trace/events/kmem.h>
 
 #include <asm/io.h>
@@ -4353,7 +4354,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
  * return value.  See filemap_fault() and __lock_page_or_retry().
  */
 vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
-		unsigned int flags)
+		unsigned int flags, struct pt_regs *regs)
 {
 	vm_fault_t ret;
 
@@ -4394,6 +4395,49 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
 			mem_cgroup_oom_synchronize(false);
 	}
 
+	if (ret & VM_FAULT_RETRY)
+		return ret;
+
+	/*
+	 * Do accounting in the common code, to avoid unnecessary
+	 * architecture differences or duplicated code.
+	 *
+	 * We arbitrarily make the rules be:
+	 *
+	 *  - faults that never even got here (because the address
+	 *    wasn't valid). That includes arch_vma_access_permitted()
+	 *    failing above.
+	 *
+	 *    So this is expressly not a "this many hardware page
+	 *    faults" counter. Use the hw profiling for that.
+	 *
+	 *  - incomplete faults (ie RETRY) do not count (see above).
+	 *    They will only count once completed.
+	 *
+	 *  - the fault counts as a "major" fault when the final
+	 *    successful fault is VM_FAULT_MAJOR, or if it was a
+	 *    retry (which implies that we couldn't handle it
+	 *    immediately previously).
+	 *
+	 *  - if the fault is done for GUP, regs wil be NULL and
+	 *    no accounting will be done (but you _could_ pass in
+	 *    your own regs and it would be accounted to the thread
+	 *    doing the fault, not to the target!)
+	 */
+
+	if (!regs)
+		return ret;
+
+	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
+
+	if ((ret & VM_FAULT_MAJOR) || (flags & FAULT_FLAG_TRIED)) {
+		current->maj_flt++;
+		perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs, address);
+	} else {
+		current->min_flt++;
+		perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs, address);
+	}
+
 	return ret;
 }
 EXPORT_SYMBOL_GPL(handle_mm_fault);

[-- Attachment #3: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 00/25] mm: Page fault accounting cleanups
  2020-06-16 18:55 ` [PATCH 00/25] mm: Page fault accounting cleanups Linus Torvalds
@ 2020-06-16 21:03   ` Peter Xu
  2020-06-17  0:55   ` Michael Ellerman
  1 sibling, 0 replies; 9+ messages in thread
From: Peter Xu @ 2020-06-16 21:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrea Arcangeli, linux-arch, linux-s390, Catalin Marinas,
	Linux Kernel Mailing List, openrisc, Linux ARM, Andrew Morton,
	Will Deacon, Alexander Gordeev, Gerald Schaefer

On Tue, Jun 16, 2020 at 11:55:17AM -0700, Linus Torvalds wrote:
> On Mon, Jun 15, 2020 at 3:16 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > This series tries to address all of them by introducing mm_fault_accounting()
> > first, so that we move all the page fault accounting into the common code base,
> > then call it properly from arch pf handlers just like handle_mm_fault().
> 
> Hmm.
> 
> So having looked at this a bit more, I'd actually like to go even
> further, and just get rid of the per-architecture code _entirely_.
> 
> Here's a straw-man patch to the generic code - the idea is mostly laid
> out in the comment that I'm just quoting here directly too:
> 
>         /*
>          * Do accounting in the common code, to avoid unnecessary
>          * architecture differences or duplicated code.
>          *
>          * We arbitrarily make the rules be:
>          *
>          *  - faults that never even got here (because the address
>          *    wasn't valid). That includes arch_vma_access_permitted()
>          *    failing above.
>          *
>          *    So this is expressly not a "this many hardware page
>          *    faults" counter. Use the hw profiling for that.
>          *
>          *  - incomplete faults (ie RETRY) do not count (see above).
>          *    They will only count once completed.
>          *
>          *  - the fault counts as a "major" fault when the final
>          *    successful fault is VM_FAULT_MAJOR, or if it was a
>          *    retry (which implies that we couldn't handle it
>          *    immediately previously).
>          *
>          *  - if the fault is done for GUP, regs wil be NULL and
>          *    no accounting will be done (but you _could_ pass in
>          *    your own regs and it would be accounted to the thread
>          *    doing the fault, not to the target!)
>          */
> 
> the code itself in the patch is
> 
>  (a) pretty trivial and self-evident
> 
>  (b) INCOMPLETE
> 
> that (b) is worth noting: this patch won't compile on its own. It
> intentionally leaves all the users without the new 'regs' argument,
> because you obviously simply need to remove all the code that
> currently tries to do any accounting.
> 
> Comments?

Looks clean to me.  The definition of "major faults" will slightly change even
for those who has done the "major |= fault & MAJOR" operations before, but at
least I can't see anything bad about that either...

To make things easier, we can use the 1st patch to introduce this change,
however pass regs==NULL at the callers to never trigger this accounting.  Then
we can still use one patch for each arch to do the final convertions.

> 
> This is a bigger change, but I think it might be worth it to _really_
> consolidate the major/minor logic.
> 
> One detail worth noting: I do wonder if we should put the
> 
>     perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);
> 
> just in the arch code at the top of the fault handling, and consider
> it entirely unrelated to the major/minor fault handling. The
> major/minor faults fundamnetally are about successes. But the plain
> PERF_COUNT_SW_PAGE_FAULTS could be about things that fail, including
> things that never even get to this point at all.
> 
> I'm not convinced it's useful to have three SW events that are defined
> to be A=B+C.

IMHO it's still common to have a "total" statistics in softwares even if each
of the subsets are accounted separately.  Here it's just a bit special because
there're only two elements so the addition is so straightforward.  It seems a
trade-off on whether we'd like to do the accounting of errornous faults, or we
want to make it cleaner by put them altogether but only successful page faults.
I slightly preferred the latter due to the fact that I failed to find great
usefulness out of keeping error fault accountings, but no strong opinions..

Thanks,

-- 
Peter Xu


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 00/25] mm: Page fault accounting cleanups
  2020-06-16 18:55 ` [PATCH 00/25] mm: Page fault accounting cleanups Linus Torvalds
  2020-06-16 21:03   ` Peter Xu
@ 2020-06-17  0:55   ` Michael Ellerman
  2020-06-17  8:04     ` Will Deacon
  1 sibling, 1 reply; 9+ messages in thread
From: Michael Ellerman @ 2020-06-17  0:55 UTC (permalink / raw)
  To: Linus Torvalds, Peter Xu
  Cc: Andrea Arcangeli, linux-arch, linux-s390, Catalin Marinas,
	Linux Kernel Mailing List, openrisc, Linux ARM, Andrew Morton,
	Will Deacon, Alexander Gordeev, Gerald Schaefer

Linus Torvalds <torvalds@linux-foundation.org> writes:
> On Mon, Jun 15, 2020 at 3:16 PM Peter Xu <peterx@redhat.com> wrote:
>> This series tries to address all of them by introducing mm_fault_accounting()
>> first, so that we move all the page fault accounting into the common code base,
>> then call it properly from arch pf handlers just like handle_mm_fault().
>
> Hmm.
>
> So having looked at this a bit more, I'd actually like to go even
> further, and just get rid of the per-architecture code _entirely_.

<snip>

> One detail worth noting: I do wonder if we should put the
>
>     perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);
>
> just in the arch code at the top of the fault handling, and consider
> it entirely unrelated to the major/minor fault handling. The
> major/minor faults fundamnetally are about successes. But the plain
> PERF_COUNT_SW_PAGE_FAULTS could be about things that fail, including
> things that never even get to this point at all.

Yeah I think we should keep it in the arch code at roughly the top.

If it's moved to the end you could have a process spinning taking bad
page faults (and fixing them up), and see no sign of it from the perf
page fault counters.

cheers

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 00/25] mm: Page fault accounting cleanups
  2020-06-17  0:55   ` Michael Ellerman
@ 2020-06-17  8:04     ` Will Deacon
  2020-06-17 16:10       ` Peter Xu
  0 siblings, 1 reply; 9+ messages in thread
From: Will Deacon @ 2020-06-17  8:04 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Andrea Arcangeli, linux-arch, linux-s390, Catalin Marinas,
	Linux Kernel Mailing List, Peter Xu, openrisc, Linux ARM,
	Andrew Morton, Linus Torvalds, Alexander Gordeev,
	Gerald Schaefer

On Wed, Jun 17, 2020 at 10:55:14AM +1000, Michael Ellerman wrote:
> Linus Torvalds <torvalds@linux-foundation.org> writes:
> > On Mon, Jun 15, 2020 at 3:16 PM Peter Xu <peterx@redhat.com> wrote:
> >> This series tries to address all of them by introducing mm_fault_accounting()
> >> first, so that we move all the page fault accounting into the common code base,
> >> then call it properly from arch pf handlers just like handle_mm_fault().
> >
> > Hmm.
> >
> > So having looked at this a bit more, I'd actually like to go even
> > further, and just get rid of the per-architecture code _entirely_.
> 
> <snip>
> 
> > One detail worth noting: I do wonder if we should put the
> >
> >     perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);
> >
> > just in the arch code at the top of the fault handling, and consider
> > it entirely unrelated to the major/minor fault handling. The
> > major/minor faults fundamnetally are about successes. But the plain
> > PERF_COUNT_SW_PAGE_FAULTS could be about things that fail, including
> > things that never even get to this point at all.
> 
> Yeah I think we should keep it in the arch code at roughly the top.

I agree. It's a nice idea to consolidate the code, but I don't see that
it's really possible for PERF_COUNT_SW_PAGE_FAULTS without significantly
changing the semantics (and a potentially less useful way. Of course,
moving more of do_page_fault() out of the arch code would be great, but
that's a much bigger effort.

> If it's moved to the end you could have a process spinning taking bad
> page faults (and fixing them up), and see no sign of it from the perf
> page fault counters.

The current arm64 behaviour is that we record PERF_COUNT_SW_PAGE_FAULTS
if _all_ of the following are true:

  1. The fault isn't handled by kprobes
  2. The pagefault handler is enabled
  3. We have an mm (current->mm)
  4. The fault isn't an unexpected kernel fault on a user address (we oops
     and kill the task in this case)

Which loosely corresponds to "we took a fault on a user address that it
looks like we can handle".

That said, I'm happy to tweak this if it brings us into line with other
architectures.

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 00/25] mm: Page fault accounting cleanups
  2020-06-17  8:04     ` Will Deacon
@ 2020-06-17 16:10       ` Peter Xu
  0 siblings, 0 replies; 9+ messages in thread
From: Peter Xu @ 2020-06-17 16:10 UTC (permalink / raw)
  To: Will Deacon
  Cc: Andrea Arcangeli, linux-arch, linux-s390, Michael Ellerman,
	Linux Kernel Mailing List, openrisc, Linux ARM, Catalin Marinas,
	Andrew Morton, Linus Torvalds, Alexander Gordeev,
	Gerald Schaefer

On Wed, Jun 17, 2020 at 09:04:06AM +0100, Will Deacon wrote:
> On Wed, Jun 17, 2020 at 10:55:14AM +1000, Michael Ellerman wrote:
> > Linus Torvalds <torvalds@linux-foundation.org> writes:
> > > On Mon, Jun 15, 2020 at 3:16 PM Peter Xu <peterx@redhat.com> wrote:
> > >> This series tries to address all of them by introducing mm_fault_accounting()
> > >> first, so that we move all the page fault accounting into the common code base,
> > >> then call it properly from arch pf handlers just like handle_mm_fault().
> > >
> > > Hmm.
> > >
> > > So having looked at this a bit more, I'd actually like to go even
> > > further, and just get rid of the per-architecture code _entirely_.
> > 
> > <snip>
> > 
> > > One detail worth noting: I do wonder if we should put the
> > >
> > >     perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);
> > >
> > > just in the arch code at the top of the fault handling, and consider
> > > it entirely unrelated to the major/minor fault handling. The
> > > major/minor faults fundamnetally are about successes. But the plain
> > > PERF_COUNT_SW_PAGE_FAULTS could be about things that fail, including
> > > things that never even get to this point at all.
> > 
> > Yeah I think we should keep it in the arch code at roughly the top.
> 
> I agree. It's a nice idea to consolidate the code, but I don't see that
> it's really possible for PERF_COUNT_SW_PAGE_FAULTS without significantly
> changing the semantics (and a potentially less useful way. Of course,
> moving more of do_page_fault() out of the arch code would be great, but
> that's a much bigger effort.
> 
> > If it's moved to the end you could have a process spinning taking bad
> > page faults (and fixing them up), and see no sign of it from the perf
> > page fault counters.
> 
> The current arm64 behaviour is that we record PERF_COUNT_SW_PAGE_FAULTS
> if _all_ of the following are true:
> 
>   1. The fault isn't handled by kprobes
>   2. The pagefault handler is enabled
>   3. We have an mm (current->mm)
>   4. The fault isn't an unexpected kernel fault on a user address (we oops
>      and kill the task in this case)
> 
> Which loosely corresponds to "we took a fault on a user address that it
> looks like we can handle".
> 
> That said, I'm happy to tweak this if it brings us into line with other
> architectures.

I see.  I'll keep the semantics for PERF_COUNT_SW_PAGE_FAULTS in the next
version.  Thanks for all the comments!

-- 
Peter Xu


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-06-17 16:11 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20200615221607.7764-1-peterx@redhat.com>
2020-06-15 22:15 ` [PATCH 05/25] mm/arm: Use mm_fault_accounting() Peter Xu
2020-06-15 22:15 ` [PATCH 06/25] mm/arm64: " Peter Xu
2020-06-16  7:43   ` Will Deacon
2020-06-16 15:59     ` Peter Xu
2020-06-16 18:55 ` [PATCH 00/25] mm: Page fault accounting cleanups Linus Torvalds
2020-06-16 21:03   ` Peter Xu
2020-06-17  0:55   ` Michael Ellerman
2020-06-17  8:04     ` Will Deacon
2020-06-17 16:10       ` Peter Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).