From: Heiko Carstens <hca@linux.ibm.com>
To: Peter Xu <peterx@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Ingo Molnar <mingo@redhat.com>, Vasily Gorbik <gor@linux.ibm.com>,
	linuxppc-dev@lists.ozlabs.org, Stafford Horne <shorne@gmail.com>,
	"David S . Miller" <davem@davemloft.net>,
	Johannes Berg <johannes@sipsolutions.net>,
	Brian Cain <bcain@quicinc.com>,
	x86@kernel.org, linux-parisc@vger.kernel.org,
	Richard Henderson <rth@twiddle.net>,
	Christian Borntraeger <borntraeger@linux.ibm.com>,
	Richard Weinberger <richard@nod.at>,
	linux-hexagon@vger.kernel.org,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Janosch Frank <frankja@linux.ibm.com>,
	Albert Ou <aou@eecs.berkeley.edu>,
	Anton Ivanov <anton.ivanov@cambridgegreys.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	linux-ia64@vger.kernel.org, Borislav Petkov <bp@alien8.de>,
	Russell King <linux@armlinux.org.uk>,
	Sven Schnelle <svens@linux.ibm.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	"James E . J . Bottomley" <James.Bottomley@hansenpartnership.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Alexander Gordeev <agordeev@linux.ibm.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Alistair Popple <apopple@nvidia.com>,
	Jonas Bonn <jonas@southpole.se>,
	sparclinux@vger.kernel.org, linux-csky@vger.kernel.org,
	Will Deacon <will@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	linux-um@lists.infradead.org, Michal Simek <monstr@monstr.eu>,
	Matt Turner <mattst88@gmail.com>,
	linux-m68k@lists.linux-m68k.org,
	Paul Mackerras <paulus@samba.org>,
	linux-xtensa@linux-xtensa.org,
	Geert Uytterhoeven <geert@linux-m68k.org>,
	David Hildenbrand <david@redhat.com>,
	openrisc@lists.librecores.org,
	linux-arm-kernel@lists.infradead.org,
	Nicholas Piggin <npiggin@gmail.com>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>,
	Paul Walmsley <paul.walmsley@sifive.com>,
	Ivan Kokshaysky <ink@jurassic.park.msu.ru>,
	Chris Zankel <chris@zankel.net>, Hugh Dickins <hughd@google.com>,
	Vineet Gupta <vgupta@kernel.org>,
	Dinh Nguyen <dinguyen@kernel.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Rich Felker <dalias@libc.org>, "H . Peter Anvin" <hpa@zytor.com>,
	linux-s390@vger.kernel.org, linux-sh@vger.kernel.org,
	linux-riscv@lists.infradead.org,
	Johannes Weiner <hannes@cmpxchg.org>,
	Andy Lutomirski <luto@kernel.org>,
	Max Filippov <jcmvbkbc@gmail.com>, Guo Ren <guoren@kernel.org>,
	Thomas Bogendoerfer <tsbogend@alpha.franken.de>,
	linux-snps-arc@lists.infradead.org, linux-alpha@vger.kernel.org,
	linux-mips@vger.kernel.org, Helge Deller <deller@gmx.de>,
	Yoshinori Sato <ysato@users.sourceforge.jp>,
	Ingo Molnar <mingo@kernel.org>
Subject: Re: [PATCH v5] mm: Avoid unnecessary page fault retries on shared memory types
Date: Tue, 31 May 2022 09:59:37 +0200	[thread overview]
Message-ID: <YpXK6VO8y6ZQlimG@osiris> (raw)
In-Reply-To: <20220530183450.42886-1-peterx@redhat.com>

On Mon, May 30, 2022 at 02:34:50PM -0400, Peter Xu wrote:
> I observed that for shared file-backed page faults, we are very likely to
> retry one extra time for the first write fault upon a missing page.  That
> is because we need to release the mmap lock for dirty rate limiting in
> balance_dirty_pages_ratelimited() (called from fault_dirty_shared_page()).
> 
> Then after that throttling we return VM_FAULT_RETRY.
> 
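For context, here is a heavily abridged sketch of that pre-patch path, loosely
based on fault_dirty_shared_page() in mm/memory.c.  The helper names are real,
but the body is simplified and should be treated as approximate rather than a
verbatim copy:

/*
 * Abridged sketch of fault_dirty_shared_page() -- see mm/memory.c for the
 * real thing.  By the time this runs, finish_fault() has already installed
 * the pte for the faulting address.
 */
static vm_fault_t fault_dirty_shared_page(struct vm_fault *vmf)
{
	struct address_space *mapping = vmf->vma->vm_file->f_mapping;
	struct file *fpin;

	/* ... mark the page dirty, update file times, etc. ... */

	/* Drop the mmap lock so that dirty throttling may sleep. */
	fpin = maybe_unlock_mmap_for_io(vmf, NULL);
	balance_dirty_pages_ratelimited(mapping);
	if (fpin) {
		fput(fpin);
		/* Pre-patch, a retry was the only way to say "lock dropped". */
		return VM_FAULT_RETRY;
	}
	return 0;
}
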
> We did that probably because VM_FAULT_RETRY is the only way we can return
> to the fault handler at that time telling it we've released the mmap lock.
> 
> However, that's not ideal because the fault very likely does not need to
> be retried at all: the pgtable was already installed before the throttling,
> so the follow-up fault (taking the mmap read lock, walking the pgtable,
> etc.) is in most cases unnecessary.
> 
> This not only slows down page faults for shared file-backed mappings, but
> also adds mmap lock contention which is in most cases not needed at all.
> 
> To observe this, one could write to some shmem page and look at the
> "pgfault" value in /proc/vmstat; we should then see 2 counts for each
> shmem write simply because we retried, and the "pgfault" vm event will
> capture that.
> 
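A minimal userspace sketch of that observation (an illustration, not taken
from the patch): map a shmem-backed page, write to it once, and compare the
global "pgfault" counter before and after.  Other activity also bumps the
counter, so on an otherwise idle system expect a delta of roughly 2 pre-patch
versus roughly 1 post-patch.

#include <stdio.h>
#include <sys/mman.h>

/* Read the global "pgfault" vm event counter from /proc/vmstat. */
static unsigned long read_pgfault(void)
{
	char line[128];
	unsigned long val = 0;
	FILE *f = fopen("/proc/vmstat", "r");

	while (f && fgets(line, sizeof(line), f))
		if (sscanf(line, "pgfault %lu", &val) == 1)
			break;
	if (f)
		fclose(f);
	return val;
}

int main(void)
{
	unsigned long before, after;
	/* MAP_SHARED | MAP_ANONYMOUS memory is backed by shmem. */
	char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
		       MAP_SHARED | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return 1;
	read_pgfault();			/* warm up stdio buffers first */
	before = read_pgfault();
	p[0] = 1;			/* first write fault on the shmem page */
	after = read_pgfault();
	printf("pgfault delta: %lu\n", after - before);
	return 0;
}
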
> To make it more efficient, add a new VM_FAULT_COMPLETED return code just to
> show that we've completed the whole fault and released the lock.  It's also
> a hint that we should very possibly not need another fault immediately on
> this page because we've just completed it.
> 
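The per-architecture handling then becomes a short early-out, roughly the
pattern below.  This is an illustrative sketch of the common case rather than
a hunk from the patch; the s390 version quoted further down additionally needs
gmap handling:

	fault = handle_mm_fault(vma, address, flags, regs);

	/*
	 * The core mm has completed the fault and already released the
	 * mmap lock, so there is nothing left to do here: no unlock and
	 * no retry.
	 */
	if (fault & VM_FAULT_COMPLETED)
		return;

	/* ... the usual VM_FAULT_ERROR / VM_FAULT_RETRY handling follows ... */
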
> This patch provides a ~12% perf boost on my aarch64 test VM, running a
> simple program that sequentially dirties a 400MB mmap()ed shmem file;
> these are the times it needs:
> 
>   Before: 650.980 ms (+-1.94%)
>   After:  569.396 ms (+-1.38%)
> 
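A sketch of the kind of microbenchmark those numbers describe (a
reconstruction for illustration, not the author's actual test program):
sequentially dirty a 400MB shmem-backed mapping and print the elapsed time.

#include <stdio.h>
#include <sys/mman.h>
#include <time.h>

#define MAP_SIZE	(400UL << 20)	/* 400MB, matching the numbers above */
#define PAGE_SZ		4096UL

int main(void)
{
	struct timespec t0, t1;
	unsigned long off;
	char *buf = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
			 MAP_SHARED | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED)
		return 1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (off = 0; off < MAP_SIZE; off += PAGE_SZ)
		buf[off] = 1;	/* each first write takes a shared write fault */
	clock_gettime(CLOCK_MONOTONIC, &t1);

	printf("%.3f ms\n", (t1.tv_sec - t0.tv_sec) * 1e3 +
			    (t1.tv_nsec - t0.tv_nsec) / 1e6);
	return 0;
}
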
> I believe it could help more than that.
> 
> We need some special care in GUP and in the s390 pgfault handler (for the
> gmap code before returning from the fault); the rest of the changes in the
> page fault handlers should be relatively straightforward.
> 
> Another thing to mention is that mm_account_fault() accounts this new
> fault as a generic fault, unlike VM_FAULT_RETRY.
> 
> I explicitly didn't touch hmm_vma_fault() and break_ksm() because they do
> not handle VM_FAULT_RETRY even with existing code, so I'm literally keeping
> them as-is.
> 
> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> Acked-by: Vineet Gupta <vgupta@kernel.org>
> Acked-by: Guo Ren <guoren@kernel.org>
> Acked-by: Max Filippov <jcmvbkbc@gmail.com>
> Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
> Reviewed-by: Alistair Popple <apopple@nvidia.com>
> Reviewed-by: Ingo Molnar <mingo@kernel.org>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
...
>  arch/s390/mm/fault.c          | 12 ++++++++++++
> diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
> index e173b6187ad5..973dcd05c293 100644
> --- a/arch/s390/mm/fault.c
> +++ b/arch/s390/mm/fault.c
> @@ -433,6 +433,17 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
>  			goto out_up;
>  		goto out;
>  	}
> +
> +	/* The fault is fully completed (including releasing mmap lock) */
> +	if (fault & VM_FAULT_COMPLETED) {
> +		if (gmap) {
> +			mmap_read_lock(mm);
> +			goto out_gmap;
> +		}
> +		fault = 0;
> +		goto out;
> +	}
> +
>  	if (unlikely(fault & VM_FAULT_ERROR))
>  		goto out_up;
>  
> @@ -452,6 +463,7 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
>  		mmap_read_lock(mm);
>  		goto retry;
>  	}
> +out_gmap:
>  	if (IS_ENABLED(CONFIG_PGSTE) && gmap) {
>  		address =  __gmap_link(gmap, current->thread.gmap_addr,
>  				       address);

FWIW:
Acked-by: Heiko Carstens <hca@linux.ibm.com>

Thread overview: 22+ messages
2022-05-30 18:34 [PATCH v5] mm: Avoid unnecessary page fault retries on shared memory types Peter Xu
2022-05-30 22:26 ` Russell King (Oracle)
2022-05-31  7:59 ` Heiko Carstens [this message]
