linux-kernel.vger.kernel.org archive mirror
* Re: [PATCH v2 00/11] Introduces new count-based method for monitoring lockless pagetable wakls
@ 2019-09-24 20:31 Leonardo Bras
  0 siblings, 0 replies; 5+ messages in thread
From: Leonardo Bras @ 2019-09-24 20:31 UTC
  To: jhubbard, linuxppc-dev, linux-kernel, linux-mm
  Cc: benh, paulus, mpe, arnd, aneesh.kumar, christophe.leroy, akpm,
	dan.j.williams, npiggin, mahesh, tglx, rfontana, ganeshgr,
	allison, gregkh, rppt, yuehaibing, ira.weiny, jgg, keith.busch

John Hubbard <jhubbard@nvidia.com> writes:
> Also, which tree do these patches apply to, please? 

I will send a v3 that applies directly on top of v5.3, and make sure to
include the linux-mm mailing list.

Thanks!


* Re: [PATCH v2 00/11] Introduces new count-based method for monitoring lockless pagetable wakls
       [not found]   ` <b2d47f4b-2bf2-20a8-2438-4fd3f9b08a63@nvidia.com>
@ 2019-09-23 20:58     ` Leonardo Bras
  0 siblings, 0 replies; 5+ messages in thread
From: Leonardo Bras @ 2019-09-23 20:58 UTC
  To: John Hubbard, linuxppc-dev, linux-kernel, Linux-MM
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Arnd Bergmann, Aneesh Kumar K.V, Christophe Leroy, Andrew Morton,
	Dan Williams, Nicholas Piggin, Mahesh Salgaonkar,
	Thomas Gleixner, Richard Fontana, Ganesh Goudar, Allison Randal,
	Greg Kroah-Hartman, Mike Rapoport, YueHaibing, Ira Weiny,
	Jason Gunthorpe, Keith Busch

On Mon, 2019-09-23 at 13:51 -0700, John Hubbard wrote:
> Also, which tree do these patches apply to, please? 
> 
> thanks,

They should apply on top of v5.3 + one patch: 
https://patchwork.ozlabs.org/patch/1164925/

I was working on top of that patch because I thought it would be
merged quickly. But since it got no feedback, it was not merged, and
the present patchset ended up broken. :(

But I will rebase v3 on top of plain v5.3.

Thanks,
Leonardo Bras





* Re: [PATCH v2 00/11] Introduces new count-based method for monitoring lockless pagetable wakls
  2019-09-20 19:50 Leonardo Bras
  2019-09-20 19:56 ` Leonardo Bras
@ 2019-09-20 20:12 ` Leonardo Bras
       [not found]   ` <b2d47f4b-2bf2-20a8-2438-4fd3f9b08a63@nvidia.com>
  1 sibling, 1 reply; 5+ messages in thread
From: Leonardo Bras @ 2019-09-20 20:12 UTC
  To: linuxppc-dev, linux-kernel
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Arnd Bergmann, Aneesh Kumar K.V, Christophe Leroy, Andrew Morton,
	Dan Williams, Nicholas Piggin, Mahesh Salgaonkar,
	Thomas Gleixner, Richard Fontana, Ganesh Goudar, Allison Randal,
	Greg Kroah-Hartman, Mike Rapoport, YueHaibing, Ira Weiny,
	Jason Gunthorpe, John Hubbard, Keith Busch

If a process (qemu) with a lot of CPUs (128) tries to munmap() a large
chunk of memory (496GB) mapped with THP, it takes an average of 275
seconds, which can cause a lot of problems for the workload (in the
qemu case, the guest will lock up for this time).

Trying to find the source of this bug, I found that most of this time
is spent in serialize_against_pte_lookup(). This function takes a lot
of time in smp_call_function_many() if more than a couple of CPUs are
running the user process. Since it has to happen for every THP mapped,
it takes a very long time for large amounts of memory.

According to the docs, serialize_against_pte_lookup() is needed to
prevent the pmd_t to pte_t casting inside find_current_mm_pte(), or any
other lockless pagetable walk, from happening concurrently with THP
splitting/collapsing.

It does so by calling do_nothing() on each CPU in mm->cpu_bitmap[],
which can only run there after interrupts are re-enabled.
Since interrupts are (usually) disabled during a lockless pagetable
walk, and serialize_against_pte_lookup() only returns after
interrupts are re-enabled, the walk is protected.

So, as far as I understand, if there is no lockless pagetable walk
running, there is no need to call serialize_against_pte_lookup().

To avoid the cost of running serialize_against_pte_lookup(), I
propose a counter that keeps track of how many find_current_mm_pte()
calls are currently running; if there are none, just skip
smp_call_function_many().

The related functions are:
start_lockless_pgtbl_walk(mm)
	Insert before starting any lockless pgtable walk
end_lockless_pgtbl_walk(mm)
	Insert after the end of any lockless pgtable walk
	(Mostly after the ptep is last used)
running_lockless_pgtbl_walk(mm)
	Returns the number of lockless pgtable walks running
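
The three helpers above can be illustrated with a minimal userspace
sketch. It is not the code from this series: struct mm_sketch, the
ipi_broadcasts field and serialize_against_pte_lookup_sketch() are
hypothetical names used only for illustration, and C11 atomics stand in
for the kernel's atomic_t. The sketch shows the counting idea only and
assumes the default sequentially consistent ordering; it does not address
the memory-ordering questions a real kernel implementation would need to
consider.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-mm state that holds the counter. */
struct mm_sketch {
	atomic_int lockless_pgtbl_walk_count; /* walks currently in flight */
	int ipi_broadcasts; /* simulated smp_call_function_many() calls */
};

/* Call before starting any lockless pgtable walk. */
void start_lockless_pgtbl_walk(struct mm_sketch *mm)
{
	atomic_fetch_add(&mm->lockless_pgtbl_walk_count, 1);
}

/* Call after the walk ends, i.e. after the ptep is last used. */
void end_lockless_pgtbl_walk(struct mm_sketch *mm)
{
	int prev = atomic_fetch_sub(&mm->lockless_pgtbl_walk_count, 1);

	/* Every end must pair with an earlier start. */
	assert(prev > 0);
	(void)prev;
}

/* Returns the number of lockless pgtable walks currently running. */
int running_lockless_pgtbl_walk(struct mm_sketch *mm)
{
	return atomic_load(&mm->lockless_pgtbl_walk_count);
}

/*
 * Sketch of the proposed fast path: skip the (simulated) cross-CPU
 * broadcast when no walk is in flight. Returns true if the broadcast
 * was issued, false if it was skipped.
 */
bool serialize_against_pte_lookup_sketch(struct mm_sketch *mm)
{
	if (running_lockless_pgtbl_walk(mm) == 0)
		return false;

	mm->ipi_broadcasts++; /* stands in for smp_call_function_many() */
	return true;
}
```

With a zero-initialized struct mm_sketch, the serialize sketch skips the
broadcast until start_lockless_pgtbl_walk() has been called, and skips it
again once the matching end_lockless_pgtbl_walk() has run.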


On my workload (qemu), munmap() time dropped from 275 seconds to
418 ms.

> Leonardo Bras (11):
>   powerpc/mm: Adds counting method to monitor lockless pgtable walks
>   asm-generic/pgtable: Adds dummy functions to monitor lockless pgtable
>     walks
>   mm/gup: Applies counting method to monitor gup_pgd_range
>   powerpc/mce_power: Applies counting method to monitor lockless pgtbl
>     walks
>   powerpc/perf: Applies counting method to monitor lockless pgtbl walks
>   powerpc/mm/book3s64/hash: Applies counting method to monitor lockless
>     pgtbl walks
>   powerpc/kvm/e500: Applies counting method to monitor lockless pgtbl
>     walks
>   powerpc/kvm/book3s_hv: Applies counting method to monitor lockless
>     pgtbl walks
>   powerpc/kvm/book3s_64: Applies counting method to monitor lockless
>     pgtbl walks
>   powerpc/book3s_64: Enables counting method to monitor lockless pgtbl
>     walk
>   powerpc/mm/book3s64/pgtable: Uses counting method to skip serializing
> 
>  arch/powerpc/include/asm/book3s/64/mmu.h     |  3 +++
>  arch/powerpc/include/asm/book3s/64/pgtable.h |  5 +++++
>  arch/powerpc/kernel/mce_power.c              | 13 ++++++++++---
>  arch/powerpc/kvm/book3s_64_mmu_hv.c          |  2 ++
>  arch/powerpc/kvm/book3s_64_mmu_radix.c       | 20 ++++++++++++++++++--
>  arch/powerpc/kvm/book3s_64_vio_hv.c          |  4 ++++
>  arch/powerpc/kvm/book3s_hv_nested.c          |  8 ++++++++
>  arch/powerpc/kvm/book3s_hv_rm_mmu.c          |  9 ++++++++-
>  arch/powerpc/kvm/e500_mmu_host.c             |  4 ++++
>  arch/powerpc/mm/book3s64/hash_tlb.c          |  2 ++
>  arch/powerpc/mm/book3s64/hash_utils.c        |  7 +++++++
>  arch/powerpc/mm/book3s64/mmu_context.c       |  1 +
>  arch/powerpc/mm/book3s64/pgtable.c           | 20 +++++++++++++++++++-
>  arch/powerpc/perf/callchain.c                |  5 ++++-
>  include/asm-generic/pgtable.h                |  9 +++++++++
>  mm/gup.c                                     |  4 ++++
>  16 files changed, 108 insertions(+), 8 deletions(-)
> 


* Re: [PATCH v2 00/11] Introduces new count-based method for monitoring lockless pagetable wakls
  2019-09-20 19:50 Leonardo Bras
@ 2019-09-20 19:56 ` Leonardo Bras
  2019-09-20 20:12 ` Leonardo Bras
  1 sibling, 0 replies; 5+ messages in thread
From: Leonardo Bras @ 2019-09-20 19:56 UTC
  To: linuxppc-dev, linux-kernel
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Arnd Bergmann, Aneesh Kumar K.V, Christophe Leroy, Andrew Morton,
	Dan Williams, Nicholas Piggin, Mahesh Salgaonkar,
	Thomas Gleixner, Richard Fontana, Ganesh Goudar, Allison Randal,
	Greg Kroah-Hartman, Mike Rapoport, YueHaibing, Ira Weiny,
	Jason Gunthorpe, John Hubbard, Keith Busch

On Fri, 2019-09-20 at 16:50 -0300, Leonardo Bras wrote:
> *** BLURB HERE ***

Sorry, something went terribly wrong with my cover letter.
I will try to find it and send it here, or rewrite it.

Best regards,


* [PATCH v2 00/11] Introduces new count-based method for monitoring lockless pagetable wakls
@ 2019-09-20 19:50 Leonardo Bras
  2019-09-20 19:56 ` Leonardo Bras
  2019-09-20 20:12 ` Leonardo Bras
  0 siblings, 2 replies; 5+ messages in thread
From: Leonardo Bras @ 2019-09-20 19:50 UTC
  To: linuxppc-dev, linux-kernel
  Cc: Leonardo Bras, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Arnd Bergmann, Aneesh Kumar K.V,
	Christophe Leroy, Andrew Morton, Dan Williams, Nicholas Piggin,
	Mahesh Salgaonkar, Thomas Gleixner, Richard Fontana,
	Ganesh Goudar, Allison Randal, Greg Kroah-Hartman, Mike Rapoport,
	YueHaibing, Ira Weiny, Jason Gunthorpe, John Hubbard,
	Keith Busch

*** BLURB HERE ***

Leonardo Bras (11):
  powerpc/mm: Adds counting method to monitor lockless pgtable walks
  asm-generic/pgtable: Adds dummy functions to monitor lockless pgtable
    walks
  mm/gup: Applies counting method to monitor gup_pgd_range
  powerpc/mce_power: Applies counting method to monitor lockless pgtbl
    walks
  powerpc/perf: Applies counting method to monitor lockless pgtbl walks
  powerpc/mm/book3s64/hash: Applies counting method to monitor lockless
    pgtbl walks
  powerpc/kvm/e500: Applies counting method to monitor lockless pgtbl
    walks
  powerpc/kvm/book3s_hv: Applies counting method to monitor lockless
    pgtbl walks
  powerpc/kvm/book3s_64: Applies counting method to monitor lockless
    pgtbl walks
  powerpc/book3s_64: Enables counting method to monitor lockless pgtbl
    walk
  powerpc/mm/book3s64/pgtable: Uses counting method to skip serializing

 arch/powerpc/include/asm/book3s/64/mmu.h     |  3 +++
 arch/powerpc/include/asm/book3s/64/pgtable.h |  5 +++++
 arch/powerpc/kernel/mce_power.c              | 13 ++++++++++---
 arch/powerpc/kvm/book3s_64_mmu_hv.c          |  2 ++
 arch/powerpc/kvm/book3s_64_mmu_radix.c       | 20 ++++++++++++++++++--
 arch/powerpc/kvm/book3s_64_vio_hv.c          |  4 ++++
 arch/powerpc/kvm/book3s_hv_nested.c          |  8 ++++++++
 arch/powerpc/kvm/book3s_hv_rm_mmu.c          |  9 ++++++++-
 arch/powerpc/kvm/e500_mmu_host.c             |  4 ++++
 arch/powerpc/mm/book3s64/hash_tlb.c          |  2 ++
 arch/powerpc/mm/book3s64/hash_utils.c        |  7 +++++++
 arch/powerpc/mm/book3s64/mmu_context.c       |  1 +
 arch/powerpc/mm/book3s64/pgtable.c           | 20 +++++++++++++++++++-
 arch/powerpc/perf/callchain.c                |  5 ++++-
 include/asm-generic/pgtable.h                |  9 +++++++++
 mm/gup.c                                     |  4 ++++
 16 files changed, 108 insertions(+), 8 deletions(-)

-- 
2.20.1

