linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH v5 00/11] Introduces new count-based method for tracking lockless pagetable walks
       [not found] ` <20191003072952.GN4536@hirez.programming.kicks-ass.net>
@ 2019-10-03 20:36   ` Leonardo Bras
       [not found]     ` <99754d82-33c5-a54f-8607-b6bf151069d4@nvidia.com>
  2019-10-04 11:42     ` Peter Zijlstra
  0 siblings, 2 replies; 8+ messages in thread
From: Leonardo Bras @ 2019-10-03 20:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Song Liu, Michal Hocko, Dmitry V. Levin, Keith Busch, linux-mm,
	Paul Mackerras, Christoph Lameter, Ira Weiny, Dan Williams,
	Elena Reshetova, linux-arch, Santosh Sivaraj, Davidlohr Bueso,
	Aneesh Kumar K.V, Bartlomiej Zolnierkiewicz, Mike Rapoport,
	Jason Gunthorpe, Vlastimil Babka, Mahesh Salgaonkar,
	Andrey Ryabinin, Alexey Dobriyan, Ingo Molnar, Andrea Arcangeli,
	Ralph Campbell, Arnd Bergmann, Jann Horn, John Hubbard,
	Jesper Dangaard Brouer, Nicholas Piggin, Jérôme Glisse,
	Mathieu Desnoyers, kvm-ppc, Thomas Gleixner, Reza Arbab,
	Allison Randal, Christian Brauner, Greg Kroah-Hartman,
	linux-kernel, Logan Gunthorpe, Souptick Joarder, Andrew Morton,
	linuxppc-dev, Roman Gushchin, Kirill A. Shutemov, Al Viro

[-- Attachment #1: Type: text/plain, Size: 4020 bytes --]

Hello Peter, thanks for the feedback!

On Thu, 2019-10-03 at 09:29 +0200, Peter Zijlstra wrote:
> On Wed, Oct 02, 2019 at 10:33:14PM -0300, Leonardo Bras wrote:
> > If a process (qemu) with a lot of CPUs (128) try to munmap() a large
> > chunk of memory (496GB) mapped with THP, it takes an average of 275
> > seconds, which can cause a lot of problems to the load (in qemu case,
> > the guest will lock for this time).
> > 
> > Trying to find the source of this bug, I found out most of this time is
> > spent on serialize_against_pte_lookup(). This function will take a lot
> > of time in smp_call_function_many() if there is more than a couple CPUs
> > running the user process. Since it has to happen to all THP mapped, it
> > will take a very long time for large amounts of memory.
> > 
> > By the docs, serialize_against_pte_lookup() is needed in order to avoid
> > pmd_t to pte_t casting inside find_current_mm_pte(), or any lockless
> > pagetable walk, to happen concurrently with THP splitting/collapsing.
> > 
> > It does so by calling a do_nothing() on each CPU in mm->cpu_bitmap[],
> > after interrupts are re-enabled.
> > Since, interrupts are (usually) disabled during lockless pagetable
> > walk, and serialize_against_pte_lookup will only return after
> > interrupts are enabled, it is protected.
> 
> This is something entirely specific to Power, you shouldn't be touching
> generic code at all.

Up to v4, I was declaring dummy functions so it would not mess up with
other archs: http://patchwork.ozlabs.org/patch/1168779/

But I was recommended to create a generic function that could guide the
way to archs: http://patchwork.ozlabs.org/patch/1168775/

The idea was to concentrate all routines of beginning/ending lockless
pagetable walks on these functions, and call them instead of
irq_disable/irq_enable.
Then it was easy to place the refcount-based tracking in these
functions. It should only be enabled in case the config chooses to do
so. 

> 
> Also, I'm not sure I understand things properly.
> 
> So serialize_against_pte_lookup() wants to wait for all currently
> out-standing __find_linux_pte() instances (which are very similar to
> gup_fast).
> 
> It seems to want to do this before flushing the THP TLB for some reason;
> why? Should not THP observe the normal page table freeing rules which
> includes a RCU-like grace period like this already.
> 
> Why is THP special here? This doesn't seem adequately explained.

"It's necessary to monitor lockless pagetable walks, in order to avoid
doing THP splitting/collapsing during them."

If a there is a THP split/collapse during the lockless pagetable walk,
the returned ptep can be a pointing to an invalid pte. 

To avoid that, the pmd is updated, then serialize_against_pte_lookup is
ran. Serialize runs a do_nothing in all cpu in cpu_mask. 

So, after all cpus finish running do_nothing(), there is a guarantee
that if there is any 'lockless pagetable walk' it is running on top of
a updated version of this pmd, and so, collapsing/splitting THP is
safe.

> 
> Also, specifically to munmap(), this seems entirely superfluous,
> munmap() uses the normal page-table freeing code and should be entirely
> fine without additional waiting.

To be honest, I remember it being needed in munmap case, but I really
don't remember the details. I will take a deeper look and come back
with this answer. 

> Furthermore, Power never accurately tracks mm_cpumask(), so using that
> makes the whole thing more expensive than it needs to be. Also, I
> suppose that is buggered vs file backed THP.

That accuracy of mm_cpumask is above my knowledge right now. =)

I agree that it's to expensive to do that. That's why I suggested this
method, that can check if there is any 'lockless pagetable walk'
running before trying to serialize. It reduced the waiting time a lot
for large amounts of memory. (more details on cover letter)

Best regards,

Leonardo Brás

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v5 10/11] mm/Kconfig: Adds config option to track lockless pagetable walks
       [not found]   ` <20191003074432.GO4536@hirez.programming.kicks-ass.net>
@ 2019-10-03 20:40     ` Leonardo Bras
  0 siblings, 0 replies; 8+ messages in thread
From: Leonardo Bras @ 2019-10-03 20:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Song Liu, Michal Hocko, Dmitry V. Levin, Keith Busch, linux-mm,
	Paul Mackerras, Christoph Lameter, Ira Weiny, Dan Williams,
	Elena Reshetova, linux-arch, Santosh Sivaraj, Davidlohr Bueso,
	Aneesh Kumar K.V, Bartlomiej Zolnierkiewicz, Mike Rapoport,
	Jason Gunthorpe, Vlastimil Babka, Mahesh Salgaonkar,
	Andrey Ryabinin, Alexey Dobriyan, Ingo Molnar, Andrea Arcangeli,
	Ralph Campbell, Arnd Bergmann, Jann Horn, John Hubbard,
	Jesper Dangaard Brouer, Nicholas Piggin, Jérôme Glisse,
	Mathieu Desnoyers, kvm-ppc, Thomas Gleixner, Reza Arbab,
	Allison Randal, Christian Brauner, Greg Kroah-Hartman,
	linux-kernel, Logan Gunthorpe, Souptick Joarder, Andrew Morton,
	linuxppc-dev, Roman Gushchin, Kirill A. Shutemov, Al Viro

[-- Attachment #1: Type: text/plain, Size: 226 bytes --]

On Thu, 2019-10-03 at 09:44 +0200, Peter Zijlstra wrote:
> This shouldn't be a user visible option at all. Either the arch needs
> it and selects it or not.

You are right. I will do that on v6.
Thanks for the feedback!

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v5 01/11] asm-generic/pgtable: Adds generic functions to monitor lockless pgtable walks
       [not found]     ` <20191003115141.GJ4581@hirez.programming.kicks-ass.net>
@ 2019-10-03 21:24       ` Leonardo Bras
  2019-10-04 11:28         ` Peter Zijlstra
  0 siblings, 1 reply; 8+ messages in thread
From: Leonardo Bras @ 2019-10-03 21:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Song Liu, Michal Hocko, Dmitry V. Levin, Keith Busch, linux-mm,
	Paul Mackerras, Christoph Lameter, Ira Weiny, Dan Williams,
	Elena Reshetova, linux-arch, Santosh Sivaraj, Davidlohr Bueso,
	Aneesh Kumar K.V, Bartlomiej Zolnierkiewicz, Mike Rapoport,
	Jason Gunthorpe, Vlastimil Babka, Mahesh Salgaonkar,
	Andrey Ryabinin, Alexey Dobriyan, Ingo Molnar, Andrea Arcangeli,
	Ralph Campbell, Arnd Bergmann, Jann Horn, John Hubbard,
	Jesper Dangaard Brouer, Nicholas Piggin, Jérôme Glisse,
	Mathieu Desnoyers, kvm-ppc, Thomas Gleixner, Reza Arbab,
	Allison Randal, Christian Brauner, Greg Kroah-Hartman,
	linux-kernel, Logan Gunthorpe, Souptick Joarder, Andrew Morton,
	linuxppc-dev, Roman Gushchin, Kirill A. Shutemov, Al Viro

[-- Attachment #1: Type: text/plain, Size: 3835 bytes --]

Hello Peter, thanks for the feedback!

On Thu, 2019-10-03 at 13:51 +0200, Peter Zijlstra wrote:
> On Thu, Oct 03, 2019 at 09:11:45AM +0200, Peter Zijlstra wrote:
> > On Wed, Oct 02, 2019 at 10:33:15PM -0300, Leonardo Bras wrote:
> > > diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
> > > index 818691846c90..3043ea9812d5 100644
> > > --- a/include/asm-generic/pgtable.h
> > > +++ b/include/asm-generic/pgtable.h
> > > @@ -1171,6 +1171,64 @@ static inline bool arch_has_pfn_modify_check(void)
> > >  #endif
> > >  #endif
> > >  
> > > +#ifndef __HAVE_ARCH_LOCKLESS_PGTBL_WALK_CONTROL
> > > +static inline unsigned long begin_lockless_pgtbl_walk(struct mm_struct *mm)
> > > +{
> > > +	unsigned long irq_mask;
> > > +
> > > +	if (IS_ENABLED(CONFIG_LOCKLESS_PAGE_TABLE_WALK_TRACKING))
> > > +		atomic_inc(&mm->lockless_pgtbl_walkers);
> > 
> > This will not work for file backed THP. Also, this is a fairly serious
> > contention point all on its own.
> 
> Kiryl says we have tmpfs-thp, this would be broken vs that, as would
> your (PowerPC) use of mm_cpumask() for that IPI.

Could you please explain it?
I mean, why this breaks tmpfs-thp?

Also, why mm_cpumask() is also broken?

> 
> > > +	/*
> > > +	 * Interrupts must be disabled during the lockless page table walk.
> > > +	 * That's because the deleting or splitting involves flushing TLBs,
> > > +	 * which in turn issues interrupts, that will block when disabled.
> > > +	 */
> > > +	local_irq_save(irq_mask);
> > > +
> > > +	/*
> > > +	 * This memory barrier pairs with any code that is either trying to
> > > +	 * delete page tables, or split huge pages. Without this barrier,
> > > +	 * the page tables could be read speculatively outside of interrupt
> > > +	 * disabling.
> > > +	 */
> > > +	smp_mb();
> > 
> > I don't think this is something smp_mb() can guarantee. smp_mb() is
> > defined to order memory accesses, in this case the store of the old
> > flags vs whatever comes after this.
> > 
> > It cannot (in generic) order against completion of prior instructions,
> > like clearing the interrupt enabled flags.
> > 
> > Possibly you want barrier_nospec().
> 
> I'm still really confused about this barrier. It just doesn't make
> sense.
> 
> If an interrupt happens before the local_irq_disable()/save(), then it
> will discard any and all speculation that would be in progress to handle
> the exception.
> 
> If there isn't an interrupt (or it happens after disable) it is
> irrelevant.
> 
> Specifically, that serialize-IPI thing wants to ensure in-progress
> lookups are complete, and I can't find a scenario where
> local_irq_disable/enable() needs additional help vs IPIs. The moment an
> interrupt lands it kills speculation and forces things into
> program-order.
> 
> Did you perhaps want something like:
> 
> 	if (IS_ENABLED(CONFIG_LOCKLESS_PAGE_TABLE_WALK_TRACKING)) {
> 		atomic_inc(&foo);
> 		smp_mb__after_atomic();
> 	}
> 
> 	...
> 
> 	if (IS_ENABLED(CONFIG_LOCKLESS_PAGE_TABLE_WALK_TRACKING)) {
> 		smp_mb__before_atomic();
> 		atomic_dec(&foo);
> 	}
> 
> To ensure everything happens inside of the increment?
> 

I need to rethink this barrier, but yes. I think that's it. 
It's how it was on v4.

I have changed it because I thought it would be better this way. Well,
it was probably a mistake of my part.

> And I still think all that wrong, you really shouldn't need to wait on
> munmap().

That is something I need to better understand. I mean, before coming
with this patch, I thought exactly this: not serialize when on munmap. 

But on the way I was convinced it would not work on munmap. I need to
recall why, and if it was false to assume this, re-think the whole
solution.

Best regards,

Leonardo Brás

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v5 00/11] Introduces new count-based method for tracking lockless pagetable walks
       [not found]     ` <99754d82-33c5-a54f-8607-b6bf151069d4@nvidia.com>
@ 2019-10-03 21:38       ` Leonardo Bras
  0 siblings, 0 replies; 8+ messages in thread
From: Leonardo Bras @ 2019-10-03 21:38 UTC (permalink / raw)
  To: John Hubbard, Peter Zijlstra
  Cc: Song Liu, Michal Hocko, Dmitry V. Levin, Keith Busch, linux-mm,
	Paul Mackerras, Christoph Lameter, Ira Weiny, Dan Williams,
	Elena Reshetova, linux-arch, Santosh Sivaraj, Davidlohr Bueso,
	Aneesh Kumar K.V, Bartlomiej Zolnierkiewicz, Mike Rapoport,
	Jason Gunthorpe, Vlastimil Babka, Mahesh Salgaonkar,
	Andrey Ryabinin, Alexey Dobriyan, Ingo Molnar, Andrea Arcangeli,
	Ralph Campbell, Arnd Bergmann, Jann Horn, Jesper Dangaard Brouer,
	Nicholas Piggin, Jérôme Glisse, Mathieu Desnoyers,
	kvm-ppc, Thomas Gleixner, Reza Arbab, Allison Randal,
	Christian Brauner, Greg Kroah-Hartman, linux-kernel,
	Logan Gunthorpe, Souptick Joarder, Andrew Morton, linuxppc-dev,
	Roman Gushchin, Kirill A. Shutemov, Al Viro

[-- Attachment #1: Type: text/plain, Size: 798 bytes --]

On Thu, 2019-10-03 at 13:49 -0700, John Hubbard wrote:
> Yes. And to clarify, I was assuming that the changes to mm/gup.c were 
> required in order to accomplish your goals. Given that assumption, I 
> wanted the generic code to be "proper", and that's what that feedback
> is about.

You assumed right. On my counting approach it's necessary count all
'lockless pagetable walks', including the ones in generic code.

And, I think even without the counting approach, it was a good way
focus this 'lockless pagetable walk' routine in a single place.

> 
> Although the other questions about file-backed THP
> make it sound like some rethinking across the board is required now.

Yeap, I need to better understand how the file-backed THP problem
works.

Thanks,

Leonardo Brás

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v5 01/11] asm-generic/pgtable: Adds generic functions to monitor lockless pgtable walks
  2019-10-03 21:24       ` [PATCH v5 01/11] asm-generic/pgtable: Adds generic functions to monitor lockless pgtable walks Leonardo Bras
@ 2019-10-04 11:28         ` Peter Zijlstra
  2019-10-09 18:09           ` Leonardo Bras
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2019-10-04 11:28 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Song Liu, Michal Hocko, Dmitry V. Levin, Keith Busch, linux-mm,
	Paul Mackerras, Christoph Lameter, Ira Weiny, Dan Williams,
	Elena Reshetova, linux-arch, Santosh Sivaraj, Davidlohr Bueso,
	Aneesh Kumar K.V, Bartlomiej Zolnierkiewicz, Mike Rapoport,
	Jason Gunthorpe, Vlastimil Babka, Mahesh Salgaonkar,
	Andrey Ryabinin, Alexey Dobriyan, Ingo Molnar, Andrea Arcangeli,
	Ralph Campbell, Arnd Bergmann, Jann Horn, John Hubbard,
	Jesper Dangaard Brouer, Nicholas Piggin, Jérôme Glisse,
	Mathieu Desnoyers, kvm-ppc, Thomas Gleixner, Reza Arbab,
	Allison Randal, Christian Brauner, Greg Kroah-Hartman,
	linux-kernel, Logan Gunthorpe, Souptick Joarder, Andrew Morton,
	linuxppc-dev, Roman Gushchin, Kirill A. Shutemov, Al Viro

On Thu, Oct 03, 2019 at 06:24:07PM -0300, Leonardo Bras wrote:
> Hello Peter, thanks for the feedback!
> 
> On Thu, 2019-10-03 at 13:51 +0200, Peter Zijlstra wrote:
> > On Thu, Oct 03, 2019 at 09:11:45AM +0200, Peter Zijlstra wrote:
> > > On Wed, Oct 02, 2019 at 10:33:15PM -0300, Leonardo Bras wrote:
> > > > diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
> > > > index 818691846c90..3043ea9812d5 100644
> > > > --- a/include/asm-generic/pgtable.h
> > > > +++ b/include/asm-generic/pgtable.h
> > > > @@ -1171,6 +1171,64 @@ static inline bool arch_has_pfn_modify_check(void)
> > > >  #endif
> > > >  #endif
> > > >  
> > > > +#ifndef __HAVE_ARCH_LOCKLESS_PGTBL_WALK_CONTROL
> > > > +static inline unsigned long begin_lockless_pgtbl_walk(struct mm_struct *mm)
> > > > +{
> > > > +	unsigned long irq_mask;
> > > > +
> > > > +	if (IS_ENABLED(CONFIG_LOCKLESS_PAGE_TABLE_WALK_TRACKING))
> > > > +		atomic_inc(&mm->lockless_pgtbl_walkers);
> > > 
> > > This will not work for file backed THP. Also, this is a fairly serious
> > > contention point all on its own.
> > 
> > Kiryl says we have tmpfs-thp, this would be broken vs that, as would
> > your (PowerPC) use of mm_cpumask() for that IPI.
> 
> Could you please explain it?
> I mean, why this breaks tmpfs-thp?
> Also, why mm_cpumask() is also broken?

Because shared pages are not bound by a mm; or does it not share the thp
state between mappings?

> > And I still think all that wrong, you really shouldn't need to wait on
> > munmap().
> 
> That is something I need to better understand. I mean, before coming
> with this patch, I thought exactly this: not serialize when on munmap. 
> 
> But on the way I was convinced it would not work on munmap. I need to
> recall why, and if it was false to assume this, re-think the whole
> solution.

And once you (re)figure it out, please write it down. It is a crucial
bit of the puzzle and needs to be part of the Changelogs.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v5 00/11] Introduces new count-based method for tracking lockless pagetable walks
  2019-10-03 20:36   ` [PATCH v5 00/11] Introduces new count-based method for tracking lockless pagetable walks Leonardo Bras
       [not found]     ` <99754d82-33c5-a54f-8607-b6bf151069d4@nvidia.com>
@ 2019-10-04 11:42     ` Peter Zijlstra
  2019-10-04 12:57       ` Peter Zijlstra
  1 sibling, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2019-10-04 11:42 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Song Liu, Michal Hocko, Dmitry V. Levin, Keith Busch, linux-mm,
	Paul Mackerras, Christoph Lameter, Ira Weiny, Dan Williams,
	Elena Reshetova, linux-arch, Santosh Sivaraj, Davidlohr Bueso,
	Aneesh Kumar K.V, Bartlomiej Zolnierkiewicz, Mike Rapoport,
	Jason Gunthorpe, Vlastimil Babka, Mahesh Salgaonkar,
	Andrey Ryabinin, Alexey Dobriyan, Ingo Molnar, Andrea Arcangeli,
	Ralph Campbell, Arnd Bergmann, Jann Horn, John Hubbard,
	Jesper Dangaard Brouer, Nicholas Piggin, Jérôme Glisse,
	Mathieu Desnoyers, kvm-ppc, Thomas Gleixner, Reza Arbab,
	Allison Randal, Christian Brauner, Greg Kroah-Hartman,
	linux-kernel, Logan Gunthorpe, Souptick Joarder, Andrew Morton,
	linuxppc-dev, Roman Gushchin, Kirill A. Shutemov, Al Viro

On Thu, Oct 03, 2019 at 05:36:31PM -0300, Leonardo Bras wrote:

> > Also, I'm not sure I understand things properly.
> > 
> > So serialize_against_pte_lookup() wants to wait for all currently
> > out-standing __find_linux_pte() instances (which are very similar to
> > gup_fast).
> > 
> > It seems to want to do this before flushing the THP TLB for some reason;
> > why? Should not THP observe the normal page table freeing rules which
> > includes a RCU-like grace period like this already.
> > 
> > Why is THP special here? This doesn't seem adequately explained.
> 
> "It's necessary to monitor lockless pagetable walks, in order to avoid
> doing THP splitting/collapsing during them."
> 
> If a there is a THP split/collapse during the lockless pagetable walk,
> the returned ptep can be a pointing to an invalid pte. 

So the whole premise of lockless page-table walks (gup_fast) is that it
can work on in-flux page-tables. Specifically gup_fast() never returns
PTEs, only struct page *, and only if it can increment the page
refcount.

In order to enable this, page-table pages are RCU(-like) freed, such
that even if we access page-tables that have (concurrently) been
unlinked, we'll not UaF (see asm-generic/tlb.h, the comment at
HAVE_RCU_TABLE_FREE). IOW, the worst case if not getting a struct page
*.

I really don't see how THP splitting/collapsing is special here, either
we see the PMD and find a struct page * or we see a PTE and find the
same struct page * (compound page head).

The only thing that needs to be guaranteed is that both PTEs and PMD
page-tables are valid. Is this not so?

> To avoid that, the pmd is updated, then serialize_against_pte_lookup is
> ran. Serialize runs a do_nothing in all cpu in cpu_mask. 
> 
> So, after all cpus finish running do_nothing(), there is a guarantee
> that if there is any 'lockless pagetable walk' it is running on top of
> a updated version of this pmd, and so, collapsing/splitting THP is
> safe.

But why would it matter?! It would find the same struct page * through
either version of the page-tables. *confused*

> > Also, specifically to munmap(), this seems entirely superfluous,
> > munmap() uses the normal page-table freeing code and should be entirely
> > fine without additional waiting.
> 
> To be honest, I remember it being needed in munmap case, but I really
> don't remember the details. I will take a deeper look and come back
> with this answer. 

munmap does normal mmu_gather page-table teardown, the THP PMD should be
RCU-like freed just like any other PMD. Which should be perfectly safe
vs lockless page-table walks.

If you can find anything there that isn't right, please explain that in
detail and we'll need to look hard at fixing _that_.

> > Furthermore, Power never accurately tracks mm_cpumask(), so using that
> > makes the whole thing more expensive than it needs to be. Also, I
> > suppose that is buggered vs file backed THP.
> 
> That accuracy of mm_cpumask is above my knowledge right now. =)

Basically PowerPC only ever sets bits in there, unlike x86 that also
clears bits (at expense, but it's worth it for us).


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v5 00/11] Introduces new count-based method for tracking lockless pagetable walks
  2019-10-04 11:42     ` Peter Zijlstra
@ 2019-10-04 12:57       ` Peter Zijlstra
  0 siblings, 0 replies; 8+ messages in thread
From: Peter Zijlstra @ 2019-10-04 12:57 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Song Liu, Michal Hocko, Dmitry V. Levin, Keith Busch, linux-mm,
	Paul Mackerras, Christoph Lameter, Ira Weiny, Dan Williams,
	Elena Reshetova, linux-arch, Santosh Sivaraj, Davidlohr Bueso,
	Aneesh Kumar K.V, Bartlomiej Zolnierkiewicz, Mike Rapoport,
	Jason Gunthorpe, Vlastimil Babka, Mahesh Salgaonkar,
	Andrey Ryabinin, Alexey Dobriyan, Ingo Molnar, Andrea Arcangeli,
	Ralph Campbell, Arnd Bergmann, Jann Horn, John Hubbard,
	Jesper Dangaard Brouer, Nicholas Piggin, Jérôme Glisse,
	Mathieu Desnoyers, kvm-ppc, Thomas Gleixner, Reza Arbab,
	Allison Randal, Christian Brauner, Greg Kroah-Hartman,
	linux-kernel, Logan Gunthorpe, Souptick Joarder, Andrew Morton,
	linuxppc-dev, Roman Gushchin, Kirill A. Shutemov, Al Viro

On Fri, Oct 04, 2019 at 01:42:36PM +0200, Peter Zijlstra wrote:
> If you can find anything there that isn't right, please explain that in
> detail and we'll need to look hard at fixing _that_.

Also, I can't imagine Nick is happy with 128 CPUs banging on that atomic
counter, esp. since atomics are horrifically slow on Power.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v5 01/11] asm-generic/pgtable: Adds generic functions to monitor lockless pgtable walks
  2019-10-04 11:28         ` Peter Zijlstra
@ 2019-10-09 18:09           ` Leonardo Bras
  0 siblings, 0 replies; 8+ messages in thread
From: Leonardo Bras @ 2019-10-09 18:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Song Liu, Michal Hocko, Dmitry V. Levin, Keith Busch, linux-mm,
	Paul Mackerras, Christoph Lameter, Ira Weiny, Dan Williams,
	Elena Reshetova, linux-arch, Santosh Sivaraj, Davidlohr Bueso,
	Aneesh Kumar K.V, Bartlomiej Zolnierkiewicz, Mike Rapoport,
	Jason Gunthorpe, Vlastimil Babka, Mahesh Salgaonkar,
	Andrey Ryabinin, Alexey Dobriyan, Ingo Molnar, Andrea Arcangeli,
	Ralph Campbell, Arnd Bergmann, Jann Horn, John Hubbard,
	Jesper Dangaard Brouer, Nicholas Piggin, Jérôme Glisse,
	Mathieu Desnoyers, kvm-ppc, Thomas Gleixner, Reza Arbab,
	Allison Randal, Christian Brauner, Greg Kroah-Hartman,
	linux-kernel, Logan Gunthorpe, Souptick Joarder, Andrew Morton,
	linuxppc-dev, Roman Gushchin, Kirill A. Shutemov, Al Viro

[-- Attachment #1: Type: text/plain, Size: 768 bytes --]

On Fri, 2019-10-04 at 13:28 +0200, Peter Zijlstra wrote:
> > Could you please explain it?
> > I mean, why this breaks tmpfs-thp?
> > Also, why mm_cpumask() is also broken?
> 
> Because shared pages are not bound by a mm; or does it not share the thp
> state between mappings?

By what I could understand, even though the memory is shared, the
mapping may differ for different processes (i.e. the same physical
memory that is mapped as a hugepage in process A can be mapped as a lot
of smallpages in process B).

Did I miss something here?  

> And once you (re)figure it out, please write it down. It is a crucial
> bit of the puzzle and needs to be part of the Changelogs.

I am still investing time studying this. More on this later :)

Thanks!

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2019-10-09 18:11 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20191003013325.2614-1-leonardo@linux.ibm.com>
     [not found] ` <20191003072952.GN4536@hirez.programming.kicks-ass.net>
2019-10-03 20:36   ` [PATCH v5 00/11] Introduces new count-based method for tracking lockless pagetable walks Leonardo Bras
     [not found]     ` <99754d82-33c5-a54f-8607-b6bf151069d4@nvidia.com>
2019-10-03 21:38       ` Leonardo Bras
2019-10-04 11:42     ` Peter Zijlstra
2019-10-04 12:57       ` Peter Zijlstra
     [not found] ` <20191003013325.2614-11-leonardo@linux.ibm.com>
     [not found]   ` <20191003074432.GO4536@hirez.programming.kicks-ass.net>
2019-10-03 20:40     ` [PATCH v5 10/11] mm/Kconfig: Adds config option to track " Leonardo Bras
     [not found] ` <20191003013325.2614-2-leonardo@linux.ibm.com>
     [not found]   ` <20191003071145.GM4536@hirez.programming.kicks-ass.net>
     [not found]     ` <20191003115141.GJ4581@hirez.programming.kicks-ass.net>
2019-10-03 21:24       ` [PATCH v5 01/11] asm-generic/pgtable: Adds generic functions to monitor lockless pgtable walks Leonardo Bras
2019-10-04 11:28         ` Peter Zijlstra
2019-10-09 18:09           ` Leonardo Bras

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).