linuxppc-dev.lists.ozlabs.org archive mirror
From: Peter Zijlstra <peterz@infradead.org>
To: Leonardo Bras <leonardo@linux.ibm.com>
Cc: "Song Liu" <songliubraving@fb.com>,
	"Michal Hocko" <mhocko@suse.com>,
	"Dmitry V. Levin" <ldv@altlinux.org>,
	"Keith Busch" <keith.busch@intel.com>,
	linux-mm@kvack.org, "Paul Mackerras" <paulus@samba.org>,
	"Christoph Lameter" <cl@linux.com>,
	"Ira Weiny" <ira.weiny@intel.com>,
	"Dan Williams" <dan.j.williams@intel.com>,
	"Elena Reshetova" <elena.reshetova@intel.com>,
	linux-arch@vger.kernel.org,
	"Santosh Sivaraj" <santosh@fossix.org>,
	"Davidlohr Bueso" <dave@stgolabs.net>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
	"Bartlomiej Zolnierkiewicz" <b.zolnierkie@samsung.com>,
	"Mike Rapoport" <rppt@linux.ibm.com>,
	"Jason Gunthorpe" <jgg@ziepe.ca>,
	"Vlastimil Babka" <vbabka@suse.cz>,
	"Mahesh Salgaonkar" <mahesh@linux.vnet.ibm.com>,
	"Andrey Ryabinin" <aryabinin@virtuozzo.com>,
	"Alexey Dobriyan" <adobriyan@gmail.com>,
	"Ingo Molnar" <mingo@kernel.org>,
	"Andrea Arcangeli" <aarcange@redhat.com>,
	"Ralph Campbell" <rcampbell@nvidia.com>,
	"Arnd Bergmann" <arnd@arndb.de>, "Jann Horn" <jannh@google.com>,
	"John Hubbard" <jhubbard@nvidia.com>,
	"Jesper Dangaard Brouer" <brouer@redhat.com>,
	"Nicholas Piggin" <npiggin@gmail.com>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>,
	kvm-ppc@vger.kernel.org, "Thomas Gleixner" <tglx@linutronix.de>,
	"Reza Arbab" <arbab@linux.ibm.com>,
	"Allison Randal" <allison@lohutok.net>,
	"Christian Brauner" <christian.brauner@ubuntu.com>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	linux-kernel@vger.kernel.org,
	"Logan Gunthorpe" <logang@deltatee.com>,
	"Souptick Joarder" <jrdr.linux@gmail.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	linuxppc-dev@lists.ozlabs.org, "Roman Gushchin" <guro@fb.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	"Al Viro" <viro@zeniv.linux.org.uk>
Subject: Re: [PATCH v5 01/11] asm-generic/pgtable: Adds generic functions to monitor lockless pgtable walks
Date: Thu, 3 Oct 2019 13:51:41 +0200
Message-ID: <20191003115141.GJ4581@hirez.programming.kicks-ass.net>
In-Reply-To: <20191003071145.GM4536@hirez.programming.kicks-ass.net>

On Thu, Oct 03, 2019 at 09:11:45AM +0200, Peter Zijlstra wrote:
> On Wed, Oct 02, 2019 at 10:33:15PM -0300, Leonardo Bras wrote:
> > diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
> > index 818691846c90..3043ea9812d5 100644
> > --- a/include/asm-generic/pgtable.h
> > +++ b/include/asm-generic/pgtable.h
> > @@ -1171,6 +1171,64 @@ static inline bool arch_has_pfn_modify_check(void)
> >  #endif
> >  #endif
> >  
> > +#ifndef __HAVE_ARCH_LOCKLESS_PGTBL_WALK_CONTROL
> > +static inline unsigned long begin_lockless_pgtbl_walk(struct mm_struct *mm)
> > +{
> > +	unsigned long irq_mask;
> > +
> > +	if (IS_ENABLED(CONFIG_LOCKLESS_PAGE_TABLE_WALK_TRACKING))
> > +		atomic_inc(&mm->lockless_pgtbl_walkers);
> 
> This will not work for file backed THP. Also, this is a fairly serious
> contention point all on its own.

Kiryl says we have tmpfs-THP; this would be broken against that, as would
your (PowerPC) use of mm_cpumask() for that IPI.

> > +	/*
> > +	 * Interrupts must be disabled during the lockless page table walk.
> > +	 * That's because the deleting or splitting involves flushing TLBs,
> > +	 * which in turn issues interrupts, that will block when disabled.
> > +	 */
> > +	local_irq_save(irq_mask);
> > +
> > +	/*
> > +	 * This memory barrier pairs with any code that is either trying to
> > +	 * delete page tables, or split huge pages. Without this barrier,
> > +	 * the page tables could be read speculatively outside of interrupt
> > +	 * disabling.
> > +	 */
> > +	smp_mb();
> 
> I don't think this is something smp_mb() can guarantee. smp_mb() is
> defined to order memory accesses, in this case the store of the old
> flags vs whatever comes after this.
> 
> It cannot (in generic) order against completion of prior instructions,
> like clearing the interrupt enabled flags.
> 
> Possibly you want barrier_nospec().

I'm still really confused about this barrier. It just doesn't make
sense.

If an interrupt happens before the local_irq_disable()/save(), then the
CPU will discard any and all speculation in progress in order to handle
the exception.

If there isn't an interrupt (or it happens after disable) it is
irrelevant.

Specifically, that serialize-IPI thing wants to ensure in-progress
lookups are complete, and I can't find a scenario where
local_irq_disable/enable() needs additional help vs IPIs. The moment an
interrupt lands it kills speculation and forces things into
program-order.

Did you perhaps want something like:

	if (IS_ENABLED(CONFIG_LOCKLESS_PAGE_TABLE_WALK_TRACKING)) {
		atomic_inc(&foo);
		smp_mb__after_atomic();
	}

	...

	if (IS_ENABLED(CONFIG_LOCKLESS_PAGE_TABLE_WALK_TRACKING)) {
		smp_mb__before_atomic();
		atomic_dec(&foo);
	}

To ensure everything happens between the increment and the decrement?
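
For readers outside the kernel tree, here is a minimal userspace sketch of
that bracketing pattern. It assumes C11 atomics as a rough stand-in for the
kernel primitives: a relaxed read-modify-write plus a seq_cst fence only
approximates atomic_inc()/atomic_dec() with smp_mb__after_atomic() and
smp_mb__before_atomic(). The names walkers, walk_begin(), walk_end() and
walkers_active() are hypothetical and do not appear in the patch.

```c
/* Userspace analogue using C11 atomics; this is not kernel code. */
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the counter called "foo" above. */
static atomic_int walkers;

/*
 * Enter the tracked region: unordered increment followed by a full
 * fence, so accesses inside the region cannot be reordered before it.
 */
static inline void walk_begin(void)
{
	atomic_fetch_add_explicit(&walkers, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
}

/*
 * Leave the tracked region: full fence followed by an unordered
 * decrement, so accesses inside the region complete before it.
 */
static inline void walk_end(void)
{
	int prev;

	atomic_thread_fence(memory_order_seq_cst);
	prev = atomic_fetch_sub_explicit(&walkers, 1, memory_order_relaxed);
	assert(prev > 0);	/* catch an unbalanced walk_end() */
}

/* Observer side: nonzero means a walker may still be in the region. */
static inline int walkers_active(void)
{
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&walkers, memory_order_relaxed) != 0;
}
```

A caller would wrap the lockless walk in walk_begin()/walk_end(), while the
side that tears down page tables polls walkers_active(). The sketch only
illustrates the ordering; it does not address the contention concern raised
earlier.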

And I still think all of that is wrong; you really shouldn't need to wait
on munmap().

Thread overview: 37+ messages
2019-10-03  1:33 [PATCH v5 00/11] Introduces new count-based method for tracking lockless pagetable walks Leonardo Bras
2019-10-03  1:33 ` [PATCH v5 01/11] asm-generic/pgtable: Adds generic functions to monitor lockless pgtable walks Leonardo Bras
2019-10-03  7:11   ` Peter Zijlstra
2019-10-03 11:51     ` Peter Zijlstra [this message]
2019-10-03 20:40       ` John Hubbard
2019-10-04 11:24         ` Peter Zijlstra
2019-10-03 21:24       ` Leonardo Bras
2019-10-04 11:28         ` Peter Zijlstra
2019-10-09 18:09           ` Leonardo Bras
2019-10-05  8:35       ` Aneesh Kumar K.V
2019-10-08 14:47         ` Kirill A. Shutemov
2019-10-03  1:33 ` [PATCH v5 02/11] powerpc/mm: Adds counting method " Leonardo Bras
2019-10-08 15:11   ` Christopher Lameter
2019-10-08 17:13     ` Leonardo Bras
2019-10-08 17:43       ` Christopher Lameter
2019-10-08 18:02         ` Leonardo Bras
2019-10-08 18:27           ` Christopher Lameter
2019-10-03  1:33 ` [PATCH v5 03/11] mm/gup: Applies counting method to monitor gup_pgd_range Leonardo Bras
2019-10-03  1:33 ` [PATCH v5 04/11] powerpc/mce_power: Applies counting method to monitor lockless pgtbl walks Leonardo Bras
2019-10-03  1:33 ` [PATCH v5 05/11] powerpc/perf: " Leonardo Bras
2019-10-03  1:33 ` [PATCH v5 06/11] powerpc/mm/book3s64/hash: " Leonardo Bras
2019-10-03  1:33 ` [PATCH v5 07/11] powerpc/kvm/e500: " Leonardo Bras
2019-10-03  1:33 ` [PATCH v5 08/11] powerpc/kvm/book3s_hv: " Leonardo Bras
2019-10-03  1:33 ` [PATCH v5 09/11] powerpc/kvm/book3s_64: " Leonardo Bras
2019-10-03  1:33 ` [PATCH v5 10/11] mm/Kconfig: Adds config option to track lockless pagetable walks Leonardo Bras
2019-10-03  2:08   ` Qian Cai
2019-10-03 19:04     ` Leonardo Bras
2019-10-03 19:08       ` Leonardo Bras
2019-10-03  7:44   ` Peter Zijlstra
2019-10-03 20:40     ` Leonardo Bras
2019-10-03  1:33 ` [PATCH v5 11/11] powerpc/mm/book3s64/pgtable: Uses counting method to skip serializing Leonardo Bras
2019-10-03  7:29 ` [PATCH v5 00/11] Introduces new count-based method for tracking lockless pagetable walks Peter Zijlstra
2019-10-03 20:36   ` Leonardo Bras
2019-10-03 20:49     ` John Hubbard
2019-10-03 21:38       ` Leonardo Bras
2019-10-04 11:42     ` Peter Zijlstra
2019-10-04 12:57       ` Peter Zijlstra
