linux-arch.vger.kernel.org archive mirror
From: Andy Lutomirski <luto@kernel.org>
To: Will Deacon <will@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Heiko Carstens <hca@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Christian Borntraeger <borntraeger@de.ibm.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Nicholas Piggin <npiggin@gmail.com>,
	LKML <linux-kernel@vger.kernel.org>, X86 ML <x86@kernel.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Arnd Bergmann <arnd@arndb.de>,
	Peter Zijlstra <peterz@infradead.org>,
	linux-arch <linux-arch@vger.kernel.org>,
	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>,
	Linux-MM <linux-mm@kvack.org>, Anton Blanchard <anton@ozlabs.org>
Subject: Re: [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option
Date: Tue, 1 Dec 2020 13:50:38 -0800
Message-ID: <CALCETrVP3qAQ50yHU-AzZQsiRB9JGO5FQf91kuk7DCvNY51EXQ@mail.gmail.com>
In-Reply-To: <20201201212758.GA28300@willie-the-truck>

On Tue, Dec 1, 2020 at 1:28 PM Will Deacon <will@kernel.org> wrote:
>
> On Mon, Nov 30, 2020 at 10:31:51AM -0800, Andy Lutomirski wrote:
> > other arch folk: there's some background here:
> >
> > https://lkml.kernel.org/r/CALCETrVXUbe8LfNn-Qs+DzrOQaiw+sFUg1J047yByV31SaTOZw@mail.gmail.com
> >
> > On Sun, Nov 29, 2020 at 12:16 PM Andy Lutomirski <luto@kernel.org> wrote:
> > >
> > > On Sat, Nov 28, 2020 at 7:54 PM Andy Lutomirski <luto@kernel.org> wrote:
> > > >
> > > > On Sat, Nov 28, 2020 at 8:02 AM Nicholas Piggin <npiggin@gmail.com> wrote:
> > > > >
> > > > > > On big systems, the mm refcount can become highly contended when doing
> > > > > a lot of context switching with threaded applications (particularly
> > > > > switching between the idle thread and an application thread).
> > > > >
> > > > > Abandoning lazy tlb slows switching down quite a bit in the important
> > > > > > user->idle->user cases, so instead implement a non-refcounted scheme
> > > > > that causes __mmdrop() to IPI all CPUs in the mm_cpumask and shoot down
> > > > > any remaining lazy ones.
> > > > >
> > > > > Shootdown IPIs are some concern, but they have not been observed to be
> > > > > a big problem with this scheme (the powerpc implementation generated
> > > > > 314 additional interrupts on a 144 CPU system during a kernel compile).
> > > > > There are a number of strategies that could be employed to reduce IPIs
> > > > > if they turn out to be a problem for some workload.
> > > >
> > > > I'm still wondering whether we can do even better.
> > > >
> > >
> > > Hold on a sec.. __mmput() unmaps VMAs, frees pagetables, and flushes
> > > the TLB.  On x86, this will shoot down all lazies as long as even a
> > > single pagetable was freed.  (Or at least it will if we don't have a
> > > serious bug, but the code seems okay.  We'll hit pmd_free_tlb, which
> > > sets tlb->freed_tables, which will trigger the IPI.)  So, on
> > > architectures like x86, the shootdown approach should be free.  The
> > > only way it ought to have any excess IPIs is if we have CPUs in
> > > mm_cpumask() that don't need IPI to free pagetables, which could
> > > happen on paravirt.
> >
> > Indeed, on x86, we do this:
> >
> > [   11.558844]  flush_tlb_mm_range.cold+0x18/0x1d
> > [   11.559905]  tlb_finish_mmu+0x10e/0x1a0
> > [   11.561068]  exit_mmap+0xc8/0x1a0
> > [   11.561932]  mmput+0x29/0xd0
> > [   11.562688]  do_exit+0x316/0xa90
> > [   11.563588]  do_group_exit+0x34/0xb0
> > [   11.564476]  __x64_sys_exit_group+0xf/0x10
> > [   11.565512]  do_syscall_64+0x34/0x50
> >
> > and we have info->freed_tables set.
> >
> > What are the architectures that have large systems like?
> >
> > x86: we already zap lazies, so it should cost basically nothing to do
> > a little loop at the end of __mmput() to make sure that no lazies are
> > left.  If we care about paravirt performance, we could implement one
> > of the optimizations I mentioned above to fix up the refcounts instead
> > of sending an IPI to any remaining lazies.
> >
> > arm64: AFAICT arm64's flush uses magic arm64 hardware support for
> > remote flushes, so any lazy mm references will still exist after
> > exit_mmap().  (arm64 uses lazy TLB, right?)  So this is kind of like
> > the x86 paravirt case.  Are there large enough arm64 systems that any
> > of this matters?
>
> Yes, there are large arm64 systems where performance of TLB invalidation
> matters, but they're either niche (supercomputers) or not readily available
> (NUMA boxes).
>
> But anyway, we blow away the TLB for everybody in tlb_finish_mmu() after
> freeing the page-tables. We have an optimisation to avoid flushing if
> we're just unmapping leaf entries when the mm is going away, but we don't
> have a choice once we get to actually reclaiming the page-tables.
>
> One thing I probably should mention, though, is that we don't maintain
> mm_cpumask() because we're not able to benefit from it and the atomic
> update is a waste of time.

Do you do anything special for lazy TLB or do you just use the generic
code?  (i.e. where do your user pagetables point when you go from a
user task to idle or to a kernel thread?)

Do you end up with all cpus set in mm_cpumask or can you have the mm
loaded on a CPU that isn't in mm_cpumask?
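
To make the second question concrete, here is a toy userspace model (not
kernel code).  It assumes, as the patch description says, that the
shootdown only visits CPUs in mm_cpumask.  Every name in it (toy_mm,
toy_cpu, toy_shoot_lazies, toy_count_stale, NR_CPUS = 8) is made up for
illustration and does not correspond to a real kernel interface:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS 8	/* toy value, hypothetical */

/* Toy stand-in for struct mm_struct: only what this model needs. */
struct toy_mm {
	uint32_t cpumask;	/* bit n set: CPU n is recorded as using this mm */
};

/* Toy per-CPU state: which mm the CPU's page tables point at. */
struct toy_cpu {
	const struct toy_mm *loaded_mm;
	bool lazy;		/* running idle/kthread on a borrowed mm */
};

static struct toy_mm toy_init_mm;	/* stand-in for init_mm */

/* Models the IPI handler: a lazy CPU drops the dying mm for init_mm. */
static void toy_shoot_one(struct toy_cpu *cpu, const struct toy_mm *mm)
{
	if (cpu->loaded_mm == mm && cpu->lazy) {
		cpu->loaded_mm = &toy_init_mm;
		cpu->lazy = false;
	}
}

/* Visit only the CPUs set in mm->cpumask; return the number of "IPIs". */
static int toy_shoot_lazies(struct toy_cpu cpus[NR_CPUS],
			    const struct toy_mm *mm)
{
	int ipis = 0;

	assert(mm != NULL);
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (mm->cpumask & (UINT32_C(1) << cpu)) {
			toy_shoot_one(&cpus[cpu], mm);
			ipis++;
		}
	}
	return ipis;
}

/* Count the CPUs whose page tables still point at mm. */
static int toy_count_stale(const struct toy_cpu cpus[NR_CPUS],
			   const struct toy_mm *mm)
{
	int stale = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpus[cpu].loaded_mm == mm)
			stale++;
	return stale;
}
```

In this model, a CPU that has the mm loaded lazily but whose bit is clear
in cpumask is never visited, so toy_count_stale() stays nonzero after the
shootdown.  If every bit is set, nothing is missed, but every CPU gets an
"IPI".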

--Andy

>
> Will

