archive mirror
 help / color / mirror / Atom feed
From: Catalin Marinas <>
To: Andrea Arcangeli <>
Cc: Will Deacon <>, Jon Masters <>,
	Rafael Aquini <>,
	Mark Salter <>,,,
Subject: Re: [PATCH 2/2] arm64: tlb: skip tlbi broadcast for single threaded TLB flushes
Date: Mon, 10 Feb 2020 17:51:06 +0000	[thread overview]
Message-ID: <> (raw)
In-Reply-To: <>

Hi Andrea,

Thanks for having a go at this.

On Mon, Feb 03, 2020 at 03:17:45PM -0500, Andrea Arcangeli wrote:
> With multiple NUMA nodes and multiple sockets, the tlbi broadcast
> shall be delivered through the interconnects in turn increasing the
> interconnect traffic and the latency of the tlbi broadcast instruction.

There are better ways to handle this in hardware (snoop filters), so
hopefully those vendors will eventually learn.

> After the local TLB flush this however means the ASID context goes out
> of sync in all CPUs except the local one. This can be tracked in the
> mm_cpumask(mm): if the bit is set it means the asid context is stale
> for that CPU. This results in an extra local ASID TLB flush only if a
> single threaded process is migrated to a different CPU and only after a
> TLB flush. No extra local TLB flush is needed for the common case of
> single threaded processes context scheduling within the same CPU and for
> multithreaded processes.

Relying om mm_users is not sufficient AFAICT. Let's say on CPU0 you have
a kernel thread running with the previous user pgd and ASID set in
ttbr0_el1. The mm_users would still be 1 since only mm_count is
incremented in context_switch(). If the user thread now runs on CPU1, a
local tlbi would only invalidate the TLBs on CPU1. However, CPU0 may
still walk (speculatively) the user page tables.

An example where this matters is a group of small pages converted to a
huge page. If CPU0 already has some TLB entries for small pages in the
group but, not being aware of a TLBI for the ptes in the range, may read
a block pmd entry (huge page) and we end up with a TLB conflict on CPU0
(CPU1 is fine since you do the local tlbi).

There are other examples where this could go wrong as the hardware may
keep intermediate pgtable entries in a walk cache. In the arm64 kernel
we rely on something the architecture calls break-before-make for any
page table updates and these need to be broadcast to other CPUs that may
potentially have an entry in their TLB.

It may be better if you used mm_cpumask to mark wherever an mm ever ran
than relying on mm_users.

> Skipping the tlbi instruction broadcasting is already implemented in
> local_flush_tlb_all(), this patch only extends it to flush_tlb_mm(),
> flush_tlb_range() and flush_tlb_page() too.
> Here's the result of 32 CPUs (ARMv8 Ampere) running mprotect at the same
> time from 32 single threaded processes before the patch:
>  Performance counter stats for './loop' (3 runs):
>                  0      dummy
>           2.121353 +- 0.000387 seconds time elapsed  ( +-  0.02% )
> and with the patch applied:
>  Performance counter stats for './loop' (3 runs):
>                  0      dummy
>          0.1197750 +- 0.0000827 seconds time elapsed  ( +-  0.07% )

That's a pretty artificial test and it is indeed improved by this patch.
However, it would be nice to have some real-world scenarios where this


  reply	other threads:[~2020-02-10 17:51 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-03 20:17 [RFC] [PATCH 0/2] arm64: tlb: skip tlbi broadcast for single threaded TLB flushes Andrea Arcangeli
2020-02-03 20:17 ` [PATCH 1/2] mm: use_mm: fix for arches checking mm_users to optimize " Andrea Arcangeli
2020-02-18 11:31   ` Michal Hocko
2020-02-18 18:56     ` Andrea Arcangeli
2020-02-19 11:58       ` Michal Hocko
2020-02-19 20:04         ` Andrea Arcangeli
2020-02-03 20:17 ` [PATCH 2/2] arm64: tlb: skip tlbi broadcast for single threaded " Andrea Arcangeli
2020-02-10 17:51   ` Catalin Marinas [this message]
2020-02-10 20:14     ` Andrea Arcangeli
2020-02-11 14:00       ` Catalin Marinas
2020-02-12 12:57         ` Andrea Arcangeli
2020-02-12 14:13   ` qi.fuli
2020-02-12 15:26     ` Catalin Marinas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \ \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).