Linux-ARM-Kernel Archive on
From: Will Deacon
To: Takao Indoh
Cc: QI Fuli, Catalin Marinas, Jonathan Corbet, Takao Indoh
Subject: Re: [PATCH 0/2] arm64: Introduce boot parameter to disable TLB flush instruction within the same inner shareable domain
Date: Mon, 17 Jun 2019 18:03:28 +0100

Hi Takao,

[+Peter Z]

On Mon, Jun 17, 2019 at 11:32:53PM +0900, Takao Indoh wrote:
> From: Takao Indoh <>
> I found a performance issue related to the implementation of Linux's TLB
> flush for arm64.
> When I run a single-threaded test program in a moderate environment, it
> usually takes 39ms to finish its work. However, when I put a small
> application, which just calls mprotect() continuously, on one of the sibling
> cores and run it simultaneously, the test program slows down significantly.
> It takes 49ms (125%) on ThunderX2. I also detected the same problem on
> ThunderX1 and Fujitsu A64FX.

This is a problem for any applications that share hardware resources with
each other, so I don't think it's something we should be too concerned about
addressing unless there is a practical DoS scenario, which there doesn't
appear to be in this case. It may be that the real answer is "don't call
mprotect() in a loop".
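
For illustration, a minimal sketch of the kind of noise generator described
in the quoted report: a process that does nothing but change the protection
of one page. The function name mprotect_loop() and the choice of a single
anonymous page are assumptions made for this sketch; the original test
program was not posted in this message.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical reproducer: toggle the protection of one anonymous page
 * `iterations` times. Each call changes the permissions, so the kernel
 * has to invalidate the stale translation for that page.
 *
 * Returns the number of successful mprotect() calls, or -1 if the
 * mapping could not be created.
 */
long mprotect_loop(long iterations)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    char *buf = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    buf[0] = 1; /* fault the page in so there is a mapping to change */

    long ok = 0;
    for (long i = 0; i < iterations; i++) {
        /* Alternate between read-only and read-write. */
        int prot = (i & 1) ? (PROT_READ | PROT_WRITE) : PROT_READ;
        if (mprotect(buf, (size_t)page, prot) == 0)
            ok++;
    }

    munmap(buf, (size_t)page);
    return ok;
}
```

To reproduce the setup from the report, one would call mprotect_loop() with
a large iteration count from a process pinned to one core (for example with
taskset) while timing an unrelated single-threaded job on another core.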

> I suppose the root cause of this issue is the implementation of Linux's TLB
> flush for arm64, especially the use of the TLBI-is instruction, which is
> broadcast to all processor cores in the system. In the above situation,
> TLBI-is is issued by mprotect().

On the flip side, Linux is providing the hardware with enough information
not to broadcast to cores for which the remote TLBs don't have entries
allocated for the ASID being invalidated. I would say that the root cause
of the issue is that this filtering is not taking place.

> This is not a problem for a small environment, but it causes significant
> performance noise in a large-scale HPC environment, which has more than
> a thousand nodes with a low-latency interconnect.

If you have a system with over a thousand nodes, without snoop filtering
for DVM messages and you expect performance to scale in the face of tight
mprotect() loops then I think you have a problem irrespective of this patch.
What happens if somebody runs I-cache invalidation in a loop?

> To fix this problem, this patch adds a new boot parameter,
> 'disable_tlbflush_is'.  In the case of flush_tlb_mm() *without* this
> parameter, the TLB entry is invalidated by __tlbi(aside1is, asid). With this
> instruction, all CPUs within the same inner shareable domain check whether
> they have TLB entries with this ASID, which causes performance noise. OTOH,
> when this new parameter is specified, the TLB entry is invalidated by
> __tlbi(aside1, asid) only on the CPUs specified by mm_cpumask(mm).
> Therefore the TLB flush is done on the minimal set of CPUs and the
> performance problem does not occur. I have confirmed that the performance
> problem is fixed by this patch.
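
The difference between the two schemes in the quoted description can be
sketched with a toy user-space model. This is not kernel code: struct
machine, struct mm, run_on(), flush_broadcast() and flush_targeted() are
hypothetical names invented here, and the model simply counts how many CPUs
are disturbed by one ASID invalidation. It assumes no hardware filtering of
broadcast invalidations, which is exactly the case under discussion.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CPUS  64
#define TLB_SLOTS 8

struct tlb_entry { uint16_t asid; int valid; };

struct cpu {
    struct tlb_entry tlb[TLB_SLOTS];
    unsigned long invalidations_seen; /* times this CPU was disturbed */
};

struct machine { int ncpus; struct cpu cpus[MAX_CPUS]; };

/* Toy stand-in for an address space: its ASID and the CPUs it ran on. */
struct mm { uint16_t asid; uint64_t cpumask; };

/* Drop every entry tagged with `asid` from one CPU's TLB. */
static void local_flush_asid(struct cpu *c, uint16_t asid)
{
    c->invalidations_seen++;
    for (int i = 0; i < TLB_SLOTS; i++)
        if (c->tlb[i].valid && c->tlb[i].asid == asid)
            c->tlb[i].valid = 0;
}

/* Record that `mm` ran on `cpu` and left one TLB entry behind. */
void run_on(struct machine *m, struct mm *mm, int cpu)
{
    assert(cpu >= 0 && cpu < m->ncpus && m->ncpus <= MAX_CPUS);
    m->cpus[cpu].tlb[0] = (struct tlb_entry){ .asid = mm->asid, .valid = 1 };
    mm->cpumask |= UINT64_C(1) << cpu;
}

/* Models the unfiltered broadcast case: every CPU processes the request. */
int flush_broadcast(struct machine *m, const struct mm *mm)
{
    for (int cpu = 0; cpu < m->ncpus; cpu++)
        local_flush_asid(&m->cpus[cpu], mm->asid);
    return m->ncpus;
}

/* Models the proposed case: only CPUs in the mm's cpumask are touched. */
int flush_targeted(struct machine *m, const struct mm *mm)
{
    int touched = 0;
    for (int cpu = 0; cpu < m->ncpus; cpu++) {
        if (mm->cpumask & (UINT64_C(1) << cpu)) {
            local_flush_asid(&m->cpus[cpu], mm->asid);
            touched++;
        }
    }
    return touched;
}

/* Count the CPUs that still hold a valid entry for `asid`. */
int count_valid(const struct machine *m, uint16_t asid)
{
    int n = 0;
    for (int cpu = 0; cpu < m->ncpus; cpu++)
        for (int i = 0; i < TLB_SLOTS; i++)
            if (m->cpus[cpu].tlb[i].valid && m->cpus[cpu].tlb[i].asid == asid) {
                n++;
                break;
            }
    return n;
}
```

In this model, an mm that ran on two CPUs of a 48-CPU machine disturbs two
CPUs with flush_targeted() and all 48 with flush_broadcast(). The model says
nothing about the cost of keeping the cpumask accurate, nor about what a
hypervisor does with non-shareable operations.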

Other than my comments above, my overall concern with this patch is that
it introduces divergent behaviour for our TLB invalidation flow, which is
undesirable from both maintainability and usability perspectives. If you
wish to change the code, please don't put it behind a command-line option,
but instead improve the code that is already there. However, I suspect that
blowing away the local TLB on every context-switch may have hidden costs
which are only apparent with workloads different from the contrived case
that you're seeking to improve. You also haven't taken into account the
effects of virtualisation, where it's likely that the hypervisor will
upgrade non-shareable operations to inner-shareable ones anyway.

Thread overview: 18+ messages
2019-06-17 14:32 Takao Indoh
2019-06-17 14:32 ` [PATCH 1/2] arm64: mm: Restore mm_cpumask (revert commit 38d96287504a ("arm64: mm: kill mm_cpumask usage")) Takao Indoh
2019-07-23 11:55   ` Catalin Marinas
2019-06-17 14:32 ` [PATCH 2/2] arm64: tlb: Add boot parameter to disable TLB flush within the same inner shareable domain Takao Indoh
2019-07-23 12:11   ` Catalin Marinas
2019-06-17 17:03 ` Will Deacon [this message]
2019-06-24 10:34   ` [PATCH 0/2] arm64: Introduce boot parameter to disable TLB flush instruction " qi.fuli
2019-06-27 10:27     ` Will Deacon
2019-07-03  2:45       ` qi.fuli
2019-07-09  0:25         ` Jon Masters
2019-07-09  0:29           ` Jon Masters
2019-07-09  8:03             ` Will Deacon
2019-07-09  8:07         ` Will Deacon
2019-11-01  9:56 ` qi.fuli
2019-11-01 17:28   ` Will Deacon
2019-11-26 14:26     ` Matthias Brugger
2019-11-26 14:36       ` Will Deacon
2019-12-01 16:02     ` Jon Masters
