All of lore.kernel.org
 help / color / mirror / Atom feed
From: Will Deacon <will@kernel.org>
To: Matthias Brugger <mbrugger@suse.com>
Cc: "qi.fuli@fujitsu.com" <qi.fuli@fujitsu.com>,
	"tokamoto@jp.fujitsu.com" <tokamoto@jp.fujitsu.com>,
	Jon Masters <jcm@jonmasters.org>,
	Jonathan Corbet <corbet@lwn.net>,
	"peterz@infradead.org" <peterz@infradead.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	Will Deacon <will.deacon@arm.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"maeda.naoaki@fujitsu.com" <maeda.naoaki@fujitsu.com>,
	"misono.tomohiro@fujitsu.com" <misono.tomohiro@fujitsu.com>,
	Itaru Kitayama <itaru.kitayama@gmail.com>,
	"linux-arm-kernel@lists.infradead.org" 
	<linux-arm-kernel@lists.infradead.org>,
	"indou.takao@fujitsu.com" <indou.takao@fujitsu.com>,
	Robert Richter <rrichter@marvell.com>
Subject: Re: [PATCH 0/2] arm64: Introduce boot parameter to disable TLB flush instruction within the same inner shareable domain
Date: Tue, 26 Nov 2019 14:36:58 +0000	[thread overview]
Message-ID: <20191126143657.GA9395@willie-the-truck> (raw)
In-Reply-To: <adac2265-2e40-bc2f-a6e2-8d6013b9416c@suse.com>

On Tue, Nov 26, 2019 at 03:26:48PM +0100, Matthias Brugger wrote:
> On 01/11/2019 18:28, Will Deacon wrote:
> > On Fri, Nov 01, 2019 at 09:56:05AM +0000, qi.fuli@fujitsu.com wrote:
> >> First of all thanks for the comments for the patch.
> >>
> >> I'm still struggling with this problem to find out the solution.
> >> As a result of an investigation on this problem, after all, I think it 
> >> is necessary to improve TLB flush mechanism of the kernel to fix this 
> >> problem completely.
> >>
> >> So, I'd like to restart a discussion. At first, I summarize this problem 
> >> to recall what was the problem and then I want to discuss how to fix it.
> >>
> >> Summary of the problem:
> >> A few months ago I proposed patches to solve a performance problem due 
> >> to TLB flush.[1]
> >>
> >> A problem is that TLB flush on a core affects all other cores even if 
> >> all other cores do not need actual flush, and it causes performance 
> >> degradation.
> >>
> >> In this thread, I explained that:
> >> * I found a performance problem which is caused by TLBI-is instruction.
> >> * The problem occurs like this:
> >>   1) On a core, OS tries to flush TLB using TLBI-is instruction
> >>   2) TLBI-is instruction causes a broadcast to all other cores, and
> >>   each core received hard-wired signal
> >>   3) Each core check if there are TLB entries which have the specified 
> >> ASID/VA
> > 
> > For those following along at home, my understanding is that this "check"
> > effectively stalls the pipeline as though it is being performed in software.
> > 
> > Some questions:
> > 
> > Does this mean a malicious virtual machine can effectively DoS the system?
> > What about a malicious application calling mprotect()?
> > 
> > Do all broadcast TLBI instructions cause this expensive check, or are
> > some significantly slower than others?
> > 
> >>   4) This check causes performance degradation
> >> * We ran FWQ[2] and detected OS jitter due to this problem, this noise
> >>   is serious for HPC usage.
> >>
> >> The noise means here a difference between maximum time and minimum time 
> >> which the same work takes.
> >>
> >> How to fix:
> >> I think the cause is TLB flush by TLBI-is because the instruction 
> >> affects cores that are not related to its flush.
> > 
> > Does broadcast I-cache maintenance cause the same problem?
> > 
> >> So the previous patch I posted is
> >> * Use mm_cpumask in mm_struct to find appropriate CPUs for TLB flush
> >> * Exec TLBI instead of TLBI-is only to CPUs specified by mm_cpumask
> >>   (This is the same behavior as arm32 and x86)
> >>
> >> And after the discussion about this patch, I got the following comments.
> >> 1) This patch switches the behavior (original flush by TLBI-is and new 
> >> flush by TLBI) by boot parameter, this implementation is not acceptable 
> >> due to bad maintainability.
> >> 2) Even if this patch fixes this problem, it may cause another 
> >> performance problem.
> >>
> >> I'd like to start over the implementation by considering these points.
> >> For the second comment above, I will run a benchmark test to analyze the 
> >> impact on performance.
> >> Please let me know if there are other points I should take into 
> >> consideration.
> > 
> > I think it's worth bearing in mind that I have little sympathy for the
> > problem that you are seeing. As far as I can tell, you've done the
> > following:
> > 
> >   1. You designed a CPU micro-architecture that stalls whenever it receives
> >      a TLB invalidation request.
> > 
> >   2. You integrated said CPU design into a system where broadcast TLB
> >      invalidation is not filtered and therefore stalls every CPU every
> >      time that /any/ TLB invalidation is broadcast.
> > 
> >   3. You deployed a mixture of Linux and jitter-sensitive software on
> >      this system, and now you're failing to meet your performance
> >      requirements.
> > 
> > Have I got that right?
> > 
> > If so, given that your CPU design isn't widely available, nobody else
> > appears to have made this mistake and jitter hasn't been reported as an
> > issue for any other systems, it's very unlikely that we're going to make
> > invasive upstream kernel changes to support you. I'm sorry, but all I can
> > suggest is that you check that your micro-architecture and performance
> > requirements are aligned with the design of Linux *before* building another
> > machine like this in future.
> > 
> 
> I just wanted to note that the cover letter states that they have also seen this
> on Thunderx1 and Thunderx2.
> 
> Not sure about other machines, like the Huawei TaiShan 200 series.
> 
> What I want to say, it seems not to be something that only affects Fujitsu but
> also other vendors. So maybe we should consider adding an erratum like the one
> for the repeated TLBI on Qualcomm SoCs.

Careful here -- we're talking about a reported performance issue, not a
correctness one. The "repeated TLBI" sequence is very much a workaround for
the latter.

In the case of TX1/TX2, I can imagine the "let's sit in a loop of mprotect()
calls" scaling poorly, which is what the cover letter is referring to, but
that's not really a workload that we need to optimise for. However, the case
that Fujitsu are reporting seems to go beyond that because of the design of
their CPU micro-architecture, where even just a single TLB invalidation
message stalls all of the other CPUs in the system. I don't have any reason
to believe that particular problem affects other CPU designs.

Thanks,

Will

WARNING: multiple messages have this Message-ID (diff)
From: Will Deacon <will@kernel.org>
To: Matthias Brugger <mbrugger@suse.com>
Cc: "tokamoto@jp.fujitsu.com" <tokamoto@jp.fujitsu.com>,
	"qi.fuli@fujitsu.com" <qi.fuli@fujitsu.com>,
	Jon Masters <jcm@jonmasters.org>,
	Jonathan Corbet <corbet@lwn.net>,
	"peterz@infradead.org" <peterz@infradead.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	Will Deacon <will.deacon@arm.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"maeda.naoaki@fujitsu.com" <maeda.naoaki@fujitsu.com>,
	Robert Richter <rrichter@marvell.com>,
	"misono.tomohiro@fujitsu.com" <misono.tomohiro@fujitsu.com>,
	Itaru Kitayama <itaru.kitayama@gmail.com>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>,
	"indou.takao@fujitsu.com" <indou.takao@fujitsu.com>
Subject: Re: [PATCH 0/2] arm64: Introduce boot parameter to disable TLB flush instruction within the same inner shareable domain
Date: Tue, 26 Nov 2019 14:36:58 +0000	[thread overview]
Message-ID: <20191126143657.GA9395@willie-the-truck> (raw)
In-Reply-To: <adac2265-2e40-bc2f-a6e2-8d6013b9416c@suse.com>

On Tue, Nov 26, 2019 at 03:26:48PM +0100, Matthias Brugger wrote:
> On 01/11/2019 18:28, Will Deacon wrote:
> > On Fri, Nov 01, 2019 at 09:56:05AM +0000, qi.fuli@fujitsu.com wrote:
> >> First of all thanks for the comments for the patch.
> >>
> >> I'm still struggling with this problem to find out the solution.
> >> As a result of an investigation on this problem, after all, I think it 
> >> is necessary to improve TLB flush mechanism of the kernel to fix this 
> >> problem completely.
> >>
> >> So, I'd like to restart a discussion. At first, I summarize this problem 
> >> to recall what was the problem and then I want to discuss how to fix it.
> >>
> >> Summary of the problem:
> >> A few months ago I proposed patches to solve a performance problem due 
> >> to TLB flush.[1]
> >>
> >> A problem is that TLB flush on a core affects all other cores even if 
> >> all other cores do not need actual flush, and it causes performance 
> >> degradation.
> >>
> >> In this thread, I explained that:
> >> * I found a performance problem which is caused by TLBI-is instruction.
> >> * The problem occurs like this:
> >>   1) On a core, OS tries to flush TLB using TLBI-is instruction
> >>   2) TLBI-is instruction causes a broadcast to all other cores, and
> >>   each core received hard-wired signal
> >>   3) Each core check if there are TLB entries which have the specified 
> >> ASID/VA
> > 
> > For those following along at home, my understanding is that this "check"
> > effectively stalls the pipeline as though it is being performed in software.
> > 
> > Some questions:
> > 
> > Does this mean a malicious virtual machine can effectively DoS the system?
> > What about a malicious application calling mprotect()?
> > 
> > Do all broadcast TLBI instructions cause this expensive check, or are
> > some significantly slower than others?
> > 
> >>   4) This check causes performance degradation
> >> * We ran FWQ[2] and detected OS jitter due to this problem, this noise
> >>   is serious for HPC usage.
> >>
> >> The noise means here a difference between maximum time and minimum time 
> >> which the same work takes.
> >>
> >> How to fix:
> >> I think the cause is TLB flush by TLBI-is because the instruction 
> >> affects cores that are not related to its flush.
> > 
> > Does broadcast I-cache maintenance cause the same problem?
> > 
> >> So the previous patch I posted is
> >> * Use mm_cpumask in mm_struct to find appropriate CPUs for TLB flush
> >> * Exec TLBI instead of TLBI-is only to CPUs specified by mm_cpumask
> >>   (This is the same behavior as arm32 and x86)
> >>
> >> And after the discussion about this patch, I got the following comments.
> >> 1) This patch switches the behavior (original flush by TLBI-is and new 
> >> flush by TLBI) by boot parameter, this implementation is not acceptable 
> >> due to bad maintainability.
> >> 2) Even if this patch fixes this problem, it may cause another 
> >> performance problem.
> >>
> >> I'd like to start over the implementation by considering these points.
> >> For the second comment above, I will run a benchmark test to analyze the 
> >> impact on performance.
> >> Please let me know if there are other points I should take into 
> >> consideration.
> > 
> > I think it's worth bearing in mind that I have little sympathy for the
> > problem that you are seeing. As far as I can tell, you've done the
> > following:
> > 
> >   1. You designed a CPU micro-architecture that stalls whenever it receives
> >      a TLB invalidation request.
> > 
> >   2. You integrated said CPU design into a system where broadcast TLB
> >      invalidation is not filtered and therefore stalls every CPU every
> >      time that /any/ TLB invalidation is broadcast.
> > 
> >   3. You deployed a mixture of Linux and jitter-sensitive software on
> >      this system, and now you're failing to meet your performance
> >      requirements.
> > 
> > Have I got that right?
> > 
> > If so, given that your CPU design isn't widely available, nobody else
> > appears to have made this mistake and jitter hasn't been reported as an
> > issue for any other systems, it's very unlikely that we're going to make
> > invasive upstream kernel changes to support you. I'm sorry, but all I can
> > suggest is that you check that your micro-architecture and performance
> > requirements are aligned with the design of Linux *before* building another
> > machine like this in future.
> > 
> 
> I just wanted to note that the cover letter states that they have also seen this
> on Thunderx1 and Thunderx2.
> 
> Not sure about other machines, like the Huawei TaiShan 200 series.
> 
> What I want to say, it seems not to be something that only affects Fujitsu but
> also other vendors. So maybe we should consider adding an erratum like the one
> for the repeated TLBI on Qualcomm SoCs.

Careful here -- we're talking about a reported performance issue, not a
correctness one. The "repeated TLBI" sequence is very much a workaround for
the latter.

In the case of TX1/TX2, I can imagine the "let's sit in a loop of mprotect()
calls" scaling poorly, which is what the cover letter is referring to, but
that's not really a workload that we need to optimise for. However, the case
that Fujitsu are reporting seems to go beyond that because of the design of
their CPU micro-architecture, where even just a single TLB invalidation
message stalls all of the other CPUs in the system. I don't have any reason
to believe that particular problem affects other CPU designs.

Thanks,

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2019-11-26 14:37 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-17 14:32 [PATCH 0/2] arm64: Introduce boot parameter to disable TLB flush instruction within the same inner shareable domain Takao Indoh
2019-06-17 14:32 ` Takao Indoh
2019-06-17 14:32 ` [PATCH 1/2] arm64: mm: Restore mm_cpumask (revert commit 38d96287504a ("arm64: mm: kill mm_cpumask usage")) Takao Indoh
2019-06-17 14:32   ` Takao Indoh
2019-07-23 11:55   ` Catalin Marinas
2019-07-23 11:55     ` Catalin Marinas
2019-06-17 14:32 ` [PATCH 2/2] arm64: tlb: Add boot parameter to disable TLB flush within the same inner shareable domain Takao Indoh
2019-06-17 14:32   ` Takao Indoh
2019-07-23 12:11   ` Catalin Marinas
2019-07-23 12:11     ` Catalin Marinas
2019-06-17 17:03 ` [PATCH 0/2] arm64: Introduce boot parameter to disable TLB flush instruction " Will Deacon
2019-06-17 17:03   ` Will Deacon
2019-06-24 10:34   ` qi.fuli
2019-06-24 10:34     ` qi.fuli
2019-06-27 10:27     ` Will Deacon
2019-06-27 10:27       ` Will Deacon
2019-07-03  2:45       ` qi.fuli
2019-07-03  2:45         ` qi.fuli
2019-07-09  0:25         ` Jon Masters
2019-07-09  0:25           ` Jon Masters
2019-07-09  0:29           ` Jon Masters
2019-07-09  0:29             ` Jon Masters
2019-07-09  8:03             ` Will Deacon
2019-07-09  8:03               ` Will Deacon
2019-07-09  8:07         ` Will Deacon
2019-07-09  8:07           ` Will Deacon
2019-11-01  9:56 ` qi.fuli
2019-11-01  9:56   ` qi.fuli
2019-11-01 17:28   ` Will Deacon
2019-11-01 17:28     ` Will Deacon
2019-11-26 14:26     ` Matthias Brugger
2019-11-26 14:26       ` Matthias Brugger
2019-11-26 14:36       ` Will Deacon [this message]
2019-11-26 14:36         ` Will Deacon
2019-12-01 16:02     ` Jon Masters
2019-12-01 16:02       ` Jon Masters

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191126143657.GA9395@willie-the-truck \
    --to=will@kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=corbet@lwn.net \
    --cc=indou.takao@fujitsu.com \
    --cc=itaru.kitayama@gmail.com \
    --cc=jcm@jonmasters.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maeda.naoaki@fujitsu.com \
    --cc=mbrugger@suse.com \
    --cc=misono.tomohiro@fujitsu.com \
    --cc=peterz@infradead.org \
    --cc=qi.fuli@fujitsu.com \
    --cc=rrichter@marvell.com \
    --cc=tokamoto@jp.fujitsu.com \
    --cc=will.deacon@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.