All of lore.kernel.org
 help / color / mirror / Atom feed
From: Will Deacon <will@kernel.org>
To: "qi.fuli@fujitsu.com" <qi.fuli@fujitsu.com>
Cc: Will Deacon <will.deacon@arm.com>,
	"indou.takao@fujitsu.com" <indou.takao@fujitsu.com>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	"peterz@infradead.org" <peterz@infradead.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Jonathan Corbet <corbet@lwn.net>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-arm-kernel@lists.infradead.org" 
	<linux-arm-kernel@lists.infradead.org>
Subject: Re: [PATCH 0/2] arm64: Introduce boot parameter to disable TLB flush instruction within the same inner shareable domain
Date: Tue, 9 Jul 2019 09:07:52 +0100	[thread overview]
Message-ID: <20190709080751.3nm2llg64g44hmwn@willie-the-truck> (raw)
In-Reply-To: <5999ed84-72d0-9d42-bf7d-b8d56eaa4d4a@jp.fujitsu.com>

On Wed, Jul 03, 2019 at 02:45:43AM +0000, qi.fuli@fujitsu.com wrote:
> We used FWQ [1] to do an experiment on 1 node of our HPC environment,
> we expected it would be tens of microseconds of maximum OS jitter, but 
> it was
> hundreds of microseconds, which didn't meet our requirement. We tried to 
> find
> out the cause by using ftrace, but we cannot find any processes which would
> cause noise and only knew the extension of processing time. Then we 
> confirmed
> the CPU instruction count through CPU PMU, we also didn't find any changes.
> However, we found that with the increase of that the TLB flash was called,
> the noise was also increasing. Here we understood that the cause of this 
> issue
> is the implementation of Linux's TLB flush for arm64, especially use of 
> TLBI-is
> instruction which is a broadcast to all processor core on the system. 
> Therefore,
> we made this patch set to fix this issue. After testing for several 
> times, the
> noise was reduced and our original goal was achieved, so we do think 
> this patch
> makes sense.
> 
> As I mentioned, the OS jitter is a vital issue for large-scale HPC 
> environment.
> We tried a lot of things to reduce the OS jitter. One of them is task 
> separation
> between the CPUs which are used for computing and the CPUs which are 
> used for
> maintenance. All of the daemon processes and I/O interrupts are bounden 
> to the
> maintenance CPUs. Further more, we used nohz_full to avoid the noise 
> caused by
> computing CPU interruption, but all of the CPUs were affected by TLBI-is
> instruction, the task separation of CPUs didn't work. Therefore, we 
> would like
> to implement that TLB flush is done on minimal CPUs to reducing the OS 
> jitter
> by using this patch set.

So have you confirmed that this is due to TLBI traffic and not, for example,
stores sitting in remote store buffers that get flushed by the IPI or
something else like that? It feels like you're inferring things about the
underlying behaviour, whereas you should be in a position to simulate this
if nothing else.

If it *is* because of TLBI, then where are they coming from? Is FWQ calling
munmap/mprotect all the time? Why?

Will

WARNING: multiple messages have this Message-ID (diff)
From: Will Deacon <will@kernel.org>
To: "qi.fuli@fujitsu.com" <qi.fuli@fujitsu.com>
Cc: Jonathan Corbet <corbet@lwn.net>,
	"peterz@infradead.org" <peterz@infradead.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	Will Deacon <will.deacon@arm.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>,
	"indou.takao@fujitsu.com" <indou.takao@fujitsu.com>
Subject: Re: [PATCH 0/2] arm64: Introduce boot parameter to disable TLB flush instruction within the same inner shareable domain
Date: Tue, 9 Jul 2019 09:07:52 +0100	[thread overview]
Message-ID: <20190709080751.3nm2llg64g44hmwn@willie-the-truck> (raw)
In-Reply-To: <5999ed84-72d0-9d42-bf7d-b8d56eaa4d4a@jp.fujitsu.com>

On Wed, Jul 03, 2019 at 02:45:43AM +0000, qi.fuli@fujitsu.com wrote:
> We used FWQ [1] to do an experiment on 1 node of our HPC environment,
> we expected it would be tens of microseconds of maximum OS jitter, but 
> it was
> hundreds of microseconds, which didn't meet our requirement. We tried to 
> find
> out the cause by using ftrace, but we cannot find any processes which would
> cause noise and only knew the extension of processing time. Then we 
> confirmed
> the CPU instruction count through CPU PMU, we also didn't find any changes.
> However, we found that with the increase of that the TLB flash was called,
> the noise was also increasing. Here we understood that the cause of this 
> issue
> is the implementation of Linux's TLB flush for arm64, especially use of 
> TLBI-is
> instruction which is a broadcast to all processor core on the system. 
> Therefore,
> we made this patch set to fix this issue. After testing for several 
> times, the
> noise was reduced and our original goal was achieved, so we do think 
> this patch
> makes sense.
> 
> As I mentioned, the OS jitter is a vital issue for large-scale HPC 
> environment.
> We tried a lot of things to reduce the OS jitter. One of them is task 
> separation
> between the CPUs which are used for computing and the CPUs which are 
> used for
> maintenance. All of the daemon processes and I/O interrupts are bounden 
> to the
> maintenance CPUs. Further more, we used nohz_full to avoid the noise 
> caused by
> computing CPU interruption, but all of the CPUs were affected by TLBI-is
> instruction, the task separation of CPUs didn't work. Therefore, we 
> would like
> to implement that TLB flush is done on minimal CPUs to reducing the OS 
> jitter
> by using this patch set.

So have you confirmed that this is due to TLBI traffic and not, for example,
stores sitting in remote store buffers that get flushed by the IPI or
something else like that? It feels like you're inferring things about the
underlying behaviour, whereas you should be in a position to simulate this
if nothing else.

If it *is* because of TLBI, then where are they coming from? Is FWQ calling
munmap/mprotect all the time? Why?

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  parent reply	other threads:[~2019-07-09  8:08 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-17 14:32 [PATCH 0/2] arm64: Introduce boot parameter to disable TLB flush instruction within the same inner shareable domain Takao Indoh
2019-06-17 14:32 ` Takao Indoh
2019-06-17 14:32 ` [PATCH 1/2] arm64: mm: Restore mm_cpumask (revert commit 38d96287504a ("arm64: mm: kill mm_cpumask usage")) Takao Indoh
2019-06-17 14:32   ` Takao Indoh
2019-07-23 11:55   ` Catalin Marinas
2019-07-23 11:55     ` Catalin Marinas
2019-06-17 14:32 ` [PATCH 2/2] arm64: tlb: Add boot parameter to disable TLB flush within the same inner shareable domain Takao Indoh
2019-06-17 14:32   ` Takao Indoh
2019-07-23 12:11   ` Catalin Marinas
2019-07-23 12:11     ` Catalin Marinas
2019-06-17 17:03 ` [PATCH 0/2] arm64: Introduce boot parameter to disable TLB flush instruction " Will Deacon
2019-06-17 17:03   ` Will Deacon
2019-06-24 10:34   ` qi.fuli
2019-06-24 10:34     ` qi.fuli
2019-06-27 10:27     ` Will Deacon
2019-06-27 10:27       ` Will Deacon
2019-07-03  2:45       ` qi.fuli
2019-07-03  2:45         ` qi.fuli
2019-07-09  0:25         ` Jon Masters
2019-07-09  0:25           ` Jon Masters
2019-07-09  0:29           ` Jon Masters
2019-07-09  0:29             ` Jon Masters
2019-07-09  8:03             ` Will Deacon
2019-07-09  8:03               ` Will Deacon
2019-07-09  8:07         ` Will Deacon [this message]
2019-07-09  8:07           ` Will Deacon
2019-11-01  9:56 ` qi.fuli
2019-11-01  9:56   ` qi.fuli
2019-11-01 17:28   ` Will Deacon
2019-11-01 17:28     ` Will Deacon
2019-11-26 14:26     ` Matthias Brugger
2019-11-26 14:26       ` Matthias Brugger
2019-11-26 14:36       ` Will Deacon
2019-11-26 14:36         ` Will Deacon
2019-12-01 16:02     ` Jon Masters
2019-12-01 16:02       ` Jon Masters

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190709080751.3nm2llg64g44hmwn@willie-the-truck \
    --to=will@kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=corbet@lwn.net \
    --cc=indou.takao@fujitsu.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=qi.fuli@fujitsu.com \
    --cc=will.deacon@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.