From: He Ying <heying24@huawei.com>
To: Marc Zyngier <maz@kernel.org>
Cc: <catalin.marinas@arm.com>, <will@kernel.org>,
	<mark.rutland@arm.com>, <marcan@marcan.st>, <joey.gouly@arm.com>,
	<pcc@google.com>, <linux-arm-kernel@lists.infradead.org>,
	<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] arm64: Make CONFIG_ARM64_PSEUDO_NMI macro wrap all the pseudo-NMI code
Date: Mon, 10 Jan 2022 11:20:39 +0800
Message-ID: <a010e8dd-428b-c295-b2db-020d8cc698c5@huawei.com>
In-Reply-To: <87pmp2tmpg.wl-maz@kernel.org>

Hi Marc,

I'm just back from the weekend; sorry for the delayed reply.


On 2022/1/8 20:51, Marc Zyngier wrote:
> On Fri, 07 Jan 2022 08:55:36 +0000,
> He Ying <heying24@huawei.com> wrote:
>> Our product has recently been updating its kernel from 4.4 to 5.10 and
>> found a performance issue. We run a business test called the ARP test, which
>> measures the latency of ping-pong packet traffic with a certain payload.
>> The results are as follows.
>>
>>   - 4.4 kernel: avg = ~20s
>>   - 5.10 kernel (CONFIG_ARM64_PSEUDO_NMI is not set): avg = ~40s
>>
>> I have just been learning the arm64 pseudo-NMI code and have a question:
>> why is the related code not wrapped by CONFIG_ARM64_PSEUDO_NMI?
>> I wonder if this brings some performance regression.
>>
>> First, I made this patch and then ran the test again. Here's the result.
>>
>>   - 5.10 kernel with this patch not applied: avg = ~40s
>>   - 5.10 kernel with this patch applied: avg = ~23s
>>
>> Amazing! Note that all kernels were built with CONFIG_ARM64_PSEUDO_NMI not
>> set. It seems the pseudo-NMI feature actually adds some performance
>> overhead even if CONFIG_ARM64_PSEUDO_NMI is not set.
>>
>> Furthermore, I find the feature also adds some overhead to vmlinux size.
>> I built the 5.10 kernel with and without this patch applied, with
>> CONFIG_ARM64_PSEUDO_NMI not set in both cases.
>>
>>   - 5.10 kernel with this patch not applied: vmlinux size is 384060600 Bytes.
>>   - 5.10 kernel with this patch applied: vmlinux size is 383842936 Bytes.
>>
>> That means the arm64 pseudo-NMI feature may bring ~200KB overhead to
>> vmlinux size.
>>
>> In summary, the arm64 pseudo-NMI feature brings some overhead to vmlinux size
>> and performance even if the config option is not set. To avoid it, wrap all
>> the related code in the CONFIG_ARM64_PSEUDO_NMI macro.
> This obviously attracted my attention, and I took this patch for a
> ride on 5.16-rc8 on a machine that doesn't support GICv3 NMIs to make
> sure that any extra code would only result in pure overhead.
>
> There was no measurable difference with this patch applied or not,
> with CONFIG_ARM64_PSEUDO_NMI selected or not for the workloads I tried
> (I/O heavy virtual machines, hackbench).
Our test is some kind of network test.
>
> Mark already asked a number of questions (test case, implementation,
> test on a modern kernel). Please provide as much detail as you
> possibly can, because such a regression really isn't expected, and
> doesn't show up on the systems I have at hand. Some profiling numbers
> could also be interesting, in case this is a result of a particular
> resource being thrashed (TLB, cache...).

I replied to Mark a few moments ago and provided as many details as I could.

You mentioned that the TLB or cache could be thrashed. How can we check this?
With the perf tools?

>
> Thanks,
>
> 	M.
>
