From: He Ying <heying24@huawei.com>
To: Mark Rutland <mark.rutland@arm.com>
Cc: <catalin.marinas@arm.com>, <dvyukov@google.com>,
<elver@google.com>, <james.morse@arm.com>,
<linux-arm-kernel@lists.infradead.org>, <paulmck@kernel.org>,
<peterz@infradead.org>, <will@kernel.org>
Subject: Re: [PATCHv2 06/11] arm64: entry: move el1 irq/nmi logic to C
Date: Fri, 7 May 2021 11:25:31 +0800 [thread overview]
Message-ID: <2fbcbef1-523d-da73-9426-205dbfb1877b@huawei.com> (raw)
In-Reply-To: <20210506105832.GB32366@C02TD0UTHF1T.local>
在 2021/5/6 18:58, Mark Rutland 写道:
> On Thu, May 06, 2021 at 06:25:40PM +0800, He Ying wrote:
>> 在 2021/5/6 17:16, Mark Rutland 写道:
>>> On Thu, May 06, 2021 at 04:28:09PM +0800, He Ying wrote:
>>>> Hi Mark,
>>> Hi,
>>>
>>>> I have faced a performance regression for handling IPIs since this commit.
>>>>
>>>> I caculate the cycles from the entry of el1_irq to the entry of
>>>> gic_handle_irq.
>>>>
>>>> From my test, this commit may overhead an average of 200 cycles. Do you
>>>>
>>>> have any ideas about this? Looking forward to your reply.
>>> On that path, the only meaningfull difference is the call to
>>> enter_el1_irq_or_nmi(), since that's now unconditional, and it's an
>>> extra layer in the callchain.
>>>
>>> When either CONFIG_ARM64_PSEUDO_NMI or CONFIG_TRACE_IRQFLAGS are
>>> selected, enter_el1_irq_or_nmi() is a wrapper for functions we'd already
>>> call, and I'd expectthe cost of the callees to dominate.
>>>
>>> When neither CONFIG_ARM64_PSEUDO_NMI nor CONFIG_TRACE_IRQFLAGS are
>>> selected, this should add a trivial function that immediately returns,
>>> and so 200 cycles seems excessive.
>>>
>>> Building that commit with defconfig, I see that GCC 10.1.0 generates:
>>>
>>> | ffff800010dfc864 <enter_el1_irq_or_nmi>:
>>> | ffff800010dfc864: d503233f paciasp
>>> | ffff800010dfc868: d50323bf autiasp
>>> | ffff800010dfc86c: d65f03c0 ret
>> CONFIG_ARM64_PSEUDO_NMI is not set in my test. And I generate a different
>> object
>>
>> from yours:
>>
>> 00000000000002b8 <enter_el1_irq_or_nmi>:
>>
>> 2b8: d503233f paciasp
>> 2bc: a9bf7bfd stp x29, x30, [sp, #-16]!
>> 2c0: 91052000 add x0, x0, #0x148
>> 2c4: 910003fd mov x29, sp
>> 2c8: 97ffff57 bl 24 <enter_from_kernel_mode.isra.6>
>> 2cc: a8c17bfd ldp x29, x30, [sp], #16
>> 2d0: d50323bf autiasp
>> 2d4: d65f03c0 ret
> Which commit are you testing with?
>
> The call to enter_from_kernel_mode() was introduced later in commit:
>
> 7cd1ea1010acbede ("rm64: entry: fix non-NMI kernel<->kernel transitions")
>
> ... and doesn't exist in commit:
>
> 105fc3352077bba5 ("arm64: entry: move el1 irq/nmi logic to C")
>
> Do you see the 200 cycle penalty with 105fc3352077bba5 alone? ... or
> only only after the whole series is applied?
Sorry I didn't point it out. The truth is after the whole series is applied.
>
> If enter_from_kernel_mode() is what's taking the bulk of the cycles,
> then this is likely unavoidable work that previously (erroneously)
> omitted.
Unavoided work? No, please...
>
>>> ... so perhaps the PACIASP and AUTIASP have an impact?
>> I'm not sure...
>>> I have a few questions:
>>>
>>> * Which CPU do you see this on?
>> Hisilicon hip05-d02.
>>> * Does that CPU implement pointer authentication?
>> I'm not sure. How to check?
> Does the dmesg contain "Address authentication" anywhere?
I don't find "Address authentication" in dmesg. But I find
CONFIG_ARM64_PTR_AUTH is set to y in our config.
Does the config CONFIG_ARM64_PTR_AUTH impact the performance?
>
>>> * What kernel config are you using? e.g. is this seen with defconfig?
>> Our own. But CONFIG_ARM64_PSEUDO_NMI is not set.
>>
>> Should I provide it as an attachment?
> >From your attachment I see that TRACE_IRQFLAGS and LOCKDEP aren't
> selected either, so IIUC the only non-trivial bits in
> enter_from_kernel_mode() will be the RCU accounting.
From my other tests, the following code contriutes most to the overhead.
static void noinstr enter_from_kernel_mode(struct pt_regs *regs)
{
regs->exit_rcu = false;
...
}
>
>>> * What's the total cycle count from el1_irq to gic_handle_irq?
>> Applying the patchset: 249 cycles.
>>
>> Reverting the patchset: 77 cycles.
>>
>> Maybe 170 cycles is more correct.
>>
>>> * Does this measurably impact a real workload?
>> Have some impact to scheduling perf test.
> Does it affect a real workload? i.e. not a microbenchmark?
We just run some benchmarks. I'm not sure how it affects a real workload.
>
> Thanks,
> Mark.
> .
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
next prev parent reply other threads:[~2021-05-07 3:27 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-11-30 11:59 [PATCHv2 00/11] arm64: entry lockdep/rcu/tracing fixes Mark Rutland
2020-11-30 11:59 ` [PATCHv2 01/11] arm64: syscall: exit userspace before unmasking exceptions Mark Rutland
2020-11-30 11:59 ` [PATCHv2 02/11] arm64: mark idle code as noinstr Mark Rutland
2020-11-30 11:59 ` [PATCHv2 03/11] arm64: entry: mark entry " Mark Rutland
2020-11-30 11:59 ` [PATCHv2 04/11] arm64: entry: move enter_from_user_mode to entry-common.c Mark Rutland
2020-11-30 11:59 ` [PATCHv2 05/11] arm64: entry: prepare ret_to_user for function call Mark Rutland
2020-12-17 17:57 ` Guenter Roeck
2020-12-17 18:38 ` Catalin Marinas
2020-11-30 11:59 ` [PATCHv2 06/11] arm64: entry: move el1 irq/nmi logic to C Mark Rutland
2021-05-06 8:28 ` He Ying
2021-05-06 9:16 ` Mark Rutland
[not found] ` <e3843e03-173e-10a6-5b14-0d8c14219e09@huawei.com>
2021-05-06 10:58 ` Mark Rutland
2021-05-07 3:25 ` He Ying [this message]
2021-05-07 9:41 ` Mark Rutland
2021-05-07 10:02 ` He Ying
2020-11-30 11:59 ` [PATCHv2 07/11] arm64: entry: fix non-NMI user<->kernel transitions Mark Rutland
2020-11-30 11:59 ` [PATCHv2 08/11] arm64: ptrace: prepare for EL1 irq/rcu tracking Mark Rutland
2020-11-30 11:59 ` [PATCHv2 09/11] arm64: entry: fix non-NMI kernel<->kernel transitions Mark Rutland
2021-04-25 5:29 ` Zenghui Yu
2021-04-26 9:21 ` Mark Rutland
2021-04-26 13:39 ` Zenghui Yu
2021-04-27 7:15 ` Zenghui Yu
2021-04-27 14:43 ` Mark Rutland
2020-11-30 11:59 ` [PATCHv2 10/11] arm64: entry: fix NMI {user, kernel}->kernel transitions Mark Rutland
2020-11-30 11:59 ` [PATCHv2 11/11] arm64: entry: fix EL1 debug transitions Mark Rutland
2020-11-30 13:52 ` [PATCHv2 00/11] arm64: entry lockdep/rcu/tracing fixes Will Deacon
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2fbcbef1-523d-da73-9426-205dbfb1877b@huawei.com \
--to=heying24@huawei.com \
--cc=catalin.marinas@arm.com \
--cc=dvyukov@google.com \
--cc=elver@google.com \
--cc=james.morse@arm.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=mark.rutland@arm.com \
--cc=paulmck@kernel.org \
--cc=peterz@infradead.org \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).