All of lore.kernel.org
 help / color / mirror / Atom feed
From: He Ying <heying24@huawei.com>
To: Mark Rutland <mark.rutland@arm.com>
Cc: <catalin.marinas@arm.com>, <dvyukov@google.com>,
	<elver@google.com>, <james.morse@arm.com>,
	<linux-arm-kernel@lists.infradead.org>, <paulmck@kernel.org>,
	<peterz@infradead.org>, <will@kernel.org>
Subject: Re: [PATCHv2 06/11] arm64: entry: move el1 irq/nmi logic to C
Date: Fri, 7 May 2021 11:25:31 +0800	[thread overview]
Message-ID: <2fbcbef1-523d-da73-9426-205dbfb1877b@huawei.com> (raw)
In-Reply-To: <20210506105832.GB32366@C02TD0UTHF1T.local>


在 2021/5/6 18:58, Mark Rutland 写道:
> On Thu, May 06, 2021 at 06:25:40PM +0800, He Ying wrote:
>> 在 2021/5/6 17:16, Mark Rutland 写道:
>>> On Thu, May 06, 2021 at 04:28:09PM +0800, He Ying wrote:
>>>> Hi Mark,
>>> Hi,
>>>
>>>> I have faced a performance regression for handling IPIs since this commit.
>>>>
>>>> I caculate the cycles from the entry of el1_irq to the entry of
>>>> gic_handle_irq.
>>>>
>>>>   From my test, this commit may overhead an average of 200 cycles. Do you
>>>>
>>>> have any ideas about this? Looking forward to your reply.
>>> On that path, the only meaningfull difference is the call to
>>> enter_el1_irq_or_nmi(), since that's now unconditional, and it's an
>>> extra layer in the callchain.
>>>
>>> When either CONFIG_ARM64_PSEUDO_NMI or CONFIG_TRACE_IRQFLAGS are
>>> selected, enter_el1_irq_or_nmi() is a wrapper for functions we'd already
>>> call, and I'd expectthe cost of the callees to dominate.
>>>
>>> When neither CONFIG_ARM64_PSEUDO_NMI nor CONFIG_TRACE_IRQFLAGS are
>>> selected, this should add a trivial function that immediately returns,
>>> and so 200 cycles seems excessive.
>>>
>>> Building that commit with defconfig, I see that GCC 10.1.0 generates:
>>>
>>> | ffff800010dfc864 <enter_el1_irq_or_nmi>:
>>> | ffff800010dfc864:       d503233f        paciasp
>>> | ffff800010dfc868:       d50323bf        autiasp
>>> | ffff800010dfc86c:       d65f03c0        ret
>> CONFIG_ARM64_PSEUDO_NMI is not set in my test. And I generate a different
>> object
>>
>> from yours:
>>
>> 00000000000002b8 <enter_el1_irq_or_nmi>:
>>
>>   2b8:        d503233f         paciasp
>>   2bc:        a9bf7bfd          stp  x29, x30, [sp, #-16]!
>>   2c0:        91052000         add x0, x0, #0x148
>>   2c4:        910003fd          mov x29, sp
>>   2c8:        97ffff57             bl 24 <enter_from_kernel_mode.isra.6>
>>   2cc:        a8c17bfd           ldp x29, x30, [sp], #16
>>   2d0:       d50323bf           autiasp
>>   2d4:       d65f03c0            ret
> Which commit are you testing with?
>
> The call to enter_from_kernel_mode() was introduced later in commit:
>
>    7cd1ea1010acbede ("rm64: entry: fix non-NMI kernel<->kernel transitions")
>
> ... and doesn't exist in commit:
>
>    105fc3352077bba5 ("arm64: entry: move el1 irq/nmi logic to C")
>
> Do you see the 200 cycle penalty with 105fc3352077bba5 alone? ... or
> only only after the whole series is applied?
Sorry I didn't point it out. The truth is after the whole series is applied.
>
> If enter_from_kernel_mode() is what's taking the bulk of the cycles,
> then this is likely unavoidable work that previously (erroneously)
> omitted.
Unavoided work? No, please...
>
>>> ... so perhaps the PACIASP and AUTIASP have an impact?
>> I'm not sure...
>>> I have a few questions:
>>>
>>> * Which CPU do you see this on?
>> Hisilicon hip05-d02.
>>> * Does that CPU implement pointer authentication?
>> I'm not sure. How to check?
> Does the dmesg contain "Address authentication" anywhere?

I don't find "Address authentication" in dmesg. But I find 
CONFIG_ARM64_PTR_AUTH is set to y in our config.

Does the config CONFIG_ARM64_PTR_AUTH impact the performance?

>
>>> * What kernel config are you using? e.g. is this seen with defconfig?
>> Our own. But CONFIG_ARM64_PSEUDO_NMI is not set.
>>
>> Should I provide it as an attachment?
> >From your attachment I see that TRACE_IRQFLAGS and LOCKDEP aren't
> selected either, so IIUC the only non-trivial bits in
> enter_from_kernel_mode() will be the RCU accounting.

 From my other tests, the following code contriutes most to the overhead.


static void noinstr enter_from_kernel_mode(struct pt_regs *regs)

{

   regs->exit_rcu = false;

   ...

}

>
>>> * What's the total cycle count from el1_irq to gic_handle_irq?
>> Applying the patchset:   249 cycles.
>>
>> Reverting the patchset:  77 cycles.
>>
>> Maybe 170 cycles is more correct.
>>
>>> * Does this measurably impact a real workload?
>> Have some impact to scheduling perf test.
> Does it affect a real workload? i.e. not a microbenchmark?
We just run some benchmarks. I'm not sure how it affects a real workload.
>
> Thanks,
> Mark.
> .

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2021-05-07  3:27 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-30 11:59 [PATCHv2 00/11] arm64: entry lockdep/rcu/tracing fixes Mark Rutland
2020-11-30 11:59 ` [PATCHv2 01/11] arm64: syscall: exit userspace before unmasking exceptions Mark Rutland
2020-11-30 11:59 ` [PATCHv2 02/11] arm64: mark idle code as noinstr Mark Rutland
2020-11-30 11:59 ` [PATCHv2 03/11] arm64: entry: mark entry " Mark Rutland
2020-11-30 11:59 ` [PATCHv2 04/11] arm64: entry: move enter_from_user_mode to entry-common.c Mark Rutland
2020-11-30 11:59 ` [PATCHv2 05/11] arm64: entry: prepare ret_to_user for function call Mark Rutland
2020-12-17 17:57   ` Guenter Roeck
2020-12-17 18:38     ` Catalin Marinas
2020-11-30 11:59 ` [PATCHv2 06/11] arm64: entry: move el1 irq/nmi logic to C Mark Rutland
2021-05-06  8:28   ` He Ying
2021-05-06  9:16     ` Mark Rutland
     [not found]       ` <e3843e03-173e-10a6-5b14-0d8c14219e09@huawei.com>
2021-05-06 10:58         ` Mark Rutland
2021-05-07  3:25           ` He Ying [this message]
2021-05-07  9:41             ` Mark Rutland
2021-05-07 10:02               ` He Ying
2020-11-30 11:59 ` [PATCHv2 07/11] arm64: entry: fix non-NMI user<->kernel transitions Mark Rutland
2020-11-30 11:59 ` [PATCHv2 08/11] arm64: ptrace: prepare for EL1 irq/rcu tracking Mark Rutland
2020-11-30 11:59 ` [PATCHv2 09/11] arm64: entry: fix non-NMI kernel<->kernel transitions Mark Rutland
2021-04-25  5:29   ` Zenghui Yu
2021-04-26  9:21     ` Mark Rutland
2021-04-26 13:39       ` Zenghui Yu
2021-04-27  7:15         ` Zenghui Yu
2021-04-27 14:43           ` Mark Rutland
2020-11-30 11:59 ` [PATCHv2 10/11] arm64: entry: fix NMI {user, kernel}->kernel transitions Mark Rutland
2020-11-30 11:59 ` [PATCHv2 11/11] arm64: entry: fix EL1 debug transitions Mark Rutland
2020-11-30 13:52 ` [PATCHv2 00/11] arm64: entry lockdep/rcu/tracing fixes Will Deacon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2fbcbef1-523d-da73-9426-205dbfb1877b@huawei.com \
    --to=heying24@huawei.com \
    --cc=catalin.marinas@arm.com \
    --cc=dvyukov@google.com \
    --cc=elver@google.com \
    --cc=james.morse@arm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=mark.rutland@arm.com \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.