All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mark Rutland <mark.rutland@arm.com>
To: He Ying <heying24@huawei.com>
Cc: catalin.marinas@arm.com, dvyukov@google.com, elver@google.com,
	james.morse@arm.com, linux-arm-kernel@lists.infradead.org,
	paulmck@kernel.org, peterz@infradead.org, will@kernel.org
Subject: Re: [PATCHv2 06/11] arm64: entry: move el1 irq/nmi logic to C
Date: Fri, 7 May 2021 10:41:13 +0100	[thread overview]
Message-ID: <20210507094113.GA52849@C02TD0UTHF1T.local> (raw)
In-Reply-To: <2fbcbef1-523d-da73-9426-205dbfb1877b@huawei.com>

On Fri, May 07, 2021 at 11:25:31AM +0800, He Ying wrote:
> 在 2021/5/6 18:58, Mark Rutland 写道:
> > On Thu, May 06, 2021 at 06:25:40PM +0800, He Ying wrote:
> > > 在 2021/5/6 17:16, Mark Rutland 写道:
> > > > On Thu, May 06, 2021 at 04:28:09PM +0800, He Ying wrote:
> > > > > Hi Mark,
> > > > Hi,
> > > > 
> > > > > I have faced a performance regression for handling IPIs since this commit.
> > > > > 
> > > > > I caculate the cycles from the entry of el1_irq to the entry of
> > > > > gic_handle_irq.
> > > > > 
> > > > >   From my test, this commit may overhead an average of 200 cycles. Do you
> > > > > 
> > > > > have any ideas about this? Looking forward to your reply.
> > > > On that path, the only meaningfull difference is the call to
> > > > enter_el1_irq_or_nmi(), since that's now unconditional, and it's an
> > > > extra layer in the callchain.
> > > > 
> > > > When either CONFIG_ARM64_PSEUDO_NMI or CONFIG_TRACE_IRQFLAGS are
> > > > selected, enter_el1_irq_or_nmi() is a wrapper for functions we'd already
> > > > call, and I'd expectthe cost of the callees to dominate.
> > > > 
> > > > When neither CONFIG_ARM64_PSEUDO_NMI nor CONFIG_TRACE_IRQFLAGS are
> > > > selected, this should add a trivial function that immediately returns,
> > > > and so 200 cycles seems excessive.
> > > > 
> > > > Building that commit with defconfig, I see that GCC 10.1.0 generates:
> > > > 
> > > > | ffff800010dfc864 <enter_el1_irq_or_nmi>:
> > > > | ffff800010dfc864:       d503233f        paciasp
> > > > | ffff800010dfc868:       d50323bf        autiasp
> > > > | ffff800010dfc86c:       d65f03c0        ret
> > > CONFIG_ARM64_PSEUDO_NMI is not set in my test. And I generate a different
> > > object
> > > 
> > > from yours:
> > > 
> > > 00000000000002b8 <enter_el1_irq_or_nmi>:
> > > 
> > >   2b8:        d503233f         paciasp
> > >   2bc:        a9bf7bfd          stp  x29, x30, [sp, #-16]!
> > >   2c0:        91052000         add x0, x0, #0x148
> > >   2c4:        910003fd          mov x29, sp
> > >   2c8:        97ffff57             bl 24 <enter_from_kernel_mode.isra.6>
> > >   2cc:        a8c17bfd           ldp x29, x30, [sp], #16
> > >   2d0:       d50323bf           autiasp
> > >   2d4:       d65f03c0            ret
> > Which commit are you testing with?
> > 
> > The call to enter_from_kernel_mode() was introduced later in commit:
> > 
> >    7cd1ea1010acbede ("rm64: entry: fix non-NMI kernel<->kernel transitions")
> > 
> > ... and doesn't exist in commit:
> > 
> >    105fc3352077bba5 ("arm64: entry: move el1 irq/nmi logic to C")
> > 
> > Do you see the 200 cycle penalty with 105fc3352077bba5 alone? ... or
> > only only after the whole series is applied?
> Sorry I didn't point it out. The truth is after the whole series is applied.

Ok. In future it would be very helpful to be more precise, as otherwise
people can end up wasting time investigating with the wrong information.

What you initially said:

| I have faced a performance regression for handling IPIs since this
| commit.

... is somewhat misleading.

> > If enter_from_kernel_mode() is what's taking the bulk of the cycles,
> > then this is likely unavoidable work that previously (erroneously)
> > omitted.
> Unavoided work? No, please...
> > 
> > > > ... so perhaps the PACIASP and AUTIASP have an impact?
> > > I'm not sure...
> > > > I have a few questions:
> > > > 
> > > > * Which CPU do you see this on?
> > > Hisilicon hip05-d02.
> > > > * Does that CPU implement pointer authentication?
> > > I'm not sure. How to check?
> > Does the dmesg contain "Address authentication" anywhere?
> 
> I don't find "Address authentication" in dmesg. But I find
> CONFIG_ARM64_PTR_AUTH is set to y in our config.
> 
> Does the config CONFIG_ARM64_PTR_AUTH impact the performance?

If your HW implements pointer authentication, then there will be some
(small) impact. If your HW does not, then the cost should just be a few
NOPs, and is not expeected to be measureable.

> > > > * What kernel config are you using? e.g. is this seen with defconfig?
> > > Our own. But CONFIG_ARM64_PSEUDO_NMI is not set.
> > > 
> > > Should I provide it as an attachment?
> > >From your attachment I see that TRACE_IRQFLAGS and LOCKDEP aren't
> > selected either, so IIUC the only non-trivial bits in
> > enter_from_kernel_mode() will be the RCU accounting.
> 
> From my other tests, the following code contriutes most to the overhead.
> 
> 
> static void noinstr enter_from_kernel_mode(struct pt_regs *regs)
> 
> {
> 
>   regs->exit_rcu = false;
> 
>   ...
> 
> }

The logic manipulting regs->exit_rcu and calling rcu_irq_enter() is
necessary to correctly handle taking interrupts (or other exceptions)
from idle sequences. Without this, RCU isn't guaranteed to be watching,
and is unsafe to use.

So this isn't something that can be easily removed.

> > > > * What's the total cycle count from el1_irq to gic_handle_irq?
> > > Applying the patchset:   249 cycles.
> > > 
> > > Reverting the patchset:  77 cycles.
> > > 
> > > Maybe 170 cycles is more correct.
> > > 
> > > > * Does this measurably impact a real workload?
> > > Have some impact to scheduling perf test.
> > Does it affect a real workload? i.e. not a microbenchmark?
> We just run some benchmarks. I'm not sure how it affects a real workload.

I appreciate that you can measure this with a microbenchmark, but unless
this affects a real workload in a measureable way I don't think that we
should make any changes here.

Thanks
Mark.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2021-05-07  9:43 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-30 11:59 [PATCHv2 00/11] arm64: entry lockdep/rcu/tracing fixes Mark Rutland
2020-11-30 11:59 ` [PATCHv2 01/11] arm64: syscall: exit userspace before unmasking exceptions Mark Rutland
2020-11-30 11:59 ` [PATCHv2 02/11] arm64: mark idle code as noinstr Mark Rutland
2020-11-30 11:59 ` [PATCHv2 03/11] arm64: entry: mark entry " Mark Rutland
2020-11-30 11:59 ` [PATCHv2 04/11] arm64: entry: move enter_from_user_mode to entry-common.c Mark Rutland
2020-11-30 11:59 ` [PATCHv2 05/11] arm64: entry: prepare ret_to_user for function call Mark Rutland
2020-12-17 17:57   ` Guenter Roeck
2020-12-17 18:38     ` Catalin Marinas
2020-11-30 11:59 ` [PATCHv2 06/11] arm64: entry: move el1 irq/nmi logic to C Mark Rutland
2021-05-06  8:28   ` He Ying
2021-05-06  9:16     ` Mark Rutland
     [not found]       ` <e3843e03-173e-10a6-5b14-0d8c14219e09@huawei.com>
2021-05-06 10:58         ` Mark Rutland
2021-05-07  3:25           ` He Ying
2021-05-07  9:41             ` Mark Rutland [this message]
2021-05-07 10:02               ` He Ying
2020-11-30 11:59 ` [PATCHv2 07/11] arm64: entry: fix non-NMI user<->kernel transitions Mark Rutland
2020-11-30 11:59 ` [PATCHv2 08/11] arm64: ptrace: prepare for EL1 irq/rcu tracking Mark Rutland
2020-11-30 11:59 ` [PATCHv2 09/11] arm64: entry: fix non-NMI kernel<->kernel transitions Mark Rutland
2021-04-25  5:29   ` Zenghui Yu
2021-04-26  9:21     ` Mark Rutland
2021-04-26 13:39       ` Zenghui Yu
2021-04-27  7:15         ` Zenghui Yu
2021-04-27 14:43           ` Mark Rutland
2020-11-30 11:59 ` [PATCHv2 10/11] arm64: entry: fix NMI {user, kernel}->kernel transitions Mark Rutland
2020-11-30 11:59 ` [PATCHv2 11/11] arm64: entry: fix EL1 debug transitions Mark Rutland
2020-11-30 13:52 ` [PATCHv2 00/11] arm64: entry lockdep/rcu/tracing fixes Will Deacon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210507094113.GA52849@C02TD0UTHF1T.local \
    --to=mark.rutland@arm.com \
    --cc=catalin.marinas@arm.com \
    --cc=dvyukov@google.com \
    --cc=elver@google.com \
    --cc=heying24@huawei.com \
    --cc=james.morse@arm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.