xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: Andre Przywara <andre.przywara@arm.com>
To: Julien Grall <julien.grall@arm.com>
Cc: xen-devel@lists.xenproject.org,
	Stefano Stabellini <sstabellini@kernel.org>,
	Andrii Anisov <andrii_anisov@epam.com>,
	Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>,
	Andrii Anisov <andrii.anisov@gmail.com>
Subject: Re: [Xen-devel] [RFC 5/6] arm64: call enter_hypervisor_head only when it is needed
Date: Wed, 31 Jul 2019 12:33:14 +0100	[thread overview]
Message-ID: <20190731123314.1216dd07@donnerap.cambridge.arm.com> (raw)
In-Reply-To: <8c7bc6d1-3482-ec5b-b3d9-c6562caf5711@arm.com>

On Wed, 31 Jul 2019 12:02:20 +0100
Julien Grall <julien.grall@arm.com> wrote:

Hi,

> On 30/07/2019 18:35, Andrii Anisov wrote:
> > 
> > On 26.07.19 13:59, Julien Grall wrote:  
> >> Hi,
> >>
> >> On 26/07/2019 11:37, Andrii Anisov wrote:  
> >>> From: Andrii Anisov <andrii_anisov@epam.com>
> >>>
> >>> On ARM64 we know exactly if trap happened from hypervisor or guest, so
> >>> we do not need to take that decision. This reduces a condition for
> >>> all enter_hypervisor_head calls and the function call for traps from
> >>> the hypervisor mode.  
> >>
> >> One condition lost but ...  
> > 
> > ...In the hot path (actually at any trap).  
> 
> Everything is in the hot path here, yet there are a lot of other branches. So 
> why this branch in particular?
> 
> As I have mentioned a few times before, there are a difference between the 
> theory and the practice. In theory, removing a branch looks nice. But in 
> practice this may not be the case.
> 
> In this particular case, I don't believe this is going to have a real impact on 
> the performance.
> 
> The PSTATE has been saved a few instructions before in cpu_user_regs, so there
> are high chance the value will still be in the L1 cache.

I agree on this, and second the idea of *not* micro-optimising code just for the sake of it. If you have numbers that back this up, it would be a different story.

> The compiler may also decide to do the direct branch when not in guest_mode. A 
> trap from the hypervisor is mostly for interrupts. So there are chance this is 
> not going to have a real impact on the overall of the interrupt handling.
> 
> If you are really worry of the impact of branch then there are a few more 
> important places (with a greater benefits) to look:
>      1) It seems the compiler use a jump table for the switch case in 
> do_trap_guest_sync(), so it will result to multiple direct branch everytime.

I don't think it's worth to "fix" this issue. The compiler has done this for a reason, and I would guess it figured that this is cheaper than other ways of solving this. If you are really paranoid about this, I would try to compile this with tuning for your particular core (-mtune), so that the compiler can throw in more micro-architectural knowledge about the cost of certain instructions.

>      2) Indirect branch have a non-negligible cost compare to direct branch. 
> This is a lot used in the interrupt code (see gic_hw_ops->read_irq()). All of 
> them are known at boot time, so they could be replace with direct branch. x86 
> recently introduced alternative_call() for this purpose. This could be re-used 
> on Arm.

This is indeed something I was always worried about: It looks cheap and elegant in the C source code, but is potentially expensive on hardware. The particular snippet is:
...
  249024:       d5033fdf        isb
  249028:       f9401e80        ldr     x0, [x20, #56]
  24902c:       f9407801        ldr     x1, [x0, #240]
  249030:       2a1303e0        mov     w0, w19
  249034:       d63f0020        blr     x1
...
In case of an interrupt, the first load will probably miss the cache, and the CPU is stuck now, because due to the dependencies it can't do much else. It can't even predict the branch and speculatively execute anything, because the destination address is yet another dependent load away.
This might not matter for little cores like A53s, because they wouldn't speculate anyway. But better cores (A72, for instance) would most likely benefit from an optimisation in this area.

Cheers,
Andre.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

  reply	other threads:[~2019-07-31 11:33 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-26 10:37 [Xen-devel] [RFC 0/6] XEN scheduling hardening Andrii Anisov
2019-07-26 10:37 ` [Xen-devel] [RFC 1/6] xen/arm: Re-enable interrupt later in the trap path Andrii Anisov
2019-07-26 10:48   ` Julien Grall
2019-07-30 17:35     ` Andrii Anisov
2019-07-30 20:10       ` Julien Grall
2019-08-01  6:45         ` Andrii Anisov
2019-08-01  9:37           ` Julien Grall
2019-08-02  8:28             ` Andrii Anisov
2019-08-02  9:03               ` Julien Grall
2019-08-02 12:24                 ` Andrii Anisov
2019-08-02 13:22                   ` Julien Grall
2019-08-01 11:19           ` Dario Faggioli
2019-08-02  7:50             ` Andrii Anisov
2019-08-02  9:15               ` Julien Grall
2019-08-02 13:07                 ` Andrii Anisov
2019-08-02 13:49                   ` Julien Grall
2019-08-03  1:39                     ` Dario Faggioli
2019-08-03  0:55                   ` Dario Faggioli
2019-08-06 13:09                     ` Andrii Anisov
2019-08-08 14:07                       ` Andrii Anisov
2019-08-13 14:45                         ` Dario Faggioli
2019-08-15 18:25                           ` Andrii Anisov
2019-07-26 10:37 ` [Xen-devel] [RFC 2/6] schedule: account true system idle time Andrii Anisov
2019-07-26 12:00   ` Dario Faggioli
2019-07-26 12:42     ` Andrii Anisov
2019-07-29 11:40       ` Dario Faggioli
2019-08-01  8:23         ` Andrii Anisov
2019-07-26 10:37 ` [Xen-devel] [RFC 3/6] sysctl: extend XEN_SYSCTL_getcpuinfo interface Andrii Anisov
2019-07-26 12:15   ` Dario Faggioli
2019-07-26 13:06     ` Andrii Anisov
2019-07-26 10:37 ` [Xen-devel] [RFC 4/6] xentop: show CPU load information Andrii Anisov
2019-07-26 10:37 ` [Xen-devel] [RFC 5/6] arm64: сall enter_hypervisor_head only when it is needed Andrii Anisov
2019-07-26 10:44   ` Andrii Anisov
2019-07-26 10:37 ` [Xen-devel] [RFC 5/6] arm64: call " Andrii Anisov
2019-07-26 10:59   ` Julien Grall
2019-07-30 17:35     ` Andrii Anisov
2019-07-31 11:02       ` Julien Grall
2019-07-31 11:33         ` Andre Przywara [this message]
2019-08-01  7:33         ` Andrii Anisov
2019-08-01 10:17           ` Julien Grall
2019-08-02 13:50             ` Andrii Anisov
2019-07-26 10:37 ` [Xen-devel] [RFC 6/6] schedule: account all the hypervisor time to the idle vcpu Andrii Anisov
2019-07-26 11:56 ` [Xen-devel] [RFC 0/6] XEN scheduling hardening Dario Faggioli
2019-07-26 12:14   ` Juergen Gross
2019-07-29 11:53     ` Dario Faggioli
2019-07-29 12:13       ` Juergen Gross
2019-07-29 14:47     ` Andrii Anisov
2019-07-29 18:46       ` Dario Faggioli
2019-07-29 14:28   ` Andrii Anisov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190731123314.1216dd07@donnerap.cambridge.arm.com \
    --to=andre.przywara@arm.com \
    --cc=Volodymyr_Babchuk@epam.com \
    --cc=andrii.anisov@gmail.com \
    --cc=andrii_anisov@epam.com \
    --cc=julien.grall@arm.com \
    --cc=sstabellini@kernel.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).