All of lore.kernel.org
 help / color / mirror / Atom feed
From: Richard Henderson <richard.henderson@linaro.org>
To: Owen Anderson <oanderso@google.com>, qemu-devel@nongnu.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [RFC] Don't lookup full CPU state in the indirect branch fast path on AArch64 when running in user mode.
Date: Mon, 19 Oct 2020 11:22:52 -0700	[thread overview]
Message-ID: <7c416109-6431-076e-2284-0f8406931ab9@linaro.org> (raw)
In-Reply-To: <CAKs3XfJPrMGxdAouGCje3ppnWKA6fXpNaywQ5rj9u45VkPuUXg@mail.gmail.com>

On 9/29/20 2:32 PM, Owen Anderson wrote:
> Hello,
> 
> I would like to request feedback on the following patch, which I do
> not believe should be applied to master as-is.  The idea here is to
> avoid gathering the full CPU state in the fast path of an indirect
> branch lookup when running in user mode on a platform where the flags
> can only be changed in privileged mode.  I believe this is true on the
> AArch64 scenario that I care about, but clearly not true in general.
> I'm particularly seeking feedback on how to clean this up into a
> version that checks the correct necessary and sufficient conditions to
> allow all users that can benefit from it to do so.
> 
> On the workload that I am targeting (aarch64 on x86), this patch
> reduces execution wall time by approximately 20%, and eliminates
> indirect branch lookups from the hot stack traces entirely.

(1) What qemu version are you looking at and,
(2) Do you have --enable-tcg-debug enabled?

Because you should not be seeing anything even close to 20% overhead.

In e979972a6a1 (included in qemu 4.2), the AArch64 path is

    uint32_t flags = env->hflags;
    *cs_base = 0;
    if (FIELD_EX32(flags, TBFLAG_ANY, AARCH64_STATE)) {
        *pc = env->pc;
        if (cpu_isar_feature(aa64_bti, env_archcpu(env))) {
            flags = FIELD_DP32(flags, TBFLAG_A64,
               BTYPE, env->btype);
        }
        pstate_for_ss = env->pstate;
    }
    if (FIELD_EX32(flags, TBFLAG_ANY, SS_ACTIVE) &&
        (pstate_for_ss & PSTATE_SS)) {
        flags = FIELD_DP32(flags, TBFLAG_ANY, PSTATE_SS, 1);
    }
    *pflags = flags;

With --enable-tcg-debug, there is an additional step wherein we validate that
env->hflags has the correct value.  Which has caught a number of bugs.

With a silly testcase like so:

    for (int x = 0; x < 10000000; ++x) {
        void *tmp;
        asm volatile("adr %0,1f; br %0; 1:" : "=r"(tmp));
    }

I see cpu_get_tb_cpu_state no higher than 10% of the total runtime.  Which, I
admit is higher than I expected, but still nothing like what you're reporting.
 And a "reasonable" test case should surely have a lower proportion of indirect
branches per dynamic instruction.


> +#if !defined(TARGET_ARM) || !defined(CONFIG_USER_ONLY)
>      cpu_get_tb_cpu_state(env, pc, cs_base, flags);
> +#else
> +    if (is_a64(env)) {
> +      *pc = env->pc;
> +    } else {
> +      *pc = env->regs[15];
> +    }
> +#endif
...
> +#if !defined(TARGET_ARM) || !defined(CONFIG_USER_ONLY)
>                 tb->cs_base == *cs_base &&
>                 tb->flags == *flags &&
> +#endif

This is assuming that all TB have the same flags, and thus the flags don't need
comparing.  Which is false, even for CONFIG_USER_ONLY.

I would guess that testing -cpu cortex-a57 does not use any of the bits that
might change, but post v8.2 will: SVE, BTI, MTE.  So, this change breaks stuff.


r~


  parent reply	other threads:[~2020-10-19 18:24 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-29 21:32 [RFC] Don't lookup full CPU state in the indirect branch fast path on AArch64 when running in user mode Owen Anderson
2020-10-12 20:52 ` Owen Anderson
2020-10-19 17:16   ` Owen Anderson
2020-10-19 18:22 ` Richard Henderson [this message]
2020-10-19 22:44   ` Owen Anderson
2020-10-19 22:58     ` Richard Henderson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7c416109-6431-076e-2284-0f8406931ab9@linaro.org \
    --to=richard.henderson@linaro.org \
    --cc=oanderso@google.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.