Re: [PATCH] arm/hvf: Optimize and simplify WFI handling

From: Peter Collingbourne <pcc@google.com>
To: Alexander Graf <agraf@csgraf.de>
Cc: Frank Yang <lfy@google.com>,
	Roman Bolshakov <r.bolshakov@yadro.com>,
	 Peter Maydell <peter.maydell@linaro.org>,
	Eduardo Habkost <ehabkost@redhat.com>,
	Richard Henderson <richard.henderson@linaro.org>,
	qemu-devel <qemu-devel@nongnu.org>,
	 Cameron Esfahani <dirty@apple.com>,
	qemu-arm@nongnu.org, Claudio Fontana <cfontana@suse.de>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH] arm/hvf: Optimize and simplify WFI handling
Date: Tue, 1 Dec 2020 16:52:18 -0800	[thread overview]
Message-ID: <CAMn1gO48FQ1Mm+88yr7g_uOuEmQyN9AfxvwGbHj6Hn-kir8jug@mail.gmail.com> (raw)
In-Reply-To: <8cc9052b-da85-de93-9d54-d4d0730054ec@csgraf.de>

On Tue, Dec 1, 2020 at 2:09 PM Alexander Graf <agraf@csgraf.de> wrote:
>
>
> On 01.12.20 21:03, Peter Collingbourne wrote:
> > On Tue, Dec 1, 2020 at 8:26 AM Alexander Graf <agraf@csgraf.de> wrote:
> >>
> >> On 01.12.20 09:21, Peter Collingbourne wrote:
> >>> Sleep on WFx until the VTIMER is due but allow ourselves to be woken
> >>> up on IPI.
> >>>
> >>> Signed-off-by: Peter Collingbourne <pcc@google.com>
> >>> ---
> >>> Alexander Graf wrote:
> >>>> I would love to take a patch from you here :). I'll still be stuck for a
> >>>> while with the sysreg sync rework that Peter asked for before I can look
> >>>> at WFI again.
> >>> Okay, here's a patch :) It's a relatively straightforward adaptation
> >>> of what we have in our fork, which can now boot Android to GUI while
> >>> remaining at around 4% CPU when idle.
> >>>
> >>> I'm not set up to boot a full Linux distribution at the moment so I
> >>> tested it on upstream QEMU by running a recent mainline Linux kernel
> >>> with a rootfs containing an init program that just does sleep(5)
> >>> and verified that the qemu process remains at low CPU usage during
> >>> the sleep. This was on top of your v2 plus the last patch of your v1
> >>> since it doesn't look like you have a replacement for that logic yet.
> >>
> >> How about something like this instead?
> >>
> >>
> >> Alex
> >>
> >>
> >> diff --git a/accel/hvf/hvf-cpus.c b/accel/hvf/hvf-cpus.c
> >> index 4360f64671..50384013ea 100644
> >> --- a/accel/hvf/hvf-cpus.c
> >> +++ b/accel/hvf/hvf-cpus.c
> >> @@ -337,16 +337,18 @@ static int hvf_init_vcpu(CPUState *cpu)
> >>        cpu->hvf = g_malloc0(sizeof(*cpu->hvf));
> >>
> >>        /* init cpu signals */
> >> -    sigset_t set;
> >>        struct sigaction sigact;
> >>
> >>        memset(&sigact, 0, sizeof(sigact));
> >>        sigact.sa_handler = dummy_signal;
> >>        sigaction(SIG_IPI, &sigact, NULL);
> >>
> >> -    pthread_sigmask(SIG_BLOCK, NULL, &set);
> >> -    sigdelset(&set, SIG_IPI);
> >> -    pthread_sigmask(SIG_SETMASK, &set, NULL);
> >> +    pthread_sigmask(SIG_BLOCK, NULL, &cpu->hvf->sigmask);
> >> +    sigdelset(&cpu->hvf->sigmask, SIG_IPI);
> >> +    pthread_sigmask(SIG_SETMASK, &cpu->hvf->sigmask, NULL);
> >> +
> >> +    pthread_sigmask(SIG_BLOCK, NULL, &cpu->hvf->sigmask_ipi);
> >> +    sigaddset(&cpu->hvf->sigmask_ipi, SIG_IPI);
> > There's no reason to unblock SIG_IPI while not in pselect and it can
> > easily lead to missed wakeups. The whole point of pselect is so that
> > you can guarantee that only one part of your program sees signals
> > without a possibility of them being missed.
>
>
> Hm, I think I start to agree with you here :). We can probably just
> leave SIG_IPI masked at all times and only unmask on pselect. The worst
> thing that will happen is a premature wakeup if we did get an IPI
> incoming while hvf->sleeping is set, but were either not running
> pselect() yet and bailed out or already finished pselect() execution.

Ack.

> >
> >>    #ifdef __aarch64__
> >>        r = hv_vcpu_create(&cpu->hvf->fd, (hv_vcpu_exit_t
> >> **)&cpu->hvf->exit, NULL);
> >> diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
> >> index c56baa3ae8..6e237f2db0 100644
> >> --- a/include/sysemu/hvf_int.h
> >> +++ b/include/sysemu/hvf_int.h
> >> @@ -62,8 +62,9 @@ extern HVFState *hvf_state;
> >>    struct hvf_vcpu_state {
> >>        uint64_t fd;
> >>        void *exit;
> >> -    struct timespec ts;
> >>        bool sleeping;
> >> +    sigset_t sigmask;
> >> +    sigset_t sigmask_ipi;
> >>    };
> >>
> >>    void assert_hvf_ok(hv_return_t ret);
> >> diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c
> >> index 0c01a03725..350b845e6e 100644
> >> --- a/target/arm/hvf/hvf.c
> >> +++ b/target/arm/hvf/hvf.c
> >> @@ -320,20 +320,24 @@ int hvf_arch_init_vcpu(CPUState *cpu)
> >>
> >>    void hvf_kick_vcpu_thread(CPUState *cpu)
> >>    {
> >> -    if (cpu->hvf->sleeping) {
> >> -        /*
> >> -         * When sleeping, make sure we always send signals. Also, clear the
> >> -         * timespec, so that an IPI that arrives between setting
> >> hvf->sleeping
> >> -         * and the nanosleep syscall still aborts the sleep.
> >> -         */
> >> -        cpu->thread_kicked = false;
> >> -        cpu->hvf->ts = (struct timespec){ };
> >> +    if (qatomic_read(&cpu->hvf->sleeping)) {
> >> +        /* When sleeping, send a signal to get out of pselect */
> >>            cpus_kick_thread(cpu);
> >>        } else {
> >>            hv_vcpus_exit(&cpu->hvf->fd, 1);
> >>        }
> >>    }
> >>
> >> +static void hvf_block_sig_ipi(CPUState *cpu)
> >> +{
> >> +    pthread_sigmask(SIG_SETMASK, &cpu->hvf->sigmask_ipi, NULL);
> >> +}
> >> +
> >> +static void hvf_unblock_sig_ipi(CPUState *cpu)
> >> +{
> >> +    pthread_sigmask(SIG_SETMASK, &cpu->hvf->sigmask, NULL);
> >> +}
> >> +
> >>    static int hvf_inject_interrupts(CPUState *cpu)
> >>    {
> >>        if (cpu->interrupt_request & CPU_INTERRUPT_FIQ) {
> >> @@ -354,6 +358,7 @@ int hvf_vcpu_exec(CPUState *cpu)
> >>        ARMCPU *arm_cpu = ARM_CPU(cpu);
> >>        CPUARMState *env = &arm_cpu->env;
> >>        hv_vcpu_exit_t *hvf_exit = cpu->hvf->exit;
> >> +    const uint32_t irq_mask = CPU_INTERRUPT_HARD | CPU_INTERRUPT_FIQ;
> >>        hv_return_t r;
> >>        int ret = 0;
> >>
> >> @@ -491,8 +496,8 @@ int hvf_vcpu_exec(CPUState *cpu)
> >>                break;
> >>            }
> >>            case EC_WFX_TRAP:
> >> -            if (!(syndrome & WFX_IS_WFE) && !(cpu->interrupt_request &
> >> -                (CPU_INTERRUPT_HARD | CPU_INTERRUPT_FIQ))) {
> >> +            if (!(syndrome & WFX_IS_WFE) &&
> >> +                !(cpu->interrupt_request & irq_mask)) {
> >>                    uint64_t cval, ctl, val, diff, now;
> > I don't think the access to cpu->interrupt_request is safe because it
> > is done while not under the iothread lock. That's why to avoid these
> > types of issues I would prefer to hold the lock almost all of the
> > time.
>
>
> In this branch, that's not a problem yet. On stale values, we either
> don't sleep (which is ok), or we go into the sleep path, and reevaluate
> cpu->interrupt_request atomically again after setting hvf->sleeping.

Okay, this may be a "benign race" (and it may be helped a little by
the M1's sequential consistency extension) but this is the sort of
thing that I'd prefer not to rely on. At least it should be an atomic
read.

> >
> >>                    /* Set up a local timer for vtimer if necessary ... */
> >> @@ -515,9 +520,7 @@ int hvf_vcpu_exec(CPUState *cpu)
> >>
> >>                    if (diff < INT64_MAX) {
> >>                        uint64_t ns = diff * gt_cntfrq_period_ns(arm_cpu);
> >> -                    struct timespec *ts = &cpu->hvf->ts;
> >> -
> >> -                    *ts = (struct timespec){
> >> +                    struct timespec ts = {
> >>                            .tv_sec = ns / NANOSECONDS_PER_SECOND,
> >>                            .tv_nsec = ns % NANOSECONDS_PER_SECOND,
> >>                        };
> >> @@ -526,27 +529,31 @@ int hvf_vcpu_exec(CPUState *cpu)
> >>                         * Waking up easily takes 1ms, don't go to sleep
> >> for smaller
> >>                         * time periods than 2ms.
> >>                         */
> >> -                    if (!ts->tv_sec && (ts->tv_nsec < (SCALE_MS * 2))) {
> >> +                    if (!ts.tv_sec && (ts.tv_nsec < (SCALE_MS * 2))) {
> >>                            advance_pc = true;
> >>                            break;
> >>                        }
> >>
> >> +                    /* block SIG_IPI for the sleep */
> >> +                    hvf_block_sig_ipi(cpu);
> >> +                    cpu->thread_kicked = false;
> >> +
> >>                        /* Set cpu->hvf->sleeping so that we get a SIG_IPI
> >> signal. */
> >> -                    cpu->hvf->sleeping = true;
> >> -                    smp_mb();
> >> +                    qatomic_set(&cpu->hvf->sleeping, true);
> > This doesn't protect against races because another thread could call
> > kvf_vcpu_kick_thread() at any time between when we return from
> > hv_vcpu_run() and when we set sleeping = true and we would miss the
> > wakeup (due to kvf_vcpu_kick_thread() seeing sleeping = false and
> > calling hv_vcpus_exit() instead of pthread_kill()). I don't think it
> > can be fixed by setting sleeping to true earlier either because no
> > matter how early you move it, there will always be a window where we
> > are going to pselect() but sleeping is false, resulting in a missed
> > wakeup.
>
>
> I don't follow. If anyone was sending us an IPI, it's because they want
> to notify us about an update to cpu->interrupt_request, right? In that
> case, the atomic read of that field below will catch it and bail out of
> the sleep sequence.

I think there are other possible IPI reasons, e.g. set halted to 1,
I/O events. Now we could check for halted below and maybe some of the
others but the code will be subtle and it seems like a game of
whack-a-mole to get them all. This is an example of what I was talking
about when I said that an approach that relies on the sleeping field
will be difficult to get right. I would strongly prefer to start with
a simple approach and maybe we can consider a more complicated one
later.

Peter

>
>
> >
> > Peter
> >
> >> -                    /* Bail out if we received an IRQ meanwhile */
> >> -                    if (cpu->thread_kicked || (cpu->interrupt_request &
> >> -                        (CPU_INTERRUPT_HARD | CPU_INTERRUPT_FIQ))) {
> >> -                        cpu->hvf->sleeping = false;
> >> +                    /* Bail out if we received a kick meanwhile */
> >> +                    if (qatomic_read(&cpu->interrupt_request) & irq_mask) {
> >> + qatomic_set(&cpu->hvf->sleeping, false);
>
>
> ^^^
>
>
> Alex
>
>
> >> +                        hvf_unblock_sig_ipi(cpu);
> >>                            break;
> >>                        }
> >>
> >> -                    /* nanosleep returns on signal, so we wake up on
> >> kick. */
> >> -                    nanosleep(ts, NULL);
> >> +                    /* pselect returns on kick signal and consumes it */
> >> +                    pselect(0, 0, 0, 0, &ts, &cpu->hvf->sigmask);
> >>
> >>                        /* Out of sleep - either naturally or because of a
> >> kick */
> >> -                    cpu->hvf->sleeping = false;
> >> +                    qatomic_set(&cpu->hvf->sleeping, false);
> >> +                    hvf_unblock_sig_ipi(cpu);
> >>                    }
> >>
> >>                    advance_pc = true;
> >>