From: Marc Zyngier <maz@kernel.org>
To: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
Huacai Chen <chenhuacai@kernel.org>,
Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>,
Paul Mackerras <paulus@ozlabs.org>,
Christian Borntraeger <borntraeger@de.ibm.com>,
Janosch Frank <frankja@linux.ibm.com>,
James Morse <james.morse@arm.com>,
Alexandru Elisei <alexandru.elisei@arm.com>,
Suzuki K Poulose <suzuki.poulose@arm.com>,
David Hildenbrand <david@redhat.com>,
Cornelia Huck <cohuck@redhat.com>,
Claudio Imbrenda <imbrenda@linux.ibm.com>,
Vitaly Kuznetsov <vkuznets@redhat.com>,
Wanpeng Li <wanpengli@tencent.com>,
Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
linux-arm-kernel@lists.infradead.org,
kvmarm@lists.cs.columbia.edu, linux-mips@vger.kernel.org,
kvm@vger.kernel.org, kvm-ppc@vger.kernel.org,
linux-kernel@vger.kernel.org, David Matlack <dmatlack@google.com>,
Jing Zhang <jingzhangos@google.com>
Subject: Re: [PATCH 07/14] KVM: Don't block+unblock when halt-polling is successful
Date: Thu, 30 Sep 2021 10:36:19 +0100 [thread overview]
Message-ID: <87czoqtnfw.wl-maz@kernel.org> (raw)
In-Reply-To: <YVNA+KTbLrxGQ6ML@google.com>
On Tue, 28 Sep 2021 17:21:12 +0100,
Sean Christopherson <seanjc@google.com> wrote:
>
> On Tue, Sep 28, 2021, Marc Zyngier wrote:
> > On Mon, 27 Sep 2021 18:28:14 +0100,
> > Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > On Sun, Sep 26, 2021, Marc Zyngier wrote:
> > > > On Sun, 26 Sep 2021 07:27:28 +0100,
> > > > Paolo Bonzini <pbonzini@redhat.com> wrote:
> > > > >
> > > > > On 25/09/21 11:50, Marc Zyngier wrote:
> > > > > >> there is no need for arm64 to put/load
> > > > > >> the vGIC as KVM hasn't relinquished control of the vCPU in any way.
> > > > > >
> > > > > > This doesn't mean that there is no requirement for any state
> > > > > > change. The put/load on GICv4 is crucial for performance, and the VMCR
> > > > > > resync is a correctness requirement.
> > >
> > > Ah crud, I didn't blame that code beforehand, I simply assumed
> > > kvm_arch_vcpu_blocking() was purely for the blocking/schedule()
> > > sequence. The comment in arm64's kvm_arch_vcpu_blocking() about
> > > kvm_arch_vcpu_runnable() makes more sense now too.
> > >
> > > > > I wouldn't even say it's crucial for performance: halt polling cannot
> > > > > work and is a waste of time without (the current implementation of)
> > > > > put/load.
> > > >
> > > > Not quite. A non-V{LPI,SGI} could still be used as the a wake-up from
> > > > WFI (which is the only reason we end-up on this path). Only LPIs (and
> > > > SGIs on GICv4.1) can be directly injected, meaning that SPIs and PPIs
> > > > still follow the standard SW injection model.
> > > >
> > > > However, there is still the ICH_VMCR_EL2 requirement (to get the
> > > > up-to-date priority mask and group enable bits) for SW-injected
> > > > interrupt wake-up to work correctly, and I really don't want to save
> > > > that one eagerly on each shallow exit.
> > >
> > > IIUC, VMCR is resident in hardware while the guest is running, and
> > > KVM needs to retrieve the VMCR when processing interrupts to
> > > determine if a interrupt is above the priority threshold. If that's
> > > the case, then IMO handling the VMCR via an arch hook is
> > > unnecessarily fragile, e.g. any generic call that leads to
> > > kvm_arch_vcpu_runnable() needs to know that arm64 lazily retrieves a
> > > guest register.
> >
> > Not quite. We only need to retrieve the VMCR if we are in a situation
> > where we need to trigger a wake-up from WFI at the point where we have
> > not done a vcpu_put() yet. All the other cases where the interrupt is
> > injected are managed by the HW. And the only case where
> > kvm_arch_vcpu_runnable() gets called is when blocking.
> >
> > I also don't get why a hook would be fragile, as long as it has well
> > defined semantics.
>
> Generic KVM should not have to know that a seemingly benign arch hook,
> kvm_arch_vcpu_runnable(), cannot be safely called without first calling another
> arch hook. E.g. I suspect there's a (benign?) race in kvm_vcpu_on_spin(). If
> the loop is delayed between checking rcuwait_active() and vcpu_dy_runnable(),
> and the target vCPU is awakened during that period, KVM can call
> kvm_arch_vcpu_runnable() while the vCPU is running.
Humph. Indeed, there is a potential gold-plated turd there.
> It's kind of a counter-example to my below suggestion as putting the vGIC would
> indeed lead to state corruption if the vCPU is running, but I would argue that
> arm64 should override kvm_arch_dy_runnable() so that its correctness is guaranteed,
> e.g. by not calling kvm_arch_vcpu_runnable() if the vCPU is already running.
I'll work something out for that case.
> > > A better approach for VMCR would be to retrieve the value from
> > > hardware on-demand, e.g. via a hook in vgic_get_vmcr(), so that it's all but
> > > impossible to have bugs where KVM is working with a stale VMCR, e.g.
> > >
> > > diff --git a/arch/arm64/kvm/vgic/vgic-mmio.c b/arch/arm64/kvm/vgic/vgic-mmio.c
> > > index 48c6067fc5ec..0784de0c4080 100644
> > > --- a/arch/arm64/kvm/vgic/vgic-mmio.c
> > > +++ b/arch/arm64/kvm/vgic/vgic-mmio.c
> > > @@ -828,6 +828,13 @@ void vgic_set_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr)
> > >
> > > void vgic_get_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr)
> > > {
> > > + if (!vcpu->...->vmcr_available) {
> > > + preempt_disable();
> > > + kvm_vgic_vmcr_sync(vcpu);
> > > + preempt_enable();
> > > + vcpu->...->vmcr_available = true;
> > > + }
> > > +
> >
> > But most of the uses of vgic_get_vmcr() are in contexts where the vcpu
> > isn't running at all (such as save/restore). It really only operates
> > on the shadow state, and what you have above will only lead to state
> > corruption.
>
> Ignoring the kvm_arch_dy_runnable() case for the moment, how would
> it lead to corruption? The idea is that the 'vmcr_available' flag
> would be cleared when the vCPU is run, i.e. it tracks whether or not
> the shadow state may be stale.
I guess that 'vmcr_available' would have to be initialised to 'true'
at vcpu reset time so that the userspace side cannot trigger a read
from the HW.
> > > if (kvm_vgic_global_state.type == VGIC_V2)
> > > vgic_v2_get_vmcr(vcpu, vmcr);
> > > else
> > >
> > >
> > > Regarding vGIC v4, does KVM require it to be resident in hardware
> > > while the vCPU is loaded?
> >
> > It is a requirement. Otherwise, we end-up with an inconsistent state
> > between the delivery of doorbells and the state of the vgic.
>
> For my own understanding, does KVM require it to be resident in
> hardware while the vCPU is loaded but _not_ running? What I don't
> fully understand is how KVM can safely load/put the vCPU if that
> true, i.e. wouldn't there always be a window for badness?
No, that part is fine. It is when you start running the vcpu without
the GICv4 context loaded that ugly stuff happens (get a doorbell that
tells you to schedule the currently running vcpu, for example).
>
> > Also, reloading the GICv4 state can be pretty expensive (multiple MMIO
> > accesses), which is why we really don't want to do that on the hot path
> > (kvm_arch_vcpu_ioctl_run() *is* a hot path).
>
> I wasn't suggesting to reload GICv4 on every entry, it would only be reloaded
> if it was put between vcpu_load() and entry to the guest.
>
> > > If not, then we could do something like
> > > this, which would eliminate the arch hooks entirely if the VMCR is
> > > handled as above.
>
> ...
>
> > > @@ -813,6 +787,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
> > > */
> > > preempt_disable();
> > >
> > > + /*
> > > + * Reload vGIC v4 if necessary, as it may be put on-demand so
> > > + * that KVM can detect directly injected interrupts, e.g. when
> > > + * determining if the vCPU is runnable due to a pending event.
> > > + */
> > > + vgic_v4_load(vcpu);
> >
> > You'd need to detect that a previous put has been done.
>
> Not that it will likely matter, but doesn't the its_vpe.resident
> check automatically handle this?
Sort of. I eventually want to get rid of this as it papers over all
sort of sins. I introduced it exactly because of the nesting that
vcpu_block triggers, but this is a bit of a layering violation between
KVM and the underlying GICv4 driver.
>
> > But overall, it puts the complexity at the wrong place. WFI (aka
> > kvm_vcpu_block) is the place where we want to handle this synchronisation,
> > and not the run loop.
> >
> > Instead of having a well defined interface with the blocking code
> > where we implement the required synchronisation, you spray the vgic
> > crap all over, and it becomes much harder to reason about it. Guess
> > what, I'm not keen on it.
>
> My objection to the arch hooks is that, from generic KVM's
> perspective, the direct dependency is not on blocking, it's on
> calling kvm_arch_vcpu_runnable(). That's why I suggested handling
> this by tracking whether or not the VMCR is up-to-date/stale, as it
> allows generic KVM to safely call kvm_arch_vcpu_runnable() whenever
> the vCPU is loaded.
>
> I don't have a strong opinion on arm64 preferring the sync to be
> specific to WFI, but if that's the case then IMO this should be
> handled fully in arm64, e.g. a patch like so (or with a wrapper
> around the call to kvm_vcpu_block() if we want to guard against
> future calls into generic KVM)
>
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index fe102cd2e518..312f3acd3ca3 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -367,27 +367,12 @@ int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
>
> void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
> {
> - /*
> - * If we're about to block (most likely because we've just hit a
> - * WFI), we need to sync back the state of the GIC CPU interface
> - * so that we have the latest PMR and group enables. This ensures
> - * that kvm_arch_vcpu_runnable has up-to-date data to decide
> - * whether we have pending interrupts.
> - *
> - * For the same reason, we want to tell GICv4 that we need
> - * doorbells to be signalled, should an interrupt become pending.
> - */
> - preempt_disable();
> - kvm_vgic_vmcr_sync(vcpu);
> - vgic_v4_put(vcpu, true);
> - preempt_enable();
> +
> }
>
> void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu)
> {
> - preempt_disable();
> - vgic_v4_load(vcpu);
> - preempt_enable();
> +
> }
>
> void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index 275a27368a04..9870e824a27e 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -95,8 +95,28 @@ static int kvm_handle_wfx(struct kvm_vcpu *vcpu)
> } else {
> trace_kvm_wfx_arm64(*vcpu_pc(vcpu), false);
> vcpu->stat.wfi_exit_stat++;
> +
> + /*
> + * Sync back the state of the GIC CPU interface so that we have
> + * the latest PMR and group enables. This ensures that
> + * kvm_arch_vcpu_runnable has up-to-date data to decide whether
> + * we have pending interrupts, e.g. when determining if the
> + * vCPU should block.
> + *
> + * For the same reason, we want to tell GICv4 that we need
> + * doorbells to be signalled, should an interrupt become pending.
> + */
> + preempt_disable();
> + kvm_vgic_vmcr_sync(vcpu);
> + vgic_v4_put(vcpu, true);
> + preempt_enable();
> +
> kvm_vcpu_block(vcpu);
> kvm_clear_request(KVM_REQ_UNHALT, vcpu);
> +
> + preempt_disable();
> + vgic_v4_load(vcpu);
> + preempt_enable();
> }
>
> kvm_incr_pc(vcpu);
I actually largely prefer this approach, which is massively more
readable than the current setup. Feel free to wrap that in your
series.
I'll also have a look at the vcpu_dy_runnable() asap.
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
next prev parent reply other threads:[~2021-09-30 9:36 UTC|newest]
Thread overview: 50+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-09-25 0:55 [PATCH 00/14] KVM: Halt-polling fixes, cleanups and a new stat Sean Christopherson
2021-09-25 0:55 ` [PATCH 01/14] KVM: s390: Ensure kvm_arch_no_poll() is read once when blocking vCPU Sean Christopherson
2021-09-27 6:54 ` Christian Borntraeger
2021-09-25 0:55 ` [PATCH 02/14] KVM: Update halt-polling stats if and only if halt-polling was attempted Sean Christopherson
2021-09-28 18:57 ` David Matlack
2021-09-25 0:55 ` [PATCH 03/14] KVM: Refactor and document halt-polling stats update helper Sean Christopherson
2021-09-28 19:01 ` David Matlack
2021-09-25 0:55 ` [PATCH 04/14] KVM: Reconcile discrepancies in halt-polling stats Sean Christopherson
2021-09-28 21:26 ` David Matlack
2021-09-25 0:55 ` [PATCH 05/14] KVM: s390: Clear valid_wakeup in kvm_s390_handle_wait(), not in arch hook Sean Christopherson
2021-09-27 6:58 ` Christian Borntraeger
2021-09-25 0:55 ` [PATCH 06/14] KVM: Drop obsolete kvm_arch_vcpu_block_finish() Sean Christopherson
2021-09-27 6:58 ` Christian Borntraeger
2021-09-28 21:28 ` David Matlack
2021-09-25 0:55 ` [PATCH 07/14] KVM: Don't block+unblock when halt-polling is successful Sean Christopherson
2021-09-25 9:50 ` Marc Zyngier
2021-09-26 6:27 ` Paolo Bonzini
2021-09-26 9:02 ` Marc Zyngier
2021-09-27 17:28 ` Sean Christopherson
2021-09-28 9:24 ` Marc Zyngier
2021-09-28 16:21 ` Sean Christopherson
2021-09-30 9:36 ` Marc Zyngier [this message]
2021-09-25 0:55 ` [PATCH 08/14] KVM: x86: Tweak halt emulation helper names to free up kvm_vcpu_halt() Sean Christopherson
2021-09-28 21:59 ` David Matlack
2021-09-25 0:55 ` [PATCH 09/14] KVM: Rename kvm_vcpu_block() => kvm_vcpu_halt() Sean Christopherson
2021-09-27 7:06 ` Christian Borntraeger
2021-09-28 22:01 ` David Matlack
2021-09-25 0:55 ` [PATCH 10/14] KVM: Split out a kvm_vcpu_block() helper from kvm_vcpu_halt() Sean Christopherson
2021-09-27 7:41 ` Christian Borntraeger
2021-09-28 22:03 ` David Matlack
2021-09-25 0:55 ` [PATCH 11/14] KVM: stats: Add stat to detect if vcpu is currently blocking Sean Christopherson
2021-09-28 22:04 ` David Matlack
2021-09-25 0:55 ` [PATCH 12/14] KVM: Don't redo ktime_get() when calculating halt-polling stop/deadline Sean Christopherson
2021-09-28 22:08 ` David Matlack
2021-09-25 0:55 ` [PATCH 13/14] KVM: x86: Directly block (instead of "halting") UNINITIALIZED vCPUs Sean Christopherson
2021-09-28 22:12 ` David Matlack
2021-09-25 0:55 ` [PATCH 14/14] KVM: x86: Invoke kvm_vcpu_block() directly for non-HALTED wait states Sean Christopherson
2021-09-28 22:14 ` David Matlack
2021-09-27 7:22 ` disabling halt polling broken? (was Re: [PATCH 00/14] KVM: Halt-polling fixes, cleanups and a new stat) Christian Borntraeger
2021-09-27 14:59 ` Sean Christopherson
2021-09-27 15:03 ` Paolo Bonzini
2021-09-27 15:15 ` Sean Christopherson
2021-09-27 15:16 ` Christian Borntraeger
2021-09-27 16:58 ` David Matlack
2021-09-29 6:56 ` Christian Borntraeger
2021-09-27 17:24 ` Paolo Bonzini
2021-09-27 17:33 ` Sean Christopherson
2022-11-15 3:28 ` wangyanan (Y)
2022-11-16 17:19 ` David Matlack
2022-11-18 2:29 ` wangyanan (Y)
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87czoqtnfw.wl-maz@kernel.org \
--to=maz@kernel.org \
--cc=aleksandar.qemu.devel@gmail.com \
--cc=alexandru.elisei@arm.com \
--cc=borntraeger@de.ibm.com \
--cc=chenhuacai@kernel.org \
--cc=cohuck@redhat.com \
--cc=david@redhat.com \
--cc=dmatlack@google.com \
--cc=frankja@linux.ibm.com \
--cc=imbrenda@linux.ibm.com \
--cc=james.morse@arm.com \
--cc=jingzhangos@google.com \
--cc=jmattson@google.com \
--cc=joro@8bytes.org \
--cc=kvm-ppc@vger.kernel.org \
--cc=kvm@vger.kernel.org \
--cc=kvmarm@lists.cs.columbia.edu \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mips@vger.kernel.org \
--cc=paulus@ozlabs.org \
--cc=pbonzini@redhat.com \
--cc=seanjc@google.com \
--cc=suzuki.poulose@arm.com \
--cc=vkuznets@redhat.com \
--cc=wanpengli@tencent.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).