From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 0F3EEC433FE
	for <linux-kernel@archiver.kernel.org>; Tue, 28 Sep 2021 16:21:21 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id E343E61157
	for <linux-kernel@archiver.kernel.org>; Tue, 28 Sep 2021 16:21:20 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S241801AbhI1QW6 (ORCPT <rfc822;linux-kernel@archiver.kernel.org>);
        Tue, 28 Sep 2021 12:22:58 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50436 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S241766AbhI1QW5 (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Tue, 28 Sep 2021 12:22:57 -0400
Received: from mail-pf1-x435.google.com (mail-pf1-x435.google.com [IPv6:2607:f8b0:4864:20::435])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 12716C061749
        for <linux-kernel@vger.kernel.org>; Tue, 28 Sep 2021 09:21:17 -0700 (PDT)
Received: by mail-pf1-x435.google.com with SMTP id q23so19318178pfs.9
        for <linux-kernel@vger.kernel.org>; Tue, 28 Sep 2021 09:21:17 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20210112;
        h=date:from:to:cc:subject:message-id:references:mime-version
         :content-disposition:in-reply-to;
        bh=4W1FddI98pFlnr+A0lvtfqj/bE0RlEbF+OZi5QyXfNQ=;
        b=rRG5+ibV0VfTSTicp1bhIGdnPJs2eKiLq2XzebWggCrmgs3veBEvfmHcXBiVrkvlhr
         pasvKyJuVD/tCR/gXD4BcHA9nsxqDt4qCNpCzJiytnVVYjPlnxLaPBgP5mSWdyI9kW+J
         i4LEEcGYlnHMbwfaCAnCjMwTclB4O68YC5H5sFscTb65HQuzCWDCisyhMm5ffdKTn7Ay
         lNZC2KYMygu6dBZSzwm9+EdohM3Suyj+qhy/hk/Z14ubImxH8k78+tqDOqBOz7l+yJ6f
         Txn/VQ5RX8HjLrgWFg6NC/blpkElmvDMFM+9G7EoTZP0AkKUWZyACsImEB5K+QlbXbhB
         mH+w==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=x-gm-message-state:date:from:to:cc:subject:message-id:references
         :mime-version:content-disposition:in-reply-to;
        bh=4W1FddI98pFlnr+A0lvtfqj/bE0RlEbF+OZi5QyXfNQ=;
        b=dPjfv9K6oJfA9rytGP6+CWt9N8CYn+crDUeVgdC7JH/qczlx4FMHKxp0pfAojZUK21
         ChZx9CqxTJoudsYDAOLOKuJXOs7rJGYS+ZvhE821p89NbHO0HiQq/CRlX8ISBk8ygjaY
         Vmn9l08rsjKdHv54yHS8bormtysVEQxFx4iRWgMdu1X6rrqY1WosbgxqWFVCvk7LNxqU
         +tVTo/bwXWGjHw+zVHCkai+KGIKUs4IqklMnqThIhYDbw6eeOf0Z6gRKUVn58CK0EZ5N
         SrFHTULhAcq/JFnbrPU1oXWPoN/4JL/WUQYmwZVa4n4POK7pjEaNqWtMB7wEwU1h6bdm
         Y1LQ==
X-Gm-Message-State: AOAM533pjgYH/0OvBTloajtNuBE2Qklau6Llxaz1N6muwr6gDB4yBbfM
        fszTMAJig7Y6vyX0gzJY67hqSg==
X-Google-Smtp-Source: ABdhPJwvRKMgNcMAnByHEA4jwqcNi7NdpV8g3FIwIP1nCr7TIZIvl4OYYda797EjrwyZo4NDgJC3GA==
X-Received: by 2002:a63:2d02:: with SMTP id t2mr5392663pgt.1.1632846076184;
        Tue, 28 Sep 2021 09:21:16 -0700 (PDT)
Received: from google.com (157.214.185.35.bc.googleusercontent.com. [35.185.214.157])
        by smtp.gmail.com with ESMTPSA id p9sm3038231pfo.153.2021.09.28.09.21.15
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Tue, 28 Sep 2021 09:21:15 -0700 (PDT)
Date:   Tue, 28 Sep 2021 16:21:12 +0000
From:   Sean Christopherson <seanjc@google.com>
To:     Marc Zyngier <maz@kernel.org>
Cc:     Paolo Bonzini <pbonzini@redhat.com>,
        Huacai Chen <chenhuacai@kernel.org>,
        Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>,
        Paul Mackerras <paulus@ozlabs.org>,
        Christian Borntraeger <borntraeger@de.ibm.com>,
        Janosch Frank <frankja@linux.ibm.com>,
        James Morse <james.morse@arm.com>,
        Alexandru Elisei <alexandru.elisei@arm.com>,
        Suzuki K Poulose <suzuki.poulose@arm.com>,
        David Hildenbrand <david@redhat.com>,
        Cornelia Huck <cohuck@redhat.com>,
        Claudio Imbrenda <imbrenda@linux.ibm.com>,
        Vitaly Kuznetsov <vkuznets@redhat.com>,
        Wanpeng Li <wanpengli@tencent.com>,
        Jim Mattson <jmattson@google.com>,
        Joerg Roedel <joro@8bytes.org>,
        linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu,
        linux-mips@vger.kernel.org, kvm@vger.kernel.org,
        kvm-ppc@vger.kernel.org, linux-kernel@vger.kernel.org,
        David Matlack <dmatlack@google.com>,
        Jing Zhang <jingzhangos@google.com>
Subject: Re: [PATCH 07/14] KVM: Don't block+unblock when halt-polling is
 successful
Message-ID: <YVNA+KTbLrxGQ6ML@google.com>
References: <20210925005528.1145584-1-seanjc@google.com>
 <20210925005528.1145584-8-seanjc@google.com>
 <878rzlass2.wl-maz@kernel.org>
 <80d90ee6-0d43-3735-5c26-be8c3d72d493@redhat.com>
 <877df3btgb.wl-maz@kernel.org>
 <YVH/LjCqk/9PfDHn@google.com>
 <87o88dt5m5.wl-maz@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <87o88dt5m5.wl-maz@kernel.org>
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Sep 28, 2021, Marc Zyngier wrote:
> On Mon, 27 Sep 2021 18:28:14 +0100,
> Sean Christopherson <seanjc@google.com> wrote:
> > 
> > On Sun, Sep 26, 2021, Marc Zyngier wrote:
> > > On Sun, 26 Sep 2021 07:27:28 +0100,
> > > Paolo Bonzini <pbonzini@redhat.com> wrote:
> > > > 
> > > > On 25/09/21 11:50, Marc Zyngier wrote:
> > > > >> there is no need for arm64 to put/load
> > > > >> the vGIC as KVM hasn't relinquished control of the vCPU in any way.
> > > > > 
> > > > > This doesn't mean that there is no requirement for any state
> > > > > change. The put/load on GICv4 is crucial for performance, and the VMCR
> > > > > resync is a correctness requirement.
> > 
> > Ah crud, I didn't blame that code beforehand, I simply assumed
> > kvm_arch_vcpu_blocking() was purely for the blocking/schedule()
> > sequence.  The comment in arm64's kvm_arch_vcpu_blocking() about
> > kvm_arch_vcpu_runnable() makes more sense now too.
> > 
> > > > I wouldn't even say it's crucial for performance: halt polling cannot
> > > > work and is a waste of time without (the current implementation of)
> > > > put/load.
> > > 
> > > Not quite. A non-V{LPI,SGI} could still be used as the a wake-up from
> > > WFI (which is the only reason we end-up on this path). Only LPIs (and
> > > SGIs on GICv4.1) can be directly injected, meaning that SPIs and PPIs
> > > still follow the standard SW injection model.
> > > 
> > > However, there is still the ICH_VMCR_EL2 requirement (to get the
> > > up-to-date priority mask and group enable bits) for SW-injected
> > > interrupt wake-up to work correctly, and I really don't want to save
> > > that one eagerly on each shallow exit.
> > 
> > IIUC, VMCR is resident in hardware while the guest is running, and
> > KVM needs to retrieve the VMCR when processing interrupts to
> > determine if a interrupt is above the priority threshold.  If that's
> > the case, then IMO handling the VMCR via an arch hook is
> > unnecessarily fragile, e.g. any generic call that leads to
> > kvm_arch_vcpu_runnable() needs to know that arm64 lazily retrieves a
> > guest register.
> 
> Not quite. We only need to retrieve the VMCR if we are in a situation
> where we need to trigger a wake-up from WFI at the point where we have
> not done a vcpu_put() yet. All the other cases where the interrupt is
> injected are managed by the HW. And the only case where
> kvm_arch_vcpu_runnable() gets called is when blocking.
> 
> I also don't get why a hook would be fragile, as long as it has well
> defined semantics.

Generic KVM should not have to know that a seemingly benign arch hook,
kvm_arch_vcpu_runnable(), cannot be safely called without first calling another
arch hook.  E.g. I suspect there's a (benign?) race in kvm_vcpu_on_spin().  If
the loop is delayed between checking rcuwait_active() and vcpu_dy_runnable(),
and the target vCPU is awakened during that period, KVM can call
kvm_arch_vcpu_runnable() while the vCPU is running.

It's kind of a counter-example to my below suggestion as putting the vGIC would
indeed lead to state corruption if the vCPU is running, but I would argue that
arm64 should override kvm_arch_dy_runnable() so that its correctness is guaranteed,
e.g. by not calling kvm_arch_vcpu_runnable() if the vCPU is already running.

> > A better approach for VMCR would be to retrieve the value from
> > hardware on-demand, e.g. via a hook in vgic_get_vmcr(), so that it's all but
> > impossible to have bugs where KVM is working with a stale VMCR, e.g.
> > 
> > diff --git a/arch/arm64/kvm/vgic/vgic-mmio.c b/arch/arm64/kvm/vgic/vgic-mmio.c
> > index 48c6067fc5ec..0784de0c4080 100644
> > --- a/arch/arm64/kvm/vgic/vgic-mmio.c
> > +++ b/arch/arm64/kvm/vgic/vgic-mmio.c
> > @@ -828,6 +828,13 @@ void vgic_set_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr)
> >  
> >  void vgic_get_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr)
> >  {
> > +       if (!vcpu->...->vmcr_available) {
> > +               preempt_disable();
> > +               kvm_vgic_vmcr_sync(vcpu);
> > +               preempt_enable();
> > +               vcpu->...->vmcr_available = true;
> > +       }
> > +
> 
> But most of the uses of vgic_get_vmcr() are in contexts where the vcpu
> isn't running at all (such as save/restore). It really only operates
> on the shadow state, and what you have above will only lead to state
> corruption.

Ignoring the kvm_arch_dy_runnable() case for the moment, how would it lead to
corruption?  The idea is that the 'vmcr_available' flag would be cleared when the
vCPU is run, i.e. it tracks whether or not the shadow state may be stale.

> >         if (kvm_vgic_global_state.type == VGIC_V2)
> >                 vgic_v2_get_vmcr(vcpu, vmcr);
> >         else
> > 
> > 
> > Regarding vGIC v4, does KVM require it to be resident in hardware
> > while the vCPU is loaded?
> 
> It is a requirement. Otherwise, we end-up with an inconsistent state
> between the delivery of doorbells and the state of the vgic.

For my own understanding, does KVM require it to be resident in hardware while
the vCPU is loaded but _not_ running?  What I don't fully understand is how KVM
can safely load/put the vCPU if that true, i.e. wouldn't there always be a window
for badness?

> Also, reloading the GICv4 state can be pretty expensive (multiple MMIO
> accesses), which is why we really don't want to do that on the hot path
> (kvm_arch_vcpu_ioctl_run() *is* a hot path).

I wasn't suggesting to reload GICv4 on every entry, it would only be reloaded
if it was put between vcpu_load() and entry to the guest.

> > If not, then we could do something like
> > this, which would eliminate the arch hooks entirely if the VMCR is
> > handled as above.

...

> > @@ -813,6 +787,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
> >                  */
> >                 preempt_disable();
> > 
> > +               /*
> > +                * Reload vGIC v4 if necessary, as it may be put on-demand so
> > +                * that KVM can detect directly injected interrupts, e.g. when
> > +                * determining if the vCPU is runnable due to a pending event.
> > +                */
> > +               vgic_v4_load(vcpu);
> 
> You'd need to detect that a previous put has been done.

Not that it will likely matter, but doesn't the its_vpe.resident check automatically
handle this?

> But overall, it puts the complexity at the wrong place. WFI (aka
> kvm_vcpu_block) is the place where we want to handle this synchronisation,
> and not the run loop.
> 
> Instead of having a well defined interface with the blocking code
> where we implement the required synchronisation, you spray the vgic
> crap all over, and it becomes much harder to reason about it. Guess
> what, I'm not keen on it.

My objection to the arch hooks is that, from generic KVM's perspective, the
direct dependency is not on blocking, it's on calling kvm_arch_vcpu_runnable().
That's why I suggested handling this by tracking whether or not the VMCR is
up-to-date/stale, as it allows generic KVM to safely call kvm_arch_vcpu_runnable()
whenever the vCPU is loaded.

I don't have a strong opinion on arm64 preferring the sync to be specific to
WFI, but if that's the case then IMO this should be handled fully in arm64, e.g.
a patch like so (or with a wrapper around the call to kvm_vcpu_block() if we
want to guard against future calls into generic KVM)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index fe102cd2e518..312f3acd3ca3 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -367,27 +367,12 @@ int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)

 void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
 {
-       /*
-        * If we're about to block (most likely because we've just hit a
-        * WFI), we need to sync back the state of the GIC CPU interface
-        * so that we have the latest PMR and group enables. This ensures
-        * that kvm_arch_vcpu_runnable has up-to-date data to decide
-        * whether we have pending interrupts.
-        *
-        * For the same reason, we want to tell GICv4 that we need
-        * doorbells to be signalled, should an interrupt become pending.
-        */
-       preempt_disable();
-       kvm_vgic_vmcr_sync(vcpu);
-       vgic_v4_put(vcpu, true);
-       preempt_enable();
+
 }

 void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu)
 {
-       preempt_disable();
-       vgic_v4_load(vcpu);
-       preempt_enable();
+
 }

 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 275a27368a04..9870e824a27e 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -95,8 +95,28 @@ static int kvm_handle_wfx(struct kvm_vcpu *vcpu)
        } else {
                trace_kvm_wfx_arm64(*vcpu_pc(vcpu), false);
                vcpu->stat.wfi_exit_stat++;
+
+               /*
+                * Sync back the state of the GIC CPU interface so that we have
+                * the latest PMR and group enables. This ensures that
+                * kvm_arch_vcpu_runnable has up-to-date data to decide whether
+                * we have pending interrupts, e.g. when determining if the
+                * vCPU should block.
+                *
+                * For the same reason, we want to tell GICv4 that we need
+                * doorbells to be signalled, should an interrupt become pending.
+                */
+               preempt_disable();
+               kvm_vgic_vmcr_sync(vcpu);
+               vgic_v4_put(vcpu, true);
+               preempt_enable();
+
                kvm_vcpu_block(vcpu);
                kvm_clear_request(KVM_REQ_UNHALT, vcpu);
+
+               preempt_disable();
+               vgic_v4_load(vcpu);
+               preempt_enable();
        }

        kvm_incr_pc(vcpu);


From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=Tw+U=OS=lists.cs.columbia.edu=kvmarm-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id E0CA1C433EF
	for <kvmarm@archiver.kernel.org>; Tue, 28 Sep 2021 16:21:22 +0000 (UTC)
Received: from mm01.cs.columbia.edu (mm01.cs.columbia.edu [128.59.11.253])
	by mail.kernel.org (Postfix) with ESMTP id 5D2566124B
	for <kvmarm@archiver.kernel.org>; Tue, 28 Sep 2021 16:21:22 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 5D2566124B
Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com
Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=lists.cs.columbia.edu
Received: from localhost (localhost [127.0.0.1])
	by mm01.cs.columbia.edu (Postfix) with ESMTP id BA7614B10D;
	Tue, 28 Sep 2021 12:21:21 -0400 (EDT)
X-Virus-Scanned: at lists.cs.columbia.edu
Authentication-Results: mm01.cs.columbia.edu (amavisd-new); dkim=softfail
	(fail, message has been altered) header.i=@google.com
Received: from mm01.cs.columbia.edu ([127.0.0.1])
	by localhost (mm01.cs.columbia.edu [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id H9BgRnHF4OsV; Tue, 28 Sep 2021 12:21:20 -0400 (EDT)
Received: from mm01.cs.columbia.edu (localhost [127.0.0.1])
	by mm01.cs.columbia.edu (Postfix) with ESMTP id 271684B0EC;
	Tue, 28 Sep 2021 12:21:20 -0400 (EDT)
Received: from localhost (localhost [127.0.0.1])
 by mm01.cs.columbia.edu (Postfix) with ESMTP id 25F8E4B0DF
 for <kvmarm@lists.cs.columbia.edu>; Tue, 28 Sep 2021 12:21:19 -0400 (EDT)
X-Virus-Scanned: at lists.cs.columbia.edu
Received: from mm01.cs.columbia.edu ([127.0.0.1])
 by localhost (mm01.cs.columbia.edu [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id y9UtAFAA3Vgm for <kvmarm@lists.cs.columbia.edu>;
 Tue, 28 Sep 2021 12:21:17 -0400 (EDT)
Received: from mail-pg1-f178.google.com (mail-pg1-f178.google.com
 [209.85.215.178])
 by mm01.cs.columbia.edu (Postfix) with ESMTPS id 7AA13407F1
 for <kvmarm@lists.cs.columbia.edu>; Tue, 28 Sep 2021 12:21:17 -0400 (EDT)
Received: by mail-pg1-f178.google.com with SMTP id s75so5206256pgs.5
 for <kvmarm@lists.cs.columbia.edu>; Tue, 28 Sep 2021 09:21:17 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112;
 h=date:from:to:cc:subject:message-id:references:mime-version
 :content-disposition:in-reply-to;
 bh=4W1FddI98pFlnr+A0lvtfqj/bE0RlEbF+OZi5QyXfNQ=;
 b=rRG5+ibV0VfTSTicp1bhIGdnPJs2eKiLq2XzebWggCrmgs3veBEvfmHcXBiVrkvlhr
 pasvKyJuVD/tCR/gXD4BcHA9nsxqDt4qCNpCzJiytnVVYjPlnxLaPBgP5mSWdyI9kW+J
 i4LEEcGYlnHMbwfaCAnCjMwTclB4O68YC5H5sFscTb65HQuzCWDCisyhMm5ffdKTn7Ay
 lNZC2KYMygu6dBZSzwm9+EdohM3Suyj+qhy/hk/Z14ubImxH8k78+tqDOqBOz7l+yJ6f
 Txn/VQ5RX8HjLrgWFg6NC/blpkElmvDMFM+9G7EoTZP0AkKUWZyACsImEB5K+QlbXbhB
 mH+w==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=x-gm-message-state:date:from:to:cc:subject:message-id:references
 :mime-version:content-disposition:in-reply-to;
 bh=4W1FddI98pFlnr+A0lvtfqj/bE0RlEbF+OZi5QyXfNQ=;
 b=mCg9hy8+X17b/Q1/FGFY/7HEBn8eZBBYkToQz4LrTecz89m76IUaPUjdfbcqBlxY/I
 ksJxeWkeyL/adOZ2DwjwH0qkQzX/Unh6nXgerPsqgsplnmm4uFc38ABZ682wVqw4xNXw
 ebWq+7ZIicjCrzpTg9aO12d8H0etvpO0GgLz4DcKRLWH2hKlgYkx6gE0CAF2eca8YnTH
 uluGZbLm78YYAFDzfLCr++BVLgujYU0uVuF0uvUh/I1fEQAId3WsFkthiIsjlWO8dVE7
 J4xgnUMsU9OARnVyZ+G2/HJd42QoNFE1YZ08HQrbp2I7zcupkA55oI5wBzxUwTJ2J8ub
 eBTQ==
X-Gm-Message-State: AOAM531aZ7Tfhi4ZkZ6h4Jw3mkRj88vYBfLkwvtR3kElYPe5LP7C2dbh
 Nx/0ZP7cihRfAum//w98AbJ2mA==
X-Google-Smtp-Source: ABdhPJwvRKMgNcMAnByHEA4jwqcNi7NdpV8g3FIwIP1nCr7TIZIvl4OYYda797EjrwyZo4NDgJC3GA==
X-Received: by 2002:a63:2d02:: with SMTP id t2mr5392663pgt.1.1632846076184;
 Tue, 28 Sep 2021 09:21:16 -0700 (PDT)
Received: from google.com (157.214.185.35.bc.googleusercontent.com.
 [35.185.214.157])
 by smtp.gmail.com with ESMTPSA id p9sm3038231pfo.153.2021.09.28.09.21.15
 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
 Tue, 28 Sep 2021 09:21:15 -0700 (PDT)
Date: Tue, 28 Sep 2021 16:21:12 +0000
From: Sean Christopherson <seanjc@google.com>
To: Marc Zyngier <maz@kernel.org>
Subject: Re: [PATCH 07/14] KVM: Don't block+unblock when halt-polling is
 successful
Message-ID: <YVNA+KTbLrxGQ6ML@google.com>
References: <20210925005528.1145584-1-seanjc@google.com>
 <20210925005528.1145584-8-seanjc@google.com>
 <878rzlass2.wl-maz@kernel.org>
 <80d90ee6-0d43-3735-5c26-be8c3d72d493@redhat.com>
 <877df3btgb.wl-maz@kernel.org> <YVH/LjCqk/9PfDHn@google.com>
 <87o88dt5m5.wl-maz@kernel.org>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <87o88dt5m5.wl-maz@kernel.org>
Cc: Wanpeng Li <wanpengli@tencent.com>, kvm@vger.kernel.org,
 David Hildenbrand <david@redhat.com>, linux-kernel@vger.kernel.org,
 Paul Mackerras <paulus@ozlabs.org>, Claudio Imbrenda <imbrenda@linux.ibm.com>,
 kvmarm@lists.cs.columbia.edu, Janosch Frank <frankja@linux.ibm.com>,
 Joerg Roedel <joro@8bytes.org>, Huacai Chen <chenhuacai@kernel.org>,
 Christian Borntraeger <borntraeger@de.ibm.com>,
 Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>, kvm-ppc@vger.kernel.org,
 David Matlack <dmatlack@google.com>, linux-arm-kernel@lists.infradead.org,
 Jim Mattson <jmattson@google.com>, Cornelia Huck <cohuck@redhat.com>,
 linux-mips@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>,
 Vitaly Kuznetsov <vkuznets@redhat.com>
X-BeenThere: kvmarm@lists.cs.columbia.edu
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Where KVM/ARM decisions are made <kvmarm.lists.cs.columbia.edu>
List-Unsubscribe: <https://lists.cs.columbia.edu/mailman/options/kvmarm>,
 <mailto:kvmarm-request@lists.cs.columbia.edu?subject=unsubscribe>
List-Archive: <https://lists.cs.columbia.edu/pipermail/kvmarm>
List-Post: <mailto:kvmarm@lists.cs.columbia.edu>
List-Help: <mailto:kvmarm-request@lists.cs.columbia.edu?subject=help>
List-Subscribe: <https://lists.cs.columbia.edu/mailman/listinfo/kvmarm>,
 <mailto:kvmarm-request@lists.cs.columbia.edu?subject=subscribe>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: kvmarm-bounces@lists.cs.columbia.edu
Sender: kvmarm-bounces@lists.cs.columbia.edu

On Tue, Sep 28, 2021, Marc Zyngier wrote:
> On Mon, 27 Sep 2021 18:28:14 +0100,
> Sean Christopherson <seanjc@google.com> wrote:
> > 
> > On Sun, Sep 26, 2021, Marc Zyngier wrote:
> > > On Sun, 26 Sep 2021 07:27:28 +0100,
> > > Paolo Bonzini <pbonzini@redhat.com> wrote:
> > > > 
> > > > On 25/09/21 11:50, Marc Zyngier wrote:
> > > > >> there is no need for arm64 to put/load
> > > > >> the vGIC as KVM hasn't relinquished control of the vCPU in any way.
> > > > > 
> > > > > This doesn't mean that there is no requirement for any state
> > > > > change. The put/load on GICv4 is crucial for performance, and the VMCR
> > > > > resync is a correctness requirement.
> > 
> > Ah crud, I didn't blame that code beforehand, I simply assumed
> > kvm_arch_vcpu_blocking() was purely for the blocking/schedule()
> > sequence.  The comment in arm64's kvm_arch_vcpu_blocking() about
> > kvm_arch_vcpu_runnable() makes more sense now too.
> > 
> > > > I wouldn't even say it's crucial for performance: halt polling cannot
> > > > work and is a waste of time without (the current implementation of)
> > > > put/load.
> > > 
> > > Not quite. A non-V{LPI,SGI} could still be used as the a wake-up from
> > > WFI (which is the only reason we end-up on this path). Only LPIs (and
> > > SGIs on GICv4.1) can be directly injected, meaning that SPIs and PPIs
> > > still follow the standard SW injection model.
> > > 
> > > However, there is still the ICH_VMCR_EL2 requirement (to get the
> > > up-to-date priority mask and group enable bits) for SW-injected
> > > interrupt wake-up to work correctly, and I really don't want to save
> > > that one eagerly on each shallow exit.
> > 
> > IIUC, VMCR is resident in hardware while the guest is running, and
> > KVM needs to retrieve the VMCR when processing interrupts to
> > determine if a interrupt is above the priority threshold.  If that's
> > the case, then IMO handling the VMCR via an arch hook is
> > unnecessarily fragile, e.g. any generic call that leads to
> > kvm_arch_vcpu_runnable() needs to know that arm64 lazily retrieves a
> > guest register.
> 
> Not quite. We only need to retrieve the VMCR if we are in a situation
> where we need to trigger a wake-up from WFI at the point where we have
> not done a vcpu_put() yet. All the other cases where the interrupt is
> injected are managed by the HW. And the only case where
> kvm_arch_vcpu_runnable() gets called is when blocking.
> 
> I also don't get why a hook would be fragile, as long as it has well
> defined semantics.

Generic KVM should not have to know that a seemingly benign arch hook,
kvm_arch_vcpu_runnable(), cannot be safely called without first calling another
arch hook.  E.g. I suspect there's a (benign?) race in kvm_vcpu_on_spin().  If
the loop is delayed between checking rcuwait_active() and vcpu_dy_runnable(),
and the target vCPU is awakened during that period, KVM can call
kvm_arch_vcpu_runnable() while the vCPU is running.

It's kind of a counter-example to my below suggestion as putting the vGIC would
indeed lead to state corruption if the vCPU is running, but I would argue that
arm64 should override kvm_arch_dy_runnable() so that its correctness is guaranteed,
e.g. by not calling kvm_arch_vcpu_runnable() if the vCPU is already running.

> > A better approach for VMCR would be to retrieve the value from
> > hardware on-demand, e.g. via a hook in vgic_get_vmcr(), so that it's all but
> > impossible to have bugs where KVM is working with a stale VMCR, e.g.
> > 
> > diff --git a/arch/arm64/kvm/vgic/vgic-mmio.c b/arch/arm64/kvm/vgic/vgic-mmio.c
> > index 48c6067fc5ec..0784de0c4080 100644
> > --- a/arch/arm64/kvm/vgic/vgic-mmio.c
> > +++ b/arch/arm64/kvm/vgic/vgic-mmio.c
> > @@ -828,6 +828,13 @@ void vgic_set_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr)
> >  
> >  void vgic_get_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr)
> >  {
> > +       if (!vcpu->...->vmcr_available) {
> > +               preempt_disable();
> > +               kvm_vgic_vmcr_sync(vcpu);
> > +               preempt_enable();
> > +               vcpu->...->vmcr_available = true;
> > +       }
> > +
> 
> But most of the uses of vgic_get_vmcr() are in contexts where the vcpu
> isn't running at all (such as save/restore). It really only operates
> on the shadow state, and what you have above will only lead to state
> corruption.

Ignoring the kvm_arch_dy_runnable() case for the moment, how would it lead to
corruption?  The idea is that the 'vmcr_available' flag would be cleared when the
vCPU is run, i.e. it tracks whether or not the shadow state may be stale.

> >         if (kvm_vgic_global_state.type == VGIC_V2)
> >                 vgic_v2_get_vmcr(vcpu, vmcr);
> >         else
> > 
> > 
> > Regarding vGIC v4, does KVM require it to be resident in hardware
> > while the vCPU is loaded?
> 
> It is a requirement. Otherwise, we end-up with an inconsistent state
> between the delivery of doorbells and the state of the vgic.

For my own understanding, does KVM require it to be resident in hardware while
the vCPU is loaded but _not_ running?  What I don't fully understand is how KVM
can safely load/put the vCPU if that true, i.e. wouldn't there always be a window
for badness?

> Also, reloading the GICv4 state can be pretty expensive (multiple MMIO
> accesses), which is why we really don't want to do that on the hot path
> (kvm_arch_vcpu_ioctl_run() *is* a hot path).

I wasn't suggesting to reload GICv4 on every entry, it would only be reloaded
if it was put between vcpu_load() and entry to the guest.

> > If not, then we could do something like
> > this, which would eliminate the arch hooks entirely if the VMCR is
> > handled as above.

...

> > @@ -813,6 +787,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
> >                  */
> >                 preempt_disable();
> > 
> > +               /*
> > +                * Reload vGIC v4 if necessary, as it may be put on-demand so
> > +                * that KVM can detect directly injected interrupts, e.g. when
> > +                * determining if the vCPU is runnable due to a pending event.
> > +                */
> > +               vgic_v4_load(vcpu);
> 
> You'd need to detect that a previous put has been done.

Not that it will likely matter, but doesn't the its_vpe.resident check automatically
handle this?

> But overall, it puts the complexity at the wrong place. WFI (aka
> kvm_vcpu_block) is the place where we want to handle this synchronisation,
> and not the run loop.
> 
> Instead of having a well defined interface with the blocking code
> where we implement the required synchronisation, you spray the vgic
> crap all over, and it becomes much harder to reason about it. Guess
> what, I'm not keen on it.

My objection to the arch hooks is that, from generic KVM's perspective, the
direct dependency is not on blocking, it's on calling kvm_arch_vcpu_runnable().
That's why I suggested handling this by tracking whether or not the VMCR is
up-to-date/stale, as it allows generic KVM to safely call kvm_arch_vcpu_runnable()
whenever the vCPU is loaded.

I don't have a strong opinion on arm64 preferring the sync to be specific to
WFI, but if that's the case then IMO this should be handled fully in arm64, e.g.
a patch like so (or with a wrapper around the call to kvm_vcpu_block() if we
want to guard against future calls into generic KVM)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index fe102cd2e518..312f3acd3ca3 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -367,27 +367,12 @@ int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)

 void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
 {
-       /*
-        * If we're about to block (most likely because we've just hit a
-        * WFI), we need to sync back the state of the GIC CPU interface
-        * so that we have the latest PMR and group enables. This ensures
-        * that kvm_arch_vcpu_runnable has up-to-date data to decide
-        * whether we have pending interrupts.
-        *
-        * For the same reason, we want to tell GICv4 that we need
-        * doorbells to be signalled, should an interrupt become pending.
-        */
-       preempt_disable();
-       kvm_vgic_vmcr_sync(vcpu);
-       vgic_v4_put(vcpu, true);
-       preempt_enable();
+
 }

 void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu)
 {
-       preempt_disable();
-       vgic_v4_load(vcpu);
-       preempt_enable();
+
 }

 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 275a27368a04..9870e824a27e 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -95,8 +95,28 @@ static int kvm_handle_wfx(struct kvm_vcpu *vcpu)
        } else {
                trace_kvm_wfx_arm64(*vcpu_pc(vcpu), false);
                vcpu->stat.wfi_exit_stat++;
+
+               /*
+                * Sync back the state of the GIC CPU interface so that we have
+                * the latest PMR and group enables. This ensures that
+                * kvm_arch_vcpu_runnable has up-to-date data to decide whether
+                * we have pending interrupts, e.g. when determining if the
+                * vCPU should block.
+                *
+                * For the same reason, we want to tell GICv4 that we need
+                * doorbells to be signalled, should an interrupt become pending.
+                */
+               preempt_disable();
+               kvm_vgic_vmcr_sync(vcpu);
+               vgic_v4_put(vcpu, true);
+               preempt_enable();
+
                kvm_vcpu_block(vcpu);
                kvm_clear_request(KVM_REQ_UNHALT, vcpu);
+
+               preempt_disable();
+               vgic_v4_load(vcpu);
+               preempt_enable();
        }

        kvm_incr_pc(vcpu);


_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=7ONh=OS=lists.infradead.org=linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 824D5C433F5
	for <linux-arm-kernel@archiver.kernel.org>; Tue, 28 Sep 2021 16:23:31 +0000 (UTC)
Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id 4FD0060462
	for <linux-arm-kernel@archiver.kernel.org>; Tue, 28 Sep 2021 16:23:31 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 4FD0060462
Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com
Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lists.infradead.org
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
	d=lists.infradead.org; s=bombadil.20210309; h=Sender:
	Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post:
	List-Archive:List-Unsubscribe:List-Id:In-Reply-To:MIME-Version:References:
	Message-ID:Subject:Cc:To:From:Date:Reply-To:Content-ID:Content-Description:
	Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:
	List-Owner; bh=OszdmbvS5bfDmtWMdBHAaA26CGlqAEpEnNIzBEza1M0=; b=4tNWy40XMKAk3Q
	EkUvJI+NfniHR/LuNlOTXgTd4DH8GZ5jWux8+QCDvH0yzTGBDLNCX8xQi+/0KMQOW9A/XJq6oCOA/
	DFZ6IClVe7QqCMeNgdQUJ2F96ZMG5bofkS86k2I2yf6n57lZ6IDsWOFsZvjYZXf+tgjqAAPeDMKkN
	m4E77fpGIwJ7clpF9XNpAh9VM+KCWz/AWwR+y6/S2h5ALQamNV1LdgriAMYFpne1eOlONOiFrIphS
	UxPjNTOrixtFWcNubXLApMyAVxa7NIzbHUsTP6HHCgpeQgFh0k9qNg+Ncc/0Io/IXto0Ji6DgaeAJ
	MQW+5SNAveCscKYFgoCQ==;
Received: from localhost ([::1] helo=bombadil.infradead.org)
	by bombadil.infradead.org with esmtp (Exim 4.94.2 #2 (Red Hat Linux))
	id 1mVFrB-0082kC-IT; Tue, 28 Sep 2021 16:21:25 +0000
Received: from mail-pg1-x52d.google.com ([2607:f8b0:4864:20::52d])
 by bombadil.infradead.org with esmtps (Exim 4.94.2 #2 (Red Hat Linux))
 id 1mVFr6-0082iT-Sa
 for linux-arm-kernel@lists.infradead.org; Tue, 28 Sep 2021 16:21:22 +0000
Received: by mail-pg1-x52d.google.com with SMTP id h3so21704366pgb.7
 for <linux-arm-kernel@lists.infradead.org>;
 Tue, 28 Sep 2021 09:21:17 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112;
 h=date:from:to:cc:subject:message-id:references:mime-version
 :content-disposition:in-reply-to;
 bh=4W1FddI98pFlnr+A0lvtfqj/bE0RlEbF+OZi5QyXfNQ=;
 b=rRG5+ibV0VfTSTicp1bhIGdnPJs2eKiLq2XzebWggCrmgs3veBEvfmHcXBiVrkvlhr
 pasvKyJuVD/tCR/gXD4BcHA9nsxqDt4qCNpCzJiytnVVYjPlnxLaPBgP5mSWdyI9kW+J
 i4LEEcGYlnHMbwfaCAnCjMwTclB4O68YC5H5sFscTb65HQuzCWDCisyhMm5ffdKTn7Ay
 lNZC2KYMygu6dBZSzwm9+EdohM3Suyj+qhy/hk/Z14ubImxH8k78+tqDOqBOz7l+yJ6f
 Txn/VQ5RX8HjLrgWFg6NC/blpkElmvDMFM+9G7EoTZP0AkKUWZyACsImEB5K+QlbXbhB
 mH+w==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=x-gm-message-state:date:from:to:cc:subject:message-id:references
 :mime-version:content-disposition:in-reply-to;
 bh=4W1FddI98pFlnr+A0lvtfqj/bE0RlEbF+OZi5QyXfNQ=;
 b=4JzLgLphLUus5L3hkL45uKKmesoygq4nsrhnHwZhgxY23s9MZ2DOAbM9Y25BFkUO9j
 6eralwOzgTnS1SEZnfzWHsegIDYrC3RThZ/x9X4hdcNqnlJFo5SK8vtCnlZvdOgyJrIS
 G3tsixe4i0JdpD/sX6kCsJdgTeBH9Db375IrleupJTnnM9/ALy04wBGUkOW+fQWs2OGV
 Gl9pGdnPf9EUy5MKIy5Dl8ygaPvO5LFs0ZuDNuhs+Xlmo2IoEsATc/6Zef/FhlTaEvaf
 ChZa3jSEq7EQVIChZBrUbFCWWO+E/b0FOjGY9Zgs1CU1fovY8KXpr8U1IOITY4Wi29vq
 iRxw==
X-Gm-Message-State: AOAM532r702osvKyorC1G0nbkcer+cbhJ9sgsjapSAmdJQ36dde7NO1t
 AfEowAAoKwvWsYUJ0WYFx9vvvw==
X-Google-Smtp-Source: ABdhPJwvRKMgNcMAnByHEA4jwqcNi7NdpV8g3FIwIP1nCr7TIZIvl4OYYda797EjrwyZo4NDgJC3GA==
X-Received: by 2002:a63:2d02:: with SMTP id t2mr5392663pgt.1.1632846076184;
 Tue, 28 Sep 2021 09:21:16 -0700 (PDT)
Received: from google.com (157.214.185.35.bc.googleusercontent.com.
 [35.185.214.157])
 by smtp.gmail.com with ESMTPSA id p9sm3038231pfo.153.2021.09.28.09.21.15
 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
 Tue, 28 Sep 2021 09:21:15 -0700 (PDT)
Date: Tue, 28 Sep 2021 16:21:12 +0000
From: Sean Christopherson <seanjc@google.com>
To: Marc Zyngier <maz@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>, Huacai Chen <chenhuacai@kernel.org>,
 Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>,
 Paul Mackerras <paulus@ozlabs.org>,
 Christian Borntraeger <borntraeger@de.ibm.com>,
 Janosch Frank <frankja@linux.ibm.com>, James Morse <james.morse@arm.com>,
 Alexandru Elisei <alexandru.elisei@arm.com>,
 Suzuki K Poulose <suzuki.poulose@arm.com>,
 David Hildenbrand <david@redhat.com>, Cornelia Huck <cohuck@redhat.com>,
 Claudio Imbrenda <imbrenda@linux.ibm.com>,
 Vitaly Kuznetsov <vkuznets@redhat.com>, Wanpeng Li <wanpengli@tencent.com>,
 Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
 linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu,
 linux-mips@vger.kernel.org, kvm@vger.kernel.org,
 kvm-ppc@vger.kernel.org, linux-kernel@vger.kernel.org,
 David Matlack <dmatlack@google.com>, Jing Zhang <jingzhangos@google.com>
Subject: Re: [PATCH 07/14] KVM: Don't block+unblock when halt-polling is
 successful
Message-ID: <YVNA+KTbLrxGQ6ML@google.com>
References: <20210925005528.1145584-1-seanjc@google.com>
 <20210925005528.1145584-8-seanjc@google.com>
 <878rzlass2.wl-maz@kernel.org>
 <80d90ee6-0d43-3735-5c26-be8c3d72d493@redhat.com>
 <877df3btgb.wl-maz@kernel.org> <YVH/LjCqk/9PfDHn@google.com>
 <87o88dt5m5.wl-maz@kernel.org>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <87o88dt5m5.wl-maz@kernel.org>
X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 
X-CRM114-CacheID: sfid-20210928_092120_979782_4B300CFD 
X-CRM114-Status: GOOD (  57.35  )
X-BeenThere: linux-arm-kernel@lists.infradead.org
X-Mailman-Version: 2.1.34
Precedence: list
List-Id: <linux-arm-kernel.lists.infradead.org>
List-Unsubscribe: <http://lists.infradead.org/mailman/options/linux-arm-kernel>, 
 <mailto:linux-arm-kernel-request@lists.infradead.org?subject=unsubscribe>
List-Archive: <http://lists.infradead.org/pipermail/linux-arm-kernel/>
List-Post: <mailto:linux-arm-kernel@lists.infradead.org>
List-Help: <mailto:linux-arm-kernel-request@lists.infradead.org?subject=help>
List-Subscribe: <http://lists.infradead.org/mailman/listinfo/linux-arm-kernel>, 
 <mailto:linux-arm-kernel-request@lists.infradead.org?subject=subscribe>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: "linux-arm-kernel" <linux-arm-kernel-bounces@lists.infradead.org>
Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org

On Tue, Sep 28, 2021, Marc Zyngier wrote:
> On Mon, 27 Sep 2021 18:28:14 +0100,
> Sean Christopherson <seanjc@google.com> wrote:
> > 
> > On Sun, Sep 26, 2021, Marc Zyngier wrote:
> > > On Sun, 26 Sep 2021 07:27:28 +0100,
> > > Paolo Bonzini <pbonzini@redhat.com> wrote:
> > > > 
> > > > On 25/09/21 11:50, Marc Zyngier wrote:
> > > > >> there is no need for arm64 to put/load
> > > > >> the vGIC as KVM hasn't relinquished control of the vCPU in any way.
> > > > > 
> > > > > This doesn't mean that there is no requirement for any state
> > > > > change. The put/load on GICv4 is crucial for performance, and the VMCR
> > > > > resync is a correctness requirement.
> > 
> > Ah crud, I didn't blame that code beforehand, I simply assumed
> > kvm_arch_vcpu_blocking() was purely for the blocking/schedule()
> > sequence.  The comment in arm64's kvm_arch_vcpu_blocking() about
> > kvm_arch_vcpu_runnable() makes more sense now too.
> > 
> > > > I wouldn't even say it's crucial for performance: halt polling cannot
> > > > work and is a waste of time without (the current implementation of)
> > > > put/load.
> > > 
> > > Not quite. A non-V{LPI,SGI} could still be used as the a wake-up from
> > > WFI (which is the only reason we end-up on this path). Only LPIs (and
> > > SGIs on GICv4.1) can be directly injected, meaning that SPIs and PPIs
> > > still follow the standard SW injection model.
> > > 
> > > However, there is still the ICH_VMCR_EL2 requirement (to get the
> > > up-to-date priority mask and group enable bits) for SW-injected
> > > interrupt wake-up to work correctly, and I really don't want to save
> > > that one eagerly on each shallow exit.
> > 
> > IIUC, VMCR is resident in hardware while the guest is running, and
> > KVM needs to retrieve the VMCR when processing interrupts to
> > determine if a interrupt is above the priority threshold.  If that's
> > the case, then IMO handling the VMCR via an arch hook is
> > unnecessarily fragile, e.g. any generic call that leads to
> > kvm_arch_vcpu_runnable() needs to know that arm64 lazily retrieves a
> > guest register.
> 
> Not quite. We only need to retrieve the VMCR if we are in a situation
> where we need to trigger a wake-up from WFI at the point where we have
> not done a vcpu_put() yet. All the other cases where the interrupt is
> injected are managed by the HW. And the only case where
> kvm_arch_vcpu_runnable() gets called is when blocking.
> 
> I also don't get why a hook would be fragile, as long as it has well
> defined semantics.

Generic KVM should not have to know that a seemingly benign arch hook,
kvm_arch_vcpu_runnable(), cannot be safely called without first calling another
arch hook.  E.g. I suspect there's a (benign?) race in kvm_vcpu_on_spin().  If
the loop is delayed between checking rcuwait_active() and vcpu_dy_runnable(),
and the target vCPU is awakened during that period, KVM can call
kvm_arch_vcpu_runnable() while the vCPU is running.

It's kind of a counter-example to my below suggestion as putting the vGIC would
indeed lead to state corruption if the vCPU is running, but I would argue that
arm64 should override kvm_arch_dy_runnable() so that its correctness is guaranteed,
e.g. by not calling kvm_arch_vcpu_runnable() if the vCPU is already running.

> > A better approach for VMCR would be to retrieve the value from
> > hardware on-demand, e.g. via a hook in vgic_get_vmcr(), so that it's all but
> > impossible to have bugs where KVM is working with a stale VMCR, e.g.
> > 
> > diff --git a/arch/arm64/kvm/vgic/vgic-mmio.c b/arch/arm64/kvm/vgic/vgic-mmio.c
> > index 48c6067fc5ec..0784de0c4080 100644
> > --- a/arch/arm64/kvm/vgic/vgic-mmio.c
> > +++ b/arch/arm64/kvm/vgic/vgic-mmio.c
> > @@ -828,6 +828,13 @@ void vgic_set_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr)
> >  
> >  void vgic_get_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr)
> >  {
> > +       if (!vcpu->...->vmcr_available) {
> > +               preempt_disable();
> > +               kvm_vgic_vmcr_sync(vcpu);
> > +               preempt_enable();
> > +               vcpu->...->vmcr_available = true;
> > +       }
> > +
> 
> But most of the uses of vgic_get_vmcr() are in contexts where the vcpu
> isn't running at all (such as save/restore). It really only operates
> on the shadow state, and what you have above will only lead to state
> corruption.

Ignoring the kvm_arch_dy_runnable() case for the moment, how would it lead to
corruption?  The idea is that the 'vmcr_available' flag would be cleared when the
vCPU is run, i.e. it tracks whether or not the shadow state may be stale.

> >         if (kvm_vgic_global_state.type == VGIC_V2)
> >                 vgic_v2_get_vmcr(vcpu, vmcr);
> >         else
> > 
> > 
> > Regarding vGIC v4, does KVM require it to be resident in hardware
> > while the vCPU is loaded?
> 
> It is a requirement. Otherwise, we end-up with an inconsistent state
> between the delivery of doorbells and the state of the vgic.

For my own understanding, does KVM require it to be resident in hardware while
the vCPU is loaded but _not_ running?  What I don't fully understand is how KVM
can safely load/put the vCPU if that true, i.e. wouldn't there always be a window
for badness?

> Also, reloading the GICv4 state can be pretty expensive (multiple MMIO
> accesses), which is why we really don't want to do that on the hot path
> (kvm_arch_vcpu_ioctl_run() *is* a hot path).

I wasn't suggesting to reload GICv4 on every entry, it would only be reloaded
if it was put between vcpu_load() and entry to the guest.

> > If not, then we could do something like
> > this, which would eliminate the arch hooks entirely if the VMCR is
> > handled as above.

...

> > @@ -813,6 +787,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
> >                  */
> >                 preempt_disable();
> > 
> > +               /*
> > +                * Reload vGIC v4 if necessary, as it may be put on-demand so
> > +                * that KVM can detect directly injected interrupts, e.g. when
> > +                * determining if the vCPU is runnable due to a pending event.
> > +                */
> > +               vgic_v4_load(vcpu);
> 
> You'd need to detect that a previous put has been done.

Not that it will likely matter, but doesn't the its_vpe.resident check automatically
handle this?

> But overall, it puts the complexity at the wrong place. WFI (aka
> kvm_vcpu_block) is the place where we want to handle this synchronisation,
> and not the run loop.
> 
> Instead of having a well defined interface with the blocking code
> where we implement the required synchronisation, you spray the vgic
> crap all over, and it becomes much harder to reason about it. Guess
> what, I'm not keen on it.

My objection to the arch hooks is that, from generic KVM's perspective, the
direct dependency is not on blocking, it's on calling kvm_arch_vcpu_runnable().
That's why I suggested handling this by tracking whether or not the VMCR is
up-to-date/stale, as it allows generic KVM to safely call kvm_arch_vcpu_runnable()
whenever the vCPU is loaded.

I don't have a strong opinion on arm64 preferring the sync to be specific to
WFI, but if that's the case then IMO this should be handled fully in arm64, e.g.
a patch like so (or with a wrapper around the call to kvm_vcpu_block() if we
want to guard against future calls into generic KVM)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index fe102cd2e518..312f3acd3ca3 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -367,27 +367,12 @@ int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)

 void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
 {
-       /*
-        * If we're about to block (most likely because we've just hit a
-        * WFI), we need to sync back the state of the GIC CPU interface
-        * so that we have the latest PMR and group enables. This ensures
-        * that kvm_arch_vcpu_runnable has up-to-date data to decide
-        * whether we have pending interrupts.
-        *
-        * For the same reason, we want to tell GICv4 that we need
-        * doorbells to be signalled, should an interrupt become pending.
-        */
-       preempt_disable();
-       kvm_vgic_vmcr_sync(vcpu);
-       vgic_v4_put(vcpu, true);
-       preempt_enable();
+
 }

 void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu)
 {
-       preempt_disable();
-       vgic_v4_load(vcpu);
-       preempt_enable();
+
 }

 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 275a27368a04..9870e824a27e 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -95,8 +95,28 @@ static int kvm_handle_wfx(struct kvm_vcpu *vcpu)
        } else {
                trace_kvm_wfx_arm64(*vcpu_pc(vcpu), false);
                vcpu->stat.wfi_exit_stat++;
+
+               /*
+                * Sync back the state of the GIC CPU interface so that we have
+                * the latest PMR and group enables. This ensures that
+                * kvm_arch_vcpu_runnable has up-to-date data to decide whether
+                * we have pending interrupts, e.g. when determining if the
+                * vCPU should block.
+                *
+                * For the same reason, we want to tell GICv4 that we need
+                * doorbells to be signalled, should an interrupt become pending.
+                */
+               preempt_disable();
+               kvm_vgic_vmcr_sync(vcpu);
+               vgic_v4_put(vcpu, true);
+               preempt_enable();
+
                kvm_vcpu_block(vcpu);
                kvm_clear_request(KVM_REQ_UNHALT, vcpu);
+
+               preempt_disable();
+               vgic_v4_load(vcpu);
+               preempt_enable();
        }

        kvm_incr_pc(vcpu);


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

From mboxrd@z Thu Jan  1 00:00:00 1970
From: Sean Christopherson <seanjc@google.com>
Date: Tue, 28 Sep 2021 16:21:12 +0000
Subject: Re: [PATCH 07/14] KVM: Don't block+unblock when halt-polling is successful
Message-Id: <YVNA+KTbLrxGQ6ML@google.com>
List-Id: <kvm-ppc.vger.kernel.org>
References: <20210925005528.1145584-1-seanjc@google.com>
 <20210925005528.1145584-8-seanjc@google.com>
 <878rzlass2.wl-maz@kernel.org>
 <80d90ee6-0d43-3735-5c26-be8c3d72d493@redhat.com>
 <877df3btgb.wl-maz@kernel.org>
 <YVH/LjCqk/9PfDHn@google.com>
 <87o88dt5m5.wl-maz@kernel.org>
In-Reply-To: <87o88dt5m5.wl-maz@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: Marc Zyngier <maz@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>, Huacai Chen <chenhuacai@kernel.org>, Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>, Paul Mackerras <paulus@ozlabs.org>, Christian Borntraeger <borntraeger@de.ibm.com>, Janosch Frank <frankja@linux.ibm.com>, James Morse <james.morse@arm.com>, Alexandru Elisei <alexandru.elisei@arm.com>, Suzuki K Poulose <suzuki.poulose@arm.com>, David Hildenbrand <david@redhat.com>, Cornelia Huck <cohuck@redhat.com>, Claudio Imbrenda <imbrenda@linux.ibm.com>, Vitaly Kuznetsov <vkuznets@redhat.com>, Wanpeng Li <wanpengli@tencent.com>, Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>, linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu, linux-mips@vger.kernel.org, kvm@vger.kernel.org, kvm-ppc@vger.kernel.org, linux-kernel@vger.kernel.org, David Matlack <dmatlack@google.com>, Jing Zhang <jingzhangos@google.com>

On Tue, Sep 28, 2021, Marc Zyngier wrote:
> On Mon, 27 Sep 2021 18:28:14 +0100,
> Sean Christopherson <seanjc@google.com> wrote:
> > 
> > On Sun, Sep 26, 2021, Marc Zyngier wrote:
> > > On Sun, 26 Sep 2021 07:27:28 +0100,
> > > Paolo Bonzini <pbonzini@redhat.com> wrote:
> > > > 
> > > > On 25/09/21 11:50, Marc Zyngier wrote:
> > > > >> there is no need for arm64 to put/load
> > > > >> the vGIC as KVM hasn't relinquished control of the vCPU in any way.
> > > > > 
> > > > > This doesn't mean that there is no requirement for any state
> > > > > change. The put/load on GICv4 is crucial for performance, and the VMCR
> > > > > resync is a correctness requirement.
> > 
> > Ah crud, I didn't blame that code beforehand, I simply assumed
> > kvm_arch_vcpu_blocking() was purely for the blocking/schedule()
> > sequence.  The comment in arm64's kvm_arch_vcpu_blocking() about
> > kvm_arch_vcpu_runnable() makes more sense now too.
> > 
> > > > I wouldn't even say it's crucial for performance: halt polling cannot
> > > > work and is a waste of time without (the current implementation of)
> > > > put/load.
> > > 
> > > Not quite. A non-V{LPI,SGI} could still be used as the a wake-up from
> > > WFI (which is the only reason we end-up on this path). Only LPIs (and
> > > SGIs on GICv4.1) can be directly injected, meaning that SPIs and PPIs
> > > still follow the standard SW injection model.
> > > 
> > > However, there is still the ICH_VMCR_EL2 requirement (to get the
> > > up-to-date priority mask and group enable bits) for SW-injected
> > > interrupt wake-up to work correctly, and I really don't want to save
> > > that one eagerly on each shallow exit.
> > 
> > IIUC, VMCR is resident in hardware while the guest is running, and
> > KVM needs to retrieve the VMCR when processing interrupts to
> > determine if a interrupt is above the priority threshold.  If that's
> > the case, then IMO handling the VMCR via an arch hook is
> > unnecessarily fragile, e.g. any generic call that leads to
> > kvm_arch_vcpu_runnable() needs to know that arm64 lazily retrieves a
> > guest register.
> 
> Not quite. We only need to retrieve the VMCR if we are in a situation
> where we need to trigger a wake-up from WFI at the point where we have
> not done a vcpu_put() yet. All the other cases where the interrupt is
> injected are managed by the HW. And the only case where
> kvm_arch_vcpu_runnable() gets called is when blocking.
> 
> I also don't get why a hook would be fragile, as long as it has well
> defined semantics.

Generic KVM should not have to know that a seemingly benign arch hook,
kvm_arch_vcpu_runnable(), cannot be safely called without first calling another
arch hook.  E.g. I suspect there's a (benign?) race in kvm_vcpu_on_spin().  If
the loop is delayed between checking rcuwait_active() and vcpu_dy_runnable(),
and the target vCPU is awakened during that period, KVM can call
kvm_arch_vcpu_runnable() while the vCPU is running.

It's kind of a counter-example to my below suggestion as putting the vGIC would
indeed lead to state corruption if the vCPU is running, but I would argue that
arm64 should override kvm_arch_dy_runnable() so that its correctness is guaranteed,
e.g. by not calling kvm_arch_vcpu_runnable() if the vCPU is already running.

> > A better approach for VMCR would be to retrieve the value from
> > hardware on-demand, e.g. via a hook in vgic_get_vmcr(), so that it's all but
> > impossible to have bugs where KVM is working with a stale VMCR, e.g.
> > 
> > diff --git a/arch/arm64/kvm/vgic/vgic-mmio.c b/arch/arm64/kvm/vgic/vgic-mmio.c
> > index 48c6067fc5ec..0784de0c4080 100644
> > --- a/arch/arm64/kvm/vgic/vgic-mmio.c
> > +++ b/arch/arm64/kvm/vgic/vgic-mmio.c
> > @@ -828,6 +828,13 @@ void vgic_set_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr)
> >  
> >  void vgic_get_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr)
> >  {
> > +       if (!vcpu->...->vmcr_available) {
> > +               preempt_disable();
> > +               kvm_vgic_vmcr_sync(vcpu);
> > +               preempt_enable();
> > +               vcpu->...->vmcr_available = true;
> > +       }
> > +
> 
> But most of the uses of vgic_get_vmcr() are in contexts where the vcpu
> isn't running at all (such as save/restore). It really only operates
> on the shadow state, and what you have above will only lead to state
> corruption.

Ignoring the kvm_arch_dy_runnable() case for the moment, how would it lead to
corruption?  The idea is that the 'vmcr_available' flag would be cleared when the
vCPU is run, i.e. it tracks whether or not the shadow state may be stale.

> >         if (kvm_vgic_global_state.type = VGIC_V2)
> >                 vgic_v2_get_vmcr(vcpu, vmcr);
> >         else
> > 
> > 
> > Regarding vGIC v4, does KVM require it to be resident in hardware
> > while the vCPU is loaded?
> 
> It is a requirement. Otherwise, we end-up with an inconsistent state
> between the delivery of doorbells and the state of the vgic.

For my own understanding, does KVM require it to be resident in hardware while
the vCPU is loaded but _not_ running?  What I don't fully understand is how KVM
can safely load/put the vCPU if that true, i.e. wouldn't there always be a window
for badness?

> Also, reloading the GICv4 state can be pretty expensive (multiple MMIO
> accesses), which is why we really don't want to do that on the hot path
> (kvm_arch_vcpu_ioctl_run() *is* a hot path).

I wasn't suggesting to reload GICv4 on every entry, it would only be reloaded
if it was put between vcpu_load() and entry to the guest.

> > If not, then we could do something like
> > this, which would eliminate the arch hooks entirely if the VMCR is
> > handled as above.

...

> > @@ -813,6 +787,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
> >                  */
> >                 preempt_disable();
> > 
> > +               /*
> > +                * Reload vGIC v4 if necessary, as it may be put on-demand so
> > +                * that KVM can detect directly injected interrupts, e.g. when
> > +                * determining if the vCPU is runnable due to a pending event.
> > +                */
> > +               vgic_v4_load(vcpu);
> 
> You'd need to detect that a previous put has been done.

Not that it will likely matter, but doesn't the its_vpe.resident check automatically
handle this?

> But overall, it puts the complexity at the wrong place. WFI (aka
> kvm_vcpu_block) is the place where we want to handle this synchronisation,
> and not the run loop.
> 
> Instead of having a well defined interface with the blocking code
> where we implement the required synchronisation, you spray the vgic
> crap all over, and it becomes much harder to reason about it. Guess
> what, I'm not keen on it.

My objection to the arch hooks is that, from generic KVM's perspective, the
direct dependency is not on blocking, it's on calling kvm_arch_vcpu_runnable().
That's why I suggested handling this by tracking whether or not the VMCR is
up-to-date/stale, as it allows generic KVM to safely call kvm_arch_vcpu_runnable()
whenever the vCPU is loaded.

I don't have a strong opinion on arm64 preferring the sync to be specific to
WFI, but if that's the case then IMO this should be handled fully in arm64, e.g.
a patch like so (or with a wrapper around the call to kvm_vcpu_block() if we
want to guard against future calls into generic KVM)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index fe102cd2e518..312f3acd3ca3 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -367,27 +367,12 @@ int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)

 void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
 {
-       /*
-        * If we're about to block (most likely because we've just hit a
-        * WFI), we need to sync back the state of the GIC CPU interface
-        * so that we have the latest PMR and group enables. This ensures
-        * that kvm_arch_vcpu_runnable has up-to-date data to decide
-        * whether we have pending interrupts.
-        *
-        * For the same reason, we want to tell GICv4 that we need
-        * doorbells to be signalled, should an interrupt become pending.
-        */
-       preempt_disable();
-       kvm_vgic_vmcr_sync(vcpu);
-       vgic_v4_put(vcpu, true);
-       preempt_enable();
+
 }

 void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu)
 {
-       preempt_disable();
-       vgic_v4_load(vcpu);
-       preempt_enable();
+
 }

 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 275a27368a04..9870e824a27e 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -95,8 +95,28 @@ static int kvm_handle_wfx(struct kvm_vcpu *vcpu)
        } else {
                trace_kvm_wfx_arm64(*vcpu_pc(vcpu), false);
                vcpu->stat.wfi_exit_stat++;
+
+               /*
+                * Sync back the state of the GIC CPU interface so that we have
+                * the latest PMR and group enables. This ensures that
+                * kvm_arch_vcpu_runnable has up-to-date data to decide whether
+                * we have pending interrupts, e.g. when determining if the
+                * vCPU should block.
+                *
+                * For the same reason, we want to tell GICv4 that we need
+                * doorbells to be signalled, should an interrupt become pending.
+                */
+               preempt_disable();
+               kvm_vgic_vmcr_sync(vcpu);
+               vgic_v4_put(vcpu, true);
+               preempt_enable();
+
                kvm_vcpu_block(vcpu);
                kvm_clear_request(KVM_REQ_UNHALT, vcpu);
+
+               preempt_disable();
+               vgic_v4_load(vcpu);
+               preempt_enable();
        }

        kvm_incr_pc(vcpu);