linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: "Huang, Kai" <kai.huang@intel.com>
Cc: "chenhuacai@kernel.org" <chenhuacai@kernel.org>,
	"maz@kernel.org" <maz@kernel.org>,
	"frankja@linux.ibm.com" <frankja@linux.ibm.com>,
	"borntraeger@linux.ibm.com" <borntraeger@linux.ibm.com>,
	"farman@linux.ibm.com" <farman@linux.ibm.com>,
	"aou@eecs.berkeley.edu" <aou@eecs.berkeley.edu>,
	"palmer@dabbelt.com" <palmer@dabbelt.com>,
	"paul.walmsley@sifive.com" <paul.walmsley@sifive.com>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"dwmw2@infradead.org" <dwmw2@infradead.org>,
	"aleksandar.qemu.devel@gmail.com"
	<aleksandar.qemu.devel@gmail.com>,
	"imbrenda@linux.ibm.com" <imbrenda@linux.ibm.com>,
	"paul@xen.org" <paul@xen.org>,
	"mjrosato@linux.ibm.com" <mjrosato@linux.ibm.com>,
	"vkuznets@redhat.com" <vkuznets@redhat.com>,
	"anup@brainfault.org" <anup@brainfault.org>,
	"oliver.upton@linux.dev" <oliver.upton@linux.dev>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"cohuck@redhat.com" <cohuck@redhat.com>,
	"farosas@linux.ibm.com" <farosas@linux.ibm.com>,
	"david@redhat.com" <david@redhat.com>,
	"james.morse@arm.com" <james.morse@arm.com>,
	"Yao, Yuan" <yuan.yao@intel.com>,
	"alexandru.elisei@arm.com" <alexandru.elisei@arm.com>,
	"linux-s390@vger.kernel.org" <linux-s390@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"mpe@ellerman.id.au" <mpe@ellerman.id.au>,
	"Yamahata, Isaku" <isaku.yamahata@intel.com>,
	"kvmarm@lists.linux.dev" <kvmarm@lists.linux.dev>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"suzuki.poulose@arm.com" <suzuki.poulose@arm.com>,
	"kvm-riscv@lists.infradead.org" <kvm-riscv@lists.infradead.org>,
	"linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>,
	"linux-arm-kernel@lists.infradead.org" 
	<linux-arm-kernel@lists.infradead.org>,
	"linux-mips@vger.kernel.org" <linux-mips@vger.kernel.org>,
	"kvmarm@lists.cs.columbia.edu" <kvmarm@lists.cs.columbia.edu>,
	"philmd@linaro.org" <philmd@linaro.org>,
	"atishp@atishpatra.org" <atishp@atishpatra.org>,
	"linux-riscv@lists.infradead.org"
	<linux-riscv@lists.infradead.org>,
	"Gao, Chao" <chao.gao@intel.com>
Subject: Re: [PATCH v2 42/50] KVM: Disable CPU hotplug during hardware enabling/disabling
Date: Fri, 2 Dec 2022 16:31:18 +0000	[thread overview]
Message-ID: <Y4ooVrDTkscy68vg@google.com> (raw)
In-Reply-To: <8b1053781e859aa95a08c10b0e8a06912a2b42a2.camel@intel.com>

On Fri, Dec 02, 2022, Huang, Kai wrote:
> On Wed, 2022-11-30 at 23:09 +0000, Sean Christopherson wrote:
> > From: Chao Gao <chao.gao@intel.com>
> > 
> > Disable CPU hotplug when enabling/disabling hardware to prevent the
> > corner case where if the following sequence occurs:
> > 
> >   1. A hotplugged CPU marks itself online in cpu_online_mask
> >   2. The hotplugged CPU enables interrupt before invoking KVM's ONLINE
> >      callback
> >   3  hardware_{en,dis}able_all() is invoked on another CPU
> > 
> > the hotplugged CPU will be included in on_each_cpu() and thus get sent
> > through hardware_{en,dis}able_nolock() before kvm_online_cpu() is called.
> 
> Should we explicitly call out what is the consequence of such case, otherwise
> it's hard to tell whether this truly is an issue?
>
> IIUC, since now the compatibility check has already been moved to
> kvm_arch_hardware_enable(), the consequence is hardware_enable_all() will fail
> if the now online cpu isn't compatible, which will results in failing to create
> the first VM.  This isn't ideal since the incompatible cpu should be rejected to
> go online instead.

Actually, in that specific scenario, KVM should not reject the CPU.  E.g. if KVM
is autoloaded (common with systemd and/or qemu-kvm installed), but not used by
userspace, then KVM is overstepping by rejecting the incompatible CPU since the
user likely cares more about onlining a CPU than they do about KVM.

> > KVM currently fudges around this race by keeping track of which CPUs have
> > done hardware enabling (see commit 1b6c016818a5 "KVM: Keep track of which
> > cpus have virtualization enabled"), but that's an inefficient, convoluted,
> > and hacky solution.

...

> > +	/*
> > +	 * Compatibility checks are done when loading KVM and when enabling
> > +	 * hardware, e.g. during CPU hotplug, to ensure all online CPUs are
> > +	 * compatible, i.e. KVM should never perform a compatibility check on
> > +	 * an offline CPU.
> > +	 */
> > +	WARN_ON(!cpu_online(cpu));
> 
> IMHO this chunk logically should belong to previous patch.  IIUC disabling CPU
> hotplug during hardware_enable_all() doesn't have relationship to this WARN().

Hmm, yeah, I agree.  I'll move it.

> >  static int hardware_enable_all(void)
> >  {
> >  	int r = 0;
> >  
> > +	/*
> > +	 * When onlining a CPU, cpu_online_mask is set before kvm_online_cpu()
> > +	 * is called, and so on_each_cpu() between them includes the CPU that
> > +	 * is being onlined.  As a result, hardware_enable_nolock() may get
> > +	 * invoked before kvm_online_cpu(), which also enables hardware if the
> > +	 * usage count is non-zero.  Disable CPU hotplug to avoid attempting to
> > +	 * enable hardware multiple times.
> 
> It won't enable hardware multiple times, right?  Since hardware_enable_nolock()
> has below check:
> 
>         if (cpumask_test_cpu(cpu, cpus_hardware_enabled))                      
>                 return;                                                        
>                                                                                                                                                    
>         cpumask_set_cpu(cpu, cpus_hardware_enabled);     
> 
> IIUC the only issue is the one that I replied in the changelog.
> 
> Or perhaps I am missing something?

You're not missing anything in terms of code.  What the comment means by "attempting"
in this case is calling hardware_enable_nolock().  As called out in the changelog,
guarding against this race with cpus_hardware_enabled is a hack, i.e. KVM should
not have to rely on a per-CPU flag.

 : KVM currently fudges around this race by keeping track of which CPUs have
 : done hardware enabling (see commit 1b6c016818a5 "KVM: Keep track of which
 : cpus have virtualization enabled"), but that's an inefficient, convoluted,
 : and hacky solution.

I actually considered removing the per-CPU flag, but decided not to because it's
simpler to blast

	on_each_cpu(hardware_disable_nolock, ...)

in kvm_reboot() and if enabling hardware fails on one or more CPUs, and taking a
#UD on VMXOFF in the latter case is really unnecessary, i.e. the flag is nice to
have for other reasons.

That said, after this patch, KVM should be able to WARN in the enable path.  I'll
test that and do a more thorough audit; unless I'm missing something, I'll add a
patch to do:

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index b8c6bfb46066..ee896fa2f196 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5027,7 +5027,7 @@ static int kvm_usage_count;
 
 static int __hardware_enable_nolock(void)
 {
-       if (__this_cpu_read(hardware_enabled))
+       if (WARN_ON_ONCE(__this_cpu_read(hardware_enabled)))
                return 0;
 
        if (kvm_arch_hardware_enable()) {


  reply	other threads:[~2022-12-02 16:32 UTC|newest]

Thread overview: 77+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-30 23:08 [PATCH v2 00/50] KVM: Rework kvm_init() and hardware enabling Sean Christopherson
2022-11-30 23:08 ` [PATCH v2 01/50] KVM: Register /dev/kvm as the _very_ last thing during initialization Sean Christopherson
2022-11-30 23:08 ` [PATCH v2 02/50] KVM: Initialize IRQ FD after arch hardware setup Sean Christopherson
2022-11-30 23:08 ` [PATCH v2 03/50] KVM: Allocate cpus_hardware_enabled " Sean Christopherson
2022-11-30 23:08 ` [PATCH v2 04/50] KVM: Teardown VFIO ops earlier in kvm_exit() Sean Christopherson
2022-11-30 23:08 ` [PATCH v2 05/50] KVM: s390: Unwind kvm_arch_init() piece-by-piece() if a step fails Sean Christopherson
2022-11-30 23:08 ` [PATCH v2 06/50] KVM: s390: Move hardware setup/unsetup to init/exit Sean Christopherson
2022-11-30 23:08 ` [PATCH v2 07/50] KVM: x86: Do timer initialization after XCR0 configuration Sean Christopherson
2022-11-30 23:08 ` [PATCH v2 08/50] KVM: x86: Move hardware setup/unsetup to init/exit Sean Christopherson
2022-11-30 23:08 ` [PATCH v2 09/50] KVM: Drop arch hardware (un)setup hooks Sean Christopherson
2022-11-30 23:08 ` [PATCH v2 10/50] KVM: VMX: Reset eVMCS controls in VP assist page during hardware disabling Sean Christopherson
2022-12-01 15:42   ` Vitaly Kuznetsov
2022-11-30 23:08 ` [PATCH v2 11/50] KVM: VMX: Don't bother disabling eVMCS static key on module exit Sean Christopherson
2022-11-30 23:08 ` [PATCH v2 12/50] KVM: VMX: Move Hyper-V eVMCS initialization to helper Sean Christopherson
2022-12-01 15:22   ` Vitaly Kuznetsov
2022-11-30 23:08 ` [PATCH v2 13/50] KVM: x86: Move guts of kvm_arch_init() to standalone helper Sean Christopherson
2022-11-30 23:08 ` [PATCH v2 14/50] KVM: VMX: Do _all_ initialization before exposing /dev/kvm to userspace Sean Christopherson
2022-11-30 23:08 ` [PATCH v2 15/50] KVM: x86: Serialize vendor module initialization (hardware setup) Sean Christopherson
2022-11-30 23:09 ` [PATCH v2 16/50] KVM: arm64: Simplify the CPUHP logic Sean Christopherson
2022-11-30 23:09 ` [PATCH v2 17/50] KVM: arm64: Free hypervisor allocations if vector slot init fails Sean Christopherson
2022-11-30 23:09 ` [PATCH v2 18/50] KVM: arm64: Unregister perf callbacks if hypervisor finalization fails Sean Christopherson
2022-11-30 23:09 ` [PATCH v2 19/50] KVM: arm64: Do arm/arch initialization without bouncing through kvm_init() Sean Christopherson
2022-11-30 23:09 ` [PATCH v2 20/50] KVM: arm64: Mark kvm_arm_init() and its unique descendants as __init Sean Christopherson
2022-11-30 23:09 ` [PATCH v2 21/50] KVM: MIPS: Hardcode callbacks to hardware virtualization extensions Sean Christopherson
2022-12-01 22:00   ` Philippe Mathieu-Daudé
2022-12-01 22:49     ` Sean Christopherson
2022-11-30 23:09 ` [PATCH v2 22/50] KVM: MIPS: Setup VZ emulation? directly from kvm_mips_init() Sean Christopherson
2022-11-30 23:09 ` [PATCH v2 23/50] KVM: MIPS: Register die notifier prior to kvm_init() Sean Christopherson
2022-11-30 23:09 ` [PATCH v2 24/50] KVM: RISC-V: Do arch init directly in riscv_kvm_init() Sean Christopherson
2022-11-30 23:09 ` [PATCH v2 25/50] KVM: RISC-V: Tag init functions and data with __init, __ro_after_init Sean Christopherson
2022-11-30 23:09 ` [PATCH v2 26/50] KVM: PPC: Move processor compatibility check to module init Sean Christopherson
2022-12-01  5:21   ` Michael Ellerman
2022-12-01 16:38     ` Sean Christopherson
2022-11-30 23:09 ` [PATCH v2 27/50] KVM: s390: Do s390 specific init without bouncing through kvm_init() Sean Christopherson
2022-11-30 23:09 ` [PATCH v2 28/50] KVM: s390: Mark __kvm_s390_init() and its descendants as __init Sean Christopherson
2022-11-30 23:09 ` [PATCH v2 29/50] KVM: Drop kvm_arch_{init,exit}() hooks Sean Christopherson
2022-11-30 23:09 ` [PATCH v2 30/50] KVM: VMX: Make VMCS configuration/capabilities structs read-only after init Sean Christopherson
2022-11-30 23:09 ` [PATCH v2 31/50] KVM: x86: Do CPU compatibility checks in x86 code Sean Christopherson
2022-12-02 12:16   ` Huang, Kai
2022-12-05 20:52   ` Isaku Yamahata
2022-12-05 21:12     ` Sean Christopherson
2022-11-30 23:09 ` [PATCH v2 32/50] KVM: Drop kvm_arch_check_processor_compat() hook Sean Christopherson
2022-12-02 12:18   ` Huang, Kai
2022-11-30 23:09 ` [PATCH v2 33/50] KVM: x86: Use KBUILD_MODNAME to specify vendor module name Sean Christopherson
2022-11-30 23:09 ` [PATCH v2 34/50] KVM: x86: Unify pr_fmt to use module name for all KVM modules Sean Christopherson
2022-12-01 10:43   ` Paul Durrant
2022-11-30 23:09 ` [PATCH v2 35/50] KVM: VMX: Use current CPU's info to perform "disabled by BIOS?" checks Sean Christopherson
2022-12-02 12:18   ` Huang, Kai
2022-11-30 23:09 ` [PATCH v2 36/50] KVM: x86: Do VMX/SVM support checks directly in vendor code Sean Christopherson
2022-11-30 23:09 ` [PATCH v2 37/50] KVM: VMX: Shuffle support checks and hardware enabling code around Sean Christopherson
2022-11-30 23:09 ` [PATCH v2 38/50] KVM: SVM: Check for SVM support in CPU compatibility checks Sean Christopherson
2022-11-30 23:09 ` [PATCH v2 39/50] KVM: x86: Move CPU compat checks hook to kvm_x86_ops (from kvm_x86_init_ops) Sean Christopherson
2022-12-02 13:01   ` Huang, Kai
2022-12-05 21:04   ` Isaku Yamahata
2022-11-30 23:09 ` [PATCH v2 40/50] KVM: x86: Do compatibility checks when onlining CPU Sean Christopherson
2022-12-02 13:03   ` Huang, Kai
2022-12-02 13:36   ` Huang, Kai
2022-12-02 16:04     ` Sean Christopherson
2022-11-30 23:09 ` [PATCH v2 41/50] KVM: Rename and move CPUHP_AP_KVM_STARTING to ONLINE section Sean Christopherson
2022-12-02 13:06   ` Huang, Kai
2022-12-02 16:08     ` Sean Christopherson
2022-11-30 23:09 ` [PATCH v2 42/50] KVM: Disable CPU hotplug during hardware enabling/disabling Sean Christopherson
2022-12-02 12:59   ` Huang, Kai
2022-12-02 16:31     ` Sean Christopherson [this message]
2022-11-30 23:09 ` [PATCH v2 43/50] KVM: Ensure CPU is stable during low level hardware enable/disable Sean Christopherson
2022-11-30 23:09 ` [PATCH v2 44/50] KVM: Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock Sean Christopherson
2022-11-30 23:09 ` [PATCH v2 45/50] KVM: Remove on_each_cpu(hardware_disable_nolock) in kvm_exit() Sean Christopherson
2022-11-30 23:09 ` [PATCH v2 46/50] KVM: Use a per-CPU variable to track which CPUs have enabled virtualization Sean Christopherson
2022-11-30 23:09 ` [PATCH v2 47/50] KVM: Make hardware_enable_failed a local variable in the "enable all" path Sean Christopherson
2022-11-30 23:09 ` [PATCH v2 48/50] KVM: Register syscore (suspend/resume) ops early in kvm_init() Sean Christopherson
2022-11-30 23:09 ` [PATCH v2 49/50] KVM: Opt out of generic hardware enabling on s390 and PPC Sean Christopherson
2022-11-30 23:09 ` [PATCH v2 50/50] KVM: Clean up error labels in kvm_init() Sean Christopherson
2022-12-02  8:02 ` [PATCH v2 00/50] KVM: Rework kvm_init() and hardware enabling Chao Gao
2022-12-27 13:02 ` Paolo Bonzini
2022-12-28 11:22   ` Marc Zyngier
2022-12-28 11:58     ` Paolo Bonzini
2022-12-29 20:52     ` Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y4ooVrDTkscy68vg@google.com \
    --to=seanjc@google.com \
    --cc=aleksandar.qemu.devel@gmail.com \
    --cc=alexandru.elisei@arm.com \
    --cc=anup@brainfault.org \
    --cc=aou@eecs.berkeley.edu \
    --cc=atishp@atishpatra.org \
    --cc=borntraeger@linux.ibm.com \
    --cc=chao.gao@intel.com \
    --cc=chenhuacai@kernel.org \
    --cc=cohuck@redhat.com \
    --cc=david@redhat.com \
    --cc=dwmw2@infradead.org \
    --cc=farman@linux.ibm.com \
    --cc=farosas@linux.ibm.com \
    --cc=frankja@linux.ibm.com \
    --cc=imbrenda@linux.ibm.com \
    --cc=isaku.yamahata@intel.com \
    --cc=james.morse@arm.com \
    --cc=kai.huang@intel.com \
    --cc=kvm-riscv@lists.infradead.org \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=kvmarm@lists.linux.dev \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mips@vger.kernel.org \
    --cc=linux-riscv@lists.infradead.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=maz@kernel.org \
    --cc=mjrosato@linux.ibm.com \
    --cc=mpe@ellerman.id.au \
    --cc=oliver.upton@linux.dev \
    --cc=palmer@dabbelt.com \
    --cc=paul.walmsley@sifive.com \
    --cc=paul@xen.org \
    --cc=pbonzini@redhat.com \
    --cc=philmd@linaro.org \
    --cc=suzuki.poulose@arm.com \
    --cc=tglx@linutronix.de \
    --cc=vkuznets@redhat.com \
    --cc=yuan.yao@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).