All of lore.kernel.org
 help / color / mirror / Atom feed
From: Maxim Levitsky <mlevitsk@redhat.com>
To: Sean Christopherson <seanjc@google.com>,
	Zeng Guang <guang.zeng@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	kvm@vger.kernel.org, Dave Hansen <dave.hansen@linux.intel.com>,
	Tony Luck <tony.luck@intel.com>,
	Kan Liang <kan.liang@linux.intel.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Kim Phillips <kim.phillips@amd.com>,
	Jarkko Sakkinen <jarkko@kernel.org>,
	Jethro Beekman <jethro@fortanix.com>,
	Kai Huang <kai.huang@intel.com>,
	x86@kernel.org, linux-kernel@vger.kernel.org,
	Robert Hu <robert.hu@intel.com>, Gao Chao <chao.gao@intel.com>
Subject: Re: [PATCH v8 6/9] KVM: x86: lapic: don't allow to change APIC ID unconditionally
Date: Tue, 26 Apr 2022 11:14:14 +0300	[thread overview]
Message-ID: <6475522c58aec5db3ee0a5ccd3230c63a2f013a9.camel@redhat.com> (raw)
In-Reply-To: <080d6ced254e56dbad2910447f81c5ea976fc419.camel@redhat.com>

On Tue, 2022-04-19 at 17:07 +0300, Maxim Levitsky wrote:
> On Fri, 2022-04-15 at 14:39 +0000, Sean Christopherson wrote:
> > On Mon, Apr 11, 2022, Zeng Guang wrote:
> > > From: Maxim Levitsky <mlevitsk@redhat.com>
> > > 
> > > No normal guest has any reason to change physical APIC IDs, and
> > > allowing this introduces bugs into APIC acceleration code.
> > > 
> > > And Intel recent hardware just ignores writes to APIC_ID in
> > > xAPIC mode. More background can be found at:
> > > https://lore.kernel.org/lkml/Yfw5ddGNOnDqxMLs@google.com/
> > > 
> > > Looks there is no much value to support writable xAPIC ID in
> > > guest except supporting some old and crazy use cases which
> > > probably would fail on real hardware. So, make xAPIC ID
> > > read-only for KVM guests.
> > 
> > AFAIK, the plan is to add a capability to let userspace opt-in to a fully read-only
> > APIC ID[*], but I haven't seen patches...
> > 
> > Maxim?
> 
> Yep, I will start working on this pretty much today.
> 
> I was busy last 3 weeks stablilizing nested AVIC
> (I am getting ~600,000 IPIs/s instead of ~40,000 in L2 VM with nested AVIC!),
> with 700,000-900,000 IPIs native with AVIC, 
> almost bare metal IPI performance in L2!
> 
> (the result is from test which I will soon publish makes all 
> vCPUs send IPIs in round robin fashion, and a vCPU sends IPI only after 
> it received it from previous vCPU - the number is total
> number of IPIs send on 8 vCPUs).
> 
> 
> The fact that the dreadful AVIC errata dominates my testing again,
> supports my feeling that I mostly fixed nested AVIC bugs.'
> Tomorrow I'll send RFC v2 of the patches.
> 
> 
> About read-only apic ID cap, 
> I have few questions before I start implementing it:
> 
> Paolo gave me recently an idea to make the APIC ID always read-only for 
> guest writes, and only allow host APIC ID changes (unless the cap is set).
> 
> I am kind of torn about it - assuming that no guest writes APIC ID this will work just fine 
> in empty logical sense, but if a guest actually writes an apic id, 
> while it will migrate fine to a newer KVM, but then after a reboot 
> it won't be able to set its APIC ID again.
> 
> On the other hand, versus fully unconditional read-only apic id, 
> that will support that very unlikely case if the hypervisor
> itself is actually the one that for some reason changes the apic id,
> from the initial value it gave.
> 
> 
> In terms of what I need:
> 
> - In nested AVIC I strongly prefer read-only apic ids, and I can
>   make nested AVIC be conditional on the new cap.
>   IPI virtualization also can be made conditional on the new cap.
> 
> 
> - I also would love to remove broken code from *non nested* AVIC, 
>   which tries to support APIC ID change. 
> 
>   I can make non nested AVIC *also* depend on the new cap, 
>   but that will technically be a regression, since this way users of 
>   older qemu and new kernel will suddenly have their AVIC inhibited. 
> 
>   I don't know if that is such a big issue though because AVIC is
>   (sadly) by default disabled anyway.
> 
>   If that is not possible the other way to solve this is to inhibit AVIC
>   as soon as the guest tries to change APIC ID.
> 
> - I would also want to remove the ability to relocate apic base,
>   likely depending on the new cap as well, but if there are objections
>   I can drop this. I don't need this, but it is just nice to do while we
>   are at it.
> 
> 
> Paolo, Sean, and everyone else: What do you think?

Palo, Sean, Any update?

After thinking more about this, I actualy think I will do something
different, something that actually was proposed here, and I was against it:


1. I will add new inhibit APICV_INHIBIT_REASON_RO_SETTINGS, which will be set
first time any vCPU touches apic id and/or apic base because why not...

That will take care of non nested case cleanly, and will take care of IPIv
for now (as long as it does't support nesting).

2. For my nested AVIC, I will do 2 things:

   a. My code never reads L1 apic ids, and always uses vcpu_id, thus
      in theory, if I just ignore the problem, and the guest changes apic ids,
      the nested AVIC will just keep on using initial apic ids, thus there is  no danger
      of CVE like issue if the guest tries to change theses ids in the 'right' time.

   b. on each nested vm entry I'll just check that apic id is not changed from the default,
      if AVIC is enabled for the nested guest.

      if so the nested entry will fail (best with kvm_vm_bugged) to get attention of
      the user, but I can just fail it with standard vm exit reason of 0xFFFFFFFF.

      Chances that there is a modern VM, which runs nested guests, on AMD, and changes APIC IDs,
      IMHO are too low to bother, plus one can always disable nested AVIC for this case by tweaking
      the nested CPUID.


This way there will no need to patch qemu, propogate the new cap to the machines, etc, etc.

What do you think?

Best regards,
	Maxim Levitsky



> 
> Also:
> Suggestions for the name for the new cap? Is this all right if
> the same cap would make both apic id read only and apic base
> (this is also something Paolo suggested to me)
> 
> Best regards,
> 	Maxim Levitsky
> 
> 
> > [*] https://lore.kernel.org/all/c903e82ed2a1e98f66910c35b5aabdcf56e08e72.camel@redhat.com
> > 



  reply	other threads:[~2022-04-26  8:14 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-11  9:04 [PATCH v8 0/9] IPI virtualization support for VM Zeng Guang
2022-04-11  9:04 ` [PATCH v8 1/9] x86/cpu: Add new VMX feature, Tertiary VM-Execution control Zeng Guang
2022-04-11  9:04 ` [PATCH v8 2/9] KVM: VMX: Extend BUILD_CONTROLS_SHADOW macro to support 64-bit variation Zeng Guang
2022-04-11  9:04 ` [PATCH v8 3/9] KVM: VMX: Detect Tertiary VM-Execution control when setup VMCS config Zeng Guang
2022-04-11  9:04 ` [PATCH v8 4/9] KVM: VMX: Report tertiary_exec_control field in dump_vmcs() Zeng Guang
2022-04-11  9:04 ` [PATCH v8 5/9] KVM: x86: Add support for vICR APIC-write VM-Exits in x2APIC mode Zeng Guang
2022-04-11  9:04 ` [PATCH v8 6/9] KVM: x86: lapic: don't allow to change APIC ID unconditionally Zeng Guang
2022-04-15 14:39   ` Sean Christopherson
2022-04-19 14:07     ` Maxim Levitsky
2022-04-26  8:14       ` Maxim Levitsky [this message]
2022-04-26 14:00         ` Chao Gao
2022-04-11  9:04 ` [PATCH v8 7/9] KVM: Move kvm_arch_vcpu_precreate() under kvm->lock Zeng Guang
2022-04-15 15:00   ` Sean Christopherson
2022-04-15 15:11     ` Sean Christopherson
2022-04-11  9:04 ` [PATCH v8 8/9] KVM: x86: Allow userspace set maximum VCPU id for VM Zeng Guang
2022-04-15 15:01   ` Sean Christopherson
2022-04-11  9:04 ` [PATCH v8 9/9] KVM: VMX: enable IPI virtualization Zeng Guang
2022-04-15 15:25   ` Sean Christopherson
2022-04-18  9:25     ` Chao Gao
2022-04-18 15:14       ` Sean Christopherson
2022-04-19  0:00         ` Chao Gao
2022-04-18 12:49     ` Zeng Guang
2022-04-15 15:45   ` Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6475522c58aec5db3ee0a5ccd3230c63a2f013a9.camel@redhat.com \
    --to=mlevitsk@redhat.com \
    --cc=bp@alien8.de \
    --cc=chao.gao@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=guang.zeng@intel.com \
    --cc=hpa@zytor.com \
    --cc=jarkko@kernel.org \
    --cc=jethro@fortanix.com \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=kai.huang@intel.com \
    --cc=kan.liang@linux.intel.com \
    --cc=kim.phillips@amd.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=robert.hu@intel.com \
    --cc=seanjc@google.com \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.