linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Maxim Levitsky <mlevitsk@redhat.com>
To: Chao Gao <chao.gao@intel.com>
Cc: Zeng Guang <guang.zeng@intel.com>,
	Sean Christopherson <seanjc@google.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	"Luck, Tony" <tony.luck@intel.com>,
	Kan Liang <kan.liang@linux.intel.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Kim Phillips <kim.phillips@amd.com>,
	Jarkko Sakkinen <jarkko@kernel.org>,
	Jethro Beekman <jethro@fortanix.com>,
	"Huang, Kai" <kai.huang@intel.com>,
	"x86@kernel.org" <x86@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Hu, Robert" <robert.hu@intel.com>
Subject: Re: [PATCH v6 6/9] KVM: x86: lapic: don't allow to change APIC ID unconditionally
Date: Sun, 13 Mar 2022 17:09:08 +0200	[thread overview]
Message-ID: <fbf929e0793a6b4df59ec9d95a018d1f6737db35.camel@redhat.com> (raw)
In-Reply-To: <20220313135335.GA18405@gao-cwp>

On Sun, 2022-03-13 at 21:53 +0800, Chao Gao wrote:
> On Sun, Mar 13, 2022 at 12:59:36PM +0200, Maxim Levitsky wrote:
> > On Sun, 2022-03-13 at 11:19 +0200, Maxim Levitsky wrote:
> > > On Fri, 2022-03-11 at 21:28 +0800, Zeng Guang wrote:
> > > > On 3/11/2022 12:26 PM, Sean Christopherson wrote:
> > > > > On Wed, Mar 09, 2022, Maxim Levitsky wrote:
> > > > > > On Wed, 2022-03-09 at 06:01 +0000, Sean Christopherson wrote:
> > > > > > > > Could you share the links?
> > > > > > > 
> > > > > > > Doh, sorry (they're both in this one).
> > > > > > > 
> > > > > > > https://lore.kernel.org/all/20220301135526.136554-5-mlevitsk@redhat.com
> > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > > My opinion on this subject is very simple: we need to draw the line somewhere.
> > > > > 
> > > > > ...
> > > > > 
> > > > > 
> > > > > Since the goal is to simplify KVM, can we try the inhibit route and see what the
> > > > > code looks like before making a decision?  I think it might actually yield a less
> > > > > awful KVM than the readonly approach, especially if the inhibit is "sticky", i.e.
> > > > > we don't try to remove the inhibit on subsequent changes.
> > > > > 
> > > > > Killing the VM, as proposed, is very user unfriendly as the user will have no idea
> > > > > why the VM was killed.  WARN is out of the question because this is user triggerable.
> > > > > Returning an emulation error would be ideal, but getting that result up through
> > > > > apic_mmio_write() could be annoying and end up being more complex.
> > > > > 
> > > > > The touchpoints will all be the same, unless I'm missing something the difference
> > > > > should only be a call to set an inhibit instead killing the VM.
> > > > 
> > > > Introduce an inhibition - APICV_INHIBIT_REASON_APICID_CHG to deactivate
> > > > APICv once KVM guest would try to change APIC ID in xapic mode, and same
> > > > sanity check in KVM_{SET,GET}_LAPIC for live migration. KVM will keep
> > > > alive but obviously lose benefit from hardware acceleration in this way.
> > > > 
> > > > So how do you think the proposal like this ?
> > > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > > index 6dcccb304775..30d825c069be 100644
> > > > --- a/arch/x86/include/asm/kvm_host.h
> > > > +++ b/arch/x86/include/asm/kvm_host.h
> > > > @@ -1046,6 +1046,7 @@ struct kvm_x86_msr_filter {
> > > >  #define APICV_INHIBIT_REASON_X2APIC    5
> > > >  #define APICV_INHIBIT_REASON_BLOCKIRQ  6
> > > >  #define APICV_INHIBIT_REASON_ABSENT    7
> > > > +#define APICV_INHIBIT_REASON_APICID_CHG 8
> > > > 
> > > >  struct kvm_arch {
> > > >         unsigned long n_used_mmu_pages;
> > > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > > index 22929b5b3f9b..66cd54fa4515 100644
> > > > --- a/arch/x86/kvm/lapic.c
> > > > +++ b/arch/x86/kvm/lapic.c
> > > > @@ -2044,10 +2044,19 @@ static int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
> > > > 
> > > >         switch (reg) {
> > > >         case APIC_ID:           /* Local APIC ID */
> > > > -               if (!apic_x2apic_mode(apic))
> > > > -                       kvm_apic_set_xapic_id(apic, val >> 24);
> > > > -               else
> > > > +               if (apic_x2apic_mode(apic)) {
> > > >                         ret = 1;
> > > > +                       break;
> > > > +               }
> > > > +               /*
> > > > +                * If changing APIC ID with any APIC acceleration enabled,
> > > > +                * deactivate APICv to avoid unexpected issues.
> > > > +                */
> > > > +               if (enable_apicv && (val >> 24) != apic->vcpu->vcpu_id)
> > > > +                       kvm_request_apicv_update(apic->vcpu->kvm,
> > > > +                               false, APICV_INHIBIT_REASON_APICID_CHG);
> > > > +
> > > > +               kvm_apic_set_xapic_id(apic, val >> 24);
> > > >                 break;
> > > > 
> > > >         case APIC_TASKPRI:
> > > > @@ -2628,11 +2637,19 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> > > >  static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
> > > >                 struct kvm_lapic_state *s, bool set)
> > > >  {
> > > > -       if (apic_x2apic_mode(vcpu->arch.apic)) {
> > > > -               u32 *id = (u32 *)(s->regs + APIC_ID);
> > > > -               u32 *ldr = (u32 *)(s->regs + APIC_LDR);
> > > > -               u64 icr;
> > > > +       u32 *id = (u32 *)(s->regs + APIC_ID);
> > > > +       u32 *ldr = (u32 *)(s->regs + APIC_LDR);
> > > > +       u64 icr;
> > > > +       if (!apic_x2apic_mode(vcpu->arch.apic)) {
> > > > +               /*
> > > > +                * If APIC ID changed with any APIC acceleration enabled,
> > > > +                * deactivate APICv to avoid unexpected issues.
> > > > +                */
> > > > +               if (enable_apicv && (*id >> 24) != vcpu->vcpu_id)
> > > > +                       kvm_request_apicv_update(vcpu->kvm,
> > > > +                               false, APICV_INHIBIT_REASON_APICID_CHG);
> > > > +       } else {
> > > >                 if (vcpu->kvm->arch.x2apic_format) {
> > > >                         if (*id != vcpu->vcpu_id)
> > > >                                 return -EINVAL;
> > > > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> > > > index 82d56f8055de..f78754bdc1d0 100644
> > > > --- a/arch/x86/kvm/svm/avic.c
> > > > +++ b/arch/x86/kvm/svm/avic.c
> > > > @@ -931,7 +931,8 @@ bool svm_check_apicv_inhibit_reasons(ulong bit)
> > > >                           BIT(APICV_INHIBIT_REASON_IRQWIN) |
> > > >                           BIT(APICV_INHIBIT_REASON_PIT_REINJ) |
> > > >                           BIT(APICV_INHIBIT_REASON_X2APIC) |
> > > > -                         BIT(APICV_INHIBIT_REASON_BLOCKIRQ);
> > > > +                         BIT(APICV_INHIBIT_REASON_BLOCKIRQ) |
> > > > +                         BIT(APICV_INHIBIT_REASON_APICID_CHG);
> > > > 
> > > >         return supported & BIT(bit);
> > > >  }
> > > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > > > index 7beba7a9f247..91265f0784bd 100644
> > > > --- a/arch/x86/kvm/vmx/vmx.c
> > > > +++ b/arch/x86/kvm/vmx/vmx.c
> > > > @@ -7751,7 +7751,8 @@ static bool vmx_check_apicv_inhibit_reasons(ulong bit)
> > > >         ulong supported = BIT(APICV_INHIBIT_REASON_DISABLE) |
> > > >                           BIT(APICV_INHIBIT_REASON_ABSENT) |
> > > >                           BIT(APICV_INHIBIT_REASON_HYPERV) |
> > > > -                         BIT(APICV_INHIBIT_REASON_BLOCKIRQ);
> > > > +                         BIT(APICV_INHIBIT_REASON_BLOCKIRQ) |
> > > > +                         BIT(APICV_INHIBIT_REASON_APICID_CHG);
> > > > 
> > > >         return supported & BIT(bit);
> > > >  }
> > > > 
> > > > 
> > > > 
> > > 
> > > This won't work with nested AVIC - we can't just inhibit a nested guest using its own AVIC,
> > > because migration happens.
> > 
> > I mean because host decided to change its apic id, which it can in theory do any time,
> > even after the nested guest has started. Seriously, the only reason guest has to change apic id,
> > is to try to exploit some security hole.
> 
> Hi
> 
> Thanks for the information.  
> 
> IIUC, you mean KVM applies APICv inhibition only to L1 VM, leaving APICv
> enabled for L2 VM. Shouldn't KVM disable APICv for L2 VM in this case?
> It looks like a generic issue in dynamically toggling APICv scheme,
> e.g., qemu can set KVM_GUESTDBG_BLOCKIRQ after nested guest has started.
> 

That is the problem - you can't disable it for L2, unless you are willing to emulate it in software.
Or in other words, when nested guest uses a hardware feature, you can't at some point say to it:
sorry buddy - hardware feature disappeared.

It is *currently* not a problem for APICv because it doesn't do IPI virtualization,
and even with these patches, it doesn't do this for nesting.
It does become when you allow nested guest to use this which I did in the nested AVIC code.


and writable apic ids do pose a large problem, since nested AVIC, will target L1's apic ids,
and when they can change under you without any notice, and even worse be duplicate,
it is just nightmare.

About KVM_GUESTDBG_BLOCKIRQ - yes, but that is a best effort hack anyway, 
which not supposed to 100% work - running gdbstub with nested is broken in many ways anyway.

Best regards,
	Maxim Levitsky



  reply	other threads:[~2022-03-13 15:09 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-25  8:22 [PATCH v6 0/9] IPI virtualization support for VM Zeng Guang
2022-02-25  8:22 ` [PATCH v6 1/9] x86/cpu: Add new VMX feature, Tertiary VM-Execution control Zeng Guang
2022-02-25 14:09   ` Maxim Levitsky
2022-02-25  8:22 ` [PATCH v6 2/9] KVM: VMX: Extend BUILD_CONTROLS_SHADOW macro to support 64-bit variation Zeng Guang
2022-02-25 14:24   ` Maxim Levitsky
2022-02-25  8:22 ` [PATCH v6 3/9] KVM: VMX: Detect Tertiary VM-Execution control when setup VMCS config Zeng Guang
2022-02-25 14:30   ` Maxim Levitsky
2022-02-25  8:22 ` [PATCH v6 4/9] KVM: VMX: dump_vmcs() reports tertiary_exec_control field as well Zeng Guang
2022-02-25 14:31   ` Maxim Levitsky
2022-02-25  8:22 ` [PATCH v6 5/9] KVM: x86: Add support for vICR APIC-write VM-Exits in x2APIC mode Zeng Guang
2022-02-25 14:44   ` Maxim Levitsky
2022-02-25 15:29     ` Chao Gao
2022-02-25  8:22 ` [PATCH v6 6/9] KVM: x86: lapic: don't allow to change APIC ID unconditionally Zeng Guang
2022-02-25 14:46   ` Maxim Levitsky
2022-02-25 14:56     ` David Woodhouse
2022-02-25 15:11       ` Maxim Levitsky
2022-02-25 15:42         ` David Woodhouse
2022-02-25 16:12           ` Maxim Levitsky
2022-03-01  8:03     ` Chao Gao
2022-03-08 23:04   ` Sean Christopherson
2022-03-09  5:21     ` Chao Gao
2022-03-09  6:01       ` Sean Christopherson
2022-03-09 12:59         ` Maxim Levitsky
2022-03-11  4:26           ` Sean Christopherson
     [not found]             ` <29c76393-4884-94a8-f224-08d313b73f71@intel.com>
2022-03-13  9:19               ` Maxim Levitsky
2022-03-13 10:59                 ` Maxim Levitsky
2022-03-13 13:53                   ` Chao Gao
2022-03-13 15:09                     ` Maxim Levitsky [this message]
2022-03-14  4:09                       ` Chao Gao
2022-03-15 15:10                       ` Chao Gao
2022-03-15 15:30                         ` Maxim Levitsky
2022-03-16 11:50                           ` Chao Gao
2022-02-25  8:22 ` [PATCH v6 7/9] KVM: VMX: enable IPI virtualization Zeng Guang
2022-02-25 17:19   ` Maxim Levitsky
2022-03-01  9:21     ` Chao Gao
2022-03-02  6:45       ` Chao Gao
2022-02-25  8:22 ` [PATCH v6 8/9] KVM: x86: Allow userspace set maximum VCPU id for VM Zeng Guang
2022-02-25 17:22   ` Maxim Levitsky
2022-02-25  8:22 ` [PATCH v6 9/9] KVM: VMX: Optimize memory allocation for PID-pointer table Zeng Guang
2022-02-25 17:29   ` Maxim Levitsky
2022-03-01  9:23     ` Chao Gao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=fbf929e0793a6b4df59ec9d95a018d1f6737db35.camel@redhat.com \
    --to=mlevitsk@redhat.com \
    --cc=bp@alien8.de \
    --cc=chao.gao@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=guang.zeng@intel.com \
    --cc=hpa@zytor.com \
    --cc=jarkko@kernel.org \
    --cc=jethro@fortanix.com \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=kai.huang@intel.com \
    --cc=kan.liang@linux.intel.com \
    --cc=kim.phillips@amd.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=robert.hu@intel.com \
    --cc=seanjc@google.com \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).