From: Wanpeng Li
Date: Tue, 27 Jun 2017 11:41:34 +0800
Subject: Re: [PATCH] KVM: x86: fix singlestepping over syscall
To: Radim Krčmář
Cc: kvm, Paolo Bonzini, P J P, Steve Rutherford, Andrew Honig, Andy Lutomirski, "# v3 . 10+"
In-Reply-To: <20170622151040.16231-1-rkrcmar@redhat.com>
References: <20170622151040.16231-1-rkrcmar@redhat.com>

2017-06-22 23:10 GMT+08:00 Radim Krčmář:
> From: Paolo Bonzini
>
> TF is handled a bit differently for syscall and sysret, compared
> to the other instructions: TF is checked after the instruction completes,
> so that the OS can disable #DB at a syscall by adding TF to FMASK.
> When the sysret is executed the #DB is taken "as if" the syscall insn
> just completed.
>
> KVM emulates syscall so that it can trap 32-bit syscall on Intel processors.

We had a discussion two years ago about not exposing syscall/sysret to
Intel 32-bit guests:

https://lkml.org/lkml/2015/11/19/225

On Intel CPUs, syscall/sysret only make sense in long mode, not in
compatibility or legacy mode. A 32-bit guest gets a #UD, and syscall
emulation was introduced by commit 66bb2ccd (KVM: x86 emulator: add
syscall emulation) to handle it. So why do we still expose
syscall/sysret to Intel 32-bit guests?

> Fix the behavior, otherwise you could get #DB on a user stack which is not
> nice.  This does not affect Linux guests, as they use an IST or task gate
> for #DB.
>
> This fixes CVE-2017-7518.
>
> Cc: stable@vger.kernel.org
> Reported-by: Andy Lutomirski
> Signed-off-by: Paolo Bonzini
> Signed-off-by: Radim Krčmář
> ---
> The patch is included in the following pull request to Linus.
>
>  arch/x86/include/asm/kvm_emulate.h |  1 +
>  arch/x86/kvm/emulate.c             |  1 +
>  arch/x86/kvm/x86.c                 | 62 ++++++++++++++++++++------------------
>  3 files changed, 34 insertions(+), 30 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
> index 055962615779..722d0e568863 100644
> --- a/arch/x86/include/asm/kvm_emulate.h
> +++ b/arch/x86/include/asm/kvm_emulate.h
> @@ -296,6 +296,7 @@ struct x86_emulate_ctxt {
>
>  	bool perm_ok; /* do not check permissions if true */
>  	bool ud;	/* inject an #UD if host doesn't support insn */
> +	bool tf;	/* TF value before instruction (after for syscall/sysret) */
>
>  	bool have_exception;
>  	struct x86_exception exception;
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index 0816ab2e8adc..80890dee66ce 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -2742,6 +2742,7 @@ static int em_syscall(struct x86_emulate_ctxt *ctxt)
>  		ctxt->eflags &= ~(X86_EFLAGS_VM | X86_EFLAGS_IF);
>  	}
>
> +	ctxt->tf = (ctxt->eflags & X86_EFLAGS_TF) != 0;
>  	return X86EMUL_CONTINUE;
>  }
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 87d3cb901935..0e846f0cb83b 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5313,6 +5313,8 @@ static void init_emulate_ctxt(struct kvm_vcpu *vcpu)
>  	kvm_x86_ops->get_cs_db_l_bits(vcpu, &cs_db, &cs_l);
>
>  	ctxt->eflags = kvm_get_rflags(vcpu);
> +	ctxt->tf = (ctxt->eflags & X86_EFLAGS_TF) != 0;
> +

I guess this is meant for the case where "the sysret is executed the #DB
is taken "as if" the syscall insn just completed". However, there is no
sysret emulation, so how is the #DB taken after the sysret?

Regards,
Wanpeng Li

>  	ctxt->eip = kvm_rip_read(vcpu);
>  	ctxt->mode = (!is_protmode(vcpu)) ? X86EMUL_MODE_REAL :
>  		     (ctxt->eflags & X86_EFLAGS_VM) ? X86EMUL_MODE_VM86 :
> @@ -5528,36 +5530,25 @@ static int kvm_vcpu_check_hw_bp(unsigned long addr, u32 type, u32 dr7,
>  	return dr6;
>  }
>
> -static void kvm_vcpu_check_singlestep(struct kvm_vcpu *vcpu, unsigned long rflags, int *r)
> +static void kvm_vcpu_do_singlestep(struct kvm_vcpu *vcpu, int *r)
>  {
>  	struct kvm_run *kvm_run = vcpu->run;
>
> -	/*
> -	 * rflags is the old, "raw" value of the flags.  The new value has
> -	 * not been saved yet.
> -	 *
> -	 * This is correct even for TF set by the guest, because "the
> -	 * processor will not generate this exception after the instruction
> -	 * that sets the TF flag".
> -	 */
> -	if (unlikely(rflags & X86_EFLAGS_TF)) {
> -		if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) {
> -			kvm_run->debug.arch.dr6 = DR6_BS | DR6_FIXED_1 |
> -						  DR6_RTM;
> -			kvm_run->debug.arch.pc = vcpu->arch.singlestep_rip;
> -			kvm_run->debug.arch.exception = DB_VECTOR;
> -			kvm_run->exit_reason = KVM_EXIT_DEBUG;
> -			*r = EMULATE_USER_EXIT;
> -		} else {
> -			/*
> -			 * "Certain debug exceptions may clear bit 0-3.  The
> -			 * remaining contents of the DR6 register are never
> -			 * cleared by the processor".
> -			 */
> -			vcpu->arch.dr6 &= ~15;
> -			vcpu->arch.dr6 |= DR6_BS | DR6_RTM;
> -			kvm_queue_exception(vcpu, DB_VECTOR);
> -		}
> +	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) {
> +		kvm_run->debug.arch.dr6 = DR6_BS | DR6_FIXED_1 | DR6_RTM;
> +		kvm_run->debug.arch.pc = vcpu->arch.singlestep_rip;
> +		kvm_run->debug.arch.exception = DB_VECTOR;
> +		kvm_run->exit_reason = KVM_EXIT_DEBUG;
> +		*r = EMULATE_USER_EXIT;
> +	} else {
> +		/*
> +		 * "Certain debug exceptions may clear bit 0-3.  The
> +		 * remaining contents of the DR6 register are never
> +		 * cleared by the processor".
> +		 */
> +		vcpu->arch.dr6 &= ~15;
> +		vcpu->arch.dr6 |= DR6_BS | DR6_RTM;
> +		kvm_queue_exception(vcpu, DB_VECTOR);
>  	}
>  }
>
> @@ -5567,7 +5558,17 @@ int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu)
>  	int r = EMULATE_DONE;
>
>  	kvm_x86_ops->skip_emulated_instruction(vcpu);
> -	kvm_vcpu_check_singlestep(vcpu, rflags, &r);
> +
> +	/*
> +	 * rflags is the old, "raw" value of the flags.  The new value has
> +	 * not been saved yet.
> +	 *
> +	 * This is correct even for TF set by the guest, because "the
> +	 * processor will not generate this exception after the instruction
> +	 * that sets the TF flag".
> +	 */
> +	if (unlikely(rflags & X86_EFLAGS_TF))
> +		kvm_vcpu_do_singlestep(vcpu, &r);
>  	return r == EMULATE_DONE;
>  }
>  EXPORT_SYMBOL_GPL(kvm_skip_emulated_instruction);
> @@ -5726,8 +5727,9 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
>  		toggle_interruptibility(vcpu, ctxt->interruptibility);
>  		vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
>  		kvm_rip_write(vcpu, ctxt->eip);
> -		if (r == EMULATE_DONE)
> -			kvm_vcpu_check_singlestep(vcpu, rflags, &r);
> +		if (r == EMULATE_DONE &&
> +		    (ctxt->tf || (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)))
> +			kvm_vcpu_do_singlestep(vcpu, &r);
>  		if (!ctxt->have_exception ||
>  		    exception_type(ctxt->exception.vector) == EXCPT_TRAP)
>  			__kvm_set_rflags(vcpu, ctxt->eflags);
> --
> 2.13.1
>
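
For readers following the quoted patch, the TF bookkeeping it adds can be sketched as a small stand-alone C model. This is only an illustration and not kernel code: `toy_ctxt`, `toy_init`, `toy_syscall`, `toy_should_singlestep`, and `TOY_EFLAGS_TF` are hypothetical names, and `toy_syscall` assumes the path where RFLAGS is masked with FMASK, as described in the commit message.

```c
#include <stdbool.h>

/* Hypothetical stand-in for X86_EFLAGS_TF: the trap flag is bit 8 of RFLAGS. */
#define TOY_EFLAGS_TF 0x100UL

/* Toy counterpart of the two x86_emulate_ctxt fields the patch touches. */
struct toy_ctxt {
	unsigned long eflags;
	bool tf;	/* TF before the instruction (after it for syscall) */
};

/* Models init_emulate_ctxt(): sample TF before the instruction runs. */
static void toy_init(struct toy_ctxt *ctxt, unsigned long rflags)
{
	ctxt->eflags = rflags;
	ctxt->tf = (ctxt->eflags & TOY_EFLAGS_TF) != 0;
}

/*
 * Models the tail of em_syscall(): the flags are masked (assumed here to
 * be with FMASK), then TF is sampled again from the new flags.
 */
static void toy_syscall(struct toy_ctxt *ctxt, unsigned long fmask)
{
	ctxt->eflags &= ~fmask;
	ctxt->tf = (ctxt->eflags & TOY_EFLAGS_TF) != 0;
}

/* Models the new condition in x86_emulate_instruction(). */
static bool toy_should_singlestep(const struct toy_ctxt *ctxt,
				  bool host_singlestep)
{
	return ctxt->tf || host_singlestep;
}
```

In this model, if TF is set on entry and `fmask` includes `TOY_EFLAGS_TF`, `toy_should_singlestep()` returns false after `toy_syscall()` unless the host requested single-stepping. If `fmask` leaves TF alone, it returns true.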