From: Wanpeng Li
Date: Tue, 11 Jun 2019 15:38:40 +0800
Subject: Re: [PATCH v2 2/3] KVM: X86: Provide a capability to disable cstate msr read intercepts
To: Paolo Bonzini
Cc: LKML, kvm, Radim Krčmář, Sean Christopherson, Liran Alon
In-Reply-To: <627e4189-3709-1fb2-a9bc-f1a577712fe0@redhat.com>
References: <1558418814-6822-1-git-send-email-wanpengli@tencent.com>
 <1558418814-6822-2-git-send-email-wanpengli@tencent.com>
 <627e4189-3709-1fb2-a9bc-f1a577712fe0@redhat.com>

On Wed, 5 Jun 2019 at 00:53, Paolo Bonzini wrote:
>
> On 21/05/19 08:06, Wanpeng Li wrote:
> > From: Wanpeng Li
> >
> > Allow guest reads CORE cstate when exposing host CPU power management capabilities
> > to the guest. PKG cstate is restricted to avoid a guest to get the whole package
> > information in multi-tenant scenario.
> >
> > Cc: Paolo Bonzini
> > Cc: Radim Krčmář
> > Cc: Sean Christopherson
> > Cc: Liran Alon
> > Signed-off-by: Wanpeng Li
> > ---
> > v1 -> v2:
> >  * use a separate bit for KVM_CAP_X86_DISABLE_EXITS
> >
> >  Documentation/virtual/kvm/api.txt | 1 +
> >  arch/x86/include/asm/kvm_host.h   | 1 +
> >  arch/x86/kvm/vmx/vmx.c            | 6 ++++++
> >  arch/x86/kvm/x86.c                | 5 ++++-
> >  arch/x86/kvm/x86.h                | 5 +++++
> >  include/uapi/linux/kvm.h          | 4 +++-
> >  tools/include/uapi/linux/kvm.h    | 4 +++-
> >  7 files changed, 23 insertions(+), 3 deletions(-)
> >
> > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> > index 33cd92d..91fd86f 100644
> > --- a/Documentation/virtual/kvm/api.txt
> > +++ b/Documentation/virtual/kvm/api.txt
> > @@ -4894,6 +4894,7 @@ Valid bits in args[0] are
> >  #define KVM_X86_DISABLE_EXITS_MWAIT            (1 << 0)
> >  #define KVM_X86_DISABLE_EXITS_HLT              (1 << 1)
> >  #define KVM_X86_DISABLE_EXITS_PAUSE            (1 << 2)
> > +#define KVM_X86_DISABLE_EXITS_CSTATE           (1 << 3)
> >
> >  Enabling this capability on a VM provides userspace with a way to no
> >  longer intercept some instructions for improved latency in some
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index d5457c7..1ce8289 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -882,6 +882,7 @@ struct kvm_arch {
> >          bool mwait_in_guest;
> >          bool hlt_in_guest;
> >          bool pause_in_guest;
> > +        bool cstate_in_guest;
> >
> >          unsigned long irq_sources_bitmap;
> >          s64 kvmclock_offset;
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 0861c71..da24f18 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -6637,6 +6637,12 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
> >          vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_CS, MSR_TYPE_RW);
> >          vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_ESP,
> >                  MSR_TYPE_RW);
> >          vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW);
> > +        if (kvm_cstate_in_guest(kvm)) {
> > +                vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C1_RES, MSR_TYPE_R);
> > +                vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C3_RESIDENCY, MSR_TYPE_R);
> > +                vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C6_RESIDENCY, MSR_TYPE_R);
> > +                vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C7_RESIDENCY, MSR_TYPE_R);
>
> I think I have changed my mind on the implementation of this, sorry.
>
> 1) We should emulate these MSRs always, otherwise the guest API changes
> between different values of KVM_CAP_X86_DISABLE_EXITS which is not
> intended.  Also, KVM_CAP_X86_DISABLE_EXITS does not prevent live
> migration, so it should be possible to set the MSRs in the host to
> change the delta between the host and guest values.
>
> 2) If both KVM_X86_DISABLE_EXITS_HLT and KVM_X86_DISABLE_EXITS_MWAIT are
> disabled (i.e. exit happens), the MSRs will be purely emulated.
> C3/C6/C7 residency will never increase (it will remain the value that is
> set by the host).  When the VM executes an hlt vmexit, it should save
> the current TSC.  When it comes back, the C1 residency MSR should be
> increased by the time that has passed.
>
> 3) If KVM_X86_DISABLE_EXITS_HLT is enabled but
> KVM_X86_DISABLE_EXITS_MWAIT is disabled (i.e. mwait exits happen),
> C3/C6/C7 residency will also never increase, but the C1 residency value
> should be read using rdmsr from the host, with a delta added from the
> host value.
>
> 4) If KVM_X86_DISABLE_EXITS_HLT and KVM_X86_DISABLE_EXITS_MWAIT are both
> enabled (i.e. mwait exits do not happen), all four residency values
> should be read using rdmsr from the host, with a delta added from the
> host value.
>
> 5) If KVM_X86_DISABLE_EXITS_HLT is disabled and
> KVM_X86_DISABLE_EXITS_MWAIT is enabled, the configuration makes no sense
> so it's okay not to be very optimized.
> In this case, the residency
> value should be read as in (4), but hlt vmexits will be accounted as in
> (2) so we need to be careful not to double-count the residency during
> hlt.  This means doing four rdmsr before the beginning of the hlt vmexit
> and four at the end of the hlt vmexit.

MSR_CORE_C1_RES is unreadable except on Atom platforms, so I think we
can avoid the complex logic to handle C1 for now. :)

Regards,
Wanpeng Li

>
> Therefore the data structure should be something like
>
> struct kvm_residency_msr {
>         u16 index;              /* host MSR backing this counter, 0 until set up */
>         u64 value;              /* guest value, or guest-minus-host delta */
>         bool delta_from_host;   /* guest reading = value + host reading */
>         bool count_with_host;   /* host counter advances while guest idles */
> };
>
> u64 kvm_residency_read_host(struct kvm_residency_msr *msr)
> {
>         u64 unscaled_value;
>
>         /* rdmsrl() stores into its second argument, it does not return */
>         rdmsrl(msr->index, unscaled_value);
>         // apply TSC scaling...
>         return ...
> }
>
> u64 kvm_residency_read(struct kvm_residency_msr *msr)
> {
>         return msr->value +
>                 (msr->delta_from_host ? kvm_residency_read_host(msr) : 0);
> }
>
> void kvm_residency_write(struct kvm_residency_msr *msr,
>                          u64 value)
> {
>         msr->value = value -
>                 (msr->delta_from_host ? kvm_residency_read_host(msr) : 0);
> }
>
> // count_with_host is true for C1 iff any of KVM_X86_DISABLE_EXITS_HLT
> // or KVM_X86_DISABLE_EXITS_MWAIT is set
> // count_with_host is true for C3/C6/C7 iff KVM_X86_DISABLE_EXITS_MWAIT
> // is set
> void kvm_residency_setup(struct kvm_residency_msr *msr, u16 index,
>                          bool count_with_host)
> {
>         /* Preserve value on calls after the first */
>         u64 value = msr->index ? kvm_residency_read(msr) : 0;
>         msr->delta_from_host = msr->count_with_host = count_with_host;
>         msr->index = index;
>         kvm_residency_write(msr, value);
> }
>
> // The following functions are called from hlt vmexits.
>
> void kvm_residency_start_hlt(struct kvm_residency_msr *msr)
> {
>         if (msr->count_with_host) {
>                 /* Outside hlt the value tracks the host counter */
>                 WARN_ON(!msr->delta_from_host);
>                 msr->value += kvm_residency_read_host(msr);
>                 msr->delta_from_host = false;
>         }
> }
>
> // host_tsc_waited is 0 except for MSR_CORE_C1_RES
> void kvm_residency_end_hlt(struct kvm_residency_msr *msr,
>                            u64 host_tsc_waited)
> {
>         if (msr->count_with_host) {
>                 /* start_hlt froze the value, so no delta is applied here */
>                 WARN_ON(msr->delta_from_host);
>                 msr->value -= kvm_residency_read_host(msr);
>                 msr->delta_from_host = true;
>         }
>         if (host_tsc_waited) {
>                 // ... apply TSC scaling to host_tsc_waited ...
>                 msr->value += ...;
>         }
> }
>
> Thanks,
>
> Paolo
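
A self-contained user-space sketch of the bookkeeping in the quoted
proposal may make the hlt accounting easier to follow. It is not kernel
code and makes several assumptions: fake_host_counter is a hypothetical
stand-in for reading the host MSR with rdmsr, the struct and function
names are shortened for illustration, TSC scaling is taken to be 1:1,
and assert() plays the role of WARN_ON.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the host residency MSR (a real host would rdmsr). */
uint64_t fake_host_counter;

struct residency_msr {
        uint64_t value;       /* guest value, or guest-minus-host offset */
        bool delta_from_host; /* true: guest reading = value + host counter */
        bool count_with_host; /* true: host counter advances while guest idles */
};

uint64_t residency_read(const struct residency_msr *msr)
{
        return msr->value + (msr->delta_from_host ? fake_host_counter : 0);
}

void residency_write(struct residency_msr *msr, uint64_t value)
{
        /* Unsigned wrap-around is intended: value may hold a negative offset. */
        msr->value = value - (msr->delta_from_host ? fake_host_counter : 0);
}

/* hlt vmexit begins: freeze the guest-visible value. */
void residency_start_hlt(struct residency_msr *msr)
{
        if (msr->count_with_host) {
                assert(msr->delta_from_host);
                msr->value += fake_host_counter;
                msr->delta_from_host = false;
        }
}

/* hlt vmexit ends: track the host again, then credit the time waited. */
void residency_end_hlt(struct residency_msr *msr, uint64_t waited)
{
        if (msr->count_with_host) {
                assert(!msr->delta_from_host);
                msr->value -= fake_host_counter;
                msr->delta_from_host = true;
        }
        msr->value += waited; /* TSC scaling omitted: 1:1 assumed */
}

/* Walk one hlt cycle and return what the guest finally reads. */
uint64_t residency_demo(void)
{
        struct residency_msr msr = { 0, true, true };

        fake_host_counter = 1000;
        residency_write(&msr, 0);      /* guest counter starts at 0 */
        fake_host_counter = 1500;      /* 500 ticks while the guest runs */
        residency_start_hlt(&msr);
        fake_host_counter = 1800;      /* 300 host ticks during the vmexit */
        residency_end_hlt(&msr, 200);  /* only 200 are credited to the guest */
        return residency_read(&msr);   /* 500 + 200 */
}
```

In this walk-through the host counter advances by 300 while the vCPU
sits in the hlt vmexit, but the guest-visible counter moves only by the
200 ticks passed in as the waited time. That appears to be the
double-counting protection case (5) asks for.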