linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jim Mattson <jmattson@google.com>
To: Chenyi Qiang <chenyi.qiang@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Sean Christopherson <sean.j.christopherson@intel.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Joerg Roedel <joro@8bytes.org>, Xiaoyao Li <xiaoyao.li@intel.com>,
	kvm list <kvm@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RFC 2/7] KVM: VMX: Expose IA32_PKRS MSR
Date: Wed, 12 Aug 2020 14:21:47 -0700	[thread overview]
Message-ID: <CALMp9eQiyRxJ0jkvVi+fWMZcDQbvyCcuTwH1wrYV-u_E004Bhg@mail.gmail.com> (raw)
In-Reply-To: <20200807084841.7112-3-chenyi.qiang@intel.com>

On Fri, Aug 7, 2020 at 1:46 AM Chenyi Qiang <chenyi.qiang@intel.com> wrote:
>
> Protection Keys for Supervisor Pages (PKS) uses IA32_PKRS MSR (PKRS) at
> index 0x6E1 to allow software to manage supervisor protection key
> rights. For performance consideration, PKRS intercept will be disabled
> so that the guest can access the PKRS without VM exits.
> PKS introduces dedicated control fields in VMCS to switch PKRS, which
> only does the retore part. In addition, every VM exit saves PKRS into
> the guest-state area in VMCS, while VM enter won't save the host value
> due to the expectation that the host won't change the MSR often. Update
> the host's value in VMCS manually if the MSR has been changed by the
> kernel since the last time the VMCS was run.
> The function get_current_pkrs() in arch/x86/mm/pkeys.c exports the
> per-cpu variable pkrs_cache to avoid frequent rdmsr of PKRS.
>
> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
> ---

> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 11e4df560018..df2c2e733549 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -289,6 +289,7 @@ static void vmx_sync_vmcs_host_state(struct vcpu_vmx *vmx,
>         dest->ds_sel = src->ds_sel;
>         dest->es_sel = src->es_sel;
>  #endif
> +       dest->pkrs = src->pkrs;

Why isn't this (and other PKRS code) inside the #ifdef CONFIG_X86_64?
PKRS isn't usable outside of long mode, is it?

>  }
>
>  static void vmx_switch_vmcs(struct kvm_vcpu *vcpu, struct loaded_vmcs *vmcs)
> diff --git a/arch/x86/kvm/vmx/vmcs.h b/arch/x86/kvm/vmx/vmcs.h
> index 7a3675fddec2..39ec3d0c844b 100644
> --- a/arch/x86/kvm/vmx/vmcs.h
> +++ b/arch/x86/kvm/vmx/vmcs.h
> @@ -40,6 +40,7 @@ struct vmcs_host_state {
>  #ifdef CONFIG_X86_64
>         u16           ds_sel, es_sel;
>  #endif
> +       u32           pkrs;

One thing that does seem odd throughout is that PKRS is a 64-bit
resource, but we store and pass around only 32-bits. Yes, the top 32
bits are currently reserved, but the same can be said of, say, cr4, a
few lines above this. Yet, we store and pass around cr4 as 64-bits.
How do we decide?

>  };
>
>  struct vmcs_controls_shadow {

> @@ -1163,6 +1164,20 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
>          */
>         host_state->ldt_sel = kvm_read_ldt();
>
> +       /*
> +        * Update the host pkrs vmcs field before vcpu runs.
> +        * The setting of VM_EXIT_LOAD_IA32_PKRS can ensure
> +        * kvm_cpu_cap_has(X86_FEATURE_PKS) &&
> +        * guest_cpuid_has(vcpu, X86_FEATURE_VMX).
> +        */
> +       if (vm_exit_controls_get(vmx) & VM_EXIT_LOAD_IA32_PKRS) {
> +               host_pkrs = get_current_pkrs();
> +               if (unlikely(host_pkrs != host_state->pkrs)) {
> +                       vmcs_write64(HOST_IA32_PKRS, host_pkrs);
> +                       host_state->pkrs = host_pkrs;
> +               }
> +       }
> +
>  #ifdef CONFIG_X86_64
>         savesegment(ds, host_state->ds_sel);
>         savesegment(es, host_state->es_sel);
> @@ -1951,6 +1966,13 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>                 else
>                         msr_info->data = vmx->pt_desc.guest.addr_a[index / 2];
>                 break;
> +       case MSR_IA32_PKRS:
> +               if (!kvm_cpu_cap_has(X86_FEATURE_PKS) ||
> +                   (!msr_info->host_initiated &&
> +                   !guest_cpuid_has(vcpu, X86_FEATURE_PKS)))

Could this be simplified if we required
kvm_cpu_cap_has(X86_FEATURE_PKS) as a prerequisite for
guest_cpuid_has(vcpu, X86_FEATURE_PKS)? If not, what is the expected
behavior if guest_cpuid_has(vcpu, X86_FEATURE_PKS) is true and
kvm_cpu_cap_has(X86_FEATURE_PKS)  is false?

> +                       return 1;
> +               msr_info->data = vmcs_read64(GUEST_IA32_PKRS);
> +               break;
>         case MSR_TSC_AUX:
>                 if (!msr_info->host_initiated &&
>                     !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))

> @@ -7230,6 +7267,26 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu)
>                 vmx->pt_desc.ctl_bitmask &= ~(0xfULL << (32 + i * 4));
>  }
>
> +static void vmx_update_pkrs_cfg(struct kvm_vcpu *vcpu)
> +{
> +       struct vcpu_vmx *vmx = to_vmx(vcpu);
> +       unsigned long *msr_bitmap = vmx->vmcs01.msr_bitmap;
> +       bool pks_supported = guest_cpuid_has(vcpu, X86_FEATURE_PKS);
> +
> +       /*
> +        * set intercept for PKRS when the guest doesn't support pks
> +        */
> +       vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_PKRS, MSR_TYPE_RW, !pks_supported);
> +
> +       if (pks_supported) {
> +               vm_entry_controls_setbit(vmx, VM_ENTRY_LOAD_IA32_PKRS);
> +               vm_exit_controls_setbit(vmx, VM_EXIT_LOAD_IA32_PKRS);
> +       } else {
> +               vm_entry_controls_clearbit(vmx, VM_ENTRY_LOAD_IA32_PKRS);
> +               vm_exit_controls_clearbit(vmx, VM_EXIT_LOAD_IA32_PKRS);
> +       }
> +}

Not just your change, but it is unclear to me what the expected
behavior is when CPUID bits are modified while the guest is running.
For example, it seems that this code results in immediate behavioral
changes for an L1 guest; however, if an L2 guest is active, then there
are no behavioral changes until the next emulated VM-entry from L1 to
L2. Is that right?

On a more general note, do you have any kvm-unit-tests that exercise
the new code?

  reply	other threads:[~2020-08-12 21:22 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-07  8:48 [RFC 0/7] KVM: PKS Virtualization support Chenyi Qiang
2020-08-07  8:48 ` [RFC 1/7] KVM: VMX: Introduce PKS VMCS fields Chenyi Qiang
2020-08-10 23:17   ` Jim Mattson
2020-08-07  8:48 ` [RFC 2/7] KVM: VMX: Expose IA32_PKRS MSR Chenyi Qiang
2020-08-12 21:21   ` Jim Mattson [this message]
2020-08-13  5:42     ` Chenyi Qiang
2020-08-13 17:31       ` Jim Mattson
2020-08-18  7:27         ` Chenyi Qiang
2020-08-18 18:23           ` Jim Mattson
2020-08-22  3:28             ` Sean Christopherson
2021-01-26 18:01   ` Paolo Bonzini
2021-01-27  7:55     ` Chenyi Qiang
2021-02-01  9:53     ` Chenyi Qiang
2021-02-01 10:05       ` Paolo Bonzini
2020-08-07  8:48 ` [RFC 3/7] KVM: MMU: Rename the pkru to pkr Chenyi Qiang
2021-01-26 18:16   ` Paolo Bonzini
2020-08-07  8:48 ` [RFC 4/7] KVM: MMU: Refactor pkr_mask to cache condition Chenyi Qiang
2021-01-26 18:16   ` Paolo Bonzini
2021-01-27  3:14     ` Chenyi Qiang
2020-08-07  8:48 ` [RFC 5/7] KVM: MMU: Add support for PKS emulation Chenyi Qiang
2021-01-26 18:23   ` Paolo Bonzini
2021-01-27  3:00     ` Chenyi Qiang
2021-01-27  8:37       ` Paolo Bonzini
2020-08-07  8:48 ` [RFC 6/7] KVM: X86: Expose PKS to guest and userspace Chenyi Qiang
2020-08-13 19:04   ` Jim Mattson
2020-08-14  2:33     ` Chenyi Qiang
2020-09-30  4:36     ` Sean Christopherson
2021-01-26 18:24       ` Paolo Bonzini
2021-01-26 19:56         ` Sean Christopherson
2021-01-26 20:05           ` Paolo Bonzini
2020-08-07  8:48 ` [RFC 7/7] KVM: VMX: Enable PKS for nested VM Chenyi Qiang
2020-08-11  0:05   ` Jim Mattson
2020-08-12 15:00     ` Sean Christopherson
2020-08-12 18:32       ` Jim Mattson
2020-08-13  4:52     ` Chenyi Qiang
2020-08-13 17:52       ` Jim Mattson
2020-08-14 10:07         ` Chenyi Qiang
2020-08-14 17:34           ` Jim Mattson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CALMp9eQiyRxJ0jkvVi+fWMZcDQbvyCcuTwH1wrYV-u_E004Bhg@mail.gmail.com \
    --to=jmattson@google.com \
    --cc=chenyi.qiang@intel.com \
    --cc=joro@8bytes.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=sean.j.christopherson@intel.com \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    --cc=xiaoyao.li@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).