KVM Archive on lore.kernel.org
 help / color / Atom feed
From: Alexander Graf <graf@amazon.com>
To: Vitaly Kuznetsov <vkuznets@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>,
	Sean Christopherson <sean.j.christopherson@intel.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	"Jim Mattson" <jmattson@google.com>,
	Joerg Roedel <joro@8bytes.org>,
	"KarimAllah Raslan" <karahmed@amazon.de>,
	Aaron Lewis <aaronlewis@google.com>, <kvm@vger.kernel.org>,
	<linux-doc@vger.kernel.org>, <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v3 1/3] KVM: x86: Deflect unknown MSR accesses to user space
Date: Mon, 3 Aug 2020 13:34:28 +0200
Message-ID: <4155720f-73ce-0857-134f-52a1f09d15cd@amazon.com> (raw)
In-Reply-To: <873654q89j.fsf@vitty.brq.redhat.com>



On 03.08.20 13:27, Vitaly Kuznetsov wrote:
> Alexander Graf <graf@amazon.com> writes:
> 
>> MSRs are weird. Some of them are normal control registers, such as EFER.
>> Some however are registers that really are model specific, not very
>> interesting to virtualization workloads, and not performance critical.
>> Others again are really just windows into package configuration.
>>
>> Out of these MSRs, only the first category is necessary to implement in
>> kernel space. Rarely accessed MSRs, MSRs that should be fine tunes against
>> certain CPU models and MSRs that contain information on the package level
>> are much better suited for user space to process. However, over time we have
>> accumulated a lot of MSRs that are not the first category, but still handled
>> by in-kernel KVM code.
>>
>> This patch adds a generic interface to handle WRMSR and RDMSR from user
>> space. With this, any future MSR that is part of the latter categories can
>> be handled in user space.
>>
>> Furthermore, it allows us to replace the existing "ignore_msrs" logic with
>> something that applies per-VM rather than on the full system. That way you
>> can run productive VMs in parallel to experimental ones where you don't care
>> about proper MSR handling.
>>
>> Signed-off-by: Alexander Graf <graf@amazon.com>
>>
>> ---
>>
>> v1 -> v2:
>>
>>    - s/ETRAP_TO_USER_SPACE/ENOENT/g
>>    - deflect all #GP injection events to user space, not just unknown MSRs.
>>      That was we can also deflect allowlist errors later
>>    - fix emulator case
>>
>> v2 -> v3:
>>
>>    - return r if r == X86EMUL_IO_NEEDED
>>    - s/KVM_EXIT_RDMSR/KVM_EXIT_X86_RDMSR/g
>>    - s/KVM_EXIT_WRMSR/KVM_EXIT_X86_WRMSR/g
>>    - Use complete_userspace_io logic instead of reply field
>>    - Simplify trapping code
>> ---
>>   Documentation/virt/kvm/api.rst  |  62 +++++++++++++++++++
>>   arch/x86/include/asm/kvm_host.h |   6 ++
>>   arch/x86/kvm/emulate.c          |  18 +++++-
>>   arch/x86/kvm/x86.c              | 106 ++++++++++++++++++++++++++++++--
>>   include/trace/events/kvm.h      |   2 +-
>>   include/uapi/linux/kvm.h        |  10 +++
>>   6 files changed, 197 insertions(+), 7 deletions(-)
>>
>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>> index 320788f81a05..79c3e2fdfae4 100644
>> --- a/Documentation/virt/kvm/api.rst
>> +++ b/Documentation/virt/kvm/api.rst
>> @@ -5155,6 +5155,35 @@ Note that KVM does not skip the faulting instruction as it does for
>>   KVM_EXIT_MMIO, but userspace has to emulate any change to the processing state
>>   if it decides to decode and emulate the instruction.
>>
>> +::
>> +
>> +             /* KVM_EXIT_X86_RDMSR / KVM_EXIT_X86_WRMSR */
>> +             struct {
>> +                     __u8 error;
>> +                     __u8 pad[3];
>> +                     __u32 index;
>> +                     __u64 data;
>> +             } msr;
>> +
>> +Used on x86 systems. When the VM capability KVM_CAP_X86_USER_SPACE_MSR is
>> +enabled, MSR accesses to registers that would invoke a #GP by KVM kernel code
>> +will instead trigger a KVM_EXIT_X86_RDMSR exit for reads and KVM_EXIT_X86_WRMSR
>> +exit for writes.
>> +
>> +For KVM_EXIT_X86_RDMSR, the "index" field tells user space which MSR the guest
>> +wants to read. To respond to this request with a successful read, user space
>> +writes the respective data into the "data" field and must continue guest
>> +execution to ensure the read data is transferred into guest register state.
>> +
>> +If the RDMSR request was unsuccessful, user space indicates that with a "1" in
>> +the "error" field. This will inject a #GP into the guest when the VCPU is
>> +executed again.
>> +
>> +For KVM_EXIT_X86_WRMSR, the "index" field tells user space which MSR the guest
>> +wants to write. Once finished processing the event, user space must continue
>> +vCPU execution. If the MSR write was unsuccessful, user space also sets the
>> +"error" field to "1".
>> +
>>   ::
>>
>>                /* Fix the size of the union. */
>> @@ -5844,6 +5873,28 @@ controlled by the kvm module parameter halt_poll_ns. This capability allows
>>   the maximum halt time to specified on a per-VM basis, effectively overriding
>>   the module parameter for the target VM.
>>
>> +7.21 KVM_CAP_X86_USER_SPACE_MSR
>> +-------------------------------
>> +
>> +:Architectures: x86
>> +:Target: VM
>> +:Parameters: args[0] is 1 if user space MSR handling is enabled, 0 otherwise
>> +:Returns: 0 on success; -1 on error
>> +
>> +This capability enables trapping of #GP invoking RDMSR and WRMSR instructions
>> +into user space.
>> +
>> +When a guest requests to read or write an MSR, KVM may not implement all MSRs
>> +that are relevant to a respective system. It also does not differentiate by
>> +CPU type.
>> +
>> +To allow more fine grained control over MSR handling, user space may enable
>> +this capability. With it enabled, MSR accesses that would usually trigger
>> +a #GP event inside the guest by KVM will instead trigger KVM_EXIT_X86_RDMSR
>> +and KVM_EXIT_X86_WRMSR exit notifications which user space can then handle to
>> +implement model specific MSR handling and/or user notifications to inform
>> +a user that an MSR was not handled.
>> +
>>   8. Other capabilities.
>>   ======================
>>
>> @@ -6151,3 +6202,14 @@ KVM can therefore start protected VMs.
>>   This capability governs the KVM_S390_PV_COMMAND ioctl and the
>>   KVM_MP_STATE_LOAD MP_STATE. KVM_SET_MP_STATE can fail for protected
>>   guests when the state change is invalid.
>> +
>> +8.24 KVM_CAP_X86_USER_SPACE_MSR
>> +----------------------------
>> +
>> +:Architectures: x86
>> +
>> +This capability indicates that KVM supports deflection of MSR reads and
>> +writes to user space. It can be enabled on a VM level. If enabled, MSR
>> +accesses that would usually trigger a #GP by KVM into the guest will
>> +instead get bounced to user space through the KVM_EXIT_X86_RDMSR and
>> +KVM_EXIT_X86_WRMSR exit notifications.
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index be5363b21540..809eed0dbdea 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -829,6 +829,9 @@ struct kvm_vcpu_arch {
>>
>>        /* AMD MSRC001_0015 Hardware Configuration */
>>        u64 msr_hwcr;
>> +
>> +     /* User space is handling an MSR request */
>> +     bool pending_user_msr;
>>   };
>>
>>   struct kvm_lpage_info {
>> @@ -1002,6 +1005,9 @@ struct kvm_arch {
>>        bool guest_can_read_msr_platform_info;
>>        bool exception_payload_enabled;
>>
>> +     /* Deflect RDMSR and WRMSR to user space when they trigger a #GP */
>> +     bool user_space_msr_enabled;
>> +
>>        struct kvm_pmu_event_filter *pmu_event_filter;
>>        struct task_struct *nx_lpage_recovery_thread;
>>   };
>> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
>> index d0e2825ae617..744ab9c92b73 100644
>> --- a/arch/x86/kvm/emulate.c
>> +++ b/arch/x86/kvm/emulate.c
>> @@ -3689,11 +3689,18 @@ static int em_dr_write(struct x86_emulate_ctxt *ctxt)
>>
>>   static int em_wrmsr(struct x86_emulate_ctxt *ctxt)
>>   {
>> +     u64 msr_index = reg_read(ctxt, VCPU_REGS_RCX);
>>        u64 msr_data;
>> +     int r;
>>
>>        msr_data = (u32)reg_read(ctxt, VCPU_REGS_RAX)
>>                | ((u64)reg_read(ctxt, VCPU_REGS_RDX) << 32);
>> -     if (ctxt->ops->set_msr(ctxt, reg_read(ctxt, VCPU_REGS_RCX), msr_data))
>> +     r = ctxt->ops->set_msr(ctxt, msr_index, msr_data);
>> +
>> +     if (r == X86EMUL_IO_NEEDED)
>> +             return r;
>> +
>> +     if (r)
>>                return emulate_gp(ctxt, 0);
>>
>>        return X86EMUL_CONTINUE;
>> @@ -3701,9 +3708,16 @@ static int em_wrmsr(struct x86_emulate_ctxt *ctxt)
>>
>>   static int em_rdmsr(struct x86_emulate_ctxt *ctxt)
>>   {
>> +     u64 msr_index = reg_read(ctxt, VCPU_REGS_RCX);
>>        u64 msr_data;
>> +     int r;
>> +
>> +     r = ctxt->ops->get_msr(ctxt, msr_index, &msr_data);
>> +
>> +     if (r == X86EMUL_IO_NEEDED)
>> +             return r;
>>
>> -     if (ctxt->ops->get_msr(ctxt, reg_read(ctxt, VCPU_REGS_RCX), &msr_data))
>> +     if (r)
>>                return emulate_gp(ctxt, 0);
>>
>>        *reg_write(ctxt, VCPU_REGS_RAX) = (u32)msr_data;
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 88c593f83b28..24c72250f6df 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -1549,12 +1549,75 @@ int kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data)
>>   }
>>   EXPORT_SYMBOL_GPL(kvm_set_msr);
>>
>> +static int complete_emulated_msr(struct kvm_vcpu *vcpu, bool is_read)
>> +{
>> +     BUG_ON(!vcpu->arch.pending_user_msr);
>> +
>> +     if (vcpu->run->msr.error) {
>> +             kvm_inject_gp(vcpu, 0);
>> +     } else if (is_read) {
>> +             kvm_rax_write(vcpu, (u32)vcpu->run->msr.data);
>> +             kvm_rdx_write(vcpu, vcpu->run->msr.data >> 32);
>> +     }
>> +
>> +     return kvm_skip_emulated_instruction(vcpu);
>> +}
>> +
>> +static int complete_emulated_rdmsr(struct kvm_vcpu *vcpu)
>> +{
>> +     return complete_emulated_msr(vcpu, true);
>> +}
>> +
>> +static int complete_emulated_wrmsr(struct kvm_vcpu *vcpu)
>> +{
>> +     return complete_emulated_msr(vcpu, false);
>> +}
>> +
>> +static int kvm_get_msr_user_space(struct kvm_vcpu *vcpu, u32 index)
>> +{
>> +     if (!vcpu->kvm->arch.user_space_msr_enabled)
>> +             return 0;
>> +
>> +     vcpu->run->exit_reason = KVM_EXIT_X86_RDMSR;
>> +     vcpu->run->msr.error = 0;
>> +     vcpu->run->msr.index = index;
>> +     vcpu->arch.pending_user_msr = true;
>> +     vcpu->arch.complete_userspace_io = complete_emulated_rdmsr;
>> +
>> +     return 1;
>> +}
>> +
>> +static int kvm_set_msr_user_space(struct kvm_vcpu *vcpu, u32 index, u64 data)
>> +{
>> +     if (!vcpu->kvm->arch.user_space_msr_enabled)
>> +             return 0;
>> +
>> +     vcpu->run->exit_reason = KVM_EXIT_X86_WRMSR;
>> +     vcpu->run->msr.error = 0;
>> +     vcpu->run->msr.index = index;
>> +     vcpu->run->msr.data = data;
>> +     vcpu->arch.pending_user_msr = true;
>> +     vcpu->arch.complete_userspace_io = complete_emulated_wrmsr;
> 
> I'm probably missing something but where do we reset
> vcpu->arch.pending_user_msr? Shouldn't it be done in
> complete_emulated_msr()?

It's even worse than that: We don't need it at all. I'll remove it for v4.


Alex



Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879




  reply index

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-31 21:49 [PATCH v3 0/3] Allow user space to restrict and augment MSR emulation Alexander Graf
2020-07-31 21:49 ` [PATCH v3 1/3] KVM: x86: Deflect unknown MSR accesses to user space Alexander Graf
2020-07-31 23:36   ` Jim Mattson
2020-08-03 10:08     ` Alexander Graf
2020-08-03 11:27   ` Vitaly Kuznetsov
2020-08-03 11:34     ` Alexander Graf [this message]
2020-07-31 21:49 ` [PATCH v3 2/3] KVM: x86: Introduce allow list for MSR emulation Alexander Graf
2020-08-03 11:37   ` Vitaly Kuznetsov
2020-08-03 20:50     ` Alexander Graf
2020-08-03 21:23       ` Sean Christopherson
2020-07-31 21:49 ` [PATCH v3 3/3] KVM: selftests: Add test for user space MSR handling Alexander Graf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4155720f-73ce-0857-134f-52a1f09d15cd@amazon.com \
    --to=graf@amazon.com \
    --cc=aaronlewis@google.com \
    --cc=corbet@lwn.net \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=karahmed@amazon.de \
    --cc=kvm@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=sean.j.christopherson@intel.com \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

KVM Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/kvm/0 kvm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 kvm kvm/ https://lore.kernel.org/kvm \
		kvm@vger.kernel.org
	public-inbox-index kvm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.kvm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git