linux-doc.vger.kernel.org archive mirror

Hey Sean,

does this patch go in the right direction?

Julian

On Wed, 2024-05-08 at 15:25 +0200, Julian Stecklina wrote:
> From: Thomas Prescher 
> 
> When a vCPU is interrupted by a signal while running a nested guest,
> KVM will exit to userspace with L2 state. However, userspace has no
> way to know whether it sees L1 or L2 state (besides calling
> KVM_GET_STATS_FD, which does not have a stable ABI).
> 
> This causes multiple problems:
> 
> The simplest one is L2 state corruption when userspace marks the sregs
> as dirty. See this mailing list thread [1] for a complete discussion.
> 
> Another problem is that if userspace decides to continue by emulating
> instructions, it will unknowingly emulate with L2 state as if L1
> doesn't exist, which can be considered a weird guest escape.
> 
> This patch introduces a new flag KVM_RUN_X86_GUEST_MODE in the kvm_run
> data structure, which is set when the vCPU exited while running a
> nested guest. Userspace can then handle this situation.
> 
> To see whether this functionality is available, this patch also
> introduces a new capability KVM_CAP_X86_GUEST_MODE.
> 
> [1]
> https://lore.kernel.org/kvm/20240416123558.212040-1-julian.stecklina@cyberus-technology.de/T/#m280aadcb2e10ae02c191a7dc4ed4b711a74b1f55
> 
> Signed-off-by: Thomas Prescher 
> Signed-off-by: Julian Stecklina 
> ---
>  Documentation/virt/kvm/api.rst  | 17 +++++++++++++++++
>  arch/x86/include/uapi/asm/kvm.h |  1 +
>  arch/x86/kvm/x86.c              |  3 +++
>  include/uapi/linux/kvm.h        |  1 +
>  4 files changed, 22 insertions(+)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 0b5a33ee71ee..7748c3eb98e0 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6419,6 +6419,9 @@ affect the device's behavior. Current defined flags::
>    #define KVM_RUN_X86_SMM     (1 << 0)
>    /* x86, set if bus lock detected in VM */
>    #define KVM_RUN_BUS_LOCK    (1 << 1)
> +  /* x86, set if the VCPU exited from a nested (L2) guest */
> +  #define KVM_RUN_X86_GUEST_MODE (1 << 2)
> +
>    /* arm64, set for KVM_EXIT_DEBUG */
>    #define KVM_DEBUG_ARCH_HSR_HIGH_VALID  (1 << 0)
>  
> @@ -8063,6 +8066,20 @@ error/annotated fault.
>  
>  See KVM_EXIT_MEMORY_FAULT for more information.
>  
> +7.34 KVM_CAP_X86_GUEST_MODE
> +------------------------------
> +
> +:Architectures: x86
> +:Returns: Informational only, -EINVAL on direct KVM_ENABLE_CAP.
> +
> +The presence of this capability indicates that KVM_RUN will update the
> +KVM_RUN_X86_GUEST_MODE bit in kvm_run.flags to indicate whether the
> +vCPU was executing nested guest code when it exited.
> +
> +KVM exits with the register state of either the L1 or L2 guest
> +depending on which executed at the time of an exit. Userspace must
> +take care to differentiate between these cases.
> +
>  8. Other capabilities.
>  ======================
>  
> diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> index ef11aa4cab42..ff4ed82a2d06 100644
> --- a/arch/x86/include/uapi/asm/kvm.h
> +++ b/arch/x86/include/uapi/asm/kvm.h
> @@ -106,6 +106,7 @@ struct kvm_ioapic_state {
>  
>  #define KVM_RUN_X86_SMM		 (1 << 0)
>  #define KVM_RUN_X86_BUS_LOCK     (1 << 1)
> +#define KVM_RUN_X86_GUEST_MODE   (1 << 2)
>  
>  /* for KVM_GET_REGS and KVM_SET_REGS */
>  struct kvm_regs {
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 91478b769af0..64f2cba9345e 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4714,6 +4714,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  	case KVM_CAP_VM_DISABLE_NX_HUGE_PAGES:
>  	case KVM_CAP_IRQFD_RESAMPLE:
>  	case KVM_CAP_MEMORY_FAULT_INFO:
> +	case KVM_CAP_X86_GUEST_MODE:
>  		r = 1;
>  		break;
>  	case KVM_CAP_EXIT_HYPERCALL:
> @@ -10200,6 +10201,8 @@ static void post_kvm_run_save(struct kvm_vcpu *vcpu)
>  
>  	if (is_smm(vcpu))
>  		kvm_run->flags |= KVM_RUN_X86_SMM;
> +	if (is_guest_mode(vcpu))
> +		kvm_run->flags |= KVM_RUN_X86_GUEST_MODE;
>  }
>  
>  static void update_cr8_intercept(struct kvm_vcpu *vcpu)
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 2190adbe3002..ccb12f6a656d 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -917,6 +917,7 @@ struct kvm_enable_cap {
>  #define KVM_CAP_MEMORY_ATTRIBUTES 233
>  #define KVM_CAP_GUEST_MEMFD 234
>  #define KVM_CAP_VM_TYPES 235
> +#define KVM_CAP_X86_GUEST_MODE 236
>  
>  struct kvm_irq_routing_irqchip {
>  	__u32 irqchip;
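
For illustration, here is a rough sketch of how userspace might consume the
new flag after KVM_RUN returns. It is not part of the patch. The constants
are copied from the diff above; the enum, the two helper functions and the
"be conservative when the capability is missing" policy are hypothetical
names and choices made up for this example. The capability itself would
presumably be probed once with
ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_X86_GUEST_MODE), which is left
out here so the snippet compiles without /dev/kvm.

```c
#include <stdint.h>

/* Values copied from the patch; newer uapi headers would provide them. */
#ifndef KVM_RUN_X86_SMM
#define KVM_RUN_X86_SMM        (1 << 0)
#endif
#ifndef KVM_RUN_X86_GUEST_MODE
#define KVM_RUN_X86_GUEST_MODE (1 << 2)
#endif

/* Hypothetical: which guest level the register state belongs to. */
enum vcpu_state_owner {
	STATE_UNKNOWN,	/* capability missing, flag carries no information */
	STATE_L1,
	STATE_L2,
};

/*
 * Hypothetical helper. cap_available is the (non-zero if supported) result
 * of the KVM_CHECK_EXTENSION query; run_flags is kvm_run->flags as seen
 * after KVM_RUN returned.
 */
static inline enum vcpu_state_owner classify_exit(int cap_available,
						  uint16_t run_flags)
{
	if (!cap_available)
		return STATE_UNKNOWN;
	return (run_flags & KVM_RUN_X86_GUEST_MODE) ? STATE_L2 : STATE_L1;
}

/*
 * Hypothetical policy: only emulate instructions or mark sregs dirty when
 * the state is known to be L1 state. Returns 1 if that is safe, else 0.
 */
static inline int may_touch_vcpu_state(int cap_available, uint16_t run_flags)
{
	return classify_exit(cap_available, run_flags) == STATE_L1;
}
```

A VMM could call may_touch_vcpu_state() right after an EINTR return from
KVM_RUN and simply re-enter the guest when it returns 0. Whether refusing
to act in the STATE_UNKNOWN case is the right default depends on the VMM.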

Re: [PATCH] KVM: x86: add KVM_RUN_X86_GUEST_MODE kvm_run flag
