linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Maxim Levitsky <mlevitsk@redhat.com>
To: Nicolas Saenz Julienne <nsaenz@amazon.com>, kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org,
	pbonzini@redhat.com, seanjc@google.com, vkuznets@redhat.com,
	anelkz@amazon.com, graf@amazon.com, dwmw@amazon.co.uk,
	jgowans@amazon.com, corbert@lwn.net, kys@microsoft.com,
	haiyangz@microsoft.com, decui@microsoft.com, x86@kernel.org,
	linux-doc@vger.kernel.org
Subject: Re: [RFC 13/33] KVM: Allow polling vCPUs for events
Date: Tue, 28 Nov 2023 09:30:53 +0200	[thread overview]
Message-ID: <433aa3b530bcf92b4b843153d6e3919cdb623308.camel@redhat.com> (raw)
In-Reply-To: <20231108111806.92604-14-nsaenz@amazon.com>

On Wed, 2023-11-08 at 11:17 +0000, Nicolas Saenz Julienne wrote:
> A number of use cases have surfaced where it'd be beneficial to have a
> vCPU stop its execution in user-space, as opposed to having it sleep
> in-kernel. Be it in order to make better use of the pCPU's time while
> the vCPU is halted, or to implement security features like Hyper-V's
> VSM.


> 
> A problem with this approach is that user-space has no way of knowing
> whether the vCPU has pending events (interrupts, timers, etc...), so we
> need a new interface to query if they are. poll() turned out to be a
> very good fit.
> 
> So enable polling vCPUs. The poll() interface considers a vCPU has a
> pending event if it didn't enter the guest since being kicked by an
> event source (being kicked forces a guest exit). Kicking a vCPU that has
> pollers wakes up the polling threads.
> 
> NOTES:
>  - There is a race between the 'vcpu->kicked' check in the polling
>    thread and the vCPU thread re-entering the guest. This hardly affects
>    the use-cases stated above, but needs to be fixed.
> 
>  - This was tested alongside a WIP Hyper-V Virtual Trust Level
>    implementation which makes ample use of the poll() interface.
> 
> Signed-off-by: Nicolas Saenz Julienne <nsaenz@amazon.com>
> ---
>  arch/x86/kvm/x86.c       |  2 ++
>  include/linux/kvm_host.h |  2 ++
>  virt/kvm/kvm_main.c      | 30 ++++++++++++++++++++++++++++++
>  3 files changed, 34 insertions(+)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 57f9c58e1e32..bf4891bc044e 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10788,6 +10788,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  		goto cancel_injection;
>  	}
>  
> +	WRITE_ONCE(vcpu->kicked, false);
> +
>  	if (req_immediate_exit) {
>  		kvm_make_request(KVM_REQ_EVENT, vcpu);
>  		static_call(kvm_x86_request_immediate_exit)(vcpu);
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 687589ce9f63..71e1e8cf8936 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -336,6 +336,7 @@ struct kvm_vcpu {
>  #endif
>  	int mode;
>  	u64 requests;
> +	bool kicked;
>  	unsigned long guest_debug;
>  
>  	struct mutex mutex;
> @@ -395,6 +396,7 @@ struct kvm_vcpu {
>  	 */
>  	struct kvm_memory_slot *last_used_slot;
>  	u64 last_used_slot_gen;
> +	wait_queue_head_t wqh;
>  };
>  
>  /*
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index ad9aab898a0c..fde004a0ac46 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -497,12 +497,14 @@ static void kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
>  	kvm_vcpu_set_dy_eligible(vcpu, false);
>  	vcpu->preempted = false;
>  	vcpu->ready = false;
> +	vcpu->kicked = false;
>  	preempt_notifier_init(&vcpu->preempt_notifier, &kvm_preempt_ops);
>  	vcpu->last_used_slot = NULL;
>  
>  	/* Fill the stats id string for the vcpu */
>  	snprintf(vcpu->stats_id, sizeof(vcpu->stats_id), "kvm-%d/vcpu-%d",
>  		 task_pid_nr(current), id);
> +	init_waitqueue_head(&vcpu->wqh);
>  }
>  
>  static void kvm_vcpu_destroy(struct kvm_vcpu *vcpu)
> @@ -3970,6 +3972,10 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
>  		if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
>  			smp_send_reschedule(cpu);
>  	}
> +
> +	if (!cmpxchg(&vcpu->kicked, false, true))
> +		wake_up_interruptible(&vcpu->wqh);
> +
>  out:
>  	put_cpu();
>  }
> @@ -4174,6 +4180,29 @@ static int kvm_vcpu_mmap(struct file *file, struct vm_area_struct *vma)
>  	return 0;
>  }
>  
> +static __poll_t kvm_vcpu_poll(struct file *file, poll_table *wait)
> +{
> +	struct kvm_vcpu *vcpu = file->private_data;
> +
> +	poll_wait(file, &vcpu->wqh, wait);
> +
> +	/*
> +	 * Make sure we read vcpu->kicked after adding the vcpu into
> +	 * the waitqueue list. Otherwise we might have the following race:
> +	 *
> +	 *   READ_ONCE(vcpu->kicked)
> +	 *					cmpxchg(&vcpu->kicked, false, true))
> +	 *					wake_up_interruptible(&vcpu->wqh)
> +	 *   list_add_tail(wait, &vcpu->wqh)
> +	 */
> +	smp_mb();
> +	if (READ_ONCE(vcpu->kicked)) {
> +		return EPOLLIN;
> +	}
> +
> +	return 0;
> +}
> +
>  static int kvm_vcpu_release(struct inode *inode, struct file *filp)
>  {
>  	struct kvm_vcpu *vcpu = filp->private_data;
> @@ -4186,6 +4215,7 @@ static const struct file_operations kvm_vcpu_fops = {
>  	.release        = kvm_vcpu_release,
>  	.unlocked_ioctl = kvm_vcpu_ioctl,
>  	.mmap           = kvm_vcpu_mmap,
> +	.poll		= kvm_vcpu_poll,
>  	.llseek		= noop_llseek,
>  	KVM_COMPAT(kvm_vcpu_compat_ioctl),
>  };



A few ideas on the design:

I think that we can do this in a simpler way.


I am thinking about the following API:

-> vCPU does vtlcall and KVM exits to the userspace.

-> The userspace sets the vCPU to the new MP runstate (KVM_MP_STATE_HALTED_USERSPACE) which is just like regular halt
but once vCPU is ready to run, the KVM instead exits to userspace.


-> The userspace does another KVM_RUN which blocks till an event comes and exits back to userspace.

-> The userspace can now decide what to do, and it might for example send signal to vCPU thread which runs VTL0,
to kick it out of the guest mode, and resume the VTL1 vCPU.


Best regards,
	Maxim Levitsky 







  reply	other threads:[~2023-11-28  7:31 UTC|newest]

Thread overview: 108+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-08 11:17 [RFC 0/33] KVM: x86: hyperv: Introduce VSM support Nicolas Saenz Julienne
2023-11-08 11:17 ` [RFC 01/33] KVM: x86: Decouple lapic.h from hyperv.h Nicolas Saenz Julienne
2023-11-08 16:11   ` Sean Christopherson
2023-11-08 11:17 ` [RFC 02/33] KVM: x86: Introduce KVM_CAP_APIC_ID_GROUPS Nicolas Saenz Julienne
2023-11-08 12:11   ` Alexander Graf
2023-11-08 17:47   ` Sean Christopherson
2023-11-10 18:46     ` Nicolas Saenz Julienne
2023-11-28  6:56   ` Maxim Levitsky
2023-12-01 15:25     ` Nicolas Saenz Julienne
2023-11-08 11:17 ` [RFC 03/33] KVM: x86: hyper-v: Introduce XMM output support Nicolas Saenz Julienne
2023-11-08 11:44   ` Alexander Graf
2023-11-08 12:11     ` Vitaly Kuznetsov
2023-11-08 12:16       ` Alexander Graf
2023-11-28  6:57         ` Maxim Levitsky
2023-11-08 11:17 ` [RFC 04/33] KVM: x86: hyper-v: Move hypercall page handling into separate function Nicolas Saenz Julienne
2023-11-28  7:01   ` Maxim Levitsky
2023-11-08 11:17 ` [RFC 05/33] KVM: x86: hyper-v: Introduce VTL call/return prologues in hypercall page Nicolas Saenz Julienne
2023-11-08 11:53   ` Alexander Graf
2023-11-08 14:10     ` Nicolas Saenz Julienne
2023-11-28  7:08   ` Maxim Levitsky
2023-11-28 16:33     ` Sean Christopherson
2023-12-01 16:19     ` Nicolas Saenz Julienne
2023-12-01 16:32       ` Sean Christopherson
2023-12-01 16:50         ` Nicolas Saenz Julienne
2023-12-01 17:47           ` Sean Christopherson
2023-12-01 18:15             ` Nicolas Saenz Julienne
2023-12-05 19:21               ` Sean Christopherson
2023-12-05 20:04                 ` Maxim Levitsky
2023-12-06  0:07                   ` Sean Christopherson
2023-12-06 16:19                     ` Maxim Levitsky
2023-11-08 11:17 ` [RFC 06/33] KVM: x86: hyper-v: Introduce VTL awareness to Hyper-V's PV-IPIs Nicolas Saenz Julienne
2023-11-28  7:14   ` Maxim Levitsky
2023-12-01 16:31     ` Nicolas Saenz Julienne
2023-12-05 15:02       ` Maxim Levitsky
2023-11-08 11:17 ` [RFC 07/33] KVM: x86: hyper-v: Introduce KVM_CAP_HYPERV_VSM Nicolas Saenz Julienne
2023-11-28  7:16   ` Maxim Levitsky
2023-11-08 11:17 ` [RFC 08/33] KVM: x86: Don't use hv_timer if CAP_HYPERV_VSM enabled Nicolas Saenz Julienne
2023-11-28  7:21   ` Maxim Levitsky
2023-11-08 11:17 ` [RFC 09/33] KVM: x86: hyper-v: Introduce per-VTL vcpu helpers Nicolas Saenz Julienne
2023-11-08 12:21   ` Alexander Graf
2023-11-08 14:04     ` Nicolas Saenz Julienne
2023-11-28  7:25   ` Maxim Levitsky
2023-11-08 11:17 ` [RFC 10/33] KVM: x86: hyper-v: Introduce KVM_HV_GET_VSM_STATE Nicolas Saenz Julienne
2023-11-28  7:26   ` Maxim Levitsky
2023-11-08 11:17 ` [RFC 11/33] KVM: x86: hyper-v: Handle GET/SET_VP_REGISTER hcall in user-space Nicolas Saenz Julienne
2023-11-08 12:14   ` Alexander Graf
2023-11-28  7:26     ` Maxim Levitsky
2023-11-08 11:17 ` [RFC 12/33] KVM: x86: hyper-v: Handle VSM hcalls " Nicolas Saenz Julienne
2023-11-28  7:28   ` Maxim Levitsky
2023-11-08 11:17 ` [RFC 13/33] KVM: Allow polling vCPUs for events Nicolas Saenz Julienne
2023-11-28  7:30   ` Maxim Levitsky [this message]
2023-11-08 11:17 ` [RFC 14/33] KVM: x86: Add VTL to the MMU role Nicolas Saenz Julienne
2023-11-08 17:26   ` Sean Christopherson
2023-11-10 18:52     ` Nicolas Saenz Julienne
2023-11-28  7:34       ` Maxim Levitsky
2023-11-08 11:17 ` [RFC 15/33] KVM: x86/mmu: Introduce infrastructure to handle non-executable faults Nicolas Saenz Julienne
2023-11-28  7:34   ` Maxim Levitsky
2023-11-08 11:17 ` [RFC 16/33] KVM: x86/mmu: Expose R/W/X flags during memory fault exits Nicolas Saenz Julienne
2023-11-28  7:36   ` Maxim Levitsky
2023-11-28 16:31     ` Sean Christopherson
2023-11-08 11:17 ` [RFC 17/33] KVM: x86/mmu: Allow setting memory attributes if VSM enabled Nicolas Saenz Julienne
2023-11-28  7:39   ` Maxim Levitsky
2023-11-08 11:17 ` [RFC 18/33] KVM: x86: Decouple kvm_get_memory_attributes() from struct kvm's mem_attr_array Nicolas Saenz Julienne
2023-11-08 16:59   ` Sean Christopherson
2023-11-28  7:41   ` Maxim Levitsky
2023-11-08 11:17 ` [RFC 19/33] KVM: x86: Decouple kvm_range_has_memory_attributes() " Nicolas Saenz Julienne
2023-11-28  7:42   ` Maxim Levitsky
2023-11-08 11:17 ` [RFC 20/33] KVM: x86/mmu: Decouple hugepage_has_attrs() " Nicolas Saenz Julienne
2023-11-28  7:43   ` Maxim Levitsky
2023-11-08 11:17 ` [RFC 21/33] KVM: Pass memory attribute array as a MMU notifier argument Nicolas Saenz Julienne
2023-11-08 17:08   ` Sean Christopherson
2023-11-08 11:17 ` [RFC 22/33] KVM: Decouple kvm_ioctl_set_mem_attributes() from kvm's mem_attr_array Nicolas Saenz Julienne
2023-11-08 11:17 ` [RFC 23/33] KVM: Expose memory attribute helper functions unanimously Nicolas Saenz Julienne
2023-11-08 11:17 ` [RFC 24/33] KVM: x86: hyper-v: Introduce KVM VTL device Nicolas Saenz Julienne
2023-11-08 11:17 ` [RFC 25/33] KVM: Introduce a set of new memory attributes Nicolas Saenz Julienne
2023-11-08 12:30   ` Alexander Graf
2023-11-08 16:43     ` Sean Christopherson
2023-11-08 11:17 ` [RFC 26/33] KVM: x86: hyper-vsm: Allow setting per-VTL " Nicolas Saenz Julienne
2023-11-28  7:44   ` Maxim Levitsky
2023-11-08 11:18 ` [RFC 27/33] KVM: x86/mmu/hyper-v: Validate memory faults against per-VTL memprots Nicolas Saenz Julienne
2023-11-28  7:46   ` Maxim Levitsky
2023-11-08 11:18 ` [RFC 28/33] x86/hyper-v: Introduce memory intercept message structure Nicolas Saenz Julienne
2023-11-28  7:53   ` Maxim Levitsky
2023-11-08 11:18 ` [RFC 29/33] KVM: VMX: Save instruction length on EPT violation Nicolas Saenz Julienne
2023-11-08 12:40   ` Alexander Graf
2023-11-08 16:15     ` Sean Christopherson
2023-11-08 17:11       ` Alexander Graf
2023-11-08 17:20   ` Sean Christopherson
2023-11-08 17:27     ` Alexander Graf
2023-11-08 18:19       ` Jim Mattson
2023-11-08 11:18 ` [RFC 30/33] KVM: x86: hyper-v: Introduce KVM_REQ_HV_INJECT_INTERCEPT request Nicolas Saenz Julienne
2023-11-08 12:45   ` Alexander Graf
2023-11-08 13:38     ` Nicolas Saenz Julienne
2023-11-28  8:19       ` Maxim Levitsky
2023-11-08 11:18 ` [RFC 31/33] KVM: x86: hyper-v: Inject intercept on VTL memory protection fault Nicolas Saenz Julienne
2023-11-08 11:18 ` [RFC 32/33] KVM: x86: hyper-v: Implement HVCALL_TRANSLATE_VIRTUAL_ADDRESS Nicolas Saenz Julienne
2023-11-08 12:49   ` Alexander Graf
2023-11-08 13:44     ` Nicolas Saenz Julienne
2023-11-08 11:18 ` [RFC 33/33] Documentation: KVM: Introduce "Emulating Hyper-V VSM with KVM" Nicolas Saenz Julienne
2023-11-28  8:19   ` Maxim Levitsky
2023-11-08 11:40 ` [RFC 0/33] KVM: x86: hyperv: Introduce VSM support Alexander Graf
2023-11-08 14:41   ` Nicolas Saenz Julienne
2023-11-08 16:55 ` Sean Christopherson
2023-11-08 18:33   ` Sean Christopherson
2023-11-10 17:56     ` Nicolas Saenz Julienne
2023-11-10 19:32       ` Sean Christopherson
2023-11-11 11:55         ` Nicolas Saenz Julienne
2023-11-10 19:04   ` Nicolas Saenz Julienne

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=433aa3b530bcf92b4b843153d6e3919cdb623308.camel@redhat.com \
    --to=mlevitsk@redhat.com \
    --cc=anelkz@amazon.com \
    --cc=corbert@lwn.net \
    --cc=decui@microsoft.com \
    --cc=dwmw@amazon.co.uk \
    --cc=graf@amazon.com \
    --cc=haiyangz@microsoft.com \
    --cc=jgowans@amazon.com \
    --cc=kvm@vger.kernel.org \
    --cc=kys@microsoft.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nsaenz@amazon.com \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    --cc=vkuznets@redhat.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).