All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH] KVM: race-free exit from KVM_RUN without POSIX signals
@ 2017-02-08 11:10 Paolo Bonzini
  2017-02-08 13:18 ` Christian Borntraeger
  2017-02-08 13:26 ` Radim Krčmář
  0 siblings, 2 replies; 4+ messages in thread
From: Paolo Bonzini @ 2017-02-08 11:10 UTC (permalink / raw)
  To: linux-kernel, kvm

The purpose of the KVM_SET_SIGNAL_MASK API is to let userspace "kick"
a VCPU out of KVM_RUN through a POSIX signal.  A signal is attached
to a dummy signal handler; by blocking the signal outside KVM_RUN and
unblocking it inside, this possible race is closed:

          VCPU thread                     service thread
   --------------------------------------------------------------
        check flag
                                          set flag
                                          raise signal
        (signal handler does nothing)
        KVM_RUN

However, one issue with KVM_SET_SIGNAL_MASK is that it has to take
tsk->sighand->siglock on every KVM_RUN.  This lock is often on a
remote NUMA node, because it is on the node of a thread's creator.
Taking this lock can be very expensive if there are many userspace
exits (as is the case for SMP Windows VMs without Hyper-V reference
time counter).

As an alternative, we can put the flag directly in kvm_run so that
KVM can see it:

          VCPU thread                     service thread
   --------------------------------------------------------------
                                          raise signal
        signal handler
          set run->immediate_exit
        KVM_RUN
          check run->immediate_exit

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 Documentation/virtual/kvm/api.txt | 13 ++++++++++++-
 include/uapi/linux/kvm.h          |  1 +
 virt/kvm/kvm_main.c               | 14 +++++++++++---
 3 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index e4f2cdcf78eb..925b1b6be073 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3389,7 +3389,18 @@ struct kvm_run {
 Request that KVM_RUN return when it becomes possible to inject external
 interrupts into the guest.  Useful in conjunction with KVM_INTERRUPT.
 
-	__u8 padding1[7];
+	__u8 immediate_exit;
+
+This field is polled once when KVM_RUN starts; if non-zero, KVM_RUN
+clears the field and exits immediately, returning -EINTR.  In the common
+scenario where a signal is used to "kick" a VCPU out of KVM_RUN, this
+field can be used as an alternative to KVM_SET_SIGNAL_MASK: instead of
+blocking the signal outside KVM_RUN, the signal handler should set
+run->immediate_exit to a nonzero value.
+
+This field is only available if KVM_CAP_IMMEDIATE_EXIT is.
+
+	__u8 padding1[6];
 
 	/* out */
 	__u32 exit_reason;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 7964b970b9ad..0aa43b1e5b1d 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -881,6 +881,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_SPAPR_RESIZE_HPT 133
 #define KVM_CAP_PPC_MMU_RADIX 134
 #define KVM_CAP_PPC_MMU_HASH_V3 135
+#define KVM_CAP_IMMEDIATE_EXIT 136
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 482612b4e496..5abb8df00db1 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2550,7 +2550,8 @@ static long kvm_vcpu_ioctl(struct file *filp,
 	if (r)
 		return r;
 	switch (ioctl) {
-	case KVM_RUN:
+	case KVM_RUN: {
+		struct kvm_run *run;
 		r = -EINVAL;
 		if (arg)
 			goto out;
@@ -2564,9 +2565,15 @@ static long kvm_vcpu_ioctl(struct file *filp,
 				synchronize_rcu();
 			put_pid(oldpid);
 		}
-		r = kvm_arch_vcpu_ioctl_run(vcpu, vcpu->run);
-		trace_kvm_userspace_exit(vcpu->run->exit_reason, r);
+		run = vcpu->run;
+		if (run->immediate_exit) {
+			WRITE_ONCE(run->immediate_exit, 0);
+			return -EINTR;
+		}
+		r = kvm_arch_vcpu_ioctl_run(vcpu, run);
+		trace_kvm_userspace_exit(run->exit_reason, r);
 		break;
+	}
 	case KVM_GET_REGS: {
 		struct kvm_regs *kvm_regs;
 
@@ -2927,6 +2934,7 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 #endif
 	case KVM_CAP_IOEVENTFD_ANY_LENGTH:
 	case KVM_CAP_CHECK_EXTENSION_VM:
+	case KVM_CAP_IMMEDIATE_EXIT:
 		return 1;
 #ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
 	case KVM_CAP_IRQ_ROUTING:
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [RFC PATCH] KVM: race-free exit from KVM_RUN without POSIX signals
  2017-02-08 11:10 [RFC PATCH] KVM: race-free exit from KVM_RUN without POSIX signals Paolo Bonzini
@ 2017-02-08 13:18 ` Christian Borntraeger
  2017-02-08 14:10   ` Paolo Bonzini
  2017-02-08 13:26 ` Radim Krčmář
  1 sibling, 1 reply; 4+ messages in thread
From: Christian Borntraeger @ 2017-02-08 13:18 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm

On 02/08/2017 12:10 PM, Paolo Bonzini wrote:
> The purpose of the KVM_SET_SIGNAL_MASK API is to let userspace "kick"
> a VCPU out of KVM_RUN through a POSIX signal.  A signal is attached
> to a dummy signal handler; by blocking the signal outside KVM_RUN and
> unblocking it inside, this possible race is closed:
> 
>           VCPU thread                     service thread
>    --------------------------------------------------------------
>         check flag
>                                           set flag
>                                           raise signal
>         (signal handler does nothing)
>         KVM_RUN
> 
> However, one issue with KVM_SET_SIGNAL_MASK is that it has to take
> tsk->sighand->siglock on every KVM_RUN.  This lock is often on a
> remote NUMA node, because it is on the node of a thread's creator.
> Taking this lock can be very expensive if there are many userspace
> exits (as is the case for SMP Windows VMs without Hyper-V reference
> time counter).
> 
> As an alternative, we can put the flag directly in kvm_run so that
> KVM can see it:
> 
>           VCPU thread                     service thread
>    --------------------------------------------------------------
>                                           raise signal
>         signal handler
>           set run->immediate_exit
>         KVM_RUN
>           check run->immediate_exit

So the idea is to have both, a signal and this flag and you want userspace
to set this flag in its signal handler? So we no longer block this signal
in QEMU then. Makes sense.
Do you have the QEMU patch ready, to do a better review of the whole idea?

> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  Documentation/virtual/kvm/api.txt | 13 ++++++++++++-
>  include/uapi/linux/kvm.h          |  1 +
>  virt/kvm/kvm_main.c               | 14 +++++++++++---
>  3 files changed, 24 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index e4f2cdcf78eb..925b1b6be073 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -3389,7 +3389,18 @@ struct kvm_run {
>  Request that KVM_RUN return when it becomes possible to inject external
>  interrupts into the guest.  Useful in conjunction with KVM_INTERRUPT.
> 
> -	__u8 padding1[7];
> +	__u8 immediate_exit;
> +
> +This field is polled once when KVM_RUN starts; if non-zero, KVM_RUN
> +clears the field and exits immediately, returning -EINTR.  In the common
> +scenario where a signal is used to "kick" a VCPU out of KVM_RUN, this
> +field can be used as an alternative to KVM_SET_SIGNAL_MASK: instead of
> +blocking the signal outside KVM_RUN, the signal handler should set
> +run->immediate_exit to a nonzero value.
> +
> +This field is only available if KVM_CAP_IMMEDIATE_EXIT is.
> +
> +	__u8 padding1[6];
> 
>  	/* out */
>  	__u32 exit_reason;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 7964b970b9ad..0aa43b1e5b1d 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -881,6 +881,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_SPAPR_RESIZE_HPT 133
>  #define KVM_CAP_PPC_MMU_RADIX 134
>  #define KVM_CAP_PPC_MMU_HASH_V3 135
> +#define KVM_CAP_IMMEDIATE_EXIT 136
> 
>  #ifdef KVM_CAP_IRQ_ROUTING
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 482612b4e496..5abb8df00db1 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2550,7 +2550,8 @@ static long kvm_vcpu_ioctl(struct file *filp,
>  	if (r)
>  		return r;
>  	switch (ioctl) {
> -	case KVM_RUN:
> +	case KVM_RUN: {
> +		struct kvm_run *run;
>  		r = -EINVAL;
>  		if (arg)
>  			goto out;
> @@ -2564,9 +2565,15 @@ static long kvm_vcpu_ioctl(struct file *filp,
>  				synchronize_rcu();
>  			put_pid(oldpid);
>  		}
> -		r = kvm_arch_vcpu_ioctl_run(vcpu, vcpu->run);
> -		trace_kvm_userspace_exit(vcpu->run->exit_reason, r);
> +		run = vcpu->run;
> +		if (run->immediate_exit) {
> +			WRITE_ONCE(run->immediate_exit, 0);
> +			return -EINTR;
> +		}
> +		r = kvm_arch_vcpu_ioctl_run(vcpu, run);
> +		trace_kvm_userspace_exit(run->exit_reason, r);
>  		break;
> +	}
>  	case KVM_GET_REGS: {
>  		struct kvm_regs *kvm_regs;
> 
> @@ -2927,6 +2934,7 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
>  #endif
>  	case KVM_CAP_IOEVENTFD_ANY_LENGTH:
>  	case KVM_CAP_CHECK_EXTENSION_VM:
> +	case KVM_CAP_IMMEDIATE_EXIT:
>  		return 1;
>  #ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
>  	case KVM_CAP_IRQ_ROUTING:
> 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC PATCH] KVM: race-free exit from KVM_RUN without POSIX signals
  2017-02-08 11:10 [RFC PATCH] KVM: race-free exit from KVM_RUN without POSIX signals Paolo Bonzini
  2017-02-08 13:18 ` Christian Borntraeger
@ 2017-02-08 13:26 ` Radim Krčmář
  1 sibling, 0 replies; 4+ messages in thread
From: Radim Krčmář @ 2017-02-08 13:26 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm

2017-02-08 12:10+0100, Paolo Bonzini:
> The purpose of the KVM_SET_SIGNAL_MASK API is to let userspace "kick"
> a VCPU out of KVM_RUN through a POSIX signal.  A signal is attached
> to a dummy signal handler; by blocking the signal outside KVM_RUN and
> unblocking it inside, this possible race is closed:
> 
>           VCPU thread                     service thread
>    --------------------------------------------------------------
>         check flag
>                                           set flag
>                                           raise signal
>         (signal handler does nothing)
>         KVM_RUN
> 
> However, one issue with KVM_SET_SIGNAL_MASK is that it has to take
> tsk->sighand->siglock on every KVM_RUN.  This lock is often on a
> remote NUMA node, because it is on the node of a thread's creator.
> Taking this lock can be very expensive if there are many userspace
> exits (as is the case for SMP Windows VMs without Hyper-V reference
> time counter).
> 
> As an alternative, we can put the flag directly in kvm_run so that
> KVM can see it:
> 
>           VCPU thread                     service thread
>    --------------------------------------------------------------
>                                           raise signal
>         signal handler
>           set run->immediate_exit
>         KVM_RUN
>           check run->immediate_exit
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> @@ -2564,9 +2565,15 @@ static long kvm_vcpu_ioctl(struct file *filp,
>  				synchronize_rcu();
>  			put_pid(oldpid);
>  		}
> -		r = kvm_arch_vcpu_ioctl_run(vcpu, vcpu->run);
> -		trace_kvm_userspace_exit(vcpu->run->exit_reason, r);
> +		run = vcpu->run;
> +		if (run->immediate_exit) {
> +			WRITE_ONCE(run->immediate_exit, 0);
> +			return -EINTR;
> +		}

QEMU also uses self-kick to complete IO, but run->immediate_exit is
checked too soon for that.  I think we should move it at least into
kvm_arch_vcpu_ioctl_run(), to cover two uses of the interrupt mask.

(I don't remember the reason behind QEMU's mask on SIGBUS any more.)

Thanks.

> +		r = kvm_arch_vcpu_ioctl_run(vcpu, run);
> +		trace_kvm_userspace_exit(run->exit_reason, r);
>  		break;
> +	}
>  	case KVM_GET_REGS: {
>  		struct kvm_regs *kvm_regs;
>  

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC PATCH] KVM: race-free exit from KVM_RUN without POSIX signals
  2017-02-08 13:18 ` Christian Borntraeger
@ 2017-02-08 14:10   ` Paolo Bonzini
  0 siblings, 0 replies; 4+ messages in thread
From: Paolo Bonzini @ 2017-02-08 14:10 UTC (permalink / raw)
  To: Christian Borntraeger, linux-kernel, kvm



On 08/02/2017 14:18, Christian Borntraeger wrote:
>>           VCPU thread                     service thread
>>    --------------------------------------------------------------
>>                                           raise signal
>>         signal handler
>>           set run->immediate_exit
>>         KVM_RUN
>>           check run->immediate_exit
> So the idea is to have both, a signal and this flag and you want userspace
> to set this flag in its signal handler?

Yes.  This flag can also replace qemu_cpu_kick_self.

> So we no longer block this signal
> in QEMU then. Makes sense.
> Do you have the QEMU patch ready, to do a better review of the whole idea?

I have something that seems to work, but I've not stressed it at all and
it depends on a few cleanups to the SIGBUS handling code (which is
currently x86-specific).

Paolo

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2017-02-08 14:48 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-08 11:10 [RFC PATCH] KVM: race-free exit from KVM_RUN without POSIX signals Paolo Bonzini
2017-02-08 13:18 ` Christian Borntraeger
2017-02-08 14:10   ` Paolo Bonzini
2017-02-08 13:26 ` Radim Krčmář

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.