* KVM exit to userspace on WFI
@ 2023-10-20 18:45 ` Jan Henrik Weinstock
From: Jan Henrik Weinstock @ 2023-10-20 18:45 UTC
  To: maz, oliver.upton, james.morse, suzuki.poulose, yuzenghui,
	catalin.marinas, will
  Cc: linux-arm-kernel, kvmarm, linux-kernel, Lukas Jünger

[-- Attachment #1: Type: text/plain, Size: 650 bytes --]

Hi all,

I am looking for a way to have KVM_RUN exit back to userspace once the
vcpu encounters a WFI. It seems the kvm_run->request_interrupt_window
flag is currently ignored by arm64. So my solution thus far is to
patch kvm_handle_wfx in arch/arm64/kvm/handle_exit.c and return to
userspace with KVM_EXIT_IRQ_WINDOW_OPEN - working example attached.
Any chance to get this (or something similar) mainline?

-- 
Dr.-Ing. Jan Henrik Weinstock
Managing Director

MachineWare GmbH | www.machineware.de
Hühnermarkt 19, 52062 Aachen, Germany
Amtsgericht Aachen HRB25734

Managing Directors
Lukas Jünger
Dr.-Ing. Jan Henrik Weinstock

[-- Attachment #2: kvm.patch --]
[-- Type: text/x-patch, Size: 1339 bytes --]

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 54d26f13f..7be42e3f1 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -215,6 +215,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_SET_GUEST_DEBUG:
 	case KVM_CAP_VCPU_ATTRIBUTES:
 	case KVM_CAP_PTP_KVM:
+	case KVM_CAP_ARM_WFX_EXIT:
 		r = 1;
 		break;
 	case KVM_CAP_SET_GUEST_DEBUG2:
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index a5ab52150..d0386faeb 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -88,6 +88,11 @@ static int handle_no_fpsimd(struct kvm_vcpu *vcpu)
  */
 static int kvm_handle_wfx(struct kvm_vcpu *vcpu)
 {
+	if (vcpu->run->request_interrupt_window) {
+		vcpu->run->exit_reason = KVM_EXIT_IRQ_WINDOW_OPEN;
+		return 0;
+	}
+
 	if (kvm_vcpu_get_esr(vcpu) & ESR_ELx_WFx_ISS_WFE) {
 		trace_kvm_wfx_arm64(*vcpu_pc(vcpu), true);
 		vcpu->stat.wfe_exit_stat++;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 83a2185d9..1073269f2 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1124,6 +1124,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_SYS_ATTRIBUTES 209
 #define KVM_CAP_S390_MEM_OP_EXTENSION 211
 #define KVM_CAP_S390_ZPCI_OP 221
+#define KVM_CAP_ARM_WFX_EXIT 222
 
 #ifdef KVM_CAP_IRQ_ROUTING
 

[-- Attachment #3: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

* Re: KVM exit to userspace on WFI
  2023-10-20 18:45 ` Jan Henrik Weinstock
@ 2023-10-20 19:56   ` Marc Zyngier
From: Marc Zyngier @ 2023-10-20 19:56 UTC
  To: Jan Henrik Weinstock
  Cc: oliver.upton, james.morse, suzuki.poulose, yuzenghui,
	catalin.marinas, will, linux-arm-kernel, kvmarm, linux-kernel,
	Lukas Jünger

Hi Jan,

On Fri, 20 Oct 2023 19:45:05 +0100,
Jan Henrik Weinstock <jan@mwa.re> wrote:
> 
> Hi all,
> 
> I am looking for a way to have KVM_RUN exit back to userspace once the
> vcpu encounters a WFI. It seems the kvm_run->request_interrupt_window
> flag is currently ignored by arm64.

Well, that's consistent with arm64 not being an x86 implementation. We
can inject interrupts any time, and there is no notion of "interrupt
window".

> So my solution thus far is to
> patch kvm_handle_wfx in arch/arm64/kvm/handle_exit.c and return to
> userspace with KVM_EXIT_IRQ_WINDOW_OPEN - working example attached.
> Any chance to get this (or something similar) mainline?

Certainly not as such. For a start, this won't hit all WFIs, but only
those that actively trap. And we don't even *try* to trap WFx in a
number of cases (vcpu alone in its run queue and/or direct injection).
There isn't even any guarantee that WFx is anything other than a NOP
(it is architecturally only a hint), in which case no trap applies.

So your "working example" really isn't one, as the architecture
doesn't give you a way to do what you're asking for. If you want to
cause an exit, writing to 'immediate_exit' and delivering a signal is
the way.
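For illustration, a minimal sketch of the kick mechanism described above, assuming `run` points at the vCPU's `struct kvm_run` that has already been `mmap()`ed from the vCPU fd. A signal handler sets `immediate_exit`: a kick that lands just before `KVM_RUN` makes it return `EINTR` at once, and a kick that lands while the guest runs also forces an `EINTR` exit. The helper names and the `kick_run` global are hypothetical, not part of the KVM API.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/kvm.h>
#include <signal.h>
#include <string.h>
#include <sys/ioctl.h>

/* kvm_run area of the vCPU driven by this thread (hypothetical global). */
static struct kvm_run *volatile kick_run;

/* Signal handler: ask KVM_RUN to return to userspace as soon as possible. */
static void kick_handler(int sig)
{
	(void)sig;
	if (kick_run)
		kick_run->immediate_exit = 1;
}

/* Install kick_handler for @sig; returns 0 on success, -1 on error. */
static int install_kick_handler(struct kvm_run *run, int sig)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = kick_handler;
	sigemptyset(&sa.sa_mask);
	kick_run = run;
	return sigaction(sig, &sa, NULL);
}

/*
 * Enter the guest once. Returns 1 if the vCPU was kicked out by the
 * signal, 0 for any other exit (inspect run->exit_reason), -1 on error.
 */
static int run_vcpu_once(int vcpu_fd, struct kvm_run *run)
{
	if (ioctl(vcpu_fd, KVM_RUN, 0) == 0)
		return 0;
	if (errno != EINTR)
		return -1;
	/* Re-arm: otherwise every following KVM_RUN returns immediately. */
	run->immediate_exit = 0;
	return 1;
}
```

A watchdog thread would typically deliver the signal with `pthread_kill()`; the signal must not be blocked in the vCPU thread.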

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

* Re: KVM exit to userspace on WFI
  2023-10-20 19:56   ` Marc Zyngier
@ 2023-10-25 12:12     ` Jan Henrik Weinstock
From: Jan Henrik Weinstock @ 2023-10-25 12:12 UTC
  To: Marc Zyngier
  Cc: oliver.upton, james.morse, suzuki.poulose, yuzenghui,
	catalin.marinas, will, linux-arm-kernel, kvmarm, linux-kernel,
	Lukas Jünger

Hi Marc,

Thanks for your feedback. I understand that request_interrupt_window
is not to be used. I assume setting a flag is a better way,
something similar to KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER, e.g.
KVM_ARCH_FLAG_WFX_EXIT_TO_USER.
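For reference, the precedent cited above is opted into from userspace rather than set directly: KVM sets its internal `KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER` flag when the VMM enables `KVM_CAP_ARM_NISV_TO_USER` on the VM fd, after which data aborts without a valid syndrome exit to userspace with `KVM_EXIT_ARM_NISV`. A minimal sketch of that opt-in follows; the helper name is hypothetical, and so is any WFx equivalent discussed in this thread.

```c
#include <linux/kvm.h>
#include <string.h>
#include <sys/ioctl.h>

/*
 * Opt the VM into returning non-ISV data aborts to userspace
 * (KVM_EXIT_ARM_NISV). Returns 0 on success, -1 if unsupported or on error.
 */
static int enable_nisv_to_user(int vm_fd)
{
	struct kvm_enable_cap cap;

	/* Capability probing on the VM fd; > 0 means supported. */
	if (ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_ARM_NISV_TO_USER) <= 0)
		return -1;

	memset(&cap, 0, sizeof(cap));
	cap.cap = KVM_CAP_ARM_NISV_TO_USER;
	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap) == 0 ? 0 : -1;
}
```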

I will also check that WFx traps are always enabled while this mode is
active to make sure userspace does not get blocked/scheduled out.

The reason for this is that we cannot have the thread that executes
KVM_RUN be blocked or scheduled out whenever it hits a WFI.
Nop-WFIs are not a problem, since the PE will just continue executing
instructions, which is fine. We are currently using a timeout signal
that kicks KVM_RUN back into userspace, but we are seeing a lot of
time wasted because our KVM thread hangs in WFI/WFEs. It would be
better if we could just return from KVM_RUN immediately if the thread
would otherwise be blocked.

Thanks
Jan

On Fri, 20 Oct 2023 at 21:56, Marc Zyngier <maz@kernel.org> wrote:
>
> Hi Jan,
>
> On Fri, 20 Oct 2023 19:45:05 +0100,
> Jan Henrik Weinstock <jan@mwa.re> wrote:
> >
> > Hi all,
> >
> > I am looking for a way to have KVM_RUN exit back to userspace once the
> > vcpu encounters a WFI. It seems the kvm_run->request_interrupt_window
> > flag is currently ignored by arm64.
>
> Well, that's consistent with arm64 not being an x86 implementation. We
> can inject interrupts any time, and there is no notion of "interrupt
> window".
>
> > So my solution thus far is to
> > patch kvm_handle_wfx in arch/arm64/kvm/handle_exit.c and return to
> > userspace with KVM_EXIT_IRQ_WINDOW_OPEN - working example attached.
> > Any chance to get this (or something similar) mainline?
>
> Certainly not as such. For a start, this won't hit all WFIs, but only
> those that actively trap. And we don't even *try* to trap WFx in a
> number of cases (vcpu alone in its run queue and/or direct injection).
> There isn't even any guarantee that WFx is anything other than a NOP
> (it is architecturally only a hint), in which case no trap applies.
>
> So your "working example" really isn't one, as the architecture
> doesn't give you a way to do what you're asking for. If you want to
> cause an exit, writing to 'immediate_exit' and delivering a signal is
> the way.
>
> Thanks,
>
>         M.
>
> --
> Without deviation from the norm, progress is not possible.



* Re: KVM exit to userspace on WFI
  2023-10-25 12:12     ` Jan Henrik Weinstock
@ 2023-10-25 12:42       ` Marc Zyngier
From: Marc Zyngier @ 2023-10-25 12:42 UTC
  To: Jan Henrik Weinstock
  Cc: oliver.upton, james.morse, suzuki.poulose, yuzenghui,
	catalin.marinas, will, linux-arm-kernel, kvmarm, linux-kernel,
	Lukas Jünger

On Wed, 25 Oct 2023 13:12:14 +0100,
Jan Henrik Weinstock <jan@mwa.re> wrote:
> 
> Hi Marc,
> 
> Thanks for your feedback. I understand that request_interrupt_window
> is not to be used. I assume setting a flag is a better way,
> something similar to KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER, e.g.
> KVM_ARCH_FLAG_WFX_EXIT_TO_USER.
> 
> I will also check that WFx traps are always enabled while this mode is
> active to make sure userspace does not get blocked/scheduled out.

Why would that be an acceptable behaviour?

> The reason for this is that we cannot have the thread that executes
> KVM_RUN be blocked or scheduled out whenever it hits a WFI.

Why? If that's not acceptable, how do you even cope with the basic
preemption?

> Nop-WFIs are not a problem, since the PE will just continue executing
> instructions, which is fine. We are currently using a timeout signal
> that kicks KVM_RUN back into userspace, but we are seeing a lot of
> time wasted because our KVM thread hangs in WFI/WFEs. It would be
> better if we could just return from KVM_RUN immediately if the thread
> would otherwise be blocked.

On the face of it, this makes little sense:

- While in userspace, any interrupt source that is normally delivered
  without any userspace intervention will be blocked (timers,
  VLPIs...). I cannot see how this can be a good idea.

- Trapping WFE is an important scheduling hint, and returning to
  userspace defeats it. Contended spinlocks, for example, will be even
  slower to acquire.

I'm sure you have a particular use case for such a degraded behaviour,
but since you are not describing it, I'm not at all inclined to
actively break KVM's performance and scalability.

Thanks,

	M.

* Re: KVM exit to userspace on WFI
  2023-10-25 12:42       ` Marc Zyngier
@ 2023-10-27 17:41         ` Jan Henrik Weinstock
From: Jan Henrik Weinstock @ 2023-10-27 17:41 UTC
  To: Marc Zyngier
  Cc: oliver.upton, james.morse, suzuki.poulose, yuzenghui,
	catalin.marinas, will, linux-arm-kernel, kvmarm, linux-kernel,
	Lukas Jünger

Hi Marc,

the basic idea behind this is to have a (single-threaded) execution loop,
something like this:

vcpu-thread:    vcpu-run | process-io-devices | vcpu-run | process-io...
                         ^
                  WFX or timeout

We switch to simulating IO devices whenever the vcpu is idle (WFI) or exceeds
a certain budget of instructions (counted via the PMU). Our fallback currently is
to kick the vcpu out of its execution using a signal (via a timeout/alarm). But
of course, if the CPU is stuck at a WFI, we are wasting a lot of time.
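A rough sketch of the single-threaded loop described above, with KVM abstracted behind callbacks: `run_slice()` stands in for `KVM_RUN` bounded by an instruction budget, and `process_io()` for the device models. Every name here, including the toy guest used to exercise the loop, is hypothetical; the sketch only shows the alternation and the two reasons a slice ends (idle in WFx, or budget exhausted).

```c
/* Why a vCPU slice ended (hypothetical; mirrors "WFX or timeout"). */
enum slice_end { SLICE_IDLE, SLICE_BUDGET };

struct sim_ops {
	/* Run the vCPU for at most @budget instructions (stand-in for KVM_RUN). */
	enum slice_end (*run_slice)(void *ctx, unsigned long budget);
	/* Simulate I/O devices; may wake the guest for the next slice. */
	void (*process_io)(void *ctx, enum slice_end why);
	/* Nonzero once the simulation should stop. */
	int (*done)(void *ctx);
};

/* vcpu-run | process-io-devices | ...; returns the number of idle slices. */
static unsigned long sim_loop(const struct sim_ops *ops, void *ctx,
			      unsigned long budget)
{
	unsigned long idle = 0;

	while (!ops->done(ctx)) {
		enum slice_end why = ops->run_slice(ctx, budget);

		if (why == SLICE_IDLE)
			idle++;
		ops->process_io(ctx, why);
	}
	return idle;
}

/* Toy guest (hypothetical): runs `work` instructions, then idles in WFI. */
struct toy {
	unsigned long work;       /* instructions left before the next WFI */
	unsigned long slices;     /* slices run so far */
	unsigned long max_slices; /* stop after this many slices */
};

static enum slice_end toy_run(void *ctx, unsigned long budget)
{
	struct toy *t = ctx;

	t->slices++;
	if (t->work > budget) {
		t->work -= budget;
		return SLICE_BUDGET;	/* budget exhausted: the "timeout" case */
	}
	t->work = 0;
	return SLICE_IDLE;		/* guest reached WFI */
}

static void toy_io(void *ctx, enum slice_end why)
{
	struct toy *t = ctx;

	if (why == SLICE_IDLE)
		t->work = 250;		/* pretend a device interrupt woke it up */
}

static int toy_done(void *ctx)
{
	const struct toy *t = ctx;

	return t->slices >= t->max_slices;
}
```

With a budget of 100 and 250 instructions of work per wake-up, each wake-up costs two budget slices followed by one idle slice.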

I understand that the proposed behavior is not desirable for most use cases,
which is why I suggest locking it behind a flag, e.g.
KVM_ARCH_FLAG_WFX_EXIT_TO_USER.


On Wed, 25 Oct 2023 at 14:42, Marc Zyngier <maz@kernel.org> wrote:
>
> On Wed, 25 Oct 2023 13:12:14 +0100,
> Jan Henrik Weinstock <jan@mwa.re> wrote:
> >
> > Hi Marc,
> >
> > Thanks for your feedback. I understand that request_interrupt_window
> > is not to be used. I assume setting a flag is a better way,
> > something similar to KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER, e.g.
> > KVM_ARCH_FLAG_WFX_EXIT_TO_USER.
> >
> > I will also check that WFx traps are always enabled while this mode is
> > active to make sure userspace does not get blocked/scheduled out.
>
> Why would that be an acceptable behaviour?
>
> > The reason for this is that we cannot have the thread that executes
> > KVM_RUN be blocked or scheduled out whenever it hits a WFI.
>
> Why? If that's not acceptable, how do you even cope with the basic
> preemption?
>
> > Nop-WFIs are not a problem, since the PE will just continue executing
> > instructions, which is fine. We are currently using a timeout signal
> > that kicks KVM_RUN back into userspace, but we are seeing a lot of
> > time wasted because our KVM thread hangs in WFI/WFEs. It would be
> > better if we could just return from KVM_RUN immediately if the thread
> > would otherwise be blocked.
>
> On the face of it, this makes little sense:
>
> - While in userspace, any interrupt source that is normally delivered
>   without any userspace intervention will be blocked (timers,
>   VLPIs...). I cannot see how this can be a good idea.
>
> - Trapping WFE is an important scheduling hint, and returning to
>   userspace defeats it. Contended spinlocks, for example, will be even
>   slower to acquire.
>
> I'm sure you have a particular use case for such a degraded behaviour,
> but since you are not describing it, I'm not at all inclined to
> actively break KVM's performance and scalability.
>
> Thanks,
>
>         M.
>
> --
> Without deviation from the norm, progress is not possible.



* Re: KVM exit to userspace on WFI
  2023-10-27 17:41         ` Jan Henrik Weinstock
@ 2023-10-30 12:36           ` Marc Zyngier
From: Marc Zyngier @ 2023-10-30 12:36 UTC
  To: Jan Henrik Weinstock
  Cc: oliver.upton, james.morse, suzuki.poulose, yuzenghui,
	catalin.marinas, will, linux-arm-kernel, kvmarm, linux-kernel,
	Lukas Jünger

[please make an effort not to top-post]

On Fri, 27 Oct 2023 18:41:44 +0100,
Jan Henrik Weinstock <jan@mwa.re> wrote:
> 
> Hi Marc,
> 
> the basic idea behind this is to have a (single-threaded) execution loop,
> something like this:
> 
> vcpu-thread:    vcpu-run | process-io-devices | vcpu-run | process-io...
>                          ^
>                   WFX or timeout
> 
> We switch to simulating IO devices whenever the vcpu is idle (WFI) or exceeds
> a certain budget of instructions (counted via the PMU). Our fallback currently is
> to kick the vcpu out of its execution using a signal (via a timeout/alarm). But
> of course, if the CPU is stuck at a WFI, we are wasting a lot of time.
> 
> I understand that the proposed behavior is not desirable for most use cases,
> which is why I suggest locking it behind a flag, e.g.
> KVM_ARCH_FLAG_WFX_EXIT_TO_USER.

But how do you reconcile the fact that exposing this to userspace
breaks fundamental expectations that the guest has, such as getting
its timer interrupts and directly injected LPIs? Implementing WFI in
userspace breaks it. What about the case where we don't trap WFx and
let the *guest* wait for an interrupt?

Honestly, what you are describing seems to be a use model that doesn't
fit KVM, which is a general purpose hypervisor, but more a simulation
environment. Yes, the primitives are the same, but the plumbing is
wildly different.

*If* that's the stuff you're looking at, then I'm afraid you'll have
to do it in a different way, because what you are suggesting is
fundamentally incompatible with the guarantees that KVM gives to guest
and userspace. Because your KVM_ARCH_FLAG_WFX_EXIT_TO_USER is really a
lie. It should really be named something more along the lines of
KVM_ARCH_FLAG_WFX_EXIT_TO_USER_SOMETIME_AND_I_DONT_EVEN_KNOW_WHEN
(probably with additional clauses related to breaking things).

Overall, you are still asking for something that is not guaranteed at
the architecture level, even less in KVM, and I'm not going to add
support for something that can only work "sometime".

	M.

* Re: KVM exit to userspace on WFI
  2023-10-30 12:36           ` Marc Zyngier
@ 2023-10-31 19:21             ` Jan Henrik Weinstock
From: Jan Henrik Weinstock @ 2023-10-31 19:21 UTC
  To: Marc Zyngier
  Cc: oliver.upton, james.morse, suzuki.poulose, yuzenghui,
	catalin.marinas, will, linux-arm-kernel, kvmarm, linux-kernel,
	Lukas Jünger

[-- Attachment #1: Type: text/plain, Size: 3972 bytes --]

Am Mo., 30. Okt. 2023 um 13:36 Uhr schrieb Marc Zyngier <maz@kernel.org>:
>
> [please make an effort not to top-post]
>
> On Fri, 27 Oct 2023 18:41:44 +0100,
> Jan Henrik Weinstock <jan@mwa.re> wrote:
> >
> > Hi Marc,
> >
> > the basic idea behind this is to have a (single-threaded) execution loop,
> > something like this:
> >
> > vcpu-thread:    vcpu-run | process-io-devices | vcpu-run | process-io...
> >                          ^
> >                   WFX or timeout
> >
> > We switch to simulating IO devices whenever the vcpu is idle (wfi) or exceeds
> > a certain budget of instructions (counted via pmu). Our fallback currently is
> > to kick the vcpu out of its execution using a signal (via a timeout/alarm). But
> > of course, if the cpu is stuck at a wfi, we are wasting a lot of time.
> >
> > I understand that the proposed behavior is not desirable for most use cases,
> > which is why I suggest locking it behind a flag, e.g.
> > KVM_ARCH_FLAG_WFX_EXIT_TO_USER.
>
> But how do you reconcile the fact that exposing this to userspace
> breaks fundamental expectations that the guest has, such as getting
> its timer interrupts and directly injected LPIs? Implementing WFI in
> userspace breaks it. What about the case where we don't trap WFx and
> let the *guest* wait for an interrupt?

Timer interrupts etc. will be injected into the vcpu during the
io-phases. When there are no interrupts present and the guest performs
a WFI, we can just skip forward to the next timer event.
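The single-threaded loop and the "skip forward" step can be sketched as a small simulation. Everything in it is hypothetical: vcpu_run() only stands in for KVM_RUN, and the instruction budget, timer period and one-tick-per-instruction clock are modelled assumptions, not KVM behaviour or the simulator described here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical exit reasons for the simulated vcpu (not KVM API). */
enum run_exit { EXIT_BUDGET, EXIT_IDLE };

struct sim {
    uint64_t now;       /* virtual time, in ticks */
    uint64_t timer_due; /* virtual time of the next timer event */
    bool irq_pending;   /* interrupt raised during an io-phase */
    unsigned injected;  /* timer interrupts taken by the guest */
};

/* Stand-in for KVM_RUN: the guest takes any pending interrupt, then
 * either uses its whole instruction budget or goes idle (WFI exit). */
static enum run_exit vcpu_run(struct sim *s, uint64_t budget, bool guest_busy)
{
    if (s->irq_pending) {
        s->irq_pending = false;
        s->injected++;
    }
    if (guest_busy) {
        s->now += budget; /* assumption: one tick per instruction */
        return EXIT_BUDGET;
    }
    return EXIT_IDLE;
}

/* One iteration of "vcpu-run | process-io-devices". */
static void step(struct sim *s, uint64_t budget, uint64_t period, bool guest_busy)
{
    assert(budget > 0 && period > 0);

    /* Idle with nothing pending: nothing can happen before the next
     * timer event, so advance virtual time straight to it. */
    if (vcpu_run(s, budget, guest_busy) == EXIT_IDLE && s->now < s->timer_due)
        s->now = s->timer_due;

    /* io-phase: raise the timer interrupt once its deadline is reached. */
    if (s->now >= s->timer_due) {
        s->irq_pending = true;
        s->timer_due += period;
    }
}
```

An idle step jumps virtual time from 0 to the first deadline and raises the interrupt; the next busy step delivers it and advances time only by the budget.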

> Honestly, what you are describing seems to be a use model that doesn't
> fit KVM, which is a general purpose hypervisor, but more a simulation
> environment. Yes, the primitives are the same, but the plumbing is
> wildly different.

Agreed.

> *If* that's the stuff you're looking at, then I'm afraid you'll have
> to do it in a different way, because what you are suggesting is
> fundamentally incompatible with the guarantees that KVM gives to guest
> and userspace. Because your KVM_ARCH_FLAG_WFX_EXIT_TO_USER is really a
> lie. It should really be named something more along the lines of
> KVM_ARCH_FLAG_WFX_EXIT_TO_USER_SOMETIME_AND_I_DONT_EVEN_KNOW_WHEN
> (probably with additional clauses related to breaking things).

I have attached a reworked version of the patch as a reference (based
on my 5.15 kernel). It puts the modified behavior behind a new
capability so as to not interfere with the current expectations
towards handling WFI/WFE.
I think it should now trap all blocking calls to WFx on the vcpu and
reliably return to userspace. If I have missed something that
would cause the vcpu not to trap on a WFI, kindly let me know.
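The reworked patch hands the raw syndrome to userspace in run->wfx.esr. A sketch of how a VMM might decode it follows. The field positions (EC in bits [31:26], value 0x01 for a trapped WFI/WFE; TI in the low bits, widened to [1:0] by FEAT_WFxT) are from the Arm ARM; the helper, macro and enum names are made up for illustration, and run->wfx.esr exists only with the patch applied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ESR_ELx fields for a trapped WFI/WFE, as described in the Arm ARM. */
#define ESR_EC_SHIFT    26
#define ESR_EC_MASK     UINT64_C(0x3f)
#define ESR_EC_WFX      UINT64_C(0x01) /* "Trapped WFI or WFE instruction" */
#define ESR_WFX_TI_MASK UINT64_C(0x3)  /* TI: bit [0], or [1:0] with FEAT_WFxT */

/* Hypothetical names for the TI encodings. */
enum wfx_kind { WFX_WFI = 0, WFX_WFE = 1, WFX_WFIT = 2, WFX_WFET = 3 };

/* Decode the value a VMM would read from the proposed run->wfx.esr.
 * Returns false if the syndrome is not a WFx trap. */
static bool decode_wfx_esr(uint64_t esr, enum wfx_kind *kind)
{
    assert(kind != NULL);
    if (((esr >> ESR_EC_SHIFT) & ESR_EC_MASK) != ESR_EC_WFX)
        return false;
    *kind = (enum wfx_kind)(esr & ESR_WFX_TI_MASK);
    return true;
}
```

Masking TI with 0x3 also works on cores without FEAT_WFxT, where bit [1] reads as zero.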

> Overall, you are still asking for something that is not guaranteed at
> the architecture level, even less in KVM, and I'm not going to add
> support for something that can only work "sometime".

I am not quite sure what you mean by "sometime". Are you referring
to WFIs as NOPs? Or WFIs that do not yield because of pending
interrupts?

The point of my patch is not to accurately count every single WFI. The
point is to prevent the host cpu from sleeping just because my vcpu
executed a WFI somewhere in the guest software. If a WFI is executed
by the guest and that does not cause my vcpu thread to block (in
other words: the vcpu continues executing instructions beyond the WFI)
then it also should not exit to userspace. So instead of
"KVM_ARCH_FLAG_WFX_EXIT_TO_USER_SOMETIME_AND_I_DONT_EVEN_KNOW_WHEN" it
is really "KVM_ARCH_FLAG_WFX_EXIT_TO_USER_WHENEVER_YOU_WOULD_OTHERWISE_YIELD_AND_I_CANNOT_GET_MY_THREAD_BACK".

>         M.
>
> --
> Without deviation from the norm, progress is not possible.



-- 
Dr.-Ing. Jan Henrik Weinstock
Managing Director

MachineWare GmbH | www.machineware.de
Hühnermarkt 19, 52062 Aachen, Germany
Amtsgericht Aachen HRB25734

Geschäftsführung
Lukas Jünger
Dr.-Ing. Jan Henrik Weinstock

[-- Attachment #2: kvm.patch --]
[-- Type: text/x-patch, Size: 3360 bytes --]

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index fc6ee6c59..c3107506b 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -136,6 +136,9 @@ struct kvm_arch {
 
 	/* Memory Tagging Extension enabled for the guest */
 	bool mte_enabled;
+
+	/* Exit on WFI/WFE */
+	bool exit_on_wfx;
 };
 
 struct kvm_vcpu_fault_info {
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index f181527f9..6d54dfbae 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -101,6 +101,10 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		}
 		mutex_unlock(&kvm->lock);
 		break;
+	case KVM_CAP_EXIT_ON_WFX:
+		r = 0;
+		kvm->arch.exit_on_wfx = true;
+		break;
 	default:
 		r = -EINVAL;
 		break;
@@ -215,6 +219,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_SET_GUEST_DEBUG:
 	case KVM_CAP_VCPU_ATTRIBUTES:
 	case KVM_CAP_PTP_KVM:
+	case KVM_CAP_EXIT_ON_WFX:
 		r = 1;
 		break;
 	case KVM_CAP_SET_GUEST_DEBUG2:
@@ -394,8 +399,10 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	struct kvm_s2_mmu *mmu;
 	int *last_ran;
+	bool exit_on_wfx;
 
 	mmu = vcpu->arch.hw_mmu;
+	exit_on_wfx = vcpu->kvm->arch.exit_on_wfx;
 	last_ran = this_cpu_ptr(mmu->last_vcpu_ran);
 
 	/*
@@ -423,7 +430,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	if (kvm_arm_is_pvtime_enabled(&vcpu->arch))
 		kvm_make_request(KVM_REQ_RECORD_STEAL, vcpu);
 
-	if (single_task_running())
+	if (single_task_running() && !exit_on_wfx)
 		vcpu_clear_wfx_traps(vcpu);
 	else
 		vcpu_set_wfx_traps(vcpu);
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index a5ab52150..80fa6bdef 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -91,10 +91,21 @@ static int kvm_handle_wfx(struct kvm_vcpu *vcpu)
 	if (kvm_vcpu_get_esr(vcpu) & ESR_ELx_WFx_ISS_WFE) {
 		trace_kvm_wfx_arm64(*vcpu_pc(vcpu), true);
 		vcpu->stat.wfe_exit_stat++;
+		if (vcpu->kvm->arch.exit_on_wfx) {
+			vcpu->run->exit_reason = KVM_EXIT_WFX;
+			vcpu->run->wfx.esr = kvm_vcpu_get_esr(vcpu);
+			return 0;
+		}
+
 		kvm_vcpu_on_spin(vcpu, vcpu_mode_priv(vcpu));
 	} else {
 		trace_kvm_wfx_arm64(*vcpu_pc(vcpu), false);
 		vcpu->stat.wfi_exit_stat++;
+		if (vcpu->kvm->arch.exit_on_wfx) {
+			vcpu->run->exit_reason = KVM_EXIT_WFX;
+			vcpu->run->wfx.esr = kvm_vcpu_get_esr(vcpu);
+			return 0;
+		}
 		kvm_vcpu_block(vcpu);
 		kvm_clear_request(KVM_REQ_UNHALT, vcpu);
 	}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 0d47e07f4..155dc7eab 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -269,6 +269,7 @@ struct kvm_xen_exit {
 #define KVM_EXIT_AP_RESET_HOLD    32
 #define KVM_EXIT_X86_BUS_LOCK     33
 #define KVM_EXIT_XEN              34
+#define KVM_EXIT_WFX              35
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -469,6 +470,11 @@ struct kvm_run {
 		} msr;
 		/* KVM_EXIT_XEN */
 		struct kvm_xen_exit xen;
+		/* KVM_EXIT_WFX */
+		struct {
+			__u64 esr;
+		} wfx;
+
 		/* Fix the size of the union. */
 		char padding[256];
 	};
@@ -1123,6 +1129,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_XSAVE2 208
 #define KVM_CAP_SYS_ATTRIBUTES 209
 #define KVM_CAP_S390_MEM_OP_EXTENSION 211
+#define KVM_CAP_EXIT_ON_WFX 222
 
 #ifdef KVM_CAP_IRQ_ROUTING
 

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: KVM exit to userspace on WFI
  2023-10-31 19:21             ` Jan Henrik Weinstock
@ 2023-11-04 12:13               ` Marc Zyngier
  -1 siblings, 0 replies; 18+ messages in thread
From: Marc Zyngier @ 2023-11-04 12:13 UTC (permalink / raw)
  To: Jan Henrik Weinstock
  Cc: oliver.upton, james.morse, suzuki.poulose, yuzenghui,
	catalin.marinas, will, linux-arm-kernel, kvmarm, linux-kernel,
	Lukas Jünger

On Tue, 31 Oct 2023 19:21:16 +0000,
Jan Henrik Weinstock <jan@mwa.re> wrote:
> 
> Am Mo., 30. Okt. 2023 um 13:36 Uhr schrieb Marc Zyngier <maz@kernel.org>:
> >
> > [please make an effort not to top-post]
> >
> > On Fri, 27 Oct 2023 18:41:44 +0100,
> > Jan Henrik Weinstock <jan@mwa.re> wrote:
> > >
> > > Hi Marc,
> > >
> > > the basic idea behind this is to have a (single-threaded) execution loop,
> > > something like this:
> > >
> > > vcpu-thread:    vcpu-run | process-io-devices | vcpu-run | process-io...
> > >                          ^
> > >                   WFX or timeout
> > >
> > > We switch to simulating IO devices whenever the vcpu is idle (wfi) or exceeds
> > > a certain budget of instructions (counted via pmu). Our fallback currently is
> > > to kick the vcpu out of its execution using a signal (via a timeout/alarm). But
> > > of course, if the cpu is stuck at a wfi, we are wasting a lot of time.
> > >
> > > I understand that the proposed behavior is not desirable for most use cases,
> > > which is why I suggest locking it behind a flag, e.g.
> > > KVM_ARCH_FLAG_WFX_EXIT_TO_USER.
> >
> > But how do you reconcile the fact that exposing this to userspace
> > breaks fundamental expectations that the guest has, such as getting
> > its timer interrupts and directly injected LPIs? Implementing WFI in
> > userspace breaks it. What about the case where we don't trap WFx and
> > let the *guest* wait for an interrupt?
> 
> Timer interrupts etc. will be injected into the vcpu during the
> io-phases. When there are no interrupts present and the guest performs
> a WFI, we can just skip forward to the next timer event.

Skip forward? What does that mean? Compress time and move along?

> 
> > Honestly, what you are describing seems to be a use model that doesn't
> > fit KVM, which is a general purpose hypervisor, but more a simulation
> > environment. Yes, the primitives are the same, but the plumbing is
> > wildly different.
> 
> Agreed.
> 
> > *If* that's the stuff you're looking at, then I'm afraid you'll have
> > to do it in a different way, because what you are suggesting is
> > fundamentally incompatible with the guarantees that KVM gives to guest
> > and userspace. Because your KVM_ARCH_FLAG_WFX_EXIT_TO_USER is really a
> > lie. It should really be named something more along the lines of
> > KVM_ARCH_FLAG_WFX_EXIT_TO_USER_SOMETIME_AND_I_DONT_EVEN_KNOW_WHEN
> > (probably with additional clauses related to breaking things).
> 
> I have attached a reworked version of the patch as a reference (based
> on my 5.15 kernel). It puts the modified behavior behind a new
> capability so as to not interfere with the current expectations
> towards handling WFI/WFE.
> I think it should now trap all blocking calls to WFx on the vcpu and
> reliably return to userspace. If I have missed something that
> would cause the vcpu not to trap on a WFI, kindly let me know.

Oh FFS. Please read my previous emails, the architecture spec, and
understand that WFx is a *hint*. Given your line of work, I would hope
you understand the implications of this.

> 
> > Overall, you are still asking for something that is not guaranteed at
> > the architecture level, even less in KVM, and I'm not going to add
> > support for something that can only work "sometime".
> 
> I am not quite sure what you mean by "sometime". Are you referring
> to WFIs as NOPs? Or WFIs that do not yield because of pending
> interrupts?

NOP is a valid implementation of WFx. WFx doesn't have to trap. Its
only requirement is not to lose state. Nothing else. Trapping is a
'quality of implementation' feature, and doesn't affect correctness.
And yes, there are machines out there that will absolutely ignore any
request for trapping.

From the architecture spec (ARM DDI 0487J.a, D19.2.48, TWI):

<quote>
Since a WFI can complete at any time, even without a Wakeup event, the
traps on WFI are not guaranteed to be taken, even if the WFI is
executed when there is no Wakeup event. The only guarantee is that if
the instruction does not complete in finite time in the absence of a
Wakeup event, the trap will be taken.
</quote>

Similar verbiage exists for WFE. Do you now see why your proposal
makes little sense?
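The consequence of that quote can be shown with a toy model: a guest idle loop that keeps executing WFI, run on an implementation that either takes the trap or completes WFI as a NOP. Both are architecturally valid, so a run loop that relies only on WFx exits still needs its budget/timeout fallback. The model is illustrative only; it does not simulate real hardware or KVM, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Two architecturally valid ways to handle WFI with TWI set and no
 * wakeup event pending (toy model, not real hardware). */
enum wfi_impl { WFI_TAKES_TRAP, WFI_COMPLETES_AS_NOP };

enum exit_reason { EXIT_WFI_TRAP, EXIT_BUDGET };

/* Run a guest idle loop ("while (!work) wfi();") for at most `budget`
 * WFI executions and report why control returned to the host.
 * `*executed` counts WFIs that completed inside the guest. */
static enum exit_reason run_idle_guest(enum wfi_impl impl, uint64_t budget,
                                       uint64_t *executed)
{
    assert(budget > 0);
    for (*executed = 0; *executed < budget; (*executed)++) {
        if (impl == WFI_TAKES_TRAP)
            return EXIT_WFI_TRAP;
        /* WFI completed without a wakeup event: the guest just loops. */
    }
    return EXIT_BUDGET;
}
```

With the NOP behaviour the host only regains control when the budget runs out, which is exactly the case the WFx exit cannot cover.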

> 
> The point of my patch is not to accurately count every single WFI. The
> point is to prevent the host cpu from sleeping just because my vcpu
> executed a WFI somewhere in the guest software. If a WFI is executed
> by the guest and that does not cause my vcpu thread to block (in
> other words: the vcpu continues executing instructions beyond the WFI)
> then it also should not exit to userspace. So instead of
> "KVM_ARCH_FLAG_WFX_EXIT_TO_USER_SOMETIME_AND_I_DONT_EVEN_KNOW_WHEN" it
> is really "KVM_ARCH_FLAG_WFX_EXIT_TO_USER_WHENEVER_YOU_WOULD_OTHERWISE_YIELD_AND_I_CANNOT_GET_MY_THREAD_BACK".

You already must be able to handle a guest spinning in a loop without
a WFI. So why would WFI be of interest more than anything else? You
can always make an interrupt pending at any point, without having to
wait for WFI to occur. Just make the interrupt pending (which, if you
emulate everything in userspace, is just giving the vcpu thread a
signal).
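A minimal POSIX sketch of the asynchronous alternative: a one-shot SIGALRM stops a loop that never executes WFI. In a real VMM the signal would be aimed at the vcpu thread, which typically makes KVM_RUN return -EINTR (or the handler sets kvm_run->immediate_exit). Here the spinning loop merely stands in for the guest, and the function name is hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>

/* Set from the signal handler; stands in for "interrupt now pending". */
static volatile sig_atomic_t kick;

static void on_alarm(int sig)
{
    (void)sig;
    kick = 1;
}

/* Arm a one-shot SIGALRM and spin, like a guest that never executes
 * WFI, until the signal arrives. Returns 0 on success, -1 on error. */
static int spin_until_kicked(long usec, unsigned long *spins)
{
    struct sigaction sa;
    struct itimerval it;

    assert(usec > 0 && usec < 1000000);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    kick = 0;
    memset(&it, 0, sizeof(it));
    it.it_value.tv_usec = usec; /* it_interval left zero: fire once */
    if (setitimer(ITIMER_REAL, &it, NULL) != 0)
        return -1;

    for (*spins = 0; !kick; (*spins)++)
        ;
    return 0;
}
```

The loop ends because of the timer, not because of anything the "guest" did, which is the property a WFI trap cannot guarantee.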

My hunch is that your SW is trying to do the interrupt injection from
the vcpu thread, which is a pretty broken model (it would badly model
the concept of an interrupt being an asynchronous event).

Honestly, if there was one thing I would add to the kernel, it would
be an option to *prevent* any trap of WFx, because that at least is
something we can universally enforce and guarantee to userspace.
Anything else is only wishful thinking.
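For reference, the trap decision the patch modifies in kvm_arch_vcpu_load() can be modelled with the HCR_EL2 bits TWI (bit 13) and TWE (bit 14). This is a simplified sketch: the real vcpu_clear_wfx_traps() also keeps or clears TWI depending on GICv4 direct injection, which is omitted, and WFX_NEVER_TRAP is only the option floated above, not an existing KVM interface. As argued above, only the cleared state is a guarantee; setting the bits merely makes a trap possible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* HCR_EL2 bits that make WFI/WFE trap to EL2 (positions per the Arm ARM). */
#define HCR_TWI (UINT64_C(1) << 13)
#define HCR_TWE (UINT64_C(1) << 14)

/* WFX_DEFAULT models the existing rule; the other two are hypothetical
 * knobs: the proposed patch and the "never trap" option. */
enum wfx_policy { WFX_DEFAULT, WFX_ALWAYS_TRAP, WFX_NEVER_TRAP };

/* Simplified version of the choice made in kvm_arch_vcpu_load(). The
 * GICv4-dependent handling of TWI in the real helpers is omitted. */
static uint64_t wfx_traps_on_load(uint64_t hcr, enum wfx_policy p,
                                  bool single_task_running)
{
    bool trap;

    assert(p == WFX_DEFAULT || p == WFX_ALWAYS_TRAP || p == WFX_NEVER_TRAP);
    if (p == WFX_ALWAYS_TRAP)
        trap = true;
    else if (p == WFX_NEVER_TRAP)
        trap = false;
    else
        trap = !single_task_running;

    return trap ? (hcr | HCR_TWI | HCR_TWE) : (hcr & ~(HCR_TWI | HCR_TWE));
}
```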

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: KVM exit to userspace on WFI
  2023-11-04 12:13               ` Marc Zyngier
@ 2023-11-08  9:38                 ` Jan Henrik Weinstock
  -1 siblings, 0 replies; 18+ messages in thread
From: Jan Henrik Weinstock @ 2023-11-08  9:38 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: oliver.upton, james.morse, suzuki.poulose, yuzenghui,
	catalin.marinas, will, linux-arm-kernel, kvmarm, linux-kernel,
	Lukas Jünger

Am Sa., 4. Nov. 2023 um 13:13 Uhr schrieb Marc Zyngier <maz@kernel.org>:
>
> On Tue, 31 Oct 2023 19:21:16 +0000,
> Jan Henrik Weinstock <jan@mwa.re> wrote:
> >
> > Am Mo., 30. Okt. 2023 um 13:36 Uhr schrieb Marc Zyngier <maz@kernel.org>:
> > >
> > > [please make an effort not to top-post]
> > >
> > > On Fri, 27 Oct 2023 18:41:44 +0100,
> > > Jan Henrik Weinstock <jan@mwa.re> wrote:
> > > >
> > > > Hi Marc,
> > > >
> > > > the basic idea behind this is to have a (single-threaded) execution loop,
> > > > something like this:
> > > >
> > > > vcpu-thread:    vcpu-run | process-io-devices | vcpu-run | process-io...
> > > >                          ^
> > > >                   WFX or timeout
> > > >
> > > > We switch to simulating IO devices whenever the vcpu is idle (wfi) or exceeds
> > > > a certain budget of instructions (counted via pmu). Our fallback currently is
> > > > to kick the vcpu out of its execution using a signal (via a timeout/alarm). But
> > > > of course, if the cpu is stuck at a wfi, we are wasting a lot of time.
> > > >
> > > > I understand that the proposed behavior is not desirable for most use cases,
> > > > which is why I suggest locking it behind a flag, e.g.
> > > > KVM_ARCH_FLAG_WFX_EXIT_TO_USER.
> > >
> > > But how do you reconcile the fact that exposing this to userspace
> > > breaks fundamental expectations that the guest has, such as getting
> > > its timer interrupts and directly injected LPIs? Implementing WFI in
> > > userspace breaks it. What about the case where we don't trap WFx and
> > > let the *guest* wait for an interrupt?
> >
> > Timer interrupts etc. will be injected into the vcpu during the
> > io-phases. When there are no interrupts present and the guest performs
> > a WFI, we can just skip forward to the next timer event.
>
> Skip forward? What does that mean? Compress time and move along?

Yes, advance virtual time to the next relevant event (timer interrupt, I/O, ...)

> > > Honestly, what you are describing seems to be a use model that doesn't
> > > fit KVM, which is a general purpose hypervisor, but more a simulation
> > > environment. Yes, the primitives are the same, but the plumbing is
> > > wildly different.
> >
> > Agreed.
> >
> > > *If* that's the stuff you're looking at, then I'm afraid you'll have
> > > to do it in a different way, because what you are suggesting is
> > > fundamentally incompatible with the guarantees that KVM gives to guest
> > > and userspace. Because your KVM_ARCH_FLAG_WFX_EXIT_TO_USER is really a
> > > lie. It should really be named something more along the lines of
> > > KVM_ARCH_FLAG_WFX_EXIT_TO_USER_SOMETIME_AND_I_DONT_EVEN_KNOW_WHEN
> > > (probably with additional clauses related to breaking things).
> >
> > I have attached a reworked version of the patch as a reference (based
> > on my 5.15 kernel). It puts the modified behavior behind a new
> > capability so as to not interfere with the current expectations
> > towards handling WFI/WFE.
> > I think it should now trap all blocking calls to WFx on the vcpu and
> > reliably return to userspace. If I have missed something that
> > would cause the vcpu not to trap on a WFI, kindly let me know.
>
> Oh FFS. Please read my previous emails, the architecture spec, and
> understand that WFx is a *hint*. Given your line of work, I would hope
> you understand the implications of this.
>
> >
> > > Overall, you are still asking for something that is not guaranteed at
> > > the architecture level, even less in KVM, and I'm not going to add
> > > support for something that can only work "sometime".
> >
> > I am not quite sure what you mean by "sometime". Are you referring
> > to WFIs as NOPs? Or WFIs that do not yield because of pending
> > interrupts?
>
> NOP is a valid implementation of WFx. WFx doesn't have to trap. Its
> only requirement is not to lose state. Nothing else. Trapping is a
> 'quality of implementation' feature, and doesn't affect correctness.
> And yes, there are machines out there that will absolutely ignore any
> request for trapping.
>
> From the architecture spec (ARM DDI 0487J.a, D19.2.48, TWI):
>
> <quote>
> Since a WFI can complete at any time, even without a Wakeup event, the
> traps on WFI are not guaranteed to be taken, even if the WFI is
> executed when there is no Wakeup event. The only guarantee is that if
> the instruction does not complete in finite time in the absence of a
> Wakeup event, the trap will be taken.
> </quote>

Yes, this guarantee is what I want: if the instruction does not
complete within a finite time, trap (and return to userspace).

> Similar verbiage exists for WFE. Do you now see why your proposal
> makes little sense?
>
> >
> > The point of my patch is not to accurately count every single WFI. The
> > point is to prevent the host cpu from sleeping just because my vcpu
> > executed a WFI somewhere in the guest software. If a WFI is executed
> > by the guest and that does not cause my vcpu thread to block (in
> > other words: the vcpu continues executing instructions beyond the WFI)
> > then it also should not exit to userspace. So instead of
> > "KVM_ARCH_FLAG_WFX_EXIT_TO_USER_SOMETIME_AND_I_DONT_EVEN_KNOW_WHEN" it
> > is really "KVM_ARCH_FLAG_WFX_EXIT_TO_USER_WHENEVER_YOU_WOULD_OTHERWISE_YIELD_AND_I_CANNOT_GET_MY_THREAD_BACK".
>
> You already must be able to handle a guest spinning in a loop without
> a WFI. So why would WFI be of interest more than anything else? You
> can always make an interrupt pending at any point, without having to
> wait for WFI to occur. Just make the interrupt pending (which, if you
> emulate everything in userspace, is just giving the vcpu thread a
> signal).

I can use a watchdog that kicks ("interrupts") the VCPU every so often
in order to check whether it is stuck in a WFI. But if a WFI occurs right
after the VCPU has been resumed, I waste a lot of time waiting for the
watchdog to time out. Hence my idea to have the VCPU report its
idleness back to userspace.

> My hunch is that your SW is trying to do the interrupt injection from
> the vcpu thread, which is a pretty broken model (it would badly model
> the concept of an interrupt being an asynchronous event).
>
> Honestly, if there was one thing I would add to the kernel, it would
> be an option to *prevent* any trap of WFx, because that at least is
> something we can universally enforce and guarantee to userspace.
> Anything else is only wishful thinking.

Would that not block your host CPU until the next periodic timer
event? What about other processes that could run on that core while
your VCPU is idle?

>         M.
>
> --
> Without deviation from the norm, progress is not possible.

-- 
Dr.-Ing. Jan Henrik Weinstock
Managing Director

MachineWare GmbH | www.machineware.de
Hühnermarkt 19, 52062 Aachen, Germany
Amtsgericht Aachen HRB25734

Geschäftsführung
Lukas Jünger
Dr.-Ing. Jan Henrik Weinstock


end of thread

Thread overview: 18+ messages
2023-10-20 18:45 KVM exit to userspace on WFI Jan Henrik Weinstock
2023-10-20 18:45 ` Jan Henrik Weinstock
2023-10-20 19:56 ` Marc Zyngier
2023-10-20 19:56   ` Marc Zyngier
2023-10-25 12:12   ` Jan Henrik Weinstock
2023-10-25 12:12     ` Jan Henrik Weinstock
2023-10-25 12:42     ` Marc Zyngier
2023-10-25 12:42       ` Marc Zyngier
2023-10-27 17:41       ` Jan Henrik Weinstock
2023-10-27 17:41         ` Jan Henrik Weinstock
2023-10-30 12:36         ` Marc Zyngier
2023-10-30 12:36           ` Marc Zyngier
2023-10-31 19:21           ` Jan Henrik Weinstock
2023-10-31 19:21             ` Jan Henrik Weinstock
2023-11-04 12:13             ` Marc Zyngier
2023-11-04 12:13               ` Marc Zyngier
2023-11-08  9:38               ` Jan Henrik Weinstock
2023-11-08  9:38                 ` Jan Henrik Weinstock
