All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time
@ 2015-05-28 18:49 ` Christoffer Dall
  0 siblings, 0 replies; 42+ messages in thread
From: Christoffer Dall @ 2015-05-28 18:49 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Marc Zyngier, borntraeger, Paolo Bonzini, Christoffer Dall

Until now we have been calling kvm_guest_exit after re-enabling
interrupts when we come back from the guest, but this has the
unfortunate effect that CPU time accounting done in the context of timer
interrupts occurring while the guest is running doesn't properly notice
that the time since the last tick was spent in the guest.

Inspired by the comment in the x86 code, move the kvm_guest_exit() call
below the local_irq_enable() call and change __kvm_guest_exit() to
kvm_guest_exit(), because we are now calling this function with
interrupts enabled.  We have to now explicitly disable preemption and
not enable preemption before we've called kvm_guest_exit(), since
otherwise we could be preempted and everything happening before we
eventually get scheduled again would be accounted for as guest time.

At the same time, move the trace_kvm_exit() call outside of the atomic
section, since there is no reason for us to do that with interrupts
disabled.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
This patch is based on kvm/queue, because it has the kvm_guest_enter/exit
rework recently posted by Christian Borntraeger.  I hope I got the logic
of this right, there were 2 slightly worrying facts about this:

First, we now enable and disable and enable interrupts on each exit
path, but I couldn't see any performance overhead on hackbench - yes the
only benchmark we care about.

Second, looking at the ppc and mips code, they seem to also call
kvm_guest_exit() before enabling interrupts, so I don't understand how
guest CPU time accounting works on those architectures.

Changes since v1:
 - Tweak comment and commit text based on Marc's feedback.
 - Explicitly disable preemption and enable it only after kvm_guest_exit().

 arch/arm/kvm/arm.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index e41cb11..fe8028d 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -532,6 +532,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		kvm_vgic_flush_hwstate(vcpu);
 		kvm_timer_flush_hwstate(vcpu);
 
+		preempt_disable();
 		local_irq_disable();
 
 		/*
@@ -544,6 +545,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 
 		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
 			local_irq_enable();
+			preempt_enable();
 			kvm_timer_sync_hwstate(vcpu);
 			kvm_vgic_sync_hwstate(vcpu);
 			continue;
@@ -559,8 +561,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
 
 		vcpu->mode = OUTSIDE_GUEST_MODE;
-		__kvm_guest_exit();
-		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
+		/*
+		 * Back from guest
+		 *************************************************************/
+
 		/*
 		 * We may have taken a host interrupt in HYP mode (ie
 		 * while executing the guest). This interrupt is still
@@ -574,8 +578,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		local_irq_enable();
 
 		/*
-		 * Back from guest
-		 *************************************************************/
+		 * We do local_irq_enable() before calling kvm_guest_exit() so
+		 * that if a timer interrupt hits while running the guest we
+		 * account that tick as being spent in the guest.  We enable
+		 * preemption after calling kvm_guest_exit() so that if we get
+		 * preempted we make sure ticks after that is not counted as
+		 * guest time.
+		 */
+		kvm_guest_exit();
+		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
+		preempt_enable();
+
 
 		kvm_timer_sync_hwstate(vcpu);
 		kvm_vgic_sync_hwstate(vcpu);
-- 
2.1.2.330.g565301e.dirty


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time
@ 2015-05-28 18:49 ` Christoffer Dall
  0 siblings, 0 replies; 42+ messages in thread
From: Christoffer Dall @ 2015-05-28 18:49 UTC (permalink / raw)
  To: linux-arm-kernel

Until now we have been calling kvm_guest_exit after re-enabling
interrupts when we come back from the guest, but this has the
unfortunate effect that CPU time accounting done in the context of timer
interrupts occurring while the guest is running doesn't properly notice
that the time since the last tick was spent in the guest.

Inspired by the comment in the x86 code, move the kvm_guest_exit() call
below the local_irq_enable() call and change __kvm_guest_exit() to
kvm_guest_exit(), because we are now calling this function with
interrupts enabled.  We have to now explicitly disable preemption and
not enable preemption before we've called kvm_guest_exit(), since
otherwise we could be preempted and everything happening before we
eventually get scheduled again would be accounted for as guest time.

At the same time, move the trace_kvm_exit() call outside of the atomic
section, since there is no reason for us to do that with interrupts
disabled.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
This patch is based on kvm/queue, because it has the kvm_guest_enter/exit
rework recently posted by Christian Borntraeger.  I hope I got the logic
of this right, there were 2 slightly worrying facts about this:

First, we now enable and disable and enable interrupts on each exit
path, but I couldn't see any performance overhead on hackbench - yes the
only benchmark we care about.

Second, looking at the ppc and mips code, they seem to also call
kvm_guest_exit() before enabling interrupts, so I don't understand how
guest CPU time accounting works on those architectures.

Changes since v1:
 - Tweak comment and commit text based on Marc's feedback.
 - Explicitly disable preemption and enable it only after kvm_guest_exit().

 arch/arm/kvm/arm.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index e41cb11..fe8028d 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -532,6 +532,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		kvm_vgic_flush_hwstate(vcpu);
 		kvm_timer_flush_hwstate(vcpu);
 
+		preempt_disable();
 		local_irq_disable();
 
 		/*
@@ -544,6 +545,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 
 		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
 			local_irq_enable();
+			preempt_enable();
 			kvm_timer_sync_hwstate(vcpu);
 			kvm_vgic_sync_hwstate(vcpu);
 			continue;
@@ -559,8 +561,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
 
 		vcpu->mode = OUTSIDE_GUEST_MODE;
-		__kvm_guest_exit();
-		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
+		/*
+		 * Back from guest
+		 *************************************************************/
+
 		/*
 		 * We may have taken a host interrupt in HYP mode (ie
 		 * while executing the guest). This interrupt is still
@@ -574,8 +578,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		local_irq_enable();
 
 		/*
-		 * Back from guest
-		 *************************************************************/
+		 * We do local_irq_enable() before calling kvm_guest_exit() so
+		 * that if a timer interrupt hits while running the guest we
+		 * account that tick as being spent in the guest.  We enable
+		 * preemption after calling kvm_guest_exit() so that if we get
+		 * preempted we make sure ticks after that is not counted as
+		 * guest time.
+		 */
+		kvm_guest_exit();
+		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
+		preempt_enable();
+
 
 		kvm_timer_sync_hwstate(vcpu);
 		kvm_vgic_sync_hwstate(vcpu);
-- 
2.1.2.330.g565301e.dirty

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time
  2015-05-28 18:49 ` Christoffer Dall
@ 2015-05-29 22:34   ` Mario Smarduch
  -1 siblings, 0 replies; 42+ messages in thread
From: Mario Smarduch @ 2015-05-29 22:34 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvm, Marc Zyngier, borntraeger, Paolo Bonzini, kvmarm, linux-arm-kernel

On 05/28/2015 11:49 AM, Christoffer Dall wrote:
> Until now we have been calling kvm_guest_exit after re-enabling
> interrupts when we come back from the guest, but this has the
> unfortunate effect that CPU time accounting done in the context of timer
> interrupts occurring while the guest is running doesn't properly notice
> that the time since the last tick was spent in the guest.
> 
> Inspired by the comment in the x86 code, move the kvm_guest_exit() call
> below the local_irq_enable() call and change __kvm_guest_exit() to
> kvm_guest_exit(), because we are now calling this function with
> interrupts enabled.  We have to now explicitly disable preemption and
> not enable preemption before we've called kvm_guest_exit(), since
> otherwise we could be preempted and everything happening before we
> eventually get scheduled again would be accounted for as guest time.
> 
> At the same time, move the trace_kvm_exit() call outside of the atomic
> section, since there is no reason for us to do that with interrupts
> disabled.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
> This patch is based on kvm/queue, because it has the kvm_guest_enter/exit
> rework recently posted by Christian Borntraeger.  I hope I got the logic
> of this right, there were 2 slightly worrying facts about this:
> 
> First, we now enable and disable and enable interrupts on each exit
> path, but I couldn't see any performance overhead on hackbench - yes the
> only benchmark we care about.
> 
> Second, looking at the ppc and mips code, they seem to also call
> kvm_guest_exit() before enabling interrupts, so I don't understand how
> guest CPU time accounting works on those architectures.
> 
> Changes since v1:
>  - Tweak comment and commit text based on Marc's feedback.
>  - Explicitly disable preemption and enable it only after kvm_guest_exit().
> 
>  arch/arm/kvm/arm.c | 21 +++++++++++++++++----
>  1 file changed, 17 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index e41cb11..fe8028d 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -532,6 +532,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		kvm_vgic_flush_hwstate(vcpu);
>  		kvm_timer_flush_hwstate(vcpu);
>  
> +		preempt_disable();
>  		local_irq_disable();
>  
>  		/*
> @@ -544,6 +545,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  
>  		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
>  			local_irq_enable();
> +			preempt_enable();
>  			kvm_timer_sync_hwstate(vcpu);
>  			kvm_vgic_sync_hwstate(vcpu);
>  			continue;
> @@ -559,8 +561,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
>  
>  		vcpu->mode = OUTSIDE_GUEST_MODE;
> -		__kvm_guest_exit();
> -		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
> +		/*
> +		 * Back from guest
> +		 *************************************************************/
> +
>  		/*
>  		 * We may have taken a host interrupt in HYP mode (ie
>  		 * while executing the guest). This interrupt is still
> @@ -574,8 +578,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		local_irq_enable();
>  
>  		/*
> -		 * Back from guest
> -		 *************************************************************/
> +		 * We do local_irq_enable() before calling kvm_guest_exit() so
> +		 * that if a timer interrupt hits while running the guest we
> +		 * account that tick as being spent in the guest.  We enable
> +		 * preemption after calling kvm_guest_exit() so that if we get
> +		 * preempted we make sure ticks after that is not counted as
> +		 * guest time.
> +		 */
> +		kvm_guest_exit();
> +		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
> +		preempt_enable();
> +
>  
>  		kvm_timer_sync_hwstate(vcpu);
>  		kvm_vgic_sync_hwstate(vcpu);
> 

Hi Christoffer,
 so currently we take a snap shot when we enter the guest
(tsk->vtime_snap) and upon exit add the time we spent in
the guest and update accrued time, which appears correct.

With this patch it appears that interrupts running
in host mode are accrued to Guest time, and additional preemption
latency is added.

Thanks,
- Mario

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time
@ 2015-05-29 22:34   ` Mario Smarduch
  0 siblings, 0 replies; 42+ messages in thread
From: Mario Smarduch @ 2015-05-29 22:34 UTC (permalink / raw)
  To: linux-arm-kernel

On 05/28/2015 11:49 AM, Christoffer Dall wrote:
> Until now we have been calling kvm_guest_exit after re-enabling
> interrupts when we come back from the guest, but this has the
> unfortunate effect that CPU time accounting done in the context of timer
> interrupts occurring while the guest is running doesn't properly notice
> that the time since the last tick was spent in the guest.
> 
> Inspired by the comment in the x86 code, move the kvm_guest_exit() call
> below the local_irq_enable() call and change __kvm_guest_exit() to
> kvm_guest_exit(), because we are now calling this function with
> interrupts enabled.  We have to now explicitly disable preemption and
> not enable preemption before we've called kvm_guest_exit(), since
> otherwise we could be preempted and everything happening before we
> eventually get scheduled again would be accounted for as guest time.
> 
> At the same time, move the trace_kvm_exit() call outside of the atomic
> section, since there is no reason for us to do that with interrupts
> disabled.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
> This patch is based on kvm/queue, because it has the kvm_guest_enter/exit
> rework recently posted by Christian Borntraeger.  I hope I got the logic
> of this right, there were 2 slightly worrying facts about this:
> 
> First, we now enable and disable and enable interrupts on each exit
> path, but I couldn't see any performance overhead on hackbench - yes the
> only benchmark we care about.
> 
> Second, looking at the ppc and mips code, they seem to also call
> kvm_guest_exit() before enabling interrupts, so I don't understand how
> guest CPU time accounting works on those architectures.
> 
> Changes since v1:
>  - Tweak comment and commit text based on Marc's feedback.
>  - Explicitly disable preemption and enable it only after kvm_guest_exit().
> 
>  arch/arm/kvm/arm.c | 21 +++++++++++++++++----
>  1 file changed, 17 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index e41cb11..fe8028d 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -532,6 +532,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		kvm_vgic_flush_hwstate(vcpu);
>  		kvm_timer_flush_hwstate(vcpu);
>  
> +		preempt_disable();
>  		local_irq_disable();
>  
>  		/*
> @@ -544,6 +545,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  
>  		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
>  			local_irq_enable();
> +			preempt_enable();
>  			kvm_timer_sync_hwstate(vcpu);
>  			kvm_vgic_sync_hwstate(vcpu);
>  			continue;
> @@ -559,8 +561,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
>  
>  		vcpu->mode = OUTSIDE_GUEST_MODE;
> -		__kvm_guest_exit();
> -		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
> +		/*
> +		 * Back from guest
> +		 *************************************************************/
> +
>  		/*
>  		 * We may have taken a host interrupt in HYP mode (ie
>  		 * while executing the guest). This interrupt is still
> @@ -574,8 +578,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		local_irq_enable();
>  
>  		/*
> -		 * Back from guest
> -		 *************************************************************/
> +		 * We do local_irq_enable() before calling kvm_guest_exit() so
> +		 * that if a timer interrupt hits while running the guest we
> +		 * account that tick as being spent in the guest.  We enable
> +		 * preemption after calling kvm_guest_exit() so that if we get
> +		 * preempted we make sure ticks after that is not counted as
> +		 * guest time.
> +		 */
> +		kvm_guest_exit();
> +		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
> +		preempt_enable();
> +
>  
>  		kvm_timer_sync_hwstate(vcpu);
>  		kvm_vgic_sync_hwstate(vcpu);
> 

Hi Christoffer,
 so currently we take a snap shot when we enter the guest
(tsk->vtime_snap) and upon exit add the time we spent in
the guest and update accrued time, which appears correct.

With this patch it appears that interrupts running
in host mode are accrued to Guest time, and additional preemption
latency is added.

Thanks,
- Mario

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time
  2015-05-29 22:34   ` Mario Smarduch
@ 2015-05-31  6:59     ` Christoffer Dall
  -1 siblings, 0 replies; 42+ messages in thread
From: Christoffer Dall @ 2015-05-31  6:59 UTC (permalink / raw)
  To: Mario Smarduch
  Cc: kvmarm, kvm, linux-arm-kernel, Marc Zyngier, borntraeger, Paolo Bonzini

Hi Mario,

On Fri, May 29, 2015 at 03:34:47PM -0700, Mario Smarduch wrote:
> On 05/28/2015 11:49 AM, Christoffer Dall wrote:
> > Until now we have been calling kvm_guest_exit after re-enabling
> > interrupts when we come back from the guest, but this has the
> > unfortunate effect that CPU time accounting done in the context of timer
> > interrupts occurring while the guest is running doesn't properly notice
> > that the time since the last tick was spent in the guest.
> > 
> > Inspired by the comment in the x86 code, move the kvm_guest_exit() call
> > below the local_irq_enable() call and change __kvm_guest_exit() to
> > kvm_guest_exit(), because we are now calling this function with
> > interrupts enabled.  We have to now explicitly disable preemption and
> > not enable preemption before we've called kvm_guest_exit(), since
> > otherwise we could be preempted and everything happening before we
> > eventually get scheduled again would be accounted for as guest time.
> > 
> > At the same time, move the trace_kvm_exit() call outside of the atomic
> > section, since there is no reason for us to do that with interrupts
> > disabled.
> > 
> > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> > ---
> > This patch is based on kvm/queue, because it has the kvm_guest_enter/exit
> > rework recently posted by Christian Borntraeger.  I hope I got the logic
> > of this right, there were 2 slightly worrying facts about this:
> > 
> > First, we now enable and disable and enable interrupts on each exit
> > path, but I couldn't see any performance overhead on hackbench - yes the
> > only benchmark we care about.
> > 
> > Second, looking at the ppc and mips code, they seem to also call
> > kvm_guest_exit() before enabling interrupts, so I don't understand how
> > guest CPU time accounting works on those architectures.
> > 
> > Changes since v1:
> >  - Tweak comment and commit text based on Marc's feedback.
> >  - Explicitly disable preemption and enable it only after kvm_guest_exit().
> > 
> >  arch/arm/kvm/arm.c | 21 +++++++++++++++++----
> >  1 file changed, 17 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> > index e41cb11..fe8028d 100644
> > --- a/arch/arm/kvm/arm.c
> > +++ b/arch/arm/kvm/arm.c
> > @@ -532,6 +532,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >  		kvm_vgic_flush_hwstate(vcpu);
> >  		kvm_timer_flush_hwstate(vcpu);
> >  
> > +		preempt_disable();
> >  		local_irq_disable();
> >  
> >  		/*
> > @@ -544,6 +545,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >  
> >  		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
> >  			local_irq_enable();
> > +			preempt_enable();
> >  			kvm_timer_sync_hwstate(vcpu);
> >  			kvm_vgic_sync_hwstate(vcpu);
> >  			continue;
> > @@ -559,8 +561,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >  		ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
> >  
> >  		vcpu->mode = OUTSIDE_GUEST_MODE;
> > -		__kvm_guest_exit();
> > -		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
> > +		/*
> > +		 * Back from guest
> > +		 *************************************************************/
> > +
> >  		/*
> >  		 * We may have taken a host interrupt in HYP mode (ie
> >  		 * while executing the guest). This interrupt is still
> > @@ -574,8 +578,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >  		local_irq_enable();
> >  
> >  		/*
> > -		 * Back from guest
> > -		 *************************************************************/
> > +		 * We do local_irq_enable() before calling kvm_guest_exit() so
> > +		 * that if a timer interrupt hits while running the guest we
> > +		 * account that tick as being spent in the guest.  We enable
> > +		 * preemption after calling kvm_guest_exit() so that if we get
> > +		 * preempted we make sure ticks after that is not counted as
> > +		 * guest time.
> > +		 */
> > +		kvm_guest_exit();
> > +		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
> > +		preempt_enable();
> > +
> >  
> >  		kvm_timer_sync_hwstate(vcpu);
> >  		kvm_vgic_sync_hwstate(vcpu);
> > 
> 
> Hi Christoffer,
>  so currently we take a snap shot when we enter the guest
> (tsk->vtime_snap) and upon exit add the time we spent in
> the guest and update accrued time, which appears correct.

not on arm64, because we don't select HAVE_VIRT_CPU_ACCOUNTING_GEN.  Or
am I missing something obvious here?

> 
> With this patch it appears that interrupts running
> in host mode are accrued to Guest time, and additional preemption
> latency is added.
> 
It is true that interrupt processing in host mode (if they hit on a CPU
when it is running a guest) are accrued to guest time, but without this
patch on current arm64 we accrue no CPU time to guest time at all, which
is hardly more correct.

If this patch is incorrect, then how does it work on x86, where
handle_external_intr() is called (with a barrier in between) before
kvm_guest_exit(), and where handle_external_intr() is simply
local_irq_enable() on SVM and something more complicated on VMX ?

Finally, how exactly is preemption latency added here?  Won't IRQ
processing run with higher priority than any task on your system, so the
order of (1) process pending IRQs (2) call schedule if needed is still
preserved here, but we call kvm_guest_exit() between (1) and (2) instead
of before (1).

It is entirely possible that I'm missing the mark here and everything
gets solved by enabling HAVE_VIRT_CPU_ACCOUNTING_GEN or we need some
extra logic?

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time
@ 2015-05-31  6:59     ` Christoffer Dall
  0 siblings, 0 replies; 42+ messages in thread
From: Christoffer Dall @ 2015-05-31  6:59 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Mario,

On Fri, May 29, 2015 at 03:34:47PM -0700, Mario Smarduch wrote:
> On 05/28/2015 11:49 AM, Christoffer Dall wrote:
> > Until now we have been calling kvm_guest_exit after re-enabling
> > interrupts when we come back from the guest, but this has the
> > unfortunate effect that CPU time accounting done in the context of timer
> > interrupts occurring while the guest is running doesn't properly notice
> > that the time since the last tick was spent in the guest.
> > 
> > Inspired by the comment in the x86 code, move the kvm_guest_exit() call
> > below the local_irq_enable() call and change __kvm_guest_exit() to
> > kvm_guest_exit(), because we are now calling this function with
> > interrupts enabled.  We have to now explicitly disable preemption and
> > not enable preemption before we've called kvm_guest_exit(), since
> > otherwise we could be preempted and everything happening before we
> > eventually get scheduled again would be accounted for as guest time.
> > 
> > At the same time, move the trace_kvm_exit() call outside of the atomic
> > section, since there is no reason for us to do that with interrupts
> > disabled.
> > 
> > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> > ---
> > This patch is based on kvm/queue, because it has the kvm_guest_enter/exit
> > rework recently posted by Christian Borntraeger.  I hope I got the logic
> > of this right, there were 2 slightly worrying facts about this:
> > 
> > First, we now enable and disable and enable interrupts on each exit
> > path, but I couldn't see any performance overhead on hackbench - yes the
> > only benchmark we care about.
> > 
> > Second, looking at the ppc and mips code, they seem to also call
> > kvm_guest_exit() before enabling interrupts, so I don't understand how
> > guest CPU time accounting works on those architectures.
> > 
> > Changes since v1:
> >  - Tweak comment and commit text based on Marc's feedback.
> >  - Explicitly disable preemption and enable it only after kvm_guest_exit().
> > 
> >  arch/arm/kvm/arm.c | 21 +++++++++++++++++----
> >  1 file changed, 17 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> > index e41cb11..fe8028d 100644
> > --- a/arch/arm/kvm/arm.c
> > +++ b/arch/arm/kvm/arm.c
> > @@ -532,6 +532,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >  		kvm_vgic_flush_hwstate(vcpu);
> >  		kvm_timer_flush_hwstate(vcpu);
> >  
> > +		preempt_disable();
> >  		local_irq_disable();
> >  
> >  		/*
> > @@ -544,6 +545,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >  
> >  		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
> >  			local_irq_enable();
> > +			preempt_enable();
> >  			kvm_timer_sync_hwstate(vcpu);
> >  			kvm_vgic_sync_hwstate(vcpu);
> >  			continue;
> > @@ -559,8 +561,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >  		ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
> >  
> >  		vcpu->mode = OUTSIDE_GUEST_MODE;
> > -		__kvm_guest_exit();
> > -		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
> > +		/*
> > +		 * Back from guest
> > +		 *************************************************************/
> > +
> >  		/*
> >  		 * We may have taken a host interrupt in HYP mode (ie
> >  		 * while executing the guest). This interrupt is still
> > @@ -574,8 +578,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >  		local_irq_enable();
> >  
> >  		/*
> > -		 * Back from guest
> > -		 *************************************************************/
> > +		 * We do local_irq_enable() before calling kvm_guest_exit() so
> > +		 * that if a timer interrupt hits while running the guest we
> > +		 * account that tick as being spent in the guest.  We enable
> > +		 * preemption after calling kvm_guest_exit() so that if we get
> > +		 * preempted we make sure ticks after that is not counted as
> > +		 * guest time.
> > +		 */
> > +		kvm_guest_exit();
> > +		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
> > +		preempt_enable();
> > +
> >  
> >  		kvm_timer_sync_hwstate(vcpu);
> >  		kvm_vgic_sync_hwstate(vcpu);
> > 
> 
> Hi Christoffer,
>  so currently we take a snap shot when we enter the guest
> (tsk->vtime_snap) and upon exit add the time we spent in
> the guest and update accrued time, which appears correct.

not on arm64, because we don't select HAVE_VIRT_CPU_ACCOUNTING_GEN.  Or
am I missing something obvious here?

> 
> With this patch it appears that interrupts running
> in host mode are accrued to Guest time, and additional preemption
> latency is added.
> 
It is true that interrupt processing in host mode (if they hit on a CPU
when it is running a guest) are accrued to guest time, but without this
patch on current arm64 we accrue no CPU time to guest time at all, which
is hardly more correct.

If this patch is incorrect, then how does it work on x86, where
handle_external_intr() is called (with a barrier in between) before
kvm_guest_exit(), and where handle_external_intr() is simply
local_irq_enable() on SVM and something more complicated on VMX ?

Finally, how exactly is preemption latency added here?  Won't IRQ
processing run with higher priority than any task on your system, so the
order of (1) process pending IRQs (2) call schedule if needed is still
preserved here, but we call kvm_guest_exit() between (1) and (2) instead
of before (1).

It is entirely possible that I'm missing the mark here and everything
gets solved by enabling HAVE_VIRT_CPU_ACCOUNTING_GEN or we need some
extra logic?

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time
  2015-05-28 18:49 ` Christoffer Dall
@ 2015-06-01  7:47   ` Christian Borntraeger
  -1 siblings, 0 replies; 42+ messages in thread
From: Christian Borntraeger @ 2015-06-01  7:47 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, kvm, linux-arm-kernel
  Cc: Marc Zyngier, Paolo Bonzini

Am 28.05.2015 um 20:49 schrieb Christoffer Dall:
> Until now we have been calling kvm_guest_exit after re-enabling
> interrupts when we come back from the guest, but this has the
> unfortunate effect that CPU time accounting done in the context of timer
> interrupts occurring while the guest is running doesn't properly notice
> that the time since the last tick was spent in the guest.

Can you verify that a CPU bound guest has almost zero guest time?
Assuming that your answer is "yes" your patch make sense as host
timer interrupts should be the only reasons for guest exits then.


> Inspired by the comment in the x86 code, move the kvm_guest_exit() call
> below the local_irq_enable() call and change __kvm_guest_exit() to
> kvm_guest_exit(), because we are now calling this function with
> interrupts enabled.  We have to now explicitly disable preemption and
> not enable preemption before we've called kvm_guest_exit(), since
> otherwise we could be preempted and everything happening before we
> eventually get scheduled again would be accounted for as guest time.
> 
> At the same time, move the trace_kvm_exit() call outside of the atomic
> section, since there is no reason for us to do that with interrupts
> disabled.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
> This patch is based on kvm/queue, because it has the kvm_guest_enter/exit
> rework recently posted by Christian Borntraeger.  I hope I got the logic
> of this right, there were 2 slightly worrying facts about this:
> 
> First, we now enable and disable and enable interrupts on each exit
> path, but I couldn't see any performance overhead on hackbench - yes the
> only benchmark we care about.

This should be somewhat similar to the situation before my patch.
There it was

1: "disable", "guest", "disable again and save", "restore to disable", "enable"
and now it is
2: "disable", "guest", "enable"
and with your patch it is
3: "disable", "guest", "enable", "disable, "enable"

I assume that 3 and 1 are similar in its costs, so this is probably ok.


> 
> Second, looking at the ppc and mips code, they seem to also call
> kvm_guest_exit() before enabling interrupts, so I don't understand how
> guest CPU time accounting works on those architectures.

Not an expert here, but I assume mips has the same logic as arm so if your
patch is right for arm its probably also for mips.

powerpc looks similar to what s390 does (not using the tick, instead it uses
a hw-timer) so this should be fine.



^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time
@ 2015-06-01  7:47   ` Christian Borntraeger
  0 siblings, 0 replies; 42+ messages in thread
From: Christian Borntraeger @ 2015-06-01  7:47 UTC (permalink / raw)
  To: linux-arm-kernel

Am 28.05.2015 um 20:49 schrieb Christoffer Dall:
> Until now we have been calling kvm_guest_exit after re-enabling
> interrupts when we come back from the guest, but this has the
> unfortunate effect that CPU time accounting done in the context of timer
> interrupts occurring while the guest is running doesn't properly notice
> that the time since the last tick was spent in the guest.

Can you verify that a CPU bound guest has almost zero guest time?
Assuming that your answer is "yes" your patch make sense as host
timer interrupts should be the only reasons for guest exits then.


> Inspired by the comment in the x86 code, move the kvm_guest_exit() call
> below the local_irq_enable() call and change __kvm_guest_exit() to
> kvm_guest_exit(), because we are now calling this function with
> interrupts enabled.  We have to now explicitly disable preemption and
> not enable preemption before we've called kvm_guest_exit(), since
> otherwise we could be preempted and everything happening before we
> eventually get scheduled again would be accounted for as guest time.
> 
> At the same time, move the trace_kvm_exit() call outside of the atomic
> section, since there is no reason for us to do that with interrupts
> disabled.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
> This patch is based on kvm/queue, because it has the kvm_guest_enter/exit
> rework recently posted by Christian Borntraeger.  I hope I got the logic
> of this right, there were 2 slightly worrying facts about this:
> 
> First, we now enable and disable and enable interrupts on each exit
> path, but I couldn't see any performance overhead on hackbench - yes the
> only benchmark we care about.

This should be somewhat similar to the situation before my patch.
There it was

1: "disable", "guest", "disable again and save", "restore to disable", "enable"
and now it is
2: "disable", "guest", "enable"
and with your patch it is
3: "disable", "guest", "enable", "disable, "enable"

I assume that 3 and 1 are similar in its costs, so this is probably ok.


> 
> Second, looking at the ppc and mips code, they seem to also call
> kvm_guest_exit() before enabling interrupts, so I don't understand how
> guest CPU time accounting works on those architectures.

Not an expert here, but I assume mips has the same logic as arm so if your
patch is right for arm its probably also for mips.

powerpc looks similar to what s390 does (not using the tick, instead it uses
a hw-timer) so this should be fine.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time
  2015-06-01  7:47   ` Christian Borntraeger
@ 2015-06-01  9:08     ` Christoffer Dall
  -1 siblings, 0 replies; 42+ messages in thread
From: Christoffer Dall @ 2015-06-01  9:08 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: kvmarm, kvm, linux-arm-kernel, Marc Zyngier, Paolo Bonzini

On Mon, Jun 01, 2015 at 09:47:46AM +0200, Christian Borntraeger wrote:
> Am 28.05.2015 um 20:49 schrieb Christoffer Dall:
> > Until now we have been calling kvm_guest_exit after re-enabling
> > interrupts when we come back from the guest, but this has the
> > unfortunate effect that CPU time accounting done in the context of timer
> > interrupts occurring while the guest is running doesn't properly notice
> > that the time since the last tick was spent in the guest.
> 
> Can you verify that a CPU bound guest has almost zero guest time?
> Assuming that your answer is "yes" your patch make sense as host
> timer interrupts should be the only reasons for guest exits then.
> 

Yes, 'cat /dev/urandom > /dev/null' in the guest shows up as 100% sys in
mpstat on the host, 0% guest.

> 
> > Inspired by the comment in the x86 code, move the kvm_guest_exit() call
> > below the local_irq_enable() call and change __kvm_guest_exit() to
> > kvm_guest_exit(), because we are now calling this function with
> > interrupts enabled.  We have to now explicitly disable preemption and
> > not enable preemption before we've called kvm_guest_exit(), since
> > otherwise we could be preempted and everything happening before we
> > eventually get scheduled again would be accounted for as guest time.
> > 
> > At the same time, move the trace_kvm_exit() call outside of the atomic
> > section, since there is no reason for us to do that with interrupts
> > disabled.
> > 
> > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> > ---
> > This patch is based on kvm/queue, because it has the kvm_guest_enter/exit
> > rework recently posted by Christian Borntraeger.  I hope I got the logic
> > of this right, there were 2 slightly worrying facts about this:
> > 
> > First, we now enable and disable and enable interrupts on each exit
> > path, but I couldn't see any performance overhead on hackbench - yes the
> > only benchmark we care about.
> 
> This should be somewhat similar to the situation before my patch.
> There it was
> 
> 1: "disable", "guest", "disable again and save", "restore to disable", "enable"
> and now it is
> 2: "disable", "guest", "enable"
> and with your patch it is
> 3: "disable", "guest", "enable", "disable, "enable"
> 
> I assume that 3 and 1 are similar in its costs, so this is probably ok.
> 
> 
> > 
> > Second, looking at the ppc and mips code, they seem to also call
> > kvm_guest_exit() before enabling interrupts, so I don't understand how
> > guest CPU time accounting works on those architectures.
> 
> Not an expert here, but I assume mips has the same logic as arm so if your
> patch is right for arm its probably also for mips.
> 
> powerpc looks similar to what s390 does (not using the tick, instead it uses
> a hw-timer) so this should be fine.
> 
I wonder if we can simply enable HAVE_VIRT_CPU_ACCOUNTING_GEN and get
this for free which would avoid the need for this patch?

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time
@ 2015-06-01  9:08     ` Christoffer Dall
  0 siblings, 0 replies; 42+ messages in thread
From: Christoffer Dall @ 2015-06-01  9:08 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jun 01, 2015 at 09:47:46AM +0200, Christian Borntraeger wrote:
> Am 28.05.2015 um 20:49 schrieb Christoffer Dall:
> > Until now we have been calling kvm_guest_exit after re-enabling
> > interrupts when we come back from the guest, but this has the
> > unfortunate effect that CPU time accounting done in the context of timer
> > interrupts occurring while the guest is running doesn't properly notice
> > that the time since the last tick was spent in the guest.
> 
> Can you verify that a CPU bound guest has almost zero guest time?
> Assuming that your answer is "yes" your patch make sense as host
> timer interrupts should be the only reasons for guest exits then.
> 

Yes, 'cat /dev/urandom > /dev/null' in the guest shows up as 100% sys in
mpstat on the host, 0% guest.

> 
> > Inspired by the comment in the x86 code, move the kvm_guest_exit() call
> > below the local_irq_enable() call and change __kvm_guest_exit() to
> > kvm_guest_exit(), because we are now calling this function with
> > interrupts enabled.  We have to now explicitly disable preemption and
> > not enable preemption before we've called kvm_guest_exit(), since
> > otherwise we could be preempted and everything happening before we
> > eventually get scheduled again would be accounted for as guest time.
> > 
> > At the same time, move the trace_kvm_exit() call outside of the atomic
> > section, since there is no reason for us to do that with interrupts
> > disabled.
> > 
> > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> > ---
> > This patch is based on kvm/queue, because it has the kvm_guest_enter/exit
> > rework recently posted by Christian Borntraeger.  I hope I got the logic
> > of this right, there were 2 slightly worrying facts about this:
> > 
> > First, we now enable and disable and enable interrupts on each exit
> > path, but I couldn't see any performance overhead on hackbench - yes the
> > only benchmark we care about.
> 
> This should be somewhat similar to the situation before my patch.
> There it was
> 
> 1: "disable", "guest", "disable again and save", "restore to disable", "enable"
> and now it is
> 2: "disable", "guest", "enable"
> and with your patch it is
> 3: "disable", "guest", "enable", "disable, "enable"
> 
> I assume that 3 and 1 are similar in its costs, so this is probably ok.
> 
> 
> > 
> > Second, looking at the ppc and mips code, they seem to also call
> > kvm_guest_exit() before enabling interrupts, so I don't understand how
> > guest CPU time accounting works on those architectures.
> 
> Not an expert here, but I assume mips has the same logic as arm so if your
> patch is right for arm its probably also for mips.
> 
> powerpc looks similar to what s390 does (not using the tick, instead it uses
> a hw-timer) so this should be fine.
> 
I wonder if we can simply enable HAVE_VIRT_CPU_ACCOUNTING_GEN and get
this for free which would avoid the need for this patch?

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time
  2015-06-01  9:08     ` Christoffer Dall
@ 2015-06-01  9:21       ` Christian Borntraeger
  -1 siblings, 0 replies; 42+ messages in thread
From: Christian Borntraeger @ 2015-06-01  9:21 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvmarm, kvm, linux-arm-kernel, Marc Zyngier, Paolo Bonzini

Am 01.06.2015 um 11:08 schrieb Christoffer Dall:

>>>
>>> Second, looking at the ppc and mips code, they seem to also call
>>> kvm_guest_exit() before enabling interrupts, so I don't understand how
>>> guest CPU time accounting works on those architectures.
>>
>> Not an expert here, but I assume mips has the same logic as arm so if your
>> patch is right for arm its probably also for mips.
>>
>> powerpc looks similar to what s390 does (not using the tick, instead it uses
>> a hw-timer) so this should be fine.
>>
> I wonder if we can simply enable HAVE_VIRT_CPU_ACCOUNTING_GEN and get
> this for free which would avoid the need for this patch?

Asssuming that HAVE_VIRT_CPU_ACCOUNTING_GEN behaves similar to 
HAVE_VIRT_CPU_ACCOUNTING on s390/power in respect to not rely on ticks
- yes it might work out. Can you give it a try?

Christian



^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time
@ 2015-06-01  9:21       ` Christian Borntraeger
  0 siblings, 0 replies; 42+ messages in thread
From: Christian Borntraeger @ 2015-06-01  9:21 UTC (permalink / raw)
  To: linux-arm-kernel

Am 01.06.2015 um 11:08 schrieb Christoffer Dall:

>>>
>>> Second, looking at the ppc and mips code, they seem to also call
>>> kvm_guest_exit() before enabling interrupts, so I don't understand how
>>> guest CPU time accounting works on those architectures.
>>
>> Not an expert here, but I assume mips has the same logic as arm so if your
>> patch is right for arm its probably also for mips.
>>
>> powerpc looks similar to what s390 does (not using the tick, instead it uses
>> a hw-timer) so this should be fine.
>>
> I wonder if we can simply enable HAVE_VIRT_CPU_ACCOUNTING_GEN and get
> this for free which would avoid the need for this patch?

Asssuming that HAVE_VIRT_CPU_ACCOUNTING_GEN behaves similar to 
HAVE_VIRT_CPU_ACCOUNTING on s390/power in respect to not rely on ticks
- yes it might work out. Can you give it a try?

Christian

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time
  2015-06-01  7:47   ` Christian Borntraeger
@ 2015-06-01 11:34     ` Paolo Bonzini
  -1 siblings, 0 replies; 42+ messages in thread
From: Paolo Bonzini @ 2015-06-01 11:34 UTC (permalink / raw)
  To: Christian Borntraeger, Christoffer Dall, kvmarm, kvm, linux-arm-kernel
  Cc: Marc Zyngier



On 01/06/2015 09:47, Christian Borntraeger wrote:
> 
> 1: "disable", "guest", "disable again and save", "restore to disable", "enable"
> and now it is
> 2: "disable", "guest", "enable"
> and with your patch it is
> 3: "disable", "guest", "enable", "disable, "enable"
> 
> I assume that 3 and 1 are similar in its costs, so this is probably ok.

At least on x86, 3 and 2 are similar, but 3 is much more expensive than
1!  See https://lkml.org/lkml/2015/5/5/835:

Cost of: CLI                         insn  same-IF :     0 cycles
Cost of: CLI                         insn  flip-IF :     0 cycles
Cost of: STI                         insn  same-IF :     0 cycles
Cost of: STI                         insn  flip-IF :     0 cycles
Cost of: PUSHF                       insn          :     0 cycles
Cost of: POPF                        insn  same-IF :    20 cycles
Cost of: POPF                        insn  flip-IF :    28 cycles
Cost of: local_irq_save()            fn            :    20 cycles
Cost of: local_irq_restore()         fn    same-IF :    24 cycles
Cost of: local_irq_restore()         fn    flip-IF :    28 cycles
Cost of: irq_save()+restore()        fn    same-IF :    48 cycles
Cost of: irq_save()+restore()        fn    flip-IF :    48 cycles

Paolo

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time
@ 2015-06-01 11:34     ` Paolo Bonzini
  0 siblings, 0 replies; 42+ messages in thread
From: Paolo Bonzini @ 2015-06-01 11:34 UTC (permalink / raw)
  To: linux-arm-kernel



On 01/06/2015 09:47, Christian Borntraeger wrote:
> 
> 1: "disable", "guest", "disable again and save", "restore to disable", "enable"
> and now it is
> 2: "disable", "guest", "enable"
> and with your patch it is
> 3: "disable", "guest", "enable", "disable, "enable"
> 
> I assume that 3 and 1 are similar in its costs, so this is probably ok.

At least on x86, 3 and 2 are similar, but 3 is much more expensive than
1!  See https://lkml.org/lkml/2015/5/5/835:

Cost of: CLI                         insn  same-IF :     0 cycles
Cost of: CLI                         insn  flip-IF :     0 cycles
Cost of: STI                         insn  same-IF :     0 cycles
Cost of: STI                         insn  flip-IF :     0 cycles
Cost of: PUSHF                       insn          :     0 cycles
Cost of: POPF                        insn  same-IF :    20 cycles
Cost of: POPF                        insn  flip-IF :    28 cycles
Cost of: local_irq_save()            fn            :    20 cycles
Cost of: local_irq_restore()         fn    same-IF :    24 cycles
Cost of: local_irq_restore()         fn    flip-IF :    28 cycles
Cost of: irq_save()+restore()        fn    same-IF :    48 cycles
Cost of: irq_save()+restore()        fn    flip-IF :    48 cycles

Paolo

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time
  2015-06-01 11:34     ` Paolo Bonzini
@ 2015-06-01 11:42       ` Christian Borntraeger
  -1 siblings, 0 replies; 42+ messages in thread
From: Christian Borntraeger @ 2015-06-01 11:42 UTC (permalink / raw)
  To: Paolo Bonzini, Christoffer Dall, kvmarm, kvm, linux-arm-kernel
  Cc: Marc Zyngier

Am 01.06.2015 um 13:34 schrieb Paolo Bonzini:
> 
> 
> On 01/06/2015 09:47, Christian Borntraeger wrote:
>>
>> 1: "disable", "guest", "disable again and save", "restore to disable", "enable"
>> and now it is
>> 2: "disable", "guest", "enable"
>> and with your patch it is
>> 3: "disable", "guest", "enable", "disable, "enable"
>>
>> I assume that 3 and 1 are similar in its costs, so this is probably ok.
> 
> At least on x86, 3 and 2 are similar, but 3 is much more expensive than
> 1!  See https://lkml.org/lkml/2015/5/5/835:

That does not make sense. If 3 and 2 are similar, then 2 must be much more
expensive than 1 as well. As 2 is a strict subset of 1 it must be cheaper, no?


> 
> Cost of: CLI                         insn  same-IF :     0 cycles
> Cost of: CLI                         insn  flip-IF :     0 cycles
> Cost of: STI                         insn  same-IF :     0 cycles
> Cost of: STI                         insn  flip-IF :     0 cycles
> Cost of: PUSHF                       insn          :     0 cycles
> Cost of: POPF                        insn  same-IF :    20 cycles
> Cost of: POPF                        insn  flip-IF :    28 cycles
> Cost of: local_irq_save()            fn            :    20 cycles
> Cost of: local_irq_restore()         fn    same-IF :    24 cycles
> Cost of: local_irq_restore()         fn    flip-IF :    28 cycles
> Cost of: irq_save()+restore()        fn    same-IF :    48 cycles
> Cost of: irq_save()+restore()        fn    flip-IF :    48 cycles

Yes its similar on s390. local_irq_save/restore is noticable in guest exit
hot loops (thats what inspired my patch), but a simple irq disable is
just single cycle pipelined. Given the design of aggressive out-out order
designs with all the architectural ordering this makes sense.

Christian

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time
@ 2015-06-01 11:42       ` Christian Borntraeger
  0 siblings, 0 replies; 42+ messages in thread
From: Christian Borntraeger @ 2015-06-01 11:42 UTC (permalink / raw)
  To: linux-arm-kernel

Am 01.06.2015 um 13:34 schrieb Paolo Bonzini:
> 
> 
> On 01/06/2015 09:47, Christian Borntraeger wrote:
>>
>> 1: "disable", "guest", "disable again and save", "restore to disable", "enable"
>> and now it is
>> 2: "disable", "guest", "enable"
>> and with your patch it is
>> 3: "disable", "guest", "enable", "disable, "enable"
>>
>> I assume that 3 and 1 are similar in its costs, so this is probably ok.
> 
> At least on x86, 3 and 2 are similar, but 3 is much more expensive than
> 1!  See https://lkml.org/lkml/2015/5/5/835:

That does not make sense. If 3 and 2 are similar, then 2 must be much more
expensive than 1 as well. As 2 is a strict subset of 1 it must be cheaper, no?


> 
> Cost of: CLI                         insn  same-IF :     0 cycles
> Cost of: CLI                         insn  flip-IF :     0 cycles
> Cost of: STI                         insn  same-IF :     0 cycles
> Cost of: STI                         insn  flip-IF :     0 cycles
> Cost of: PUSHF                       insn          :     0 cycles
> Cost of: POPF                        insn  same-IF :    20 cycles
> Cost of: POPF                        insn  flip-IF :    28 cycles
> Cost of: local_irq_save()            fn            :    20 cycles
> Cost of: local_irq_restore()         fn    same-IF :    24 cycles
> Cost of: local_irq_restore()         fn    flip-IF :    28 cycles
> Cost of: irq_save()+restore()        fn    same-IF :    48 cycles
> Cost of: irq_save()+restore()        fn    flip-IF :    48 cycles

Yes its similar on s390. local_irq_save/restore is noticable in guest exit
hot loops (thats what inspired my patch), but a simple irq disable is
just single cycle pipelined. Given the design of aggressive out-out order
designs with all the architectural ordering this makes sense.

Christian

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time
  2015-06-01 11:42       ` Christian Borntraeger
@ 2015-06-01 11:52         ` Paolo Bonzini
  -1 siblings, 0 replies; 42+ messages in thread
From: Paolo Bonzini @ 2015-06-01 11:52 UTC (permalink / raw)
  To: Christian Borntraeger, Christoffer Dall, kvmarm, kvm, linux-arm-kernel
  Cc: Marc Zyngier



On 01/06/2015 13:42, Christian Borntraeger wrote:
> Am 01.06.2015 um 13:34 schrieb Paolo Bonzini:
>>
>>
>> On 01/06/2015 09:47, Christian Borntraeger wrote:
>>>
>>> 1: "disable", "guest", "disable again and save", "restore to disable", "enable"
>>> and now it is
>>> 2: "disable", "guest", "enable"
>>> and with your patch it is
>>> 3: "disable", "guest", "enable", "disable, "enable"
>>>
>>> I assume that 3 and 1 are similar in its costs, so this is probably ok.
>>
>> At least on x86, 3 and 2 are similar, but 3 is much more expensive than
>> 1!  See https://lkml.org/lkml/2015/5/5/835:
> 
> That does not make sense. If 3 and 2 are similar, then 2 must be much more
> expensive than 1 as well. As 2 is a strict subset of 1 it must be cheaper, no?

Yes, it must.  I meant 3 is much cheaper than 1.

Paolo

>> Cost of: CLI                         insn  same-IF :     0 cycles
>> Cost of: CLI                         insn  flip-IF :     0 cycles
>> Cost of: STI                         insn  same-IF :     0 cycles
>> Cost of: STI                         insn  flip-IF :     0 cycles
>> Cost of: PUSHF                       insn          :     0 cycles
>> Cost of: POPF                        insn  same-IF :    20 cycles
>> Cost of: POPF                        insn  flip-IF :    28 cycles
>> Cost of: local_irq_save()            fn            :    20 cycles
>> Cost of: local_irq_restore()         fn    same-IF :    24 cycles
>> Cost of: local_irq_restore()         fn    flip-IF :    28 cycles
>> Cost of: irq_save()+restore()        fn    same-IF :    48 cycles
>> Cost of: irq_save()+restore()        fn    flip-IF :    48 cycles
> 
> Yes its similar on s390. local_irq_save/restore is noticable in guest exit
> hot loops (thats what inspired my patch), but a simple irq disable is
> just single cycle pipelined. Given the design of aggressive out-out order
> designs with all the architectural ordering this makes sense.
> 
> Christian
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time
@ 2015-06-01 11:52         ` Paolo Bonzini
  0 siblings, 0 replies; 42+ messages in thread
From: Paolo Bonzini @ 2015-06-01 11:52 UTC (permalink / raw)
  To: linux-arm-kernel



On 01/06/2015 13:42, Christian Borntraeger wrote:
> Am 01.06.2015 um 13:34 schrieb Paolo Bonzini:
>>
>>
>> On 01/06/2015 09:47, Christian Borntraeger wrote:
>>>
>>> 1: "disable", "guest", "disable again and save", "restore to disable", "enable"
>>> and now it is
>>> 2: "disable", "guest", "enable"
>>> and with your patch it is
>>> 3: "disable", "guest", "enable", "disable, "enable"
>>>
>>> I assume that 3 and 1 are similar in its costs, so this is probably ok.
>>
>> At least on x86, 3 and 2 are similar, but 3 is much more expensive than
>> 1!  See https://lkml.org/lkml/2015/5/5/835:
> 
> That does not make sense. If 3 and 2 are similar, then 2 must be much more
> expensive than 1 as well. As 2 is a strict subset of 1 it must be cheaper, no?

Yes, it must.  I meant 3 is much cheaper than 1.

Paolo

>> Cost of: CLI                         insn  same-IF :     0 cycles
>> Cost of: CLI                         insn  flip-IF :     0 cycles
>> Cost of: STI                         insn  same-IF :     0 cycles
>> Cost of: STI                         insn  flip-IF :     0 cycles
>> Cost of: PUSHF                       insn          :     0 cycles
>> Cost of: POPF                        insn  same-IF :    20 cycles
>> Cost of: POPF                        insn  flip-IF :    28 cycles
>> Cost of: local_irq_save()            fn            :    20 cycles
>> Cost of: local_irq_restore()         fn    same-IF :    24 cycles
>> Cost of: local_irq_restore()         fn    flip-IF :    28 cycles
>> Cost of: irq_save()+restore()        fn    same-IF :    48 cycles
>> Cost of: irq_save()+restore()        fn    flip-IF :    48 cycles
> 
> Yes its similar on s390. local_irq_save/restore is noticable in guest exit
> hot loops (thats what inspired my patch), but a simple irq disable is
> just single cycle pipelined. Given the design of aggressive out-out order
> designs with all the architectural ordering this makes sense.
> 
> Christian
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time
  2015-06-01  9:21       ` Christian Borntraeger
@ 2015-06-01 13:35         ` Christoffer Dall
  -1 siblings, 0 replies; 42+ messages in thread
From: Christoffer Dall @ 2015-06-01 13:35 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: kvmarm, kvm, linux-arm-kernel, Marc Zyngier, Paolo Bonzini

On Mon, Jun 01, 2015 at 11:21:19AM +0200, Christian Borntraeger wrote:
> Am 01.06.2015 um 11:08 schrieb Christoffer Dall:
> 
> >>>
> >>> Second, looking at the ppc and mips code, they seem to also call
> >>> kvm_guest_exit() before enabling interrupts, so I don't understand how
> >>> guest CPU time accounting works on those architectures.
> >>
> >> Not an expert here, but I assume mips has the same logic as arm so if your
> >> patch is right for arm its probably also for mips.
> >>
> >> powerpc looks similar to what s390 does (not using the tick, instead it uses
> >> a hw-timer) so this should be fine.
> >>
> > I wonder if we can simply enable HAVE_VIRT_CPU_ACCOUNTING_GEN and get
> > this for free which would avoid the need for this patch?
> 
> Asssuming that HAVE_VIRT_CPU_ACCOUNTING_GEN behaves similar to 
> HAVE_VIRT_CPU_ACCOUNTING on s390/power in respect to not rely on ticks
> - yes it might work out. Can you give it a try?
> 
Adding HAVE_VIRT_CPU_ACCOUNTING_GEN to arch/arm64/Kconfig works, but has
no effect unless you also enable CONFIG_NO_HZ_FULL, so that hardly feels
like a fix since it would be a shame to force users to use this config
option to report CPU usage correctly.

I'm not entirely sure what the history and meaning behind these configs
are, so maybe there is an entirely different rework needed here.  It
seems logical that you could simply sample the counter at entry/exit of
the guest, but if there is nowhere to store this data without
NO_HZ_FULL+VIRT_CPU_ACCOUNTING_GEN then I guess that would be why?

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time
@ 2015-06-01 13:35         ` Christoffer Dall
  0 siblings, 0 replies; 42+ messages in thread
From: Christoffer Dall @ 2015-06-01 13:35 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jun 01, 2015 at 11:21:19AM +0200, Christian Borntraeger wrote:
> Am 01.06.2015 um 11:08 schrieb Christoffer Dall:
> 
> >>>
> >>> Second, looking at the ppc and mips code, they seem to also call
> >>> kvm_guest_exit() before enabling interrupts, so I don't understand how
> >>> guest CPU time accounting works on those architectures.
> >>
> >> Not an expert here, but I assume mips has the same logic as arm so if your
> >> patch is right for arm its probably also for mips.
> >>
> >> powerpc looks similar to what s390 does (not using the tick, instead it uses
> >> a hw-timer) so this should be fine.
> >>
> > I wonder if we can simply enable HAVE_VIRT_CPU_ACCOUNTING_GEN and get
> > this for free which would avoid the need for this patch?
> 
> Asssuming that HAVE_VIRT_CPU_ACCOUNTING_GEN behaves similar to 
> HAVE_VIRT_CPU_ACCOUNTING on s390/power in respect to not rely on ticks
> - yes it might work out. Can you give it a try?
> 
Adding HAVE_VIRT_CPU_ACCOUNTING_GEN to arch/arm64/Kconfig works, but has
no effect unless you also enable CONFIG_NO_HZ_FULL, so that hardly feels
like a fix since it would be a shame to force users to use this config
option to report CPU usage correctly.

I'm not entirely sure what the history and meaning behind these configs
are, so maybe there is an entirely different rework needed here.  It
seems logical that you could simply sample the counter at entry/exit of
the guest, but if there is nowhere to store this data without
NO_HZ_FULL+VIRT_CPU_ACCOUNTING_GEN then I guess that would be why?

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time
  2015-06-01 13:35         ` Christoffer Dall
@ 2015-06-01 13:37           ` Christian Borntraeger
  -1 siblings, 0 replies; 42+ messages in thread
From: Christian Borntraeger @ 2015-06-01 13:37 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvmarm, kvm, linux-arm-kernel, Marc Zyngier, Paolo Bonzini

Am 01.06.2015 um 15:35 schrieb Christoffer Dall:
> On Mon, Jun 01, 2015 at 11:21:19AM +0200, Christian Borntraeger wrote:
>> Am 01.06.2015 um 11:08 schrieb Christoffer Dall:
>>
>>>>>
>>>>> Second, looking at the ppc and mips code, they seem to also call
>>>>> kvm_guest_exit() before enabling interrupts, so I don't understand how
>>>>> guest CPU time accounting works on those architectures.
>>>>
>>>> Not an expert here, but I assume mips has the same logic as arm so if your
>>>> patch is right for arm its probably also for mips.
>>>>
>>>> powerpc looks similar to what s390 does (not using the tick, instead it uses
>>>> a hw-timer) so this should be fine.
>>>>
>>> I wonder if we can simply enable HAVE_VIRT_CPU_ACCOUNTING_GEN and get
>>> this for free which would avoid the need for this patch?
>>
>> Asssuming that HAVE_VIRT_CPU_ACCOUNTING_GEN behaves similar to 
>> HAVE_VIRT_CPU_ACCOUNTING on s390/power in respect to not rely on ticks
>> - yes it might work out. Can you give it a try?
>>
> Adding HAVE_VIRT_CPU_ACCOUNTING_GEN to arch/arm64/Kconfig works, but has
> no effect unless you also enable CONFIG_NO_HZ_FULL, so that hardly feels
> like a fix since it would be a shame to force users to use this config
> option to report CPU usage correctly.
> 
> I'm not entirely sure what the history and meaning behind these configs
> are, so maybe there is an entirely different rework needed here.  It
> seems logical that you could simply sample the counter at entry/exit of
> the guest, but if there is nowhere to store this data without
> NO_HZ_FULL+VIRT_CPU_ACCOUNTING_GEN then I guess that would be why?

Given Paolos response that irq_disable/enable is faster than save/restore
at least on x86 your v2 patch might actually be the right thing to do.

Christian


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time
@ 2015-06-01 13:37           ` Christian Borntraeger
  0 siblings, 0 replies; 42+ messages in thread
From: Christian Borntraeger @ 2015-06-01 13:37 UTC (permalink / raw)
  To: linux-arm-kernel

Am 01.06.2015 um 15:35 schrieb Christoffer Dall:
> On Mon, Jun 01, 2015 at 11:21:19AM +0200, Christian Borntraeger wrote:
>> Am 01.06.2015 um 11:08 schrieb Christoffer Dall:
>>
>>>>>
>>>>> Second, looking at the ppc and mips code, they seem to also call
>>>>> kvm_guest_exit() before enabling interrupts, so I don't understand how
>>>>> guest CPU time accounting works on those architectures.
>>>>
>>>> Not an expert here, but I assume mips has the same logic as arm so if your
>>>> patch is right for arm its probably also for mips.
>>>>
>>>> powerpc looks similar to what s390 does (not using the tick, instead it uses
>>>> a hw-timer) so this should be fine.
>>>>
>>> I wonder if we can simply enable HAVE_VIRT_CPU_ACCOUNTING_GEN and get
>>> this for free which would avoid the need for this patch?
>>
>> Asssuming that HAVE_VIRT_CPU_ACCOUNTING_GEN behaves similar to 
>> HAVE_VIRT_CPU_ACCOUNTING on s390/power in respect to not rely on ticks
>> - yes it might work out. Can you give it a try?
>>
> Adding HAVE_VIRT_CPU_ACCOUNTING_GEN to arch/arm64/Kconfig works, but has
> no effect unless you also enable CONFIG_NO_HZ_FULL, so that hardly feels
> like a fix since it would be a shame to force users to use this config
> option to report CPU usage correctly.
> 
> I'm not entirely sure what the history and meaning behind these configs
> are, so maybe there is an entirely different rework needed here.  It
> seems logical that you could simply sample the counter at entry/exit of
> the guest, but if there is nowhere to store this data without
> NO_HZ_FULL+VIRT_CPU_ACCOUNTING_GEN then I guess that would be why?

Given Paolos response that irq_disable/enable is faster than save/restore
at least on x86 your v2 patch might actually be the right thing to do.

Christian

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time
  2015-05-31  6:59     ` Christoffer Dall
@ 2015-06-01 15:48       ` Mario Smarduch
  -1 siblings, 0 replies; 42+ messages in thread
From: Mario Smarduch @ 2015-06-01 15:48 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvm, Marc Zyngier, borntraeger, Paolo Bonzini, kvmarm, linux-arm-kernel

On 05/30/2015 11:59 PM, Christoffer Dall wrote:
> Hi Mario,
> 
> On Fri, May 29, 2015 at 03:34:47PM -0700, Mario Smarduch wrote:
>> On 05/28/2015 11:49 AM, Christoffer Dall wrote:
>>> Until now we have been calling kvm_guest_exit after re-enabling
>>> interrupts when we come back from the guest, but this has the
>>> unfortunate effect that CPU time accounting done in the context of timer
>>> interrupts occurring while the guest is running doesn't properly notice
>>> that the time since the last tick was spent in the guest.
>>>
>>> Inspired by the comment in the x86 code, move the kvm_guest_exit() call
>>> below the local_irq_enable() call and change __kvm_guest_exit() to
>>> kvm_guest_exit(), because we are now calling this function with
>>> interrupts enabled.  We have to now explicitly disable preemption and
>>> not enable preemption before we've called kvm_guest_exit(), since
>>> otherwise we could be preempted and everything happening before we
>>> eventually get scheduled again would be accounted for as guest time.
>>>
>>> At the same time, move the trace_kvm_exit() call outside of the atomic
>>> section, since there is no reason for us to do that with interrupts
>>> disabled.
>>>
>>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
>>> ---
>>> This patch is based on kvm/queue, because it has the kvm_guest_enter/exit
>>> rework recently posted by Christian Borntraeger.  I hope I got the logic
>>> of this right, there were 2 slightly worrying facts about this:
>>>
>>> First, we now enable and disable and enable interrupts on each exit
>>> path, but I couldn't see any performance overhead on hackbench - yes the
>>> only benchmark we care about.
>>>
>>> Second, looking at the ppc and mips code, they seem to also call
>>> kvm_guest_exit() before enabling interrupts, so I don't understand how
>>> guest CPU time accounting works on those architectures.
>>>
>>> Changes since v1:
>>>  - Tweak comment and commit text based on Marc's feedback.
>>>  - Explicitly disable preemption and enable it only after kvm_guest_exit().
>>>
>>>  arch/arm/kvm/arm.c | 21 +++++++++++++++++----
>>>  1 file changed, 17 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>>> index e41cb11..fe8028d 100644
>>> --- a/arch/arm/kvm/arm.c
>>> +++ b/arch/arm/kvm/arm.c
>>> @@ -532,6 +532,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>>  		kvm_vgic_flush_hwstate(vcpu);
>>>  		kvm_timer_flush_hwstate(vcpu);
>>>  
>>> +		preempt_disable();
>>>  		local_irq_disable();
>>>  
>>>  		/*
>>> @@ -544,6 +545,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>>  
>>>  		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
>>>  			local_irq_enable();
>>> +			preempt_enable();
>>>  			kvm_timer_sync_hwstate(vcpu);
>>>  			kvm_vgic_sync_hwstate(vcpu);
>>>  			continue;
>>> @@ -559,8 +561,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>>  		ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
>>>  
>>>  		vcpu->mode = OUTSIDE_GUEST_MODE;
>>> -		__kvm_guest_exit();
>>> -		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
>>> +		/*
>>> +		 * Back from guest
>>> +		 *************************************************************/
>>> +
>>>  		/*
>>>  		 * We may have taken a host interrupt in HYP mode (ie
>>>  		 * while executing the guest). This interrupt is still
>>> @@ -574,8 +578,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>>  		local_irq_enable();
>>>  
>>>  		/*
>>> -		 * Back from guest
>>> -		 *************************************************************/
>>> +		 * We do local_irq_enable() before calling kvm_guest_exit() so
>>> +		 * that if a timer interrupt hits while running the guest we
>>> +		 * account that tick as being spent in the guest.  We enable
>>> +		 * preemption after calling kvm_guest_exit() so that if we get
>>> +		 * preempted we make sure ticks after that is not counted as
>>> +		 * guest time.
>>> +		 */
>>> +		kvm_guest_exit();
>>> +		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
>>> +		preempt_enable();
>>> +
>>>  
>>>  		kvm_timer_sync_hwstate(vcpu);
>>>  		kvm_vgic_sync_hwstate(vcpu);
>>>
>>
>> Hi Christoffer,
>>  so currently we take a snap shot when we enter the guest
>> (tsk->vtime_snap) and upon exit add the time we spent in
>> the guest and update accrued time, which appears correct.
> 
> not on arm64, because we don't select HAVE_VIRT_CPU_ACCOUNTING_GEN.  Or
> am I missing something obvious here?
I see what you mean we can't use cycle based accounting to accrue
Guest time.

> 
>>
>> With this patch it appears that interrupts running
>> in host mode are accrued to Guest time, and additional preemption
>> latency is added.
>>
> It is true that interrupt processing in host mode (if they hit on a CPU
> when it is running a guest) are accrued to guest time, but without this
> patch on current arm64 we accrue no CPU time to guest time at all, which
> is hardly more correct.
Yes if only ticks are supported then it's definitely better!

Nevertheless with high interrupt rate it will complicate root causing
issues, a lot of that time will go to guest.

> 
> If this patch is incorrect, then how does it work on x86, where
> handle_external_intr() is called (with a barrier in between) before
> kvm_guest_exit(), and where handle_external_intr() is simply
> local_irq_enable() on SVM and something more complicated on VMX ?
> 
> Finally, how exactly is preemption latency added here?   Won't IRQ
> processing run with higher priority than any task on your system, so the
> order of (1) process pending IRQs (2) call schedule if needed is still
> preserved here, but we call kvm_guest_exit() between (1) and (2) instead
> of before (1).

I may be missing something, but on return from interrupt with preempt
disabled we can't take the need resched path. And need to return
to KVM no?
> 
> It is entirely possible that I'm missing the mark here and everything
> gets solved by enabling HAVE_VIRT_CPU_ACCOUNTING_GEN or we need some
> extra logic?

I think something to look into for us, taking a low issue to high level
application - for state machine based type of applications (I guess like
NFV) it cause problems  to root cause issues, a lot of activities
run between ticks. For VM cycle based accounting is probably a must
in that case.

> 
> Thanks,
> -Christoffer
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time
@ 2015-06-01 15:48       ` Mario Smarduch
  0 siblings, 0 replies; 42+ messages in thread
From: Mario Smarduch @ 2015-06-01 15:48 UTC (permalink / raw)
  To: linux-arm-kernel

On 05/30/2015 11:59 PM, Christoffer Dall wrote:
> Hi Mario,
> 
> On Fri, May 29, 2015 at 03:34:47PM -0700, Mario Smarduch wrote:
>> On 05/28/2015 11:49 AM, Christoffer Dall wrote:
>>> Until now we have been calling kvm_guest_exit after re-enabling
>>> interrupts when we come back from the guest, but this has the
>>> unfortunate effect that CPU time accounting done in the context of timer
>>> interrupts occurring while the guest is running doesn't properly notice
>>> that the time since the last tick was spent in the guest.
>>>
>>> Inspired by the comment in the x86 code, move the kvm_guest_exit() call
>>> below the local_irq_enable() call and change __kvm_guest_exit() to
>>> kvm_guest_exit(), because we are now calling this function with
>>> interrupts enabled.  We have to now explicitly disable preemption and
>>> not enable preemption before we've called kvm_guest_exit(), since
>>> otherwise we could be preempted and everything happening before we
>>> eventually get scheduled again would be accounted for as guest time.
>>>
>>> At the same time, move the trace_kvm_exit() call outside of the atomic
>>> section, since there is no reason for us to do that with interrupts
>>> disabled.
>>>
>>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
>>> ---
>>> This patch is based on kvm/queue, because it has the kvm_guest_enter/exit
>>> rework recently posted by Christian Borntraeger.  I hope I got the logic
>>> of this right, there were 2 slightly worrying facts about this:
>>>
>>> First, we now enable and disable and enable interrupts on each exit
>>> path, but I couldn't see any performance overhead on hackbench - yes the
>>> only benchmark we care about.
>>>
>>> Second, looking at the ppc and mips code, they seem to also call
>>> kvm_guest_exit() before enabling interrupts, so I don't understand how
>>> guest CPU time accounting works on those architectures.
>>>
>>> Changes since v1:
>>>  - Tweak comment and commit text based on Marc's feedback.
>>>  - Explicitly disable preemption and enable it only after kvm_guest_exit().
>>>
>>>  arch/arm/kvm/arm.c | 21 +++++++++++++++++----
>>>  1 file changed, 17 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>>> index e41cb11..fe8028d 100644
>>> --- a/arch/arm/kvm/arm.c
>>> +++ b/arch/arm/kvm/arm.c
>>> @@ -532,6 +532,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>>  		kvm_vgic_flush_hwstate(vcpu);
>>>  		kvm_timer_flush_hwstate(vcpu);
>>>  
>>> +		preempt_disable();
>>>  		local_irq_disable();
>>>  
>>>  		/*
>>> @@ -544,6 +545,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>>  
>>>  		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
>>>  			local_irq_enable();
>>> +			preempt_enable();
>>>  			kvm_timer_sync_hwstate(vcpu);
>>>  			kvm_vgic_sync_hwstate(vcpu);
>>>  			continue;
>>> @@ -559,8 +561,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>>  		ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
>>>  
>>>  		vcpu->mode = OUTSIDE_GUEST_MODE;
>>> -		__kvm_guest_exit();
>>> -		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
>>> +		/*
>>> +		 * Back from guest
>>> +		 *************************************************************/
>>> +
>>>  		/*
>>>  		 * We may have taken a host interrupt in HYP mode (ie
>>>  		 * while executing the guest). This interrupt is still
>>> @@ -574,8 +578,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>>  		local_irq_enable();
>>>  
>>>  		/*
>>> -		 * Back from guest
>>> -		 *************************************************************/
>>> +		 * We do local_irq_enable() before calling kvm_guest_exit() so
>>> +		 * that if a timer interrupt hits while running the guest we
>>> +		 * account that tick as being spent in the guest.  We enable
>>> +		 * preemption after calling kvm_guest_exit() so that if we get
>>> +		 * preempted we make sure ticks after that is not counted as
>>> +		 * guest time.
>>> +		 */
>>> +		kvm_guest_exit();
>>> +		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
>>> +		preempt_enable();
>>> +
>>>  
>>>  		kvm_timer_sync_hwstate(vcpu);
>>>  		kvm_vgic_sync_hwstate(vcpu);
>>>
>>
>> Hi Christoffer,
>>  so currently we take a snap shot when we enter the guest
>> (tsk->vtime_snap) and upon exit add the time we spent in
>> the guest and update accrued time, which appears correct.
> 
> not on arm64, because we don't select HAVE_VIRT_CPU_ACCOUNTING_GEN.  Or
> am I missing something obvious here?
I see what you mean we can't use cycle based accounting to accrue
Guest time.

> 
>>
>> With this patch it appears that interrupts running
>> in host mode are accrued to Guest time, and additional preemption
>> latency is added.
>>
> It is true that interrupt processing in host mode (if they hit on a CPU
> when it is running a guest) are accrued to guest time, but without this
> patch on current arm64 we accrue no CPU time to guest time at all, which
> is hardly more correct.
Yes if only ticks are supported then it's definitely better!

Nevertheless with high interrupt rate it will complicate root causing
issues, a lot of that time will go to guest.

> 
> If this patch is incorrect, then how does it work on x86, where
> handle_external_intr() is called (with a barrier in between) before
> kvm_guest_exit(), and where handle_external_intr() is simply
> local_irq_enable() on SVM and something more complicated on VMX ?
> 
> Finally, how exactly is preemption latency added here?   Won't IRQ
> processing run with higher priority than any task on your system, so the
> order of (1) process pending IRQs (2) call schedule if needed is still
> preserved here, but we call kvm_guest_exit() between (1) and (2) instead
> of before (1).

I may be missing something, but on return from interrupt with preempt
disabled we can't take the need resched path. And need to return
to KVM no?
> 
> It is entirely possible that I'm missing the mark here and everything
> gets solved by enabling HAVE_VIRT_CPU_ACCOUNTING_GEN or we need some
> extra logic?

I think something to look into for us, taking a low issue to high level
application - for state machine based type of applications (I guess like
NFV) it cause problems  to root cause issues, a lot of activities
run between ticks. For VM cycle based accounting is probably a must
in that case.

> 
> Thanks,
> -Christoffer
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time
  2015-06-01 15:48       ` Mario Smarduch
@ 2015-06-02  9:27         ` Christoffer Dall
  -1 siblings, 0 replies; 42+ messages in thread
From: Christoffer Dall @ 2015-06-02  9:27 UTC (permalink / raw)
  To: Mario Smarduch
  Cc: kvm, Marc Zyngier, borntraeger, Paolo Bonzini, kvmarm, linux-arm-kernel

On Mon, Jun 01, 2015 at 08:48:22AM -0700, Mario Smarduch wrote:
> On 05/30/2015 11:59 PM, Christoffer Dall wrote:
> > Hi Mario,
> > 
> > On Fri, May 29, 2015 at 03:34:47PM -0700, Mario Smarduch wrote:
> >> On 05/28/2015 11:49 AM, Christoffer Dall wrote:
> >>> Until now we have been calling kvm_guest_exit after re-enabling
> >>> interrupts when we come back from the guest, but this has the
> >>> unfortunate effect that CPU time accounting done in the context of timer
> >>> interrupts occurring while the guest is running doesn't properly notice
> >>> that the time since the last tick was spent in the guest.
> >>>
> >>> Inspired by the comment in the x86 code, move the kvm_guest_exit() call
> >>> below the local_irq_enable() call and change __kvm_guest_exit() to
> >>> kvm_guest_exit(), because we are now calling this function with
> >>> interrupts enabled.  We have to now explicitly disable preemption and
> >>> not enable preemption before we've called kvm_guest_exit(), since
> >>> otherwise we could be preempted and everything happening before we
> >>> eventually get scheduled again would be accounted for as guest time.
> >>>
> >>> At the same time, move the trace_kvm_exit() call outside of the atomic
> >>> section, since there is no reason for us to do that with interrupts
> >>> disabled.
> >>>
> >>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> >>> ---
> >>> This patch is based on kvm/queue, because it has the kvm_guest_enter/exit
> >>> rework recently posted by Christian Borntraeger.  I hope I got the logic
> >>> of this right, there were 2 slightly worrying facts about this:
> >>>
> >>> First, we now enable and disable and enable interrupts on each exit
> >>> path, but I couldn't see any performance overhead on hackbench - yes the
> >>> only benchmark we care about.
> >>>
> >>> Second, looking at the ppc and mips code, they seem to also call
> >>> kvm_guest_exit() before enabling interrupts, so I don't understand how
> >>> guest CPU time accounting works on those architectures.
> >>>
> >>> Changes since v1:
> >>>  - Tweak comment and commit text based on Marc's feedback.
> >>>  - Explicitly disable preemption and enable it only after kvm_guest_exit().
> >>>
> >>>  arch/arm/kvm/arm.c | 21 +++++++++++++++++----
> >>>  1 file changed, 17 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> >>> index e41cb11..fe8028d 100644
> >>> --- a/arch/arm/kvm/arm.c
> >>> +++ b/arch/arm/kvm/arm.c
> >>> @@ -532,6 +532,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >>>  		kvm_vgic_flush_hwstate(vcpu);
> >>>  		kvm_timer_flush_hwstate(vcpu);
> >>>  
> >>> +		preempt_disable();
> >>>  		local_irq_disable();
> >>>  
> >>>  		/*
> >>> @@ -544,6 +545,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >>>  
> >>>  		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
> >>>  			local_irq_enable();
> >>> +			preempt_enable();
> >>>  			kvm_timer_sync_hwstate(vcpu);
> >>>  			kvm_vgic_sync_hwstate(vcpu);
> >>>  			continue;
> >>> @@ -559,8 +561,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >>>  		ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
> >>>  
> >>>  		vcpu->mode = OUTSIDE_GUEST_MODE;
> >>> -		__kvm_guest_exit();
> >>> -		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
> >>> +		/*
> >>> +		 * Back from guest
> >>> +		 *************************************************************/
> >>> +
> >>>  		/*
> >>>  		 * We may have taken a host interrupt in HYP mode (ie
> >>>  		 * while executing the guest). This interrupt is still
> >>> @@ -574,8 +578,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >>>  		local_irq_enable();
> >>>  
> >>>  		/*
> >>> -		 * Back from guest
> >>> -		 *************************************************************/
> >>> +		 * We do local_irq_enable() before calling kvm_guest_exit() so
> >>> +		 * that if a timer interrupt hits while running the guest we
> >>> +		 * account that tick as being spent in the guest.  We enable
> >>> +		 * preemption after calling kvm_guest_exit() so that if we get
> >>> +		 * preempted we make sure ticks after that is not counted as
> >>> +		 * guest time.
> >>> +		 */
> >>> +		kvm_guest_exit();
> >>> +		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
> >>> +		preempt_enable();
> >>> +
> >>>  
> >>>  		kvm_timer_sync_hwstate(vcpu);
> >>>  		kvm_vgic_sync_hwstate(vcpu);
> >>>
> >>
> >> Hi Christoffer,
> >>  so currently we take a snap shot when we enter the guest
> >> (tsk->vtime_snap) and upon exit add the time we spent in
> >> the guest and update accrued time, which appears correct.
> > 
> > not on arm64, because we don't select HAVE_VIRT_CPU_ACCOUNTING_GEN.  Or
> > am I missing something obvious here?
> I see what you mean we can't use cycle based accounting to accrue
> Guest time.
> 

See other thread, we can enable this in the config but it still only
works with NO_HZ_FULL.

> > 
> >>
> >> With this patch it appears that interrupts running
> >> in host mode are accrued to Guest time, and additional preemption
> >> latency is added.
> >>
> > It is true that interrupt processing in host mode (if they hit on a CPU
> > when it is running a guest) are accrued to guest time, but without this
> > patch on current arm64 we accrue no CPU time to guest time at all, which
> > is hardly more correct.
> Yes if only ticks are supported then it's definitely better!
> 
> Nevertheless with high interrupt rate it will complicate root causing
> issues, a lot of that time will go to guest.

That sounds like a separate fix to me; if there's an existing mechanism
to account for time spent on hw IRQs and it is somehow invalidated by
the PF_VCPU flag being set, then that feels wrong, at least how arm64
works, but it doesn't make this patch less correct.

> 
> > 
> > If this patch is incorrect, then how does it work on x86, where
> > handle_external_intr() is called (with a barrier in between) before
> > kvm_guest_exit(), and where handle_external_intr() is simply
> > local_irq_enable() on SVM and something more complicated on VMX ?
> > 
> > Finally, how exactly is preemption latency added here?   Won't IRQ
> > processing run with higher priority than any task on your system, so the
> > order of (1) process pending IRQs (2) call schedule if needed is still
> > preserved here, but we call kvm_guest_exit() between (1) and (2) instead
> > of before (1).
> 
> I may be missing something, but on return from interrupt with preempt
> disabled we can't take the need resched path. And need to return
> to KVM no?

preempt_enable() will call __preempt_schedule() and cause preemption
there, so you're talking about adding these lines of latency:

	kvm_guest_exit();
	trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));

And these were called with interrupts disabled before, so I don't see
the issue??

However, your question is making me think whether we have a race in the
current code on fully preemptible kernels, if we get preempted before
calling kvm_timer_sync_hwstate() and kvm_vgic_sync_hwstate(), then we
could potentially schedule another vcpu on this core and loose/corrupt
state, can we not?  We probably need to check for this in
kvm_vcpu_load/kvm_vcpu_put.  I need to think more about if this is a
real issue or if I'm seeing ghosts.

> > 
> > It is entirely possible that I'm missing the mark here and everything
> > gets solved by enabling HAVE_VIRT_CPU_ACCOUNTING_GEN or we need some
> > extra logic?
> 
> I think something to look into for us, taking a low issue to high level
> application - for state machine based type of applications (I guess like
> NFV) it cause problems  to root cause issues, a lot of activities
> run between ticks. For VM cycle based accounting is probably a must
> in that case.
> 
Would you run with NO_HZ_FULL in this case?  Because then we should just
enable HAVE_VIRT_CPU_ACCOUNTING_GEN, and I think that would be a good
start.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time
@ 2015-06-02  9:27         ` Christoffer Dall
  0 siblings, 0 replies; 42+ messages in thread
From: Christoffer Dall @ 2015-06-02  9:27 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jun 01, 2015 at 08:48:22AM -0700, Mario Smarduch wrote:
> On 05/30/2015 11:59 PM, Christoffer Dall wrote:
> > Hi Mario,
> > 
> > On Fri, May 29, 2015 at 03:34:47PM -0700, Mario Smarduch wrote:
> >> On 05/28/2015 11:49 AM, Christoffer Dall wrote:
> >>> Until now we have been calling kvm_guest_exit after re-enabling
> >>> interrupts when we come back from the guest, but this has the
> >>> unfortunate effect that CPU time accounting done in the context of timer
> >>> interrupts occurring while the guest is running doesn't properly notice
> >>> that the time since the last tick was spent in the guest.
> >>>
> >>> Inspired by the comment in the x86 code, move the kvm_guest_exit() call
> >>> below the local_irq_enable() call and change __kvm_guest_exit() to
> >>> kvm_guest_exit(), because we are now calling this function with
> >>> interrupts enabled.  We have to now explicitly disable preemption and
> >>> not enable preemption before we've called kvm_guest_exit(), since
> >>> otherwise we could be preempted and everything happening before we
> >>> eventually get scheduled again would be accounted for as guest time.
> >>>
> >>> At the same time, move the trace_kvm_exit() call outside of the atomic
> >>> section, since there is no reason for us to do that with interrupts
> >>> disabled.
> >>>
> >>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> >>> ---
> >>> This patch is based on kvm/queue, because it has the kvm_guest_enter/exit
> >>> rework recently posted by Christian Borntraeger.  I hope I got the logic
> >>> of this right, there were 2 slightly worrying facts about this:
> >>>
> >>> First, we now enable and disable and enable interrupts on each exit
> >>> path, but I couldn't see any performance overhead on hackbench - yes the
> >>> only benchmark we care about.
> >>>
> >>> Second, looking at the ppc and mips code, they seem to also call
> >>> kvm_guest_exit() before enabling interrupts, so I don't understand how
> >>> guest CPU time accounting works on those architectures.
> >>>
> >>> Changes since v1:
> >>>  - Tweak comment and commit text based on Marc's feedback.
> >>>  - Explicitly disable preemption and enable it only after kvm_guest_exit().
> >>>
> >>>  arch/arm/kvm/arm.c | 21 +++++++++++++++++----
> >>>  1 file changed, 17 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> >>> index e41cb11..fe8028d 100644
> >>> --- a/arch/arm/kvm/arm.c
> >>> +++ b/arch/arm/kvm/arm.c
> >>> @@ -532,6 +532,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >>>  		kvm_vgic_flush_hwstate(vcpu);
> >>>  		kvm_timer_flush_hwstate(vcpu);
> >>>  
> >>> +		preempt_disable();
> >>>  		local_irq_disable();
> >>>  
> >>>  		/*
> >>> @@ -544,6 +545,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >>>  
> >>>  		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
> >>>  			local_irq_enable();
> >>> +			preempt_enable();
> >>>  			kvm_timer_sync_hwstate(vcpu);
> >>>  			kvm_vgic_sync_hwstate(vcpu);
> >>>  			continue;
> >>> @@ -559,8 +561,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >>>  		ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
> >>>  
> >>>  		vcpu->mode = OUTSIDE_GUEST_MODE;
> >>> -		__kvm_guest_exit();
> >>> -		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
> >>> +		/*
> >>> +		 * Back from guest
> >>> +		 *************************************************************/
> >>> +
> >>>  		/*
> >>>  		 * We may have taken a host interrupt in HYP mode (ie
> >>>  		 * while executing the guest). This interrupt is still
> >>> @@ -574,8 +578,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >>>  		local_irq_enable();
> >>>  
> >>>  		/*
> >>> -		 * Back from guest
> >>> -		 *************************************************************/
> >>> +		 * We do local_irq_enable() before calling kvm_guest_exit() so
> >>> +		 * that if a timer interrupt hits while running the guest we
> >>> +		 * account that tick as being spent in the guest.  We enable
> >>> +		 * preemption after calling kvm_guest_exit() so that if we get
> >>> +		 * preempted we make sure ticks after that is not counted as
> >>> +		 * guest time.
> >>> +		 */
> >>> +		kvm_guest_exit();
> >>> +		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
> >>> +		preempt_enable();
> >>> +
> >>>  
> >>>  		kvm_timer_sync_hwstate(vcpu);
> >>>  		kvm_vgic_sync_hwstate(vcpu);
> >>>
> >>
> >> Hi Christoffer,
> >>  so currently we take a snap shot when we enter the guest
> >> (tsk->vtime_snap) and upon exit add the time we spent in
> >> the guest and update accrued time, which appears correct.
> > 
> > not on arm64, because we don't select HAVE_VIRT_CPU_ACCOUNTING_GEN.  Or
> > am I missing something obvious here?
> I see what you mean we can't use cycle based accounting to accrue
> Guest time.
> 

See other thread, we can enable this in the config but it still only
works with NO_HZ_FULL.

> > 
> >>
> >> With this patch it appears that interrupts running
> >> in host mode are accrued to Guest time, and additional preemption
> >> latency is added.
> >>
> > It is true that interrupt processing in host mode (if they hit on a CPU
> > when it is running a guest) are accrued to guest time, but without this
> > patch on current arm64 we accrue no CPU time to guest time at all, which
> > is hardly more correct.
> Yes if only ticks are supported then it's definitely better!
> 
> Nevertheless with high interrupt rate it will complicate root causing
> issues, a lot of that time will go to guest.

That sounds like a separate fix to me; if there's an existing mechanism
to account for time spent on hw IRQs and it is somehow invalidated by
the PF_VCPU flag being set, then that feels wrong, at least how arm64
works, but it doesn't make this patch less correct.

> 
> > 
> > If this patch is incorrect, then how does it work on x86, where
> > handle_external_intr() is called (with a barrier in between) before
> > kvm_guest_exit(), and where handle_external_intr() is simply
> > local_irq_enable() on SVM and something more complicated on VMX ?
> > 
> > Finally, how exactly is preemption latency added here?   Won't IRQ
> > processing run with higher priority than any task on your system, so the
> > order of (1) process pending IRQs (2) call schedule if needed is still
> > preserved here, but we call kvm_guest_exit() between (1) and (2) instead
> > of before (1).
> 
> I may be missing something, but on return from interrupt with preempt
> disabled we can't take the need resched path. And need to return
> to KVM no?

preempt_enable() will call __preempt_schedule() and cause preemption
there, so you're talking about adding these lines of latency:

	kvm_guest_exit();
	trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));

And these were called with interrupts disabled before, so I don't see
the issue??

However, your question is making me think whether we have a race in the
current code on fully preemptible kernels, if we get preempted before
calling kvm_timer_sync_hwstate() and kvm_vgic_sync_hwstate(), then we
could potentially schedule another vcpu on this core and loose/corrupt
state, can we not?  We probably need to check for this in
kvm_vcpu_load/kvm_vcpu_put.  I need to think more about if this is a
real issue or if I'm seeing ghosts.

> > 
> > It is entirely possible that I'm missing the mark here and everything
> > gets solved by enabling HAVE_VIRT_CPU_ACCOUNTING_GEN or we need some
> > extra logic?
> 
> I think something to look into for us, taking a low issue to high level
> application - for state machine based type of applications (I guess like
> NFV) it cause problems  to root cause issues, a lot of activities
> run between ticks. For VM cycle based accounting is probably a must
> in that case.
> 
Would you run with NO_HZ_FULL in this case?  Because then we should just
enable HAVE_VIRT_CPU_ACCOUNTING_GEN, and I think that would be a good
start.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time
  2015-06-01 13:37           ` Christian Borntraeger
@ 2015-06-02  9:28             ` Christoffer Dall
  -1 siblings, 0 replies; 42+ messages in thread
From: Christoffer Dall @ 2015-06-02  9:28 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: linux-arm-kernel, Marc Zyngier, kvmarm, kvm, Paolo Bonzini

On Mon, Jun 01, 2015 at 03:37:32PM +0200, Christian Borntraeger wrote:
> Am 01.06.2015 um 15:35 schrieb Christoffer Dall:
> > On Mon, Jun 01, 2015 at 11:21:19AM +0200, Christian Borntraeger wrote:
> >> Am 01.06.2015 um 11:08 schrieb Christoffer Dall:
> >>
> >>>>>
> >>>>> Second, looking at the ppc and mips code, they seem to also call
> >>>>> kvm_guest_exit() before enabling interrupts, so I don't understand how
> >>>>> guest CPU time accounting works on those architectures.
> >>>>
> >>>> Not an expert here, but I assume mips has the same logic as arm so if your
> >>>> patch is right for arm its probably also for mips.
> >>>>
> >>>> powerpc looks similar to what s390 does (not using the tick, instead it uses
> >>>> a hw-timer) so this should be fine.
> >>>>
> >>> I wonder if we can simply enable HAVE_VIRT_CPU_ACCOUNTING_GEN and get
> >>> this for free which would avoid the need for this patch?
> >>
> >> Asssuming that HAVE_VIRT_CPU_ACCOUNTING_GEN behaves similar to 
> >> HAVE_VIRT_CPU_ACCOUNTING on s390/power in respect to not rely on ticks
> >> - yes it might work out. Can you give it a try?
> >>
> > Adding HAVE_VIRT_CPU_ACCOUNTING_GEN to arch/arm64/Kconfig works, but has
> > no effect unless you also enable CONFIG_NO_HZ_FULL, so that hardly feels
> > like a fix since it would be a shame to force users to use this config
> > option to report CPU usage correctly.
> > 
> > I'm not entirely sure what the history and meaning behind these configs
> > are, so maybe there is an entirely different rework needed here.  It
> > seems logical that you could simply sample the counter at entry/exit of
> > the guest, but if there is nowhere to store this data without
> > NO_HZ_FULL+VIRT_CPU_ACCOUNTING_GEN then I guess that would be why?
> 
> Given Paolos response that irq_disable/enable is faster than save/restore
> at least on x86 your v2 patch might actually be the right thing to do.
> 
Thanks, I think so too, but we should enable
HAVE_VIRT_CPU_ACCOUNTING_GEN for arm64 as well, assuming there are no
mysterious side affects of doing so.

-Christoffer

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time
@ 2015-06-02  9:28             ` Christoffer Dall
  0 siblings, 0 replies; 42+ messages in thread
From: Christoffer Dall @ 2015-06-02  9:28 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jun 01, 2015 at 03:37:32PM +0200, Christian Borntraeger wrote:
> Am 01.06.2015 um 15:35 schrieb Christoffer Dall:
> > On Mon, Jun 01, 2015 at 11:21:19AM +0200, Christian Borntraeger wrote:
> >> Am 01.06.2015 um 11:08 schrieb Christoffer Dall:
> >>
> >>>>>
> >>>>> Second, looking at the ppc and mips code, they seem to also call
> >>>>> kvm_guest_exit() before enabling interrupts, so I don't understand how
> >>>>> guest CPU time accounting works on those architectures.
> >>>>
> >>>> Not an expert here, but I assume mips has the same logic as arm so if your
> >>>> patch is right for arm its probably also for mips.
> >>>>
> >>>> powerpc looks similar to what s390 does (not using the tick, instead it uses
> >>>> a hw-timer) so this should be fine.
> >>>>
> >>> I wonder if we can simply enable HAVE_VIRT_CPU_ACCOUNTING_GEN and get
> >>> this for free which would avoid the need for this patch?
> >>
> >> Asssuming that HAVE_VIRT_CPU_ACCOUNTING_GEN behaves similar to 
> >> HAVE_VIRT_CPU_ACCOUNTING on s390/power in respect to not rely on ticks
> >> - yes it might work out. Can you give it a try?
> >>
> > Adding HAVE_VIRT_CPU_ACCOUNTING_GEN to arch/arm64/Kconfig works, but has
> > no effect unless you also enable CONFIG_NO_HZ_FULL, so that hardly feels
> > like a fix since it would be a shame to force users to use this config
> > option to report CPU usage correctly.
> > 
> > I'm not entirely sure what the history and meaning behind these configs
> > are, so maybe there is an entirely different rework needed here.  It
> > seems logical that you could simply sample the counter at entry/exit of
> > the guest, but if there is nowhere to store this data without
> > NO_HZ_FULL+VIRT_CPU_ACCOUNTING_GEN then I guess that would be why?
> 
> Given Paolos response that irq_disable/enable is faster than save/restore
> at least on x86 your v2 patch might actually be the right thing to do.
> 
Thanks, I think so too, but we should enable
HAVE_VIRT_CPU_ACCOUNTING_GEN for arm64 as well, assuming there are no
mysterious side affects of doing so.

-Christoffer

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time
  2015-06-02  9:27         ` Christoffer Dall
@ 2015-06-02 11:55           ` Christoffer Dall
  -1 siblings, 0 replies; 42+ messages in thread
From: Christoffer Dall @ 2015-06-02 11:55 UTC (permalink / raw)
  To: Mario Smarduch
  Cc: kvm, Marc Zyngier, borntraeger, Paolo Bonzini, kvmarm, linux-arm-kernel

[replying to myself]

On Tue, Jun 02, 2015 at 11:27:59AM +0200, Christoffer Dall wrote:

[..]

> > > 
> > > If this patch is incorrect, then how does it work on x86, where
> > > handle_external_intr() is called (with a barrier in between) before
> > > kvm_guest_exit(), and where handle_external_intr() is simply
> > > local_irq_enable() on SVM and something more complicated on VMX ?
> > > 
> > > Finally, how exactly is preemption latency added here?   Won't IRQ
> > > processing run with higher priority than any task on your system, so the
> > > order of (1) process pending IRQs (2) call schedule if needed is still
> > > preserved here, but we call kvm_guest_exit() between (1) and (2) instead
> > > of before (1).
> > 
> > I may be missing something, but on return from interrupt with preempt
> > disabled we can't take the need resched path. And need to return
> > to KVM no?
> 
> preempt_enable() will call __preempt_schedule() and cause preemption
> there, so you're talking about adding these lines of latency:
> 
> 	kvm_guest_exit();
> 	trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
> 
> And these were called with interrupts disabled before, so I don't see
> the issue??
> 
> However, your question is making me think whether we have a race in the
> current code on fully preemptible kernels, if we get preempted before
> calling kvm_timer_sync_hwstate() and kvm_vgic_sync_hwstate(), then we
> could potentially schedule another vcpu on this core and loose/corrupt
> state, can we not?  We probably need to check for this in
> kvm_vcpu_load/kvm_vcpu_put.  I need to think more about if this is a
> real issue or if I'm seeing ghosts.
> 
I've thought about it and I don't think there's a race because those
functions don't access the hardware directly, but only manipulate
per-vcpu data structures.

-Christoffer

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time
@ 2015-06-02 11:55           ` Christoffer Dall
  0 siblings, 0 replies; 42+ messages in thread
From: Christoffer Dall @ 2015-06-02 11:55 UTC (permalink / raw)
  To: linux-arm-kernel

[replying to myself]

On Tue, Jun 02, 2015 at 11:27:59AM +0200, Christoffer Dall wrote:

[..]

> > > 
> > > If this patch is incorrect, then how does it work on x86, where
> > > handle_external_intr() is called (with a barrier in between) before
> > > kvm_guest_exit(), and where handle_external_intr() is simply
> > > local_irq_enable() on SVM and something more complicated on VMX ?
> > > 
> > > Finally, how exactly is preemption latency added here?   Won't IRQ
> > > processing run with higher priority than any task on your system, so the
> > > order of (1) process pending IRQs (2) call schedule if needed is still
> > > preserved here, but we call kvm_guest_exit() between (1) and (2) instead
> > > of before (1).
> > 
> > I may be missing something, but on return from interrupt with preempt
> > disabled we can't take the need resched path. And need to return
> > to KVM no?
> 
> preempt_enable() will call __preempt_schedule() and cause preemption
> there, so you're talking about adding these lines of latency:
> 
> 	kvm_guest_exit();
> 	trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
> 
> And these were called with interrupts disabled before, so I don't see
> the issue??
> 
> However, your question is making me think whether we have a race in the
> current code on fully preemptible kernels, if we get preempted before
> calling kvm_timer_sync_hwstate() and kvm_vgic_sync_hwstate(), then we
> could potentially schedule another vcpu on this core and loose/corrupt
> state, can we not?  We probably need to check for this in
> kvm_vcpu_load/kvm_vcpu_put.  I need to think more about if this is a
> real issue or if I'm seeing ghosts.
> 
I've thought about it and I don't think there's a race because those
functions don't access the hardware directly, but only manipulate
per-vcpu data structures.

-Christoffer

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time
  2015-06-02  9:27         ` Christoffer Dall
@ 2015-06-05 12:24           ` Mario Smarduch
  -1 siblings, 0 replies; 42+ messages in thread
From: Mario Smarduch @ 2015-06-05 12:24 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvm, Marc Zyngier, borntraeger, Paolo Bonzini, kvmarm, linux-arm-kernel

On 06/02/2015 02:27 AM, Christoffer Dall wrote:
> On Mon, Jun 01, 2015 at 08:48:22AM -0700, Mario Smarduch wrote:
>> On 05/30/2015 11:59 PM, Christoffer Dall wrote:
>>> Hi Mario,
>>>
>>> On Fri, May 29, 2015 at 03:34:47PM -0700, Mario Smarduch wrote:
>>>> On 05/28/2015 11:49 AM, Christoffer Dall wrote:
>>>>> Until now we have been calling kvm_guest_exit after re-enabling
>>>>> interrupts when we come back from the guest, but this has the
>>>>> unfortunate effect that CPU time accounting done in the context of timer
>>>>> interrupts occurring while the guest is running doesn't properly notice
>>>>> that the time since the last tick was spent in the guest.
>>>>>
>>>>> Inspired by the comment in the x86 code, move the kvm_guest_exit() call
>>>>> below the local_irq_enable() call and change __kvm_guest_exit() to
>>>>> kvm_guest_exit(), because we are now calling this function with
>>>>> interrupts enabled.  We have to now explicitly disable preemption and
>>>>> not enable preemption before we've called kvm_guest_exit(), since
>>>>> otherwise we could be preempted and everything happening before we
>>>>> eventually get scheduled again would be accounted for as guest time.
>>>>>
>>>>> At the same time, move the trace_kvm_exit() call outside of the atomic
>>>>> section, since there is no reason for us to do that with interrupts
>>>>> disabled.
>>>>>
>>>>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
>>>>> ---
>>>>> This patch is based on kvm/queue, because it has the kvm_guest_enter/exit
>>>>> rework recently posted by Christian Borntraeger.  I hope I got the logic
>>>>> of this right, there were 2 slightly worrying facts about this:
>>>>>
>>>>> First, we now enable and disable and enable interrupts on each exit
>>>>> path, but I couldn't see any performance overhead on hackbench - yes the
>>>>> only benchmark we care about.
>>>>>
>>>>> Second, looking at the ppc and mips code, they seem to also call
>>>>> kvm_guest_exit() before enabling interrupts, so I don't understand how
>>>>> guest CPU time accounting works on those architectures.
>>>>>
>>>>> Changes since v1:
>>>>>  - Tweak comment and commit text based on Marc's feedback.
>>>>>  - Explicitly disable preemption and enable it only after kvm_guest_exit().
>>>>>
>>>>>  arch/arm/kvm/arm.c | 21 +++++++++++++++++----
>>>>>  1 file changed, 17 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>>>>> index e41cb11..fe8028d 100644
>>>>> --- a/arch/arm/kvm/arm.c
>>>>> +++ b/arch/arm/kvm/arm.c
>>>>> @@ -532,6 +532,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>>>>  		kvm_vgic_flush_hwstate(vcpu);
>>>>>  		kvm_timer_flush_hwstate(vcpu);
>>>>>  
>>>>> +		preempt_disable();
>>>>>  		local_irq_disable();
>>>>>  
>>>>>  		/*
>>>>> @@ -544,6 +545,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>>>>  
>>>>>  		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
>>>>>  			local_irq_enable();
>>>>> +			preempt_enable();
>>>>>  			kvm_timer_sync_hwstate(vcpu);
>>>>>  			kvm_vgic_sync_hwstate(vcpu);
>>>>>  			continue;
>>>>> @@ -559,8 +561,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>>>>  		ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
>>>>>  
>>>>>  		vcpu->mode = OUTSIDE_GUEST_MODE;
>>>>> -		__kvm_guest_exit();
>>>>> -		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
>>>>> +		/*
>>>>> +		 * Back from guest
>>>>> +		 *************************************************************/
>>>>> +
>>>>>  		/*
>>>>>  		 * We may have taken a host interrupt in HYP mode (ie
>>>>>  		 * while executing the guest). This interrupt is still
>>>>> @@ -574,8 +578,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>>>>  		local_irq_enable();
>>>>>  
>>>>>  		/*
>>>>> -		 * Back from guest
>>>>> -		 *************************************************************/
>>>>> +		 * We do local_irq_enable() before calling kvm_guest_exit() so
>>>>> +		 * that if a timer interrupt hits while running the guest we
>>>>> +		 * account that tick as being spent in the guest.  We enable
>>>>> +		 * preemption after calling kvm_guest_exit() so that if we get
>>>>> +		 * preempted we make sure ticks after that is not counted as
>>>>> +		 * guest time.
>>>>> +		 */
>>>>> +		kvm_guest_exit();
>>>>> +		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
>>>>> +		preempt_enable();
>>>>> +
>>>>>  
>>>>>  		kvm_timer_sync_hwstate(vcpu);
>>>>>  		kvm_vgic_sync_hwstate(vcpu);
>>>>>
>>>>
>>>> Hi Christoffer,
>>>>  so currently we take a snap shot when we enter the guest
>>>> (tsk->vtime_snap) and upon exit add the time we spent in
>>>> the guest and update accrued time, which appears correct.
>>>
>>> not on arm64, because we don't select HAVE_VIRT_CPU_ACCOUNTING_GEN.  Or
>>> am I missing something obvious here?
>> I see what you mean we can't use cycle based accounting to accrue
>> Guest time.
>>
> 
> See other thread, we can enable this in the config but it still only
> works with NO_HZ_FULL.
> 
>>>
>>>>
>>>> With this patch it appears that interrupts running
>>>> in host mode are accrued to Guest time, and additional preemption
>>>> latency is added.
>>>>
>>> It is true that interrupt processing in host mode (if they hit on a CPU
>>> when it is running a guest) are accrued to guest time, but without this
>>> patch on current arm64 we accrue no CPU time to guest time at all, which
>>> is hardly more correct.
>> Yes if only ticks are supported then it's definitely better!
>>
>> Nevertheless with high interrupt rate it will complicate root causing
>> issues, a lot of that time will go to guest.
> 
> That sounds like a separate fix to me; if there's an existing mechanism
> to account for time spent on hw IRQs and it is somehow invalidated by
> the PF_VCPU flag being set, then that feels wrong, at least how arm64
> works, but it doesn't make this patch less correct.

Tracing through the code (account_system_time()) it appears if the
timer fires while an IRQ runs, tick are accounted to host IRQ
mode (CPUTIME_IRQ). Otherwise it's accrued to guest. Under heavy
interrupt load it's likely guest will mis some ticks, it appears
it's the reverse of what I initially thought but in practice
guest time should be ok as far as ticks go.

> 
>>
>>>
>>> If this patch is incorrect, then how does it work on x86, where
>>> handle_external_intr() is called (with a barrier in between) before
>>> kvm_guest_exit(), and where handle_external_intr() is simply
>>> local_irq_enable() on SVM and something more complicated on VMX ?
>>>
>>> Finally, how exactly is preemption latency added here?   Won't IRQ
>>> processing run with higher priority than any task on your system, so the
>>> order of (1) process pending IRQs (2) call schedule if needed is still
>>> preserved here, but we call kvm_guest_exit() between (1) and (2) instead
>>> of before (1).
>>
>> I may be missing something, but on return from interrupt with preempt
>> disabled we can't take the need resched path. And need to return
>> to KVM no?
> 
> preempt_enable() will call __preempt_schedule() and cause preemption
> there, so you're talking about adding these lines of latency:t
> 
> 	kvm_guest_exit();
> 	trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));

On return from IRQ this should execute - and el1_preempt won't
get called.

#ifdef CONFIG_PREEMPT
        get_thread_info tsk
        ldr     w24, [tsk, #TI_PREEMPT]         // get preempt count
        cbnz    w24, 1f                         // preempt count != 0
        ldr     x0, [tsk, #TI_FLAGS]            // get flags
        tbz     x0, #TIF_NEED_RESCHED, 1f       // needs rescheduling?
        bl      el1_preempt
1:
#endif


> 
> And these were called with interrupts disabled before, so I don't see
> the issue??
> 
> However, your question is making me think whether we have a race in the
> current code on fully preemptible kernels, if we get preempted before
> calling kvm_timer_sync_hwstate() and kvm_vgic_sync_hwstate(), then we
> could potentially schedule another vcpu on this core and loose/corrupt
> state, can we not?  We probably need to check for this in
> kvm_vcpu_load/kvm_vcpu_put.  I need to think more about if this is a
> real issue or if I'm seeing ghosts.

Yes appears like it could be an issue in PREEMPT mode.
> 
>>>
>>> It is entirely possible that I'm missing the mark here and everything
>>> gets solved by enabling HAVE_VIRT_CPU_ACCOUNTING_GEN or we need some
>>> extra logic?
>>
>> I think something to look into for us, taking a low issue to high level
>> application - for state machine based type of applications (I guess like
>> NFV) it cause problems  to root cause issues, a lot of activities
>> run between ticks. For VM cycle based accounting is probably a must
>> in that case.
>>
> Would you run with NO_HZ_FULL in this case?  Because then we should just
> enable HAVE_VIRT_CPU_ACCOUNTING_GEN, and I think that would be a good
> start.
It may have a use case to run an isolated vCPU, but in general any mode
may be used (,NO_HZ, even low PERIODIC).

- Mario

> -Christoffer
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time
@ 2015-06-05 12:24           ` Mario Smarduch
  0 siblings, 0 replies; 42+ messages in thread
From: Mario Smarduch @ 2015-06-05 12:24 UTC (permalink / raw)
  To: linux-arm-kernel

On 06/02/2015 02:27 AM, Christoffer Dall wrote:
> On Mon, Jun 01, 2015 at 08:48:22AM -0700, Mario Smarduch wrote:
>> On 05/30/2015 11:59 PM, Christoffer Dall wrote:
>>> Hi Mario,
>>>
>>> On Fri, May 29, 2015 at 03:34:47PM -0700, Mario Smarduch wrote:
>>>> On 05/28/2015 11:49 AM, Christoffer Dall wrote:
>>>>> Until now we have been calling kvm_guest_exit after re-enabling
>>>>> interrupts when we come back from the guest, but this has the
>>>>> unfortunate effect that CPU time accounting done in the context of timer
>>>>> interrupts occurring while the guest is running doesn't properly notice
>>>>> that the time since the last tick was spent in the guest.
>>>>>
>>>>> Inspired by the comment in the x86 code, move the kvm_guest_exit() call
>>>>> below the local_irq_enable() call and change __kvm_guest_exit() to
>>>>> kvm_guest_exit(), because we are now calling this function with
>>>>> interrupts enabled.  We have to now explicitly disable preemption and
>>>>> not enable preemption before we've called kvm_guest_exit(), since
>>>>> otherwise we could be preempted and everything happening before we
>>>>> eventually get scheduled again would be accounted for as guest time.
>>>>>
>>>>> At the same time, move the trace_kvm_exit() call outside of the atomic
>>>>> section, since there is no reason for us to do that with interrupts
>>>>> disabled.
>>>>>
>>>>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
>>>>> ---
>>>>> This patch is based on kvm/queue, because it has the kvm_guest_enter/exit
>>>>> rework recently posted by Christian Borntraeger.  I hope I got the logic
>>>>> of this right, there were 2 slightly worrying facts about this:
>>>>>
>>>>> First, we now enable and disable and enable interrupts on each exit
>>>>> path, but I couldn't see any performance overhead on hackbench - yes the
>>>>> only benchmark we care about.
>>>>>
>>>>> Second, looking at the ppc and mips code, they seem to also call
>>>>> kvm_guest_exit() before enabling interrupts, so I don't understand how
>>>>> guest CPU time accounting works on those architectures.
>>>>>
>>>>> Changes since v1:
>>>>>  - Tweak comment and commit text based on Marc's feedback.
>>>>>  - Explicitly disable preemption and enable it only after kvm_guest_exit().
>>>>>
>>>>>  arch/arm/kvm/arm.c | 21 +++++++++++++++++----
>>>>>  1 file changed, 17 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>>>>> index e41cb11..fe8028d 100644
>>>>> --- a/arch/arm/kvm/arm.c
>>>>> +++ b/arch/arm/kvm/arm.c
>>>>> @@ -532,6 +532,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>>>>  		kvm_vgic_flush_hwstate(vcpu);
>>>>>  		kvm_timer_flush_hwstate(vcpu);
>>>>>  
>>>>> +		preempt_disable();
>>>>>  		local_irq_disable();
>>>>>  
>>>>>  		/*
>>>>> @@ -544,6 +545,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>>>>  
>>>>>  		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
>>>>>  			local_irq_enable();
>>>>> +			preempt_enable();
>>>>>  			kvm_timer_sync_hwstate(vcpu);
>>>>>  			kvm_vgic_sync_hwstate(vcpu);
>>>>>  			continue;
>>>>> @@ -559,8 +561,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>>>>  		ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
>>>>>  
>>>>>  		vcpu->mode = OUTSIDE_GUEST_MODE;
>>>>> -		__kvm_guest_exit();
>>>>> -		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
>>>>> +		/*
>>>>> +		 * Back from guest
>>>>> +		 *************************************************************/
>>>>> +
>>>>>  		/*
>>>>>  		 * We may have taken a host interrupt in HYP mode (ie
>>>>>  		 * while executing the guest). This interrupt is still
>>>>> @@ -574,8 +578,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>>>>  		local_irq_enable();
>>>>>  
>>>>>  		/*
>>>>> -		 * Back from guest
>>>>> -		 *************************************************************/
>>>>> +		 * We do local_irq_enable() before calling kvm_guest_exit() so
>>>>> +		 * that if a timer interrupt hits while running the guest we
>>>>> +		 * account that tick as being spent in the guest.  We enable
>>>>> +		 * preemption after calling kvm_guest_exit() so that if we get
>>>>> +		 * preempted we make sure ticks after that is not counted as
>>>>> +		 * guest time.
>>>>> +		 */
>>>>> +		kvm_guest_exit();
>>>>> +		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
>>>>> +		preempt_enable();
>>>>> +
>>>>>  
>>>>>  		kvm_timer_sync_hwstate(vcpu);
>>>>>  		kvm_vgic_sync_hwstate(vcpu);
>>>>>
>>>>
>>>> Hi Christoffer,
>>>>  so currently we take a snap shot when we enter the guest
>>>> (tsk->vtime_snap) and upon exit add the time we spent in
>>>> the guest and update accrued time, which appears correct.
>>>
>>> not on arm64, because we don't select HAVE_VIRT_CPU_ACCOUNTING_GEN.  Or
>>> am I missing something obvious here?
>> I see what you mean we can't use cycle based accounting to accrue
>> Guest time.
>>
> 
> See other thread, we can enable this in the config but it still only
> works with NO_HZ_FULL.
> 
>>>
>>>>
>>>> With this patch it appears that interrupts running
>>>> in host mode are accrued to Guest time, and additional preemption
>>>> latency is added.
>>>>
>>> It is true that interrupt processing in host mode (if they hit on a CPU
>>> when it is running a guest) are accrued to guest time, but without this
>>> patch on current arm64 we accrue no CPU time to guest time at all, which
>>> is hardly more correct.
>> Yes if only ticks are supported then it's definitely better!
>>
>> Nevertheless with high interrupt rate it will complicate root causing
>> issues, a lot of that time will go to guest.
> 
> That sounds like a separate fix to me; if there's an existing mechanism
> to account for time spent on hw IRQs and it is somehow invalidated by
> the PF_VCPU flag being set, then that feels wrong, at least how arm64
> works, but it doesn't make this patch less correct.

Tracing through the code (account_system_time()) it appears if the
timer fires while an IRQ runs, tick are accounted to host IRQ
mode (CPUTIME_IRQ). Otherwise it's accrued to guest. Under heavy
interrupt load it's likely guest will mis some ticks, it appears
it's the reverse of what I initially thought but in practice
guest time should be ok as far as ticks go.

> 
>>
>>>
>>> If this patch is incorrect, then how does it work on x86, where
>>> handle_external_intr() is called (with a barrier in between) before
>>> kvm_guest_exit(), and where handle_external_intr() is simply
>>> local_irq_enable() on SVM and something more complicated on VMX ?
>>>
>>> Finally, how exactly is preemption latency added here?   Won't IRQ
>>> processing run with higher priority than any task on your system, so the
>>> order of (1) process pending IRQs (2) call schedule if needed is still
>>> preserved here, but we call kvm_guest_exit() between (1) and (2) instead
>>> of before (1).
>>
>> I may be missing something, but on return from interrupt with preempt
>> disabled we can't take the need resched path. And need to return
>> to KVM no?
> 
> preempt_enable() will call __preempt_schedule() and cause preemption
> there, so you're talking about adding these lines of latency:t
> 
> 	kvm_guest_exit();
> 	trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));

On return from IRQ this should execute - and el1_preempt won't
get called.

#ifdef CONFIG_PREEMPT
        get_thread_info tsk
        ldr     w24, [tsk, #TI_PREEMPT]         // get preempt count
        cbnz    w24, 1f                         // preempt count != 0
        ldr     x0, [tsk, #TI_FLAGS]            // get flags
        tbz     x0, #TIF_NEED_RESCHED, 1f       // needs rescheduling?
        bl      el1_preempt
1:
#endif


> 
> And these were called with interrupts disabled before, so I don't see
> the issue??
> 
> However, your question is making me think whether we have a race in the
> current code on fully preemptible kernels, if we get preempted before
> calling kvm_timer_sync_hwstate() and kvm_vgic_sync_hwstate(), then we
> could potentially schedule another vcpu on this core and loose/corrupt
> state, can we not?  We probably need to check for this in
> kvm_vcpu_load/kvm_vcpu_put.  I need to think more about if this is a
> real issue or if I'm seeing ghosts.

Yes appears like it could be an issue in PREEMPT mode.
> 
>>>
>>> It is entirely possible that I'm missing the mark here and everything
>>> gets solved by enabling HAVE_VIRT_CPU_ACCOUNTING_GEN or we need some
>>> extra logic?
>>
>> I think something to look into for us, taking a low issue to high level
>> application - for state machine based type of applications (I guess like
>> NFV) it cause problems  to root cause issues, a lot of activities
>> run between ticks. For VM cycle based accounting is probably a must
>> in that case.
>>
> Would you run with NO_HZ_FULL in this case?  Because then we should just
> enable HAVE_VIRT_CPU_ACCOUNTING_GEN, and I think that would be a good
> start.
It may have a use case to run an isolated vCPU, but in general any mode
may be used (,NO_HZ, even low PERIODIC).

- Mario

> -Christoffer
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time
  2015-06-05 12:24           ` Mario Smarduch
@ 2015-06-08 11:35             ` Christoffer Dall
  -1 siblings, 0 replies; 42+ messages in thread
From: Christoffer Dall @ 2015-06-08 11:35 UTC (permalink / raw)
  To: Mario Smarduch
  Cc: kvm, Marc Zyngier, borntraeger, Paolo Bonzini, kvmarm, linux-arm-kernel

On Fri, Jun 05, 2015 at 05:24:07AM -0700, Mario Smarduch wrote:
> On 06/02/2015 02:27 AM, Christoffer Dall wrote:
> > On Mon, Jun 01, 2015 at 08:48:22AM -0700, Mario Smarduch wrote:
> >> On 05/30/2015 11:59 PM, Christoffer Dall wrote:
> >>> Hi Mario,
> >>>
> >>> On Fri, May 29, 2015 at 03:34:47PM -0700, Mario Smarduch wrote:
> >>>> On 05/28/2015 11:49 AM, Christoffer Dall wrote:
> >>>>> Until now we have been calling kvm_guest_exit after re-enabling
> >>>>> interrupts when we come back from the guest, but this has the
> >>>>> unfortunate effect that CPU time accounting done in the context of timer
> >>>>> interrupts occurring while the guest is running doesn't properly notice
> >>>>> that the time since the last tick was spent in the guest.
> >>>>>
> >>>>> Inspired by the comment in the x86 code, move the kvm_guest_exit() call
> >>>>> below the local_irq_enable() call and change __kvm_guest_exit() to
> >>>>> kvm_guest_exit(), because we are now calling this function with
> >>>>> interrupts enabled.  We have to now explicitly disable preemption and
> >>>>> not enable preemption before we've called kvm_guest_exit(), since
> >>>>> otherwise we could be preempted and everything happening before we
> >>>>> eventually get scheduled again would be accounted for as guest time.
> >>>>>
> >>>>> At the same time, move the trace_kvm_exit() call outside of the atomic
> >>>>> section, since there is no reason for us to do that with interrupts
> >>>>> disabled.
> >>>>>
> >>>>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> >>>>> ---
> >>>>> This patch is based on kvm/queue, because it has the kvm_guest_enter/exit
> >>>>> rework recently posted by Christian Borntraeger.  I hope I got the logic
> >>>>> of this right, there were 2 slightly worrying facts about this:
> >>>>>
> >>>>> First, we now enable and disable and enable interrupts on each exit
> >>>>> path, but I couldn't see any performance overhead on hackbench - yes the
> >>>>> only benchmark we care about.
> >>>>>
> >>>>> Second, looking at the ppc and mips code, they seem to also call
> >>>>> kvm_guest_exit() before enabling interrupts, so I don't understand how
> >>>>> guest CPU time accounting works on those architectures.
> >>>>>
> >>>>> Changes since v1:
> >>>>>  - Tweak comment and commit text based on Marc's feedback.
> >>>>>  - Explicitly disable preemption and enable it only after kvm_guest_exit().
> >>>>>
> >>>>>  arch/arm/kvm/arm.c | 21 +++++++++++++++++----
> >>>>>  1 file changed, 17 insertions(+), 4 deletions(-)
> >>>>>
> >>>>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> >>>>> index e41cb11..fe8028d 100644
> >>>>> --- a/arch/arm/kvm/arm.c
> >>>>> +++ b/arch/arm/kvm/arm.c
> >>>>> @@ -532,6 +532,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >>>>>  		kvm_vgic_flush_hwstate(vcpu);
> >>>>>  		kvm_timer_flush_hwstate(vcpu);
> >>>>>  
> >>>>> +		preempt_disable();
> >>>>>  		local_irq_disable();
> >>>>>  
> >>>>>  		/*
> >>>>> @@ -544,6 +545,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >>>>>  
> >>>>>  		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
> >>>>>  			local_irq_enable();
> >>>>> +			preempt_enable();
> >>>>>  			kvm_timer_sync_hwstate(vcpu);
> >>>>>  			kvm_vgic_sync_hwstate(vcpu);
> >>>>>  			continue;
> >>>>> @@ -559,8 +561,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >>>>>  		ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
> >>>>>  
> >>>>>  		vcpu->mode = OUTSIDE_GUEST_MODE;
> >>>>> -		__kvm_guest_exit();
> >>>>> -		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
> >>>>> +		/*
> >>>>> +		 * Back from guest
> >>>>> +		 *************************************************************/
> >>>>> +
> >>>>>  		/*
> >>>>>  		 * We may have taken a host interrupt in HYP mode (ie
> >>>>>  		 * while executing the guest). This interrupt is still
> >>>>> @@ -574,8 +578,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >>>>>  		local_irq_enable();
> >>>>>  
> >>>>>  		/*
> >>>>> -		 * Back from guest
> >>>>> -		 *************************************************************/
> >>>>> +		 * We do local_irq_enable() before calling kvm_guest_exit() so
> >>>>> +		 * that if a timer interrupt hits while running the guest we
> >>>>> +		 * account that tick as being spent in the guest.  We enable
> >>>>> +		 * preemption after calling kvm_guest_exit() so that if we get
> >>>>> +		 * preempted we make sure ticks after that is not counted as
> >>>>> +		 * guest time.
> >>>>> +		 */
> >>>>> +		kvm_guest_exit();
> >>>>> +		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
> >>>>> +		preempt_enable();
> >>>>> +
> >>>>>  
> >>>>>  		kvm_timer_sync_hwstate(vcpu);
> >>>>>  		kvm_vgic_sync_hwstate(vcpu);
> >>>>>
> >>>>
> >>>> Hi Christoffer,
> >>>>  so currently we take a snap shot when we enter the guest
> >>>> (tsk->vtime_snap) and upon exit add the time we spent in
> >>>> the guest and update accrued time, which appears correct.
> >>>
> >>> not on arm64, because we don't select HAVE_VIRT_CPU_ACCOUNTING_GEN.  Or
> >>> am I missing something obvious here?
> >> I see what you mean we can't use cycle based accounting to accrue
> >> Guest time.
> >>
> > 
> > See other thread, we can enable this in the config but it still only
> > works with NO_HZ_FULL.
> > 
> >>>
> >>>>
> >>>> With this patch it appears that interrupts running
> >>>> in host mode are accrued to Guest time, and additional preemption
> >>>> latency is added.
> >>>>
> >>> It is true that interrupt processing in host mode (if they hit on a CPU
> >>> when it is running a guest) are accrued to guest time, but without this
> >>> patch on current arm64 we accrue no CPU time to guest time at all, which
> >>> is hardly more correct.
> >> Yes if only ticks are supported then it's definitely better!
> >>
> >> Nevertheless with high interrupt rate it will complicate root causing
> >> issues, a lot of that time will go to guest.
> > 
> > That sounds like a separate fix to me; if there's an existing mechanism
> > to account for time spent on hw IRQs and it is somehow invalidated by
> > the PF_VCPU flag being set, then that feels wrong, at least how arm64
> > works, but it doesn't make this patch less correct.
> 
> Tracing through the code (account_system_time()) it appears if the
> timer fires while an IRQ runs, tick are accounted to host IRQ
> mode (CPUTIME_IRQ). Otherwise it's accrued to guest. Under heavy
> interrupt load it's likely guest will mis some ticks, it appears
> it's the reverse of what I initially thought but in practice
> guest time should be ok as far as ticks go.
> 
> > 
> >>
> >>>
> >>> If this patch is incorrect, then how does it work on x86, where
> >>> handle_external_intr() is called (with a barrier in between) before
> >>> kvm_guest_exit(), and where handle_external_intr() is simply
> >>> local_irq_enable() on SVM and something more complicated on VMX ?
> >>>
> >>> Finally, how exactly is preemption latency added here?   Won't IRQ
> >>> processing run with higher priority than any task on your system, so the
> >>> order of (1) process pending IRQs (2) call schedule if needed is still
> >>> preserved here, but we call kvm_guest_exit() between (1) and (2) instead
> >>> of before (1).
> >>
> >> I may be missing something, but on return from interrupt with preempt
> >> disabled we can't take the need resched path. And need to return
> >> to KVM no?
> > 
> > preempt_enable() will call __preempt_schedule() and cause preemption
> > there, so you're talking about adding these lines of latency:t
> > 
> > 	kvm_guest_exit();
> > 	trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
> 
> On return from IRQ this should execute - and el1_preempt won't
> get called.
> 
> #ifdef CONFIG_PREEMPT
>         get_thread_info tsk
>         ldr     w24, [tsk, #TI_PREEMPT]         // get preempt count
>         cbnz    w24, 1f                         // preempt count != 0
>         ldr     x0, [tsk, #TI_FLAGS]            // get flags
>         tbz     x0, #TIF_NEED_RESCHED, 1f       // needs rescheduling?
>         bl      el1_preempt
> 1:
> #endif
> 

I understand that, but then you call preempt_enable right after which
calls __preempt_schedule() which has the same affect as that asm snippet
you pasted here.

> 
> > 
> > And these were called with interrupts disabled before, so I don't see
> > the issue??
> > 
> > However, your question is making me think whether we have a race in the
> > current code on fully preemptible kernels, if we get preempted before
> > calling kvm_timer_sync_hwstate() and kvm_vgic_sync_hwstate(), then we
> > could potentially schedule another vcpu on this core and loose/corrupt
> > state, can we not?  We probably need to check for this in
> > kvm_vcpu_load/kvm_vcpu_put.  I need to think more about if this is a
> > real issue or if I'm seeing ghosts.
> 
> Yes appears like it could be an issue in PREEMPT mode.

see separate mail, I don't believe this to be an issue anymore.

> > 
> >>>
> >>> It is entirely possible that I'm missing the mark here and everything
> >>> gets solved by enabling HAVE_VIRT_CPU_ACCOUNTING_GEN or we need some
> >>> extra logic?
> >>
> >> I think something to look into for us, taking a low issue to high level
> >> application - for state machine based type of applications (I guess like
> >> NFV) it cause problems  to root cause issues, a lot of activities
> >> run between ticks. For VM cycle based accounting is probably a must
> >> in that case.
> >>
> > Would you run with NO_HZ_FULL in this case?  Because then we should just
> > enable HAVE_VIRT_CPU_ACCOUNTING_GEN, and I think that would be a good
> > start.
> It may have a use case to run an isolated vCPU, but in general any mode
> may be used (,NO_HZ, even low PERIODIC).
> 
ok, but I still think it would be more correct to have this patch than
not to.

-Christoffer

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time
@ 2015-06-08 11:35             ` Christoffer Dall
  0 siblings, 0 replies; 42+ messages in thread
From: Christoffer Dall @ 2015-06-08 11:35 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jun 05, 2015 at 05:24:07AM -0700, Mario Smarduch wrote:
> On 06/02/2015 02:27 AM, Christoffer Dall wrote:
> > On Mon, Jun 01, 2015 at 08:48:22AM -0700, Mario Smarduch wrote:
> >> On 05/30/2015 11:59 PM, Christoffer Dall wrote:
> >>> Hi Mario,
> >>>
> >>> On Fri, May 29, 2015 at 03:34:47PM -0700, Mario Smarduch wrote:
> >>>> On 05/28/2015 11:49 AM, Christoffer Dall wrote:
> >>>>> Until now we have been calling kvm_guest_exit after re-enabling
> >>>>> interrupts when we come back from the guest, but this has the
> >>>>> unfortunate effect that CPU time accounting done in the context of timer
> >>>>> interrupts occurring while the guest is running doesn't properly notice
> >>>>> that the time since the last tick was spent in the guest.
> >>>>>
> >>>>> Inspired by the comment in the x86 code, move the kvm_guest_exit() call
> >>>>> below the local_irq_enable() call and change __kvm_guest_exit() to
> >>>>> kvm_guest_exit(), because we are now calling this function with
> >>>>> interrupts enabled.  We have to now explicitly disable preemption and
> >>>>> not enable preemption before we've called kvm_guest_exit(), since
> >>>>> otherwise we could be preempted and everything happening before we
> >>>>> eventually get scheduled again would be accounted for as guest time.
> >>>>>
> >>>>> At the same time, move the trace_kvm_exit() call outside of the atomic
> >>>>> section, since there is no reason for us to do that with interrupts
> >>>>> disabled.
> >>>>>
> >>>>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> >>>>> ---
> >>>>> This patch is based on kvm/queue, because it has the kvm_guest_enter/exit
> >>>>> rework recently posted by Christian Borntraeger.  I hope I got the logic
> >>>>> of this right, there were 2 slightly worrying facts about this:
> >>>>>
> >>>>> First, we now enable and disable and enable interrupts on each exit
> >>>>> path, but I couldn't see any performance overhead on hackbench - yes the
> >>>>> only benchmark we care about.
> >>>>>
> >>>>> Second, looking at the ppc and mips code, they seem to also call
> >>>>> kvm_guest_exit() before enabling interrupts, so I don't understand how
> >>>>> guest CPU time accounting works on those architectures.
> >>>>>
> >>>>> Changes since v1:
> >>>>>  - Tweak comment and commit text based on Marc's feedback.
> >>>>>  - Explicitly disable preemption and enable it only after kvm_guest_exit().
> >>>>>
> >>>>>  arch/arm/kvm/arm.c | 21 +++++++++++++++++----
> >>>>>  1 file changed, 17 insertions(+), 4 deletions(-)
> >>>>>
> >>>>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> >>>>> index e41cb11..fe8028d 100644
> >>>>> --- a/arch/arm/kvm/arm.c
> >>>>> +++ b/arch/arm/kvm/arm.c
> >>>>> @@ -532,6 +532,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >>>>>  		kvm_vgic_flush_hwstate(vcpu);
> >>>>>  		kvm_timer_flush_hwstate(vcpu);
> >>>>>  
> >>>>> +		preempt_disable();
> >>>>>  		local_irq_disable();
> >>>>>  
> >>>>>  		/*
> >>>>> @@ -544,6 +545,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >>>>>  
> >>>>>  		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
> >>>>>  			local_irq_enable();
> >>>>> +			preempt_enable();
> >>>>>  			kvm_timer_sync_hwstate(vcpu);
> >>>>>  			kvm_vgic_sync_hwstate(vcpu);
> >>>>>  			continue;
> >>>>> @@ -559,8 +561,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >>>>>  		ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
> >>>>>  
> >>>>>  		vcpu->mode = OUTSIDE_GUEST_MODE;
> >>>>> -		__kvm_guest_exit();
> >>>>> -		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
> >>>>> +		/*
> >>>>> +		 * Back from guest
> >>>>> +		 *************************************************************/
> >>>>> +
> >>>>>  		/*
> >>>>>  		 * We may have taken a host interrupt in HYP mode (ie
> >>>>>  		 * while executing the guest). This interrupt is still
> >>>>> @@ -574,8 +578,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >>>>>  		local_irq_enable();
> >>>>>  
> >>>>>  		/*
> >>>>> -		 * Back from guest
> >>>>> -		 *************************************************************/
> >>>>> +		 * We do local_irq_enable() before calling kvm_guest_exit() so
> >>>>> +		 * that if a timer interrupt hits while running the guest we
> >>>>> +		 * account that tick as being spent in the guest.  We enable
> >>>>> +		 * preemption after calling kvm_guest_exit() so that if we get
> >>>>> +		 * preempted we make sure ticks after that is not counted as
> >>>>> +		 * guest time.
> >>>>> +		 */
> >>>>> +		kvm_guest_exit();
> >>>>> +		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
> >>>>> +		preempt_enable();
> >>>>> +
> >>>>>  
> >>>>>  		kvm_timer_sync_hwstate(vcpu);
> >>>>>  		kvm_vgic_sync_hwstate(vcpu);
> >>>>>
> >>>>
> >>>> Hi Christoffer,
> >>>>  so currently we take a snap shot when we enter the guest
> >>>> (tsk->vtime_snap) and upon exit add the time we spent in
> >>>> the guest and update accrued time, which appears correct.
> >>>
> >>> not on arm64, because we don't select HAVE_VIRT_CPU_ACCOUNTING_GEN.  Or
> >>> am I missing something obvious here?
> >> I see what you mean we can't use cycle based accounting to accrue
> >> Guest time.
> >>
> > 
> > See other thread, we can enable this in the config but it still only
> > works with NO_HZ_FULL.
> > 
> >>>
> >>>>
> >>>> With this patch it appears that interrupts running
> >>>> in host mode are accrued to Guest time, and additional preemption
> >>>> latency is added.
> >>>>
> >>> It is true that interrupt processing in host mode (if they hit on a CPU
> >>> when it is running a guest) are accrued to guest time, but without this
> >>> patch on current arm64 we accrue no CPU time to guest time at all, which
> >>> is hardly more correct.
> >> Yes if only ticks are supported then it's definitely better!
> >>
> >> Nevertheless with high interrupt rate it will complicate root causing
> >> issues, a lot of that time will go to guest.
> > 
> > That sounds like a separate fix to me; if there's an existing mechanism
> > to account for time spent on hw IRQs and it is somehow invalidated by
> > the PF_VCPU flag being set, then that feels wrong, at least how arm64
> > works, but it doesn't make this patch less correct.
> 
> Tracing through the code (account_system_time()) it appears if the
> timer fires while an IRQ runs, tick are accounted to host IRQ
> mode (CPUTIME_IRQ). Otherwise it's accrued to guest. Under heavy
> interrupt load it's likely guest will mis some ticks, it appears
> it's the reverse of what I initially thought but in practice
> guest time should be ok as far as ticks go.
> 
> > 
> >>
> >>>
> >>> If this patch is incorrect, then how does it work on x86, where
> >>> handle_external_intr() is called (with a barrier in between) before
> >>> kvm_guest_exit(), and where handle_external_intr() is simply
> >>> local_irq_enable() on SVM and something more complicated on VMX ?
> >>>
> >>> Finally, how exactly is preemption latency added here?   Won't IRQ
> >>> processing run with higher priority than any task on your system, so the
> >>> order of (1) process pending IRQs (2) call schedule if needed is still
> >>> preserved here, but we call kvm_guest_exit() between (1) and (2) instead
> >>> of before (1).
> >>
> >> I may be missing something, but on return from interrupt with preempt
> >> disabled we can't take the need resched path. And need to return
> >> to KVM no?
> > 
> > preempt_enable() will call __preempt_schedule() and cause preemption
> > there, so you're talking about adding these lines of latency:t
> > 
> > 	kvm_guest_exit();
> > 	trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
> 
> On return from IRQ this should execute - and el1_preempt won't
> get called.
> 
> #ifdef CONFIG_PREEMPT
>         get_thread_info tsk
>         ldr     w24, [tsk, #TI_PREEMPT]         // get preempt count
>         cbnz    w24, 1f                         // preempt count != 0
>         ldr     x0, [tsk, #TI_FLAGS]            // get flags
>         tbz     x0, #TIF_NEED_RESCHED, 1f       // needs rescheduling?
>         bl      el1_preempt
> 1:
> #endif
> 

I understand that, but then you call preempt_enable right after which
calls __preempt_schedule() which has the same affect as that asm snippet
you pasted here.

> 
> > 
> > And these were called with interrupts disabled before, so I don't see
> > the issue??
> > 
> > However, your question is making me think whether we have a race in the
> > current code on fully preemptible kernels, if we get preempted before
> > calling kvm_timer_sync_hwstate() and kvm_vgic_sync_hwstate(), then we
> > could potentially schedule another vcpu on this core and loose/corrupt
> > state, can we not?  We probably need to check for this in
> > kvm_vcpu_load/kvm_vcpu_put.  I need to think more about if this is a
> > real issue or if I'm seeing ghosts.
> 
> Yes appears like it could be an issue in PREEMPT mode.

see separate mail, I don't believe this to be an issue anymore.

> > 
> >>>
> >>> It is entirely possible that I'm missing the mark here and everything
> >>> gets solved by enabling HAVE_VIRT_CPU_ACCOUNTING_GEN or we need some
> >>> extra logic?
> >>
> >> I think something to look into for us, taking a low issue to high level
> >> application - for state machine based type of applications (I guess like
> >> NFV) it cause problems  to root cause issues, a lot of activities
> >> run between ticks. For VM cycle based accounting is probably a must
> >> in that case.
> >>
> > Would you run with NO_HZ_FULL in this case?  Because then we should just
> > enable HAVE_VIRT_CPU_ACCOUNTING_GEN, and I think that would be a good
> > start.
> It may have a use case to run an isolated vCPU, but in general any mode
> may be used (,NO_HZ, even low PERIODIC).
> 
ok, but I still think it would be more correct to have this patch than
not to.

-Christoffer

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time
  2015-05-28 18:49 ` Christoffer Dall
@ 2015-06-08 17:50   ` Marc Zyngier
  -1 siblings, 0 replies; 42+ messages in thread
From: Marc Zyngier @ 2015-06-08 17:50 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, kvm, linux-arm-kernel
  Cc: borntraeger, Paolo Bonzini

Hi Christoffer,

On 28/05/15 19:49, Christoffer Dall wrote:
> Until now we have been calling kvm_guest_exit after re-enabling
> interrupts when we come back from the guest, but this has the
> unfortunate effect that CPU time accounting done in the context of timer
> interrupts occurring while the guest is running doesn't properly notice
> that the time since the last tick was spent in the guest.
> 
> Inspired by the comment in the x86 code, move the kvm_guest_exit() call
> below the local_irq_enable() call and change __kvm_guest_exit() to
> kvm_guest_exit(), because we are now calling this function with
> interrupts enabled.  We have to now explicitly disable preemption and
> not enable preemption before we've called kvm_guest_exit(), since
> otherwise we could be preempted and everything happening before we
> eventually get scheduled again would be accounted for as guest time.
> 
> At the same time, move the trace_kvm_exit() call outside of the atomic
> section, since there is no reason for us to do that with interrupts
> disabled.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
> This patch is based on kvm/queue, because it has the kvm_guest_enter/exit
> rework recently posted by Christian Borntraeger.  I hope I got the logic
> of this right, there were 2 slightly worrying facts about this:
> 
> First, we now enable and disable and enable interrupts on each exit
> path, but I couldn't see any performance overhead on hackbench - yes the
> only benchmark we care about.
> 
> Second, looking at the ppc and mips code, they seem to also call
> kvm_guest_exit() before enabling interrupts, so I don't understand how
> guest CPU time accounting works on those architectures.
> 
> Changes since v1:
>  - Tweak comment and commit text based on Marc's feedback.
>  - Explicitly disable preemption and enable it only after kvm_guest_exit().
> 
>  arch/arm/kvm/arm.c | 21 +++++++++++++++++----
>  1 file changed, 17 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index e41cb11..fe8028d 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -532,6 +532,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		kvm_vgic_flush_hwstate(vcpu);
>  		kvm_timer_flush_hwstate(vcpu);
>  
> +		preempt_disable();
>  		local_irq_disable();
>  
>  		/*
> @@ -544,6 +545,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  
>  		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
>  			local_irq_enable();
> +			preempt_enable();
>  			kvm_timer_sync_hwstate(vcpu);
>  			kvm_vgic_sync_hwstate(vcpu);
>  			continue;
> @@ -559,8 +561,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
>  
>  		vcpu->mode = OUTSIDE_GUEST_MODE;
> -		__kvm_guest_exit();
> -		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
> +		/*
> +		 * Back from guest
> +		 *************************************************************/
> +
>  		/*
>  		 * We may have taken a host interrupt in HYP mode (ie
>  		 * while executing the guest). This interrupt is still
> @@ -574,8 +578,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		local_irq_enable();
>  
>  		/*
> -		 * Back from guest
> -		 *************************************************************/
> +		 * We do local_irq_enable() before calling kvm_guest_exit() so
> +		 * that if a timer interrupt hits while running the guest we
> +		 * account that tick as being spent in the guest.  We enable
> +		 * preemption after calling kvm_guest_exit() so that if we get
> +		 * preempted we make sure ticks after that is not counted as
> +		 * guest time.
> +		 */
> +		kvm_guest_exit();
> +		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
> +		preempt_enable();
> +
>  
>  		kvm_timer_sync_hwstate(vcpu);
>  		kvm_vgic_sync_hwstate(vcpu);
> 

I've been thinking about this a bit more, and I wonder if we can
simplify it a bit. At the moment, we disable the interrupts around the
HYP entry. But now that you have introduced preempt_disable, it looks
like we move the local_irq_disable call to be just after
__kvm_guest_enter, and not bother with having such a long critical section.

This is possible because entering HYP mode automatically masks
interrupts, and we restore PSTATE on exception return.

I think this would slightly reduce the amount of code we run on the host
that gets accounted to the guest.

Thoughts?

Thanks,

	M.

-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time
@ 2015-06-08 17:50   ` Marc Zyngier
  0 siblings, 0 replies; 42+ messages in thread
From: Marc Zyngier @ 2015-06-08 17:50 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Christoffer,

On 28/05/15 19:49, Christoffer Dall wrote:
> Until now we have been calling kvm_guest_exit after re-enabling
> interrupts when we come back from the guest, but this has the
> unfortunate effect that CPU time accounting done in the context of timer
> interrupts occurring while the guest is running doesn't properly notice
> that the time since the last tick was spent in the guest.
> 
> Inspired by the comment in the x86 code, move the kvm_guest_exit() call
> below the local_irq_enable() call and change __kvm_guest_exit() to
> kvm_guest_exit(), because we are now calling this function with
> interrupts enabled.  We have to now explicitly disable preemption and
> not enable preemption before we've called kvm_guest_exit(), since
> otherwise we could be preempted and everything happening before we
> eventually get scheduled again would be accounted for as guest time.
> 
> At the same time, move the trace_kvm_exit() call outside of the atomic
> section, since there is no reason for us to do that with interrupts
> disabled.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
> This patch is based on kvm/queue, because it has the kvm_guest_enter/exit
> rework recently posted by Christian Borntraeger.  I hope I got the logic
> of this right, there were 2 slightly worrying facts about this:
> 
> First, we now enable and disable and enable interrupts on each exit
> path, but I couldn't see any performance overhead on hackbench - yes the
> only benchmark we care about.
> 
> Second, looking at the ppc and mips code, they seem to also call
> kvm_guest_exit() before enabling interrupts, so I don't understand how
> guest CPU time accounting works on those architectures.
> 
> Changes since v1:
>  - Tweak comment and commit text based on Marc's feedback.
>  - Explicitly disable preemption and enable it only after kvm_guest_exit().
> 
>  arch/arm/kvm/arm.c | 21 +++++++++++++++++----
>  1 file changed, 17 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index e41cb11..fe8028d 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -532,6 +532,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		kvm_vgic_flush_hwstate(vcpu);
>  		kvm_timer_flush_hwstate(vcpu);
>  
> +		preempt_disable();
>  		local_irq_disable();
>  
>  		/*
> @@ -544,6 +545,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  
>  		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
>  			local_irq_enable();
> +			preempt_enable();
>  			kvm_timer_sync_hwstate(vcpu);
>  			kvm_vgic_sync_hwstate(vcpu);
>  			continue;
> @@ -559,8 +561,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
>  
>  		vcpu->mode = OUTSIDE_GUEST_MODE;
> -		__kvm_guest_exit();
> -		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
> +		/*
> +		 * Back from guest
> +		 *************************************************************/
> +
>  		/*
>  		 * We may have taken a host interrupt in HYP mode (ie
>  		 * while executing the guest). This interrupt is still
> @@ -574,8 +578,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		local_irq_enable();
>  
>  		/*
> -		 * Back from guest
> -		 *************************************************************/
> +		 * We do local_irq_enable() before calling kvm_guest_exit() so
> +		 * that if a timer interrupt hits while running the guest we
> +		 * account that tick as being spent in the guest.  We enable
> +		 * preemption after calling kvm_guest_exit() so that if we get
> +		 * preempted we make sure ticks after that is not counted as
> +		 * guest time.
> +		 */
> +		kvm_guest_exit();
> +		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
> +		preempt_enable();
> +
>  
>  		kvm_timer_sync_hwstate(vcpu);
>  		kvm_vgic_sync_hwstate(vcpu);
> 

I've been thinking about this a bit more, and I wonder if we can
simplify it a bit. At the moment, we disable the interrupts around the
HYP entry. But now that you have introduced preempt_disable, it looks
like we move the local_irq_disable call to be just after
__kvm_guest_enter, and not bother with having such a long critical section.

This is possible because entering HYP mode automatically masks
interrupts, and we restore PSTATE on exception return.

I think this would slightly reduce the amount of code we run on the host
that gets accounted to the guest.

Thoughts?

Thanks,

	M.

-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time
  2015-06-08 17:50   ` Marc Zyngier
@ 2015-06-09 14:43     ` Christoffer Dall
  -1 siblings, 0 replies; 42+ messages in thread
From: Christoffer Dall @ 2015-06-09 14:43 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvmarm, kvm, linux-arm-kernel, borntraeger, Paolo Bonzini

On Mon, Jun 08, 2015 at 06:50:08PM +0100, Marc Zyngier wrote:
> Hi Christoffer,
> 
> On 28/05/15 19:49, Christoffer Dall wrote:
> > Until now we have been calling kvm_guest_exit after re-enabling
> > interrupts when we come back from the guest, but this has the
> > unfortunate effect that CPU time accounting done in the context of timer
> > interrupts occurring while the guest is running doesn't properly notice
> > that the time since the last tick was spent in the guest.
> > 
> > Inspired by the comment in the x86 code, move the kvm_guest_exit() call
> > below the local_irq_enable() call and change __kvm_guest_exit() to
> > kvm_guest_exit(), because we are now calling this function with
> > interrupts enabled.  We have to now explicitly disable preemption and
> > not enable preemption before we've called kvm_guest_exit(), since
> > otherwise we could be preempted and everything happening before we
> > eventually get scheduled again would be accounted for as guest time.
> > 
> > At the same time, move the trace_kvm_exit() call outside of the atomic
> > section, since there is no reason for us to do that with interrupts
> > disabled.
> > 
> > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> > ---
> > This patch is based on kvm/queue, because it has the kvm_guest_enter/exit
> > rework recently posted by Christian Borntraeger.  I hope I got the logic
> > of this right, there were 2 slightly worrying facts about this:
> > 
> > First, we now enable and disable and enable interrupts on each exit
> > path, but I couldn't see any performance overhead on hackbench - yes the
> > only benchmark we care about.
> > 
> > Second, looking at the ppc and mips code, they seem to also call
> > kvm_guest_exit() before enabling interrupts, so I don't understand how
> > guest CPU time accounting works on those architectures.
> > 
> > Changes since v1:
> >  - Tweak comment and commit text based on Marc's feedback.
> >  - Explicitly disable preemption and enable it only after kvm_guest_exit().
> > 
> >  arch/arm/kvm/arm.c | 21 +++++++++++++++++----
> >  1 file changed, 17 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> > index e41cb11..fe8028d 100644
> > --- a/arch/arm/kvm/arm.c
> > +++ b/arch/arm/kvm/arm.c
> > @@ -532,6 +532,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >  		kvm_vgic_flush_hwstate(vcpu);
> >  		kvm_timer_flush_hwstate(vcpu);
> >  
> > +		preempt_disable();
> >  		local_irq_disable();
> >  
> >  		/*
> > @@ -544,6 +545,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >  
> >  		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
> >  			local_irq_enable();
> > +			preempt_enable();
> >  			kvm_timer_sync_hwstate(vcpu);
> >  			kvm_vgic_sync_hwstate(vcpu);
> >  			continue;
> > @@ -559,8 +561,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >  		ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
> >  
> >  		vcpu->mode = OUTSIDE_GUEST_MODE;
> > -		__kvm_guest_exit();
> > -		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
> > +		/*
> > +		 * Back from guest
> > +		 *************************************************************/
> > +
> >  		/*
> >  		 * We may have taken a host interrupt in HYP mode (ie
> >  		 * while executing the guest). This interrupt is still
> > @@ -574,8 +578,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >  		local_irq_enable();
> >  
> >  		/*
> > -		 * Back from guest
> > -		 *************************************************************/
> > +		 * We do local_irq_enable() before calling kvm_guest_exit() so
> > +		 * that if a timer interrupt hits while running the guest we
> > +		 * account that tick as being spent in the guest.  We enable
> > +		 * preemption after calling kvm_guest_exit() so that if we get
> > +		 * preempted we make sure ticks after that is not counted as
> > +		 * guest time.
> > +		 */
> > +		kvm_guest_exit();
> > +		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
> > +		preempt_enable();
> > +
> >  
> >  		kvm_timer_sync_hwstate(vcpu);
> >  		kvm_vgic_sync_hwstate(vcpu);
> > 
> 
> I've been thinking about this a bit more, and I wonder if we can
> simplify it a bit. At the moment, we disable the interrupts around the
> HYP entry. But now that you have introduced preempt_disable, it looks
> like we move the local_irq_disable call to be just after
> __kvm_guest_enter, and not bother with having such a long critical section.
> 
> This is possible because entering HYP mode automatically masks
> interrupts, and we restore PSTATE on exception return.
> 
> I think this would slightly reduce the amount of code we run on the host
> that gets accounted to the guest.
> 
> Thoughts?
> 
Isn't there a situation then where the guest can get stuck because we
don't properly check for signal handling?

As I recall, we have the lcoal_irq_disable() call before checking
signal_pending(), so that if, for example, another thread sends a signal
to that VCPU we either:
 (1) handle the IPI and see we have a signal pending, so we abort, or
 (2) don't handle the IPI because IRQs are disabled, enter the guest but
     soon as we run the guest the interrupt hits, we go back, see the
     signal and exit.

There was something like this which caused a guest to get stuck with
userspace timers on v7.

Am I making sense at all?

-Christoffer

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time
@ 2015-06-09 14:43     ` Christoffer Dall
  0 siblings, 0 replies; 42+ messages in thread
From: Christoffer Dall @ 2015-06-09 14:43 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jun 08, 2015 at 06:50:08PM +0100, Marc Zyngier wrote:
> Hi Christoffer,
> 
> On 28/05/15 19:49, Christoffer Dall wrote:
> > Until now we have been calling kvm_guest_exit after re-enabling
> > interrupts when we come back from the guest, but this has the
> > unfortunate effect that CPU time accounting done in the context of timer
> > interrupts occurring while the guest is running doesn't properly notice
> > that the time since the last tick was spent in the guest.
> > 
> > Inspired by the comment in the x86 code, move the kvm_guest_exit() call
> > below the local_irq_enable() call and change __kvm_guest_exit() to
> > kvm_guest_exit(), because we are now calling this function with
> > interrupts enabled.  We have to now explicitly disable preemption and
> > not enable preemption before we've called kvm_guest_exit(), since
> > otherwise we could be preempted and everything happening before we
> > eventually get scheduled again would be accounted for as guest time.
> > 
> > At the same time, move the trace_kvm_exit() call outside of the atomic
> > section, since there is no reason for us to do that with interrupts
> > disabled.
> > 
> > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> > ---
> > This patch is based on kvm/queue, because it has the kvm_guest_enter/exit
> > rework recently posted by Christian Borntraeger.  I hope I got the logic
> > of this right, there were 2 slightly worrying facts about this:
> > 
> > First, we now enable and disable and enable interrupts on each exit
> > path, but I couldn't see any performance overhead on hackbench - yes the
> > only benchmark we care about.
> > 
> > Second, looking at the ppc and mips code, they seem to also call
> > kvm_guest_exit() before enabling interrupts, so I don't understand how
> > guest CPU time accounting works on those architectures.
> > 
> > Changes since v1:
> >  - Tweak comment and commit text based on Marc's feedback.
> >  - Explicitly disable preemption and enable it only after kvm_guest_exit().
> > 
> >  arch/arm/kvm/arm.c | 21 +++++++++++++++++----
> >  1 file changed, 17 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> > index e41cb11..fe8028d 100644
> > --- a/arch/arm/kvm/arm.c
> > +++ b/arch/arm/kvm/arm.c
> > @@ -532,6 +532,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >  		kvm_vgic_flush_hwstate(vcpu);
> >  		kvm_timer_flush_hwstate(vcpu);
> >  
> > +		preempt_disable();
> >  		local_irq_disable();
> >  
> >  		/*
> > @@ -544,6 +545,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >  
> >  		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
> >  			local_irq_enable();
> > +			preempt_enable();
> >  			kvm_timer_sync_hwstate(vcpu);
> >  			kvm_vgic_sync_hwstate(vcpu);
> >  			continue;
> > @@ -559,8 +561,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >  		ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
> >  
> >  		vcpu->mode = OUTSIDE_GUEST_MODE;
> > -		__kvm_guest_exit();
> > -		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
> > +		/*
> > +		 * Back from guest
> > +		 *************************************************************/
> > +
> >  		/*
> >  		 * We may have taken a host interrupt in HYP mode (ie
> >  		 * while executing the guest). This interrupt is still
> > @@ -574,8 +578,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >  		local_irq_enable();
> >  
> >  		/*
> > -		 * Back from guest
> > -		 *************************************************************/
> > +		 * We do local_irq_enable() before calling kvm_guest_exit() so
> > +		 * that if a timer interrupt hits while running the guest we
> > +		 * account that tick as being spent in the guest.  We enable
> > +		 * preemption after calling kvm_guest_exit() so that if we get
> > +		 * preempted we make sure ticks after that is not counted as
> > +		 * guest time.
> > +		 */
> > +		kvm_guest_exit();
> > +		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
> > +		preempt_enable();
> > +
> >  
> >  		kvm_timer_sync_hwstate(vcpu);
> >  		kvm_vgic_sync_hwstate(vcpu);
> > 
> 
> I've been thinking about this a bit more, and I wonder if we can
> simplify it a bit. At the moment, we disable the interrupts around the
> HYP entry. But now that you have introduced preempt_disable, it looks
> like we move the local_irq_disable call to be just after
> __kvm_guest_enter, and not bother with having such a long critical section.
> 
> This is possible because entering HYP mode automatically masks
> interrupts, and we restore PSTATE on exception return.
> 
> I think this would slightly reduce the amount of code we run on the host
> that gets accounted to the guest.
> 
> Thoughts?
> 
Isn't there a situation then where the guest can get stuck because we
don't properly check for signal handling?

As I recall, we have the lcoal_irq_disable() call before checking
signal_pending(), so that if, for example, another thread sends a signal
to that VCPU we either:
 (1) handle the IPI and see we have a signal pending, so we abort, or
 (2) don't handle the IPI because IRQs are disabled, enter the guest but
     soon as we run the guest the interrupt hits, we go back, see the
     signal and exit.

There was something like this which caused a guest to get stuck with
userspace timers on v7.

Am I making sense at all?

-Christoffer

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time
  2015-06-09 14:43     ` Christoffer Dall
@ 2015-06-09 16:39       ` Marc Zyngier
  -1 siblings, 0 replies; 42+ messages in thread
From: Marc Zyngier @ 2015-06-09 16:39 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvmarm, kvm, linux-arm-kernel, borntraeger, Paolo Bonzini

On 09/06/15 15:43, Christoffer Dall wrote:
> On Mon, Jun 08, 2015 at 06:50:08PM +0100, Marc Zyngier wrote:
>> Hi Christoffer,
>>
>> On 28/05/15 19:49, Christoffer Dall wrote:
>>> Until now we have been calling kvm_guest_exit after re-enabling
>>> interrupts when we come back from the guest, but this has the
>>> unfortunate effect that CPU time accounting done in the context of timer
>>> interrupts occurring while the guest is running doesn't properly notice
>>> that the time since the last tick was spent in the guest.
>>>
>>> Inspired by the comment in the x86 code, move the kvm_guest_exit() call
>>> below the local_irq_enable() call and change __kvm_guest_exit() to
>>> kvm_guest_exit(), because we are now calling this function with
>>> interrupts enabled.  We have to now explicitly disable preemption and
>>> not enable preemption before we've called kvm_guest_exit(), since
>>> otherwise we could be preempted and everything happening before we
>>> eventually get scheduled again would be accounted for as guest time.
>>>
>>> At the same time, move the trace_kvm_exit() call outside of the atomic
>>> section, since there is no reason for us to do that with interrupts
>>> disabled.
>>>
>>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
>>> ---
>>> This patch is based on kvm/queue, because it has the kvm_guest_enter/exit
>>> rework recently posted by Christian Borntraeger.  I hope I got the logic
>>> of this right, there were 2 slightly worrying facts about this:
>>>
>>> First, we now enable and disable and enable interrupts on each exit
>>> path, but I couldn't see any performance overhead on hackbench - yes the
>>> only benchmark we care about.
>>>
>>> Second, looking at the ppc and mips code, they seem to also call
>>> kvm_guest_exit() before enabling interrupts, so I don't understand how
>>> guest CPU time accounting works on those architectures.
>>>
>>> Changes since v1:
>>>  - Tweak comment and commit text based on Marc's feedback.
>>>  - Explicitly disable preemption and enable it only after kvm_guest_exit().
>>>
>>>  arch/arm/kvm/arm.c | 21 +++++++++++++++++----
>>>  1 file changed, 17 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>>> index e41cb11..fe8028d 100644
>>> --- a/arch/arm/kvm/arm.c
>>> +++ b/arch/arm/kvm/arm.c
>>> @@ -532,6 +532,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>>  		kvm_vgic_flush_hwstate(vcpu);
>>>  		kvm_timer_flush_hwstate(vcpu);
>>>  
>>> +		preempt_disable();
>>>  		local_irq_disable();
>>>  
>>>  		/*
>>> @@ -544,6 +545,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>>  
>>>  		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
>>>  			local_irq_enable();
>>> +			preempt_enable();
>>>  			kvm_timer_sync_hwstate(vcpu);
>>>  			kvm_vgic_sync_hwstate(vcpu);
>>>  			continue;
>>> @@ -559,8 +561,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>>  		ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
>>>  
>>>  		vcpu->mode = OUTSIDE_GUEST_MODE;
>>> -		__kvm_guest_exit();
>>> -		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
>>> +		/*
>>> +		 * Back from guest
>>> +		 *************************************************************/
>>> +
>>>  		/*
>>>  		 * We may have taken a host interrupt in HYP mode (ie
>>>  		 * while executing the guest). This interrupt is still
>>> @@ -574,8 +578,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>>  		local_irq_enable();
>>>  
>>>  		/*
>>> -		 * Back from guest
>>> -		 *************************************************************/
>>> +		 * We do local_irq_enable() before calling kvm_guest_exit() so
>>> +		 * that if a timer interrupt hits while running the guest we
>>> +		 * account that tick as being spent in the guest.  We enable
>>> +		 * preemption after calling kvm_guest_exit() so that if we get
>>> +		 * preempted we make sure ticks after that is not counted as
>>> +		 * guest time.
>>> +		 */
>>> +		kvm_guest_exit();
>>> +		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
>>> +		preempt_enable();
>>> +
>>>  
>>>  		kvm_timer_sync_hwstate(vcpu);
>>>  		kvm_vgic_sync_hwstate(vcpu);
>>>
>>
>> I've been thinking about this a bit more, and I wonder if we can
>> simplify it a bit. At the moment, we disable the interrupts around the
>> HYP entry. But now that you have introduced preempt_disable, it looks
>> like we move the local_irq_disable call to be just after
>> __kvm_guest_enter, and not bother with having such a long critical section.
>>
>> This is possible because entering HYP mode automatically masks
>> interrupts, and we restore PSTATE on exception return.
>>
>> I think this would slightly reduce the amount of code we run on the host
>> that gets accounted to the guest.
>>
>> Thoughts?
>>
> Isn't there a situation then where the guest can get stuck because we
> don't properly check for signal handling?
> 
> As I recall, we have the lcoal_irq_disable() call before checking
> signal_pending(), so that if, for example, another thread sends a signal
> to that VCPU we either:
>  (1) handle the IPI and see we have a signal pending, so we abort, or
>  (2) don't handle the IPI because IRQs are disabled, enter the guest but
>      soon as we run the guest the interrupt hits, we go back, see the
>      signal and exit.
> 
> There was something like this which caused a guest to get stuck with
> userspace timers on v7.
> 
> Am I making sense at all?

Yup, I missed that crucial detail. Applying to queue.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time
@ 2015-06-09 16:39       ` Marc Zyngier
  0 siblings, 0 replies; 42+ messages in thread
From: Marc Zyngier @ 2015-06-09 16:39 UTC (permalink / raw)
  To: linux-arm-kernel

On 09/06/15 15:43, Christoffer Dall wrote:
> On Mon, Jun 08, 2015 at 06:50:08PM +0100, Marc Zyngier wrote:
>> Hi Christoffer,
>>
>> On 28/05/15 19:49, Christoffer Dall wrote:
>>> Until now we have been calling kvm_guest_exit after re-enabling
>>> interrupts when we come back from the guest, but this has the
>>> unfortunate effect that CPU time accounting done in the context of timer
>>> interrupts occurring while the guest is running doesn't properly notice
>>> that the time since the last tick was spent in the guest.
>>>
>>> Inspired by the comment in the x86 code, move the kvm_guest_exit() call
>>> below the local_irq_enable() call and change __kvm_guest_exit() to
>>> kvm_guest_exit(), because we are now calling this function with
>>> interrupts enabled.  We have to now explicitly disable preemption and
>>> not enable preemption before we've called kvm_guest_exit(), since
>>> otherwise we could be preempted and everything happening before we
>>> eventually get scheduled again would be accounted for as guest time.
>>>
>>> At the same time, move the trace_kvm_exit() call outside of the atomic
>>> section, since there is no reason for us to do that with interrupts
>>> disabled.
>>>
>>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
>>> ---
>>> This patch is based on kvm/queue, because it has the kvm_guest_enter/exit
>>> rework recently posted by Christian Borntraeger.  I hope I got the logic
>>> of this right, there were 2 slightly worrying facts about this:
>>>
>>> First, we now enable and disable and enable interrupts on each exit
>>> path, but I couldn't see any performance overhead on hackbench - yes the
>>> only benchmark we care about.
>>>
>>> Second, looking at the ppc and mips code, they seem to also call
>>> kvm_guest_exit() before enabling interrupts, so I don't understand how
>>> guest CPU time accounting works on those architectures.
>>>
>>> Changes since v1:
>>>  - Tweak comment and commit text based on Marc's feedback.
>>>  - Explicitly disable preemption and enable it only after kvm_guest_exit().
>>>
>>>  arch/arm/kvm/arm.c | 21 +++++++++++++++++----
>>>  1 file changed, 17 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>>> index e41cb11..fe8028d 100644
>>> --- a/arch/arm/kvm/arm.c
>>> +++ b/arch/arm/kvm/arm.c
>>> @@ -532,6 +532,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>>  		kvm_vgic_flush_hwstate(vcpu);
>>>  		kvm_timer_flush_hwstate(vcpu);
>>>  
>>> +		preempt_disable();
>>>  		local_irq_disable();
>>>  
>>>  		/*
>>> @@ -544,6 +545,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>>  
>>>  		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
>>>  			local_irq_enable();
>>> +			preempt_enable();
>>>  			kvm_timer_sync_hwstate(vcpu);
>>>  			kvm_vgic_sync_hwstate(vcpu);
>>>  			continue;
>>> @@ -559,8 +561,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>>  		ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
>>>  
>>>  		vcpu->mode = OUTSIDE_GUEST_MODE;
>>> -		__kvm_guest_exit();
>>> -		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
>>> +		/*
>>> +		 * Back from guest
>>> +		 *************************************************************/
>>> +
>>>  		/*
>>>  		 * We may have taken a host interrupt in HYP mode (ie
>>>  		 * while executing the guest). This interrupt is still
>>> @@ -574,8 +578,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>>  		local_irq_enable();
>>>  
>>>  		/*
>>> -		 * Back from guest
>>> -		 *************************************************************/
>>> +		 * We do local_irq_enable() before calling kvm_guest_exit() so
>>> +		 * that if a timer interrupt hits while running the guest we
>>> +		 * account that tick as being spent in the guest.  We enable
>>> +		 * preemption after calling kvm_guest_exit() so that if we get
>>> +		 * preempted we make sure ticks after that is not counted as
>>> +		 * guest time.
>>> +		 */
>>> +		kvm_guest_exit();
>>> +		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
>>> +		preempt_enable();
>>> +
>>>  
>>>  		kvm_timer_sync_hwstate(vcpu);
>>>  		kvm_vgic_sync_hwstate(vcpu);
>>>
>>
>> I've been thinking about this a bit more, and I wonder if we can
>> simplify it a bit. At the moment, we disable the interrupts around the
>> HYP entry. But now that you have introduced preempt_disable, it looks
>> like we move the local_irq_disable call to be just after
>> __kvm_guest_enter, and not bother with having such a long critical section.
>>
>> This is possible because entering HYP mode automatically masks
>> interrupts, and we restore PSTATE on exception return.
>>
>> I think this would slightly reduce the amount of code we run on the host
>> that gets accounted to the guest.
>>
>> Thoughts?
>>
> Isn't there a situation then where the guest can get stuck because we
> don't properly check for signal handling?
> 
> As I recall, we have the lcoal_irq_disable() call before checking
> signal_pending(), so that if, for example, another thread sends a signal
> to that VCPU we either:
>  (1) handle the IPI and see we have a signal pending, so we abort, or
>  (2) don't handle the IPI because IRQs are disabled, enter the guest but
>      soon as we run the guest the interrupt hits, we go back, see the
>      signal and exit.
> 
> There was something like this which caused a guest to get stuck with
> userspace timers on v7.
> 
> Am I making sense at all?

Yup, I missed that crucial detail. Applying to queue.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time
  2015-06-08 11:35             ` Christoffer Dall
@ 2015-06-09 23:04               ` Mario Smarduch
  -1 siblings, 0 replies; 42+ messages in thread
From: Mario Smarduch @ 2015-06-09 23:04 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvm, Marc Zyngier, borntraeger, Paolo Bonzini, kvmarm, linux-arm-kernel

On 06/08/2015 04:35 AM, Christoffer Dall wrote:
> On Fri, Jun 05, 2015 at 05:24:07AM -0700, Mario Smarduch wrote:
>> On 06/02/2015 02:27 AM, Christoffer Dall wrote:
>>> On Mon, Jun 01, 2015 at 08:48:22AM -0700, Mario Smarduch wrote:
>>>> On 05/30/2015 11:59 PM, Christoffer Dall wrote:
>>>>> Hi Mario,
>>>>>
>>>>> On Fri, May 29, 2015 at 03:34:47PM -0700, Mario Smarduch wrote:
>>>>>> On 05/28/2015 11:49 AM, Christoffer Dall wrote:
>>>>>>> Until now we have been calling kvm_guest_exit after re-enabling
>>>>>>> interrupts when we come back from the guest, but this has the
>>>>>>> unfortunate effect that CPU time accounting done in the context of timer
>>>>>>> interrupts occurring while the guest is running doesn't properly notice
>>>>>>> that the time since the last tick was spent in the guest.
>>>>>>>
>>>>>>> Inspired by the comment in the x86 code, move the kvm_guest_exit() call
>>>>>>> below the local_irq_enable() call and change __kvm_guest_exit() to
>>>>>>> kvm_guest_exit(), because we are now calling this function with
>>>>>>> interrupts enabled.  We have to now explicitly disable preemption and
>>>>>>> not enable preemption before we've called kvm_guest_exit(), since
>>>>>>> otherwise we could be preempted and everything happening before we
>>>>>>> eventually get scheduled again would be accounted for as guest time.
>>>>>>>
>>>>>>> At the same time, move the trace_kvm_exit() call outside of the atomic
>>>>>>> section, since there is no reason for us to do that with interrupts
>>>>>>> disabled.
>>>>>>>
>>>>>>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
>>>>>>> ---
[ ... ]
>>>
>>> preempt_enable() will call __preempt_schedule() and cause preemption
>>> there, so you're talking about adding these lines of latency:t
>>>
>>> 	kvm_guest_exit();
>>> 	trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
>>
>> On return from IRQ this should execute - and el1_preempt won't
>> get called.
>>
>> #ifdef CONFIG_PREEMPT
>>         get_thread_info tsk
>>         ldr     w24, [tsk, #TI_PREEMPT]         // get preempt count
>>         cbnz    w24, 1f                         // preempt count != 0
>>         ldr     x0, [tsk, #TI_FLAGS]            // get flags
>>         tbz     x0, #TIF_NEED_RESCHED, 1f       // needs rescheduling?
>>         bl      el1_preempt
>> 1:
>> #endif
>>
> 
> I understand that, but then you call preempt_enable right after which
> calls __preempt_schedule() which has the same affect as that asm snippet
> you pasted here.
> 
>>
>>>
>>> And these were called with interrupts disabled before, so I don't see
>>> the issue??
>>>
>>> However, your question is making me think whether we have a race in the
>>> current code on fully preemptible kernels, if we get preempted before
>>> calling kvm_timer_sync_hwstate() and kvm_vgic_sync_hwstate(), then we
>>> could potentially schedule another vcpu on this core and loose/corrupt
>>> state, can we not?  We probably need to check for this in
>>> kvm_vcpu_load/kvm_vcpu_put.  I need to think more about if this is a
>>> real issue or if I'm seeing ghosts.
>>
>> Yes appears like it could be an issue in PREEMPT mode.
> 
> see separate mail, I don't believe this to be an issue anymore.
> 
>>>
>>>>>
[ ... ]
>>>>
>>> Would you run with NO_HZ_FULL in this case?  Because then we should just
>>> enable HAVE_VIRT_CPU_ACCOUNTING_GEN, and I think that would be a good
>>> start.
>> It may have a use case to run an isolated vCPU, but in general any mode
>> may be used (,NO_HZ, even low PERIODIC).
>>
> ok, but I still think it would be more correct to have this patch than
> not to.
No daubt, it exposes an important missing feature and fixes
'Guest time' which mostly should be accurate or close to it.

But there may be room for some future work in this
area (like NATIVE accounting with guest time support),

- Mario

> 
> -Christoffer
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time
@ 2015-06-09 23:04               ` Mario Smarduch
  0 siblings, 0 replies; 42+ messages in thread
From: Mario Smarduch @ 2015-06-09 23:04 UTC (permalink / raw)
  To: linux-arm-kernel

On 06/08/2015 04:35 AM, Christoffer Dall wrote:
> On Fri, Jun 05, 2015 at 05:24:07AM -0700, Mario Smarduch wrote:
>> On 06/02/2015 02:27 AM, Christoffer Dall wrote:
>>> On Mon, Jun 01, 2015 at 08:48:22AM -0700, Mario Smarduch wrote:
>>>> On 05/30/2015 11:59 PM, Christoffer Dall wrote:
>>>>> Hi Mario,
>>>>>
>>>>> On Fri, May 29, 2015 at 03:34:47PM -0700, Mario Smarduch wrote:
>>>>>> On 05/28/2015 11:49 AM, Christoffer Dall wrote:
>>>>>>> Until now we have been calling kvm_guest_exit after re-enabling
>>>>>>> interrupts when we come back from the guest, but this has the
>>>>>>> unfortunate effect that CPU time accounting done in the context of timer
>>>>>>> interrupts occurring while the guest is running doesn't properly notice
>>>>>>> that the time since the last tick was spent in the guest.
>>>>>>>
>>>>>>> Inspired by the comment in the x86 code, move the kvm_guest_exit() call
>>>>>>> below the local_irq_enable() call and change __kvm_guest_exit() to
>>>>>>> kvm_guest_exit(), because we are now calling this function with
>>>>>>> interrupts enabled.  We have to now explicitly disable preemption and
>>>>>>> not enable preemption before we've called kvm_guest_exit(), since
>>>>>>> otherwise we could be preempted and everything happening before we
>>>>>>> eventually get scheduled again would be accounted for as guest time.
>>>>>>>
>>>>>>> At the same time, move the trace_kvm_exit() call outside of the atomic
>>>>>>> section, since there is no reason for us to do that with interrupts
>>>>>>> disabled.
>>>>>>>
>>>>>>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
>>>>>>> ---
[ ... ]
>>>
>>> preempt_enable() will call __preempt_schedule() and cause preemption
>>> there, so you're talking about adding these lines of latency:t
>>>
>>> 	kvm_guest_exit();
>>> 	trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
>>
>> On return from IRQ this should execute - and el1_preempt won't
>> get called.
>>
>> #ifdef CONFIG_PREEMPT
>>         get_thread_info tsk
>>         ldr     w24, [tsk, #TI_PREEMPT]         // get preempt count
>>         cbnz    w24, 1f                         // preempt count != 0
>>         ldr     x0, [tsk, #TI_FLAGS]            // get flags
>>         tbz     x0, #TIF_NEED_RESCHED, 1f       // needs rescheduling?
>>         bl      el1_preempt
>> 1:
>> #endif
>>
> 
> I understand that, but then you call preempt_enable right after which
> calls __preempt_schedule() which has the same affect as that asm snippet
> you pasted here.
> 
>>
>>>
>>> And these were called with interrupts disabled before, so I don't see
>>> the issue??
>>>
>>> However, your question is making me think whether we have a race in the
>>> current code on fully preemptible kernels, if we get preempted before
>>> calling kvm_timer_sync_hwstate() and kvm_vgic_sync_hwstate(), then we
>>> could potentially schedule another vcpu on this core and loose/corrupt
>>> state, can we not?  We probably need to check for this in
>>> kvm_vcpu_load/kvm_vcpu_put.  I need to think more about if this is a
>>> real issue or if I'm seeing ghosts.
>>
>> Yes appears like it could be an issue in PREEMPT mode.
> 
> see separate mail, I don't believe this to be an issue anymore.
> 
>>>
>>>>>
[ ... ]
>>>>
>>> Would you run with NO_HZ_FULL in this case?  Because then we should just
>>> enable HAVE_VIRT_CPU_ACCOUNTING_GEN, and I think that would be a good
>>> start.
>> It may have a use case to run an isolated vCPU, but in general any mode
>> may be used (,NO_HZ, even low PERIODIC).
>>
> ok, but I still think it would be more correct to have this patch than
> not to.
No daubt, it exposes an important missing feature and fixes
'Guest time' which mostly should be accurate or close to it.

But there may be room for some future work in this
area (like NATIVE accounting with guest time support),

- Mario

> 
> -Christoffer
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2015-06-09 23:04 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-28 18:49 [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time Christoffer Dall
2015-05-28 18:49 ` Christoffer Dall
2015-05-29 22:34 ` Mario Smarduch
2015-05-29 22:34   ` Mario Smarduch
2015-05-31  6:59   ` Christoffer Dall
2015-05-31  6:59     ` Christoffer Dall
2015-06-01 15:48     ` Mario Smarduch
2015-06-01 15:48       ` Mario Smarduch
2015-06-02  9:27       ` Christoffer Dall
2015-06-02  9:27         ` Christoffer Dall
2015-06-02 11:55         ` Christoffer Dall
2015-06-02 11:55           ` Christoffer Dall
2015-06-05 12:24         ` Mario Smarduch
2015-06-05 12:24           ` Mario Smarduch
2015-06-08 11:35           ` Christoffer Dall
2015-06-08 11:35             ` Christoffer Dall
2015-06-09 23:04             ` Mario Smarduch
2015-06-09 23:04               ` Mario Smarduch
2015-06-01  7:47 ` Christian Borntraeger
2015-06-01  7:47   ` Christian Borntraeger
2015-06-01  9:08   ` Christoffer Dall
2015-06-01  9:08     ` Christoffer Dall
2015-06-01  9:21     ` Christian Borntraeger
2015-06-01  9:21       ` Christian Borntraeger
2015-06-01 13:35       ` Christoffer Dall
2015-06-01 13:35         ` Christoffer Dall
2015-06-01 13:37         ` Christian Borntraeger
2015-06-01 13:37           ` Christian Borntraeger
2015-06-02  9:28           ` Christoffer Dall
2015-06-02  9:28             ` Christoffer Dall
2015-06-01 11:34   ` Paolo Bonzini
2015-06-01 11:34     ` Paolo Bonzini
2015-06-01 11:42     ` Christian Borntraeger
2015-06-01 11:42       ` Christian Borntraeger
2015-06-01 11:52       ` Paolo Bonzini
2015-06-01 11:52         ` Paolo Bonzini
2015-06-08 17:50 ` Marc Zyngier
2015-06-08 17:50   ` Marc Zyngier
2015-06-09 14:43   ` Christoffer Dall
2015-06-09 14:43     ` Christoffer Dall
2015-06-09 16:39     ` Marc Zyngier
2015-06-09 16:39       ` Marc Zyngier

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.