* [PATCH] kvm: x86: make lapic hrtimer pinned
@ 2016-04-04 20:46 Luiz Capitulino
  2016-04-04 21:00 ` Rik van Riel
  2016-04-05 10:05 ` Paolo Bonzini
  0 siblings, 2 replies; 10+ messages in thread
From: Luiz Capitulino @ 2016-04-04 20:46 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, pbonzini, rkrcmar, mtosatti, riel, bsd

When a vCPU runs on a nohz_full core, the hrtimer used by
the lapic emulation code can be migrated to another core.
When this happens, it's possible to observe millisecond
latency when delivering timer IRQs to KVM guests.

The huge latency is mainly due to the fact that
apic_timer_fn() expects to run during a kvm exit. It
sets KVM_REQ_PENDING_TIMER and lets it be handled on kvm
entry. However, if the timer fires on a different core,
we have to wait until the next kvm exit for the guest
to see KVM_REQ_PENDING_TIMER set.

This problem became visible after commit 9642d18ee. That
commit changed the timer migration code to always attempt
to migrate timers away from nohz_full cores. While it's
debatable whether this is correct/desirable (I don't think
it is), it's clear that the lapic emulation code requires
the hrtimer to fire on the same core where it was
started. This is achieved by making the hrtimer pinned.

Lastly, note that KVM has code to migrate timers when a
vCPU is scheduled to run on a different core. However, this
forced migration may fail. When this happens, we can hit
the same problem. If we want 100% correctness, we'd have
to modify apic_timer_fn() to cause a kvm exit when it runs
on a different core than the vCPU. I'm not sure that's
possible.

Here's a reproducer for the issue being fixed:

 1. Set all cores but core0 to be nohz_full cores
 2. Start a guest with a single vCPU
 3. Trace apic_timer_fn() and kvm_inject_apic_timer_irqs()

You'll see that apic_timer_fn() runs on core0 while
kvm_inject_apic_timer_irqs() runs on a different core. If
both end up on core0, pin a program that takes 100% of a
CPU to core0 to force the vCPU out.
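To illustrate the delivery path being fixed, here is a toy model (plain Python, illustrative only; the helper name and numbers are made up, not KVM code):

```python
# Toy model of the latency described above (illustrative only, not KVM code).
def irq_delivery_delay_ms(timer_cpu, vcpu_cpu, next_natural_exit_ms):
    """Rough delay until the guest observes KVM_REQ_PENDING_TIMER."""
    if timer_cpu == vcpu_cpu:
        # apic_timer_fn() runs in the hrtimer interrupt on the vCPU's own
        # core, which forces an exit; the request is handled on re-entry.
        return 0
    # On a remote core nothing forces the vCPU out, so the pending request
    # sits until the guest happens to exit for some unrelated reason.
    return next_natural_exit_ms

print(irq_delivery_delay_ms(0, 1, 5))  # remote core: waits for the next exit
print(irq_delivery_delay_ms(1, 1, 5))  # same core: handled immediately
```

On a nohz_full core the "next natural exit" can be arbitrarily far away, which is why the observed latency reaches milliseconds.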

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
---
 arch/x86/kvm/lapic.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 443d2a5..1a2da0e 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1369,7 +1369,7 @@ static void start_apic_timer(struct kvm_lapic *apic)
 
 		hrtimer_start(&apic->lapic_timer.timer,
 			      ktime_add_ns(now, apic->lapic_timer.period),
-			      HRTIMER_MODE_ABS);
+			      HRTIMER_MODE_ABS_PINNED);
 
 		apic_debug("%s: bus cycle is %" PRId64 "ns, now 0x%016"
 			   PRIx64 ", "
@@ -1402,7 +1402,7 @@ static void start_apic_timer(struct kvm_lapic *apic)
 			expire = ktime_add_ns(now, ns);
 			expire = ktime_sub_ns(expire, lapic_timer_advance_ns);
 			hrtimer_start(&apic->lapic_timer.timer,
-				      expire, HRTIMER_MODE_ABS);
+				      expire, HRTIMER_MODE_ABS_PINNED);
 		} else
 			apic_timer_expired(apic);
 
@@ -1868,7 +1868,7 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu)
 	apic->vcpu = vcpu;
 
 	hrtimer_init(&apic->lapic_timer.timer, CLOCK_MONOTONIC,
-		     HRTIMER_MODE_ABS);
+		     HRTIMER_MODE_ABS_PINNED);
 	apic->lapic_timer.timer.function = apic_timer_fn;
 
 	/*
@@ -2003,7 +2003,7 @@ void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu)
 
 	timer = &vcpu->arch.apic->lapic_timer.timer;
 	if (hrtimer_cancel(timer))
-		hrtimer_start_expires(timer, HRTIMER_MODE_ABS);
+		hrtimer_start_expires(timer, HRTIMER_MODE_ABS_PINNED);
 }
 
 /*
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] kvm: x86: make lapic hrtimer pinned
  2016-04-04 20:46 [PATCH] kvm: x86: make lapic hrtimer pinned Luiz Capitulino
@ 2016-04-04 21:00 ` Rik van Riel
  2016-04-05  6:18   ` Yang Zhang
  2016-04-05 10:05 ` Paolo Bonzini
  1 sibling, 1 reply; 10+ messages in thread
From: Rik van Riel @ 2016-04-04 21:00 UTC (permalink / raw)
  To: Luiz Capitulino, kvm; +Cc: linux-kernel, pbonzini, rkrcmar, mtosatti, bsd


On Mon, 2016-04-04 at 16:46 -0400, Luiz Capitulino wrote:
> When a vCPU runs on a nohz_full core, the hrtimer used by
> the lapic emulation code can be migrated to another core.
> When this happens, it's possible to observe millisecond
> latency when delivering timer IRQs to KVM guests.
> 
> The huge latency is mainly due to the fact that
> apic_timer_fn() expects to run during a kvm exit. It
> sets KVM_REQ_PENDING_TIMER and lets it be handled on kvm
> entry. However, if the timer fires on a different core,
> we have to wait until the next kvm exit for the guest
> to see KVM_REQ_PENDING_TIMER set.
> 
> This problem became visible after commit 9642d18ee. That
> commit changed the timer migration code to always attempt
> to migrate timers away from nohz_full cores. While it's
> debatable whether this is correct/desirable (I don't think
> it is), it's clear that the lapic emulation code requires
> the hrtimer to fire on the same core where it was
> started. This is achieved by making the hrtimer pinned.

Given that delivering a timer to a guest seems to
involve trapping from the guest to the host, anyway,
I don't see a downside to your patch.

If that is ever changed (eg. allowing delivery of
a timer interrupt to a VCPU without trapping to the
host), we may want to revisit this.

Until then...

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All Rights Reversed.




* Re: [PATCH] kvm: x86: make lapic hrtimer pinned
  2016-04-04 21:00 ` Rik van Riel
@ 2016-04-05  6:18   ` Yang Zhang
  2016-04-05 12:40     ` Luiz Capitulino
  2016-04-05 15:54     ` Radim Krčmář
  0 siblings, 2 replies; 10+ messages in thread
From: Yang Zhang @ 2016-04-05  6:18 UTC (permalink / raw)
  To: Rik van Riel, Luiz Capitulino, kvm
  Cc: linux-kernel, pbonzini, rkrcmar, mtosatti, bsd

On 2016/4/5 5:00, Rik van Riel wrote:
> On Mon, 2016-04-04 at 16:46 -0400, Luiz Capitulino wrote:
>> When a vCPU runs on a nohz_full core, the hrtimer used by
>> the lapic emulation code can be migrated to another core.
>> When this happens, it's possible to observe millisecond
>> latency when delivering timer IRQs to KVM guests.
>>
>> The huge latency is mainly due to the fact that
>> apic_timer_fn() expects to run during a kvm exit. It
>> sets KVM_REQ_PENDING_TIMER and lets it be handled on kvm
>> entry. However, if the timer fires on a different core,
>> we have to wait until the next kvm exit for the guest
>> to see KVM_REQ_PENDING_TIMER set.
>>
>> This problem became visible after commit 9642d18ee. That
>> commit changed the timer migration code to always attempt
>> to migrate timers away from nohz_full cores. While it's
>> debatable whether this is correct/desirable (I don't think
>> it is), it's clear that the lapic emulation code requires
>> the hrtimer to fire on the same core where it was
>> started. This is achieved by making the hrtimer pinned.
>
> Given that delivering a timer to a guest seems to
> involve trapping from the guest to the host, anyway,
> I don't see a downside to your patch.
>
> If that is ever changed (eg. allowing delivery of
> a timer interrupt to a VCPU without trapping to the
> host), we may want to revisit this.


Posted interrupts help in this case. Currently, KVM doesn't use PI for
the lapic timer because the lapic timer and the vCPU share the same
affinity. Now we could change to using PI for the lapic timer. The only
concern is the frequency of timer migration in upstream Linux: if it's
frequent, will it bring additional cost?

BTW, in what cases can the migration of timers during vCPU scheduling fail?

-- 
best regards
yang


* Re: [PATCH] kvm: x86: make lapic hrtimer pinned
  2016-04-04 20:46 [PATCH] kvm: x86: make lapic hrtimer pinned Luiz Capitulino
  2016-04-04 21:00 ` Rik van Riel
@ 2016-04-05 10:05 ` Paolo Bonzini
  1 sibling, 0 replies; 10+ messages in thread
From: Paolo Bonzini @ 2016-04-05 10:05 UTC (permalink / raw)
  To: Luiz Capitulino, kvm; +Cc: linux-kernel, rkrcmar, mtosatti, riel, bsd

On 04/04/2016 22:46, Luiz Capitulino wrote:
> When a vCPU runs on a nohz_full core, the hrtimer used by
> the lapic emulation code can be migrated to another core.
> When this happens, it's possible to observe millisecond
> latency when delivering timer IRQs to KVM guests.
> 
> The huge latency is mainly due to the fact that
> apic_timer_fn() expects to run during a kvm exit. It
> sets KVM_REQ_PENDING_TIMER and lets it be handled on kvm
> entry. However, if the timer fires on a different core,
> we have to wait until the next kvm exit for the guest
> to see KVM_REQ_PENDING_TIMER set.
> 
> This problem became visible after commit 9642d18ee. That
> commit changed the timer migration code to always attempt
> to migrate timers away from nohz_full cores. While it's
> debatable whether this is correct/desirable (I don't think
> it is), it's clear that the lapic emulation code requires
> the hrtimer to fire on the same core where it was
> started. This is achieved by making the hrtimer pinned.
> 
> Lastly, note that KVM has code to migrate timers when a
> vCPU is scheduled to run on a different core. However, this
> forced migration may fail. When this happens, we can hit
> the same problem. If we want 100% correctness, we'd have
> to modify apic_timer_fn() to cause a kvm exit when it runs
> on a different core than the vCPU. I'm not sure that's
> possible.
> 
> Here's a reproducer for the issue being fixed:
> 
>  1. Set all cores but core0 to be nohz_full cores
>  2. Start a guest with a single vCPU
>  3. Trace apic_timer_fn() and kvm_inject_apic_timer_irqs()
> 
> You'll see that apic_timer_fn() runs on core0 while
> kvm_inject_apic_timer_irqs() runs on a different core. If
> both end up on core0, pin a program that takes 100% of a
> CPU to core0 to force the vCPU out.
> 
> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
> ---
>  arch/x86/kvm/lapic.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 443d2a5..1a2da0e 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1369,7 +1369,7 @@ static void start_apic_timer(struct kvm_lapic *apic)
>  
>  		hrtimer_start(&apic->lapic_timer.timer,
>  			      ktime_add_ns(now, apic->lapic_timer.period),
> -			      HRTIMER_MODE_ABS);
> +			      HRTIMER_MODE_ABS_PINNED);
>  
>  		apic_debug("%s: bus cycle is %" PRId64 "ns, now 0x%016"
>  			   PRIx64 ", "
> @@ -1402,7 +1402,7 @@ static void start_apic_timer(struct kvm_lapic *apic)
>  			expire = ktime_add_ns(now, ns);
>  			expire = ktime_sub_ns(expire, lapic_timer_advance_ns);
>  			hrtimer_start(&apic->lapic_timer.timer,
> -				      expire, HRTIMER_MODE_ABS);
> +				      expire, HRTIMER_MODE_ABS_PINNED);
>  		} else
>  			apic_timer_expired(apic);
>  
> @@ -1868,7 +1868,7 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu)
>  	apic->vcpu = vcpu;
>  
>  	hrtimer_init(&apic->lapic_timer.timer, CLOCK_MONOTONIC,
> -		     HRTIMER_MODE_ABS);
> +		     HRTIMER_MODE_ABS_PINNED);
>  	apic->lapic_timer.timer.function = apic_timer_fn;
>  
>  	/*
> @@ -2003,7 +2003,7 @@ void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu)
>  
>  	timer = &vcpu->arch.apic->lapic_timer.timer;
>  	if (hrtimer_cancel(timer))
> -		hrtimer_start_expires(timer, HRTIMER_MODE_ABS);
> +		hrtimer_start_expires(timer, HRTIMER_MODE_ABS_PINNED);
>  }
>  
>  /*
> 

Queued for 4.6.0-rc3, thanks.

Paolo


* Re: [PATCH] kvm: x86: make lapic hrtimer pinned
  2016-04-05  6:18   ` Yang Zhang
@ 2016-04-05 12:40     ` Luiz Capitulino
  2016-04-21 23:12       ` Wanpeng Li
  2016-04-05 15:54     ` Radim Krčmář
  1 sibling, 1 reply; 10+ messages in thread
From: Luiz Capitulino @ 2016-04-05 12:40 UTC (permalink / raw)
  To: Yang Zhang
  Cc: Rik van Riel, kvm, linux-kernel, pbonzini, rkrcmar, mtosatti, bsd

On Tue, 5 Apr 2016 14:18:01 +0800
Yang Zhang <yang.zhang.wz@gmail.com> wrote:

> On 2016/4/5 5:00, Rik van Riel wrote:
> > On Mon, 2016-04-04 at 16:46 -0400, Luiz Capitulino wrote:
> >> When a vCPU runs on a nohz_full core, the hrtimer used by
> >> the lapic emulation code can be migrated to another core.
> >> When this happens, it's possible to observe millisecond
> >> latency when delivering timer IRQs to KVM guests.
> >>
> >> The huge latency is mainly due to the fact that
> >> apic_timer_fn() expects to run during a kvm exit. It
> >> sets KVM_REQ_PENDING_TIMER and lets it be handled on kvm
> >> entry. However, if the timer fires on a different core,
> >> we have to wait until the next kvm exit for the guest
> >> to see KVM_REQ_PENDING_TIMER set.
> >>
> >> This problem became visible after commit 9642d18ee. That
> >> commit changed the timer migration code to always attempt
> >> to migrate timers away from nohz_full cores. While it's
> >> debatable whether this is correct/desirable (I don't think
> >> it is), it's clear that the lapic emulation code requires
> >> the hrtimer to fire on the same core where it was
> >> started. This is achieved by making the hrtimer pinned.
> >
> > Given that delivering a timer to a guest seems to
> > involve trapping from the guest to the host, anyway,
> > I don't see a downside to your patch.
> >
> > If that is ever changed (eg. allowing delivery of
> > a timer interrupt to a VCPU without trapping to the
> > host), we may want to revisit this.
> 
> 
> Posted interrupts help in this case. Currently, KVM doesn't use PI for
> the lapic timer because the lapic timer and the vCPU share the same
> affinity. Now we could change to using PI for the lapic timer. The only
> concern is the frequency of timer migration in upstream Linux: if it's
> frequent, will it bring additional cost?

I can't answer this question.

> BTW, in what cases can the migration of timers during vCPU scheduling fail?

For hrtimers (which is the lapic emulation case), it only succeeds if
the destination core has a hrtimer expiring before the hrtimer being
migrated.

Also, if the hrtimer callback function is already running (that is,
the timer has already fired), it's not migrated either. But I _guess_ this
case doesn't affect KVM (and there's not much to do about it anyway).
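The two conditions above could be modeled roughly like this (a hypothetical helper for illustration, not the actual kernel code):

```python
# Simplified model of the hrtimer migration conditions described above
# (names are made up for illustration; this is not the kernel code).
def can_migrate_hrtimer(expiry_ns, callback_running, dest_earliest_expiry_ns):
    """Return True if migrating the hrtimer to the destination core succeeds."""
    if callback_running:
        # The callback has already fired; the timer is left where it is.
        return False
    if dest_earliest_expiry_ns is None:
        # Destination core has no timer queued that would fire first.
        return False
    # Migration only succeeds if the destination core already has a
    # hrtimer expiring before the one being migrated.
    return dest_earliest_expiry_ns < expiry_ns
```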


* Re: [PATCH] kvm: x86: make lapic hrtimer pinned
  2016-04-05  6:18   ` Yang Zhang
  2016-04-05 12:40     ` Luiz Capitulino
@ 2016-04-05 15:54     ` Radim Krčmář
  2016-04-07  2:08       ` Yang Zhang
  1 sibling, 1 reply; 10+ messages in thread
From: Radim Krčmář @ 2016-04-05 15:54 UTC (permalink / raw)
  To: Yang Zhang
  Cc: Rik van Riel, Luiz Capitulino, kvm, linux-kernel, pbonzini,
	mtosatti, bsd

2016-04-05 14:18+0800, Yang Zhang:
> On 2016/4/5 5:00, Rik van Riel wrote:
>>Given that delivering a timer to a guest seems to
>>involve trapping from the guest to the host, anyway,
>>I don't see a downside to your patch.
>>
>>If that is ever changed (eg. allowing delivery of
>>a timer interrupt to a VCPU without trapping to the
>>host), we may want to revisit this.
> 
> Posted interrupts help in this case. Currently, KVM doesn't use PI for
> the lapic timer because the lapic timer and the vCPU share the same
> affinity. Now we could change to using PI for the lapic timer. The only
> concern is the frequency of timer migration in upstream Linux: if it's
> frequent, will it bring additional cost?

It's a scheduler bug if the timer migration frequency would matter. :)
Additional costs arise when the timer and VCPU are on two different
CPUs.  (e.g. if both CPUs are in deep C-state, we wasted one wakeup;
the timer would sometimes need to send an interrupt.)

Fine-tuned KVM could benefit from having the lapic timer backend on a
different physical core, but the general case would need some
experimentation to decide.

I think that we'd still want to have timer interrupts on the same
physical core if the host didn't have PI, and the fraction of timers
that can be injected without a guest entry is important to decide
whether PI can make the effort worthwhile.

The biggest benefit might come from handling multiple lapic timers in
one host interrupt.


* Re: [PATCH] kvm: x86: make lapic hrtimer pinned
  2016-04-05 15:54     ` Radim Krčmář
@ 2016-04-07  2:08       ` Yang Zhang
  0 siblings, 0 replies; 10+ messages in thread
From: Yang Zhang @ 2016-04-07  2:08 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: Rik van Riel, Luiz Capitulino, kvm, linux-kernel, pbonzini,
	mtosatti, bsd

On 2016/4/5 23:54, Radim Krčmář wrote:
> 2016-04-05 14:18+0800, Yang Zhang:
>> On 2016/4/5 5:00, Rik van Riel wrote:
>>> Given that delivering a timer to a guest seems to
>>> involve trapping from the guest to the host, anyway,
>>> I don't see a downside to your patch.
>>>
>>> If that is ever changed (eg. allowing delivery of
>>> a timer interrupt to a VCPU without trapping to the
>>> host), we may want to revisit this.
>>
>> Posted interrupts help in this case. Currently, KVM doesn't use PI for
>> the lapic timer because the lapic timer and the vCPU share the same
>> affinity. Now we could change to using PI for the lapic timer. The only
>> concern is the frequency of timer migration in upstream Linux: if it's
>> frequent, will it bring additional cost?
>
> It's a scheduler bug if the timer migration frequency would matter. :)
> Additional costs arise when the timer and VCPU are on two different
> CPUs.  (e.g. if both CPUs are in deep C-state, we wasted one wakeup;
> the timer would sometimes need to send an interrupt.)

Yes, it's possible. But the premise is that the VCPU is pinned to another
CPU. Normally, the VCPU will wake up on the same CPU where the timer
interrupt is delivered, if that CPU is idle.

>
> Fine tuned KVM could benefit from having the lapic timer backend on a
> different physical core, but the general case would need some experience
> to decide.
>
> I think that we'd still want to have timer interrupts on the same
> physical core if the host didn't have PI, and the fraction of timers
> that can be injected without a guest entry is important to decide
> whether PI can make the effort worthwhile.

Agree. I can do some experiments to see how much improvement we can get.

>
> The biggest benefit might come from handling multiple lapic timers in
> one host interrupt.

That should be another story. We'd need to align multiple lapic timers
into one timer first. :)

-- 
best regards
yang


* Re: [PATCH] kvm: x86: make lapic hrtimer pinned
  2016-04-05 12:40     ` Luiz Capitulino
@ 2016-04-21 23:12       ` Wanpeng Li
  2016-04-22 13:12         ` Luiz Capitulino
  0 siblings, 1 reply; 10+ messages in thread
From: Wanpeng Li @ 2016-04-21 23:12 UTC (permalink / raw)
  To: Luiz Capitulino
  Cc: Yang Zhang, Rik van Riel, kvm, linux-kernel, Paolo Bonzini,
	Radim Krcmar, Marcelo Tosatti, Bandan Das

2016-04-05 20:40 GMT+08:00 Luiz Capitulino <lcapitulino@redhat.com>:
> On Tue, 5 Apr 2016 14:18:01 +0800
> Yang Zhang <yang.zhang.wz@gmail.com> wrote:
>
>> On 2016/4/5 5:00, Rik van Riel wrote:
>> > On Mon, 2016-04-04 at 16:46 -0400, Luiz Capitulino wrote:
>> >> When a vCPU runs on a nohz_full core, the hrtimer used by
>> >> the lapic emulation code can be migrated to another core.
>> >> When this happens, it's possible to observe millisecond
>> >> latency when delivering timer IRQs to KVM guests.
>> >>
>> >> The huge latency is mainly due to the fact that
>> >> apic_timer_fn() expects to run during a kvm exit. It
>> >> sets KVM_REQ_PENDING_TIMER and lets it be handled on kvm
>> >> entry. However, if the timer fires on a different core,
>> >> we have to wait until the next kvm exit for the guest
>> >> to see KVM_REQ_PENDING_TIMER set.
>> >>
>> >> This problem became visible after commit 9642d18ee. That
>> >> commit changed the timer migration code to always attempt
>> >> to migrate timers away from nohz_full cores. While it's
>> >> debatable whether this is correct/desirable (I don't think
>> >> it is), it's clear that the lapic emulation code requires
>> >> the hrtimer to fire on the same core where it was
>> >> started. This is achieved by making the hrtimer pinned.
>> >
>> > Given that delivering a timer to a guest seems to
>> > involve trapping from the guest to the host, anyway,
>> > I don't see a downside to your patch.
>> >
>> > If that is ever changed (eg. allowing delivery of
>> > a timer interrupt to a VCPU without trapping to the
>> > host), we may want to revisit this.
>>
>>
>> Posted interrupts help in this case. Currently, KVM doesn't use PI for
>> the lapic timer because the lapic timer and the vCPU share the same
>> affinity. Now we could change to using PI for the lapic timer. The only
>> concern is the frequency of timer migration in upstream Linux: if it's
>> frequent, will it bring additional cost?
>
> I can't answer this question.
>
>> BTW, in what cases can the migration of timers during vCPU scheduling fail?
>
> For hrtimers (which is the lapic emulation case), it only succeeds if
> the destination core has a hrtimer expiring before the hrtimer being
> migrated.

Interesting, did you figure out why this happens? Actually, the clock
event device will be reprogrammed if the expiry time of the newly
enqueued hrtimer is earlier than the leftmost (earliest expiry)
hrtimer in the hrtimer rb-tree.
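The reprogramming rule just described can be sketched as follows (a simplified model with made-up names, not the actual kernel code):

```python
import heapq

# Simplified model of the rule described above: the per-CPU clock event
# device is reprogrammed only when a newly enqueued hrtimer expires
# earlier than the current leftmost (earliest-expiring) timer.
class ToyHrtimerQueue:
    def __init__(self):
        self._heap = []          # stands in for the hrtimer rb-tree
        self.programmed = None   # expiry the clock event device is armed for

    def enqueue(self, expiry_ns):
        """Queue a timer; return True if the hardware was reprogrammed."""
        heapq.heappush(self._heap, expiry_ns)
        earliest = self._heap[0]
        if self.programmed is None or earliest < self.programmed:
            self.programmed = earliest  # reprogram the clock event device
            return True
        return False
```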

Regards,
Wanpeng Li

>
> Also, if the hrtimer callback function is already running (that is,
> the timer has already fired), it's not migrated either. But I _guess_ this
> case doesn't affect KVM (and there's not much to do about it anyway).


* Re: [PATCH] kvm: x86: make lapic hrtimer pinned
  2016-04-21 23:12       ` Wanpeng Li
@ 2016-04-22 13:12         ` Luiz Capitulino
  2016-04-23 23:06           ` Wanpeng Li
  0 siblings, 1 reply; 10+ messages in thread
From: Luiz Capitulino @ 2016-04-22 13:12 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Yang Zhang, Rik van Riel, kvm, linux-kernel, Paolo Bonzini,
	Radim Krcmar, Marcelo Tosatti, Bandan Das

On Fri, 22 Apr 2016 07:12:51 +0800
Wanpeng Li <kernellwp@gmail.com> wrote:

> 2016-04-05 20:40 GMT+08:00 Luiz Capitulino <lcapitulino@redhat.com>:
> > On Tue, 5 Apr 2016 14:18:01 +0800
> > Yang Zhang <yang.zhang.wz@gmail.com> wrote:
> >  
> >> On 2016/4/5 5:00, Rik van Riel wrote:  
> >> > On Mon, 2016-04-04 at 16:46 -0400, Luiz Capitulino wrote:  
> >> >> When a vCPU runs on a nohz_full core, the hrtimer used by
> >> >> the lapic emulation code can be migrated to another core.
> >> >> When this happens, it's possible to observe millisecond
> >> >> latency when delivering timer IRQs to KVM guests.
> >> >>
> >> >> The huge latency is mainly due to the fact that
> >> >> apic_timer_fn() expects to run during a kvm exit. It
> >> >> sets KVM_REQ_PENDING_TIMER and lets it be handled on kvm
> >> >> entry. However, if the timer fires on a different core,
> >> >> we have to wait until the next kvm exit for the guest
> >> >> to see KVM_REQ_PENDING_TIMER set.
> >> >>
> >> >> This problem became visible after commit 9642d18ee. That
> >> >> commit changed the timer migration code to always attempt
> >> >> to migrate timers away from nohz_full cores. While it's
> >> >> debatable whether this is correct/desirable (I don't think
> >> >> it is), it's clear that the lapic emulation code requires
> >> >> the hrtimer to fire on the same core where it was
> >> >> started. This is achieved by making the hrtimer pinned.
> >> >
> >> > Given that delivering a timer to a guest seems to
> >> > involve trapping from the guest to the host, anyway,
> >> > I don't see a downside to your patch.
> >> >
> >> > If that is ever changed (eg. allowing delivery of
> >> > a timer interrupt to a VCPU without trapping to the
> >> > host), we may want to revisit this.  
> >>
> >>
> >> Posted interrupts help in this case. Currently, KVM doesn't use PI for
> >> the lapic timer because the lapic timer and the vCPU share the same
> >> affinity. Now we could change to using PI for the lapic timer. The only
> >> concern is the frequency of timer migration in upstream Linux: if it's
> >> frequent, will it bring additional cost?
> >
> > I can't answer this question.
> >  
> >> BTW, in what cases can the migration of timers during vCPU scheduling fail?
> >
> > For hrtimers (which is the lapic emulation case), it only succeeds if
> > the destination core has a hrtimer expiring before the hrtimer being
> > migrated.  
> 
> Interesting, did you figure out why this happens? Actually, the clock
> event device will be reprogrammed if the expiry time of the newly
> enqueued hrtimer is earlier than the leftmost (earliest expiry)
> hrtimer in the hrtimer rb-tree.

Unless the code has changed very recently, what you describe is
what happens when queueing a hrtimer on the same core. Migrating a
hrtimer to a different core is a different case.

> 
> Regards,
> Wanpeng Li
> 
> >
> > Also, if the hrtimer callback function is already running (that is,
> > the timer has already fired), it's not migrated either. But I _guess_ this
> > case doesn't affect KVM (and there's not much to do about it anyway).
> 


* Re: [PATCH] kvm: x86: make lapic hrtimer pinned
  2016-04-22 13:12         ` Luiz Capitulino
@ 2016-04-23 23:06           ` Wanpeng Li
  0 siblings, 0 replies; 10+ messages in thread
From: Wanpeng Li @ 2016-04-23 23:06 UTC (permalink / raw)
  To: Luiz Capitulino
  Cc: Yang Zhang, Rik van Riel, kvm, linux-kernel, Paolo Bonzini,
	Radim Krcmar, Marcelo Tosatti, Bandan Das

2016-04-22 21:12 GMT+08:00 Luiz Capitulino <lcapitulino@redhat.com>:
> On Fri, 22 Apr 2016 07:12:51 +0800
> Wanpeng Li <kernellwp@gmail.com> wrote:
>
>> 2016-04-05 20:40 GMT+08:00 Luiz Capitulino <lcapitulino@redhat.com>:
>> > On Tue, 5 Apr 2016 14:18:01 +0800
>> > Yang Zhang <yang.zhang.wz@gmail.com> wrote:
>> >
>> >> On 2016/4/5 5:00, Rik van Riel wrote:
>> >> > On Mon, 2016-04-04 at 16:46 -0400, Luiz Capitulino wrote:
>> >> >> When a vCPU runs on a nohz_full core, the hrtimer used by
>> >> >> the lapic emulation code can be migrated to another core.
>> >> >> When this happens, it's possible to observe millisecond
>> >> >> latency when delivering timer IRQs to KVM guests.
>> >> >>
>> >> >> The huge latency is mainly due to the fact that
>> >> >> apic_timer_fn() expects to run during a kvm exit. It
>> >> >> sets KVM_REQ_PENDING_TIMER and lets it be handled on kvm
>> >> >> entry. However, if the timer fires on a different core,
>> >> >> we have to wait until the next kvm exit for the guest
>> >> >> to see KVM_REQ_PENDING_TIMER set.
>> >> >>
>> >> >> This problem became visible after commit 9642d18ee. That
>> >> >> commit changed the timer migration code to always attempt
>> >> >> to migrate timers away from nohz_full cores. While it's
>> >> >> debatable whether this is correct/desirable (I don't think
>> >> >> it is), it's clear that the lapic emulation code requires
>> >> >> the hrtimer to fire on the same core where it was
>> >> >> started. This is achieved by making the hrtimer pinned.
>> >> >
>> >> > Given that delivering a timer to a guest seems to
>> >> > involve trapping from the guest to the host, anyway,
>> >> > I don't see a downside to your patch.
>> >> >
>> >> > If that is ever changed (eg. allowing delivery of
>> >> > a timer interrupt to a VCPU without trapping to the
>> >> > host), we may want to revisit this.
>> >>
>> >>
>> >> Posted interrupts help in this case. Currently, KVM doesn't use PI for
>> >> the lapic timer because the lapic timer and the vCPU share the same
>> >> affinity. Now we could change to using PI for the lapic timer. The only
>> >> concern is the frequency of timer migration in upstream Linux: if it's
>> >> frequent, will it bring additional cost?
>> >
>> > I can't answer this question.
>> >
>> >> BTW, in what cases can the migration of timers during vCPU scheduling fail?
>> >
>> > For hrtimers (which is the lapic emulation case), it only succeeds if
>> > the destination core has a hrtimer expiring before the hrtimer being
>> > migrated.
>>
>> Interesting, did you figure out why this happens? Actually, the clock
>> event device will be reprogrammed if the expiry time of the newly
>> enqueued hrtimer is earlier than the leftmost (earliest expiry)
>> hrtimer in the hrtimer rb-tree.
>
> Unless the code has changed very recently, what you describe is
> what happens when queueing a hrtimer on the same core. Migrating a
> hrtimer to a different core is a different case.

You are right!

Regards,
Wanpeng Li

