All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Fix almost infinite loop in APIC
@ 2009-01-08 16:36 Alexander Graf
  2009-01-09  6:34 ` Sheng Yang
  0 siblings, 1 reply; 30+ messages in thread
From: Alexander Graf @ 2009-01-08 16:36 UTC (permalink / raw)
  To: kvm; +Cc: avi, Kevin Wolf

While booting Linux in VMware ESX, I encountered a strange effect
in the in-kernel lapic implementation: time went backwards!

While this should never occur, because of that the while loop that
is done after the invalid calculations caused my host system to hang.

In order to make debugging easier, let's replace this as suggested
with a modulo function and not run into the danger of looping forever.

To replace the nice hint this bug gave me that the values are broken,
I added a printk message so people encountering this can at least
see that something is fishy.

Of course, the real issue needs to be fixed as well! I'm open to ideas
why now < last_update!

(Thanks to Kevin for his help in debugging this)

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Kevin Wolf <kwolf@suse.de>
---
 arch/x86/kvm/lapic.c |   17 +++++++++--------
 1 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index afac68c..737217b 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -539,19 +539,20 @@ static u32 apic_get_tmcct(struct kvm_lapic *apic)
 	counter_passed = div64_u64(ktime_to_ns(passed),
 				   (APIC_BUS_CYCLE_NS * apic->timer.divide_count));
 
+	if (counter_passed > 0x7f00000000000000) {
+		/* If we're in here we probably encountered a bug! */
+		printk(KERN_INFO "LAPIC: Too high counter_passed value: "
+				 "0x%lx | 0x%lx (0x%lx) | 0x%lx (0x%lx)\n",
+				 counter_passed, ktime_to_ns(passed), passed,
+				 now.tv64, apic->timer.last_update.tv64);
+	}
+
 	if (counter_passed > tmcct) {
 		if (unlikely(!apic_lvtt_period(apic))) {
 			/* one-shot timers stick at 0 until reset */
 			tmcct = 0;
 		} else {
-			/*
-			 * periodic timers reset to APIC_TMICT when they
-			 * hit 0. The while loop simulates this happening N
-			 * times. (counter_passed %= tmcct) would also work,
-			 * but might be slower or not work on 32-bit??
-			 */
-			while (counter_passed > tmcct)
-				counter_passed -= tmcct;
+			counter_passed %= (u64)tmcct;
 			tmcct -= counter_passed;
 		}
 	} else {
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH] Fix almost infinite loop in APIC
  2009-01-08 16:36 [PATCH] Fix almost infinite loop in APIC Alexander Graf
@ 2009-01-09  6:34 ` Sheng Yang
  2009-01-09 10:49   ` Alexander Graf
  2009-01-09 12:57   ` Alexander Graf
  0 siblings, 2 replies; 30+ messages in thread
From: Sheng Yang @ 2009-01-09  6:34 UTC (permalink / raw)
  To: kvm; +Cc: Alexander Graf, avi, Kevin Wolf, Marcelo Tosatti

On Friday 09 January 2009 00:36:06 Alexander Graf wrote:
> While booting Linux in VMware ESX, I encountered a strange effect
> in the in-kernel lapic implementation: time went backwards!
>
> While this should never occur, because of that the while loop that
> is done after the invalid calculations caused my host system to hang.
>
> In order to make debugging easier, let's replace this as suggested
> with a modulo function and not run into the danger of looping forever.
>
> To replace the nice hint this bug gave me that the values are broken,
> I added a printk message so people encountering this can at least
> see that something is fishy.
>
> Of course, the real issue needs to be fixed as well! I'm open to ideas
> why now < last_update!
>
> (Thanks to Kevin for his help in debugging this)
>
> Signed-off-by: Alexander Graf <agraf@suse.de>
> Signed-off-by: Kevin Wolf <kwolf@suse.de>
> ---

Hi Alexander

I'm a little suspect here:

> 	if (unlikely(ktime_to_ns(now) <=
> 		ktime_to_ns(apic->timer.last_update))) {
> 		/* Wrap around */
> 		passed = ktime_add(( {
> 				    (ktime_t) {
> 				    .tv64 = KTIME_MAX -
> 				    (apic->timer.last_update).tv64}; }
> 				   ), now);
> 		apic_debug("time elapsed\n");
> 	} else
> 		passed = ktime_sub(now, apic->timer.last_update);

And now apic timer base is hr_timer with CLOCK_MONOTONIC, and get_time() is 
really ktime_get() which is almost impossible to wrap around. If it's 
overflow, at least we need a warning. I think this piece of code due to clock 
source change.

So I doubt: due to some reason, now <= apic->timer.last_update, which cause a 
big wrap around operation.

And the most suspect:

> void kvm_apic_timer_intr_post(struct kvm_vcpu *vcpu, int vec)
> {
> 	struct kvm_lapic *apic = vcpu->arch.apic;
>
> 	if (apic && apic_lvt_vector(apic, APIC_LVTT) == vec)
> 		apic->timer.last_update = ktime_add_ns(
> 				apic->timer.last_update,
> 				apic->timer.period);
> }

Not sure what's happening, have you tried this? (In fact, I am little willing 
to replace all apic->timer.dev.base->get_time() with more explicit ktime_get() 
as in pit.)

--
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index afac68c..414e7e0 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1115,9 +1115,7 @@ void kvm_apic_timer_intr_post(struct kvm_vcpu *vcpu, int 
vec)
        struct kvm_lapic *apic = vcpu->arch.apic;

        if (apic && apic_lvt_vector(apic, APIC_LVTT) == vec)
-               apic->timer.last_update = ktime_add_ns(
-                               apic->timer.last_update,
-                               apic->timer.period);
+               apic->timer.last_update = apic->timer.dev.base->get_time();
 }

 int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)


-- 
regards
Yang, Sheng



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH] Fix almost infinite loop in APIC
  2009-01-09  6:34 ` Sheng Yang
@ 2009-01-09 10:49   ` Alexander Graf
  2009-01-09 12:57   ` Alexander Graf
  1 sibling, 0 replies; 30+ messages in thread
From: Alexander Graf @ 2009-01-09 10:49 UTC (permalink / raw)
  To: Sheng Yang; +Cc: kvm, avi, Kevin Wolf, Marcelo Tosatti





On 09.01.2009, at 07:34, Sheng Yang <sheng@linux.intel.com> wrote:

> On Friday 09 January 2009 00:36:06 Alexander Graf wrote:
>> While booting Linux in VMware ESX, I encountered a strange effect
>> in the in-kernel lapic implementation: time went backwards!
>>
>> While this should never occur, because of that the while loop that
>> is done after the invalid calculations caused my host system to hang.
>>
>> In order to make debugging easier, let's replace this as suggested
>> with a modulo function and not run into the danger of looping  
>> forever.
>>
>> To replace the nice hint this bug gave me that the values are broken,
>> I added a printk message so people encountering this can at least
>> see that something is fishy.
>>
>> Of course, the real issue needs to be fixed as well! I'm open to  
>> ideas
>> why now < last_update!
>>
>> (Thanks to Kevin for his help in debugging this)
>>
>> Signed-off-by: Alexander Graf <agraf@suse.de>
>> Signed-off-by: Kevin Wolf <kwolf@suse.de>
>> ---
>
> Hi Alexander
>
> I'm a little suspect here:
>
>>    if (unlikely(ktime_to_ns(now) <=
>>        ktime_to_ns(apic->timer.last_update))) {
>>        /* Wrap around */
>>        passed = ktime_add(( {
>>                    (ktime_t) {
>>                    .tv64 = KTIME_MAX -
>>                    (apic->timer.last_update).tv64}; }
>>                   ), now);
>>        apic_debug("time elapsed\n");
>>    } else
>>        passed = ktime_sub(now, apic->timer.last_update);
>
> And now apic timer base is hr_timer with CLOCK_MONOTONIC, and  
> get_time() is
> really ktime_get() which is almost impossible to wrap around. If it's
> overflow, at least we need a warning. I think this piece of code due  
> to clock
> source change.
>
> So I doubt: due to some reason, now <= apic->timer.last_update,  
> which cause a
> big wrap around operation.

Well, I put a printk in that 'time elapsed' place and it happened at  
the same time my other printk fired with now < last_update. I can't  
explain why this is happening, but I do see it.

>
>
> And the most suspect:
>
>> void kvm_apic_timer_intr_post(struct kvm_vcpu *vcpu, int vec)
>> {
>>    struct kvm_lapic *apic = vcpu->arch.apic;
>>
>>    if (apic && apic_lvt_vector(apic, APIC_LVTT) == vec)
>>        apic->timer.last_update = ktime_add_ns(
>>                apic->timer.last_update,
>>                apic->timer.period);
>> }
>
> Not sure what's happening, have you tried this? (In fact, I am  
> little willing
> to replace all apic->timer.dev.base->get_time() with more explicit  
> ktime_get()
> as in pit.)


I can give that patch a try :)

Alex

>
>
> --
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index afac68c..414e7e0 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1115,9 +1115,7 @@ void kvm_apic_timer_intr_post(struct kvm_vcpu  
> *vcpu, int
> vec)
>        struct kvm_lapic *apic = vcpu->arch.apic;
>
>        if (apic && apic_lvt_vector(apic, APIC_LVTT) == vec)
> -               apic->timer.last_update = ktime_add_ns(
> -                               apic->timer.last_update,
> -                               apic->timer.period);
> +               apic->timer.last_update = apic->timer.dev.base- 
> >get_time();
> }
>
> int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
>
>
> -- 
> regards
> Yang, Sheng
>
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] Fix almost infinite loop in APIC
  2009-01-09  6:34 ` Sheng Yang
  2009-01-09 10:49   ` Alexander Graf
@ 2009-01-09 12:57   ` Alexander Graf
  2009-01-10 11:21     ` Sheng Yang
                       ` (2 more replies)
  1 sibling, 3 replies; 30+ messages in thread
From: Alexander Graf @ 2009-01-09 12:57 UTC (permalink / raw)
  To: Sheng Yang; +Cc: kvm, avi, Kevin Wolf, Marcelo Tosatti

Sheng Yang wrote:
> On Friday 09 January 2009 00:36:06 Alexander Graf wrote:
>   
>> While booting Linux in VMware ESX, I encountered a strange effect
>> in the in-kernel lapic implementation: time went backwards!
>>
>> While this should never occur, because of that the while loop that
>> is done after the invalid calculations caused my host system to hang.
>>
>> In order to make debugging easier, let's replace this as suggested
>> with a modulo function and not run into the danger of looping forever.
>>
>> To replace the nice hint this bug gave me that the values are broken,
>> I added a printk message so people encountering this can at least
>> see that something is fishy.
>>
>> Of course, the real issue needs to be fixed as well! I'm open to ideas
>> why now < last_update!
>>
>> (Thanks to Kevin for his help in debugging this)
>>
>> Signed-off-by: Alexander Graf <agraf@suse.de>
>> Signed-off-by: Kevin Wolf <kwolf@suse.de>
>> ---
>>     
>
> Hi Alexander
>
> I'm a little suspect here:
>
>   
>> 	if (unlikely(ktime_to_ns(now) <=
>> 		ktime_to_ns(apic->timer.last_update))) {
>> 		/* Wrap around */
>> 		passed = ktime_add(( {
>> 				    (ktime_t) {
>> 				    .tv64 = KTIME_MAX -
>> 				    (apic->timer.last_update).tv64}; }
>> 				   ), now);
>> 		apic_debug("time elapsed\n");
>> 	} else
>> 		passed = ktime_sub(now, apic->timer.last_update);
>>     
>
> And now apic timer base is hr_timer with CLOCK_MONOTONIC, and get_time() is 
> really ktime_get() which is almost impossible to wrap around. If it's 
> overflow, at least we need a warning. I think this piece of code due to clock 
> source change.
>
> So I doubt: due to some reason, now <= apic->timer.last_update, which cause a 
> big wrap around operation.
>
> And the most suspect:
>
>   
>> void kvm_apic_timer_intr_post(struct kvm_vcpu *vcpu, int vec)
>> {
>> 	struct kvm_lapic *apic = vcpu->arch.apic;
>>
>> 	if (apic && apic_lvt_vector(apic, APIC_LVTT) == vec)
>> 		apic->timer.last_update = ktime_add_ns(
>> 				apic->timer.last_update,
>> 				apic->timer.period);
>> }
>>     
>
> Not sure what's happening, have you tried this? (In fact, I am little willing 
> to replace all apic->timer.dev.base->get_time() with more explicit ktime_get() 
> as in pit.)
>   

Yes, this code is the culprit. Using that patch of yours now is always >
last_update. I'd still like to see that while loop go away ;-).

Alex

> --
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index afac68c..414e7e0 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1115,9 +1115,7 @@ void kvm_apic_timer_intr_post(struct kvm_vcpu *vcpu, int 
> vec)
>         struct kvm_lapic *apic = vcpu->arch.apic;
>
>         if (apic && apic_lvt_vector(apic, APIC_LVTT) == vec)
> -               apic->timer.last_update = ktime_add_ns(
> -                               apic->timer.last_update,
> -                               apic->timer.period);
> +               apic->timer.last_update = apic->timer.dev.base->get_time();
>  }
>
>  int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
>
>
>   


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] Fix almost infinite loop in APIC
  2009-01-09 12:57   ` Alexander Graf
@ 2009-01-10 11:21     ` Sheng Yang
  2009-01-11  4:55       ` Marcelo Tosatti
  2009-01-10 11:25     ` Sheng Yang
  2009-01-10 11:28     ` Sheng Yang
  2 siblings, 1 reply; 30+ messages in thread
From: Sheng Yang @ 2009-01-10 11:21 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Sheng Yang, kvm, avi, Kevin Wolf, Marcelo Tosatti

On Fri, Jan 09, 2009 at 01:57:30PM +0100, Alexander Graf wrote:
> Sheng Yang wrote:
> > On Friday 09 January 2009 00:36:06 Alexander Graf wrote:
> >   
> >> While booting Linux in VMware ESX, I encountered a strange effect
> >> in the in-kernel lapic implementation: time went backwards!
> >>
> >> While this should never occur, because of that the while loop that
> >> is done after the invalid calculations caused my host system to hang.
> >>
> >> In order to make debugging easier, let's replace this as suggested
> >> with a modulo function and not run into the danger of looping forever.
> >>
> >> To replace the nice hint this bug gave me that the values are broken,
> >> I added a printk message so people encountering this can at least
> >> see that something is fishy.
> >>
> >> Of course, the real issue needs to be fixed as well! I'm open to ideas
> >> why now < last_update!
> >>
> >> (Thanks to Kevin for his help in debugging this)
> >>
> >> Signed-off-by: Alexander Graf <agraf@suse.de>
> >> Signed-off-by: Kevin Wolf <kwolf@suse.de>
> >> ---
> >>     
> >
> > Hi Alexander
> >
> > I'm a little suspect here:
> >
> >   
> >> 	if (unlikely(ktime_to_ns(now) <=
> >> 		ktime_to_ns(apic->timer.last_update))) {
> >> 		/* Wrap around */
> >> 		passed = ktime_add(( {
> >> 				    (ktime_t) {
> >> 				    .tv64 = KTIME_MAX -
> >> 				    (apic->timer.last_update).tv64}; }
> >> 				   ), now);
> >> 		apic_debug("time elapsed\n");
> >> 	} else
> >> 		passed = ktime_sub(now, apic->timer.last_update);
> >>     
> >
> > And now apic timer base is hr_timer with CLOCK_MONOTONIC, and get_time() is 
> > really ktime_get() which is almost impossible to wrap around. If it's 
> > overflow, at least we need a warning. I think this piece of code due to clock 
> > source change.
> >
> > So I doubt: due to some reason, now <= apic->timer.last_update, which cause a 
> > big wrap around operation.
> >
> > And the most suspect:
> >
> >   
> >> void kvm_apic_timer_intr_post(struct kvm_vcpu *vcpu, int vec)
> >> {
> >> 	struct kvm_lapic *apic = vcpu->arch.apic;
> >>
> >> 	if (apic && apic_lvt_vector(apic, APIC_LVTT) == vec)
> >> 		apic->timer.last_update = ktime_add_ns(
> >> 				apic->timer.last_update,
> >> 				apic->timer.period);
> >> }
> >>     
> >
> > Not sure what's happening, have you tried this? (In fact, I am little willing 
> > to replace all apic->timer.dev.base->get_time() with more explicit ktime_get() 
> > as in pit.)
> >   
> 
> Yes, this code is the culprit. Using that patch of yours now is always >
> last_update. I'd still like to see that while loop go away ;-).

Glad it works. :)

I think it's a little similar to the PIT problem that Marcelo fixed, but I
am not sure about the reason. Anyone got idea?

Anyway, I would like to post a patch later.

-- 
regards
Yang, Sheng	|Intel Opensource Technology Center

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] Fix almost infinite loop in APIC
  2009-01-09 12:57   ` Alexander Graf
  2009-01-10 11:21     ` Sheng Yang
@ 2009-01-10 11:25     ` Sheng Yang
  2009-01-10 11:28     ` Sheng Yang
  2 siblings, 0 replies; 30+ messages in thread
From: Sheng Yang @ 2009-01-10 11:25 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Sheng Yang, kvm, avi, Kevin Wolf, Marcelo Tosatti

On Fri, Jan 09, 2009 at 01:57:30PM +0100, Alexander Graf wrote:
> Sheng Yang wrote:
> > On Friday 09 January 2009 00:36:06 Alexander Graf wrote:
> >   
> >> While booting Linux in VMware ESX, I encountered a strange effect
> >> in the in-kernel lapic implementation: time went backwards!
> >>
> >> While this should never occur, because of that the while loop that
> >> is done after the invalid calculations caused my host system to hang.
> >>
> >> In order to make debugging easier, let's replace this as suggested
> >> with a modulo function and not run into the danger of looping forever.
> >>
> >> To replace the nice hint this bug gave me that the values are broken,
> >> I added a printk message so people encountering this can at least
> >> see that something is fishy.
> >>
> >> Of course, the real issue needs to be fixed as well! I'm open to ideas
> >> why now < last_update!
> >>
> >> (Thanks to Kevin for his help in debugging this)
> >>
> >> Signed-off-by: Alexander Graf <agraf@suse.de>
> >> Signed-off-by: Kevin Wolf <kwolf@suse.de>
> >> ---
> >>     
> >
> > Hi Alexander
> >
> > I'm a little suspect here:
> >
> >   
> >> 	if (unlikely(ktime_to_ns(now) <=
> >> 		ktime_to_ns(apic->timer.last_update))) {
> >> 		/* Wrap around */
> >> 		passed = ktime_add(( {
> >> 				    (ktime_t) {
> >> 				    .tv64 = KTIME_MAX -
> >> 				    (apic->timer.last_update).tv64}; }
> >> 				   ), now);
> >> 		apic_debug("time elapsed\n");
> >> 	} else
> >> 		passed = ktime_sub(now, apic->timer.last_update);
> >>     
> >
> > And now apic timer base is hr_timer with CLOCK_MONOTONIC, and get_time() is 
> > really ktime_get() which is almost impossible to wrap around. If it's 
> > overflow, at least we need a warning. I think this piece of code due to clock 
> > source change.
> >
> > So I doubt: due to some reason, now <= apic->timer.last_update, which cause a 
> > big wrap around operation.
> >
> > And the most suspect:
> >
> >   
> >> void kvm_apic_timer_intr_post(struct kvm_vcpu *vcpu, int vec)
> >> {
> >> 	struct kvm_lapic *apic = vcpu->arch.apic;
> >>
> >> 	if (apic && apic_lvt_vector(apic, APIC_LVTT) == vec)
> >> 		apic->timer.last_update = ktime_add_ns(
> >> 				apic->timer.last_update,
> >> 				apic->timer.period);
> >> }
> >>     
> >
> > Not sure what's happening, have you tried this? (In fact, I am little willing 
> > to replace all apic->timer.dev.base->get_time() with more explicit ktime_get() 
> > as in pit.)
> >   
> 
> Yes, this code is the culprit. Using that patch of yours now is always >
> last_update. I'd still like to see that while loop go away ;-).

Oh, for the loop, let's ask Avi. :)

-- 
regards
Yang, Sheng	|Intel Opensource Technology Center

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] Fix almost infinite loop in APIC
  2009-01-09 12:57   ` Alexander Graf
  2009-01-10 11:21     ` Sheng Yang
  2009-01-10 11:25     ` Sheng Yang
@ 2009-01-10 11:28     ` Sheng Yang
  2 siblings, 0 replies; 30+ messages in thread
From: Sheng Yang @ 2009-01-10 11:28 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Sheng Yang, kvm, avi, Kevin Wolf, Marcelo Tosatti

On Fri, Jan 09, 2009 at 01:57:30PM +0100, Alexander Graf wrote:
> Sheng Yang wrote:
> > On Friday 09 January 2009 00:36:06 Alexander Graf wrote:
> >   
> >> While booting Linux in VMware ESX, I encountered a strange effect
> >> in the in-kernel lapic implementation: time went backwards!
> >>
> >> While this should never occur, because of that the while loop that
> >> is done after the invalid calculations caused my host system to hang.
> >>
> >> In order to make debugging easier, let's replace this as suggested
> >> with a modulo function and not run into the danger of looping forever.
> >>
> >> To replace the nice hint this bug gave me that the values are broken,
> >> I added a printk message so people encountering this can at least
> >> see that something is fishy.
> >>
> >> Of course, the real issue needs to be fixed as well! I'm open to ideas
> >> why now < last_update!
> >>
> >> (Thanks to Kevin for his help in debugging this)
> >>
> >> Signed-off-by: Alexander Graf <agraf@suse.de>
> >> Signed-off-by: Kevin Wolf <kwolf@suse.de>
> >> ---
> >>     
> >
> > Hi Alexander
> >
> > I'm a little suspect here:
> >
> >   
> >> 	if (unlikely(ktime_to_ns(now) <=
> >> 		ktime_to_ns(apic->timer.last_update))) {
> >> 		/* Wrap around */
> >> 		passed = ktime_add(( {
> >> 				    (ktime_t) {
> >> 				    .tv64 = KTIME_MAX -
> >> 				    (apic->timer.last_update).tv64}; }
> >> 				   ), now);
> >> 		apic_debug("time elapsed\n");
> >> 	} else
> >> 		passed = ktime_sub(now, apic->timer.last_update);
> >>     
> >
> > And now apic timer base is hr_timer with CLOCK_MONOTONIC, and get_time() is 
> > really ktime_get() which is almost impossible to wrap around. If it's 
> > overflow, at least we need a warning. I think this piece of code due to clock 
> > source change.
> >
> > So I doubt: due to some reason, now <= apic->timer.last_update, which cause a 
> > big wrap around operation.
> >
> > And the most suspect:
> >
> >   
> >> void kvm_apic_timer_intr_post(struct kvm_vcpu *vcpu, int vec)
> >> {
> >> 	struct kvm_lapic *apic = vcpu->arch.apic;
> >>
> >> 	if (apic && apic_lvt_vector(apic, APIC_LVTT) == vec)
> >> 		apic->timer.last_update = ktime_add_ns(
> >> 				apic->timer.last_update,
> >> 				apic->timer.period);
> >> }
> >>     
> >
> > Not sure what's happening, have you tried this? (In fact, I am little willing 
> > to replace all apic->timer.dev.base->get_time() with more explicit ktime_get() 
> > as in pit.)
> >   
> 
> Yes, this code is the culprit. Using that patch of yours now is always >
> last_update. I'd still like to see that while loop go away ;-).

(I would say all in one mail, but it always comes a little later... :) )

I think at least part of comment for the while loop should be kept...

-- 
regards
Yang, Sheng	|Intel Opensource Technology Center

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] Fix almost infinite loop in APIC
  2009-01-10 11:21     ` Sheng Yang
@ 2009-01-11  4:55       ` Marcelo Tosatti
  2009-01-13  7:47         ` Sheng Yang
  0 siblings, 1 reply; 30+ messages in thread
From: Marcelo Tosatti @ 2009-01-11  4:55 UTC (permalink / raw)
  To: Alexander Graf, Sheng Yang, kvm, avi, Kevin Wolf

On Sat, Jan 10, 2009 at 07:21:13PM +0800, Sheng Yang wrote:
> On Fri, Jan 09, 2009 at 01:57:30PM +0100, Alexander Graf wrote:
> > Sheng Yang wrote:
> > > On Friday 09 January 2009 00:36:06 Alexander Graf wrote:
> > >   
> > >> While booting Linux in VMware ESX, I encountered a strange effect
> > >> in the in-kernel lapic implementation: time went backwards!
> > >>
> > >> While this should never occur, because of that the while loop that
> > >> is done after the invalid calculations caused my host system to hang.
> > >>
> > >> In order to make debugging easier, let's replace this as suggested
> > >> with a modulo function and not run into the danger of looping forever.
> > >>
> > >> To replace the nice hint this bug gave me that the values are broken,
> > >> I added a printk message so people encountering this can at least
> > >> see that something is fishy.
> > >>
> > >> Of course, the real issue needs to be fixed as well! I'm open to ideas
> > >> why now < last_update!
> > >>
> > >> (Thanks to Kevin for his help in debugging this)
> > >>
> > >> Signed-off-by: Alexander Graf <agraf@suse.de>
> > >> Signed-off-by: Kevin Wolf <kwolf@suse.de>
> > >> ---
> > >>     
> > >
> > > Hi Alexander
> > >
> > > I'm a little suspect here:
> > >
> > >   
> > >> 	if (unlikely(ktime_to_ns(now) <=
> > >> 		ktime_to_ns(apic->timer.last_update))) {
> > >> 		/* Wrap around */
> > >> 		passed = ktime_add(( {
> > >> 				    (ktime_t) {
> > >> 				    .tv64 = KTIME_MAX -
> > >> 				    (apic->timer.last_update).tv64}; }
> > >> 				   ), now);
> > >> 		apic_debug("time elapsed\n");
> > >> 	} else
> > >> 		passed = ktime_sub(now, apic->timer.last_update);
> > >>     
> > >
> > > And now apic timer base is hr_timer with CLOCK_MONOTONIC, and get_time() is 
> > > really ktime_get() which is almost impossible to wrap around. If it's 
> > > overflow, at least we need a warning. I think this piece of code due to clock 
> > > source change.
> > >
> > > So I doubt: due to some reason, now <= apic->timer.last_update, which cause a 
> > > big wrap around operation.
> > >
> > > And the most suspect:
> > >
> > >   
> > >> void kvm_apic_timer_intr_post(struct kvm_vcpu *vcpu, int vec)
> > >> {
> > >> 	struct kvm_lapic *apic = vcpu->arch.apic;
> > >>
> > >> 	if (apic && apic_lvt_vector(apic, APIC_LVTT) == vec)
> > >> 		apic->timer.last_update = ktime_add_ns(
> > >> 				apic->timer.last_update,
> > >> 				apic->timer.period);
> > >> }
> > >>     
> > >
> > > Not sure what's happening, have you tried this? (In fact, I am little willing 
> > > to replace all apic->timer.dev.base->get_time() with more explicit ktime_get() 
> > > as in pit.)
> > >   
> > 
> > Yes, this code is the culprit. Using that patch of yours now is always >
> > last_update. I'd still like to see that while loop go away ;-).

Sheng,

Will separate the discussions on items:


1) On the current code, last_update is assigned by kvm_apic_timer_intr_post
as the expiration time of the next interrupt, while apic_get_tmcct
interprets it as "countdown start":

    passed = ktime_sub(now, apic->timer.last_update);

This explains why your first patch against kvm_apic_timer_intr_post
fixed the bug Alexander was experiencing.

BTW, yes, the "overflow" handling is not necessary as you noted.

2) The current count calculation is not right (apart from any eventual
bugs), or I don't understand it. It uses the interrupt injection
time as "countdown start" (or rearm in Linux terminology), in
kvm_apic_timer_intr_post.

But the host hrtimer starts counting once rearmed (assuming periodic
configuration). So the inaccuracy of APIC_TMCCT depends on this delay
between hrtimer rearm and interrupt injection.

What is this scheme supposed to achieve? It seems much simpler, and
more accurate, to return scaled hrtimer_get_remaining on APIC_TMCCT
emulation.

3) And then there's interrupt reinjection. Once interrupts accumulate,
and we reinject, current count reads become completly bogus. This is
the reason for time drift on RHEL4-style kernels with "clock=tsc". The
algorithm calculates the interrupt handling delay by using the PIT
count (equivalent to APIC_TMCCT). But PIT count emulation uses the
hrtimer expiration time, which has nothing to do with the accumulated
interrupts.

So what I propose is to first switch lapic current count emulation to a
straightforward scaled hrtimer_get_remaining equivalent.

For the reinjection case, maintain an average of the delay between host
interrupt and interrupt injection (this can be generic, so both PIT
and LAPIC timer can use it). Return that scaled average on APIC_TMCCT
emulation whenever pending > 2.

What you think?

And as Avi noted, there should be a distinction between accumulated
interrupts, we only want to reinject the ones caused by host load (and
there are some interesting optimizations that can be done later, such
as stopping hrtimer from ticking until guest acks one interrupt, and
calculating the missed ones). But better start with the basic.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] Fix almost infinite loop in APIC
  2009-01-11  4:55       ` Marcelo Tosatti
@ 2009-01-13  7:47         ` Sheng Yang
  2009-01-13 22:01           ` Marcelo Tosatti
  0 siblings, 1 reply; 30+ messages in thread
From: Sheng Yang @ 2009-01-13  7:47 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Alexander Graf, kvm, avi, Kevin Wolf

On Sunday 11 January 2009 12:55:58 Marcelo Tosatti wrote:
> On Sat, Jan 10, 2009 at 07:21:13PM +0800, Sheng Yang wrote:
> > On Fri, Jan 09, 2009 at 01:57:30PM +0100, Alexander Graf wrote:
> > > Sheng Yang wrote:
> > > > On Friday 09 January 2009 00:36:06 Alexander Graf wrote:
> > > >> While booting Linux in VMware ESX, I encountered a strange effect
> > > >> in the in-kernel lapic implementation: time went backwards!
> > > >>
> > > >> ---
> > > >
> > > > Hi Alexander
> > > >
> > > > I'm a little suspect here:
> > > >> 	if (unlikely(ktime_to_ns(now) <=
> > > >> 		ktime_to_ns(apic->timer.last_update))) {
> > > >> 		/* Wrap around */
> > > >> 		passed = ktime_add(( {
> > > >> 				    (ktime_t) {
> > > >> 				    .tv64 = KTIME_MAX -
> > > >> 				    (apic->timer.last_update).tv64}; }
> > > >> 				   ), now);
> > > >> 		apic_debug("time elapsed\n");
> > > >> 	} else
> > > >> 		passed = ktime_sub(now, apic->timer.last_update);
> > > >
> > > > And now apic timer base is hr_timer with CLOCK_MONOTONIC, and
> > > > get_time() is really ktime_get() which is almost impossible to wrap
> > > > around. If it's overflow, at least we need a warning. I think this
> > > > piece of code due to clock source change.
> > > >
> > > > So I doubt: due to some reason, now <= apic->timer.last_update, which
> > > > cause a big wrap around operation.
> > > >
> > > > And the most suspect:
> > > >> void kvm_apic_timer_intr_post(struct kvm_vcpu *vcpu, int vec)
> > > >> {
> > > >> 	struct kvm_lapic *apic = vcpu->arch.apic;
> > > >>
> > > >> 	if (apic && apic_lvt_vector(apic, APIC_LVTT) == vec)
> > > >> 		apic->timer.last_update = ktime_add_ns(
> > > >> 				apic->timer.last_update,
> > > >> 				apic->timer.period);
> > > >> }
> > > >
> > > > Not sure what's happening, have you tried this? (In fact, I am little
> > > > willing to replace all apic->timer.dev.base->get_time() with more
> > > > explicit ktime_get() as in pit.)
> > >
> > > Yes, this code is the culprit. Using that patch of yours now is always
> > > > last_update. I'd still like to see that while loop go away ;-).
>
> Sheng,
>
> Will separate the discussions on items:
>
>
> 1) On the current code, last_update is assigned by kvm_apic_timer_intr_post
> as the expiration time of the next interrupt, while apic_get_tmcct
> interprets it as "countdown start":
>
>     passed = ktime_sub(now, apic->timer.last_update);
>
> This explains why your first patch against kvm_apic_timer_intr_post
> fixed the bug Alexander was experiencing.
>
> BTW, yes, the "overflow" handling is not necessary as you noted.
>
> 2) The current count calculation is not right (apart from any eventual
> bugs), or I don't understand it. It uses the interrupt injection
> time as "countdown start" (or rearm in Linux terminology), in
> kvm_apic_timer_intr_post.
>
> But the host hrtimer starts counting once rearmed (assuming periodic
> configuration). So the inaccuracy of APIC_TMCCT depends on this delay
> between hrtimer rearm and interrupt injection.
>
> What is this scheme supposed to achieve? It seems much simpler, and
> more accurate, to return scaled hrtimer_get_remaining on APIC_TMCCT
> emulation.

Marcelo,

Sorry to late reply...

After rechecked the code, got some thoughts. last_update can indicated rearm 
time as following:

The original apic_timer_intr_post() got:

>         if (apic && apic_lvt_vector(apic, APIC_LVTT) == vec)
>                 apic->timer.last_update = ktime_add_ns(
>                                 apic->timer.last_update,
>                                 apic->timer.period);

I think the purpose is let guest see a reasonable TMCCT. Which means:

1. Timer start to fire at start_apic_timer(). last_update=now(). 

2. Every one LAPIC timer interrupt was injected into the guest, so the time at 
least elapsed timer.period(the first alarm set at "period" later), then 
last_update is increased by period. Notice last_update's base is set before 
timer fire, so it's not indicated the next one, but *the time this interrupt 
should be injected*(time can be delayed)... So last_update = n*period + base.

3. If there is pending interrupt, last_update won't be updated, so we have 
this:

>                         while (counter_passed > tmcct)
>                                 counter_passed -= tmcct;
>                         tmcct -= counter_passed;
(notice tmcct is TMICT here.)

And the returned tmcct value indicated (counter_passed % tmict), a offset 
regardless of pending interrupt numbers.

I think now the overflow seems OK, but I am not sure why last_update can be 
bigger than ktime_get()? Maybe due to vcpu migration? Suspect some racy or 
boundary condition existed...

And base on this, I don't think my quick fix is correct...

>
> 3) And then there's interrupt reinjection. Once interrupts accumulate,
> and we reinject, current count reads become completly bogus. This is
> the reason for time drift on RHEL4-style kernels with "clock=tsc". The
> algorithm calculates the interrupt handling delay by using the PIT
> count (equivalent to APIC_TMCCT). But PIT count emulation uses the
> hrtimer expiration time, which has nothing to do with the accumulated
> interrupts.
>
> So what I propose is to first switch lapic current count emulation to a
> straightforward scaled hrtimer_get_remaining equivalent.
>
> For the reinjection case, maintain an average of the delay between host
> interrupt and interrupt injection (this can be generic, so both PIT
> and LAPIC timer can use it). Return that scaled average on APIC_TMCCT
> emulation whenever pending > 2.
>
> What you think?

1. If we simply use ktime_get() to update last_update, the interval can't be 
same between different last_update. I think we may got some problem here, but 
not for sure. For example, three delayed interrupt was injected one after one, 
last_update would show very little interval, and APIC_TMCCT may got trouble.  
Maintain an average of the delay seems a little tricky here, and I am not sure 
if it would help the problem...  Average is average at most, not the real 
affection at the time...

2. For the current mechanism, the interval is the same, so last_update always 
equal to n*period + base. If there are more than one pending interrupts, TMCCT 
should also can return the relative correct value.

So I am leaning toward to fix current problem, though I haven't find the root 
cause yet...

> And as Avi noted, there should be a distinction between accumulated
> interrupts, we only want to reinject the ones caused by host load (and
> there are some interesting optimizations that can be done later, such
> as stopping hrtimer from ticking until guest acks one interrupt, and
> calculating the missed ones). But better start with the basic.

Agree...

-- 
regards
Yang, Sheng


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] Fix almost infinite loop in APIC
  2009-01-13  7:47         ` Sheng Yang
@ 2009-01-13 22:01           ` Marcelo Tosatti
  2009-01-14  9:17             ` Sheng Yang
  0 siblings, 1 reply; 30+ messages in thread
From: Marcelo Tosatti @ 2009-01-13 22:01 UTC (permalink / raw)
  To: Sheng Yang; +Cc: Alexander Graf, kvm, avi, Kevin Wolf

On Tue, Jan 13, 2009 at 03:47:49PM +0800, Sheng Yang wrote:
> On Sunday 11 January 2009 12:55:58 Marcelo Tosatti wrote:
> The original apic_timer_intr_post() got:
> 
> >         if (apic && apic_lvt_vector(apic, APIC_LVTT) == vec)
> >                 apic->timer.last_update = ktime_add_ns(
> >                                 apic->timer.last_update,
> >                                 apic->timer.period);
> 
> I think the purpose is let guest see a reasonable TMCCT. Which means:
> 
> 1. Timer start to fire at start_apic_timer(). last_update=now(). 
> 
> 2. Every one LAPIC timer interrupt was injected into the guest, so the time at 
> least elapsed timer.period(the first alarm set at "period" later), then 
> last_update is increased by period. Notice last_update's base is set before 
> timer fire, so it's not indicated the next one, but *the time this interrupt 
> should be injected*(time can be delayed)... So last_update = n*period + base.

Right.

> 3. If there is pending interrupt, last_update won't be updated, so we have 
> this:
> 
> >                         while (counter_passed > tmcct)
> >                                 counter_passed -= tmcct;
> >                         tmcct -= counter_passed;
> (notice tmcct is TMICT here.)
> 
> And the returned tmcct value indicated (counter_passed % tmict), a offset 
> regardless of pending interrupt numbers.

Right. The problem is that for accumulated interrupts, the guest will
receive the interrupt as fast as the qemu process can be scheduled (as
long as its not masked, of course). There could be higher priority
vectors in the mix, but thats not the common case.

So calculating the offset using last interrupt injection is not very 
reasonable in this case.

> I think now the overflow seems OK, but I am not sure why last_update can be 
> bigger than ktime_get()? Maybe due to vcpu migration? Suspect some racy or 
> boundary condition existed...

Yep, it seems to be some small inaccuracy in the accounting that
causes it. 

> And base on this, I don't think my quick fix is correct...
> 
> >
> > 3) And then there's interrupt reinjection. Once interrupts accumulate,
> > and we reinject, current count reads become completly bogus. This is
> > the reason for time drift on RHEL4-style kernels with "clock=tsc". The
> > algorithm calculates the interrupt handling delay by using the PIT
> > count (equivalent to APIC_TMCCT). But PIT count emulation uses the
> > hrtimer expiration time, which has nothing to do with the accumulated
> > interrupts.
> >
> > So what I propose is to first switch lapic current count emulation to a
> > straightforward scaled hrtimer_get_remaining equivalent.
> >
> > For the reinjection case, maintain an average of the delay between host
> > interrupt and interrupt injection (this can be generic, so both PIT
> > and LAPIC timer can use it). Return that scaled average on APIC_TMCCT
> > emulation whenever pending > 2.
> >
> > What you think?
> 
> 1. If we simply use ktime_get() to update last_update, the interval can't be 
> same between different last_update. I think we may got some problem here, but 
> not for sure. For example, three delayed interrupt was injected one after one, 
> last_update would show very little interval, and APIC_TMCCT may got trouble.  

I mean get rid of last_update and use hrtimer_get_remaining on
apic_get_tmcct for no pending interrupts. And estimated delay between
timer fire and injection if there are pending interrupts.

> Maintain an average of the delay seems a little tricky here, and I am not sure 
> if it would help the problem...  Average is average at most, not the real 
> affection at the time...

Yes. Probably using the last delay between timer fire and injection is
more accurate than an average.

> 
> 2. For the current mechanism, the interval is the same, so last_update always 
> equal to n*period + base. If there are more than one pending interrupts, TMCCT 
> should also can return the relative correct value.
> 
> So I am leaning toward to fix current problem, though I haven't find the root 
> cause yet...

I think it can be simpler, without the need to deal with overflow at
all. 

Understanding the present root cause is important though.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] Fix almost infinite loop in APIC
  2009-01-13 22:01           ` Marcelo Tosatti
@ 2009-01-14  9:17             ` Sheng Yang
  2009-01-14 17:03               ` Marcelo Tosatti
  0 siblings, 1 reply; 30+ messages in thread
From: Sheng Yang @ 2009-01-14  9:17 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Alexander Graf, kvm, avi, Kevin Wolf

On Wednesday 14 January 2009 06:01:17 Marcelo Tosatti wrote:
> On Tue, Jan 13, 2009 at 03:47:49PM +0800, Sheng Yang wrote:
> > On Sunday 11 January 2009 12:55:58 Marcelo Tosatti wrote:
> >
> > The original apic_timer_intr_post() got:
> > >         if (apic && apic_lvt_vector(apic, APIC_LVTT) == vec)
> > >                 apic->timer.last_update = ktime_add_ns(
> > >                                 apic->timer.last_update,
> > >                                 apic->timer.period);
> >
> > I think the purpose is let guest see a reasonable TMCCT. Which means:
> >
> > 1. Timer start to fire at start_apic_timer(). last_update=now().
> >
> > 2. Every one LAPIC timer interrupt was injected into the guest, so the
> > time at least elapsed timer.period(the first alarm set at "period"
> > later), then last_update is increased by period. Notice last_update's
> > base is set before timer fire, so it's not indicated the next one, but
> > *the time this interrupt should be injected*(time can be delayed)... So
> > last_update = n*period + base.
>
> Right.
>
> > 3. If there is pending interrupt, last_update won't be updated, so we
> > have
> >
> > this:
> > >                         while (counter_passed > tmcct)
> > >                                 counter_passed -= tmcct;
> > >                         tmcct -= counter_passed;
> >
> > (notice tmcct is TMICT here.)
> >
> > And the returned tmcct value indicated (counter_passed % tmict), a offset
> > regardless of pending interrupt numbers.
>
> Right. The problem is that for accumulated interrupts, the guest will
> receive the interrupt as fast as the qemu process can be scheduled (as
> long as its not masked, of course). There could be higher priority
> vectors in the mix, but thats not the common case.
>
> So calculating the offset using last interrupt injection is not very
> reasonable in this case.

Um... I think it's not easy to find a complete reasonable result here, I 
haven't look into how OS us TMCCT, maybe it can help us to find a more 
reasonable result here.

> > I think now the overflow seems OK, but I am not sure why last_update can
> > be bigger than ktime_get()? Maybe due to vcpu migration? Suspect some
> > racy or boundary condition existed...
>
> Yep, it seems to be some small inaccuracy in the accounting that
> causes it.
>
> > And base on this, I don't think my quick fix is correct...
> >
> > > 3) And then there's interrupt reinjection. Once interrupts accumulate,
> > > and we reinject, current count reads become completly bogus. This is
> > > the reason for time drift on RHEL4-style kernels with "clock=tsc". The
> > > algorithm calculates the interrupt handling delay by using the PIT
> > > count (equivalent to APIC_TMCCT). But PIT count emulation uses the
> > > hrtimer expiration time, which has nothing to do with the accumulated
> > > interrupts.
> > >
> > > So what I propose is to first switch lapic current count emulation to a
> > > straightforward scaled hrtimer_get_remaining equivalent.
> > >
> > > For the reinjection case, maintain an average of the delay between host
> > > interrupt and interrupt injection (this can be generic, so both PIT
> > > and LAPIC timer can use it). Return that scaled average on APIC_TMCCT
> > > emulation whenever pending > 2.
> > >
> > > What you think?
> >
> > 1. If we simply use ktime_get() to update last_update, the interval can't
> > be same between different last_update. I think we may got some problem
> > here, but not for sure. For example, three delayed interrupt was injected
> > one after one, last_update would show very little interval, and
> > APIC_TMCCT may got trouble.
>
> I mean get rid of last_update and use hrtimer_get_remaining on
> apic_get_tmcct for no pending interrupts. And estimated delay between
> timer fire and injection if there are pending interrupts.
>
> > Maintain an average of the delay seems a little tricky here, and I am not
> > sure if it would help the problem...  Average is average at most, not the
> > real affection at the time...
>
> Yes. Probably using the last delay between timer fire and injection is
> more accurate than an average.
>
> > 2. For the current mechanism, the interval is the same, so last_update
> > always equal to n*period + base. If there are more than one pending
> > interrupts, TMCCT should also can return the relative correct value.
> >
> > So I am leaning toward to fix current problem, though I haven't find the
> > root cause yet...
>
> I think it can be simpler, without the need to deal with overflow at
> all.
>
> Understanding the present root cause is important though.

Yes, but still don't have many clues now... I would look through the code and 
purpose some patches for Alex to test...

-- 
regards
Yang, Sheng

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] Fix almost infinite loop in APIC
  2009-01-14  9:17             ` Sheng Yang
@ 2009-01-14 17:03               ` Marcelo Tosatti
  2009-01-15  7:20                 ` Sheng Yang
  0 siblings, 1 reply; 30+ messages in thread
From: Marcelo Tosatti @ 2009-01-14 17:03 UTC (permalink / raw)
  To: Sheng Yang; +Cc: Alexander Graf, kvm, avi, Kevin Wolf

On Wed, Jan 14, 2009 at 05:17:22PM +0800, Sheng Yang wrote:
> > So calculating the offset using last interrupt injection is not very
> > reasonable in this case.
> 
> Um... I think it's not easy to find a complete reasonable result here, I 
> haven't look into how OS us TMCCT, maybe it can help us to find a more 
> reasonable result here.

Linux, FreeBSD use it for calibration only. And Windows too, apparently
(at least a couple of versions tested).

What about this?

Index: kvm/arch/x86/kvm/lapic.c
===================================================================
--- kvm.orig/arch/x86/kvm/lapic.c
+++ kvm/arch/x86/kvm/lapic.c
@@ -511,52 +511,31 @@ static void apic_send_ipi(struct kvm_lap
 
 static u32 apic_get_tmcct(struct kvm_lapic *apic)
 {
-	u64 counter_passed;
-	ktime_t passed, now;
-	u32 tmcct;
+	ktime_t remaining;
+	u32 tmcct = 0;
 
 	ASSERT(apic != NULL);
 
-	now = apic->timer.dev.base->get_time();
-	tmcct = apic_get_reg(apic, APIC_TMICT);
-
 	/* if initial count is 0, current count should also be 0 */
-	if (tmcct == 0)
+	if (apic_get_reg(apic, APIC_TMICT) == 0)
 		return 0;
 
-	if (unlikely(ktime_to_ns(now) <=
-		ktime_to_ns(apic->timer.last_update))) {
-		/* Wrap around */
-		passed = ktime_add(( {
-				    (ktime_t) {
-				    .tv64 = KTIME_MAX -
-				    (apic->timer.last_update).tv64}; }
-				   ), now);
-		apic_debug("time elapsed\n");
-	} else
-		passed = ktime_sub(now, apic->timer.last_update);
-
-	counter_passed = div64_u64(ktime_to_ns(passed),
-				   (APIC_BUS_CYCLE_NS * apic->timer.divide_count));
-
-	if (counter_passed > tmcct) {
-		if (unlikely(!apic_lvtt_period(apic))) {
-			/* one-shot timers stick at 0 until reset */
-			tmcct = 0;
-		} else {
-			/*
-			 * periodic timers reset to APIC_TMICT when they
-			 * hit 0. The while loop simulates this happening N
-			 * times. (counter_passed %= tmcct) would also work,
-			 * but might be slower or not work on 32-bit??
-			 */
-			while (counter_passed > tmcct)
-				counter_passed -= tmcct;
-			tmcct -= counter_passed;
-		}
-	} else {
-		tmcct -= counter_passed;
-	}
+	/*
+	 * Since reinjection is not rate-limited, use the delay
+ 	 * to inject the last interrupt as an estimate.
+ 	 */
+	if (unlikely(atomic_read(&apic->timer.pending) > 0)) {
+		remaining = apic->timer.injection_delay;
+		if (ktime_to_ns(remaining) > apic->timer.period)
+			remaining = ns_to_ktime(apic->timer.period);
+        } else
+		remaining = hrtimer_expires_remaining(&apic->timer.dev);
+
+	if (remaining.tv64 > 0)
+		tmcct = div64_u64(ktime_to_ns(remaining),
+			(APIC_BUS_CYCLE_NS * apic->timer.divide_count));
+
+	WARN_ON(tmcct > apic_get_reg(apic, APIC_TMICT));
 
 	return tmcct;
 }
@@ -653,8 +632,6 @@ static void start_apic_timer(struct kvm_
 {
 	ktime_t now = apic->timer.dev.base->get_time();
 
-	apic->timer.last_update = now;
-
 	apic->timer.period = apic_get_reg(apic, APIC_TMICT) *
 		    APIC_BUS_CYCLE_NS * apic->timer.divide_count;
 	atomic_set(&apic->timer.pending, 0);
@@ -972,6 +949,7 @@ static int __apic_timer_fn(struct kvm_la
 		set_bit(KVM_REQ_PENDING_TIMER, &apic->vcpu->requests);
 	if (waitqueue_active(q))
 		wake_up_interruptible(q);
+	apic->timer.int_trigger = ktime_get();
 
 	if (apic_lvtt_period(apic)) {
 		result = 1;
@@ -1112,12 +1090,14 @@ void kvm_inject_apic_timer_irqs(struct k
 
 void kvm_apic_timer_intr_post(struct kvm_vcpu *vcpu, int vec)
 {
+	ktime_t delta;
 	struct kvm_lapic *apic = vcpu->arch.apic;
 
-	if (apic && apic_lvt_vector(apic, APIC_LVTT) == vec)
-		apic->timer.last_update = ktime_add_ns(
-				apic->timer.last_update,
-				apic->timer.period);
+	if (apic && apic_lvt_vector(apic, APIC_LVTT) == vec &&
+	    atomic_read(&apic->timer.pending) == 0) {
+		delta = ktime_sub(ktime_get(), apic->timer.int_trigger);
+		apic->timer.injection_delay = delta;
+	}
 }
 
 int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
Index: kvm/arch/x86/kvm/lapic.h
===================================================================
--- kvm.orig/arch/x86/kvm/lapic.h
+++ kvm/arch/x86/kvm/lapic.h
@@ -12,7 +12,8 @@ struct kvm_lapic {
 		atomic_t pending;
 		s64 period;	/* unit: ns */
 		u32 divide_count;
-		ktime_t last_update;
+		ktime_t injection_delay;
+		ktime_t int_trigger;
 		struct hrtimer dev;
 	} timer;
 	struct kvm_vcpu *vcpu;

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] Fix almost infinite loop in APIC
  2009-01-14 17:03               ` Marcelo Tosatti
@ 2009-01-15  7:20                 ` Sheng Yang
  2009-01-16  5:01                   ` Marcelo Tosatti
  0 siblings, 1 reply; 30+ messages in thread
From: Sheng Yang @ 2009-01-15  7:20 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Alexander Graf, kvm, avi, Kevin Wolf

On Thursday 15 January 2009 01:03:59 Marcelo Tosatti wrote:
> On Wed, Jan 14, 2009 at 05:17:22PM +0800, Sheng Yang wrote:
> > > So calculating the offset using last interrupt injection is not very
> > > reasonable in this case.
> >
> > Um... I think it's not easy to find a complete reasonable result here, I
> > haven't look into how OS us TMCCT, maybe it can help us to find a more
> > reasonable result here.
>
> Linux, FreeBSD use it for calibration only. And Windows too, apparently
> (at least a couple of versions tested).
>
> What about this?
>
> Index: kvm/arch/x86/kvm/lapic.c
> ===================================================================
> --- kvm.orig/arch/x86/kvm/lapic.c
> +++ kvm/arch/x86/kvm/lapic.c
> @@ -511,52 +511,31 @@ static void apic_send_ipi(struct kvm_lap
>
>  static u32 apic_get_tmcct(struct kvm_lapic *apic)
>  {
> -	u64 counter_passed;
> -	ktime_t passed, now;
> -	u32 tmcct;
> +	ktime_t remaining;
> +	u32 tmcct = 0;
>
>  	ASSERT(apic != NULL);
>
> -	now = apic->timer.dev.base->get_time();
> -	tmcct = apic_get_reg(apic, APIC_TMICT);
> -
>  	/* if initial count is 0, current count should also be 0 */
> -	if (tmcct == 0)
> +	if (apic_get_reg(apic, APIC_TMICT) == 0)
>  		return 0;
>
> -	if (unlikely(ktime_to_ns(now) <=
> -		ktime_to_ns(apic->timer.last_update))) {
> -		/* Wrap around */
> -		passed = ktime_add(( {
> -				    (ktime_t) {
> -				    .tv64 = KTIME_MAX -
> -				    (apic->timer.last_update).tv64}; }
> -				   ), now);
> -		apic_debug("time elapsed\n");
> -	} else
> -		passed = ktime_sub(now, apic->timer.last_update);
> -
> -	counter_passed = div64_u64(ktime_to_ns(passed),
> -				   (APIC_BUS_CYCLE_NS * apic->timer.divide_count));
> -
> -	if (counter_passed > tmcct) {
> -		if (unlikely(!apic_lvtt_period(apic))) {
> -			/* one-shot timers stick at 0 until reset */
> -			tmcct = 0;
> -		} else {
> -			/*
> -			 * periodic timers reset to APIC_TMICT when they
> -			 * hit 0. The while loop simulates this happening N
> -			 * times. (counter_passed %= tmcct) would also work,
> -			 * but might be slower or not work on 32-bit??
> -			 */
> -			while (counter_passed > tmcct)
> -				counter_passed -= tmcct;
> -			tmcct -= counter_passed;
> -		}
> -	} else {
> -		tmcct -= counter_passed;
> -	}
> +	/*
> +	 * Since reinjection is not rate-limited, use the delay
> + 	 * to inject the last interrupt as an estimate.
> + 	 */
> +	if (unlikely(atomic_read(&apic->timer.pending) > 0)) {
> +		remaining = apic->timer.injection_delay;
> +		if (ktime_to_ns(remaining) > apic->timer.period)
> +			remaining = ns_to_ktime(apic->timer.period);
> +        } else
> +		remaining = hrtimer_expires_remaining(&apic->timer.dev);

A little doubt...

A: time_fire
B: intr_post
C: read TMCCT

The sequence can be ABC or ACB.

injection_delay = time(B) - time(A)

So it didn't count time from read TMCCT... And guest get interrupt at time(B), 
not quite understand why time(B) - time(A) matters here...

I think the reasonable here means, this interval is usable later after the 
accumulated interrupts are injected. From this point of view, I think current 
solution is reasonable. It just assume the delayed interrupts have been 
injected.

However, seriously, any value here is wrong, no elegant one. But I still 
prefer to the current solution...

And here is not the really problem for now I think. The current mechanism is 
mostly OK, but where is the bug... We have either have a simple fix(e.g. if 
now < last_update, then return 0) or dig into it. Did it worth a try? Anyway, 
it would return a buggy result if we have pending interrupts...

-- 
regards
Yang, Sheng

> +
> +	if (remaining.tv64 > 0)
> +		tmcct = div64_u64(ktime_to_ns(remaining),
> +			(APIC_BUS_CYCLE_NS * apic->timer.divide_count));
> +
> +	WARN_ON(tmcct > apic_get_reg(apic, APIC_TMICT));
>
>  	return tmcct;
>  }
> @@ -653,8 +632,6 @@ static void start_apic_timer(struct kvm_
>  {
>  	ktime_t now = apic->timer.dev.base->get_time();
>
> -	apic->timer.last_update = now;
> -
>  	apic->timer.period = apic_get_reg(apic, APIC_TMICT) *
>  		    APIC_BUS_CYCLE_NS * apic->timer.divide_count;
>  	atomic_set(&apic->timer.pending, 0);
> @@ -972,6 +949,7 @@ static int __apic_timer_fn(struct kvm_la
>  		set_bit(KVM_REQ_PENDING_TIMER, &apic->vcpu->requests);
>  	if (waitqueue_active(q))
>  		wake_up_interruptible(q);
> +	apic->timer.int_trigger = ktime_get();
>
>  	if (apic_lvtt_period(apic)) {
>  		result = 1;
> @@ -1112,12 +1090,14 @@ void kvm_inject_apic_timer_irqs(struct k
>
>  void kvm_apic_timer_intr_post(struct kvm_vcpu *vcpu, int vec)
>  {
> +	ktime_t delta;
>  	struct kvm_lapic *apic = vcpu->arch.apic;
>
> -	if (apic && apic_lvt_vector(apic, APIC_LVTT) == vec)
> -		apic->timer.last_update = ktime_add_ns(
> -				apic->timer.last_update,
> -				apic->timer.period);
> +	if (apic && apic_lvt_vector(apic, APIC_LVTT) == vec &&
> +	    atomic_read(&apic->timer.pending) == 0) {
> +		delta = ktime_sub(ktime_get(), apic->timer.int_trigger);
> +		apic->timer.injection_delay = delta;
> +	}
>  }
>
>  int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> Index: kvm/arch/x86/kvm/lapic.h
> ===================================================================
> --- kvm.orig/arch/x86/kvm/lapic.h
> +++ kvm/arch/x86/kvm/lapic.h
> @@ -12,7 +12,8 @@ struct kvm_lapic {
>  		atomic_t pending;
>  		s64 period;	/* unit: ns */
>  		u32 divide_count;
> -		ktime_t last_update;
> +		ktime_t injection_delay;
> +		ktime_t int_trigger;
>  		struct hrtimer dev;
>  	} timer;
>  	struct kvm_vcpu *vcpu;


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] Fix almost infinite loop in APIC
  2009-01-15  7:20                 ` Sheng Yang
@ 2009-01-16  5:01                   ` Marcelo Tosatti
  2009-01-20 10:41                     ` Alexander Graf
  0 siblings, 1 reply; 30+ messages in thread
From: Marcelo Tosatti @ 2009-01-16  5:01 UTC (permalink / raw)
  To: Sheng Yang; +Cc: Alexander Graf, kvm, avi, Kevin Wolf

[-- Attachment #1: Type: text/plain, Size: 2169 bytes --]

On Thu, Jan 15, 2009 at 03:20:06PM +0800, Sheng Yang wrote:
> > +	 * Since reinjection is not rate-limited, use the delay
> > + 	 * to inject the last interrupt as an estimate.
> > + 	 */
> > +	if (unlikely(atomic_read(&apic->timer.pending) > 0)) {
> > +		remaining = apic->timer.injection_delay;
> > +		if (ktime_to_ns(remaining) > apic->timer.period)
> > +			remaining = ns_to_ktime(apic->timer.period);
> > +        } else
> > +		remaining = hrtimer_expires_remaining(&apic->timer.dev);
> 
> A little doubt...
> 
> A: time_fire
> B: intr_post
> C: read TMCCT
> 
> The sequence can be ABC or ACB.
> 
> injection_delay = time(B) - time(A)
> 
> So it didn't count time from read TMCCT... And guest get interrupt at time(B), 
> not quite understand why time(B) - time(A) matters here...

Its an estimate of the delay it takes to inject an interrupt to the
guest once fired. Its only used if there have been accumulated ones, so
ACB sequence with pending=0 will use hrtimer_expires_remaining().

> I think the reasonable here means, this interval is usable later after the 
> accumulated interrupts are injected. From this point of view, I think current 
> solution is reasonable. It just assume the delayed interrupts have been 
> injected.
> 
> However, seriously, any value here is wrong, no elegant one. 

I agree.

> But I still prefer to the current solution...

Why? The proposed version is smaller and simpler than the current
one IMO, and also not vulnerable to whatever bug is causing now <
last_update.

And more precise, since the current scheme uses interrupt injection time
as if it was "count-down restart" time, which is not true.

> And here is not the really problem for now I think. The current mechanism is 
> mostly OK, but where is the bug... We have either have a simple fix(e.g. if 
> now < last_update, then return 0) or dig into it. Did it worth a try? Anyway, 
> it would return a buggy result if we have pending interrupts...

The overflow calculation is not necessary as discussed. Alexander, can
you please try the following? Sheng, do you have any other suggestion to
test?

/proc/timer_list output of the host when ESX is running too.





[-- Attachment #2: kvm-lapic-debug-loop.patch --]
[-- Type: text/plain, Size: 3426 bytes --]

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index afac68c..e9f13cc 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -509,6 +509,37 @@ static void apic_send_ipi(struct kvm_lapic *apic)
 	}
 }
 
+
+static void record_last_update(struct kvm_lapic *apic)
+{
+	int rec_idx = apic->timer.rec_idx;
+
+	apic->timer.rec[rec_idx].when = ktime_get();
+	apic->timer.rec[rec_idx].last_update = apic->timer.last_update;
+	apic->timer.rec[rec_idx].pending = atomic_read(&apic->timer.pending);
+	apic->timer.rec_idx++;
+
+	if (apic->timer.rec_idx >= KVM_LAPIC_REC_NR)
+		apic->timer.rec_idx = 0;
+}
+
+static void print_last_update_record(struct kvm_lapic *apic)
+{
+	int iter = 0;
+	int i = apic->timer.rec_idx;
+
+	while (iter < KVM_LAPIC_REC_NR) {
+		printk("rec[%d] when=%lld last_update=%lld pend=%d\n",
+			i, ktime_to_ns(apic->timer.rec[i].when),
+			ktime_to_ns(apic->timer.rec[i].last_update),
+			apic->timer.rec[i].pending);
+		iter++;
+		i--;
+		if (i < 0)
+			i = KVM_LAPIC_REC_NR-1;
+	}
+}
+
 static u32 apic_get_tmcct(struct kvm_lapic *apic)
 {
 	u64 counter_passed;
@@ -532,7 +563,10 @@ static u32 apic_get_tmcct(struct kvm_lapic *apic)
 				    .tv64 = KTIME_MAX -
 				    (apic->timer.last_update).tv64}; }
 				   ), now);
-		apic_debug("time elapsed\n");
+		printk("last_update = %lld now = %lld\n",
+			ktime_to_ns(apic->timer.last_update),
+			ktime_to_ns(now));
+		print_last_update_record(apic);
 	} else
 		passed = ktime_sub(now, apic->timer.last_update);
 
@@ -544,14 +578,7 @@ static u32 apic_get_tmcct(struct kvm_lapic *apic)
 			/* one-shot timers stick at 0 until reset */
 			tmcct = 0;
 		} else {
-			/*
-			 * periodic timers reset to APIC_TMICT when they
-			 * hit 0. The while loop simulates this happening N
-			 * times. (counter_passed %= tmcct) would also work,
-			 * but might be slower or not work on 32-bit??
-			 */
-			while (counter_passed > tmcct)
-				counter_passed -= tmcct;
+			counter_passed %= (u64)tmcct;
 			tmcct -= counter_passed;
 		}
 	} else {
@@ -666,7 +693,7 @@ static void start_apic_timer(struct kvm_lapic *apic)
 		      ktime_add_ns(now, apic->timer.period),
 		      HRTIMER_MODE_ABS);
 
-	apic_debug("%s: bus cycle is %" PRId64 "ns, now 0x%016"
+	printk("%s: bus cycle is %" PRId64 "ns, now 0x%016"
 			   PRIx64 ", "
 			   "timer initial count 0x%x, period %lldns, "
 			   "expire @ 0x%016" PRIx64 ".\n", __func__,
@@ -1114,10 +1141,12 @@ void kvm_apic_timer_intr_post(struct kvm_vcpu *vcpu, int vec)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
 
-	if (apic && apic_lvt_vector(apic, APIC_LVTT) == vec)
+	if (apic && apic_lvt_vector(apic, APIC_LVTT) == vec) {
 		apic->timer.last_update = ktime_add_ns(
 				apic->timer.last_update,
 				apic->timer.period);
+		record_last_update(apic);
+	}
 }
 
 int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 8185888..27baadb 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -5,6 +5,8 @@
 
 #include <linux/kvm_host.h>
 
+#define KVM_LAPIC_REC_NR 50
+
 struct kvm_lapic {
 	unsigned long base_address;
 	struct kvm_io_device dev;
@@ -13,6 +15,13 @@ struct kvm_lapic {
 		s64 period;	/* unit: ns */
 		u32 divide_count;
 		ktime_t last_update;
+		struct {
+			ktime_t when;
+			ktime_t last_update;
+			int pending;
+		} rec[KVM_LAPIC_REC_NR];
+		int rec_idx;
+
 		struct hrtimer dev;
 	} timer;
 	struct kvm_vcpu *vcpu;

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH] Fix almost infinite loop in APIC
  2009-01-16  5:01                   ` Marcelo Tosatti
@ 2009-01-20 10:41                     ` Alexander Graf
  2009-01-20 11:20                       ` Sheng Yang
  2009-01-20 13:43                       ` Sheng Yang
  0 siblings, 2 replies; 30+ messages in thread
From: Alexander Graf @ 2009-01-20 10:41 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Sheng Yang, kvm, avi, Kevin Wolf

Marcelo Tosatti wrote:
> On Thu, Jan 15, 2009 at 03:20:06PM +0800, Sheng Yang wrote:
>   
>>> +	 * Since reinjection is not rate-limited, use the delay
>>> + 	 * to inject the last interrupt as an estimate.
>>> + 	 */
>>> +	if (unlikely(atomic_read(&apic->timer.pending) > 0)) {
>>> +		remaining = apic->timer.injection_delay;
>>> +		if (ktime_to_ns(remaining) > apic->timer.period)
>>> +			remaining = ns_to_ktime(apic->timer.period);
>>> +        } else
>>> +		remaining = hrtimer_expires_remaining(&apic->timer.dev);
>>>       
>> A little doubt...
>>
>> A: time_fire
>> B: intr_post
>> C: read TMCCT
>>
>> The sequence can be ABC or ACB.
>>
>> injection_delay = time(B) - time(A)
>>
>> So it didn't count time from read TMCCT... And guest get interrupt at time(B), 
>> not quite understand why time(B) - time(A) matters here...
>>     
>
> Its an estimate of the delay it takes to inject an interrupt to the
> guest once fired. Its only used if there have been accumulated ones, so
> ACB sequence with pending=0 will use hrtimer_expires_remaining().
>
>   
>> I think the reasonable here means, this interval is usable later after the 
>> accumulated interrupts are injected. From this point of view, I think current 
>> solution is reasonable. It just assume the delayed interrupts have been 
>> injected.
>>
>> However, seriously, any value here is wrong, no elegant one. 
>>     
>
> I agree.
>
>   
>> But I still prefer to the current solution...
>>     
>
> Why? The proposed version is smaller and simpler than the current
> one IMO, and also not vulnerable to whatever bug is causing now <
> last_update.
>
> And more precise, since the current scheme uses interrupt injection time
> as if it was "count-down restart" time, which is not true.
>
>   
>> And here is not the really problem for now I think. The current mechanism is 
>> mostly OK, but where is the bug... We have either have a simple fix(e.g. if 
>> now < last_update, then return 0) or dig into it. Did it worth a try? Anyway, 
>> it would return a buggy result if we have pending interrupts...
>>     
>
> The overflow calculation is not necessary as discussed. Alexander, can
> you please try the following? Sheng, do you have any other suggestion to
> test?
>
> /proc/timer_list output of the host when ESX is running too.
>   

Sorry for the late reply. Here's the dmesg output and /proc/timer_list
after the issue occured. I left the warning message in that I put there
in case the value is too high:

        counter_passed = div64_u64(ktime_to_ns(passed),
                                   (APIC_BUS_CYCLE_NS *
apic->timer.divide_count));
 
+       if (counter_passed > 0x7f00000000000000) {
+               /* If we're in here we probably encountered a bug! */
+               printk(KERN_INFO "LAPIC: Too high counter_passed value:
0x%lx | 0x%lx (0x%lx) | 0x%lx (0x%lx)\n", counter_passed,
ktime_to_ns(passed), passed, now.tv64, apic
->timer.last_update.tv64);
+       }
+
        if (counter_passed > tmcct) {
                if (unlikely(!apic_lvtt_period(apic))) {
                        /* one-shot timers stick at 0 until reset */



start_apic_timer: bus cycle is 1ns, now 0x00039a6f377cfbc4, timer
initial count 0x186a0, period 100000ns, expire @ 0x00039a6f377e8264.
start_apic_timer: bus cycle is 1ns, now 0x00039a6f3a7b5aef, timer
initial count 0x203a0, period 132000ns, expire @ 0x00039a6f3a7d5e8f.
last_update = 1014860818546063 now = 1014860818426085
rec[47] when=1014860811948512 last_update=1014860812078063 pend=0
rec[46] when=1014860818417773 last_update=1014860818546063 pend=0
rec[45] when=1014860818285454 last_update=1014860818414063 pend=0
rec[44] when=1014860818153119 last_update=1014860818282063 pend=0
rec[43] when=1014860818021746 last_update=1014860818150063 pend=0
rec[42] when=1014860817889326 last_update=1014860818018063 pend=0
rec[41] when=1014860817757297 last_update=1014860817886063 pend=0
rec[40] when=1014860817625682 last_update=1014860817754063 pend=0
rec[39] when=1014860817492484 last_update=1014860817622063 pend=0
rec[38] when=1014860817360531 last_update=1014860817490063 pend=0
rec[37] when=1014860817228489 last_update=1014860817358063 pend=0
rec[36] when=1014860817101250 last_update=1014860817226063 pend=0
rec[35] when=1014860816964378 last_update=1014860817094063 pend=0
rec[34] when=1014860816832459 last_update=1014860816962063 pend=0
rec[33] when=1014860816700514 last_update=1014860816830063 pend=0
rec[32] when=1014860816568558 last_update=1014860816698063 pend=0
rec[31] when=1014860816436510 last_update=1014860816566063 pend=0
rec[30] when=1014860816305079 last_update=1014860816434063 pend=0
rec[29] when=1014860816172560 last_update=1014860816302063 pend=0
rec[28] when=1014860816040449 last_update=1014860816170063 pend=0
rec[27] when=1014860815908500 last_update=1014860816038063 pend=0
rec[26] when=1014860815776451 last_update=1014860815906063 pend=0
rec[25] when=1014860815644499 last_update=1014860815774063 pend=0
rec[24] when=1014860815513100 last_update=1014860815642063 pend=0
rec[23] when=1014860815380476 last_update=1014860815510063 pend=0
rec[22] when=1014860815248609 last_update=1014860815378063 pend=0
rec[21] when=1014860815116494 last_update=1014860815246063 pend=0
rec[20] when=1014860814984441 last_update=1014860815114063 pend=0
rec[19] when=1014860814852498 last_update=1014860814982063 pend=0
rec[18] when=1014860814721082 last_update=1014860814850063 pend=0
rec[17] when=1014860814588467 last_update=1014860814718063 pend=0
rec[16] when=1014860814456530 last_update=1014860814586063 pend=0
rec[15] when=1014860814324452 last_update=1014860814454063 pend=0
rec[14] when=1014860814192601 last_update=1014860814322063 pend=0
rec[13] when=1014860814060489 last_update=1014860814190063 pend=0
rec[12] when=1014860813928546 last_update=1014860814058063 pend=0
rec[11] when=1014860813797571 last_update=1014860813926063 pend=0
rec[10] when=1014860813664475 last_update=1014860813794063 pend=0
rec[9] when=1014860813532520 last_update=1014860813662063 pend=0
rec[8] when=1014860813400463 last_update=1014860813530063 pend=0
rec[7] when=1014860813268519 last_update=1014860813398063 pend=0
rec[6] when=1014860813136452 last_update=1014860813266063 pend=0
rec[5] when=1014860813007312 last_update=1014860813134063 pend=0
rec[4] when=1014860812872314 last_update=1014860813002063 pend=0
rec[3] when=1014860812740791 last_update=1014860812870063 pend=0
rec[2] when=1014860812652998 last_update=1014860812738063 pend=0
rec[1] when=1014860812476526 last_update=1014860812606063 pend=0
rec[0] when=1014860812344635 last_update=1014860812474063 pend=0
rec[49] when=1014860812213098 last_update=1014860812342063 pend=0
rec[48] when=1014860812080466 last_update=1014860812210063 pend=0
LAPIC: Too high counter_passed value: 0x7ffffffffffe2b55 |
0x7ffffffffffe2b55 (0x7ffffffffffe2b55) | 0x39b02b29428e5 (0x39b02b295fd8f)
start_apic_timer: bus cycle is 1ns, now 0x00039b02b2998ada, timer
initial count 0x8400, period 33792ns, expire @ 0x00039b02b29a0eda.







agraf@busu:~/git/kvm-userspace/patch-virtio> cat /proc/timer_list
Timer List Version: v0.3
HRTIMER_MAX_CLOCK_BASES: 2
now at 1015003705479836 nsecs

cpu: 0
 clock 0:
  .index:      0
  .resolution: 1 nsecs
  .get_time:   ktime_get_real
  .offset:     1231432907192509789 nsecs
active timers:
 clock 1:
  .index:      1
  .resolution: 1 nsecs
  .get_time:   ktime_get
  .offset:     0 nsecs
active timers:
 #0: <ffff88005c0a9d68>, tick_sched_timer, S:01,
tick_nohz_restart_sched_tick, swapper/0
 # expires at 1015003708000000 nsecs [in 2520164 nsecs]
 #1: <ffff88005c0a9d68>, posix_timer_fn, S:01, common_timer_set,
qemu-system-x86/12319
 # expires at 1015003730134958 nsecs [in 24655122 nsecs]
 #2: <ffff88005c0a9d68>, hrtimer_wakeup, S:01, futex_wait,
thunderbird-bin/16571
 # expires at 1015004228913679 nsecs [in 523433843 nsecs]
 #3: <ffff88005c0a9d68>, hrtimer_wakeup, S:01, futex_wait,
thunderbird-bin/5575
 # expires at 1015004235164609 nsecs [in 529684773 nsecs]
 #4: <ffff88005c0a9d68>, it_real_fn, S:01, do_setitimer, ntpd/3897
 # expires at 1015004515252084 nsecs [in 809772248 nsecs]
 #5: <ffff88005c0a9d68>, hrtimer_wakeup, S:01, do_nanosleep, mono/4506
 # expires at 1015007076193303 nsecs [in 3370713467 nsecs]
 #6: <ffff88005c0a9d68>, hrtimer_wakeup, S:01, futex_wait,
thunderbird-bin/5568
 # expires at 1015011817382709 nsecs [in 8111902873 nsecs]
 #7: <ffff88005c0a9d68>, hrtimer_wakeup, S:01, do_nanosleep, ypbind/3441
 # expires at 1015018457521563 nsecs [in 14752041727 nsecs]
 #8: <ffff88005c0a9d68>, hrtimer_wakeup, S:01, do_nanosleep, cron/4093
 # expires at 1015033940939337 nsecs [in 30235459501 nsecs]
 #9: <ffff88005c0a9d68>, it_real_fn, S:01, do_setitimer, qmgr/4085
 # expires at 1015280112163402 nsecs [in 276406683566 nsecs]
 #10: <ffff88005c0a9d68>, it_real_fn, S:01, do_setitimer, pickup/10231
 # expires at 1015285116223756 nsecs [in 281410743920 nsecs]
 #11: <ffff88005c0a9d68>, it_real_fn, S:01, do_setitimer, master/4058
 # expires at 1015285116254144 nsecs [in 281410774308 nsecs]
  .expires_next   : 1015003708000000 nsecs
  .hres_active    : 1
  .nr_events      : 580379958
  .nohz_mode      : 2
  .idle_tick      : 1015003700000000 nsecs
  .tick_stopped   : 0
  .idle_jiffies   : 4548643220
  .idle_calls     : 593550517
  .idle_sleeps    : 362170032
  .idle_entrytime : 1015003701647385 nsecs
  .idle_waketime  : 1015003701630992 nsecs
  .idle_exittime  : 1015003702532497 nsecs
  .idle_sleeptime : 571458026968578 nsecs
  .last_jiffies   : 4548643221
  .next_jiffies   : 4548643224
  .idle_expires   : 1015003712000000 nsecs
jiffies: 4548643222

cpu: 1
 clock 0:
  .index:      0
  .resolution: 1 nsecs
  .get_time:   ktime_get_real
  .offset:     1231432907192509789 nsecs
active timers:
 clock 1:
  .index:      1
  .resolution: 1 nsecs
  .get_time:   ktime_get
  .offset:     0 nsecs
active timers:
 #0: <ffff88005c0a9d68>, hrtimer_wakeup, S:01, futex_wait, beagled/4716
 # expires at 1015003708490644 nsecs [in 3010808 nsecs]
 #1: <ffff88005c0a9d68>, tick_sched_timer, S:01,
tick_nohz_restart_sched_tick, swapper/0
 # expires at 1015003708500000 nsecs [in 3020164 nsecs]
 #2: <ffff88005c0a9d68>, hrtimer_wakeup, S:01, do_nanosleep,
xosview.bin/9582
 # expires at 1015003726118330 nsecs [in 20638494 nsecs]
 #3: <ffff88005c0a9d68>, hrtimer_wakeup, S:01, futex_wait, beagled/4718
 # expires at 1015003794490837 nsecs [in 89011001 nsecs]
 #4: <ffff88005c0a9d68>, hrtimer_wakeup, S:01, futex_wait, automount/3918
 # expires at 1015003984221642 nsecs [in 278741806 nsecs]
 #5: <ffff88005c0a9d68>, hrtimer_wakeup, S:01, futex_wait,
thunderbird-bin/9982
 # expires at 1015004091979648 nsecs [in 386499812 nsecs]
 #6: <ffff88005c0a9d68>, hrtimer_wakeup, S:01, do_nanosleep,
beagled-helper/10041
 # expires at 1015004496780704 nsecs [in 791300868 nsecs]
 #7: <ffff88005c0a9d68>, hrtimer_wakeup, S:01, do_nanosleep, mono/10031
 # expires at 1015004573838387 nsecs [in 868358551 nsecs]
 #8: <ffff88005c0a9d68>, hrtimer_wakeup, S:01, do_nanosleep, mono/4568
 # expires at 1015007037900538 nsecs [in 3332420702 nsecs]
 #9: <ffff88005c0a9d68>, hrtimer_wakeup, S:01, do_nanosleep, irqbalance/3631
 # expires at 1015010807493897 nsecs [in 7102014061 nsecs]
 #10: <ffff88005c0a9d68>, hrtimer_wakeup, S:01, futex_wait,
thunderbird-bin/12567
 # expires at 1015052672245807 nsecs [in 48966765971 nsecs]
 #11: <ffff88005c0a9d68>, hrtimer_wakeup, S:01, do_nanosleep, smartd/4119
 # expires at 1015236365925605 nsecs [in 232660445769 nsecs]
  .expires_next   : 1015003708490644 nsecs
  .hres_active    : 1
  .nr_events      : 1177733430
  .nohz_mode      : 2
  .idle_tick      : 1015003704500000 nsecs
  .tick_stopped   : 0
  .idle_jiffies   : 4548643221
  .idle_calls     : 1078408244
  .idle_sleeps    : 846829238
  .idle_entrytime : 1015003704530991 nsecs
  .idle_waketime  : 1015003698212870 nsecs
  .idle_exittime  : 1015003701647606 nsecs
  .idle_sleeptime : 775344327548400 nsecs
  .last_jiffies   : 4548643221
  .next_jiffies   : 4548643247
  .idle_expires   : 1015003804000000 nsecs
jiffies: 4548643222

cpu: 2
 clock 0:
  .index:      0
  .resolution: 1 nsecs
  .get_time:   ktime_get_real
  .offset:     1231432907192509789 nsecs
active timers:
 clock 1:
  .index:      1
  .resolution: 1 nsecs
  .get_time:   ktime_get
  .offset:     0 nsecs
active timers:
 #0: <ffff88005c0a9d68>, apic_timer_fn, S:01, start_apic_timer,
qemu-system-x86/12327
 # expires at 1015003705716442 nsecs [in 236606 nsecs]
 #1: <ffff88005c0a9d68>, tick_sched_timer, S:01,
tick_nohz_restart_sched_tick, swapper/0
 # expires at 1015003709000000 nsecs [in 3520164 nsecs]
 #2: <ffff88005c0a9d68>, pit_timer_fn, S:01, pit_ioport_write,
qemu-system-x86/12327
 # expires at 1015003712306931 nsecs [in 6827095 nsecs]
 #3: <ffff88005c0a9d68>, hrtimer_wakeup, S:01, futex_wait, firefox/19140
 # expires at 1015003723685421 nsecs [in 18205585 nsecs]
 #4: <ffff88005c0a9d68>, hrtimer_wakeup, S:01, futex_wait, automount/3919
 # expires at 1015003987442435 nsecs [in 281962599 nsecs]
 #5: <ffff88005c0a9d68>, it_real_fn, S:01, do_setitimer, screen/14384
 # expires at 1015005458349005 nsecs [in 1752869169 nsecs]
  .expires_next   : 1015003705817818 nsecs
  .hres_active    : 1
  .nr_events      : 523400970
  .nohz_mode      : 2
  .idle_tick      : 1014859089000000 nsecs
  .tick_stopped   : 0
  .idle_jiffies   : 4548607067
  .idle_calls     : 336510407
  .idle_sleeps    : 258867154
  .idle_entrytime : 1014859085481448 nsecs
  .idle_waketime  : 1014859053575960 nsecs
  .idle_exittime  : 1014859085485692 nsecs
  .idle_sleeptime : 395655240495416 nsecs
  .last_jiffies   : 4548607067
  .next_jiffies   : 4548607171
  .idle_expires   : 1014859500000000 nsecs
jiffies: 4548643222

cpu: 3
 clock 0:
  .index:      0
  .resolution: 1 nsecs
  .get_time:   ktime_get_real
  .offset:     1231432907192509789 nsecs
active timers:
 clock 1:
  .index:      1
  .resolution: 1 nsecs
  .get_time:   ktime_get
  .offset:     0 nsecs
active timers:
 #0: <ffff88005c0a9d68>, tick_sched_timer, S:01,
tick_nohz_restart_sched_tick, swapper/0
 # expires at 1015003709500000 nsecs [in 4020164 nsecs]
 #1: <ffff88005c0a9d68>, hrtimer_wakeup, S:01, do_nanosleep,
xosview.bin/1922
 # expires at 1015003770033068 nsecs [in 64553232 nsecs]
 #2: <ffff88005c0a9d68>, hrtimer_wakeup, S:01, futex_wait,
thunderbird-bin/7131
 # expires at 1015003800127636 nsecs [in 94647800 nsecs]
 #3: <ffff88005c0a9d68>, hrtimer_wakeup, S:01, futex_wait,
thunderbird-bin/10001
 # expires at 1015003962537676 nsecs [in 257057840 nsecs]
 #4: <ffff88005c0a9d68>, sched_rt_period_timer, S:01,
__enqueue_rt_entity, swapper/1
 # expires at 1015004000000000 nsecs [in 294520164 nsecs]
 #5: <ffff88005c0a9d68>, hrtimer_wakeup, S:01, futex_wait,
thunderbird-bin/5577
 # expires at 1015004228745798 nsecs [in 523265962 nsecs]
  .expires_next   : 1015003709500000 nsecs
  .hres_active    : 1
  .nr_events      : 1260974322
  .nohz_mode      : 2
  .idle_tick      : 1015003381500000 nsecs
  .tick_stopped   : 0
  .idle_jiffies   : 4548643141
  .idle_calls     : 1146068541
  .idle_sleeps    : 910709854
  .idle_entrytime : 1015003705511942 nsecs
  .idle_waketime  : 1015003380002663 nsecs
  .idle_exittime  : 1015003408508278 nsecs
  .idle_sleeptime : 782489673980131 nsecs
  .last_jiffies   : 4548643222
  .next_jiffies   : 4548643241
  .idle_expires   : 1015003660000000 nsecs
jiffies: 4548643222


Tick Device: mode:     1
Clock Event Device: hpet
 max_delta_ns:   85899346200
 min_delta_ns:   5000
 mult:           107374182
 shift:          32
 mode:           3
 next_event:     9223372036854775807 nsecs
 set_next_event: hpet_legacy_next_event
 set_mode:       hpet_legacy_set_mode
 event_handler:  tick_handle_oneshot_broadcast
tick_broadcast_mask: 00000000
tick_broadcast_oneshot_mask: 00000000


Tick Device: mode:     1
Clock Event Device: lapic
 max_delta_ns:   669608197
 min_delta_ns:   1197
 mult:           53805782
 shift:          32
 mode:           3
 next_event:     1015003708000000 nsecs
 set_next_event: lapic_next_event
 set_mode:       lapic_timer_setup
 event_handler:  hrtimer_interrupt

Tick Device: mode:     1
Clock Event Device: lapic
 max_delta_ns:   669608197
 min_delta_ns:   1197
 mult:           53805782
 shift:          32
 mode:           3
 next_event:     1015003708490644 nsecs
 set_next_event: lapic_next_event
 set_mode:       lapic_timer_setup
 event_handler:  hrtimer_interrupt

Tick Device: mode:     1
Clock Event Device: lapic
 max_delta_ns:   669608197
 min_delta_ns:   1197
 mult:           53805782
 shift:          32
 mode:           3
 next_event:     1015003705919194 nsecs
 set_next_event: lapic_next_event
 set_mode:       lapic_timer_setup
 event_handler:  hrtimer_interrupt

Tick Device: mode:     1
Clock Event Device: lapic
 max_delta_ns:   669608197
 min_delta_ns:   1197
 mult:           53805782
 shift:          32
 mode:           3
 next_event:     1015003709500000 nsecs
 set_next_event: lapic_next_event
 set_mode:       lapic_timer_setup
 event_handler:  hrtimer_interrupt



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] Fix almost infinite loop in APIC
  2009-01-20 10:41                     ` Alexander Graf
@ 2009-01-20 11:20                       ` Sheng Yang
  2009-01-20 12:09                         ` Alexander Graf
  2009-01-20 13:43                       ` Sheng Yang
  1 sibling, 1 reply; 30+ messages in thread
From: Sheng Yang @ 2009-01-20 11:20 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Marcelo Tosatti, kvm, avi, Kevin Wolf

On Tuesday 20 January 2009 18:41:21 Alexander Graf wrote:
> Marcelo Tosatti wrote:
> > On Thu, Jan 15, 2009 at 03:20:06PM +0800, Sheng Yang wrote:
> >>> +	 * Since reinjection is not rate-limited, use the delay
> >>> + 	 * to inject the last interrupt as an estimate.
> >>> + 	 */
> >>> +	if (unlikely(atomic_read(&apic->timer.pending) > 0)) {
> >>> +		remaining = apic->timer.injection_delay;
> >>> +		if (ktime_to_ns(remaining) > apic->timer.period)
> >>> +			remaining = ns_to_ktime(apic->timer.period);
> >>> +        } else
> >>> +		remaining = hrtimer_expires_remaining(&apic->timer.dev);
> >>
> >> A little doubt...
> >>
> >> A: time_fire
> >> B: intr_post
> >> C: read TMCCT
> >>
> >> The sequence can be ABC or ACB.
> >>
> >> injection_delay = time(B) - time(A)
> >>
> >> So it didn't count time from read TMCCT... And guest get interrupt at
> >> time(B), not quite understand why time(B) - time(A) matters here...
> >
> > Its an estimate of the delay it takes to inject an interrupt to the
> > guest once fired. Its only used if there have been accumulated ones, so
> > ACB sequence with pending=0 will use hrtimer_expires_remaining().
> >
> >> I think the reasonable here means, this interval is usable later after
> >> the accumulated interrupts are injected. From this point of view, I
> >> think current solution is reasonable. It just assume the delayed
> >> interrupts have been injected.
> >>
> >> However, seriously, any value here is wrong, no elegant one.
> >
> > I agree.
> >
> >> But I still prefer to the current solution...
> >
> > Why? The proposed version is smaller and simpler than the current
> > one IMO, and also not vulnerable to whatever bug is causing now <
> > last_update.
> >
> > And more precise, since the current scheme uses interrupt injection time
> > as if it was "count-down restart" time, which is not true.
> >
> >> And here is not the really problem for now I think. The current
> >> mechanism is mostly OK, but where is the bug... We have either have a
> >> simple fix(e.g. if now < last_update, then return 0) or dig into it. Did
> >> it worth a try? Anyway, it would return a buggy result if we have
> >> pending interrupts...
> >
> > The overflow calculation is not necessary as discussed. Alexander, can
> > you please try the following? Sheng, do you have any other suggestion to
> > test?
> >
> > /proc/timer_list output of the host when ESX is running too.
>
> Sorry for the late reply. Here's the dmesg output and /proc/timer_list
> after the issue occured. I left the warning message in that I put there
> in case the value is too high:
>
>         counter_passed = div64_u64(ktime_to_ns(passed),
>                                    (APIC_BUS_CYCLE_NS *
> apic->timer.divide_count));
>
> +       if (counter_passed > 0x7f00000000000000) {
> +               /* If we're in here we probably encountered a bug! */
> +               printk(KERN_INFO "LAPIC: Too high counter_passed value:
> 0x%lx | 0x%lx (0x%lx) | 0x%lx (0x%lx)\n", counter_passed,
> ktime_to_ns(passed), passed, now.tv64, apic
> ->timer.last_update.tv64);
> +       }
> +
>         if (counter_passed > tmcct) {
>                 if (unlikely(!apic_lvtt_period(apic))) {
>                         /* one-shot timers stick at 0 until reset */
>
>
>
> start_apic_timer: bus cycle is 1ns, now 0x00039a6f377cfbc4, timer
> initial count 0x186a0, period 100000ns, expire @ 0x00039a6f377e8264.
> start_apic_timer: bus cycle is 1ns, now 0x00039a6f3a7b5aef, timer
> initial count 0x203a0, period 132000ns, expire @ 0x00039a6f3a7d5e8f.
> last_update = 1014860818546063 now = 1014860818426085
> rec[47] when=1014860811948512 last_update=1014860812078063 pend=0
> rec[46] when=1014860818417773 last_update=1014860818546063 pend=0
> rec[45] when=1014860818285454 last_update=1014860818414063 pend=0
> rec[44] when=1014860818153119 last_update=1014860818282063 pend=0
> rec[43] when=1014860818021746 last_update=1014860818150063 pend=0
> rec[42] when=1014860817889326 last_update=1014860818018063 pend=0
> rec[41] when=1014860817757297 last_update=1014860817886063 pend=0
> rec[40] when=1014860817625682 last_update=1014860817754063 pend=0
> rec[39] when=1014860817492484 last_update=1014860817622063 pend=0
> rec[38] when=1014860817360531 last_update=1014860817490063 pend=0
> rec[37] when=1014860817228489 last_update=1014860817358063 pend=0
> rec[36] when=1014860817101250 last_update=1014860817226063 pend=0
> rec[35] when=1014860816964378 last_update=1014860817094063 pend=0
> rec[34] when=1014860816832459 last_update=1014860816962063 pend=0
> rec[33] when=1014860816700514 last_update=1014860816830063 pend=0
> rec[32] when=1014860816568558 last_update=1014860816698063 pend=0
> rec[31] when=1014860816436510 last_update=1014860816566063 pend=0
> rec[30] when=1014860816305079 last_update=1014860816434063 pend=0
> rec[29] when=1014860816172560 last_update=1014860816302063 pend=0
> rec[28] when=1014860816040449 last_update=1014860816170063 pend=0
> rec[27] when=1014860815908500 last_update=1014860816038063 pend=0
> rec[26] when=1014860815776451 last_update=1014860815906063 pend=0
> rec[25] when=1014860815644499 last_update=1014860815774063 pend=0
> rec[24] when=1014860815513100 last_update=1014860815642063 pend=0
> rec[23] when=1014860815380476 last_update=1014860815510063 pend=0
> rec[22] when=1014860815248609 last_update=1014860815378063 pend=0
> rec[21] when=1014860815116494 last_update=1014860815246063 pend=0
> rec[20] when=1014860814984441 last_update=1014860815114063 pend=0
> rec[19] when=1014860814852498 last_update=1014860814982063 pend=0
> rec[18] when=1014860814721082 last_update=1014860814850063 pend=0
> rec[17] when=1014860814588467 last_update=1014860814718063 pend=0
> rec[16] when=1014860814456530 last_update=1014860814586063 pend=0
> rec[15] when=1014860814324452 last_update=1014860814454063 pend=0
> rec[14] when=1014860814192601 last_update=1014860814322063 pend=0
> rec[13] when=1014860814060489 last_update=1014860814190063 pend=0
> rec[12] when=1014860813928546 last_update=1014860814058063 pend=0
> rec[11] when=1014860813797571 last_update=1014860813926063 pend=0
> rec[10] when=1014860813664475 last_update=1014860813794063 pend=0
> rec[9] when=1014860813532520 last_update=1014860813662063 pend=0
> rec[8] when=1014860813400463 last_update=1014860813530063 pend=0
> rec[7] when=1014860813268519 last_update=1014860813398063 pend=0
> rec[6] when=1014860813136452 last_update=1014860813266063 pend=0
> rec[5] when=1014860813007312 last_update=1014860813134063 pend=0
> rec[4] when=1014860812872314 last_update=1014860813002063 pend=0
> rec[3] when=1014860812740791 last_update=1014860812870063 pend=0
> rec[2] when=1014860812652998 last_update=1014860812738063 pend=0
> rec[1] when=1014860812476526 last_update=1014860812606063 pend=0
> rec[0] when=1014860812344635 last_update=1014860812474063 pend=0
> rec[49] when=1014860812213098 last_update=1014860812342063 pend=0
> rec[48] when=1014860812080466 last_update=1014860812210063 pend=0
> LAPIC: Too high counter_passed value: 0x7ffffffffffe2b55 |
> 0x7ffffffffffe2b55 (0x7ffffffffffe2b55) | 0x39b02b29428e5 (0x39b02b295fd8f)
> start_apic_timer: bus cycle is 1ns, now 0x00039b02b2998ada, timer
> initial count 0x8400, period 33792ns, expire @ 0x00039b02b29a0eda.
>

Thanks, Alex! 

When update time of last_update is 0x39B02B29428E5, and the start_apic_timer 
at 0x00039b02b2998ada, the delta is 0x561F5 = 352,757, very close. Looks like 
a window here?:

> static void start_apic_timer(struct kvm_lapic *apic)
> {
> 	ktime_t now = apic->timer.dev.base->get_time();
>
> 	apic->timer.last_update = now;

[window?]
>
> 	apic->timer.period = apic_get_reg(apic, APIC_TMICT) *
> 		    APIC_BUS_CYCLE_NS * apic->timer.divide_count;
> 	atomic_set(&apic->timer.pending, 0);
>
> 	if (!apic->timer.period)
> 		return;
>
> 	hrtimer_start(&apic->timer.dev,
> 		      ktime_add_ns(now, apic->timer.period),
> 		      HRTIMER_MODE_ABS);
>
> 	apic_debug("%s: bus cycle is %" PRId64 "ns, now 0x%016"
> 			   PRIx64 ", "
> 			   "timer initial count 0x%x, period %lldns, "
> 			   "expire @ 0x%016" PRIx64 ".\n", __func__,
> 			   APIC_BUS_CYCLE_NS, ktime_to_ns(now),
> 			   apic_get_reg(apic, APIC_TMICT),
> 			   apic->timer.period,
> 			   ktime_to_ns(ktime_add_ns(now,
> 					apic->timer.period)));
> }

Looks like due to the period is reduced, but the last_update use old value and 
got just updated... But guest vcpu B program vcpu A's lapic? Or intr_post of 
vcpu B happened at vcpu A? Both seems unreasonable...

Alex, can you help to add another line above to confirm the problem? e.g.

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index afac68c..15f8ed5 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -653,6 +653,8 @@ static void start_apic_timer(struct kvm_lapic *apic)
 {
        ktime_t now = apic->timer.dev.base->get_time();

+       printk("enter start_apic_timer! vcpu %d\n", apic->vcpu->vcpu_id);
+
        apic->timer.last_update = now;

        apic->timer.period = apic_get_reg(apic, APIC_TMICT) *

Also add a similar one at the beginning of print_last_update_record().

Thanks! (And thanks for Marcelo's excellent debug patch! :) )

-- 
regards
Yang, Sheng


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH] Fix almost infinite loop in APIC
  2009-01-20 11:20                       ` Sheng Yang
@ 2009-01-20 12:09                         ` Alexander Graf
  2009-01-20 12:30                           ` Sheng Yang
  0 siblings, 1 reply; 30+ messages in thread
From: Alexander Graf @ 2009-01-20 12:09 UTC (permalink / raw)
  To: Sheng Yang; +Cc: Marcelo Tosatti, kvm, avi, Kevin Wolf

Sheng Yang wrote:
> On Tuesday 20 January 2009 18:41:21 Alexander Graf wrote:
>   
>> Marcelo Tosatti wrote:
>>     
>>> On Thu, Jan 15, 2009 at 03:20:06PM +0800, Sheng Yang wrote:
>>>       
>>>>> +	 * Since reinjection is not rate-limited, use the delay
>>>>> + 	 * to inject the last interrupt as an estimate.
>>>>> + 	 */
>>>>> +	if (unlikely(atomic_read(&apic->timer.pending) > 0)) {
>>>>> +		remaining = apic->timer.injection_delay;
>>>>> +		if (ktime_to_ns(remaining) > apic->timer.period)
>>>>> +			remaining = ns_to_ktime(apic->timer.period);
>>>>> +        } else
>>>>> +		remaining = hrtimer_expires_remaining(&apic->timer.dev);
>>>>>           
>>>> A little doubt...
>>>>
>>>> A: time_fire
>>>> B: intr_post
>>>> C: read TMCCT
>>>>
>>>> The sequence can be ABC or ACB.
>>>>
>>>> injection_delay = time(B) - time(A)
>>>>
>>>> So it didn't count time from read TMCCT... And guest get interrupt at
>>>> time(B), not quite understand why time(B) - time(A) matters here...
>>>>         
>>> Its an estimate of the delay it takes to inject an interrupt to the
>>> guest once fired. Its only used if there have been accumulated ones, so
>>> ACB sequence with pending=0 will use hrtimer_expires_remaining().
>>>
>>>       
>>>> I think the reasonable here means, this interval is usable later after
>>>> the accumulated interrupts are injected. From this point of view, I
>>>> think current solution is reasonable. It just assume the delayed
>>>> interrupts have been injected.
>>>>
>>>> However, seriously, any value here is wrong, no elegant one.
>>>>         
>>> I agree.
>>>
>>>       
>>>> But I still prefer to the current solution...
>>>>         
>>> Why? The proposed version is smaller and simpler than the current
>>> one IMO, and also not vulnerable to whatever bug is causing now <
>>> last_update.
>>>
>>> And more precise, since the current scheme uses interrupt injection time
>>> as if it was "count-down restart" time, which is not true.
>>>
>>>       
>>>> And here is not the really problem for now I think. The current
>>>> mechanism is mostly OK, but where is the bug... We have either have a
>>>> simple fix(e.g. if now < last_update, then return 0) or dig into it. Did
>>>> it worth a try? Anyway, it would return a buggy result if we have
>>>> pending interrupts...
>>>>         
>>> The overflow calculation is not necessary as discussed. Alexander, can
>>> you please try the following? Sheng, do you have any other suggestion to
>>> test?
>>>
>>> /proc/timer_list output of the host when ESX is running too.
>>>       
>> Sorry for the late reply. Here's the dmesg output and /proc/timer_list
>> after the issue occured. I left the warning message in that I put there
>> in case the value is too high:
>>
>>         counter_passed = div64_u64(ktime_to_ns(passed),
>>                                    (APIC_BUS_CYCLE_NS *
>> apic->timer.divide_count));
>>
>> +       if (counter_passed > 0x7f00000000000000) {
>> +               /* If we're in here we probably encountered a bug! */
>> +               printk(KERN_INFO "LAPIC: Too high counter_passed value:
>> 0x%lx | 0x%lx (0x%lx) | 0x%lx (0x%lx)\n", counter_passed,
>> ktime_to_ns(passed), passed, now.tv64, apic
>> ->timer.last_update.tv64);
>> +       }
>> +
>>         if (counter_passed > tmcct) {
>>                 if (unlikely(!apic_lvtt_period(apic))) {
>>                         /* one-shot timers stick at 0 until reset */
>>
>>
>>
>> start_apic_timer: bus cycle is 1ns, now 0x00039a6f377cfbc4, timer
>> initial count 0x186a0, period 100000ns, expire @ 0x00039a6f377e8264.
>> start_apic_timer: bus cycle is 1ns, now 0x00039a6f3a7b5aef, timer
>> initial count 0x203a0, period 132000ns, expire @ 0x00039a6f3a7d5e8f.
>> last_update = 1014860818546063 now = 1014860818426085
>> rec[47] when=1014860811948512 last_update=1014860812078063 pend=0
>> rec[46] when=1014860818417773 last_update=1014860818546063 pend=0
>> rec[45] when=1014860818285454 last_update=1014860818414063 pend=0
>> rec[44] when=1014860818153119 last_update=1014860818282063 pend=0
>> rec[43] when=1014860818021746 last_update=1014860818150063 pend=0
>> rec[42] when=1014860817889326 last_update=1014860818018063 pend=0
>> rec[41] when=1014860817757297 last_update=1014860817886063 pend=0
>> rec[40] when=1014860817625682 last_update=1014860817754063 pend=0
>> rec[39] when=1014860817492484 last_update=1014860817622063 pend=0
>> rec[38] when=1014860817360531 last_update=1014860817490063 pend=0
>> rec[37] when=1014860817228489 last_update=1014860817358063 pend=0
>> rec[36] when=1014860817101250 last_update=1014860817226063 pend=0
>> rec[35] when=1014860816964378 last_update=1014860817094063 pend=0
>> rec[34] when=1014860816832459 last_update=1014860816962063 pend=0
>> rec[33] when=1014860816700514 last_update=1014860816830063 pend=0
>> rec[32] when=1014860816568558 last_update=1014860816698063 pend=0
>> rec[31] when=1014860816436510 last_update=1014860816566063 pend=0
>> rec[30] when=1014860816305079 last_update=1014860816434063 pend=0
>> rec[29] when=1014860816172560 last_update=1014860816302063 pend=0
>> rec[28] when=1014860816040449 last_update=1014860816170063 pend=0
>> rec[27] when=1014860815908500 last_update=1014860816038063 pend=0
>> rec[26] when=1014860815776451 last_update=1014860815906063 pend=0
>> rec[25] when=1014860815644499 last_update=1014860815774063 pend=0
>> rec[24] when=1014860815513100 last_update=1014860815642063 pend=0
>> rec[23] when=1014860815380476 last_update=1014860815510063 pend=0
>> rec[22] when=1014860815248609 last_update=1014860815378063 pend=0
>> rec[21] when=1014860815116494 last_update=1014860815246063 pend=0
>> rec[20] when=1014860814984441 last_update=1014860815114063 pend=0
>> rec[19] when=1014860814852498 last_update=1014860814982063 pend=0
>> rec[18] when=1014860814721082 last_update=1014860814850063 pend=0
>> rec[17] when=1014860814588467 last_update=1014860814718063 pend=0
>> rec[16] when=1014860814456530 last_update=1014860814586063 pend=0
>> rec[15] when=1014860814324452 last_update=1014860814454063 pend=0
>> rec[14] when=1014860814192601 last_update=1014860814322063 pend=0
>> rec[13] when=1014860814060489 last_update=1014860814190063 pend=0
>> rec[12] when=1014860813928546 last_update=1014860814058063 pend=0
>> rec[11] when=1014860813797571 last_update=1014860813926063 pend=0
>> rec[10] when=1014860813664475 last_update=1014860813794063 pend=0
>> rec[9] when=1014860813532520 last_update=1014860813662063 pend=0
>> rec[8] when=1014860813400463 last_update=1014860813530063 pend=0
>> rec[7] when=1014860813268519 last_update=1014860813398063 pend=0
>> rec[6] when=1014860813136452 last_update=1014860813266063 pend=0
>> rec[5] when=1014860813007312 last_update=1014860813134063 pend=0
>> rec[4] when=1014860812872314 last_update=1014860813002063 pend=0
>> rec[3] when=1014860812740791 last_update=1014860812870063 pend=0
>> rec[2] when=1014860812652998 last_update=1014860812738063 pend=0
>> rec[1] when=1014860812476526 last_update=1014860812606063 pend=0
>> rec[0] when=1014860812344635 last_update=1014860812474063 pend=0
>> rec[49] when=1014860812213098 last_update=1014860812342063 pend=0
>> rec[48] when=1014860812080466 last_update=1014860812210063 pend=0
>> LAPIC: Too high counter_passed value: 0x7ffffffffffe2b55 |
>> 0x7ffffffffffe2b55 (0x7ffffffffffe2b55) | 0x39b02b29428e5 (0x39b02b295fd8f)
>> start_apic_timer: bus cycle is 1ns, now 0x00039b02b2998ada, timer
>> initial count 0x8400, period 33792ns, expire @ 0x00039b02b29a0eda.
>>
>>     
>
> Thanks, Alex! 
>
> When update time of last_update is 0x39B02B29428E5, and the start_apic_timer 
> at 0x00039b02b2998ada, the delta is 0x561F5 = 352,757, very close. Looks like 
> a window here?:
>
>   
>> static void start_apic_timer(struct kvm_lapic *apic)
>> {
>> 	ktime_t now = apic->timer.dev.base->get_time();
>>
>> 	apic->timer.last_update = now;
>>     
>
> [window?]
>   
>> 	apic->timer.period = apic_get_reg(apic, APIC_TMICT) *
>> 		    APIC_BUS_CYCLE_NS * apic->timer.divide_count;
>> 	atomic_set(&apic->timer.pending, 0);
>>
>> 	if (!apic->timer.period)
>> 		return;
>>
>> 	hrtimer_start(&apic->timer.dev,
>> 		      ktime_add_ns(now, apic->timer.period),
>> 		      HRTIMER_MODE_ABS);
>>
>> 	apic_debug("%s: bus cycle is %" PRId64 "ns, now 0x%016"
>> 			   PRIx64 ", "
>> 			   "timer initial count 0x%x, period %lldns, "
>> 			   "expire @ 0x%016" PRIx64 ".\n", __func__,
>> 			   APIC_BUS_CYCLE_NS, ktime_to_ns(now),
>> 			   apic_get_reg(apic, APIC_TMICT),
>> 			   apic->timer.period,
>> 			   ktime_to_ns(ktime_add_ns(now,
>> 					apic->timer.period)));
>> }
>>     
>
> Looks like due to the period is reduced, but the last_update use old value and 
> got just updated... But guest vcpu B program vcpu A's lapic? Or intr_post of 
> vcpu B happened at vcpu A? Both seems unreasonable...
>
> Alex, can you help to add another line above to confirm the problem? e.g.
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index afac68c..15f8ed5 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -653,6 +653,8 @@ static void start_apic_timer(struct kvm_lapic *apic)
>  {
>         ktime_t now = apic->timer.dev.base->get_time();
>
> +       printk("enter start_apic_timer! vcpu %d\n", apic->vcpu->vcpu_id);
> +
>         apic->timer.last_update = now;
>
>         apic->timer.period = apic_get_reg(apic, APIC_TMICT) *
>
> Also add a similar one at the beginning of print_last_update_record().
>
> Thanks! (And thanks for Marcelo's excellent debug patch! :) )
>
>   

The VMware ESX VM is UP, so I don't think we'll get anything from the
vcpu_id apart from 0 :-). Nevertheless here's the dmesg output:

enter start_apic_timer! vcpu 0
enter start_apic_timer! vcpu 0
start_apic_timer: bus cycle is 1ns, now 0x00039fb339714386, timer
initial count 0x186a0, period 100000ns, expire @ 0x00039fb33972ca26.
enter start_apic_timer! vcpu 0
start_apic_timer: bus cycle is 1ns, now 0x00039fb33c58ebc2, timer
initial count 0x203a0, period 132000ns, expire @ 0x00039fb33c5aef62.
last_update = 1020361050214818 now = 1020361050178604
enter print_last_update_record! vcpu 0
rec[5] when=1020361043617361 last_update=1020361043746818 pend=0
rec[4] when=1020361050088268 last_update=1020361050214818 pend=0
rec[3] when=1020361049970143 last_update=1020361050082818 pend=0
rec[2] when=1020361049822220 last_update=1020361049950818 pend=0
rec[1] when=1020361049690023 last_update=1020361049818818 pend=0
rec[0] when=1020361049559209 last_update=1020361049686818 pend=0
rec[49] when=1020361049427469 last_update=1020361049554818 pend=0
rec[48] when=1020361049294815 last_update=1020361049422818 pend=0
rec[47] when=1020361049162401 last_update=1020361049290818 pend=0
rec[46] when=1020361049030431 last_update=1020361049158818 pend=0
rec[45] when=1020361048898352 last_update=1020361049026818 pend=0
rec[44] when=1020361048765413 last_update=1020361048894818 pend=0
rec[43] when=1020361048634310 last_update=1020361048762818 pend=0
rec[42] when=1020361048501343 last_update=1020361048630818 pend=0
rec[41] when=1020361048369388 last_update=1020361048498818 pend=0
rec[40] when=1020361048241616 last_update=1020361048366818 pend=0
rec[39] when=1020361048108017 last_update=1020361048234818 pend=0
rec[38] when=1020361047973826 last_update=1020361048102818 pend=0
rec[37] when=1020361047841371 last_update=1020361047970818 pend=0
rec[36] when=1020361047709322 last_update=1020361047838818 pend=0
rec[35] when=1020361047577366 last_update=1020361047706818 pend=0
rec[34] when=1020361047445316 last_update=1020361047574818 pend=0
rec[33] when=1020361047313423 last_update=1020361047442818 pend=0
rec[32] when=1020361047181310 last_update=1020361047310818 pend=0
rec[31] when=1020361047049990 last_update=1020361047178818 pend=0
rec[30] when=1020361046917375 last_update=1020361047046818 pend=0
rec[29] when=1020361046785332 last_update=1020361046914818 pend=0
rec[28] when=1020361046653374 last_update=1020361046782818 pend=0
rec[27] when=1020361046521325 last_update=1020361046650818 pend=0
rec[26] when=1020361046389374 last_update=1020361046518818 pend=0
rec[25] when=1020361046257989 last_update=1020361046386818 pend=0
rec[24] when=1020361046125335 last_update=1020361046254818 pend=0
rec[23] when=1020361045993393 last_update=1020361046122818 pend=0
rec[22] when=1020361045861330 last_update=1020361045990818 pend=0
rec[21] when=1020361045729389 last_update=1020361045858818 pend=0
rec[20] when=1020361045602604 last_update=1020361045726818 pend=0
rec[19] when=1020361045471032 last_update=1020361045594818 pend=0
rec[18] when=1020361045333385 last_update=1020361045462818 pend=0
rec[17] when=1020361045201397 last_update=1020361045330818 pend=0
rec[16] when=1020361045074634 last_update=1020361045198818 pend=0
rec[15] when=1020361044941707 last_update=1020361045066818 pend=0
rec[14] when=1020361044805373 last_update=1020361044934818 pend=0
rec[13] when=1020361044673417 last_update=1020361044802818 pend=0
rec[12] when=1020361044547848 last_update=1020361044670818 pend=0
rec[11] when=1020361044409439 last_update=1020361044538818 pend=0
rec[10] when=1020361044277316 last_update=1020361044406818 pend=0
rec[9] when=1020361044150456 last_update=1020361044274818 pend=0
rec[8] when=1020361044013204 last_update=1020361044142818 pend=0
rec[7] when=1020361043881203 last_update=1020361044010818 pend=0
rec[6] when=1020361043749913 last_update=1020361043878818 pend=0
LAPIC: Too high counter_passed value: 0x7fffffffffff7289 |
0x7fffffffffff7289 (0x7fffffffffff7289) | 0x3a00351f0442c (0x3a00351f0d1a2)
enter start_apic_timer! vcpu 0
start_apic_timer: bus cycle is 1ns, now 0x0003a00351f37054, timer
initial count 0x8400, period 33792ns, expire @ 0x0003a00351f3f454.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] Fix almost infinite loop in APIC
  2009-01-20 12:09                         ` Alexander Graf
@ 2009-01-20 12:30                           ` Sheng Yang
  0 siblings, 0 replies; 30+ messages in thread
From: Sheng Yang @ 2009-01-20 12:30 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Marcelo Tosatti, kvm, avi, Kevin Wolf

On Tuesday 20 January 2009 20:09:23 Alexander Graf wrote:
> Sheng Yang wrote:
> > On Tuesday 20 January 2009 18:41:21 Alexander Graf wrote:
> >> Marcelo Tosatti wrote:
> >>> On Thu, Jan 15, 2009 at 03:20:06PM +0800, Sheng Yang wrote:
> >>>>> +	 * Since reinjection is not rate-limited, use the delay
> >>>>> + 	 * to inject the last interrupt as an estimate.
> >>>>> + 	 */
> >>>>> +	if (unlikely(atomic_read(&apic->timer.pending) > 0)) {
> >>>>> +		remaining = apic->timer.injection_delay;
> >>>>> +		if (ktime_to_ns(remaining) > apic->timer.period)
> >>>>> +			remaining = ns_to_ktime(apic->timer.period);
> >>>>> +        } else
> >>>>> +		remaining = hrtimer_expires_remaining(&apic->timer.dev);
> >>>>
> >>>> A little doubt...
> >>>>
> >>>> A: time_fire
> >>>> B: intr_post
> >>>> C: read TMCCT
> >>>>
> >>>> The sequence can be ABC or ACB.
> >>>>
> >>>> injection_delay = time(B) - time(A)
> >>>>
> >>>> So it didn't count time from read TMCCT... And guest get interrupt at
> >>>> time(B), not quite understand why time(B) - time(A) matters here...
> >>>
> >>> Its an estimate of the delay it takes to inject an interrupt to the
> >>> guest once fired. Its only used if there have been accumulated ones, so
> >>> ACB sequence with pending=0 will use hrtimer_expires_remaining().
> >>>
> >>>> I think the reasonable here means, this interval is usable later after
> >>>> the accumulated interrupts are injected. From this point of view, I
> >>>> think current solution is reasonable. It just assume the delayed
> >>>> interrupts have been injected.
> >>>>
> >>>> However, seriously, any value here is wrong, no elegant one.
> >>>
> >>> I agree.
> >>>
> >>>> But I still prefer to the current solution...
> >>>
> >>> Why? The proposed version is smaller and simpler than the current
> >>> one IMO, and also not vulnerable to whatever bug is causing now <
> >>> last_update.
> >>>
> >>> And more precise, since the current scheme uses interrupt injection
> >>> time as if it was "count-down restart" time, which is not true.
> >>>
> >>>> And here is not the really problem for now I think. The current
> >>>> mechanism is mostly OK, but where is the bug... We have either have a
> >>>> simple fix(e.g. if now < last_update, then return 0) or dig into it.
> >>>> Did it worth a try? Anyway, it would return a buggy result if we have
> >>>> pending interrupts...
> >>>
> >>> The overflow calculation is not necessary as discussed. Alexander, can
> >>> you please try the following? Sheng, do you have any other suggestion
> >>> to test?
> >>>
> >>> /proc/timer_list output of the host when ESX is running too.
> >>
> >> Sorry for the late reply. Here's the dmesg output and /proc/timer_list
> >> after the issue occured. I left the warning message in that I put there
> >> in case the value is too high:
> >>
> >>         counter_passed = div64_u64(ktime_to_ns(passed),
> >>                                    (APIC_BUS_CYCLE_NS *
> >> apic->timer.divide_count));
> >>
> >> +       if (counter_passed > 0x7f00000000000000) {
> >> +               /* If we're in here we probably encountered a bug! */
> >> +               printk(KERN_INFO "LAPIC: Too high counter_passed value:
> >> 0x%lx | 0x%lx (0x%lx) | 0x%lx (0x%lx)\n", counter_passed,
> >> ktime_to_ns(passed), passed, now.tv64, apic
> >> ->timer.last_update.tv64);
> >> +       }
> >> +
> >>         if (counter_passed > tmcct) {
> >>                 if (unlikely(!apic_lvtt_period(apic))) {
> >>                         /* one-shot timers stick at 0 until reset */
> >>
> >>
> >>
> >> start_apic_timer: bus cycle is 1ns, now 0x00039a6f377cfbc4, timer
> >> initial count 0x186a0, period 100000ns, expire @ 0x00039a6f377e8264.
> >> start_apic_timer: bus cycle is 1ns, now 0x00039a6f3a7b5aef, timer
> >> initial count 0x203a0, period 132000ns, expire @ 0x00039a6f3a7d5e8f.
> >> last_update = 1014860818546063 now = 1014860818426085
> >> rec[47] when=1014860811948512 last_update=1014860812078063 pend=0
> >> rec[46] when=1014860818417773 last_update=1014860818546063 pend=0
> >> rec[45] when=1014860818285454 last_update=1014860818414063 pend=0
> >> rec[44] when=1014860818153119 last_update=1014860818282063 pend=0
> >> rec[43] when=1014860818021746 last_update=1014860818150063 pend=0
> >> rec[42] when=1014860817889326 last_update=1014860818018063 pend=0
> >> rec[41] when=1014860817757297 last_update=1014860817886063 pend=0
> >> rec[40] when=1014860817625682 last_update=1014860817754063 pend=0
> >> rec[39] when=1014860817492484 last_update=1014860817622063 pend=0
> >> rec[38] when=1014860817360531 last_update=1014860817490063 pend=0
> >> rec[37] when=1014860817228489 last_update=1014860817358063 pend=0
> >> rec[36] when=1014860817101250 last_update=1014860817226063 pend=0
> >> rec[35] when=1014860816964378 last_update=1014860817094063 pend=0
> >> rec[34] when=1014860816832459 last_update=1014860816962063 pend=0
> >> rec[33] when=1014860816700514 last_update=1014860816830063 pend=0
> >> rec[32] when=1014860816568558 last_update=1014860816698063 pend=0
> >> rec[31] when=1014860816436510 last_update=1014860816566063 pend=0
> >> rec[30] when=1014860816305079 last_update=1014860816434063 pend=0
> >> rec[29] when=1014860816172560 last_update=1014860816302063 pend=0
> >> rec[28] when=1014860816040449 last_update=1014860816170063 pend=0
> >> rec[27] when=1014860815908500 last_update=1014860816038063 pend=0
> >> rec[26] when=1014860815776451 last_update=1014860815906063 pend=0
> >> rec[25] when=1014860815644499 last_update=1014860815774063 pend=0
> >> rec[24] when=1014860815513100 last_update=1014860815642063 pend=0
> >> rec[23] when=1014860815380476 last_update=1014860815510063 pend=0
> >> rec[22] when=1014860815248609 last_update=1014860815378063 pend=0
> >> rec[21] when=1014860815116494 last_update=1014860815246063 pend=0
> >> rec[20] when=1014860814984441 last_update=1014860815114063 pend=0
> >> rec[19] when=1014860814852498 last_update=1014860814982063 pend=0
> >> rec[18] when=1014860814721082 last_update=1014860814850063 pend=0
> >> rec[17] when=1014860814588467 last_update=1014860814718063 pend=0
> >> rec[16] when=1014860814456530 last_update=1014860814586063 pend=0
> >> rec[15] when=1014860814324452 last_update=1014860814454063 pend=0
> >> rec[14] when=1014860814192601 last_update=1014860814322063 pend=0
> >> rec[13] when=1014860814060489 last_update=1014860814190063 pend=0
> >> rec[12] when=1014860813928546 last_update=1014860814058063 pend=0
> >> rec[11] when=1014860813797571 last_update=1014860813926063 pend=0
> >> rec[10] when=1014860813664475 last_update=1014860813794063 pend=0
> >> rec[9] when=1014860813532520 last_update=1014860813662063 pend=0
> >> rec[8] when=1014860813400463 last_update=1014860813530063 pend=0
> >> rec[7] when=1014860813268519 last_update=1014860813398063 pend=0
> >> rec[6] when=1014860813136452 last_update=1014860813266063 pend=0
> >> rec[5] when=1014860813007312 last_update=1014860813134063 pend=0
> >> rec[4] when=1014860812872314 last_update=1014860813002063 pend=0
> >> rec[3] when=1014860812740791 last_update=1014860812870063 pend=0
> >> rec[2] when=1014860812652998 last_update=1014860812738063 pend=0
> >> rec[1] when=1014860812476526 last_update=1014860812606063 pend=0
> >> rec[0] when=1014860812344635 last_update=1014860812474063 pend=0
> >> rec[49] when=1014860812213098 last_update=1014860812342063 pend=0
> >> rec[48] when=1014860812080466 last_update=1014860812210063 pend=0
> >> LAPIC: Too high counter_passed value: 0x7ffffffffffe2b55 |
> >> 0x7ffffffffffe2b55 (0x7ffffffffffe2b55) | 0x39b02b29428e5
> >> (0x39b02b295fd8f) start_apic_timer: bus cycle is 1ns, now
> >> 0x00039b02b2998ada, timer initial count 0x8400, period 33792ns, expire @
> >> 0x00039b02b29a0eda.
> >
> > Thanks, Alex!
> >
> > When update time of last_update is 0x39B02B29428E5, and the
> > start_apic_timer at 0x00039b02b2998ada, the delta is 0x561F5 = 352,757,
> > very close. Looks like
> >
> > a window here?:
> >> static void start_apic_timer(struct kvm_lapic *apic)
> >> {
> >> 	ktime_t now = apic->timer.dev.base->get_time();
> >>
> >> 	apic->timer.last_update = now;
> >
> > [window?]
> >
> >> 	apic->timer.period = apic_get_reg(apic, APIC_TMICT) *
> >> 		    APIC_BUS_CYCLE_NS * apic->timer.divide_count;
> >> 	atomic_set(&apic->timer.pending, 0);
> >>
> >> 	if (!apic->timer.period)
> >> 		return;
> >>
> >> 	hrtimer_start(&apic->timer.dev,
> >> 		      ktime_add_ns(now, apic->timer.period),
> >> 		      HRTIMER_MODE_ABS);
> >>
> >> 	apic_debug("%s: bus cycle is %" PRId64 "ns, now 0x%016"
> >> 			   PRIx64 ", "
> >> 			   "timer initial count 0x%x, period %lldns, "
> >> 			   "expire @ 0x%016" PRIx64 ".\n", __func__,
> >> 			   APIC_BUS_CYCLE_NS, ktime_to_ns(now),
> >> 			   apic_get_reg(apic, APIC_TMICT),
> >> 			   apic->timer.period,
> >> 			   ktime_to_ns(ktime_add_ns(now,
> >> 					apic->timer.period)));
> >> }
> >
> > Looks like due to the period is reduced, but the last_update use old
> > value and got just updated... But guest vcpu B program vcpu A's lapic? Or
> > intr_post of vcpu B happened at vcpu A? Both seems unreasonable...
> >
> > Alex, can you help to add another line above to confirm the problem? e.g.
> >
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index afac68c..15f8ed5 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -653,6 +653,8 @@ static void start_apic_timer(struct kvm_lapic *apic)
> >  {
> >         ktime_t now = apic->timer.dev.base->get_time();
> >
> > +       printk("enter start_apic_timer! vcpu %d\n", apic->vcpu->vcpu_id);
> > +
> >         apic->timer.last_update = now;
> >
> >         apic->timer.period = apic_get_reg(apic, APIC_TMICT) *
> >
> > Also add a similar one at the beginning of print_last_update_record().
> >
> > Thanks! (And thanks for Marcelo's excellent debug patch! :) )
>
> The VMware ESX VM is UP, so I don't think we'll get anything from the
> vcpu_id apart from 0 :-). Nevertheless here's the dmesg output:
>
> enter start_apic_timer! vcpu 0
> enter start_apic_timer! vcpu 0
> start_apic_timer: bus cycle is 1ns, now 0x00039fb339714386, timer
> initial count 0x186a0, period 100000ns, expire @ 0x00039fb33972ca26.
> enter start_apic_timer! vcpu 0
> start_apic_timer: bus cycle is 1ns, now 0x00039fb33c58ebc2, timer
> initial count 0x203a0, period 132000ns, expire @ 0x00039fb33c5aef62.
> last_update = 1020361050214818 now = 1020361050178604
> enter print_last_update_record! vcpu 0
> rec[5] when=1020361043617361 last_update=1020361043746818 pend=0
> rec[4] when=1020361050088268 last_update=1020361050214818 pend=0
> rec[3] when=1020361049970143 last_update=1020361050082818 pend=0
> rec[2] when=1020361049822220 last_update=1020361049950818 pend=0
> rec[1] when=1020361049690023 last_update=1020361049818818 pend=0
> rec[0] when=1020361049559209 last_update=1020361049686818 pend=0
> rec[49] when=1020361049427469 last_update=1020361049554818 pend=0
> rec[48] when=1020361049294815 last_update=1020361049422818 pend=0
> rec[47] when=1020361049162401 last_update=1020361049290818 pend=0
> rec[46] when=1020361049030431 last_update=1020361049158818 pend=0
> rec[45] when=1020361048898352 last_update=1020361049026818 pend=0
> rec[44] when=1020361048765413 last_update=1020361048894818 pend=0
> rec[43] when=1020361048634310 last_update=1020361048762818 pend=0
> rec[42] when=1020361048501343 last_update=1020361048630818 pend=0
> rec[41] when=1020361048369388 last_update=1020361048498818 pend=0
> rec[40] when=1020361048241616 last_update=1020361048366818 pend=0
> rec[39] when=1020361048108017 last_update=1020361048234818 pend=0
> rec[38] when=1020361047973826 last_update=1020361048102818 pend=0
> rec[37] when=1020361047841371 last_update=1020361047970818 pend=0
> rec[36] when=1020361047709322 last_update=1020361047838818 pend=0
> rec[35] when=1020361047577366 last_update=1020361047706818 pend=0
> rec[34] when=1020361047445316 last_update=1020361047574818 pend=0
> rec[33] when=1020361047313423 last_update=1020361047442818 pend=0
> rec[32] when=1020361047181310 last_update=1020361047310818 pend=0
> rec[31] when=1020361047049990 last_update=1020361047178818 pend=0
> rec[30] when=1020361046917375 last_update=1020361047046818 pend=0
> rec[29] when=1020361046785332 last_update=1020361046914818 pend=0
> rec[28] when=1020361046653374 last_update=1020361046782818 pend=0
> rec[27] when=1020361046521325 last_update=1020361046650818 pend=0
> rec[26] when=1020361046389374 last_update=1020361046518818 pend=0
> rec[25] when=1020361046257989 last_update=1020361046386818 pend=0
> rec[24] when=1020361046125335 last_update=1020361046254818 pend=0
> rec[23] when=1020361045993393 last_update=1020361046122818 pend=0
> rec[22] when=1020361045861330 last_update=1020361045990818 pend=0
> rec[21] when=1020361045729389 last_update=1020361045858818 pend=0
> rec[20] when=1020361045602604 last_update=1020361045726818 pend=0
> rec[19] when=1020361045471032 last_update=1020361045594818 pend=0
> rec[18] when=1020361045333385 last_update=1020361045462818 pend=0
> rec[17] when=1020361045201397 last_update=1020361045330818 pend=0
> rec[16] when=1020361045074634 last_update=1020361045198818 pend=0
> rec[15] when=1020361044941707 last_update=1020361045066818 pend=0
> rec[14] when=1020361044805373 last_update=1020361044934818 pend=0
> rec[13] when=1020361044673417 last_update=1020361044802818 pend=0
> rec[12] when=1020361044547848 last_update=1020361044670818 pend=0
> rec[11] when=1020361044409439 last_update=1020361044538818 pend=0
> rec[10] when=1020361044277316 last_update=1020361044406818 pend=0
> rec[9] when=1020361044150456 last_update=1020361044274818 pend=0
> rec[8] when=1020361044013204 last_update=1020361044142818 pend=0
> rec[7] when=1020361043881203 last_update=1020361044010818 pend=0
> rec[6] when=1020361043749913 last_update=1020361043878818 pend=0
> LAPIC: Too high counter_passed value: 0x7fffffffffff7289 |
> 0x7fffffffffff7289 (0x7fffffffffff7289) | 0x3a00351f0442c (0x3a00351f0d1a2)
> enter start_apic_timer! vcpu 0
> start_apic_timer: bus cycle is 1ns, now 0x0003a00351f37054, timer
> initial count 0x8400, period 33792ns, expire @ 0x0003a00351f3f454.

Oh, you don't need to. :) I thought you are running on MP system... OK, seems 
it would become more easier. :)

Checked the data again, seems last_update is always ahead of "when", at about 
one period. That's not desired. Last_update should delay about one period...

Continue to check the code...

-- 
regards
Yang, Sheng

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] Fix almost infinite loop in APIC
  2009-01-20 10:41                     ` Alexander Graf
  2009-01-20 11:20                       ` Sheng Yang
@ 2009-01-20 13:43                       ` Sheng Yang
  2009-01-20 18:51                         ` Marcelo Tosatti
  1 sibling, 1 reply; 30+ messages in thread
From: Sheng Yang @ 2009-01-20 13:43 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm, Alexander Graf, avi, Kevin Wolf

On Tuesday 20 January 2009 18:41:21 Alexander Graf wrote:
> Marcelo Tosatti wrote:
> > On Thu, Jan 15, 2009 at 03:20:06PM +0800, Sheng Yang wrote:
> >>> +	 * Since reinjection is not rate-limited, use the delay
> >>> + 	 * to inject the last interrupt as an estimate.
> >>> + 	 */
> >>> +	if (unlikely(atomic_read(&apic->timer.pending) > 0)) {
> >>> +		remaining = apic->timer.injection_delay;
> >>> +		if (ktime_to_ns(remaining) > apic->timer.period)
> >>> +			remaining = ns_to_ktime(apic->timer.period);
> >>> +        } else
> >>> +		remaining = hrtimer_expires_remaining(&apic->timer.dev);
> >>
> >> A little doubt...
> >>
> >> A: time_fire
> >> B: intr_post
> >> C: read TMCCT
> >>
> >> The sequence can be ABC or ACB.
> >>
> >> injection_delay = time(B) - time(A)
> >>
> >> So it didn't count time from read TMCCT... And guest get interrupt at
> >> time(B), not quite understand why time(B) - time(A) matters here...
> >
> > Its an estimate of the delay it takes to inject an interrupt to the
> > guest once fired. Its only used if there have been accumulated ones, so
> > ACB sequence with pending=0 will use hrtimer_expires_remaining().
> >
> >> I think the reasonable here means, this interval is usable later after
> >> the accumulated interrupts are injected. From this point of view, I
> >> think current solution is reasonable. It just assume the delayed
> >> interrupts have been injected.
> >>
> >> However, seriously, any value here is wrong, no elegant one.
> >
> > I agree.
> >
> >> But I still prefer to the current solution...
> >
> > Why? The proposed version is smaller and simpler than the current
> > one IMO, and also not vulnerable to whatever bug is causing now <
> > last_update.
> >
> > And more precise, since the current scheme uses interrupt injection time
> > as if it was "count-down restart" time, which is not true.
> >
> >> And here is not the really problem for now I think. The current
> >> mechanism is mostly OK, but where is the bug... We have either have a
> >> simple fix(e.g. if now < last_update, then return 0) or dig into it. Did
> >> it worth a try? Anyway, it would return a buggy result if we have
> >> pending interrupts...
> >
> > The overflow calculation is not necessary as discussed. Alexander, can
> > you please try the following? Sheng, do you have any other suggestion to
> > test?
> >
> > /proc/timer_list output of the host when ESX is running too.
>
> Sorry for the late reply. Here's the dmesg output and /proc/timer_list
> after the issue occured. I left the warning message in that I put there
> in case the value is too high:
>
>         counter_passed = div64_u64(ktime_to_ns(passed),
>                                    (APIC_BUS_CYCLE_NS *
> apic->timer.divide_count));
>
> +       if (counter_passed > 0x7f00000000000000) {
> +               /* If we're in here we probably encountered a bug! */
> +               printk(KERN_INFO "LAPIC: Too high counter_passed value:
> 0x%lx | 0x%lx (0x%lx) | 0x%lx (0x%lx)\n", counter_passed,
> ktime_to_ns(passed), passed, now.tv64, apic
> ->timer.last_update.tv64);
> +       }
> +
>         if (counter_passed > tmcct) {
>                 if (unlikely(!apic_lvtt_period(apic))) {
>                         /* one-shot timers stick at 0 until reset */
>
>
>
> start_apic_timer: bus cycle is 1ns, now 0x00039a6f377cfbc4, timer
> initial count 0x186a0, period 100000ns, expire @ 0x00039a6f377e8264.
> start_apic_timer: bus cycle is 1ns, now 0x00039a6f3a7b5aef, timer
> initial count 0x203a0, period 132000ns, expire @ 0x00039a6f3a7d5e8f.
> last_update = 1014860818546063 now = 1014860818426085
> rec[47] when=1014860811948512 last_update=1014860812078063 pend=0
> rec[46] when=1014860818417773 last_update=1014860818546063 pend=0
> rec[45] when=1014860818285454 last_update=1014860818414063 pend=0
> rec[44] when=1014860818153119 last_update=1014860818282063 pend=0
> rec[43] when=1014860818021746 last_update=1014860818150063 pend=0
> rec[42] when=1014860817889326 last_update=1014860818018063 pend=0
> rec[41] when=1014860817757297 last_update=1014860817886063 pend=0
> rec[40] when=1014860817625682 last_update=1014860817754063 pend=0
> rec[39] when=1014860817492484 last_update=1014860817622063 pend=0
> rec[38] when=1014860817360531 last_update=1014860817490063 pend=0
> rec[37] when=1014860817228489 last_update=1014860817358063 pend=0
> rec[36] when=1014860817101250 last_update=1014860817226063 pend=0
> rec[35] when=1014860816964378 last_update=1014860817094063 pend=0
> rec[34] when=1014860816832459 last_update=1014860816962063 pend=0
> rec[33] when=1014860816700514 last_update=1014860816830063 pend=0
> rec[32] when=1014860816568558 last_update=1014860816698063 pend=0
> rec[31] when=1014860816436510 last_update=1014860816566063 pend=0
> rec[30] when=1014860816305079 last_update=1014860816434063 pend=0
> rec[29] when=1014860816172560 last_update=1014860816302063 pend=0
> rec[28] when=1014860816040449 last_update=1014860816170063 pend=0
> rec[27] when=1014860815908500 last_update=1014860816038063 pend=0
> rec[26] when=1014860815776451 last_update=1014860815906063 pend=0
> rec[25] when=1014860815644499 last_update=1014860815774063 pend=0
> rec[24] when=1014860815513100 last_update=1014860815642063 pend=0
> rec[23] when=1014860815380476 last_update=1014860815510063 pend=0
> rec[22] when=1014860815248609 last_update=1014860815378063 pend=0
> rec[21] when=1014860815116494 last_update=1014860815246063 pend=0
> rec[20] when=1014860814984441 last_update=1014860815114063 pend=0
> rec[19] when=1014860814852498 last_update=1014860814982063 pend=0
> rec[18] when=1014860814721082 last_update=1014860814850063 pend=0
> rec[17] when=1014860814588467 last_update=1014860814718063 pend=0
> rec[16] when=1014860814456530 last_update=1014860814586063 pend=0
> rec[15] when=1014860814324452 last_update=1014860814454063 pend=0
> rec[14] when=1014860814192601 last_update=1014860814322063 pend=0
> rec[13] when=1014860814060489 last_update=1014860814190063 pend=0
> rec[12] when=1014860813928546 last_update=1014860814058063 pend=0
> rec[11] when=1014860813797571 last_update=1014860813926063 pend=0
> rec[10] when=1014860813664475 last_update=1014860813794063 pend=0
> rec[9] when=1014860813532520 last_update=1014860813662063 pend=0
> rec[8] when=1014860813400463 last_update=1014860813530063 pend=0
> rec[7] when=1014860813268519 last_update=1014860813398063 pend=0
> rec[6] when=1014860813136452 last_update=1014860813266063 pend=0
> rec[5] when=1014860813007312 last_update=1014860813134063 pend=0
> rec[4] when=1014860812872314 last_update=1014860813002063 pend=0
> rec[3] when=1014860812740791 last_update=1014860812870063 pend=0
> rec[2] when=1014860812652998 last_update=1014860812738063 pend=0
> rec[1] when=1014860812476526 last_update=1014860812606063 pend=0
> rec[0] when=1014860812344635 last_update=1014860812474063 pend=0
> rec[49] when=1014860812213098 last_update=1014860812342063 pend=0
> rec[48] when=1014860812080466 last_update=1014860812210063 pend=0
> LAPIC: Too high counter_passed value: 0x7ffffffffffe2b55 |
> 0x7ffffffffffe2b55 (0x7ffffffffffe2b55) | 0x39b02b29428e5 (0x39b02b295fd8f)
> start_apic_timer: bus cycle is 1ns, now 0x00039b02b2998ada, timer
> initial count 0x8400, period 33792ns, expire @ 0x00039b02b29a0eda.
>
Marcelo, I realize some interesting things.

In fact, on my machine, when I measured the delta of now() and last_update at 
kvm_apic_timer_intr_post(), the delta was bigger and bigger...(and now() is 
always bigger for me, so still no clue for why data can be like above, 
last_update always ahead of when for about one period...).

Then I found something not good in original design - it ignored the interval 
between time fire and injection, so we got:

last_update = now() + n * period;

And the time we update last_update:

time = now() + n * period + n * interval.

So last_update time is more and more inaccurate... Though it was revised by 
tmcct function, it's still not a good way to go.

Then I understand your purpose more.

+       if (unlikely(atomic_read(&apic->timer.pending) > 0)) {
+               remaining = apic->timer.injection_delay;
+               if (ktime_to_ns(remaining) > apic->timer.period)
+                       remaining = ns_to_ktime(apic->timer.period);
+        } else
+               remaining = hrtimer_expires_remaining(&apic->timer.dev);

And about your patch, how about take interval between intr_post() and read 
tmcct in to account as well? That can keep the consistent with 
hrtimer_get_remaining() in the read tmcct. And I think if remaining > period, 
remaining = remain % period maybe more reasonable here.

How do you think?

-- 
regards
Yang, Sheng




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] Fix almost infinite loop in APIC
  2009-01-20 13:43                       ` Sheng Yang
@ 2009-01-20 18:51                         ` Marcelo Tosatti
  2009-01-21  2:40                           ` Sheng Yang
  0 siblings, 1 reply; 30+ messages in thread
From: Marcelo Tosatti @ 2009-01-20 18:51 UTC (permalink / raw)
  To: Sheng Yang; +Cc: kvm, Alexander Graf, avi, Kevin Wolf

Hi Sheng,

On Tue, Jan 20, 2009 at 09:43:15PM +0800, Sheng Yang wrote:
> Marcelo, I realize some interesting things.
> 
> In fact, on my machine, when I measured the delta of now() and last_update at 
> kvm_apic_timer_intr_post(), the delta was bigger and bigger...(and now() is 
> always bigger for me, so still no clue for why data can be like above, 
> last_update always ahead of when for about one period...).
> 
> Then I found something not good in original design - it ignored the interval 
> between time fire and injection, so we got:
> 
> last_update = now() + n * period;
> 
> And the time we update last_update:
> 
> time = now() + n * period + n * interval.
> 
> So last_update time is more and more inaccurate... Though it was revised by 
> tmcct function, it's still not a good way to go.
> 
> Then I understand your purpose more.
> 
> +       if (unlikely(atomic_read(&apic->timer.pending) > 0)) {
> +               remaining = apic->timer.injection_delay;
> +               if (ktime_to_ns(remaining) > apic->timer.period)
> +                       remaining = ns_to_ktime(apic->timer.period);
> +        } else
> +               remaining = hrtimer_expires_remaining(&apic->timer.dev);
> 
> And about your patch, how about take interval between intr_post() and read 
> tmcct in to account as well? 

You mean to sum up interval between intr_post() and read tmcct into
injection_delay? 

I don't get it. Can you be more descriptive please?

> That can keep the consistent with hrtimer_get_remaining() in the read 
> tmcct.

> And I think if remaining > period, 
> remaining = remain % period maybe more reasonable here.

Yes, thats better.

> How do you think?
> 
> -- 
> regards
> Yang, Sheng
> 
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] Fix almost infinite loop in APIC
  2009-01-20 18:51                         ` Marcelo Tosatti
@ 2009-01-21  2:40                           ` Sheng Yang
  2009-01-21  4:23                             ` Marcelo Tosatti
  0 siblings, 1 reply; 30+ messages in thread
From: Sheng Yang @ 2009-01-21  2:40 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm, Alexander Graf, avi, Kevin Wolf

On Wednesday 21 January 2009 02:51:39 Marcelo Tosatti wrote:
> Hi Sheng,
>
> On Tue, Jan 20, 2009 at 09:43:15PM +0800, Sheng Yang wrote:
> > Marcelo, I realize some interesting things.
> >
> > In fact, on my machine, when I measured the delta of now() and
> > last_update at kvm_apic_timer_intr_post(), the delta was bigger and
> > bigger...(and now() is always bigger for me, so still no clue for why
> > data can be like above, last_update always ahead of when for about one
> > period...).
> >
> > Then I found something not good in original design - it ignored the
> > interval between time fire and injection, so we got:
> >
> > last_update = now() + n * period;
> >
> > And the time we update last_update:
> >
> > time = now() + n * period + n * interval.
> >
> > So last_update time is more and more inaccurate... Though it was revised
> > by tmcct function, it's still not a good way to go.
> >
> > Then I understand your purpose more.
> >
> > +       if (unlikely(atomic_read(&apic->timer.pending) > 0)) {
> > +               remaining = apic->timer.injection_delay;
> > +               if (ktime_to_ns(remaining) > apic->timer.period)
> > +                       remaining = ns_to_ktime(apic->timer.period);
> > +        } else
> > +               remaining = hrtimer_expires_remaining(&apic->timer.dev);
> >
> > And about your patch, how about take interval between intr_post() and
> > read tmcct in to account as well?
>
> You mean to sum up interval between intr_post() and read tmcct into
> injection_delay?
>
> I don't get it. Can you be more descriptive please?

Mind continue to change...

A: time_fire
B: intr_post
C: read TMCCT

I think the sequence can be AABC or AACB(I just consider AABC last night...). 
We have determined to return A->C as TMCCT, and the interval between different 
A is period which can be ignored. So how about simply return 
hrtimer_expires_remaining() for all tmcct reading including pending ones?

(And sorry for I still can't get why you return injection delay for TMCCT...)

-- 
regards
Yang, Sheng



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] Fix almost infinite loop in APIC
  2009-01-21  2:40                           ` Sheng Yang
@ 2009-01-21  4:23                             ` Marcelo Tosatti
  2009-01-21  5:11                               ` Sheng Yang
  0 siblings, 1 reply; 30+ messages in thread
From: Marcelo Tosatti @ 2009-01-21  4:23 UTC (permalink / raw)
  To: Sheng Yang; +Cc: kvm, Alexander Graf, avi, Kevin Wolf

On Wed, Jan 21, 2009 at 10:40:49AM +0800, Sheng Yang wrote:
> Mind continue to change...
> 
> A: time_fire
> B: intr_post
> C: read TMCCT
> 
> I think the sequence can be AABC or AACB(I just consider AABC last night...). 
> We have determined to return A->C as TMCCT, and the interval between different 
> A is period which can be ignored. So how about simply return 
> hrtimer_expires_remaining() for all tmcct reading including pending ones?
> 
> (And sorry for I still can't get why you return injection delay for TMCCT...)

You're right. How about this.


diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index c019b8e..cf17ed5 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -87,13 +87,6 @@ void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_inject_pending_timer_irqs);
 
-void kvm_timer_intr_post(struct kvm_vcpu *vcpu, int vec)
-{
-	kvm_apic_timer_intr_post(vcpu, vec);
-	/* TODO: PIT, RTC etc. */
-}
-EXPORT_SYMBOL_GPL(kvm_timer_intr_post);
-
 void __kvm_migrate_timers(struct kvm_vcpu *vcpu)
 {
 	__kvm_migrate_apic_timer(vcpu);
diff --git a/arch/x86/kvm/irq.h b/arch/x86/kvm/irq.h
index 2bf32a0..82579ee 100644
--- a/arch/x86/kvm/irq.h
+++ b/arch/x86/kvm/irq.h
@@ -89,7 +89,6 @@ static inline int irqchip_in_kernel(struct kvm *kvm)
 
 void kvm_pic_reset(struct kvm_kpic_state *s);
 
-void kvm_timer_intr_post(struct kvm_vcpu *vcpu, int vec);
 void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu);
 void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu);
 void kvm_apic_nmi_wd_deliver(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index afac68c..b66f71e 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -511,52 +511,22 @@ static void apic_send_ipi(struct kvm_lapic *apic)
 
 static u32 apic_get_tmcct(struct kvm_lapic *apic)
 {
-	u64 counter_passed;
-	ktime_t passed, now;
+	ktime_t remaining;
+	s64 ns;
 	u32 tmcct;
 
 	ASSERT(apic != NULL);
 
-	now = apic->timer.dev.base->get_time();
-	tmcct = apic_get_reg(apic, APIC_TMICT);
-
 	/* if initial count is 0, current count should also be 0 */
-	if (tmcct == 0)
+	if (apic_get_reg(apic, APIC_TMICT) == 0)
 		return 0;
 
-	if (unlikely(ktime_to_ns(now) <=
-		ktime_to_ns(apic->timer.last_update))) {
-		/* Wrap around */
-		passed = ktime_add(( {
-				    (ktime_t) {
-				    .tv64 = KTIME_MAX -
-				    (apic->timer.last_update).tv64}; }
-				   ), now);
-		apic_debug("time elapsed\n");
-	} else
-		passed = ktime_sub(now, apic->timer.last_update);
-
-	counter_passed = div64_u64(ktime_to_ns(passed),
-				   (APIC_BUS_CYCLE_NS * apic->timer.divide_count));
-
-	if (counter_passed > tmcct) {
-		if (unlikely(!apic_lvtt_period(apic))) {
-			/* one-shot timers stick at 0 until reset */
-			tmcct = 0;
-		} else {
-			/*
-			 * periodic timers reset to APIC_TMICT when they
-			 * hit 0. The while loop simulates this happening N
-			 * times. (counter_passed %= tmcct) would also work,
-			 * but might be slower or not work on 32-bit??
-			 */
-			while (counter_passed > tmcct)
-				counter_passed -= tmcct;
-			tmcct -= counter_passed;
-		}
-	} else {
-		tmcct -= counter_passed;
-	}
+	remaining = hrtimer_expires_remaining(&apic->timer.dev);
+	if (remaining.tv64 < 0)
+		remaining = ktime_set(0, 0);
+
+	ns = ktime_to_ns(remaining) % apic->timer.period;
+	tmcct = div64_u64(ns, (APIC_BUS_CYCLE_NS * apic->timer.divide_count));
 
 	return tmcct;
 }
@@ -653,8 +623,6 @@ static void start_apic_timer(struct kvm_lapic *apic)
 {
 	ktime_t now = apic->timer.dev.base->get_time();
 
-	apic->timer.last_update = now;
-
 	apic->timer.period = apic_get_reg(apic, APIC_TMICT) *
 		    APIC_BUS_CYCLE_NS * apic->timer.divide_count;
 	atomic_set(&apic->timer.pending, 0);
@@ -1110,16 +1078,6 @@ void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu)
 	}
 }
 
-void kvm_apic_timer_intr_post(struct kvm_vcpu *vcpu, int vec)
-{
-	struct kvm_lapic *apic = vcpu->arch.apic;
-
-	if (apic && apic_lvt_vector(apic, APIC_LVTT) == vec)
-		apic->timer.last_update = ktime_add_ns(
-				apic->timer.last_update,
-				apic->timer.period);
-}
-
 int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
 {
 	int vector = kvm_apic_has_interrupt(vcpu);
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 8185888..45ab6ee 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -12,7 +12,6 @@ struct kvm_lapic {
 		atomic_t pending;
 		s64 period;	/* unit: ns */
 		u32 divide_count;
-		ktime_t last_update;
 		struct hrtimer dev;
 	} timer;
 	struct kvm_vcpu *vcpu;
@@ -42,7 +41,6 @@ void kvm_set_apic_base(struct kvm_vcpu *vcpu, u64 data);
 void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu);
 int kvm_lapic_enabled(struct kvm_vcpu *vcpu);
 int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu);
-void kvm_apic_timer_intr_post(struct kvm_vcpu *vcpu, int vec);
 
 void kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr);
 void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 14e517e..db5021b 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2305,7 +2305,6 @@ static void svm_intr_assist(struct kvm_vcpu *vcpu)
 	/* Okay, we can deliver the interrupt: grab it and update PIC state. */
 	intr_vector = kvm_cpu_get_interrupt(vcpu);
 	svm_inject_irq(svm, intr_vector);
-	kvm_timer_intr_post(vcpu, intr_vector);
 out:
 	update_cr8_intercept(vcpu);
 }
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9b56d21..25aaf11 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3377,7 +3377,6 @@ static void vmx_intr_assist(struct kvm_vcpu *vcpu)
 	}
 	if (vcpu->arch.interrupt.pending) {
 		vmx_inject_irq(vcpu, vcpu->arch.interrupt.nr);
-		kvm_timer_intr_post(vcpu, vcpu->arch.interrupt.nr);
 		if (kvm_cpu_has_interrupt(vcpu))
 			enable_irq_window(vcpu);
 	}

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH] Fix almost infinite loop in APIC
  2009-01-21  4:23                             ` Marcelo Tosatti
@ 2009-01-21  5:11                               ` Sheng Yang
  2009-01-21 15:07                                 ` Marcelo Tosatti
  0 siblings, 1 reply; 30+ messages in thread
From: Sheng Yang @ 2009-01-21  5:11 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm, Alexander Graf, avi, Kevin Wolf

On Wednesday 21 January 2009 12:23:30 Marcelo Tosatti wrote:
> On Wed, Jan 21, 2009 at 10:40:49AM +0800, Sheng Yang wrote:
> > Mind continue to change...
> >
> > A: time_fire
> > B: intr_post
> > C: read TMCCT
> >
> > I think the sequence can be AABC or AACB(I just consider AABC last
> > night...). We have determined to return A->C as TMCCT, and the interval
> > between different A is period which can be ignored. So how about simply
> > return
> > hrtimer_expires_remaining() for all tmcct reading including pending ones?
> >
> > (And sorry for I still can't get why you return injection delay for
> > TMCCT...)
>
> You're right. How about this.
>
>
> diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
> index c019b8e..cf17ed5 100644
> --- a/arch/x86/kvm/irq.c
> +++ b/arch/x86/kvm/irq.c
> @@ -87,13 +87,6 @@ void kvm_inject_pending_timer_irqs(struct kvm_vcpu
> *vcpu) }
>  EXPORT_SYMBOL_GPL(kvm_inject_pending_timer_irqs);
>
> -void kvm_timer_intr_post(struct kvm_vcpu *vcpu, int vec)
> -{
> -	kvm_apic_timer_intr_post(vcpu, vec);
> -	/* TODO: PIT, RTC etc. */
> -}
> -EXPORT_SYMBOL_GPL(kvm_timer_intr_post);
> -
>  void __kvm_migrate_timers(struct kvm_vcpu *vcpu)
>  {
>  	__kvm_migrate_apic_timer(vcpu);
> diff --git a/arch/x86/kvm/irq.h b/arch/x86/kvm/irq.h
> index 2bf32a0..82579ee 100644
> --- a/arch/x86/kvm/irq.h
> +++ b/arch/x86/kvm/irq.h
> @@ -89,7 +89,6 @@ static inline int irqchip_in_kernel(struct kvm *kvm)
>
>  void kvm_pic_reset(struct kvm_kpic_state *s);
>
> -void kvm_timer_intr_post(struct kvm_vcpu *vcpu, int vec);
>  void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu);
>  void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu);
>  void kvm_apic_nmi_wd_deliver(struct kvm_vcpu *vcpu);
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index afac68c..b66f71e 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -511,52 +511,22 @@ static void apic_send_ipi(struct kvm_lapic *apic)
>
>  static u32 apic_get_tmcct(struct kvm_lapic *apic)
>  {
> -	u64 counter_passed;
> -	ktime_t passed, now;
> +	ktime_t remaining;
> +	s64 ns;
>  	u32 tmcct;
>
>  	ASSERT(apic != NULL);
>
> -	now = apic->timer.dev.base->get_time();
> -	tmcct = apic_get_reg(apic, APIC_TMICT);
> -
>  	/* if initial count is 0, current count should also be 0 */
> -	if (tmcct == 0)
> +	if (apic_get_reg(apic, APIC_TMICT) == 0)
>  		return 0;
>
> -	if (unlikely(ktime_to_ns(now) <=
> -		ktime_to_ns(apic->timer.last_update))) {
> -		/* Wrap around */
> -		passed = ktime_add(( {
> -				    (ktime_t) {
> -				    .tv64 = KTIME_MAX -
> -				    (apic->timer.last_update).tv64}; }
> -				   ), now);
> -		apic_debug("time elapsed\n");
> -	} else
> -		passed = ktime_sub(now, apic->timer.last_update);
> -
> -	counter_passed = div64_u64(ktime_to_ns(passed),
> -				   (APIC_BUS_CYCLE_NS * apic->timer.divide_count));
> -
> -	if (counter_passed > tmcct) {
> -		if (unlikely(!apic_lvtt_period(apic))) {
> -			/* one-shot timers stick at 0 until reset */
> -			tmcct = 0;
> -		} else {
> -			/*
> -			 * periodic timers reset to APIC_TMICT when they
> -			 * hit 0. The while loop simulates this happening N
> -			 * times. (counter_passed %= tmcct) would also work,
> -			 * but might be slower or not work on 32-bit??
> -			 */
> -			while (counter_passed > tmcct)
> -				counter_passed -= tmcct;
> -			tmcct -= counter_passed;
> -		}
> -	} else {
> -		tmcct -= counter_passed;
> -	}
> +	remaining = hrtimer_expires_remaining(&apic->timer.dev);
> +	if (remaining.tv64 < 0)

Use ktime_to_ns() macro is better.

The remaining parts are fine with me. But please do more test. :)

Thanks for work!

-- 
regards
Yang, Sheng

> +		remaining = ktime_set(0, 0);
> +
> +	ns = ktime_to_ns(remaining) % apic->timer.period;
> +	tmcct = div64_u64(ns, (APIC_BUS_CYCLE_NS * apic->timer.divide_count));
>
>  	return tmcct;
>  }
> @@ -653,8 +623,6 @@ static void start_apic_timer(struct kvm_lapic *apic)
>  {
>  	ktime_t now = apic->timer.dev.base->get_time();
>
> -	apic->timer.last_update = now;
> -
>  	apic->timer.period = apic_get_reg(apic, APIC_TMICT) *
>  		    APIC_BUS_CYCLE_NS * apic->timer.divide_count;
>  	atomic_set(&apic->timer.pending, 0);
> @@ -1110,16 +1078,6 @@ void kvm_inject_apic_timer_irqs(struct kvm_vcpu
> *vcpu) }
>  }
>
> -void kvm_apic_timer_intr_post(struct kvm_vcpu *vcpu, int vec)
> -{
> -	struct kvm_lapic *apic = vcpu->arch.apic;
> -
> -	if (apic && apic_lvt_vector(apic, APIC_LVTT) == vec)
> -		apic->timer.last_update = ktime_add_ns(
> -				apic->timer.last_update,
> -				apic->timer.period);
> -}
> -
>  int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
>  {
>  	int vector = kvm_apic_has_interrupt(vcpu);
> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> index 8185888..45ab6ee 100644
> --- a/arch/x86/kvm/lapic.h
> +++ b/arch/x86/kvm/lapic.h
> @@ -12,7 +12,6 @@ struct kvm_lapic {
>  		atomic_t pending;
>  		s64 period;	/* unit: ns */
>  		u32 divide_count;
> -		ktime_t last_update;
>  		struct hrtimer dev;
>  	} timer;
>  	struct kvm_vcpu *vcpu;
> @@ -42,7 +41,6 @@ void kvm_set_apic_base(struct kvm_vcpu *vcpu, u64 data);
>  void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu);
>  int kvm_lapic_enabled(struct kvm_vcpu *vcpu);
>  int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu);
> -void kvm_apic_timer_intr_post(struct kvm_vcpu *vcpu, int vec);
>
>  void kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr);
>  void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu);
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 14e517e..db5021b 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -2305,7 +2305,6 @@ static void svm_intr_assist(struct kvm_vcpu *vcpu)
>  	/* Okay, we can deliver the interrupt: grab it and update PIC state. */
>  	intr_vector = kvm_cpu_get_interrupt(vcpu);
>  	svm_inject_irq(svm, intr_vector);
> -	kvm_timer_intr_post(vcpu, intr_vector);
>  out:
>  	update_cr8_intercept(vcpu);
>  }
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 9b56d21..25aaf11 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -3377,7 +3377,6 @@ static void vmx_intr_assist(struct kvm_vcpu *vcpu)
>  	}
>  	if (vcpu->arch.interrupt.pending) {
>  		vmx_inject_irq(vcpu, vcpu->arch.interrupt.nr);
> -		kvm_timer_intr_post(vcpu, vcpu->arch.interrupt.nr);
>  		if (kvm_cpu_has_interrupt(vcpu))
>  			enable_irq_window(vcpu);
>  	}


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] Fix almost infinite loop in APIC
  2009-01-21  5:11                               ` Sheng Yang
@ 2009-01-21 15:07                                 ` Marcelo Tosatti
  2009-01-21 16:01                                   ` Alexander Graf
                                                     ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Marcelo Tosatti @ 2009-01-21 15:07 UTC (permalink / raw)
  To: Sheng Yang, Alexander Graf; +Cc: kvm, avi, Kevin Wolf

On Wed, Jan 21, 2009 at 01:11:23PM +0800, Sheng Yang wrote:
> Use ktime_to_ns() macro is better.
> 
> The remaining parts are fine with me. But please do more test. :)
> 
> Thanks for work!

Alexander, can you please confirm this works for you, thanks.


KVM: x86: fix LAPIC pending count calculation

Simplify LAPIC TMCCT calculation by using hrtimer provided
function to query remaining time until expiration.

Fixes host hang with nested ESX.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>


diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index c019b8e..cf17ed5 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -87,13 +87,6 @@ void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_inject_pending_timer_irqs);
 
-void kvm_timer_intr_post(struct kvm_vcpu *vcpu, int vec)
-{
-	kvm_apic_timer_intr_post(vcpu, vec);
-	/* TODO: PIT, RTC etc. */
-}
-EXPORT_SYMBOL_GPL(kvm_timer_intr_post);
-
 void __kvm_migrate_timers(struct kvm_vcpu *vcpu)
 {
 	__kvm_migrate_apic_timer(vcpu);
diff --git a/arch/x86/kvm/irq.h b/arch/x86/kvm/irq.h
index 2bf32a0..82579ee 100644
--- a/arch/x86/kvm/irq.h
+++ b/arch/x86/kvm/irq.h
@@ -89,7 +89,6 @@ static inline int irqchip_in_kernel(struct kvm *kvm)
 
 void kvm_pic_reset(struct kvm_kpic_state *s);
 
-void kvm_timer_intr_post(struct kvm_vcpu *vcpu, int vec);
 void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu);
 void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu);
 void kvm_apic_nmi_wd_deliver(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index afac68c..d8adc50 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -511,52 +511,22 @@ static void apic_send_ipi(struct kvm_lapic *apic)
 
 static u32 apic_get_tmcct(struct kvm_lapic *apic)
 {
-	u64 counter_passed;
-	ktime_t passed, now;
+	ktime_t remaining;
+	s64 ns;
 	u32 tmcct;
 
 	ASSERT(apic != NULL);
 
-	now = apic->timer.dev.base->get_time();
-	tmcct = apic_get_reg(apic, APIC_TMICT);
-
 	/* if initial count is 0, current count should also be 0 */
-	if (tmcct == 0)
+	if (apic_get_reg(apic, APIC_TMICT) == 0)
 		return 0;
 
-	if (unlikely(ktime_to_ns(now) <=
-		ktime_to_ns(apic->timer.last_update))) {
-		/* Wrap around */
-		passed = ktime_add(( {
-				    (ktime_t) {
-				    .tv64 = KTIME_MAX -
-				    (apic->timer.last_update).tv64}; }
-				   ), now);
-		apic_debug("time elapsed\n");
-	} else
-		passed = ktime_sub(now, apic->timer.last_update);
-
-	counter_passed = div64_u64(ktime_to_ns(passed),
-				   (APIC_BUS_CYCLE_NS * apic->timer.divide_count));
-
-	if (counter_passed > tmcct) {
-		if (unlikely(!apic_lvtt_period(apic))) {
-			/* one-shot timers stick at 0 until reset */
-			tmcct = 0;
-		} else {
-			/*
-			 * periodic timers reset to APIC_TMICT when they
-			 * hit 0. The while loop simulates this happening N
-			 * times. (counter_passed %= tmcct) would also work,
-			 * but might be slower or not work on 32-bit??
-			 */
-			while (counter_passed > tmcct)
-				counter_passed -= tmcct;
-			tmcct -= counter_passed;
-		}
-	} else {
-		tmcct -= counter_passed;
-	}
+	remaining = hrtimer_expires_remaining(&apic->timer.dev);
+	if (ktime_to_ns(remaining) < 0)
+		remaining = ktime_set(0, 0);
+
+	ns = ktime_to_ns(remaining) % apic->timer.period;
+	tmcct = div64_u64(ns, (APIC_BUS_CYCLE_NS * apic->timer.divide_count));
 
 	return tmcct;
 }
@@ -653,8 +623,6 @@ static void start_apic_timer(struct kvm_lapic *apic)
 {
 	ktime_t now = apic->timer.dev.base->get_time();
 
-	apic->timer.last_update = now;
-
 	apic->timer.period = apic_get_reg(apic, APIC_TMICT) *
 		    APIC_BUS_CYCLE_NS * apic->timer.divide_count;
 	atomic_set(&apic->timer.pending, 0);
@@ -1110,16 +1078,6 @@ void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu)
 	}
 }
 
-void kvm_apic_timer_intr_post(struct kvm_vcpu *vcpu, int vec)
-{
-	struct kvm_lapic *apic = vcpu->arch.apic;
-
-	if (apic && apic_lvt_vector(apic, APIC_LVTT) == vec)
-		apic->timer.last_update = ktime_add_ns(
-				apic->timer.last_update,
-				apic->timer.period);
-}
-
 int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
 {
 	int vector = kvm_apic_has_interrupt(vcpu);
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 8185888..45ab6ee 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -12,7 +12,6 @@ struct kvm_lapic {
 		atomic_t pending;
 		s64 period;	/* unit: ns */
 		u32 divide_count;
-		ktime_t last_update;
 		struct hrtimer dev;
 	} timer;
 	struct kvm_vcpu *vcpu;
@@ -42,7 +41,6 @@ void kvm_set_apic_base(struct kvm_vcpu *vcpu, u64 data);
 void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu);
 int kvm_lapic_enabled(struct kvm_vcpu *vcpu);
 int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu);
-void kvm_apic_timer_intr_post(struct kvm_vcpu *vcpu, int vec);
 
 void kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr);
 void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 14e517e..db5021b 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2305,7 +2305,6 @@ static void svm_intr_assist(struct kvm_vcpu *vcpu)
 	/* Okay, we can deliver the interrupt: grab it and update PIC state. */
 	intr_vector = kvm_cpu_get_interrupt(vcpu);
 	svm_inject_irq(svm, intr_vector);
-	kvm_timer_intr_post(vcpu, intr_vector);
 out:
 	update_cr8_intercept(vcpu);
 }
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9b56d21..25aaf11 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3377,7 +3377,6 @@ static void vmx_intr_assist(struct kvm_vcpu *vcpu)
 	}
 	if (vcpu->arch.interrupt.pending) {
 		vmx_inject_irq(vcpu, vcpu->arch.interrupt.nr);
-		kvm_timer_intr_post(vcpu, vcpu->arch.interrupt.nr);
 		if (kvm_cpu_has_interrupt(vcpu))
 			enable_irq_window(vcpu);
 	}

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH] Fix almost infinite loop in APIC
  2009-01-21 15:07                                 ` Marcelo Tosatti
@ 2009-01-21 16:01                                   ` Alexander Graf
  2009-01-21 16:03                                     ` Alexander Graf
  2009-01-21 16:18                                   ` Alexander Graf
  2009-01-22 13:08                                   ` Avi Kivity
  2 siblings, 1 reply; 30+ messages in thread
From: Alexander Graf @ 2009-01-21 16:01 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Sheng Yang, kvm, avi, Kevin Wolf

Marcelo Tosatti wrote:
> On Wed, Jan 21, 2009 at 01:11:23PM +0800, Sheng Yang wrote:
>   
>> Use ktime_to_ns() macro is better.
>>
>> The remaining parts are fine with me. But please do more test. :)
>>
>> Thanks for work!
>>     
>
> Alexander, can you please confirm this works for you, thanks.
>   

I don't seem to have hrtimer_expires_remaining in my 2.6.27 kernel.
Actually, 2.6.27 hrtimer doesn't even know have the _expires struct member.

Alex


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] Fix almost infinite loop in APIC
  2009-01-21 16:01                                   ` Alexander Graf
@ 2009-01-21 16:03                                     ` Alexander Graf
  0 siblings, 0 replies; 30+ messages in thread
From: Alexander Graf @ 2009-01-21 16:03 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Sheng Yang, kvm, avi, Kevin Wolf

Alexander Graf wrote:
> Marcelo Tosatti wrote:
>   
>> On Wed, Jan 21, 2009 at 01:11:23PM +0800, Sheng Yang wrote:
>>   
>>     
>>> Use ktime_to_ns() macro is better.
>>>
>>> The remaining parts are fine with me. But please do more test. :)
>>>
>>> Thanks for work!
>>>     
>>>       
>> Alexander, can you please confirm this works for you, thanks.
>>   
>>     
>
> I don't seem to have hrtimer_expires_remaining in my 2.6.27 kernel.
> Actually, 2.6.27 hrtimer doesn't even know have the _expires struct member.
>   

Scratch that, I'm trying to work around it. It has an "expires" struct
member.

Alex

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] Fix almost infinite loop in APIC
  2009-01-21 15:07                                 ` Marcelo Tosatti
  2009-01-21 16:01                                   ` Alexander Graf
@ 2009-01-21 16:18                                   ` Alexander Graf
  2009-01-21 16:55                                     ` Marcelo Tosatti
  2009-01-22 13:08                                   ` Avi Kivity
  2 siblings, 1 reply; 30+ messages in thread
From: Alexander Graf @ 2009-01-21 16:18 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Sheng Yang, kvm, avi, Kevin Wolf

Marcelo Tosatti wrote:
> On Wed, Jan 21, 2009 at 01:11:23PM +0800, Sheng Yang wrote:
>   
>> Use ktime_to_ns() macro is better.
>>
>> The remaining parts are fine with me. But please do more test. :)
>>
>> Thanks for work!
>>     
>
> Alexander, can you please confirm this works for you, thanks.
>   

Looks good so far. I can't really tell if it's doing the right thing,
but at least it's not crashing the host anymore ;). And the VM does come
up (somewhat).

Signed-off-by: Alexander Graf <agraf@suse.de>

>
> KVM: x86: fix LAPIC pending count calculation
>
> Simplify LAPIC TMCCT calculation by using hrtimer provided
> function to query remaining time until expiration.
>
> Fixes host hang with nested ESX.
>
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>
>
> diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
> index c019b8e..cf17ed5 100644
> --- a/arch/x86/kvm/irq.c
> +++ b/arch/x86/kvm/irq.c
> @@ -87,13 +87,6 @@ void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu)
>  }
>  EXPORT_SYMBOL_GPL(kvm_inject_pending_timer_irqs);
>  
> -void kvm_timer_intr_post(struct kvm_vcpu *vcpu, int vec)
> -{
> -	kvm_apic_timer_intr_post(vcpu, vec);
> -	/* TODO: PIT, RTC etc. */
> -}
> -EXPORT_SYMBOL_GPL(kvm_timer_intr_post);
> -
>  void __kvm_migrate_timers(struct kvm_vcpu *vcpu)
>  {
>  	__kvm_migrate_apic_timer(vcpu);
> diff --git a/arch/x86/kvm/irq.h b/arch/x86/kvm/irq.h
> index 2bf32a0..82579ee 100644
> --- a/arch/x86/kvm/irq.h
> +++ b/arch/x86/kvm/irq.h
> @@ -89,7 +89,6 @@ static inline int irqchip_in_kernel(struct kvm *kvm)
>  
>  void kvm_pic_reset(struct kvm_kpic_state *s);
>  
> -void kvm_timer_intr_post(struct kvm_vcpu *vcpu, int vec);
>  void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu);
>  void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu);
>  void kvm_apic_nmi_wd_deliver(struct kvm_vcpu *vcpu);
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index afac68c..d8adc50 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -511,52 +511,22 @@ static void apic_send_ipi(struct kvm_lapic *apic)
>  
>  static u32 apic_get_tmcct(struct kvm_lapic *apic)
>  {
> -	u64 counter_passed;
> -	ktime_t passed, now;
> +	ktime_t remaining;
> +	s64 ns;
>  	u32 tmcct;
>  
>  	ASSERT(apic != NULL);
>  
> -	now = apic->timer.dev.base->get_time();
> -	tmcct = apic_get_reg(apic, APIC_TMICT);
> -
>  	/* if initial count is 0, current count should also be 0 */
> -	if (tmcct == 0)
> +	if (apic_get_reg(apic, APIC_TMICT) == 0)
>  		return 0;
>  
> -	if (unlikely(ktime_to_ns(now) <=
> -		ktime_to_ns(apic->timer.last_update))) {
> -		/* Wrap around */
> -		passed = ktime_add(( {
> -				    (ktime_t) {
> -				    .tv64 = KTIME_MAX -
> -				    (apic->timer.last_update).tv64}; }
> -				   ), now);
> -		apic_debug("time elapsed\n");
> -	} else
> -		passed = ktime_sub(now, apic->timer.last_update);
> -
> -	counter_passed = div64_u64(ktime_to_ns(passed),
> -				   (APIC_BUS_CYCLE_NS * apic->timer.divide_count));
> -
> -	if (counter_passed > tmcct) {
> -		if (unlikely(!apic_lvtt_period(apic))) {
> -			/* one-shot timers stick at 0 until reset */
> -			tmcct = 0;
> -		} else {
> -			/*
> -			 * periodic timers reset to APIC_TMICT when they
> -			 * hit 0. The while loop simulates this happening N
> -			 * times. (counter_passed %= tmcct) would also work,
> -			 * but might be slower or not work on 32-bit??
> -			 */
> -			while (counter_passed > tmcct)
> -				counter_passed -= tmcct;
> -			tmcct -= counter_passed;
> -		}
> -	} else {
> -		tmcct -= counter_passed;
> -	}
> +	remaining = hrtimer_expires_remaining(&apic->timer.dev);
> +	if (ktime_to_ns(remaining) < 0)
> +		remaining = ktime_set(0, 0);
> +
> +	ns = ktime_to_ns(remaining) % apic->timer.period;
> +	tmcct = div64_u64(ns, (APIC_BUS_CYCLE_NS * apic->timer.divide_count));
>  
>  	return tmcct;
>  }
> @@ -653,8 +623,6 @@ static void start_apic_timer(struct kvm_lapic *apic)
>  {
>  	ktime_t now = apic->timer.dev.base->get_time();
>  
> -	apic->timer.last_update = now;
> -
>  	apic->timer.period = apic_get_reg(apic, APIC_TMICT) *
>  		    APIC_BUS_CYCLE_NS * apic->timer.divide_count;
>  	atomic_set(&apic->timer.pending, 0);
> @@ -1110,16 +1078,6 @@ void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu)
>  	}
>  }
>  
> -void kvm_apic_timer_intr_post(struct kvm_vcpu *vcpu, int vec)
> -{
> -	struct kvm_lapic *apic = vcpu->arch.apic;
> -
> -	if (apic && apic_lvt_vector(apic, APIC_LVTT) == vec)
> -		apic->timer.last_update = ktime_add_ns(
> -				apic->timer.last_update,
> -				apic->timer.period);
> -}
> -
>  int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
>  {
>  	int vector = kvm_apic_has_interrupt(vcpu);
> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> index 8185888..45ab6ee 100644
> --- a/arch/x86/kvm/lapic.h
> +++ b/arch/x86/kvm/lapic.h
> @@ -12,7 +12,6 @@ struct kvm_lapic {
>  		atomic_t pending;
>  		s64 period;	/* unit: ns */
>  		u32 divide_count;
> -		ktime_t last_update;
>  		struct hrtimer dev;
>  	} timer;
>  	struct kvm_vcpu *vcpu;
> @@ -42,7 +41,6 @@ void kvm_set_apic_base(struct kvm_vcpu *vcpu, u64 data);
>  void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu);
>  int kvm_lapic_enabled(struct kvm_vcpu *vcpu);
>  int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu);
> -void kvm_apic_timer_intr_post(struct kvm_vcpu *vcpu, int vec);
>  
>  void kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr);
>  void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu);
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 14e517e..db5021b 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -2305,7 +2305,6 @@ static void svm_intr_assist(struct kvm_vcpu *vcpu)
>  	/* Okay, we can deliver the interrupt: grab it and update PIC state. */
>  	intr_vector = kvm_cpu_get_interrupt(vcpu);
>  	svm_inject_irq(svm, intr_vector);
> -	kvm_timer_intr_post(vcpu, intr_vector);
>  out:
>  	update_cr8_intercept(vcpu);
>  }
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 9b56d21..25aaf11 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -3377,7 +3377,6 @@ static void vmx_intr_assist(struct kvm_vcpu *vcpu)
>  	}
>  	if (vcpu->arch.interrupt.pending) {
>  		vmx_inject_irq(vcpu, vcpu->arch.interrupt.nr);
> -		kvm_timer_intr_post(vcpu, vcpu->arch.interrupt.nr);
>  		if (kvm_cpu_has_interrupt(vcpu))
>  			enable_irq_window(vcpu);
>  	}
>   


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] Fix almost infinite loop in APIC
  2009-01-21 16:18                                   ` Alexander Graf
@ 2009-01-21 16:55                                     ` Marcelo Tosatti
  0 siblings, 0 replies; 30+ messages in thread
From: Marcelo Tosatti @ 2009-01-21 16:55 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Sheng Yang, kvm, avi, Kevin Wolf

On Wed, Jan 21, 2009 at 05:18:00PM +0100, Alexander Graf wrote:
> Marcelo Tosatti wrote:
> > On Wed, Jan 21, 2009 at 01:11:23PM +0800, Sheng Yang wrote:
> >   
> >> Use ktime_to_ns() macro is better.
> >>
> >> The remaining parts are fine with me. But please do more test. :)
> >>
> >> Thanks for work!
> >>     
> >
> > Alexander, can you please confirm this works for you, thanks.
> >   
> 
> Looks good so far. I can't really tell if it's doing the right thing,
> but at least it's not crashing the host anymore ;). And the VM does come
> up (somewhat).

What might be going wrong there is if there are accumulated interrupts
to be injected (pending > 0). You might want to check that in
apic_get_tmcct.

> Signed-off-by: Alexander Graf <agraf@suse.de>

Thanks


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] Fix almost infinite loop in APIC
  2009-01-21 15:07                                 ` Marcelo Tosatti
  2009-01-21 16:01                                   ` Alexander Graf
  2009-01-21 16:18                                   ` Alexander Graf
@ 2009-01-22 13:08                                   ` Avi Kivity
  2009-01-23 17:58                                     ` Alex Williamson
  2 siblings, 1 reply; 30+ messages in thread
From: Avi Kivity @ 2009-01-22 13:08 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Sheng Yang, Alexander Graf, kvm, Kevin Wolf

Marcelo Tosatti wrote:
> On Wed, Jan 21, 2009 at 01:11:23PM +0800, Sheng Yang wrote:
>   
>> Use ktime_to_ns() macro is better.
>>
>> The remaining parts are fine with me. But please do more test. :)
>>
>> Thanks for work!
>>     
>
> Alexander, can you please confirm this works for you, thanks.
>
>
> KVM: x86: fix LAPIC pending count calculation
>
> Simplify LAPIC TMCCT calculation by using hrtimer provided
> function to query remaining time until expiration.
>
> Fixes host hang with nested ESX.
>
>   

Applied, thanks.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Re: [PATCH] Fix almost infinite loop in APIC
  2009-01-22 13:08                                   ` Avi Kivity
@ 2009-01-23 17:58                                     ` Alex Williamson
  0 siblings, 0 replies; 30+ messages in thread
From: Alex Williamson @ 2009-01-23 17:58 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, Sheng Yang, Alexander Graf, kvm, Kevin Wolf

On Thu, 2009-01-22 at 15:08 +0200, Avi Kivity wrote:
> Marcelo Tosatti wrote:
> > On Wed, Jan 21, 2009 at 01:11:23PM +0800, Sheng Yang wrote:
> >   
> >> Use ktime_to_ns() macro is better.
> >>
> >> The remaining parts are fine with me. But please do more test. :)
> >>
> >> Thanks for work!
> >>     
> >
> > Alexander, can you please confirm this works for you, thanks.
> >
> >
> > KVM: x86: fix LAPIC pending count calculation
> >
> > Simplify LAPIC TMCCT calculation by using hrtimer provided
> > function to query remaining time until expiration.
> >
> > Fixes host hang with nested ESX.
> >
> >   
> 
> Applied, thanks.

This seems to introduce a "Unknown symbol __moddi3" in the kvm module on
a 32bit build.

Alex

-- 
Alex Williamson                             HP Open Source & Linux Org.


^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2009-01-23 17:58 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-01-08 16:36 [PATCH] Fix almost infinite loop in APIC Alexander Graf
2009-01-09  6:34 ` Sheng Yang
2009-01-09 10:49   ` Alexander Graf
2009-01-09 12:57   ` Alexander Graf
2009-01-10 11:21     ` Sheng Yang
2009-01-11  4:55       ` Marcelo Tosatti
2009-01-13  7:47         ` Sheng Yang
2009-01-13 22:01           ` Marcelo Tosatti
2009-01-14  9:17             ` Sheng Yang
2009-01-14 17:03               ` Marcelo Tosatti
2009-01-15  7:20                 ` Sheng Yang
2009-01-16  5:01                   ` Marcelo Tosatti
2009-01-20 10:41                     ` Alexander Graf
2009-01-20 11:20                       ` Sheng Yang
2009-01-20 12:09                         ` Alexander Graf
2009-01-20 12:30                           ` Sheng Yang
2009-01-20 13:43                       ` Sheng Yang
2009-01-20 18:51                         ` Marcelo Tosatti
2009-01-21  2:40                           ` Sheng Yang
2009-01-21  4:23                             ` Marcelo Tosatti
2009-01-21  5:11                               ` Sheng Yang
2009-01-21 15:07                                 ` Marcelo Tosatti
2009-01-21 16:01                                   ` Alexander Graf
2009-01-21 16:03                                     ` Alexander Graf
2009-01-21 16:18                                   ` Alexander Graf
2009-01-21 16:55                                     ` Marcelo Tosatti
2009-01-22 13:08                                   ` Avi Kivity
2009-01-23 17:58                                     ` Alex Williamson
2009-01-10 11:25     ` Sheng Yang
2009-01-10 11:28     ` Sheng Yang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.