linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH v4 2/3] sched/cputime: Fix prev steal time accouting during cpu hotplug
       [not found] ` <SG2PR02MB15507073C531BAA82E9C8E67805D0@SG2PR02MB1550.apcprd02.prod.outlook.com>
@ 2016-06-07 10:34   ` Paolo Bonzini
       [not found]     ` <SG2PR02MB15505ACED7653E515DE21620805D0@SG2PR02MB1550.apcprd02.prod.outlook.com>
  0 siblings, 1 reply; 9+ messages in thread
From: Paolo Bonzini @ 2016-06-07 10:34 UTC (permalink / raw)
  To: Wanpeng Li, linux-kernel, kvm
  Cc: Ingo Molnar, Peter Zijlstra (Intel),
	Rik van Riel, Thomas Gleixner, Frederic Weisbecker,
	Radim Krčmář



On 07/06/2016 10:00, Wanpeng Li wrote:
> From: Wanpeng Li <wanpeng.li@hotmail.com>
> 
> Commit e9532e69b8d1 ("sched/cputime: Fix steal time accounting vs. CPU hotplug") 
> set rq->prev_* to 0 after a cpu hotplug comes back in order to fix the scenario:
> 
> | steal is smaller than rq->prev_steal_time we end up with an insane large
> | value which then gets added to rq->prev_steal_time, resulting in a permanent
> | wreckage of the accounting.
> 
> However, it is still buggy.
> 
> rq->prev_steal_time = 0:
> 
> As Rik pointed out:
> 
> | setting rq->prev_irq_time to 0 in the guest, and then getting a giant value from 
> | the host, could result in a very large of steal_jiffies.
> 
> rq->prev_steal_time_rq = 0:
> 
> | steal = paravirt_steal_clock(cpu_of(rq));
> | steal -= rq->prev_steal_time_rq;
> |
> | if (unlikely(steal > delta))
> |	 steal = delta;
> |
> | rq->prev_steal_time_rq += steal;
> | delta -= steal;
> |
> | rq->clock_task += delta;
> 
> steal is a giant value and rq->prev_steal_time_rq is 0, rq->prev_steal_time_rq 
> grows in delta granularity, rq->clock_task can't ramp up until rq->prev_steal_time_rq 
> catches up steal clock since delta value will be 0 after reducing steal time from 
> normal execution time. That's why I obersved that cpuhg/1-12 continue running 
> until rq->prev_steal_time_rq catches up steal clock timestamp.
> 
> I believe rq->prev_irq_time has similar issue. So this patch fix it by setting 
> rq->prev_* to current irq time and steal clock timestamp after a cpu hotplug 
> comes back.

I'm not sure this patch is necessary.  Instead you could just revert
commit e9532e69b8d1.  The previous patch obviously makes it unnecessary
to reset rq->prev_steal_time and rq->prev_steal_time_rq, and the reset
of rq->prev_irq_time looks like a no-op to me.

Paolo

> Fixes: 'commit e9532e69b8d1 ("sched/cputime: Fix steal time accounting vs. CPU hotplug")'
> Acked-by: Rik van Riel <riel@redhat.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> ---
>  kernel/sched/sched.h | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 72f1f30..e6758af 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1813,12 +1813,12 @@ static inline void cpufreq_trigger_update(u64 time) {}
>  static inline void account_reset_rq(struct rq *rq)
>  {
>  #ifdef CONFIG_IRQ_TIME_ACCOUNTING
> -	rq->prev_irq_time = 0;
> +	rq->prev_irq_time = irq_time_read(cpu_of(rq));
>  #endif
>  #ifdef CONFIG_PARAVIRT
> -	rq->prev_steal_time = 0;
> +	rq->prev_steal_time = paravirt_steal_clock(cpu_of(rq));
>  #endif
>  #ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
> -	rq->prev_steal_time_rq = 0;
> +	rq->prev_steal_time_rq = paravirt_steal_clock(cpu_of(rq));
>  #endif
>  }
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4 3/3] sched/cputime: Add steal time support to full dynticks CPU time accounting
       [not found] ` <SG2PR02MB15501BB06CAE47E4406FF7AD805D0@SG2PR02MB1550.apcprd02.prod.outlook.com>
@ 2016-06-07 10:47   ` Paolo Bonzini
  2016-06-07 11:23     ` Wanpeng Li
  2016-06-07 12:15     ` Wanpeng Li
  0 siblings, 2 replies; 9+ messages in thread
From: Paolo Bonzini @ 2016-06-07 10:47 UTC (permalink / raw)
  To: Wanpeng Li, linux-kernel, kvm
  Cc: Ingo Molnar, Peter Zijlstra (Intel),
	Rik van Riel, Thomas Gleixner, Frederic Weisbecker,
	Radim Krčmář



On 07/06/2016 10:00, Wanpeng Li wrote:
> From: Wanpeng Li <wanpeng.li@hotmail.com>
> 
> This patch adds guest steal-time support to full dynticks CPU
> time accounting. After the following commit:
> 
> ff9a9b4c4334 ("sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy granularity")
> 
> ... time sampling became jiffy based, even if it's still listened
> to ring boundaries, so steal_account_process_tick() is reused
> to account how many 'ticks' are stolen-time, after the last accumulation.

I still have no idea how to parse this.  What are "ring boundaries"?
Rik, can you suggest a better commit message?

> Suggested-and-Reviewed-by: Rik van Riel <riel@redhat.com>

Please split Suggested-by and Reviewed-by.

> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
> index 75f98c5..9ff036b 100644
> --- a/kernel/sched/cputime.c
> +++ b/kernel/sched/cputime.c
> @@ -257,7 +257,7 @@ void account_idle_time(cputime_t cputime)
>  		cpustat[CPUTIME_IDLE] += (__force u64) cputime;
>  }
>  
> -static __always_inline bool steal_account_process_tick(void)
> +static __always_inline unsigned long steal_account_process_tick(void)
>  {
>  #ifdef CONFIG_PARAVIRT
>  	if (static_key_false(&paravirt_steal_enabled)) {
> @@ -279,7 +279,7 @@ static __always_inline bool steal_account_process_tick(void)
>  		return steal_jiffies;
>  	}
>  #endif
> -	return false;
> +	return 0;
>  }
>  
>  /*
> @@ -691,9 +691,13 @@ static cputime_t get_vtime_delta(struct task_struct *tsk)
>  
>  static void __vtime_account_system(struct task_struct *tsk)
>  {
> -	cputime_t delta_cpu = get_vtime_delta(tsk);
> +	cputime_t delta_time = get_vtime_delta(tsk);
> +	cputime_t steal_time = jiffies_to_cputime(steal_account_process_tick());
>  
> -	account_system_time(tsk, irq_count(), delta_cpu, cputime_to_scaled(delta_cpu));
> +	if (steal_time < delta_time) {
> +		delta_time -= steal_time;
> +		account_system_time(tsk, irq_count(), delta_time, cputime_to_scaled(delta_time));
> +	}
>  }
>  
>  void vtime_account_system(struct task_struct *tsk)
> @@ -718,13 +722,18 @@ void vtime_gen_account_irq_exit(struct task_struct *tsk)
>  
>  void vtime_account_user(struct task_struct *tsk)
>  {
> -	cputime_t delta_cpu;
> +	cputime_t delta_time, steal_time;
>  
>  	write_seqcount_begin(&tsk->vtime_seqcount);
>  	tsk->vtime_snap_whence = VTIME_SYS;
>  	if (vtime_delta(tsk)) {
> -		delta_cpu = get_vtime_delta(tsk);
> -		account_user_time(tsk, delta_cpu, cputime_to_scaled(delta_cpu));
> +		delta_time = get_vtime_delta(tsk);
> +		steal_time = jiffies_to_cputime(steal_account_process_tick());
> +
> +		if (steal_time < delta_time) {
> +			delta_time -= steal_time;
> +			account_user_time(tsk, delta_time, cputime_to_scaled(delta_time));
> +		}
>  	}
>  	write_seqcount_end(&tsk->vtime_seqcount);
>  }
> 

You're adding almost the same code to two callers of get_vtime_delta out
of three.  I don't know the vtime accounting code very well, but why
doesn't the same apply to account_idle_time?

If it does, you should instead change get_vtime_delta to process steal
time and subtract it from the result.

Secondarily, when can it happen that steal_time > delta_time?

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4 3/3] sched/cputime: Add steal time support to full dynticks CPU time accounting
  2016-06-07 10:47   ` [PATCH v4 3/3] sched/cputime: Add steal time support to full dynticks CPU time accounting Paolo Bonzini
@ 2016-06-07 11:23     ` Wanpeng Li
  2016-06-07 12:15     ` Wanpeng Li
  1 sibling, 0 replies; 9+ messages in thread
From: Wanpeng Li @ 2016-06-07 11:23 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Wanpeng Li, linux-kernel, kvm, Ingo Molnar,
	Peter Zijlstra (Intel),
	Rik van Riel, Thomas Gleixner, Frederic Weisbecker,
	Radim Krčmář

2016-06-07 18:47 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>
>
> On 07/06/2016 10:00, Wanpeng Li wrote:
>> From: Wanpeng Li <wanpeng.li@hotmail.com>
>>
>> This patch adds guest steal-time support to full dynticks CPU
>> time accounting. After the following commit:
>>
>> ff9a9b4c4334 ("sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy granularity")
>>
>> ... time sampling became jiffy based, even if it's still listened
>> to ring boundaries, so steal_account_process_tick() is reused
>> to account how many 'ticks' are stolen-time, after the last accumulation.
>
> I still have no idea how to parse this.  What are "ring boundaries"?
> Rik, can you suggest a better commit message?

It is original from this slides.
http://ertl.jp/~shinpei/conf/ospert13/slides/FredericWeisbecker.pdf,
slide 28.

>
>> Suggested-and-Reviewed-by: Rik van Riel <riel@redhat.com>
>
> Please split Suggested-by and Reviewed-by.
>
>> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
>> index 75f98c5..9ff036b 100644
>> --- a/kernel/sched/cputime.c
>> +++ b/kernel/sched/cputime.c
>> @@ -257,7 +257,7 @@ void account_idle_time(cputime_t cputime)
>>               cpustat[CPUTIME_IDLE] += (__force u64) cputime;
>>  }
>>
>> -static __always_inline bool steal_account_process_tick(void)
>> +static __always_inline unsigned long steal_account_process_tick(void)
>>  {
>>  #ifdef CONFIG_PARAVIRT
>>       if (static_key_false(&paravirt_steal_enabled)) {
>> @@ -279,7 +279,7 @@ static __always_inline bool steal_account_process_tick(void)
>>               return steal_jiffies;
>>       }
>>  #endif
>> -     return false;
>> +     return 0;
>>  }
>>
>>  /*
>> @@ -691,9 +691,13 @@ static cputime_t get_vtime_delta(struct task_struct *tsk)
>>
>>  static void __vtime_account_system(struct task_struct *tsk)
>>  {
>> -     cputime_t delta_cpu = get_vtime_delta(tsk);
>> +     cputime_t delta_time = get_vtime_delta(tsk);
>> +     cputime_t steal_time = jiffies_to_cputime(steal_account_process_tick());
>>
>> -     account_system_time(tsk, irq_count(), delta_cpu, cputime_to_scaled(delta_cpu));
>> +     if (steal_time < delta_time) {
>> +             delta_time -= steal_time;
>> +             account_system_time(tsk, irq_count(), delta_time, cputime_to_scaled(delta_time));
>> +     }
>>  }
>>
>>  void vtime_account_system(struct task_struct *tsk)
>> @@ -718,13 +722,18 @@ void vtime_gen_account_irq_exit(struct task_struct *tsk)
>>
>>  void vtime_account_user(struct task_struct *tsk)
>>  {
>> -     cputime_t delta_cpu;
>> +     cputime_t delta_time, steal_time;
>>
>>       write_seqcount_begin(&tsk->vtime_seqcount);
>>       tsk->vtime_snap_whence = VTIME_SYS;
>>       if (vtime_delta(tsk)) {
>> -             delta_cpu = get_vtime_delta(tsk);
>> -             account_user_time(tsk, delta_cpu, cputime_to_scaled(delta_cpu));
>> +             delta_time = get_vtime_delta(tsk);
>> +             steal_time = jiffies_to_cputime(steal_account_process_tick());
>> +
>> +             if (steal_time < delta_time) {
>> +                     delta_time -= steal_time;
>> +                     account_user_time(tsk, delta_time, cputime_to_scaled(delta_time));
>> +             }
>>       }
>>       write_seqcount_end(&tsk->vtime_seqcount);
>>  }
>>
>
> You're adding almost the same code to two callers of get_vtime_delta out
> of three.  I don't know the vtime accounting code very well, but why
> doesn't the same apply to account_idle_time?
>
> If it does, you should instead change get_vtime_delta to process steal
> time and subtract it from the result.
>
> Secondarily, when can it happen that steal_time > delta_time?
>
> Thanks,
>
> Paolo



-- 
Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Fw: [PATCH v4 2/3] sched/cputime: Fix prev steal time accouting during cpu hotplug
       [not found]     ` <SG2PR02MB15505ACED7653E515DE21620805D0@SG2PR02MB1550.apcprd02.prod.outlook.com>
@ 2016-06-07 11:50       ` Wanpeng Li
  2016-06-07 11:53         ` Paolo Bonzini
  0 siblings, 1 reply; 9+ messages in thread
From: Wanpeng Li @ 2016-06-07 11:50 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krcmar, Wanpeng Li, Ingo Molnar,
	Peter Zijlstra (Intel),
	Rik van Riel, Thomas Gleixner, Frederic Weisbecker

2016-06-07 19:41 GMT+08:00 Wanpeng Li <wanpeng.li@hotmail.com>:
> On 07/06/2016 10:00, Wanpeng Li wrote:
>
>> From: Wanpeng Li <wanpeng.li@hotmail.com>
>>
>> Commit e9532e69b8d1 ("sched/cputime: Fix steal time accounting vs. CPU
>> hotplug")
>> set rq->prev_* to 0 after a cpu hotplug comes back in order to fix the
>> scenario:
>>
>> | steal is smaller than rq->prev_steal_time we end up with an insane large
>> | value which then gets added to rq->prev_steal_time, resulting in a
>> permanent
>> | wreckage of the accounting.
>>
>> However, it is still buggy.
>>
>> rq->prev_steal_time = 0:
>>
>> As Rik pointed out:
>>
>> | setting rq->prev_irq_time to 0 in the guest, and then getting a giant
>> value from
>> | the host, could result in a very large of steal_jiffies.
>>
>> rq->prev_steal_time_rq = 0:
>>
>> | steal = paravirt_steal_clock(cpu_of(rq));
>> | steal -= rq->prev_steal_time_rq;
>> |
>> | if (unlikely(steal > delta))
>> |      steal = delta;
>> |
>> | rq->prev_steal_time_rq += steal;
>> | delta -= steal;
>> |
>> | rq->clock_task += delta;
>>
>> steal is a giant value and rq->prev_steal_time_rq is 0,
>> rq->prev_steal_time_rq
>> grows in delta granularity, rq->clock_task can't ramp up until
>> rq->prev_steal_time_rq
>> catches up steal clock since delta value will be 0 after reducing steal
>> time from
>> normal execution time. That's why I obersved that cpuhg/1-12 continue
>> running
>> until rq->prev_steal_time_rq catches up steal clock timestamp.
>>
>> I believe rq->prev_irq_time has similar issue. So this patch fix it by
>> setting
>> rq->prev_* to current irq time and steal clock timestamp after a cpu
>> hotplug
>> comes back.
>
> I'm not sure this patch is necessary.  Instead you could just revert
> commit e9532e69b8d1.  The previous patch obviously makes it unnecessary
> to reset rq->prev_steal_time and rq->prev_steal_time_rq, and the reset
> of rq->prev_irq_time looks like a no-op to me.

The reason why I'm not just simple revert it is that commit mentioned
"steal is smaller than rq->prev_steal_time we end up with an insane
large value which then gets added to rq->prev_steal_time, resulting in
a permanent wreckage of the accounting." Though I didn't meet such
scenario. So I just do what that commit really want to do.

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Fw: [PATCH v4 2/3] sched/cputime: Fix prev steal time accouting during cpu hotplug
  2016-06-07 11:50       ` Fw: " Wanpeng Li
@ 2016-06-07 11:53         ` Paolo Bonzini
  0 siblings, 0 replies; 9+ messages in thread
From: Paolo Bonzini @ 2016-06-07 11:53 UTC (permalink / raw)
  To: Wanpeng Li, linux-kernel, kvm
  Cc: Radim Krcmar, Wanpeng Li, Ingo Molnar, Peter Zijlstra (Intel),
	Rik van Riel, Thomas Gleixner, Frederic Weisbecker



On 07/06/2016 13:50, Wanpeng Li wrote:
>> > I'm not sure this patch is necessary.  Instead you could just revert
>> > commit e9532e69b8d1.  The previous patch obviously makes it unnecessary
>> > to reset rq->prev_steal_time and rq->prev_steal_time_rq, and the reset
>> > of rq->prev_irq_time looks like a no-op to me.
> The reason why I'm not just simple revert it is that commit mentioned
> "steal is smaller than rq->prev_steal_time we end up with an insane
> large value which then gets added to rq->prev_steal_time, resulting in
> a permanent wreckage of the accounting."

With this patch, you go back to having underflow if steal is smaller
than rq->prev_steal_time.  The point is that it should never be smaller;
it was only smaller because of the bug that you are fixing in patch 1.

Thanks,

Paolo

 Though I didn't meet such
> scenario. So I just do what that commit really want to do.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4 3/3] sched/cputime: Add steal time support to full dynticks CPU time accounting
  2016-06-07 10:47   ` [PATCH v4 3/3] sched/cputime: Add steal time support to full dynticks CPU time accounting Paolo Bonzini
  2016-06-07 11:23     ` Wanpeng Li
@ 2016-06-07 12:15     ` Wanpeng Li
  2016-06-07 12:34       ` Paolo Bonzini
  1 sibling, 1 reply; 9+ messages in thread
From: Wanpeng Li @ 2016-06-07 12:15 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Wanpeng Li, linux-kernel, kvm, Ingo Molnar,
	Peter Zijlstra (Intel),
	Rik van Riel, Thomas Gleixner, Frederic Weisbecker,
	Radim Krčmář

2016-06-07 18:47 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>
>
> On 07/06/2016 10:00, Wanpeng Li wrote:
>> From: Wanpeng Li <wanpeng.li@hotmail.com>
>>
>> This patch adds guest steal-time support to full dynticks CPU
>> time accounting. After the following commit:
>>
>> ff9a9b4c4334 ("sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy granularity")
>>
>> ... time sampling became jiffy based, even if it's still listened
>> to ring boundaries, so steal_account_process_tick() is reused
>> to account how many 'ticks' are stolen-time, after the last accumulation.
>
> I still have no idea how to parse this.  What are "ring boundaries"?
> Rik, can you suggest a better commit message?
>
>> Suggested-and-Reviewed-by: Rik van Riel <riel@redhat.com>
>
> Please split Suggested-by and Reviewed-by.
>
>> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
>> index 75f98c5..9ff036b 100644
>> --- a/kernel/sched/cputime.c
>> +++ b/kernel/sched/cputime.c
>> @@ -257,7 +257,7 @@ void account_idle_time(cputime_t cputime)
>>               cpustat[CPUTIME_IDLE] += (__force u64) cputime;
>>  }
>>
>> -static __always_inline bool steal_account_process_tick(void)
>> +static __always_inline unsigned long steal_account_process_tick(void)
>>  {
>>  #ifdef CONFIG_PARAVIRT
>>       if (static_key_false(&paravirt_steal_enabled)) {
>> @@ -279,7 +279,7 @@ static __always_inline bool steal_account_process_tick(void)
>>               return steal_jiffies;
>>       }
>>  #endif
>> -     return false;
>> +     return 0;
>>  }
>>
>>  /*
>> @@ -691,9 +691,13 @@ static cputime_t get_vtime_delta(struct task_struct *tsk)
>>
>>  static void __vtime_account_system(struct task_struct *tsk)
>>  {
>> -     cputime_t delta_cpu = get_vtime_delta(tsk);
>> +     cputime_t delta_time = get_vtime_delta(tsk);
>> +     cputime_t steal_time = jiffies_to_cputime(steal_account_process_tick());
>>
>> -     account_system_time(tsk, irq_count(), delta_cpu, cputime_to_scaled(delta_cpu));
>> +     if (steal_time < delta_time) {
>> +             delta_time -= steal_time;
>> +             account_system_time(tsk, irq_count(), delta_time, cputime_to_scaled(delta_time));
>> +     }
>>  }
>>
>>  void vtime_account_system(struct task_struct *tsk)
>> @@ -718,13 +722,18 @@ void vtime_gen_account_irq_exit(struct task_struct *tsk)
>>
>>  void vtime_account_user(struct task_struct *tsk)
>>  {
>> -     cputime_t delta_cpu;
>> +     cputime_t delta_time, steal_time;
>>
>>       write_seqcount_begin(&tsk->vtime_seqcount);
>>       tsk->vtime_snap_whence = VTIME_SYS;
>>       if (vtime_delta(tsk)) {
>> -             delta_cpu = get_vtime_delta(tsk);
>> -             account_user_time(tsk, delta_cpu, cputime_to_scaled(delta_cpu));
>> +             delta_time = get_vtime_delta(tsk);
>> +             steal_time = jiffies_to_cputime(steal_account_process_tick());
>> +
>> +             if (steal_time < delta_time) {
>> +                     delta_time -= steal_time;
>> +                     account_user_time(tsk, delta_time, cputime_to_scaled(delta_time));
>> +             }
>>       }
>>       write_seqcount_end(&tsk->vtime_seqcount);
>>  }
>>
>
> You're adding almost the same code to two callers of get_vtime_delta out
> of three.  I don't know the vtime accounting code very well, but why
> doesn't the same apply to account_idle_time?

St stuff is accounted when vCPUs(tasks on host) are enqueued in rb
trees of pCPUs, which means that they are ready to run until they
finially reach CPUs. However, when vCPUs are idle, they will be
dequeued from rb trees and the time will be not accounted as st.

>
> If it does, you should instead change get_vtime_delta to process steal
> time and subtract it from the result.
>
> Secondarily, when can it happen that steal_time > delta_time?

Rik explanation it when he reply to v1.
http://www.gossamer-threads.com/lists/linux/kernel/2441175?do=post_view_threaded#2441175

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4 3/3] sched/cputime: Add steal time support to full dynticks CPU time accounting
  2016-06-07 12:15     ` Wanpeng Li
@ 2016-06-07 12:34       ` Paolo Bonzini
  0 siblings, 0 replies; 9+ messages in thread
From: Paolo Bonzini @ 2016-06-07 12:34 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Wanpeng Li, linux-kernel, kvm, Ingo Molnar,
	Peter Zijlstra (Intel),
	Rik van Riel, Thomas Gleixner, Frederic Weisbecker,
	Radim Krčmář



On 07/06/2016 14:15, Wanpeng Li wrote:
>> >
>> > You're adding almost the same code to two callers of get_vtime_delta out
>> > of three.  I don't know the vtime accounting code very well, but why
>> > doesn't the same apply to account_idle_time?
> St stuff is accounted when vCPUs(tasks on host) are enqueued in rb
> trees of pCPUs, which means that they are ready to run until they
> finially reach CPUs. However, when vCPUs are idle, they will be
> dequeued from rb trees and the time will be not accounted as st.

Why not?  If idle=poll, for example, any time the guest is suspended
(and thus cannot poll) does count as stolen time.

In addition, you are going to account the stolen time anyway sooner or
later, and then it will be accounted wrong (subtracted to either user or
system time).  I really believe you should do the change directly in
get_vtime_delta.

Paolo

>> > If it does, you should instead change get_vtime_delta to process steal
>> > time and subtract it from the result.
>> >
>> > Secondarily, when can it happen that steal_time > delta_time?
> Rik explanation it when he reply to v1.
> http://www.gossamer-threads.com/lists/linux/kernel/2441175?do=post_view_threaded#2441175

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4 2/3] sched/cputime: Fix prev steal time accouting during cpu hotplug
  2016-06-07  8:33 ` [PATCH v4 2/3] sched/cputime: Fix prev steal time accouting during " Wanpeng Li
@ 2016-06-07 16:18   ` kbuild test robot
  0 siblings, 0 replies; 9+ messages in thread
From: kbuild test robot @ 2016-06-07 16:18 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: kbuild-all, linux-kernel, kvm, Wanpeng Li, Ingo Molnar,
	Peter Zijlstra (Intel),
	Rik van Riel, Thomas Gleixner, Frederic Weisbecker,
	Paolo Bonzini, Radim Krčmář

[-- Attachment #1: Type: text/plain, Size: 2786 bytes --]

Hi,

[auto build test ERROR on tip/sched/core]
[also build test ERROR on v4.7-rc2 next-20160607]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Wanpeng-Li/KVM-fix-steal-clock-warp-during-guest-cpu-hotplug/20160607-163708
config: arm-allmodconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 5.3.1-8) 5.3.1 20160205
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=arm 

All errors (new ones prefixed by >>):

   In file included from kernel/sched/cputime.c:7:0:
   kernel/sched/sched.h: In function 'account_reset_rq':
   kernel/sched/sched.h:1819:24: error: implicit declaration of function 'paravirt_steal_clock' [-Werror=implicit-function-declaration]
     rq->prev_steal_time = paravirt_steal_clock(cpu_of(rq));
                           ^
   In file included from kernel/sched/cputime.c:9:0:
   arch/arm/include/asm/paravirt.h: At top level:
>> arch/arm/include/asm/paravirt.h:14:19: error: conflicting types for 'paravirt_steal_clock'
    static inline u64 paravirt_steal_clock(int cpu)
                      ^
   In file included from kernel/sched/cputime.c:7:0:
   kernel/sched/sched.h:1819:24: note: previous implicit declaration of 'paravirt_steal_clock' was here
     rq->prev_steal_time = paravirt_steal_clock(cpu_of(rq));
                           ^
   cc1: some warnings being treated as errors

vim +/paravirt_steal_clock +14 arch/arm/include/asm/paravirt.h

02c2433b Stefano Stabellini 2015-11-23   8  
02c2433b Stefano Stabellini 2015-11-23   9  struct pv_time_ops {
02c2433b Stefano Stabellini 2015-11-23  10  	unsigned long long (*steal_clock)(int cpu);
02c2433b Stefano Stabellini 2015-11-23  11  };
02c2433b Stefano Stabellini 2015-11-23  12  extern struct pv_time_ops pv_time_ops;
02c2433b Stefano Stabellini 2015-11-23  13  
02c2433b Stefano Stabellini 2015-11-23 @14  static inline u64 paravirt_steal_clock(int cpu)
02c2433b Stefano Stabellini 2015-11-23  15  {
02c2433b Stefano Stabellini 2015-11-23  16  	return pv_time_ops.steal_clock(cpu);
02c2433b Stefano Stabellini 2015-11-23  17  }

:::::: The code at line 14 was first introduced by commit
:::::: 02c2433b3aa6b57313c261c9811bbbe49528101c arm: introduce CONFIG_PARAVIRT, PARAVIRT_TIME_ACCOUNTING and pv_time_ops

:::::: TO: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
:::::: CC: David Vrabel <david.vrabel@citrix.com>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 57606 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v4 2/3] sched/cputime: Fix prev steal time accouting during cpu hotplug
  2016-06-07  8:33 [PATCH v4 1/3] KVM: fix steal clock warp during guest cpu hotplug Wanpeng Li
@ 2016-06-07  8:33 ` Wanpeng Li
  2016-06-07 16:18   ` kbuild test robot
  0 siblings, 1 reply; 9+ messages in thread
From: Wanpeng Li @ 2016-06-07  8:33 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Wanpeng Li, Ingo Molnar, Peter Zijlstra (Intel),
	Rik van Riel, Thomas Gleixner, Frederic Weisbecker,
	Paolo Bonzini, Radim Krčmář

From: Wanpeng Li <wanpeng.li@hotmail.com>

Commit e9532e69b8d1 ("sched/cputime: Fix steal time accounting vs. CPU hotplug") 
set rq->prev_* to 0 after a cpu hotplug comes back in order to fix the scenario:

| steal is smaller than rq->prev_steal_time we end up with an insane large
| value which then gets added to rq->prev_steal_time, resulting in a permanent
| wreckage of the accounting.

However, it is still buggy.

rq->prev_steal_time = 0:

As Rik pointed out:

| setting rq->prev_irq_time to 0 in the guest, and then getting a giant value from 
| the host, could result in a very large of steal_jiffies.

rq->prev_steal_time_rq = 0:

| steal = paravirt_steal_clock(cpu_of(rq));
| steal -= rq->prev_steal_time_rq;
|
| if (unlikely(steal > delta))
|	 steal = delta;
|
| rq->prev_steal_time_rq += steal;
| delta -= steal;
|
| rq->clock_task += delta;

steal is a giant value and rq->prev_steal_time_rq is 0, rq->prev_steal_time_rq 
grows in delta granularity, rq->clock_task can't ramp up until rq->prev_steal_time_rq 
catches up steal clock since delta value will be 0 after reducing steal time from 
normal execution time. That's why I obersved that cpuhg/1-12 continue running 
until rq->prev_steal_time_rq catches up steal clock timestamp.

I believe rq->prev_irq_time has similar issue. So this patch fix it by setting 
rq->prev_* to current irq time and steal clock timestamp after a cpu hotplug 
comes back.

Fixes: 'commit e9532e69b8d1 ("sched/cputime: Fix steal time accounting vs. CPU hotplug")'
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
 kernel/sched/sched.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 72f1f30..e6758af 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1813,12 +1813,12 @@ static inline void cpufreq_trigger_update(u64 time) {}
 static inline void account_reset_rq(struct rq *rq)
 {
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
-	rq->prev_irq_time = 0;
+	rq->prev_irq_time = irq_time_read(cpu_of(rq));
 #endif
 #ifdef CONFIG_PARAVIRT
-	rq->prev_steal_time = 0;
+	rq->prev_steal_time = paravirt_steal_clock(cpu_of(rq));
 #endif
 #ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
-	rq->prev_steal_time_rq = 0;
+	rq->prev_steal_time_rq = paravirt_steal_clock(cpu_of(rq));
 #endif
 }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2016-06-07 16:15 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1465661590-4732-1-git-send-email-wanpeng.li@hotmail.com>
     [not found] ` <SG2PR02MB15507073C531BAA82E9C8E67805D0@SG2PR02MB1550.apcprd02.prod.outlook.com>
2016-06-07 10:34   ` [PATCH v4 2/3] sched/cputime: Fix prev steal time accouting during cpu hotplug Paolo Bonzini
     [not found]     ` <SG2PR02MB15505ACED7653E515DE21620805D0@SG2PR02MB1550.apcprd02.prod.outlook.com>
2016-06-07 11:50       ` Fw: " Wanpeng Li
2016-06-07 11:53         ` Paolo Bonzini
     [not found] ` <SG2PR02MB15501BB06CAE47E4406FF7AD805D0@SG2PR02MB1550.apcprd02.prod.outlook.com>
2016-06-07 10:47   ` [PATCH v4 3/3] sched/cputime: Add steal time support to full dynticks CPU time accounting Paolo Bonzini
2016-06-07 11:23     ` Wanpeng Li
2016-06-07 12:15     ` Wanpeng Li
2016-06-07 12:34       ` Paolo Bonzini
2016-06-07  8:33 [PATCH v4 1/3] KVM: fix steal clock warp during guest cpu hotplug Wanpeng Li
2016-06-07  8:33 ` [PATCH v4 2/3] sched/cputime: Fix prev steal time accouting during " Wanpeng Li
2016-06-07 16:18   ` kbuild test robot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).