* Re: [PATCH v4 2/3] sched/cputime: Fix prev steal time accouting during cpu hotplug [not found] ` <SG2PR02MB15507073C531BAA82E9C8E67805D0@SG2PR02MB1550.apcprd02.prod.outlook.com> @ 2016-06-07 10:34 ` Paolo Bonzini [not found] ` <SG2PR02MB15505ACED7653E515DE21620805D0@SG2PR02MB1550.apcprd02.prod.outlook.com> 0 siblings, 1 reply; 8+ messages in thread From: Paolo Bonzini @ 2016-06-07 10:34 UTC (permalink / raw) To: Wanpeng Li, linux-kernel, kvm Cc: Ingo Molnar, Peter Zijlstra (Intel), Rik van Riel, Thomas Gleixner, Frederic Weisbecker, Radim Krčmář On 07/06/2016 10:00, Wanpeng Li wrote: > From: Wanpeng Li <wanpeng.li@hotmail.com> > > Commit e9532e69b8d1 ("sched/cputime: Fix steal time accounting vs. CPU hotplug") > set rq->prev_* to 0 after a cpu hotplug comes back in order to fix the scenario: > > | steal is smaller than rq->prev_steal_time we end up with an insane large > | value which then gets added to rq->prev_steal_time, resulting in a permanent > | wreckage of the accounting. > > However, it is still buggy. > > rq->prev_steal_time = 0: > > As Rik pointed out: > > | setting rq->prev_irq_time to 0 in the guest, and then getting a giant value from > | the host, could result in a very large of steal_jiffies. > > rq->prev_steal_time_rq = 0: > > | steal = paravirt_steal_clock(cpu_of(rq)); > | steal -= rq->prev_steal_time_rq; > | > | if (unlikely(steal > delta)) > | steal = delta; > | > | rq->prev_steal_time_rq += steal; > | delta -= steal; > | > | rq->clock_task += delta; > > steal is a giant value and rq->prev_steal_time_rq is 0, rq->prev_steal_time_rq > grows in delta granularity, rq->clock_task can't ramp up until rq->prev_steal_time_rq > catches up steal clock since delta value will be 0 after reducing steal time from > normal execution time. That's why I obersved that cpuhg/1-12 continue running > until rq->prev_steal_time_rq catches up steal clock timestamp. > > I believe rq->prev_irq_time has similar issue. So this patch fix it by setting > rq->prev_* to current irq time and steal clock timestamp after a cpu hotplug > comes back. I'm not sure this patch is necessary. Instead you could just revert commit e9532e69b8d1. The previous patch obviously makes it unnecessary to reset rq->prev_steal_time and rq->prev_steal_time_rq, and the reset of rq->prev_irq_time looks like a no-op to me. Paolo > Fixes: 'commit e9532e69b8d1 ("sched/cputime: Fix steal time accounting vs. CPU hotplug")' > Acked-by: Rik van Riel <riel@redhat.com> > Cc: Ingo Molnar <mingo@kernel.org> > Cc: Peter Zijlstra (Intel) <peterz@infradead.org> > Cc: Rik van Riel <riel@redhat.com> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Frederic Weisbecker <fweisbec@gmail.com> > Cc: Paolo Bonzini <pbonzini@redhat.com> > Cc: Radim Krčmář <rkrcmar@redhat.com> > Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> > --- > kernel/sched/sched.h | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h > index 72f1f30..e6758af 100644 > --- a/kernel/sched/sched.h > +++ b/kernel/sched/sched.h > @@ -1813,12 +1813,12 @@ static inline void cpufreq_trigger_update(u64 time) {} > static inline void account_reset_rq(struct rq *rq) > { > #ifdef CONFIG_IRQ_TIME_ACCOUNTING > - rq->prev_irq_time = 0; > + rq->prev_irq_time = irq_time_read(cpu_of(rq)); > #endif > #ifdef CONFIG_PARAVIRT > - rq->prev_steal_time = 0; > + rq->prev_steal_time = paravirt_steal_clock(cpu_of(rq)); > #endif > #ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING > - rq->prev_steal_time_rq = 0; > + rq->prev_steal_time_rq = paravirt_steal_clock(cpu_of(rq)); > #endif > } > ^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <SG2PR02MB15505ACED7653E515DE21620805D0@SG2PR02MB1550.apcprd02.prod.outlook.com>]
* Re: Fw: [PATCH v4 2/3] sched/cputime: Fix prev steal time accouting during cpu hotplug [not found] ` <SG2PR02MB15505ACED7653E515DE21620805D0@SG2PR02MB1550.apcprd02.prod.outlook.com> @ 2016-06-07 11:50 ` Wanpeng Li 2016-06-07 11:53 ` Paolo Bonzini 0 siblings, 1 reply; 8+ messages in thread From: Wanpeng Li @ 2016-06-07 11:50 UTC (permalink / raw) To: linux-kernel, kvm Cc: Paolo Bonzini, Radim Krcmar, Wanpeng Li, Ingo Molnar, Peter Zijlstra (Intel), Rik van Riel, Thomas Gleixner, Frederic Weisbecker 2016-06-07 19:41 GMT+08:00 Wanpeng Li <wanpeng.li@hotmail.com>: > On 07/06/2016 10:00, Wanpeng Li wrote: > >> From: Wanpeng Li <wanpeng.li@hotmail.com> >> >> Commit e9532e69b8d1 ("sched/cputime: Fix steal time accounting vs. CPU >> hotplug") >> set rq->prev_* to 0 after a cpu hotplug comes back in order to fix the >> scenario: >> >> | steal is smaller than rq->prev_steal_time we end up with an insane large >> | value which then gets added to rq->prev_steal_time, resulting in a >> permanent >> | wreckage of the accounting. >> >> However, it is still buggy. >> >> rq->prev_steal_time = 0: >> >> As Rik pointed out: >> >> | setting rq->prev_irq_time to 0 in the guest, and then getting a giant >> value from >> | the host, could result in a very large of steal_jiffies. >> >> rq->prev_steal_time_rq = 0: >> >> | steal = paravirt_steal_clock(cpu_of(rq)); >> | steal -= rq->prev_steal_time_rq; >> | >> | if (unlikely(steal > delta)) >> | steal = delta; >> | >> | rq->prev_steal_time_rq += steal; >> | delta -= steal; >> | >> | rq->clock_task += delta; >> >> steal is a giant value and rq->prev_steal_time_rq is 0, >> rq->prev_steal_time_rq >> grows in delta granularity, rq->clock_task can't ramp up until >> rq->prev_steal_time_rq >> catches up steal clock since delta value will be 0 after reducing steal >> time from >> normal execution time. That's why I obersved that cpuhg/1-12 continue >> running >> until rq->prev_steal_time_rq catches up steal clock timestamp. >> >> I believe rq->prev_irq_time has similar issue. So this patch fix it by >> setting >> rq->prev_* to current irq time and steal clock timestamp after a cpu >> hotplug >> comes back. > > I'm not sure this patch is necessary. Instead you could just revert > commit e9532e69b8d1. The previous patch obviously makes it unnecessary > to reset rq->prev_steal_time and rq->prev_steal_time_rq, and the reset > of rq->prev_irq_time looks like a no-op to me. The reason why I'm not just simple revert it is that commit mentioned "steal is smaller than rq->prev_steal_time we end up with an insane large value which then gets added to rq->prev_steal_time, resulting in a permanent wreckage of the accounting." Though I didn't meet such scenario. So I just do what that commit really want to do. Regards, Wanpeng Li ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Fw: [PATCH v4 2/3] sched/cputime: Fix prev steal time accouting during cpu hotplug 2016-06-07 11:50 ` Fw: " Wanpeng Li @ 2016-06-07 11:53 ` Paolo Bonzini 0 siblings, 0 replies; 8+ messages in thread From: Paolo Bonzini @ 2016-06-07 11:53 UTC (permalink / raw) To: Wanpeng Li, linux-kernel, kvm Cc: Radim Krcmar, Wanpeng Li, Ingo Molnar, Peter Zijlstra (Intel), Rik van Riel, Thomas Gleixner, Frederic Weisbecker On 07/06/2016 13:50, Wanpeng Li wrote: >> > I'm not sure this patch is necessary. Instead you could just revert >> > commit e9532e69b8d1. The previous patch obviously makes it unnecessary >> > to reset rq->prev_steal_time and rq->prev_steal_time_rq, and the reset >> > of rq->prev_irq_time looks like a no-op to me. > The reason why I'm not just simple revert it is that commit mentioned > "steal is smaller than rq->prev_steal_time we end up with an insane > large value which then gets added to rq->prev_steal_time, resulting in > a permanent wreckage of the accounting." With this patch, you go back to having underflow if steal is smaller than rq->prev_steal_time. The point is that it should never be smaller; it was only smaller because of the bug that you are fixing in patch 1. Thanks, Paolo Though I didn't meet such > scenario. So I just do what that commit really want to do. ^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <SG2PR02MB15501BB06CAE47E4406FF7AD805D0@SG2PR02MB1550.apcprd02.prod.outlook.com>]
* Re: [PATCH v4 3/3] sched/cputime: Add steal time support to full dynticks CPU time accounting [not found] ` <SG2PR02MB15501BB06CAE47E4406FF7AD805D0@SG2PR02MB1550.apcprd02.prod.outlook.com> @ 2016-06-07 10:47 ` Paolo Bonzini 2016-06-07 11:23 ` Wanpeng Li 2016-06-07 12:15 ` Wanpeng Li 0 siblings, 2 replies; 8+ messages in thread From: Paolo Bonzini @ 2016-06-07 10:47 UTC (permalink / raw) To: Wanpeng Li, linux-kernel, kvm Cc: Ingo Molnar, Peter Zijlstra (Intel), Rik van Riel, Thomas Gleixner, Frederic Weisbecker, Radim Krčmář On 07/06/2016 10:00, Wanpeng Li wrote: > From: Wanpeng Li <wanpeng.li@hotmail.com> > > This patch adds guest steal-time support to full dynticks CPU > time accounting. After the following commit: > > ff9a9b4c4334 ("sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy granularity") > > ... time sampling became jiffy based, even if it's still listened > to ring boundaries, so steal_account_process_tick() is reused > to account how many 'ticks' are stolen-time, after the last accumulation. I still have no idea how to parse this. What are "ring boundaries"? Rik, can you suggest a better commit message? > Suggested-and-Reviewed-by: Rik van Riel <riel@redhat.com> Please split Suggested-by and Reviewed-by. > diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c > index 75f98c5..9ff036b 100644 > --- a/kernel/sched/cputime.c > +++ b/kernel/sched/cputime.c > @@ -257,7 +257,7 @@ void account_idle_time(cputime_t cputime) > cpustat[CPUTIME_IDLE] += (__force u64) cputime; > } > > -static __always_inline bool steal_account_process_tick(void) > +static __always_inline unsigned long steal_account_process_tick(void) > { > #ifdef CONFIG_PARAVIRT > if (static_key_false(¶virt_steal_enabled)) { > @@ -279,7 +279,7 @@ static __always_inline bool steal_account_process_tick(void) > return steal_jiffies; > } > #endif > - return false; > + return 0; > } > > /* > @@ -691,9 +691,13 @@ static cputime_t get_vtime_delta(struct task_struct *tsk) > > static void __vtime_account_system(struct task_struct *tsk) > { > - cputime_t delta_cpu = get_vtime_delta(tsk); > + cputime_t delta_time = get_vtime_delta(tsk); > + cputime_t steal_time = jiffies_to_cputime(steal_account_process_tick()); > > - account_system_time(tsk, irq_count(), delta_cpu, cputime_to_scaled(delta_cpu)); > + if (steal_time < delta_time) { > + delta_time -= steal_time; > + account_system_time(tsk, irq_count(), delta_time, cputime_to_scaled(delta_time)); > + } > } > > void vtime_account_system(struct task_struct *tsk) > @@ -718,13 +722,18 @@ void vtime_gen_account_irq_exit(struct task_struct *tsk) > > void vtime_account_user(struct task_struct *tsk) > { > - cputime_t delta_cpu; > + cputime_t delta_time, steal_time; > > write_seqcount_begin(&tsk->vtime_seqcount); > tsk->vtime_snap_whence = VTIME_SYS; > if (vtime_delta(tsk)) { > - delta_cpu = get_vtime_delta(tsk); > - account_user_time(tsk, delta_cpu, cputime_to_scaled(delta_cpu)); > + delta_time = get_vtime_delta(tsk); > + steal_time = jiffies_to_cputime(steal_account_process_tick()); > + > + if (steal_time < delta_time) { > + delta_time -= steal_time; > + account_user_time(tsk, delta_time, cputime_to_scaled(delta_time)); > + } > } > write_seqcount_end(&tsk->vtime_seqcount); > } > You're adding almost the same code to two callers of get_vtime_delta out of three. I don't know the vtime accounting code very well, but why doesn't the same apply to account_idle_time? If it does, you should instead change get_vtime_delta to process steal time and subtract it from the result. Secondarily, when can it happen that steal_time > delta_time? Thanks, Paolo ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v4 3/3] sched/cputime: Add steal time support to full dynticks CPU time accounting 2016-06-07 10:47 ` [PATCH v4 3/3] sched/cputime: Add steal time support to full dynticks CPU time accounting Paolo Bonzini @ 2016-06-07 11:23 ` Wanpeng Li 2016-06-07 12:15 ` Wanpeng Li 1 sibling, 0 replies; 8+ messages in thread From: Wanpeng Li @ 2016-06-07 11:23 UTC (permalink / raw) To: Paolo Bonzini Cc: Wanpeng Li, linux-kernel, kvm, Ingo Molnar, Peter Zijlstra (Intel), Rik van Riel, Thomas Gleixner, Frederic Weisbecker, Radim Krčmář 2016-06-07 18:47 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>: > > > On 07/06/2016 10:00, Wanpeng Li wrote: >> From: Wanpeng Li <wanpeng.li@hotmail.com> >> >> This patch adds guest steal-time support to full dynticks CPU >> time accounting. After the following commit: >> >> ff9a9b4c4334 ("sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy granularity") >> >> ... time sampling became jiffy based, even if it's still listened >> to ring boundaries, so steal_account_process_tick() is reused >> to account how many 'ticks' are stolen-time, after the last accumulation. > > I still have no idea how to parse this. What are "ring boundaries"? > Rik, can you suggest a better commit message? It is original from this slides. http://ertl.jp/~shinpei/conf/ospert13/slides/FredericWeisbecker.pdf, slide 28. > >> Suggested-and-Reviewed-by: Rik van Riel <riel@redhat.com> > > Please split Suggested-by and Reviewed-by. > >> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c >> index 75f98c5..9ff036b 100644 >> --- a/kernel/sched/cputime.c >> +++ b/kernel/sched/cputime.c >> @@ -257,7 +257,7 @@ void account_idle_time(cputime_t cputime) >> cpustat[CPUTIME_IDLE] += (__force u64) cputime; >> } >> >> -static __always_inline bool steal_account_process_tick(void) >> +static __always_inline unsigned long steal_account_process_tick(void) >> { >> #ifdef CONFIG_PARAVIRT >> if (static_key_false(¶virt_steal_enabled)) { >> @@ -279,7 +279,7 @@ static __always_inline bool steal_account_process_tick(void) >> return steal_jiffies; >> } >> #endif >> - return false; >> + return 0; >> } >> >> /* >> @@ -691,9 +691,13 @@ static cputime_t get_vtime_delta(struct task_struct *tsk) >> >> static void __vtime_account_system(struct task_struct *tsk) >> { >> - cputime_t delta_cpu = get_vtime_delta(tsk); >> + cputime_t delta_time = get_vtime_delta(tsk); >> + cputime_t steal_time = jiffies_to_cputime(steal_account_process_tick()); >> >> - account_system_time(tsk, irq_count(), delta_cpu, cputime_to_scaled(delta_cpu)); >> + if (steal_time < delta_time) { >> + delta_time -= steal_time; >> + account_system_time(tsk, irq_count(), delta_time, cputime_to_scaled(delta_time)); >> + } >> } >> >> void vtime_account_system(struct task_struct *tsk) >> @@ -718,13 +722,18 @@ void vtime_gen_account_irq_exit(struct task_struct *tsk) >> >> void vtime_account_user(struct task_struct *tsk) >> { >> - cputime_t delta_cpu; >> + cputime_t delta_time, steal_time; >> >> write_seqcount_begin(&tsk->vtime_seqcount); >> tsk->vtime_snap_whence = VTIME_SYS; >> if (vtime_delta(tsk)) { >> - delta_cpu = get_vtime_delta(tsk); >> - account_user_time(tsk, delta_cpu, cputime_to_scaled(delta_cpu)); >> + delta_time = get_vtime_delta(tsk); >> + steal_time = jiffies_to_cputime(steal_account_process_tick()); >> + >> + if (steal_time < delta_time) { >> + delta_time -= steal_time; >> + account_user_time(tsk, delta_time, cputime_to_scaled(delta_time)); >> + } >> } >> write_seqcount_end(&tsk->vtime_seqcount); >> } >> > > You're adding almost the same code to two callers of get_vtime_delta out > of three. I don't know the vtime accounting code very well, but why > doesn't the same apply to account_idle_time? > > If it does, you should instead change get_vtime_delta to process steal > time and subtract it from the result. > > Secondarily, when can it happen that steal_time > delta_time? > > Thanks, > > Paolo -- Regards, Wanpeng Li ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v4 3/3] sched/cputime: Add steal time support to full dynticks CPU time accounting 2016-06-07 10:47 ` [PATCH v4 3/3] sched/cputime: Add steal time support to full dynticks CPU time accounting Paolo Bonzini 2016-06-07 11:23 ` Wanpeng Li @ 2016-06-07 12:15 ` Wanpeng Li 2016-06-07 12:34 ` Paolo Bonzini 1 sibling, 1 reply; 8+ messages in thread From: Wanpeng Li @ 2016-06-07 12:15 UTC (permalink / raw) To: Paolo Bonzini Cc: Wanpeng Li, linux-kernel, kvm, Ingo Molnar, Peter Zijlstra (Intel), Rik van Riel, Thomas Gleixner, Frederic Weisbecker, Radim Krčmář 2016-06-07 18:47 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>: > > > On 07/06/2016 10:00, Wanpeng Li wrote: >> From: Wanpeng Li <wanpeng.li@hotmail.com> >> >> This patch adds guest steal-time support to full dynticks CPU >> time accounting. After the following commit: >> >> ff9a9b4c4334 ("sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy granularity") >> >> ... time sampling became jiffy based, even if it's still listened >> to ring boundaries, so steal_account_process_tick() is reused >> to account how many 'ticks' are stolen-time, after the last accumulation. > > I still have no idea how to parse this. What are "ring boundaries"? > Rik, can you suggest a better commit message? > >> Suggested-and-Reviewed-by: Rik van Riel <riel@redhat.com> > > Please split Suggested-by and Reviewed-by. > >> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c >> index 75f98c5..9ff036b 100644 >> --- a/kernel/sched/cputime.c >> +++ b/kernel/sched/cputime.c >> @@ -257,7 +257,7 @@ void account_idle_time(cputime_t cputime) >> cpustat[CPUTIME_IDLE] += (__force u64) cputime; >> } >> >> -static __always_inline bool steal_account_process_tick(void) >> +static __always_inline unsigned long steal_account_process_tick(void) >> { >> #ifdef CONFIG_PARAVIRT >> if (static_key_false(¶virt_steal_enabled)) { >> @@ -279,7 +279,7 @@ static __always_inline bool steal_account_process_tick(void) >> return steal_jiffies; >> } >> #endif >> - return false; >> + return 0; >> } >> >> /* >> @@ -691,9 +691,13 @@ static cputime_t get_vtime_delta(struct task_struct *tsk) >> >> static void __vtime_account_system(struct task_struct *tsk) >> { >> - cputime_t delta_cpu = get_vtime_delta(tsk); >> + cputime_t delta_time = get_vtime_delta(tsk); >> + cputime_t steal_time = jiffies_to_cputime(steal_account_process_tick()); >> >> - account_system_time(tsk, irq_count(), delta_cpu, cputime_to_scaled(delta_cpu)); >> + if (steal_time < delta_time) { >> + delta_time -= steal_time; >> + account_system_time(tsk, irq_count(), delta_time, cputime_to_scaled(delta_time)); >> + } >> } >> >> void vtime_account_system(struct task_struct *tsk) >> @@ -718,13 +722,18 @@ void vtime_gen_account_irq_exit(struct task_struct *tsk) >> >> void vtime_account_user(struct task_struct *tsk) >> { >> - cputime_t delta_cpu; >> + cputime_t delta_time, steal_time; >> >> write_seqcount_begin(&tsk->vtime_seqcount); >> tsk->vtime_snap_whence = VTIME_SYS; >> if (vtime_delta(tsk)) { >> - delta_cpu = get_vtime_delta(tsk); >> - account_user_time(tsk, delta_cpu, cputime_to_scaled(delta_cpu)); >> + delta_time = get_vtime_delta(tsk); >> + steal_time = jiffies_to_cputime(steal_account_process_tick()); >> + >> + if (steal_time < delta_time) { >> + delta_time -= steal_time; >> + account_user_time(tsk, delta_time, cputime_to_scaled(delta_time)); >> + } >> } >> write_seqcount_end(&tsk->vtime_seqcount); >> } >> > > You're adding almost the same code to two callers of get_vtime_delta out > of three. I don't know the vtime accounting code very well, but why > doesn't the same apply to account_idle_time? St stuff is accounted when vCPUs(tasks on host) are enqueued in rb trees of pCPUs, which means that they are ready to run until they finially reach CPUs. However, when vCPUs are idle, they will be dequeued from rb trees and the time will be not accounted as st. > > If it does, you should instead change get_vtime_delta to process steal > time and subtract it from the result. > > Secondarily, when can it happen that steal_time > delta_time? Rik explanation it when he reply to v1. http://www.gossamer-threads.com/lists/linux/kernel/2441175?do=post_view_threaded#2441175 Regards, Wanpeng Li ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v4 3/3] sched/cputime: Add steal time support to full dynticks CPU time accounting 2016-06-07 12:15 ` Wanpeng Li @ 2016-06-07 12:34 ` Paolo Bonzini 0 siblings, 0 replies; 8+ messages in thread From: Paolo Bonzini @ 2016-06-07 12:34 UTC (permalink / raw) To: Wanpeng Li Cc: Wanpeng Li, linux-kernel, kvm, Ingo Molnar, Peter Zijlstra (Intel), Rik van Riel, Thomas Gleixner, Frederic Weisbecker, Radim Krčmář On 07/06/2016 14:15, Wanpeng Li wrote: >> > >> > You're adding almost the same code to two callers of get_vtime_delta out >> > of three. I don't know the vtime accounting code very well, but why >> > doesn't the same apply to account_idle_time? > St stuff is accounted when vCPUs(tasks on host) are enqueued in rb > trees of pCPUs, which means that they are ready to run until they > finially reach CPUs. However, when vCPUs are idle, they will be > dequeued from rb trees and the time will be not accounted as st. Why not? If idle=poll, for example, any time the guest is suspended (and thus cannot poll) does count as stolen time. In addition, you are going to account the stolen time anyway sooner or later, and then it will be accounted wrong (subtracted to either user or system time). I really believe you should do the change directly in get_vtime_delta. Paolo >> > If it does, you should instead change get_vtime_delta to process steal >> > time and subtract it from the result. >> > >> > Secondarily, when can it happen that steal_time > delta_time? > Rik explanation it when he reply to v1. > http://www.gossamer-threads.com/lists/linux/kernel/2441175?do=post_view_threaded#2441175 ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v4 1/3] KVM: fix steal clock warp during guest cpu hotplug @ 2016-06-07 8:33 Wanpeng Li 2016-06-07 8:33 ` [PATCH v4 3/3] sched/cputime: Add steal time support to full dynticks CPU time accounting Wanpeng Li 0 siblings, 1 reply; 8+ messages in thread From: Wanpeng Li @ 2016-06-07 8:33 UTC (permalink / raw) To: linux-kernel, kvm Cc: Wanpeng Li, Paolo Bonzini, Radim Krčmář, Ingo Molnar, Peter Zijlstra (Intel), Rik van Riel, Thomas Gleixner, Frederic Weisbecker From: Wanpeng Li <wanpeng.li@hotmail.com> I observed that sometimes st is 100% instantaneous, then idle is 100% even if there is a cpu hog on the guest cpu after the cpu hotplug comes back(N.B. this can not always be readily reproduced). I add trace to capture it as below: cpuhp/1-12 [001] d.h1 167.461657: account_process_tick: steal = 1291385514, prev_steal_time = 0 cpuhp/1-12 [001] d.h1 167.461659: account_process_tick: steal_jiffies = 1291 <idle>-0 [001] d.h1 167.462663: account_process_tick: steal = 18732255, prev_steal_time = 1291000000 <idle>-0 [001] d.h1 167.462664: account_process_tick: steal_jiffies = 18446744072437 The steal clock warp and then steal_jiffies underflow. Rik also pointed out to me: | I have seen stuff like that with live migration too, in the past The root cause of steal clock warp during hotplug is kvm_steal_time reset to 0 after cpu hotplug comes back which should be preexiting guest value. This patch fix it by don't reset kvm_steal_time during guest cpu hotplug. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> --- v2 -> v3: * fix the root cause v1 -> v2: * update patch subject, description and comments * deal with the case where steal time suddenly increases by a ludicrous amount arch/x86/kernel/kvm.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index eea2a6f..1ef5e48 100644 --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -301,8 +301,6 @@ static void kvm_register_steal_time(void) if (!has_steal_clock) return; - memset(st, 0, sizeof(*st)); - wrmsrl(MSR_KVM_STEAL_TIME, (slow_virt_to_phys(st) | KVM_MSR_ENABLED)); pr_info("kvm-stealtime: cpu %d, msr %llx\n", cpu, (unsigned long long) slow_virt_to_phys(st)); -- 1.9.1 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v4 3/3] sched/cputime: Add steal time support to full dynticks CPU time accounting 2016-06-07 8:33 [PATCH v4 1/3] KVM: fix steal clock warp during guest cpu hotplug Wanpeng Li @ 2016-06-07 8:33 ` Wanpeng Li 0 siblings, 0 replies; 8+ messages in thread From: Wanpeng Li @ 2016-06-07 8:33 UTC (permalink / raw) To: linux-kernel, kvm Cc: Wanpeng Li, Ingo Molnar, Peter Zijlstra (Intel), Rik van Riel, Thomas Gleixner, Frederic Weisbecker, Paolo Bonzini, Radim Krčmář From: Wanpeng Li <wanpeng.li@hotmail.com> This patch adds guest steal-time support to full dynticks CPU time accounting. After the following commit: ff9a9b4c4334 ("sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy granularity") ... time sampling became jiffy based, even if it's still listened to ring boundaries, so steal_account_process_tick() is reused to account how many 'ticks' are stolen-time, after the last accumulation. Suggested-and-Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> --- v3 -> v4: * fix grammar errors, thanks Ingo * cleanup fragile codes, thanks Ingo v2 -> v3: * convert steal time jiffies to cputime v1 -> v2: * fix divide zero bug, thanks Rik kernel/sched/cputime.c | 23 ++++++++++++++++------- 1 file changed, 16 insertions(+), 7 deletions(-) diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c index 75f98c5..9ff036b 100644 --- a/kernel/sched/cputime.c +++ b/kernel/sched/cputime.c @@ -257,7 +257,7 @@ void account_idle_time(cputime_t cputime) cpustat[CPUTIME_IDLE] += (__force u64) cputime; } -static __always_inline bool steal_account_process_tick(void) +static __always_inline unsigned long steal_account_process_tick(void) { #ifdef CONFIG_PARAVIRT if (static_key_false(¶virt_steal_enabled)) { @@ -279,7 +279,7 @@ static __always_inline bool steal_account_process_tick(void) return steal_jiffies; } #endif - return false; + return 0; } /* @@ -691,9 +691,13 @@ static cputime_t get_vtime_delta(struct task_struct *tsk) static void __vtime_account_system(struct task_struct *tsk) { - cputime_t delta_cpu = get_vtime_delta(tsk); + cputime_t delta_time = get_vtime_delta(tsk); + cputime_t steal_time = jiffies_to_cputime(steal_account_process_tick()); - account_system_time(tsk, irq_count(), delta_cpu, cputime_to_scaled(delta_cpu)); + if (steal_time < delta_time) { + delta_time -= steal_time; + account_system_time(tsk, irq_count(), delta_time, cputime_to_scaled(delta_time)); + } } void vtime_account_system(struct task_struct *tsk) @@ -718,13 +722,18 @@ void vtime_gen_account_irq_exit(struct task_struct *tsk) void vtime_account_user(struct task_struct *tsk) { - cputime_t delta_cpu; + cputime_t delta_time, steal_time; write_seqcount_begin(&tsk->vtime_seqcount); tsk->vtime_snap_whence = VTIME_SYS; if (vtime_delta(tsk)) { - delta_cpu = get_vtime_delta(tsk); - account_user_time(tsk, delta_cpu, cputime_to_scaled(delta_cpu)); + delta_time = get_vtime_delta(tsk); + steal_time = jiffies_to_cputime(steal_account_process_tick()); + + if (steal_time < delta_time) { + delta_time -= steal_time; + account_user_time(tsk, delta_time, cputime_to_scaled(delta_time)); + } } write_seqcount_end(&tsk->vtime_seqcount); } -- 1.9.1 ^ permalink raw reply related [flat|nested] 8+ messages in thread
end of thread, other threads:[~2016-06-07 12:34 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <1465661590-4732-1-git-send-email-wanpeng.li@hotmail.com> [not found] ` <SG2PR02MB15507073C531BAA82E9C8E67805D0@SG2PR02MB1550.apcprd02.prod.outlook.com> 2016-06-07 10:34 ` [PATCH v4 2/3] sched/cputime: Fix prev steal time accouting during cpu hotplug Paolo Bonzini [not found] ` <SG2PR02MB15505ACED7653E515DE21620805D0@SG2PR02MB1550.apcprd02.prod.outlook.com> 2016-06-07 11:50 ` Fw: " Wanpeng Li 2016-06-07 11:53 ` Paolo Bonzini [not found] ` <SG2PR02MB15501BB06CAE47E4406FF7AD805D0@SG2PR02MB1550.apcprd02.prod.outlook.com> 2016-06-07 10:47 ` [PATCH v4 3/3] sched/cputime: Add steal time support to full dynticks CPU time accounting Paolo Bonzini 2016-06-07 11:23 ` Wanpeng Li 2016-06-07 12:15 ` Wanpeng Li 2016-06-07 12:34 ` Paolo Bonzini 2016-06-07 8:33 [PATCH v4 1/3] KVM: fix steal clock warp during guest cpu hotplug Wanpeng Li 2016-06-07 8:33 ` [PATCH v4 3/3] sched/cputime: Add steal time support to full dynticks CPU time accounting Wanpeng Li
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).