On Fri, 2016-03-04 at 15:59 +0100, Thomas Gleixner wrote:
> On cpu hotplug the steal time accounting can keep a stale
> rq->prev_steal_time value over cpu down and up. So after the cpu comes
> up again, the delta calculation in steal_account_process_tick() wrecks
> itself due to the unsigned math:
>
>          u64 steal = paravirt_steal_clock(smp_processor_id());
>
>          steal -= this_rq()->prev_steal_time;
>
> So if steal is smaller than rq->prev_steal_time, we end up with an
> insanely large value which then gets added to rq->prev_steal_time,
> resulting in permanent wreckage of the accounting. As a consequence,
> the per cpu stats in /proc/stat become stale.
>
> Nice trick to tell the world how idle the system is (100%) while the
> cpu is 100% busy running tasks. Though we prefer realistic numbers.
>
> None of the accounting values which use a previous value to account
> for fractions is reset at cpu hotplug time. update_rq_clock_task()
> has a sanity check for prev_irq_time and prev_steal_time_rq, but that
> sanity check solely deals with clock warps and limits the /proc/stat
> visible wreckage. The prev_time values are still wrong.
>
> Solution is simple: Reset rq->prev_*_time when the cpu is plugged in
> again.
>
> Fixes: commit e6e6685accfa "KVM guest: Steal time accounting"
> Fixes: commit 095c0aa83e52 "sched: adjust scheduler cpu power for stolen time"
> Fixes: commit aa483808516c "sched: Remove irq time from available CPU power"
> Signed-off-by: Thomas Gleixner
> Cc: stable@vger.kernel.org

Acked-by: Rik van Riel

-- 
All Rights Reversed.