On Thu, 2016-08-11 at 18:11 +0800, Wanpeng Li wrote:
> 2016-08-11 0:52 GMT+08:00 Rik van Riel :
> > On Wed, 10 Aug 2016 07:39:08 +0800
> > Wanpeng Li wrote:
> >
> > > The regression is caused by your commit "sched,time: Count actually
> > > elapsed irq & softirq time".
> >
> > Wanpeng, does this patch fix your issue?
>
> I test this against kvm guest (nohz_full, four vCPUs running on one
> pCPU, four cpuhog processes running on four vCPUs).
> before this fix patch:
> vCPU0's st is 100%, other vCPUs' st are ~75%.
> after this fix patch:
> all vCPUs' st are ~85%.
> However, w/o commit "sched,time: Count actually elapsed irq & softirq
> time", all vCPUs' st are ~75%.

If you pass ULONG_MAX as the maxtime argument to
steal_account_process_time(), does the steal time
get accounted properly at 75%?

If that is the case, I have a hypothesis:

1) The guest is running so much slower when sharing
   a CPU 4 ways, that it is accounting only ~90% of
   wall clock time as CPU time, due to missing the
   other 10% or so of clock ticks.

2) account_process_tick() only ever processes one tick
   at a time - if it gets called only 90x a second for
   a 100Hz guest, but all the steal time recorded by
   the host is fully accounted (ULONG_MAX limit), then
   that could make up for lost/skipped timer ticks.

3) not accounting "extra" steal time (beyond the amount
   of time accounted by account_process_tick) would reduce
   the total amount of time that gets accounted if there
   are missed ticks, taking time away from user/system/etc

Does the above make sense? Am I overlooking some
mechanism through which lost/skipped ticks are made
up for in the kernel?  I looked through the code in
kernel/time/ briefly, but did not spot it...

-- 
All Rights Reversed.
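
Points 2) and 3) of the hypothesis above can be made concrete with a
toy model in Python. This is only a sketch, not the kernel's
accounting code: replay_ticks(), the "backlog" carry-over and the
workload (which calls see how much steal) are assumptions made up for
illustration. Only the 100Hz tick, the ~90 calls per second and the
75% steal figure come from the discussion.

```python
TICK_US = 10_000  # one tick of a 100 Hz guest, in microseconds


def replay_ticks(new_steal_per_call, cap_to_one_tick):
    """Replay tick-handler calls of a toy guest (not the kernel's logic).

    new_steal_per_call[i] is the steal time (us) the host recorded since
    the previous call. Returns (steal, other, backlog) totals in us.
    """
    steal = other = backlog = 0
    for new_steal in new_steal_per_call:
        backlog += new_steal
        # Capped: charge at most one tick of steal per call, carry the rest.
        # Uncapped: charge everything the host has recorded so far.
        taken = min(backlog, TICK_US) if cap_to_one_tick else backlog
        backlog -= taken
        steal += taken
        # Whatever is left of this one tick goes to user/system/etc.
        other += max(TICK_US - taken, 0)
    return steal, other, backlog


# Assumed workload for one second of wall clock: 90 calls instead of 100.
# Ten calls arrive 20 ms after the previous one (one tick missed) and see
# 15 ms of new steal; the other eighty arrive on time and see 7.5 ms.
calls = ([15_000] + [7_500] * 8) * 10

capped = replay_ticks(calls, cap_to_one_tick=True)
uncapped = replay_ticks(calls, cap_to_one_tick=False)

for name, (steal, other, backlog) in (("capped", capped),
                                      ("uncapped", uncapped)):
    total = steal + other
    print(f"{name:9s} steal={steal / 1000:.0f}ms other={other / 1000:.0f}ms "
          f"total={total / 1000:.0f}ms st={100 * steal / total:.1f}%")
```

With this assumed workload, the capped variant accounts 900 ms per
second (750 ms steal, 150 ms other, st 83.3%), while the uncapped
variant accounts 950 ms (750 ms steal, 200 ms other, st 78.9%). The
exact figures depend entirely on the made-up workload; the only point
is that a per-tick cap pins the total at (calls x tick length), so
missed ticks come out of user/system/etc.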