On Tue, 2016-08-16 at 14:54 +0800, Wanpeng Li wrote:
> 2016-08-16 10:11 GMT+08:00 Rik van Riel:
> > On Tue, 2016-08-16 at 09:31 +0800, Wanpeng Li wrote:
> > > 2016-08-15 23:00 GMT+08:00 Rik van Riel:
> > > > On Mon, 2016-08-15 at 16:53 +0800, Wanpeng Li wrote:
> > > > > 2016-08-12 23:58 GMT+08:00 Rik van Riel:
> > > > > [...]
> > > > > > Wanpeng, does the patch below work for you?
> > > > >
> > > > > It will break steal time for full dynticks guest, and there
> > > > > is a calltrace of thread_group_cputime_adjusted call stack,
> > > > > RIP is cputime_adjust+0xff/0x130.
> > > >
> > > > How?  This patch is equivalent to passing ULONG_MAX to
> > > > steal_account_process_time, which you tried to no ill
> > > > effect before.
> > >
> > > https://lkml.org/lkml/2016/6/8/404/ Paolo original suggested to
> > > add the max cputime limit to the vtime, when the cpu is running
> > > in nohz full mode and stop the tick, jiffies will be updated
> > > depends on clock source instead of clock event device in
> > > guest(tick_nohz_update_jiffies() callsite, ktime_get()), so it
> > > will not be affected by lost clock ticks, my patch keeps the
> > > limit for vtime and remove the limit to non-vtime. However, your
> > > patch removes the limit for both scenarios and results in the
> > > below calltrace for vtime.
> >
> > I understand what it does.
> >
> > What I would like to understand is WHY enforcing the limit
> > is the right thing when using vtime, and the wrong thing
> > in all other scenarios.
>
> I observed that function get_vtime_delta() underflow which means that
> delta < other when debugging your bugfix patch, I believe that is why
> Paolo suggested to add the max cputime limit to vtime, he also
> pointed out the potential underflow before
> https://lkml.org/lkml/2016/6/8/404/

Looking at get_vtime_delta() I can see exactly how the underflow
can happen.
The interval returned by account_other_time() is NOT rounded down
to the nearest jiffy, while the base interval it is subtracted
from is.

Furthermore, even if we did not have that rounding issue, a guest
could get preempted in-between determining delta, and calling
account_other_time(), which could also cause the issue.

Could you re-send your patch with a comment in get_vtime_delta(),
as well as the changelog, explaining exactly why account_other_time()
should be limited from get_vtime_delta(), but not from the other
three call sites?

Documentation could save future developers a bunch of debugging
time on this code.

thanks,

Rik
-- 
All Rights Reversed.
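
To illustrate the underflow discussed above, here is a minimal userspace
sketch. It is not the kernel code: the helper names, the TICK_NS value and
the use of uint64_t are assumptions made purely for illustration. It only
models the arithmetic, where a jiffy-rounded delta has a finer-grained
"other" time subtracted from it, with and without a limit.

```c
#include <stdint.h>

/* Hypothetical tick length in nanoseconds (as with HZ=1000); illustration only. */
#define TICK_NS 1000000ULL

/*
 * Unlimited case: models "delta - other" on unsigned values.
 * If other > delta the result wraps around to a huge number.
 */
static inline uint64_t vtime_delta_unlimited(uint64_t delta, uint64_t other)
{
	return delta - other;
}

/*
 * Limited case: "other" is capped at delta before the subtraction,
 * so the result can never wrap. This models passing delta as the
 * maximum to the function that accounts the other time.
 */
static inline uint64_t vtime_delta_limited(uint64_t delta, uint64_t other)
{
	uint64_t capped = other < delta ? other : delta;

	return delta - capped;
}
```

For example, with delta rounded down to two ticks (2 * TICK_NS) and an
other time of two ticks plus 300000 ns, the unlimited variant wraps
around to a value close to UINT64_MAX, while the limited variant
returns 0.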