From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753160AbcHPGyy (ORCPT ); Tue, 16 Aug 2016 02:54:54 -0400 Received: from mail-wm0-f47.google.com ([74.125.82.47]:37920 "EHLO mail-wm0-f47.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752974AbcHPGyx (ORCPT ); Tue, 16 Aug 2016 02:54:53 -0400 MIME-Version: 1.0 In-Reply-To: <1471313483.32433.33.camel@redhat.com> References: <1468421405-20056-1-git-send-email-fweisbec@gmail.com> <1468421405-20056-2-git-send-email-fweisbec@gmail.com> <1470751579.13905.77.camel@redhat.com> <20160810125212.78564dc2@annuminas.surriel.com> <1470969892.13905.120.camel@redhat.com> <20160812115803.0f26211c@annuminas.surriel.com> <1471273244.32433.22.camel@redhat.com> <1471313483.32433.33.camel@redhat.com> From: Wanpeng Li Date: Tue, 16 Aug 2016 14:54:50 +0800 Message-ID: Subject: Re: [PATCH] time,virt: resync steal time when guest & host lose sync To: Rik van Riel Cc: Frederic Weisbecker , Ingo Molnar , LKML , Paolo Bonzini , Peter Zijlstra , Wanpeng Li , Thomas Gleixner , Radim Krcmar , Mike Galbraith Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 2016-08-16 10:11 GMT+08:00 Rik van Riel : > On Tue, 2016-08-16 at 09:31 +0800, Wanpeng Li wrote: >> 2016-08-15 23:00 GMT+08:00 Rik van Riel : >> > On Mon, 2016-08-15 at 16:53 +0800, Wanpeng Li wrote: >> > > 2016-08-12 23:58 GMT+08:00 Rik van Riel : >> > > [...] >> > > > Wanpeng, does the patch below work for you? >> > > >> > > It will break steal time for full dynticks guest, and there is a >> > > calltrace of thread_group_cputime_adjusted call stack, RIP is >> > > cputime_adjust+0xff/0x130. >> > >> > How? This patch is equivalent to passing ULONG_MAX to >> > steal_account_process_time, which you tried to no ill >> > effect before. >> >> https://lkml.org/lkml/2016/6/8/404/ Paolo original suggested to add >> the max cputime limit to the vtime, when the cpu is running in nohz >> full mode and stop the tick, jiffies will be updated depends on clock >> source instead of clock event device in >> guest(tick_nohz_update_jiffies() callsite, ktime_get()), so it will >> not be affected by lost clock ticks, my patch keeps the limit for >> vtime and remove the limit to non-vtime. However, your patch removes >> the limit for both scenarios and results in the below calltrace for >> vtime. > > I understand what it does. > > What I would like to understand is WHY enforcing the limit > is the right thing when using vtime, and the wrong thing > in all other scenarios. I observed that function get_vtime_delta() underflow which means that delta < other when debugging your bugfix patch, I believe that is why Paolo suggested to add the max cputime limit to vtime, he also pointed out the potentional underflow before https://lkml.org/lkml/2016/6/8/404/ > > Can you explain why you change the limit to ULONG_MAX in > three call sites, but not in the last one? > > What is different about the first three, versus the last > one? > > Are you sure it should be changed in three places, and > not in eg. two? > > This seems like something we should try to understand, > rather than patch up haphazardly. > > The changelog of your patch could use an explanation of > why the change is the correct way to go. > >> > >> > Do you have the full call trace? > > OK, so you are seeing a divide by zero in cputime_adjust. > > Specifically, this would be scale_stime getting passed > a (utime + stime) that adds up to 0. Stranger still, that > only gets called if neither utime or stime is 0, meaning > that one of utime or stime is negative, at the exact same > magnitude as the other. > > Looking at thread_group_cputime(), I see some room for > rounding errors. > > do { > seq = nextseq; > flags = read_seqbegin_or_lock_irqsave(&sig->stats_lock, > &seq); > times->utime = sig->utime; > times->stime = sig->stime; > times->sum_exec_runtime = sig->sum_sched_runtime; > > for_each_thread(tsk, t) { > task_cputime(t, &utime, &stime); > times->utime += utime; > times->stime += stime; > times->sum_exec_runtime += > task_sched_runtime(t); > } > /* If lockless access failed, take the lock. */ > nextseq = 1; > } while (need_seqretry(&sig->stats_lock, seq)); > > Specifically, task_cputime calls vtime_delta, which works > off jiffies, while task_sched_runtime works straight off > the sched clock. I try to replace task_sched_runtime() by t->se.sum_exec_runtime just for testing (refer to Commit d670ec13178d0 "posix-cpu-timers: Cure SMP wobbles"), divide zero still appear. > > I can see how this would lead to a non-zero ->sum_exec_runtime, > while ->utime and/or ->stime are still at zero. This is fine. > > What I do not see is how ->utime or ->stime could end up negative, > which is what would be needed to hit that divide by zero. stime is negative since underflow incurred in function get_vtime_delta(), however, it is u64, so it is a very big number, and the total is 0 after for loop in scale_time(), but not 0 before that. Regards, Wanpeng Li