Date: Tue, 11 Apr 2017 16:22:48 +0200 (CEST)
From: Thomas Gleixner
To: Wanpeng Li
cc: Mike Galbraith, Rik van Riel, Luiz Capitulino, Frederic Weisbecker,
    "linux-kernel@vger.kernel.org", Peter Zijlstra
Subject: Re: [BUG nohz]: wrong user and system time accounting
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 30 Mar 2017, Wanpeng Li wrote:
> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
> index f3778e2b..f1ee393 100644
> --- a/kernel/sched/cputime.c
> +++ b/kernel/sched/cputime.c
> @@ -676,18 +676,21 @@ void thread_group_cputime_adjusted(struct
> task_struct *p, u64 *ut, u64 *st)
>  #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
>  static u64 vtime_delta(struct task_struct *tsk)
>  {
> -	unsigned long now = READ_ONCE(jiffies);
> +	u64 now = local_clock();
> +	u64 delta;
> +
> +	delta = now - tsk->vtime_snap;
>
> -	if (time_before(now, (unsigned long)tsk->vtime_snap))
> +	if (delta < TICK_NSEC)
>  		return 0;
>
> -	return jiffies_to_nsecs(now - tsk->vtime_snap);
> +	return jiffies_to_nsecs(delta / TICK_NSEC);

So you replaced a jiffies based approach with a jiffies based approach.

>  }
>
>  static u64 get_vtime_delta(struct task_struct *tsk)
>  {
> -	unsigned long now = READ_ONCE(jiffies);
> -	u64 delta, other;
> +	u64 delta = vtime_delta(tsk);
> +	u64 other;
>
>  	/*
>  	 * Unlike tick based timing, vtime based timing never has lost
> @@ -696,10 +699,9 @@ static u64 get_vtime_delta(struct task_struct *tsk)
>  	 * elapsed time. Limit account_other_time to prevent rounding
>  	 * errors from causing elapsed vtime to go negative.
>  	 */
> -	delta = jiffies_to_nsecs(now - tsk->vtime_snap);
>  	other = account_other_time(delta);
>  	WARN_ON_ONCE(tsk->vtime_snap_whence == VTIME_INACTIVE);
> -	tsk->vtime_snap = now;
> +	tsk->vtime_snap += delta;

Here is how it works^Wfails

For simplicity tsk->vtime_snap starts at 0

HZ = 1000

CPU0                                CPU1

sysexit()
  account_system()                  now == 0
    delta = vtime_delta()           <- 0ns
    tsk->vtime_snap += delta;       == 0ns

busy_loop(995us)

sysenter()                          now == 996us
  account_user()
    delta = vtime_delta()           <- 0ns
    tsk->vtime_snap += delta        == 0ns

sysexit()
  account_system()                  now == 1001us
    delta = vtime_delta()           <- 1000000ns
          ^^^^ Gets accounted to system
    tsk->vtime_snap += delta;       == 1000000ns

It's not different from the current jiffies based stuff at all. Same
failure mode.

Thanks,

	tglx
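
For anyone who wants to replay the trace outside the kernel, here is a
minimal user-space sketch. It is not kernel code: struct sim_task and all
sim_* functions are hypothetical stand-ins, the clock value is passed in
by hand instead of coming from local_clock(), account_other_time() is left
out, and jiffies_to_nsecs(delta / TICK_NSEC) is modelled as rounding the
delta down to a whole tick (1000000ns at HZ = 1000).

```c
#include <assert.h>
#include <stdint.h>

#define HZ           1000ULL
#define NSEC_PER_SEC 1000000000ULL
#define TICK_NSEC    (NSEC_PER_SEC / HZ)	/* 1000000ns at HZ = 1000 */

/* Hypothetical stand-in for the vtime fields of task_struct. */
struct sim_task {
	uint64_t vtime_snap;	/* ns, last accounted point */
	uint64_t utime;		/* ns accounted to user */
	uint64_t stime;		/* ns accounted to system */
};

/* Models vtime_delta() from the quoted patch: only whole ticks survive. */
static uint64_t sim_vtime_delta(const struct sim_task *tsk, uint64_t now)
{
	uint64_t delta;

	assert(now >= tsk->vtime_snap);
	delta = now - tsk->vtime_snap;

	if (delta < TICK_NSEC)
		return 0;

	/* Assumed equivalent of jiffies_to_nsecs(delta / TICK_NSEC). */
	return (delta / TICK_NSEC) * TICK_NSEC;
}

/* Models get_vtime_delta(): the snapshot advances by the accounted delta. */
static uint64_t sim_get_vtime_delta(struct sim_task *tsk, uint64_t now)
{
	uint64_t delta = sim_vtime_delta(tsk, now);

	tsk->vtime_snap += delta;
	return delta;
}

/* Called at sysenter: time since the snapshot was spent in user space. */
static void sim_account_user(struct sim_task *tsk, uint64_t now)
{
	tsk->utime += sim_get_vtime_delta(tsk, now);
}

/* Called at sysexit: time since the snapshot was spent in the kernel. */
static void sim_account_system(struct sim_task *tsk, uint64_t now)
{
	tsk->stime += sim_get_vtime_delta(tsk, now);
}

/* Replays the three accounting points of the trace (times in ns). */
struct sim_task sim_replay_trace(void)
{
	struct sim_task tsk = { 0, 0, 0 };

	sim_account_system(&tsk, 0);			/* sysexit,  now == 0      */
	sim_account_user(&tsk, 996ULL * 1000);		/* sysenter, now == 996us  */
	sim_account_system(&tsk, 1001ULL * 1000);	/* sysexit,  now == 1001us */

	return tsk;
}
```

Calling sim_replay_trace() leaves utime at 0 and stime at one full tick,
although the task spent nearly all of that time in the busy loop in user
space. Accounting the raw nanosecond delta instead of rounding to ticks
would be one way to avoid that in this model; whether that is the right
fix for the kernel is a separate question.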