From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754143AbdDLNS3 (ORCPT ); Wed, 12 Apr 2017 09:18:29 -0400 Received: from mail-wr0-f196.google.com ([209.85.128.196]:36044 "EHLO mail-wr0-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752677AbdDLNSY (ORCPT ); Wed, 12 Apr 2017 09:18:24 -0400 Date: Wed, 12 Apr 2017 15:18:19 +0200 From: Frederic Weisbecker To: Thomas Gleixner Cc: Wanpeng Li , Mike Galbraith , Rik van Riel , Luiz Capitulino , "linux-kernel@vger.kernel.org" , Peter Zijlstra Subject: Re: [BUG nohz]: wrong user and system time accounting Message-ID: <20170412131818.GB21309@lerouge> References: <20170323165512.60945ac6@redhat.com> <1490636129.8850.76.camel@redhat.com> <20170328132406.7d23579c@redhat.com> <20170329131656.1d6cb743@redhat.com> <1490818125.28917.11.camel@redhat.com> <1490848051.4167.57.camel@gmx.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Apr 11, 2017 at 04:22:48PM +0200, Thomas Gleixner wrote: > On Thu, 30 Mar 2017, Wanpeng Li wrote: > > diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c > > index f3778e2b..f1ee393 100644 > > --- a/kernel/sched/cputime.c > > +++ b/kernel/sched/cputime.c > > @@ -676,18 +676,21 @@ void thread_group_cputime_adjusted(struct > > task_struct *p, u64 *ut, u64 *st) > > #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN > > static u64 vtime_delta(struct task_struct *tsk) > > { > > - unsigned long now = READ_ONCE(jiffies); > > + u64 now = local_clock(); > > + u64 delta; > > + > > + delta = now - tsk->vtime_snap; > > > > - if (time_before(now, (unsigned long)tsk->vtime_snap)) > > + if (delta < TICK_NSEC) > > return 0; > > > > - return jiffies_to_nsecs(now - tsk->vtime_snap); > > + return jiffies_to_nsecs(delta / TICK_NSEC); > > So you replaced a jiffies based approach with a jiffies based approach. > > > } > > > > static u64 get_vtime_delta(struct task_struct *tsk) > > { > > - unsigned long now = READ_ONCE(jiffies); > > - u64 delta, other; > > + u64 delta = vtime_delta(tsk); > > + u64 other; > > > > /* > > * Unlike tick based timing, vtime based timing never has lost > > @@ -696,10 +699,9 @@ static u64 get_vtime_delta(struct task_struct *tsk) > > * elapsed time. Limit account_other_time to prevent rounding > > * errors from causing elapsed vtime to go negative. > > */ > > - delta = jiffies_to_nsecs(now - tsk->vtime_snap); > > other = account_other_time(delta); > > WARN_ON_ONCE(tsk->vtime_snap_whence == VTIME_INACTIVE); > > - tsk->vtime_snap = now; > > + tsk->vtime_snap += delta; > > Here is how it works^Wfails > > For simplicity tsk->vtime_snap starts at 0 > HZ = 1000 > > CPU0 CPU1 > sysexit() > account_system() > now == 0 > delta = vtime_delta() <- 0ns > tsk->vtime_snap += delta; == 0ns > > busy_loop(995us) > > sysenter() > now == 996us > account_user() > delta = vtime_delta() <- 0ns > tsk->vtime_snap += delta == 0ns > > sysexit() > account_system() > now == 1001us > delta = vtime_delta() <- 10000000ns > > ^^^^ Gets accounted to system > > tsk->vtime_snap += delta; == 10000000ns > > It's not different from the current jiffies based stuff at all. Same > failure mode. Yes you're right, I got confused again. So to fix this we could do our snapshots at a frequency lower than HZ but still high enough to avoid overhead. Something like TICK_NSEC / 2 ?