From: Wanpeng Li
Date: Tue, 28 Mar 2017 15:19:25 +0800
Subject: Re: [BUG nohz]: wrong user and system time accounting
To: Rik van Riel
Cc: Luiz Capitulino, Frederic Weisbecker, "linux-kernel@vger.kernel.org", linux-rt-users@vger.kernel.org
In-Reply-To: <1490636129.8850.76.camel@redhat.com>
References: <20170323165512.60945ac6@redhat.com> <1490636129.8850.76.camel@redhat.com>

2017-03-28 1:35 GMT+08:00 Rik van Riel:
> On Mon, 2017-03-27 at 09:56 +0800, Wanpeng Li wrote:
>>
>> Actually after I bisect, the first bad commit is ff9a9b4c4334
>> ("sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy granularity").
>> The bug can be reproduced readily if CONFIG_CONTEXT_TRACKING_FORCE
>> is true
>
> At the time, we thought it was an "occasionally bad" / "unlucky"
> kind of bug, not a systemic issue, like your observations seem
> to suggest.
>
>> Let's consider the cpu which has responsibility for the global
>> timekeeping, as the tracing posted above, the vtime_account_user() is
>> called before tick_sched_timer() which will update jiffies, so jiffies
>> is stale in vtime_account_user() and the run time in userspace is
>> skipped, the vtime_user_enter() is called after jiffies update, so
>> both the time in userspace and in kernel are accumulated to sys time.
>> If the housekeeping cpu is idle when CONFIG_NO_HZ_FULL, everything is
>> fine. However, if you give stress to the housekeeping cpu, top will
>> show 100% sys-time of both the housekeeping cpu and the other cpus who
>> have at least two tasks running on and in full_nohz mode. I think it
>> is because the stress delays the timer interrupt handling in some
>> degree, then the jiffies is not updated timely before other cpus
>> access it in vtime_account_user().
>>
>> I think we can keep syscalls/exceptions context tracking still in
>> jiffies based sampling and utilize local_clock() in vtime_delta()
>> again for irqs which avoids jiffies stale influence. I can make a
>> patch if the idea is acceptable or there is any better proposal. :)
>
> Making that patch seems worthwhile, but I would like to
> know what the root cause is of the issue that is being
> observed.
>
> Is the problem due to the nohz_full CPU receiving an
> interrupt at the same time the timer interrupt fires on
> the housekeeping CPU?
>
> Is it due to a nohz_full CPU updating jiffies all by
> itself from irq context? In that case, could it be
> better to always have that be done by the housekeeping
> CPU?

I observed that jiffies is always updated by the housekeeping CPU, as
we expected.

Regards,
Wanpeng Li