Date: Fri, 21 May 2021 09:41:47 +0100
From: Mel Gorman
To: "hasegawa-hitomi@fujitsu.com"
Cc: Peter Zijlstra, "'fweisbec@gmail.com'", "'mingo@kernel.org'",
	"'tglx@linutronix.de'", "'juri.lelli@redhat.com'",
	"'vincent.guittot@linaro.org'", "'dietmar.eggemann@arm.com'",
	"'rostedt@goodmis.org'", "'bsegall@google.com'",
	"'bristot@redhat.com'", "'linux-kernel@vger.kernel.org'"
Subject: Re: Utime and stime are less when getrusage (RUSAGE_THREAD) is executed on a tickless CPU.
Message-ID: <20210521084147.GG3672@suse.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-15
Content-Disposition: inline
User-Agent: Mutt/1.10.1 (2018-07-13)
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, May 21, 2021 at 06:40:53AM +0000, hasegawa-hitomi@fujitsu.com wrote:
> Hi Peter and Frederic
>
>
> > > Would be superfluous for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y
> > > architectures at the very least.
> > >
> > > It also doesn't help any of the other callers, like for example procfs.
> > >
> > > Something like the below ought to work and fix all variants I think. But
> > > it does make the call significantly more expensive.
> > >
> > > Looking at thread_group_cputime() that already does something like this,
> > > but that's also susceptible to a variant of this very same issue; since
> > > it doesn't call it unconditionally, nor on all tasks, so if current
> > > isn't part of the threadgroup and/or another task is on a nohz_full cpu,
> > > things will go wobbly again.
> > >
> > > There's a note about syscall performance there, so clearly someone seems
> > > to care about that aspect of things, but it does suck for nohz_full.
> > >
> > > Frederic, didn't we have remote ticks that should help with this stuff?
> > >
> > > And mostly I think the trade-off here is that if you run on nohz_full,
> > > you're not expected to go do syscalls anyway (because they're sodding
> > > expensive) and hence the accuracy of these sort of things is mostly
> > > irrelevant.
> > >
> > > So it might be the use-case is just fundamentally bonkers and we
> > > shouldn't really bother fixing this.
> > >
> > > Anyway?
> >
> > Typing be hard... that should 'obviously' be reading: Anyone?
>
>
> I understand that there is a trade-off between performance and accuracy
> and that this issue may have already been discussed.
> However, as Peter mentions, the process of updating sum_exec_runtime
> just before retrieving information is already implemented in
> thread_group_cputime() in the root of RUSAGE_SELF etc. So, I think
> RUSAGE_THREAD should follow suit and implement the same process.
>

I don't think it's a straightforward issue. I know we've had to deal
with bugs in the past where the overhead of getting CPU usage
statistics was high enough to dominate workloads that had
self-monitoring capabilities, to the extent that the self-monitoring
was counter-productive. It was particularly problematic when
self-monitoring was being activated to find the source of a slowdown.

I tend to agree with Peter here that the fix may ultimately be worse
than the problem, since workloads are not necessarily willing to pay
the cost of accuracy and, as he already pointed out, nohz_full tasks
are expected to avoid syscalls as much as possible.

-- 
Mel Gorman
SUSE Labs
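
For anyone wanting to look at both sides of this trade-off from user
space, here is a minimal sketch, assuming Linux and Python's standard
resource module (which exposes RUSAGE_THREAD on Linux). It is not taken
from the thread or from any proposed patch. The helper names
thread_cpu_times, busy_loop and getrusage_cost are hypothetical and
chosen only for illustration. The first part reads utime/stime for the
calling thread around a user-space spin, and the second part gives a
rough per-call cost of the getrusage() call itself.

```python
import resource
import time


def thread_cpu_times():
    """Return (utime, stime) in seconds for the calling thread."""
    ru = resource.getrusage(resource.RUSAGE_THREAD)
    return ru.ru_utime, ru.ru_stime


def busy_loop(seconds):
    """Spin in user space for roughly `seconds` of wall-clock time."""
    deadline = time.monotonic() + seconds
    iterations = 0
    while time.monotonic() < deadline:
        iterations += 1
    return iterations


def getrusage_cost(calls=100_000):
    """Rough average wall-clock cost of one getrusage(RUSAGE_THREAD) call."""
    start = time.perf_counter()
    for _ in range(calls):
        resource.getrusage(resource.RUSAGE_THREAD)
    return (time.perf_counter() - start) / calls


if __name__ == "__main__":
    u0, s0 = thread_cpu_times()
    busy_loop(0.5)
    u1, s1 = thread_cpu_times()
    # If accounting lags behind, the utime delta will be visibly
    # smaller than the time actually spent spinning.
    print(f"utime delta: {u1 - u0:.3f}s, stime delta: {s1 - s0:.3f}s")
    print(f"approx cost per call: {getrusage_cost() * 1e9:.0f} ns")
```

To compare behaviour, the script could be pinned with taskset to a
housekeeping CPU and then to a CPU listed in nohz_full=. The per-call
figure includes interpreter overhead, so it is only useful for relative
comparisons between kernels or configurations.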