From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.0 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_PASS,USER_AGENT_NEOMUTT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 69EEEC43387 for ; Wed, 19 Dec 2018 09:32:13 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 4093321873 for ; Wed, 19 Dec 2018 09:32:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728529AbeLSJcM (ORCPT ); Wed, 19 Dec 2018 04:32:12 -0500 Received: from usa-sjc-mx-foss1.foss.arm.com ([217.140.101.70]:59492 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727522AbeLSJcL (ORCPT ); Wed, 19 Dec 2018 04:32:11 -0500 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 1CAF9A78; Wed, 19 Dec 2018 01:32:11 -0800 (PST) Received: from queper01-lin (queper01-lin.cambridge.arm.com [10.1.195.48]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id D44123F575; Wed, 19 Dec 2018 01:32:08 -0800 (PST) Date: Wed, 19 Dec 2018 09:32:03 +0000 From: Quentin Perret To: "Rafael J. Wysocki" Cc: "Rafael J. Wysocki" , Linux PM , Giovanni Gherdovich , Doug Smythies , Srinivas Pandruvada , Peter Zijlstra , Linux Kernel Mailing List , Frederic Weisbecker , Mel Gorman , Daniel Lezcano , hu1.chen@intel.com, Lorenzo Pieralisi Subject: Re: [PATCH] cpuidle: New timer events oriented governor for tickless systems Message-ID: <20181219093200.3e5zr7ixxowboanv@queper01-lin> References: <128769314.7imKxugEn6@aspire.rjw.lan> <20181218114949.c6npu357fj65l5iv@queper01-ThinkPad-T460s> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: NeoMutt/20171215 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tuesday 18 Dec 2018 at 16:31:13 (+0100), Rafael J. Wysocki wrote: > > > + /* > > > + * Count and sum the most recent idle duration values less than > > > + * the target residency of the state selected so far, find the > > > + * max. > > > + */ > > > + for (i = 0; i < INTERVALS; i++) { > > > + unsigned int val = cpu_data->intervals[i]; > > > + > > > + if (val >= drv->states[idx].target_residency) > > > + continue; > > > + > > > + count++; > > > + sum += val; > > > + } > > > + > > > + /* > > > + * Give up unless the majority of the most recent idle duration > > > + * values are in the interesting range. > > > + */ > > > + if (count > INTERVALS / 2) { > > > > I understand this probably works well enough in practice, but how 'robust' > > is supposed to be this filtering here ? If you have, say, 2 tasks with > > prime periods, then the hyper-period (which is when the pattern repeats) > > can get high really quickly, and you'll never see the full extent of > > that pattern with this mechanism. > > That's correct, but this is just about whether or not there is a > reason to select an idle state shallower than the one selected > already. How the pattern looks like exactly doesn't matter too much > here AFAICS. Right, I think what I was trying to say is that if you could actually capture at least one full hyper-period, then the avg_us you're computing here would be 'stable' as long as you're not changing workloads. So the prediction should be accurate overall, in average. But maybe we don't want that. At least it's not obvious we always want that. Maybe the hyper-period is so large that you'd be better off looking only at portion of it and locally optimize things ... So yes, this is very much a theoretical question. And the results will, as always, largely depend on the workload you're running, so nevermind. > > Yes, it's hard to do much better without introducing tons of complexity > > in here, but how did you define INTERVALS=8 and so ? > > I blatantly stole this number from the menu governor. :-) Fair enough ;-) > > > + unsigned int avg_us = div64_u64(sum, count); > > > + > > > + /* > > > + * Avoid spending too much time in an idle state that > > > + * would be too shallow. > > > + */ > > > + if (!(tick_nohz_tick_stopped() && avg_us < TICK_USEC)) { > > > > IIUC, this is saying we shouldn't trust the predicted sleep time if it > > is shorter than the tick and the tick is stopped. What is the use case > > you have in mind for this ? > > > > In other words, in which scenario do you expect to see a lot of high > > frequency non-timer-related wake-ups and have the tick disabled ? > > It is mostly theoretical, but the rationale here is that the cost of > using a shallow idle state with the tick stopped is potentially very > high (the CPU may stay in that idle state for a very long time then), > so it is better to assume misprediction in that case to avoid that > cost. I see, and since we should in fact have the tick enabled when loads of tasks are running, then this check should usually be harmless in practice. Makes sense. Thanks, Quentin