From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752677AbeC1IjV (ORCPT ); Wed, 28 Mar 2018 04:39:21 -0400 Received: from mailout5.zih.tu-dresden.de ([141.30.67.74]:46598 "EHLO mailout5.zih.tu-dresden.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751513AbeC1IjS (ORCPT ); Wed, 28 Mar 2018 04:39:18 -0400 Subject: Re: [RFT][PATCH v7 6/8] sched: idle: Select idle state before stopping the tick To: "Rafael J. Wysocki" CC: Peter Zijlstra , Linux PM , Frederic Weisbecker , "Thomas Gleixner" , Paul McKenney , Doug Smythies , Rik van Riel , "Aubrey Li" , Mike Galbraith , LKML References: <2390019.oHdSGtR3EE@aspire.rjw.lan> <2249320.0Z4q8AXauv@aspire.rjw.lan> <6462e44a-e207-6b97-22bf-ad4aed69afc2@tu-dresden.de> <4198010.6ArFqS34NK@aspire.rjw.lan> From: Thomas Ilsche Message-ID: Date: Wed, 28 Mar 2018 10:38:50 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="utf-8"; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-ClientProxiedBy: MSX-L106.msx.ad.zih.tu-dresden.de (172.26.34.106) To MSX-L104.msx.ad.zih.tu-dresden.de (172.26.34.104) X-PMWin-Version: 4.0.3, Antivirus-Engine: 3.70.2, Antivirus-Data: 5.49 X-TUD-Virus-Scanned: mailout5.zih.tu-dresden.de Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2018-03-28 10:13, Rafael J. Wysocki wrote: > On Wed, Mar 28, 2018 at 12:10 AM, Rafael J. Wysocki wrote: >> On Tuesday, March 27, 2018 11:50:02 PM CEST Thomas Ilsche wrote: >>> On 2018-03-20 16:45, Rafael J. Wysocki wrote: >>>> From: Rafael J. Wysocki >>>> >>>> In order to address the issue with short idle duration predictions >>>> by the idle governor after the tick has been stopped, reorder the >>>> code in cpuidle_idle_call() so that the governor idle state selection >>>> runs before tick_nohz_idle_go_idle() and use the "nohz" hint returned >>>> by cpuidle_select() to decide whether or not to stop the tick. >>>> >>>> This isn't straightforward, because menu_select() invokes >>>> tick_nohz_get_sleep_length() to get the time to the next timer >>>> event and the number returned by the latter comes from >>>> __tick_nohz_idle_enter(). Fortunately, however, it is possible >>>> to compute that number without actually stopping the tick and with >>>> the help of the existing code. >>> >>> I think something is wrong with the new tick_nohz_get_sleep_length. >>> It seems to return a value that is too large, ignoring immanent >>> non-sched timer. >> >> That's a very useful hint, let me have a look. >> >>> I tested idle-loop-v7.3. It looks very similar to my previous results >>> on the first idle-loop-git-version [1]. Idle and traditional synthetic >>> powernightmares are mostly good. >> >> OK >> >>> But it selects too deep C-states for short idle periods, which is bad >>> for power consumption [2]. >> >> That still needs to be improved, then. >> >>> I tracked this down with additional tests using >>> __attribute__((optimize("O0"))) menu_select >>> and perf probe. With this the behavior seems slightly different, but it >>> shows that data->next_timer_us is: >>> v4.16-rc6: the expected ~500 us [3] >>> idle-loop-v7.3: many milliseconds to minutes [4]. >>> This leads to the governor to wrongly selecting C6. >>> >>> Checking with 372be9e and 6ea0577, I can confirm that the change is >>> introduced by this patch. >> >> Yes, that's where the most intrusive reordering happens. > > Overall, this is an interesting conundrum, because the case in > question is when the tick should never be stopped at all during the > workload and the code's behavior in that case should not change, so > the change was not intentional. > > Now, from walking through the code, as long as can_stop_idle_tick() > returns 'true' all should be fine or at least I don't see why there is > any difference in behavior in that case. > > However, if can_stop_idle_tick() returns 'false' (for example, because > need_resched() returns 'true' when it is evaluated), the behavior *is* > different in a couple of ways. I sort of know how that can be > addressed, but I'd like to reproduce your results here. > > Are you still using the same workload as before to trigger this behavior? > Yes, the exact code I use is as follows $ gcc poller.c -O3 -fopenmp -o poller_omp $ GOMP_CPU_AFFINITY=0-35 ./poller_omp 500 #include #include #include int main(int argc, char *argv[]) { int sleep_us = 10000; if (argc == 2) { sleep_us = atoi(argv[1]); } #pragma omp parallel { while (1) { usleep(sleep_us); } } }