Date: Fri, 21 Aug 2015 17:55:36 +0200
From: Frederic Weisbecker
To: preetium@andrew.cmu.edu
Cc: Peter Zijlstra, Vatika Harlalka, mingo@redhat.com, tglx@linutronix.de,
    rafael.j.wysocki@intel.com, schwidefsky@de.ibm.com,
    linux-kernel@vger.kernel.org, preeti.murthy@gmail.com
Subject: Re: [PATCH 0/2] nohz_full: Offload task_tick to remote housekeeping cpus for nohz_full cpus
Message-ID: <20150821155535.GA4739@lerouge>
References: <20150813122223.GC16853@twins.programming.kicks-ass.net>
    <20150813124401.GA29958@lerouge>
    <20150813150545.GD16853@twins.programming.kicks-ass.net>
    <20150813153614.GD29958@lerouge>
    <8d30cc8ff09828555d603e33d58ae0af.squirrel@webmail.andrew.cmu.edu>
In-Reply-To: <8d30cc8ff09828555d603e33d58ae0af.squirrel@webmail.andrew.cmu.edu>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Aug 20, 2015 at 08:50:55AM -0400, preetium@andrew.cmu.edu wrote:
> > On Thu, Aug 13, 2015 at 05:05:45PM +0200, Peter Zijlstra wrote:
> >> I see nothing like the stuff I asked for in here, on top it creates the
> >> stupid tick.c file.
> >
> > Right. I initially thought that we should make sched_tick() just work with
> > long delays. Then tglx suggested the offline idea but I lost track about
> > our conversation.
> >
> > But yeah making that scheduler_tick() working with long delays sound much
> > better. Certainly much more work but that's a natural evolution after all.
> > It should pay in longer term.
> >
> > We can start with update_cpu_load_active() which only works with HZ
> > frequency updates or nohz idle zero load decay. Now I think that stuff is
> > only used for load balancing. I had hopes this thing could be removed. I
> > think Alex Shin (IIRC) tried but the patchset didn't make it.
>
> I don't think Peter is talking about delays in updating the scheduler stats.
> Looking at the earlier discussion, it looks like we need to do periodic tick
> tasks only on demand on the nohz_full cpus. We will perhaps need to do the
> following(reiterating some points that Peter said earlier) :
>
> 1. One of the tasks that scheduler_tick() does is trigger_load_balance(). If
> we have to get rid of the residual tick, we need to move load balancing on
> nohz_full cpus into nohz_idle_balance(). In addition to load balancing on
> the idle cpus, this routine will load balance on the nohz_full cpus as well,
> when they are running single tasks.
>
> This seems to be a good move because it will avoid pulling more tasks on
> to the nohz_full cpus, when they are running single tasks, unless needed.

I suspect trigger_load_balance() is fine because nohz full CPUs are now part
of cpu_isolated_map. I believe that in that case on_null_domain() is true for
their runqueues, so the function returns early. If not, then we should add a
similar ignore check.

>
> 2. In nohz_idle_load_balance(), there needs to be routines similar to
> update_idle_cpu_load() for nohz_full cpus so that the cpu loads are updated
> before triggering load balance on them. Lets call this
> update_nohz_full_cpu_load().
> This should include update_curr() and update_cpu_load_active() for nohz_full
> cpus.

nohz_idle_load_balance() shouldn't happen if trigger_load_balance() is ignored.

>
> 3. When scheduling stats are read, update_curr() and
> update_cpu_load_active() will be called remotely.

Concerning update_cpu_load_active(), I wonder if rq->cpu_load[i > 0] are read
for CPUs from cpu_isolated_map.
I suspect that an isolated CPU shouldn't belong to any struct sched_group.
Otherwise we can force-disable SCHED_LB_BIAS.

Concerning update_curr(), we have yet to find all the places that do remote
reads. But again, perhaps the sched stats maintained by update_curr() are
never read remotely for an isolated CPU.

That's all stuff we need to verify.
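
To make the "ignore check" idea above concrete, here is a minimal userspace
sketch. It is a toy model, not kernel code: every toy_* name and field is
hypothetical and only stands in for the real structures. It assumes, as
suspected above but still to be verified, that an isolated CPU's runqueue has
no sched domain attached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a sched domain; not the kernel definition. */
struct toy_sched_domain {
    int level;
};

/* Hypothetical stand-in for a runqueue; not the kernel definition. */
struct toy_rq {
    int cpu;
    struct toy_sched_domain *sd;  /* assumed NULL for an isolated CPU */
    unsigned long next_balance;   /* toy tick count of the next balance */
    unsigned int balance_raised;  /* how many times a balance was requested */
};

/* True when the runqueue has no domain attached. */
static bool toy_on_null_domain(const struct toy_rq *rq)
{
    assert(rq != NULL);
    return rq->sd == NULL;
}

/*
 * Models the early return: a runqueue without a domain never requests
 * a load balance, whatever the current time is. Returns true when a
 * balance was requested.
 */
static bool toy_trigger_load_balance(struct toy_rq *rq, unsigned long now)
{
    if (toy_on_null_domain(rq))
        return false;

    if (now >= rq->next_balance) {
        rq->balance_raised++;
        return true;
    }
    return false;
}
```

In this model an isolated runqueue (sd == NULL) never requests a balance,
while a runqueue with a domain requests one once "now" reaches next_balance.
If isolated CPUs turn out to have a domain attached after all, the same early
return would need a different predicate.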