From: Vincent Guittot
Date: Wed, 26 Apr 2017 20:12:09 +0200
Subject: Re: [PATCH 2/2] sched/fair: Always propagate runnable_load_avg
To: Tejun Heo
Cc: Ingo Molnar, Peter Zijlstra, linux-kernel, Linus Torvalds, Mike Galbraith, Paul Turner, Chris Mason, kernel-team@fb.com
In-Reply-To: <20170424201444.GC14169@wtj.duckdns.org>
References: <20170424201344.GA14169@wtj.duckdns.org> <20170424201444.GC14169@wtj.duckdns.org>
List-ID: linux-kernel.vger.kernel.org

On 24 April 2017 at 22:14, Tejun Heo wrote:
> We noticed that with cgroup CPU controller in use, the scheduling
>
> Note the drastic increase in p99 scheduling latency.  After
> investigation, it turned out that update_sd_lb_stats(), which is
> used by load_balance() to pick the most loaded group, was often
> picking the wrong group.  A CPU which has one schbench running and

Could the problem be on the load-balancing side instead, and more
precisely in the wakeup path?  Looking at the trace, task placement
happens in the wakeup path, and if it fails to select the right idle
CPU at wake-up, you then have to wait for a load balance, which is
already too late.

> another queued wouldn't report the correspondingly higher

It will, because load_avg includes runnable_load_avg, so whatever load
is in runnable_load_avg is in load_avg too.  Conversely,
runnable_load_avg will not include the blocked load of a task that is
about to wake up, which is exactly the schbench case.

One last thing: the load_avg of an idle CPU can keep stale blocked
load for a while (until a load balance happens and updates the blocked
load), so the CPU can be seen as "busy" when it is not.  Could that be
a cause of your problem?  I have an ongoing patch that at least partly
solves this, if it turns out to be the reason.

> weighted_cpuload() and get looked over as the target of load
> balancing.
>
> weighted_cpuload() is the root cfs_rq's runnable_load_avg, which is
> the sum of the load_avg of all queued sched_entities.  Without cgroups
> or at the root cgroup, each task's load_avg contributes directly to
> the sum.  When a task wakes up or goes to sleep, the change is
> immediately reflected in runnable_load_avg, which in turn affects
> load balancing.
>
> #else	/* CONFIG_FAIR_GROUP_SCHED */
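
To make the load_avg vs. runnable_load_avg distinction in the exchange
above concrete, here is a small stand-alone sketch.  It is plain C,
not kernel code, and the 1024 weights are only placeholders standing
in for a nice-0 task's contribution.  It mimics the bookkeeping being
discussed: load_avg keeps the contribution of blocked (sleeping)
entities, while runnable_load_avg only counts entities that are
actually queued, which is why a CPU whose schbench worker just went to
sleep can still look loaded through load_avg.

#include <stdio.h>
#include <stdbool.h>

/* Toy per-entity state: its load contribution and whether it is queued. */
struct entity {
        const char *name;
        unsigned long load;     /* per-entity load_avg contribution */
        bool queued;            /* currently on the runqueue? */
};

int main(void)
{
        /* One schbench worker running, one that just went to sleep. */
        struct entity cpu0[] = {
                { "schbench-running",  1024, true  },
                { "schbench-sleeping", 1024, false },
        };

        unsigned long load_avg = 0, runnable_load_avg = 0;

        for (unsigned int i = 0; i < sizeof(cpu0) / sizeof(cpu0[0]); i++) {
                /* load_avg keeps the blocked (sleeping) contribution... */
                load_avg += cpu0[i].load;
                /* ...runnable_load_avg only counts queued entities. */
                if (cpu0[i].queued)
                        runnable_load_avg += cpu0[i].load;
        }

        printf("load_avg          = %lu\n", load_avg);          /* 2048 */
        printf("runnable_load_avg = %lu\n", runnable_load_avg); /* 1024 */
        return 0;
}

Compiled with any C compiler, this prints load_avg = 2048 and
runnable_load_avg = 1024 for a CPU with one running and one sleeping
worker, matching the asymmetry between the two signals described in
the thread.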