From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S965002AbbEMNwM (ORCPT <rfc822;w@1wt.eu>);
	Wed, 13 May 2015 09:52:12 -0400
Received: from mx1.redhat.com ([209.132.183.28]:38372 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S934274AbbEMNwI (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
	Wed, 13 May 2015 09:52:08 -0400
Message-ID: <555356E8.5000307@redhat.com>
Date: Wed, 13 May 2015 09:51:36 -0400
From: Rik van Riel <riel@redhat.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.4.0
MIME-Version: 1.0
To: Peter Zijlstra <peterz@infradead.org>
CC: dedekind1@gmail.com, linux-kernel@vger.kernel.org, mgorman@suse.de,
        jhladky@redhat.com
Subject: Re: [PATCH] numa,sched: only consider less busy nodes as numa balancing
 destination
References: <1430908530.7444.145.camel@sauron.fi.intel.com> <20150506114128.0c846a37@cuia.bos.redhat.com> <1431090801.1418.87.camel@sauron.fi.intel.com> <554D1681.7040902@redhat.com> <1431438610.20417.0.camel@sauron.fi.intel.com> <55522005.1080705@redhat.com> <20150513062906.GJ3007@worktop.Skamania.guest>
In-Reply-To: <20150513062906.GJ3007@worktop.Skamania.guest>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 05/13/2015 02:29 AM, Peter Zijlstra wrote:
> On Tue, May 12, 2015 at 11:45:09AM -0400, Rik van Riel wrote:
>> I have a few poorly formed ideas on what could be done about that:
>>
>> 1) have fbq_classify_rq take the current task on the rq into account,
>>    and adjust the fbq classification if all the runnable-but-queued
>>    tasks are on the right node
> 
> So while looking at this I came up with the below; it treats anything
> inside ->active_nodes as a preferred node for balancing purposes.
> 
> Would that make sense?

Not necessarily.

If there are two multi-threaded workloads on a system, and they
have not yet converged on one node each, both nodes will be part
of each workload's ->active_nodes.

Treating them as preferred nodes means the load balancing code
would do nothing at all to help the workloads converge.
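
To illustrate the concern, here is a rough userspace sketch. It is
not kernel code, and every name in it (task_model, misplaced_strict,
misplaced_active, count_misplaced) is made up for this example. It
assumes two workloads that each have one thread on each of two nodes,
so both nodes are in the active set of both workloads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 8

/* Hypothetical model of a task's NUMA placement state. */
struct task_model {
	int preferred_nid;		/* node the task should converge on */
	unsigned int active_nodes;	/* bitmask of nodes the workload uses */
	int cur_nid;			/* node the task currently runs on */
};

/* Misplaced only if the task is not on its single preferred node. */
bool misplaced_strict(const struct task_model *t)
{
	return t->cur_nid != t->preferred_nid;
}

/* Misplaced only if the task is outside the workload's active nodes. */
bool misplaced_active(const struct task_model *t)
{
	assert(t->cur_nid >= 0 && t->cur_nid < MAX_NODES);
	return !(t->active_nodes & (1u << t->cur_nid));
}

/* Count the tasks that a given policy would want to move. */
size_t count_misplaced(const struct task_model *tasks, size_t n,
		       bool (*misplaced)(const struct task_model *))
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (misplaced(&tasks[i]))
			count++;
	return count;
}
```

With four such tasks, the strict check flags the two that sit on the
wrong node, while the active-nodes check flags none, so there is
nothing left for the balancer to act on.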

> I'll see what I can do about current in the runqueue type
> classification.

This check can probably be racy, so just reading a value from
the task struct of the runqueue's current task should be ok. I
am not aware of any architecture where the task struct address
can become invalid. The worst that could happen is that the
bits examined change value.
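
Something along these lines is what I have in mind, as a userspace
sketch only. The struct and function names (numa_task, numa_rq,
curr_on_preferred_node) are invented for the example, and C11 relaxed
atomics stand in for whatever lockless load the kernel side would
end up using (probably something READ_ONCE()-like).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the task struct and the runqueue. */
struct numa_task {
	_Atomic int preferred_nid;	/* may change under us */
};

struct numa_rq {
	_Atomic(struct numa_task *) curr;	/* task currently running */
	int nid;				/* node this runqueue is on */
};

/*
 * Lockless peek at the current task. No lock is taken, so the
 * answer can be stale by the time the caller uses it. That is
 * acceptable when the result is only a balancing hint.
 */
bool curr_on_preferred_node(struct numa_rq *rq)
{
	struct numa_task *p;
	int nid;

	p = atomic_load_explicit(&rq->curr, memory_order_relaxed);
	if (p == NULL)
		return false;

	nid = atomic_load_explicit(&p->preferred_nid, memory_order_relaxed);
	return nid == rq->nid;
}
```

The sketch relies on the task struct staying valid for the duration
of the read; it makes no attempt to pin the task.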

>> 2) ensure that rq->nr_numa_running and rq->nr_preferred_running also
>>    get incremented for kernel threads that are bound to a particular
>>    CPU - currently CPU-bound kernel threads will cause the NUMA
>>    statistics to look like a CPU has tasks that do not belong on that
>>    NUMA node
> 
> I'm thinking accounting those to nr_pinned, lemme see how that works
> out.

Cool.

-- 
All rights reversed