From: Lauro Venancio
Reply-To: lvenanci@redhat.com
To: Peter Zijlstra
Cc: linux-kernel@vger.kernel.org, lwang@redhat.com, riel@redhat.com,
    Mike Galbraith, Thomas Gleixner, Ingo Molnar
Subject: Re: [RFC 2/3] sched/topology: fix sched groups on NUMA machines with mesh topology
Date: Mon, 17 Apr 2017 11:40:59 -0300
Organization: Red Hat
In-Reply-To: <20170414165857.7n75lxk4usfsbjaq@hirez.programming.kicks-ass.net>
References: <1492091769-19879-1-git-send-email-lvenanci@redhat.com>
    <1492091769-19879-3-git-send-email-lvenanci@redhat.com>
    <20170414113813.vktcpsrsuu2st2fm@hirez.programming.kicks-ass.net>
    <20170414165857.7n75lxk4usfsbjaq@hirez.programming.kicks-ass.net>

On 04/14/2017 01:58 PM, Peter Zijlstra wrote:
> On Fri, Apr 14, 2017 at 01:38:13PM +0200, Peter Zijlstra wrote:
>> On Thu, Apr 13, 2017 at 10:56:08AM -0300, Lauro Ramos Venancio wrote:
>>> This patch constructs the sched groups from each CPU's perspective. So, on
>>> a 4-node machine with ring topology, while nodes 0 and 2 keep the same
>>> groups as before [(3, 0, 1)(1, 2, 3)], nodes 1 and 3 get new groups
>>> [(0, 1, 2)(2, 3, 0)]. This allows moving tasks between any nodes that are
>>> 2 hops apart.
>>
>> Ah,.. so after drawing pictures I see what went wrong; duh :-(
>>
>> An equivalent patch would be (if for_each_cpu_wrap() were exposed):
>>
>> @@ -521,11 +588,11 @@ build_overlap_sched_groups(struct sched_domain *sd, int cpu)
>>  	struct cpumask *covered = sched_domains_tmpmask;
>>  	struct sd_data *sdd = sd->private;
>>  	struct sched_domain *sibling;
>> -	int i;
>> +	int i, wrap;
>>
>>  	cpumask_clear(covered);
>>
>> -	for_each_cpu(i, span) {
>> +	for_each_cpu_wrap(i, span, cpu, wrap) {
>>  		struct cpumask *sg_span;
>>
>>  		if (cpumask_test_cpu(i, covered))
>>
>> We need to start iterating at @cpu, not start at 0 every time.
>>
> OK, please have a look here:
>
>   https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?h=sched/core

Looks good, but please hold these patches until patch 3 is applied.
Without it, the sched_group_capacity (sg->sgc) instance is not selected
correctly and we get a significant performance regression on all NUMA
machines.

I will continue this discussion in the other thread.
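
For illustration, here is a minimal stand-alone sketch (my own model, not
the kernel code) of the covering loop in build_overlap_sched_groups(),
assuming one CPU per node on a 4-node ring; sibling_span() and
build_groups() are hypothetical names introduced only for this example:

/*
 * Models how walking the span from @cpu (as for_each_cpu_wrap() does)
 * instead of always from 0 changes the groups on a 4-node ring.
 * A node's "sibling span" is its 1-hop neighbourhood; a group is added
 * whenever the walk reaches a node that is not yet covered.
 */
#include <stdio.h>

#define NODES 4

/* 1-hop neighbourhood of @node on the ring: {node - 1, node, node + 1} */
static unsigned int sibling_span(int node)
{
	unsigned int mask = 0;

	mask |= 1u << ((node + NODES - 1) % NODES);
	mask |= 1u << node;
	mask |= 1u << ((node + 1) % NODES);
	return mask;
}

static void build_groups(int cpu)
{
	unsigned int covered = 0;
	int k;

	printf("node %d groups:", cpu);
	/* walk the span starting at @cpu, wrapping around */
	for (k = 0; k < NODES; k++) {
		int i = (cpu + k) % NODES;

		if (covered & (1u << i))
			continue;
		covered |= sibling_span(i);
		printf(" (%d, %d, %d)", (i + NODES - 1) % NODES, i,
		       (i + 1) % NODES);
	}
	printf("\n");
}

int main(void)
{
	int cpu;

	for (cpu = 0; cpu < NODES; cpu++)
		build_groups(cpu);
	return 0;
}

Running it prints [(3, 0, 1)(1, 2, 3)] for nodes 0 and 2 and
[(0, 1, 2)(2, 3, 0)] for nodes 1 and 3, matching the description quoted
above; starting every walk at node 0 would give all four nodes the first
pair of groups.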