Message-ID: <1431342675.1418.148.camel@sauron.fi.intel.com>
Subject: Re: [PATCH] numa,sched: only consider less busy nodes as numa balancing destination
From: Artem Bityutskiy
Reply-To: dedekind1@gmail.com
To: Rik van Riel
Cc: linux-kernel@vger.kernel.org, mgorman@suse.de, peterz@infradead.org, jhladky@redhat.com
Date: Mon, 11 May 2015 14:11:15 +0300
In-Reply-To: <554D1681.7040902@redhat.com>
References: <1430908530.7444.145.camel@sauron.fi.intel.com> <20150506114128.0c846a37@cuia.bos.redhat.com> <1431090801.1418.87.camel@sauron.fi.intel.com> <554D1681.7040902@redhat.com>

On Fri, 2015-05-08 at 16:03 -0400, Rik van Riel wrote:
> This works well when dealing with tasks that are constantly
> running, but fails catastrophically when dealing with tasks
> that go to sleep, wake back up, go back to sleep, wake back
> up, and generally mess up, in a random way, the load statistics
> that the NUMA balancing code uses.

Sleeping, I believe, is what happens a lot in this workload: the
processes do a lot of network I/O, file I/O, and IPC.

Would you please expand on this a bit more: why would this scenario
"mess up the load statistics"?

> If the normal scheduler load balancer is moving tasks the
> opposite way from the NUMA balancer, things will not converge,
> and tasks will have worse memory locality than not doing NUMA
> balancing at all.

Are the regular and NUMA balancers independent? Are there mechanisms
for detecting ping-pong situations? I would like to verify your
theory, and such mechanisms would be helpful.

> Currently the load balancer has a preference for moving
> tasks to their preferred nodes (NUMA_FAVOUR_HIGHER, true),
> but there is no resistance to moving tasks away from their
> preferred nodes (NUMA_RESIST_LOWER, false). That setting
> was arrived at after a fair amount of experimenting, and
> is probably correct.

I guess I can try setting NUMA_RESIST_LOWER to true and see what
happens. But first I probably need to confirm that your theory (the
balancers playing ping-pong) is correct; any hints on how I would do
that?

Thanks!

Artem.
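
P.S. For reference, a rough sketch of how I understand these knobs: the
two feature bits live in kernel/sched/features.h (exact comments and
surrounding context elided here), and I am assuming they can be flipped
at run time through the standard sched_features debugfs file rather
than by rebuilding the kernel:

    /* kernel/sched/features.h (sketch of the NUMA balancing feature bits) */
    #ifdef CONFIG_NUMA_BALANCING
    /* Prefer moving tasks towards their preferred NUMA node. */
    SCHED_FEAT(NUMA_FAVOUR_HIGHER, true)
    /* Resist moving tasks away from their preferred node (off by default). */
    SCHED_FEAT(NUMA_RESIST_LOWER, false)
    #endif

    /*
     * Assumed runtime toggle (with debugfs mounted at /sys/kernel/debug):
     *
     *   echo NUMA_RESIST_LOWER    > /sys/kernel/debug/sched_features   # enable
     *   echo NO_NUMA_RESIST_LOWER > /sys/kernel/debug/sched_features   # disable
     */

If that is the right way to experiment with it, I can run the workload
with NUMA_RESIST_LOWER enabled and compare.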