Message-ID: <554A27E8.1010508@redhat.com>
Date: Wed, 06 May 2015 10:40:40 -0400
From: Rik van Riel
To: dedekind1@gmail.com
CC: linux-kernel@vger.kernel.org, mgorman@suse.de, Peter Zijlstra, Jirka Hladky
Subject: Re: autoNUMA web workload regression
References: <1430908530.7444.145.camel@sauron.fi.intel.com>
In-Reply-To: <1430908530.7444.145.camel@sauron.fi.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

CC'ing Peter & Mel. Leaving Artem's email intact
so they can read it :)

On 05/06/2015 06:35 AM, Artem Bityutskiy wrote:
> Hi Rik,
>
> we observe a tremendous regression between kernel versions 3.16 and 3.17
> (and up), and I've bisected it to this commit:
>
> a43455a sched/numa: Ensure task_numa_migrate() checks the preferred node
>
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=a43455a1d572daf7b730fe12eb747d1e17411365
>
> We run a Web server (nginx) on a 2-socket Haswell server and we emulate
> an e-Commerce Web-site. Clients send requests to the server and measure
> the response time. Clients load the server quite heavily - CPU
> utilization is more than 90% as measured with turbostat. We use Fedora
> 20.
>
> If I take 3.17 and revert this patch, I observe 600% or more average
> response time improvement compared to vanilla 3.17.
>
> If I take 4.1-rc1 and revert this patch, I observe 300% or more average
> response time improvement compared to vanilla 3.17.
>
> I asked Fengguang Wu to run LKP workloads on multiple 4 and 8 socket
> machines for v4.1-rc1 with and without this patch, and there seems to be
> no difference - all the micro-benchmarks performed similarly and the
> differences were within the error range.
>
> IOW, it looks like this patch has a bad effect on Web server QoS (slower
> response time). What do you think?

The changeset you found fixes the issue where both
nodes A and B are fully loaded (or overloaded), and
tasks are located on the wrong node.

Without that changeset, workloads in that situation
will never converge, because we do not consider the
best node for a task.

I have seen that changeset cause another regression
in the past, but on a much less heavily loaded system,
with around 20-50% CPU utilization and a single-process
multi-threaded workload. There, it causes the workload
to not be properly spread out across the system.

I wonder if we should try a changeset where the NUMA
balancing code never considers moving a task from a
less busy to a busier node, regardless of whether the
destination node is the preferred node or some other
node?

I can cook up a quick patch to test that out.

Any opinions, Peter or Mel?
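
The rule floated above (never consider a move from a less busy node to a
busier one, preferred node or not) can be sketched as a standalone C toy.
This is not a patch against the scheduler and not taken from it: the
struct node_stats type, its load and capacity fields, and both function
names are hypothetical, and "busier" is assumed here to mean a higher
load-to-capacity ratio. The sketch also ignores the load the task itself
would carry to the destination.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-node summary; not a kernel structure. */
struct node_stats {
	unsigned long load;     /* runnable load currently on the node */
	unsigned long capacity; /* compute capacity of the node */
};

/*
 * Return true if dst is busier than src relative to capacity.
 * The load/capacity ratios are compared by cross-multiplication,
 * which avoids integer division (values are assumed small enough
 * not to overflow).
 */
static bool dst_is_busier(const struct node_stats *src,
			  const struct node_stats *dst)
{
	assert(src->capacity > 0 && dst->capacity > 0);
	return dst->load * src->capacity > src->load * dst->capacity;
}

/*
 * Assumed form of the proposed rule: a move is only a candidate when
 * the destination is not busier than the source. Whether dst is the
 * task's preferred node is deliberately ignored.
 */
static bool may_consider_move(const struct node_stats *src,
			      const struct node_stats *dst,
			      bool dst_is_preferred)
{
	(void)dst_is_preferred;
	return !dst_is_busier(src, dst);
}
```

With this check, a task on a node at 20% load would not be considered for
a move to a node at 90% load even if the latter is its preferred node,
while two equally loaded nodes still allow moves in either direction,
which is the fully loaded case the original changeset was meant to fix.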