Date: Wed, 5 Apr 2017 20:52:15 +0530
From: Srikar Dronamraju
To: Michal Hocko
Cc: Ingo Molnar, Peter Zijlstra, LKML, Mel Gorman, Rik van Riel
Subject: Re: [PATCH] sched: Fix numabalancing to work with isolated cpus
Reply-To: Srikar Dronamraju
Message-Id: <20170405152215.GA6019@linux.vnet.ibm.com>
In-Reply-To: <20170405125743.GB7258@dhcp22.suse.cz>
References: <1491326848-5748-1-git-send-email-srikar@linux.vnet.ibm.com> <20170405125743.GB7258@dhcp22.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

* Michal Hocko [2017-04-05 14:57:43]:

> On Tue 04-04-17 22:57:28, Srikar Dronamraju wrote:
> [...]
> > For example:
> > perf bench numa mem --no-data_rand_walk -p 4 -t $THREADS -G 0 -P 3072 -T 0 -l 50 -c -s 1000
> > would call sched_setaffinity that resets the cpus_allowed mask.
> >
> > Cpus_allowed_list: 0-55,57-63,65-71,73-79,81-87,89-175
> > Cpus_allowed_list: 0,8,16,24,32,40,48,56,64,72,80,88,96,104,112,120,128,136,144,152,160,168
> > Cpus_allowed_list: 0,8,16,24,32,40,48,56,64,72,80,88,96,104,112,120,128,136,144,152,160,168
> > Cpus_allowed_list: 0,8,16,24,32,40,48,56,64,72,80,88,96,104,112,120,128,136,144,152,160,168
> > Cpus_allowed_list: 0,8,16,24,32,40,48,56,64,72,80,88,96,104,112,120,128,136,144,152,160,168
> >
> > The isolated cpus are part of the cpus allowed list. In the above case,
> > numabalancing ends up scheduling some of these tasks on isolated cpus.
>
> Why is this bad? If the task is allowed to run on isolated CPUs then why
> shouldn't its numa balancing be allowed the same? The changelog
> describes what but doesn't explain _why_ this change is needed/useful.

1. kernel-parameters.txt describes isolcpus as "Isolate CPUs from the
general scheduler." So the expectation that numabalancing can schedule
tasks on them is wrong.

2. If numabalancing were disabled, the task would never run on the
isolated CPUs.

3. With the faulty behaviour, it was observed that tasks scheduled on
the isolated cpus may end up taking more time, because they never get
a chance to move back to a node that has local memory.

4. The isolated cpus may be idle at that point, but actual work may be
scheduled on the isolcpus later (after numabalancing has already placed
tasks on them). Since the scheduler doesn't do any load balancing on
isolcpus, the isolcpus stay overloaded even if the rest of the system
is completely idle.
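To make the overlap concrete, here is a small userspace sketch in C; it is not the patch itself. It assumes, based on the gaps in the first Cpus_allowed_list quoted above, that CPUs 56, 64, 72, 80 and 88 are the ones isolated with isolcpus on this machine. The type cpuset_sketch and the helpers parse_cpulist(), count_overlap() and numa_candidates() are hypothetical names for illustration. The sketch parses the Cpus_allowed_list format, counts how many isolated CPUs a task's mask contains, and computes allowed & ~isolated, the set a NUMA-balancing destination search would be limited to if it honoured isolcpus. In-kernel code would use the cpumask helpers (for example cpumask_andnot()) rather than a hand-rolled bitmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical fixed-size CPU bitmap, large enough for the 176 CPUs above. */
#define SKETCH_NR_CPUS 192
#define SKETCH_WORDS (SKETCH_NR_CPUS / 64)

typedef struct {
	uint64_t w[SKETCH_WORDS];
} cpuset_sketch;

static void set_cpu(cpuset_sketch *m, unsigned int cpu)
{
	m->w[cpu / 64] |= UINT64_C(1) << (cpu % 64);
}

static bool test_cpu(const cpuset_sketch *m, unsigned int cpu)
{
	return (m->w[cpu / 64] >> (cpu % 64)) & 1;
}

/*
 * Parse the "Cpus_allowed_list" format from /proc/<pid>/status,
 * e.g. "0-55,57-63". Returns 0 on success, -1 on malformed input
 * or a CPU number outside the bitmap.
 */
static int parse_cpulist(const char *s, cpuset_sketch *out)
{
	*out = (cpuset_sketch){ { 0 } };
	while (*s) {
		char *end;
		unsigned long lo = strtoul(s, &end, 10);
		unsigned long hi;

		if (end == s)
			return -1;
		hi = lo;
		s = end;
		if (*s == '-') {	/* range "lo-hi" */
			const char *start = s + 1;

			hi = strtoul(start, &end, 10);
			if (end == start)
				return -1;
			s = end;
		}
		if (lo > hi || hi >= SKETCH_NR_CPUS)
			return -1;
		for (unsigned long c = lo; c <= hi; c++)
			set_cpu(out, (unsigned int)c);
		if (*s == ',')
			s++;
		else if (*s != '\0')
			return -1;
	}
	return 0;
}

/* Number of CPUs that are both in the task's allowed mask and isolated. */
static unsigned int count_overlap(const cpuset_sketch *allowed,
				  const cpuset_sketch *isolated)
{
	unsigned int n = 0;

	for (unsigned int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		if (test_cpu(allowed, cpu) && test_cpu(isolated, cpu))
			n++;
	return n;
}

/* Hypothetical filter: allowed CPUs minus isolated CPUs (allowed & ~isolated). */
static cpuset_sketch numa_candidates(const cpuset_sketch *allowed,
				     const cpuset_sketch *isolated)
{
	cpuset_sketch out;

	for (unsigned int i = 0; i < SKETCH_WORDS; i++)
		out.w[i] = allowed->w[i] & ~isolated->w[i];
	return out;
}
```

With the lists quoted above, the first mask contains none of the assumed isolated CPUs, while each mask set via sched_setaffinity() contains all five of them; filtering with allowed & ~isolated keeps CPUs such as 0 and 96 as candidates and drops 56 through 88.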