From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [RFC] cpuset: Enable changing of top_cpuset's mems_allowed nodemask
To: Mel Gorman, Anshuman Khandual
References: <20170130203003.dm2ydoi3e6cbbwcj@suse.de> <20170131142237.27097-1-khandual@linux.vnet.ibm.com> <20170131160029.ubt6fvw6oh2fgxpd@suse.de>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, mhocko@suse.com, vbabka@suse.cz, minchan@kernel.org, aneesh.kumar@linux.vnet.ibm.com, bsingharora@gmail.com, srikar@linux.vnet.ibm.com, haren@linux.vnet.ibm.com, jglisse@redhat.com, dave.hansen@intel.com, dan.j.williams@intel.com
From: Anshuman Khandual
Date: Wed, 1 Feb 2017 13:01:24 +0530
In-Reply-To: <20170131160029.ubt6fvw6oh2fgxpd@suse.de>
X-Mailing-List: linux-kernel@vger.kernel.org

On 01/31/2017 09:30 PM, Mel Gorman wrote:
> On Tue, Jan 31, 2017 at
07:52:37PM +0530, Anshuman Khandual wrote:
>> At present, top_cpuset.mems_allowed is same as node_states[N_MEMORY] and it
>> cannot be changed at the runtime. Maximum possible node_states[N_MEMORY]
>> also gets reflected in top_cpuset.effective_mems interface. It prevents some
>> one from removing or restricting memory placement which will be applicable
>> system wide on a given memory node through cpuset mechanism which might be
>> limiting. This solves the problem by enabling update_nodemask() function to
>> accept changes to top_cpuset.mems_allowed as well. Once changed, it also
>> updates the value of top_cpuset.effective_mems. Updates all it's task's
>> mems_allowed nodemask as well. It calls cpuset_inc() to make sure cpuset
>> is accounted for in the buddy allocator through cpusets_enabled() check.
>>
>
> What's the point of allowing the root cpuset to be restricted?

After a system has been running for an extended period, if we need to
run HW diagnostics and take a dump (both run out of band) for debugging,
we currently have to stop further allocations to the node. Hot-unplugging
the memory node from the kernel achieves this. But the same can be done
by enabling top_cpuset.memory_migrate and then removing the node from
the top_cpuset.mems_allowed nodemask, which restricts all new allocations
and forces all the existing allocations out of the target node.

More importantly, it extends the cpuset memory restriction feature to
its logical completion without adding any regressions for the existing
use cases. Then why not do this? Does it add any overhead?

In the future this feature can also be used to isolate a memory node
from all general allocations while providing an alternate method for
explicit allocation into it (still working on this part, though I have
a hack right now). The current RFC series proposes one such possible
use case through the top_cpuset.mems_allowed nodemask.
But in this case it is being restricted during boot as well as after
hotplug of a memory-only NUMA node.

If you think this currently does not have a use case to stand on its
own, then I will carry it along with this patch series as part of the
proposed cpuset-based isolation solution (with explicit allocation
access to the isolated node) as described just above.

- Anshuman
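
For illustration only, the nodemask arithmetic discussed above can be
modelled in user space. This is a toy sketch, not the kernel code: it
treats a nodemask as a plain 64-bit bitmap, and the names toy_cpuset,
toy_update_nodemask() and toy_node_clear() are hypothetical. The rule
that an empty mask or a mask containing nodes without memory is rejected
is an assumption made for the example; the real update_nodemask() works
on nodemask_t and also updates tasks and handles migration.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: one bit per NUMA node (bit n set => node n is allowed). */
typedef uint64_t toy_nodemask_t;

struct toy_cpuset {
	toy_nodemask_t mems_allowed;   /* mask configured by the user */
	toy_nodemask_t effective_mems; /* mask allocations may actually use */
};

/*
 * Hypothetical stand-in for update_nodemask() on the top cpuset.
 * Assumption: the new mask must be a non-empty subset of the nodes
 * that have memory (n_memory). On success both fields are updated.
 * Returns 0 on success, -1 if the request is rejected.
 */
int toy_update_nodemask(struct toy_cpuset *cs, toy_nodemask_t new_mems,
			toy_nodemask_t n_memory)
{
	if (new_mems == 0 || (new_mems & ~n_memory) != 0)
		return -1;

	cs->mems_allowed = new_mems;
	cs->effective_mems = new_mems & n_memory;
	return 0;
}

/* Clear one node from a mask, e.g. before running diagnostics on it. */
toy_nodemask_t toy_node_clear(toy_nodemask_t mask, unsigned int node)
{
	assert(node < 64);
	return mask & ~((toy_nodemask_t)1 << node);
}
```

With nodes 0-3 holding memory (mask 0xF), clearing node 2 and passing
the result to toy_update_nodemask() leaves both mems_allowed and
effective_mems at 0xB in this model.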