Date: Wed, 4 Mar 2009 10:23:16 -0500 (EST)
From: Christoph Lameter
To: David Rientjes
cc: Paul Menage, Pekka Enberg, Andrew Morton, Randy Dunlap,
    linux-kernel@vger.kernel.org
Subject: Re: [patch 2/2] slub: enforce cpuset restrictions for cpu slabs

On Tue, 3 Mar 2009, David Rientjes wrote:

> > Presumably in most cases all cpusets would have slab_hardwall set to
> > the same value.
>
> Christoph, would a `slab_hardwall' cpuset setting address your concerns?

That would make the per-object memory policies in SLUB configurable? If you
can do that without regression, and it is clean, then it would be
acceptable.

Again, if you want per-object memory policies in SLUB then they need to be
added consistently. You would f.e. also have to check for an MPOL_BIND
condition wherever you check for cpuset nodes, and make sure that
__slab_alloc goes round-robin on MPOL_INTERLEAVE, and so on. You end up
with a nightmare implementation similar to the one in SLAB. And as far as I
know that one still has its issues, since f.e. the MPOL_INTERLEAVE handling
for objects interferes with the MPOL_INTERLEAVE node rotation for pages,
which can result in strange sequences of page placement on nodes because of
intermediate allocations from slabs.

Memory policies and cpusets were initially designed to deal with page
allocations, not with allocations of small objects. If you read the numactl
manpage it becomes quite clear that we are dealing with page-sized chunks
(look at the --touch or --strict options, for example). The intent is to
spread memory in page-sized chunks over the NUMA nodes. That is satisfied
if the page allocations of the slab allocator are controllable by memory
policies and cpusets. And yes, the page allocations may only roughly
correlate with the tasks that are consuming objects from the shared pools.
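
To make the shape of such an opt-in check concrete, a minimal sketch against
a 2.6.29-era mm/slub.c could look like the code below. This is not the patch
under discussion, only an illustration of where the check would hook in:
cpuset_slab_hardwall_enabled() is a hypothetical accessor for the proposed
per-cpuset slab_hardwall flag, while cpuset_node_allowed_hardwall(),
node_match(), slab_alloc() and struct kmem_cache_cpu are the kernel's
existing helpers and structures.

#include <linux/slab.h>		/* struct kmem_cache_cpu (via slub_def.h) */
#include <linux/cpuset.h>	/* cpuset_node_allowed_hardwall() */

/*
 * Sketch only: return 1 if objects may be handed out from the current cpu
 * slab under the task's cpuset, 0 if the allocation must take the slow path.
 */
static inline int cpu_slab_allowed(struct kmem_cache_cpu *c, gfp_t gfpflags)
{
	/* Default: no per-object policy enforcement, no extra cost. */
	if (!cpuset_slab_hardwall_enabled())	/* hypothetical flag accessor */
		return 1;
	/* -1 means the node of the cpu slab is not tracked (debugging). */
	if (c->node < 0)
		return 1;
	/* Hardwalled cpuset: refuse objects from a page on a disallowed node. */
	return cpuset_node_allowed_hardwall(c->node, gfpflags);
}

/*
 * The existing fastpath test in slab_alloc(),
 *
 *	if (unlikely(!c->freelist || !node_match(c, node)))
 *		object = __slab_alloc(s, gfpflags, node, addr, c);
 *
 * would then additionally take the slow path when the cpu slab sits on a
 * node that a hardwalled cpuset does not allow:
 *
 *	if (unlikely(!c->freelist || !node_match(c, node) ||
 *		     !cpu_slab_allowed(c, gfpflags)))
 *		object = __slab_alloc(s, gfpflags, node, addr, c);
 */

Gating everything behind the per-cpuset flag is what would preserve the "no
regression" property asked for above: tasks in cpusets that never set
slab_hardwall keep the current fastpath behaviour and pay only the cost of
reading the flag.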