Date: Wed, 19 Jan 2005 13:17:30 +0000 (GMT)
From: Mel Gorman
To: "Tolentino, Matthew E"
Cc: Linux Memory Management List, Linux Kernel Mailing List
Subject: RE: [RFC] Avoiding fragmentation through different allocator

On Mon, 17 Jan 2005, Tolentino, Matthew E wrote:

> > I considered adding a new zone but I felt it would be a massive job
> > for what I considered to be a simple problem. I think my approach is
> > nice and isolated within the allocator itself and will be less
> > likely to affect other code.
>
> Just for clarity, I prefer this approach over adding zones,
> hence my pursuit of something akin to it.
>

Ok, cool.

> > One possibility is that we could say that the UserRclm and KernRclm
> > pools are always eligible for hotplug and have hotplug banks only
> > satisfy those allocations, pushing KernNonRclm allocations to fixed
> > banks. How is it currently known if a bank of memory is hotplug? Is
> > there a node for each hotplug bank? If yes, we could flag those
> > nodes to only satisfy UserRclm and KernRclm allocations and force
> > fallback to other nodes.
>
> The hardware/firmware has to tell the kernel in some way. In
> my case it is ACPI that delineates between regions that may be
> removed. No, there isn't a node for each bank of hot-plug
> memory. The reason I was pursuing this was to be able to
> avoid coarse granularity distinctions like that.

As there is not a node for each hotplug bank, this has to happen at the
zone level. Architectures have the option of defining their own
memmap_init(), although only ia64 takes advantage of it. If we wanted
to identify hotplug pages in an architecture-independent manner, we
would implement memmap_init() for hotplug and fill
zone->free_area_usemap accordingly. Currently the bitmap in there has
two bits for each 2^MAX_ORDER block, which look like:

00 = Kernel non-reclaimable
10 = Kernel reclaimable
01 = User reclaimable

So, we could say 11 is for hotplug memory. Alternatively, if it is
possible to have a system that consists entirely of hotplug memory, we
could add a third bit. As that is one extra bit per 2^MAX_ORDER pages
in the system, it would not be a big chunk of memory. (There is a rough
sketch of the encoding below.)

The question is what to do with that information then. We can't just
say that User pages go to hotplug regions, as that will introduce two
problems: one of balancing, and a second of what happens when an
unreclaimable page is in a bank we want to remove without page
migration in place.

> > The danger is that allocations would fail because non-hotplug banks
> > were already full and pageout would not happen because the
> > watermarks were satisfied.
>
> Which implies a potential need for balancing between user/kernel
> lists, no?
>

If we're not careful, yes, although I have a gut feeling that says we
should not need to be balancing anything for hotplug.
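To make that concrete, here is a rough userspace sketch of how the
two-bit fields could be packed and read. None of these identifiers are
from the patch; it is only meant to illustrate the encoding:

#include <stdio.h>

#define BITS_PER_LONG	(sizeof(unsigned long) * 8)
#define BITS_PER_AREA	2	/* two bits per 2^MAX_ORDER block */

#define KERN_NORCLM	0x0UL	/* 00 = kernel non-reclaimable */
#define KERN_RCLM	0x2UL	/* 10 = kernel reclaimable */
#define USER_RCLM	0x1UL	/* 01 = user reclaimable */
#define HOTPLUG		0x3UL	/* 11 = proposed hotplug marker */

static unsigned long usemap[8];	/* toy map: 8 * BITS_PER_LONG / 2 blocks */

static void set_block_type(unsigned long block, unsigned long type)
{
	unsigned long bit = block * BITS_PER_AREA;
	unsigned long shift = bit % BITS_PER_LONG;

	usemap[bit / BITS_PER_LONG] &= ~(3UL << shift);	/* clear old field */
	usemap[bit / BITS_PER_LONG] |= type << shift;
}

static unsigned long get_block_type(unsigned long block)
{
	unsigned long bit = block * BITS_PER_AREA;

	return (usemap[bit / BITS_PER_LONG] >> (bit % BITS_PER_LONG)) & 3UL;
}

int main(void)
{
	set_block_type(0, HOTPLUG);
	set_block_type(1, USER_RCLM);
	printf("block 0 = %lu, block 1 = %lu\n",
	       get_block_type(0), get_block_type(1));
	return 0;
}

In the real thing the map would hang off the zone and be sized from the
zone's page count, but the packing logic would be the same.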
> > If you have already posted a version of the patch (you have
> > feedback so I guess it's there somewhere), can you send me a link
> > to the thread where you introduced your approach? It's possible
> > that we just need to merge the ideas.
>
> No, I hadn't posted it yet due to chasing a bug. However, perhaps
> now I'll instead focus on adding the necessary hotplug support
> into your patch, hence merging the hotplug requirements/ideas?
>

I've no problem with that. It's just a case of how we use the
information exactly.

> > It's because I consider all 2^MAX_ORDER pages in a zone to be
> > equal, whereas I'm guessing you don't. Until they are split, there
> > is nothing special about them. It is only when it is split that I
> > want it reserved for a purpose.
> >
> > However, if we knew there were blocks that were hot-pluggable, we
> > could just have a hotplug-global and a non-hotplug-global pool. If
> > it's a UserRclm or KernRclm allocation, split from hotplug-global,
> > otherwise use non-hotplug-global. It'd increase the memory
> > requirements of the patch a bit though.
>
> Exactly. Perhaps this could just be isolated via the
> CONFIG_MEMORY_HOTPLUG build option, thus not increasing the memory
> requirements in the common case...
>

I think so. The changes would be removing pages from the global pool
with macros rather than directly, and having hotplug and non-hotplug
versions of those macros. We just need to think more about what happens
when a kernel allocation comes along, fixed memory is depleted, but
there is plenty of hotplug memory left (sketched below).
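That fallback question could be boiled down to something like the
following. These names are entirely hypothetical (not patch code); the
point is just to make the trade-off explicit:

#include <stdio.h>

enum pool { POOL_FIXED, POOL_HOTPLUG, POOL_NONE };

/* Which pool should an allocation be satisfied from? */
static enum pool pick_pool(int kern_nonrclm, unsigned long fixed_free,
			   unsigned long hotplug_free, int allow_fallback)
{
	/* UserRclm/KernRclm allocations prefer hotplug banks */
	if (!kern_nonrclm)
		return hotplug_free ? POOL_HOTPLUG : POOL_FIXED;

	/* KernNonRclm allocations prefer fixed banks */
	if (fixed_free)
		return POOL_FIXED;

	/*
	 * Fixed memory is depleted.  Either fail the allocation, which
	 * keeps hotplug banks removable, or place an unreclaimable page
	 * in a hotplug bank and pin it until page migration exists.
	 */
	return (allow_fallback && hotplug_free) ? POOL_HOTPLUG : POOL_NONE;
}

int main(void)
{
	/* KernNonRclm allocation, fixed banks empty, hotplug banks not */
	printf("strict: %d\n", pick_pool(1, 0, 1024, 0));	/* POOL_NONE */
	printf("loose:  %d\n", pick_pool(1, 0, 1024, 1));	/* POOL_HOTPLUG */
	return 0;
}

Whether that fallback should ever be allowed before page migration is
in place is exactly the open question.

-- 
Mel Gorman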