Subject: RE: [RFC] Avoiding fragmentation through different allocator
Date: Wed, 12 Jan 2005 14:45:44 -0800
From: "Tolentino, Matthew E"
To: "Mel Gorman", "Linux Memory Management List"
Cc: "Linux Kernel Mailing List"

Hi Mel!

> Instead of having one global MAX_ORDER-sized array of free lists, there
> are three, one for each type of allocation. Finally, there is a list of
> pages of size 2^MAX_ORDER which is a global pool of the largest pages
> the kernel deals with.

I've got a patch that I've been testing recently for memory hotplug that
does nearly the same thing - break up the management of page allocations
based on type - after having had a number of conversations with Dave
Hansen on this topic. I've also prototyped this as an alternative to
adding duplicate zones for delineating between memory that may be removed
and memory that is not likely to ever be removable. I've only tested it
in the context of memory hotplug, but it does greatly simplify memory
removal within individual zones. Your distinction between areas is pretty
cool considering I've only distinguished at the coarser granularity of
user vs. kernel to date. It would be interesting to throw
KernelNonReclaimable into the mix as well, although I haven't gotten
there yet... ;-)

> Once a 2^MAX_ORDER block of pages is split for a type of allocation, it
> is added to the free-lists for that type, in effect reserving it.
> Hence, over time, pages of the related types can be clustered together.
> This means that if we wanted 2^MAX_ORDER number of pages, we could
> linearly scan a block of pages allocated for UserReclaimable and page
> each of them out.

Interesting. I took a slightly different approach due to some known
delineations between areas that are defined to be non-removable vs. areas
that may be removed at some point, so I'm only managing two distinct
free_area lists currently. I'm curious about the motivation for having a
global MAX_ORDER-sized list that is allocation-agnostic initially... is
it so that the pages can evolve according to system demands (assuming
MAX_ORDER-sized chunks eventually become available again)? It also looks
like you left the per_cpu_pages as-is. Did you consider separating those
as well to reflect kernel vs. user pools?
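For reference, here's roughly the shape of the two-list split I mentioned
above. This is only an illustration with stand-in types and made-up
names, not code lifted from my actual patch:

  /* Stand-ins purely for illustration; the real code uses the kernel's
   * struct free_area and MAX_ORDER. */
  #define MAX_ORDER 11

  struct free_area {
          unsigned long nr_free;        /* real struct also has a free_list */
  };

  enum removal_type {
          ALLOC_KERNEL_NONREMOVABLE,    /* never a removal candidate */
          ALLOC_USER_REMOVABLE,         /* may be hot-removed later */
          NR_REMOVAL_TYPES
  };

  /* in struct zone, replacing the single free_area[MAX_ORDER] array */
  struct free_area free_area[NR_REMOVAL_TYPES][MAX_ORDER];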
>- struct free_area free_area[MAX_ORDER];
>+ struct free_area free_area_lists[ALLOC_TYPES][MAX_ORDER];
>+ struct free_area free_area_global;
>+
>+ /*
>+ * This map tracks what each 2^MAX_ORDER sized block has been used for.
>+ * When a page is freed, its index within this bitmap is calculated
>+ * using (address >> MAX_ORDER) * 2. This means that pages will always
>+ * be freed into the correct list in free_area_lists
>+ */
>+ unsigned long *free_area_usemap;

So the current user/kernel-reclaimable/kernel-nonreclaimable
determination is based on this bitmap. Couldn't this be managed in the
individual struct pages instead, kind of like the buddy bitmap patches?

I'm trying to track down one last bug that shows up when I remove memory
(via nonlinear sections) that has been dedicated to user allocations.
After that, perhaps I'll post my patch as well, although it is *very*
similar. It does, however, demonstrate the utility of this approach for
memory hotplug - specifically memory removal - without the complexity of
adding more zones.

matt
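P.S. To make sure I'm reading the usemap indexing right, here's a small
standalone sketch of the two-bits-per-2^MAX_ORDER-block scheme as I
understand it. The helper names and the pfn-based indexing are my own
guesses for illustration, not taken from your patch:

  #include <stdio.h>

  #define MAX_ORDER     11
  #define BITS_PER_LONG (8 * sizeof(unsigned long))

  /* three allocation types fit in two bits per MAX_ORDER-sized block */
  enum alloc_type {
          ALLOC_KERNELNONRECLAIM,
          ALLOC_KERNELRECLAIM,
          ALLOC_USERRECLAIM
  };

  static unsigned long usemap[1024];      /* plenty for this example */

  static void set_block_type(unsigned long pfn, enum alloc_type type)
  {
          unsigned long bit = (pfn >> MAX_ORDER) * 2;   /* 2 bits per block */

          usemap[bit / BITS_PER_LONG] &= ~(3UL << (bit % BITS_PER_LONG));
          usemap[bit / BITS_PER_LONG] |=
                  (unsigned long)type << (bit % BITS_PER_LONG);
  }

  static enum alloc_type get_block_type(unsigned long pfn)
  {
          unsigned long bit = (pfn >> MAX_ORDER) * 2;

          return (enum alloc_type)
                  ((usemap[bit / BITS_PER_LONG] >> (bit % BITS_PER_LONG)) & 3);
  }

  int main(void)
  {
          set_block_type(0x12345, ALLOC_USERRECLAIM);
          printf("type of block holding pfn 0x12345: %d\n",
                 get_block_type(0x12345));
          return 0;
  }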