From: Dan Williams <dan.j.williams@intel.com>
To: akpm@linux-foundation.org
Cc: Rik van Riel <riel@redhat.com>, Dave Hansen <dave.hansen@linux.intel.com>, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Mel Gorman <mgorman@suse.de>, Mark <markk@clara.co.uk>, Joonsoo Kim <iamjoonsoo.kim@lge.com>, Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Subject: [RFC PATCH] mm: CONFIG_NR_ZONES_EXTENDED
Date: Wed, 27 Jan 2016 22:19:14 -0800
Message-ID: <20160128061914.32541.97351.stgit@dwillia2-desk3.amr.corp.intel.com>

ZONE_DEVICE (merged in 4.3) and ZONE_CMA (proposed) are examples of new
mm zones that are bumping up against the current maximum limit of 4
zones, i.e. 2 bits in page->flags. When adding a zone this equation
still needs to be satisfied:

    SECTIONS_WIDTH + ZONES_WIDTH + NODES_SHIFT + LAST_CPUPID_SHIFT
        <= BITS_PER_LONG - NR_PAGEFLAGS

ZONE_DEVICE currently tries to satisfy this equation by requiring that
ZONE_DMA be disabled, but this is untenable given generic kernels want
to support ZONE_DEVICE and ZONE_DMA simultaneously. ZONE_CMA would like
to increase the amount of memory covered per section, but that limits
the minimum granularity at which consecutive memory ranges can be added
via devm_memremap_pages().

The trade-off of what is acceptable to sacrifice depends heavily on the
platform. For example, ZONE_CMA is targeted for 32-bit platforms where
page->flags is constrained, but those platforms likely do not care about
the minimum granularity of memory hotplug. A big iron machine with 1024
NUMA nodes can likely sacrifice ZONE_DMA where a general purpose
distribution kernel cannot.

CONFIG_NR_ZONES_EXTENDED is a configuration symbol that gets selected
when the number of configured zones exceeds 4. It documents the
configuration symbols and definitions that get modified when ZONES_WIDTH
is greater than 2.

For now, it steals a bit from NODES_SHIFT. Later on it can be used to
document the definitions that get modified when a 32-bit configuration
wants more zone bits.

Note that GFP_ZONE_TABLE poses an interesting constraint since
include/linux/gfp.h gets included by the 32-bit portion of a 64-bit
build. We need to be careful to only build the table for zones that
have a corresponding gfp_t flag. GFP_ZONES_SHIFT is introduced for this
purpose. This patch does not attempt to solve the problem of adding a
new zone that also has a corresponding GFP_ flag.

Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=110931
Fixes: 033fbae988fc ("mm: ZONE_DEVICE for "device memory"")
Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Reported-by: Mark <markk@clara.co.uk>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/Kconfig                  |    6 ++++--
 include/linux/gfp.h               |   33 ++++++++++++++++++++-------------
 include/linux/page-flags-layout.h |    2 ++
 mm/Kconfig                        |    7 +++++--
 4 files changed, 31 insertions(+), 17 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 330e738ccfc1..9dfc52eb3976 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1409,8 +1409,10 @@ config NUMA_EMU
 
 config NODES_SHIFT
 	int "Maximum NUMA Nodes (as a power of 2)" if !MAXSMP
-	range 1 10
-	default "10" if MAXSMP
+	range 1 10 if !NR_ZONES_EXTENDED
+	range 1 9 if NR_ZONES_EXTENDED
+	default "10" if MAXSMP && !NR_ZONES_EXTENDED
+	default "9" if MAXSMP && NR_ZONES_EXTENDED
 	default "6" if X86_64
 	default "3"
 	depends on NEED_MULTIPLE_NODES
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 28ad5f6494b0..5979c2c80140 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -329,22 +329,29 @@ static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags)
  *       0xe    => BAD (MOVABLE+DMA32+HIGHMEM)
  *       0xf    => BAD (MOVABLE+DMA32+HIGHMEM+DMA)
  *
- * ZONES_SHIFT must be <= 2 on 32 bit platforms.
+ * GFP_ZONES_SHIFT must be <= 2 on 32 bit platforms.
  */
 
-#if 16 * ZONES_SHIFT > BITS_PER_LONG
-#error ZONES_SHIFT too large to create GFP_ZONE_TABLE integer
+#if defined(CONFIG_ZONE_DEVICE) && (MAX_NR_ZONES-1) <= 4
+/* ZONE_DEVICE is not a valid GFP zone specifier */
+#define GFP_ZONES_SHIFT 2
+#else
+#define GFP_ZONES_SHIFT ZONES_SHIFT
+#endif
+
+#if 16 * GFP_ZONES_SHIFT > BITS_PER_LONG
+#error GFP_ZONES_SHIFT too large to create GFP_ZONE_TABLE integer
 #endif
 
 #define GFP_ZONE_TABLE ( \
-	(ZONE_NORMAL << 0 * ZONES_SHIFT) \
-	| (OPT_ZONE_DMA << ___GFP_DMA * ZONES_SHIFT) \
-	| (OPT_ZONE_HIGHMEM << ___GFP_HIGHMEM * ZONES_SHIFT) \
-	| (OPT_ZONE_DMA32 << ___GFP_DMA32 * ZONES_SHIFT) \
-	| (ZONE_NORMAL << ___GFP_MOVABLE * ZONES_SHIFT) \
-	| (OPT_ZONE_DMA << (___GFP_MOVABLE | ___GFP_DMA) * ZONES_SHIFT) \
-	| (ZONE_MOVABLE << (___GFP_MOVABLE | ___GFP_HIGHMEM) * ZONES_SHIFT) \
-	| (OPT_ZONE_DMA32 << (___GFP_MOVABLE | ___GFP_DMA32) * ZONES_SHIFT) \
+	(ZONE_NORMAL << 0 * GFP_ZONES_SHIFT) \
+	| (OPT_ZONE_DMA << ___GFP_DMA * GFP_ZONES_SHIFT) \
+	| (OPT_ZONE_HIGHMEM << ___GFP_HIGHMEM * GFP_ZONES_SHIFT) \
+	| (OPT_ZONE_DMA32 << ___GFP_DMA32 * GFP_ZONES_SHIFT) \
+	| (ZONE_NORMAL << ___GFP_MOVABLE * GFP_ZONES_SHIFT) \
+	| (OPT_ZONE_DMA << (___GFP_MOVABLE | ___GFP_DMA) * GFP_ZONES_SHIFT) \
+	| (ZONE_MOVABLE << (___GFP_MOVABLE | ___GFP_HIGHMEM) * GFP_ZONES_SHIFT) \
+	| (OPT_ZONE_DMA32 << (___GFP_MOVABLE | ___GFP_DMA32) * GFP_ZONES_SHIFT) \
 )
 
 /*
@@ -369,8 +376,8 @@ static inline enum zone_type gfp_zone(gfp_t flags)
 	enum zone_type z;
 	int bit = (__force int) (flags & GFP_ZONEMASK);
 
-	z = (GFP_ZONE_TABLE >> (bit * ZONES_SHIFT)) &
-					 ((1 << ZONES_SHIFT) - 1);
+	z = (GFP_ZONE_TABLE >> (bit * GFP_ZONES_SHIFT)) &
+					 ((1 << GFP_ZONES_SHIFT) - 1);
 	VM_BUG_ON((GFP_ZONE_BAD >> bit) & 1);
 	return z;
 }
diff --git a/include/linux/page-flags-layout.h b/include/linux/page-flags-layout.h
index da523661500a..77b078c103b2 100644
--- a/include/linux/page-flags-layout.h
+++ b/include/linux/page-flags-layout.h
@@ -17,6 +17,8 @@
 #define ZONES_SHIFT 1
 #elif MAX_NR_ZONES <= 4
 #define ZONES_SHIFT 2
+#elif MAX_NR_ZONES <= 8
+#define ZONES_SHIFT 3
 #else
 #error ZONES_SHIFT -- too many zones configured adjust calculation
 #endif
diff --git a/mm/Kconfig b/mm/Kconfig
index 97a4e06b15c0..cb5377624df3 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -651,8 +651,6 @@ config IDLE_PAGE_TRACKING
 
 config ZONE_DEVICE
 	bool "Device memory (pmem, etc...) hotplug support" if EXPERT
-	default !ZONE_DMA
-	depends on !ZONE_DMA
 	depends on MEMORY_HOTPLUG
 	depends on MEMORY_HOTREMOVE
 	depends on X86_64 #arch_add_memory() comprehends device memory
@@ -666,5 +664,10 @@ config ZONE_DEVICE
 
 	  If FS_DAX is enabled, then say Y.
 
+config NR_ZONES_EXTENDED
+	bool
+	default n if !64BIT
+	default y if ZONE_DEVICE && ZONE_DMA && ZONE_DMA32
+
 config FRAME_VECTOR
 	bool
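
Editorial illustration (not part of the patch): the GFP_ZONES_SHIFT change can be sketched in user space. The block below is a minimal model that assumes a 64-bit x86 style zone layout with ZONE_DEVICE enabled and no highmem. Every demo_/DEMO_ name is hypothetical and only mirrors the shape of GFP_ZONE_TABLE and gfp_zone(); the flag values are assumptions made for the example. It packs the zone for each of the 16 flag combinations into one integer and checks whether 16 * shift bits fit in a long.

```c
#include <stdint.h>

/* Hypothetical stand-ins for a 64-bit x86 layout with ZONE_DEVICE enabled. */
enum demo_zone {
	DEMO_ZONE_DMA,		/* 0 */
	DEMO_ZONE_DMA32,	/* 1 */
	DEMO_ZONE_NORMAL,	/* 2 */
	DEMO_ZONE_MOVABLE,	/* 3 */
	DEMO_ZONE_DEVICE,	/* 4: no gfp flag ever selects this zone */
	DEMO_MAX_NR_ZONES
};

/* Zone-selector bits (values assumed for this illustration). */
#define DEMO_GFP_DMA		0x1u
#define DEMO_GFP_HIGHMEM	0x2u
#define DEMO_GFP_DMA32		0x4u
#define DEMO_GFP_MOVABLE	0x8u
#define DEMO_GFP_ZONEMASK	0xfu

/* No highmem in this layout, so HIGHMEM requests resolve to NORMAL. */
#define DEMO_OPT_ZONE_HIGHMEM	DEMO_ZONE_NORMAL

/* Pack one zone number per flag combination, 'shift' bits per entry. */
uint64_t demo_zone_table(unsigned int shift)
{
	return ((uint64_t)DEMO_ZONE_NORMAL << (0 * shift))
	     | ((uint64_t)DEMO_ZONE_DMA << (DEMO_GFP_DMA * shift))
	     | ((uint64_t)DEMO_OPT_ZONE_HIGHMEM << (DEMO_GFP_HIGHMEM * shift))
	     | ((uint64_t)DEMO_ZONE_DMA32 << (DEMO_GFP_DMA32 * shift))
	     | ((uint64_t)DEMO_ZONE_NORMAL << (DEMO_GFP_MOVABLE * shift))
	     | ((uint64_t)DEMO_ZONE_DMA
			<< ((DEMO_GFP_MOVABLE | DEMO_GFP_DMA) * shift))
	     | ((uint64_t)DEMO_ZONE_MOVABLE
			<< ((DEMO_GFP_MOVABLE | DEMO_GFP_HIGHMEM) * shift))
	     | ((uint64_t)DEMO_ZONE_DMA32
			<< ((DEMO_GFP_MOVABLE | DEMO_GFP_DMA32) * shift));
}

/* Same lookup shape as gfp_zone(): shift the table down, mask one entry. */
enum demo_zone demo_gfp_zone(unsigned int flags, unsigned int shift)
{
	unsigned int bit = flags & DEMO_GFP_ZONEMASK;

	return (enum demo_zone)((demo_zone_table(shift) >> (bit * shift)) &
				((1u << shift) - 1));
}

/* The build-time check from gfp.h, expressed as a runtime predicate. */
int demo_table_fits(unsigned int shift, unsigned int bits_per_long)
{
	return 16 * shift <= bits_per_long;
}
```

In this model the five zones need a 3-bit field in page->flags, so demo_table_fits(3, 32) is false while demo_table_fits(2, 32) is true. Because no flag combination selects DEMO_ZONE_DEVICE, every table entry is at most 3 and a 2-bit entry width loses nothing: demo_gfp_zone(DEMO_GFP_MOVABLE | DEMO_GFP_HIGHMEM, 2) still yields DEMO_ZONE_MOVABLE.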