* [PATCH v3 1/3] mm: rename alloc_pages_exact_node to __alloc_pages_node
@ 2015-07-30 16:34 Vlastimil Babka
  2015-07-30 16:34 ` [PATCH v3 2/3] mm: unify checks in alloc_pages_node() and __alloc_pages_node() Vlastimil Babka
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Vlastimil Babka @ 2015-07-30 16:34 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: linux-kernel, Mel Gorman, David Rientjes, Greg Thelen,
	Aneesh Kumar K.V, Christoph Lameter, Pekka Enberg, Joonsoo Kim,
	Naoya Horiguchi, Johannes Weiner, linux-ia64, linuxppc-dev,
	cbe-oss-dev, kvm, Vlastimil Babka, Tony Luck, Fenghua Yu,
	Arnd Bergmann, Benjamin Herrenschmidt, Paul Mackerras,
	Gleb Natapov, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Cliff Whickman, Michael Ellerman, Robin Holt

The function alloc_pages_exact_node() was introduced in 6484eb3e2a81 ("page
allocator: do not check NUMA node ID when the caller knows the node is valid")
as an optimized variant of alloc_pages_node() that doesn't fall back to current
node for nid == NUMA_NO_NODE. Unfortunately the name of the function can easily
suggest that the allocation is restricted to the given node and fails
otherwise. In truth, the node is only preferred, unless __GFP_THISNODE is
passed among the gfp flags.
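
For illustration, a sketch (not part of the patch; 'nid' stands for any valid
node id):

	struct page *page;

	/* Despite the name, this call only prefers 'nid'; if that node is
	 * short on memory, the page may come from another node. */
	page = alloc_pages_exact_node(nid, GFP_KERNEL, 0);

	/* Only with __GFP_THISNODE is the allocation truly restricted to
	 * 'nid', returning NULL instead of falling back. */
	page = alloc_pages_exact_node(nid, GFP_KERNEL | __GFP_THISNODE, 0);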

The misleading name has led to mistakes in the past, see 5265047ac301 ("mm,
thp: really limit transparent hugepage allocation to local node") and
b360edb43f8e ("mm, mempolicy: migrate_to_node should only migrate to node").

Another issue with the name is that there's a family of alloc_pages_exact*()
functions where 'exact' means exact size (instead of page order), which leads
to more confusion.

To prevent further mistakes, this patch effectively renames
alloc_pages_exact_node() to __alloc_pages_node() to better convey that it's
an optimized variant of alloc_pages_node() not intended for general usage.
Both functions get described in comments.

It has also been considered to really provide a convenience function for
allocations restricted to a node, but the major opinion seems to be that
__GFP_THISNODE already provides that functionality and we shouldn't duplicate
the API needlessly. The number of users would be small anyway.

Existing callers of alloc_pages_exact_node() are simply converted to call
__alloc_pages_node(), with two exceptions. sba_alloc_coherent() and
slob_new_pages() both open-code the check for NUMA_NO_NODE, so they are
converted to use alloc_pages_node() instead. This means they no longer perform
some VM_BUG_ON checks, and since the current check for nid in
alloc_pages_node() uses a 'nid < 0' comparison (which includes NUMA_NO_NODE),
it may hide wrong values which would previously have been exposed. Both differences
will be rectified by the next patch.

To sum up, this patch makes no functional changes, except temporarily hiding
potentially buggy callers. Restricting the checks in alloc_pages_node() is
left for the next patch which can in turn expose more existing buggy callers.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Cliff Whickman <cpw@sgi.com>
Acked-by: Robin Holt <robinmholt@gmail.com>
---
 Based on feedback from v1 and v2, the name is __alloc_pages_node() instead of
 alloc_pages_prefer_node() from v1. Two callsites were also converted to
 alloc_pages_node() instead. v2 was an RFC for linux-mm to settle on the API
 first. It tried keeping alloc_pages_exact_node() as a wrapper that adds
 __GFP_THISNODE but the consensus was to drop it.

 I'm also CC'ing the maintainers of the callsites so they can verify that the
 callsites that don't pass __GFP_THISNODE are really not intended to restrict
 allocation to the given node. I went through them myself and each looked like
 it's better off if it can successfully allocate on a fallback node rather
 than fail. I think DavidR checked them too, but it's better if maintainers
 can verify that. I'm not completely sure about all the usages in sl*b due to
 multiple layers through which gfp flags are being passed.

 Patches 2 and 3 are mm-only so I don't CC everyone.

 arch/ia64/hp/common/sba_iommu.c   |  6 +-----
 arch/ia64/kernel/uncached.c       |  2 +-
 arch/ia64/sn/pci/pci_dma.c        |  2 +-
 arch/powerpc/platforms/cell/ras.c |  2 +-
 arch/x86/kvm/vmx.c                |  2 +-
 drivers/misc/sgi-xp/xpc_uv.c      |  2 +-
 include/linux/gfp.h               | 23 +++++++++++++++--------
 kernel/profile.c                  |  8 ++++----
 mm/filemap.c                      |  2 +-
 mm/huge_memory.c                  |  6 ++----
 mm/hugetlb.c                      |  4 ++--
 mm/memory-failure.c               |  2 +-
 mm/mempolicy.c                    |  4 ++--
 mm/migrate.c                      |  4 ++--
 mm/page_alloc.c                   |  2 --
 mm/slab.c                         |  2 +-
 mm/slob.c                         | 14 ++++----------
 mm/slub.c                         |  2 +-
 18 files changed, 41 insertions(+), 48 deletions(-)

diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common/sba_iommu.c
index 344387a..a6d6190 100644
--- a/arch/ia64/hp/common/sba_iommu.c
+++ b/arch/ia64/hp/common/sba_iommu.c
@@ -1140,13 +1140,9 @@ sba_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle,
 
 #ifdef CONFIG_NUMA
 	{
-		int node = ioc->node;
 		struct page *page;
 
-		if (node == NUMA_NO_NODE)
-			node = numa_node_id();
-
-		page = alloc_pages_exact_node(node, flags, get_order(size));
+		page = alloc_pages_node(ioc->node, flags, get_order(size));
 		if (unlikely(!page))
 			return NULL;
 
diff --git a/arch/ia64/kernel/uncached.c b/arch/ia64/kernel/uncached.c
index 20e8a9b..f3976da 100644
--- a/arch/ia64/kernel/uncached.c
+++ b/arch/ia64/kernel/uncached.c
@@ -97,7 +97,7 @@ static int uncached_add_chunk(struct uncached_pool *uc_pool, int nid)
 
 	/* attempt to allocate a granule's worth of cached memory pages */
 
-	page = alloc_pages_exact_node(nid,
+	page = __alloc_pages_node(nid,
 				GFP_KERNEL | __GFP_ZERO | __GFP_THISNODE,
 				IA64_GRANULE_SHIFT-PAGE_SHIFT);
 	if (!page) {
diff --git a/arch/ia64/sn/pci/pci_dma.c b/arch/ia64/sn/pci/pci_dma.c
index d0853e8..8f59907 100644
--- a/arch/ia64/sn/pci/pci_dma.c
+++ b/arch/ia64/sn/pci/pci_dma.c
@@ -92,7 +92,7 @@ static void *sn_dma_alloc_coherent(struct device *dev, size_t size,
 	 */
 	node = pcibus_to_node(pdev->bus);
 	if (likely(node >=0)) {
-		struct page *p = alloc_pages_exact_node(node,
+		struct page *p = __alloc_pages_node(node,
 						flags, get_order(size));
 
 		if (likely(p))
diff --git a/arch/powerpc/platforms/cell/ras.c b/arch/powerpc/platforms/cell/ras.c
index e865d74..2d4f60c 100644
--- a/arch/powerpc/platforms/cell/ras.c
+++ b/arch/powerpc/platforms/cell/ras.c
@@ -123,7 +123,7 @@ static int __init cbe_ptcal_enable_on_node(int nid, int order)
 
 	area->nid = nid;
 	area->order = order;
-	area->pages = alloc_pages_exact_node(area->nid,
+	area->pages = __alloc_pages_node(area->nid,
 						GFP_KERNEL|__GFP_THISNODE,
 						area->order);
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 0dbeec1..881286b 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3150,7 +3150,7 @@ static struct vmcs *alloc_vmcs_cpu(int cpu)
 	struct page *pages;
 	struct vmcs *vmcs;
 
-	pages = alloc_pages_exact_node(node, GFP_KERNEL, vmcs_config.order);
+	pages = __alloc_pages_node(node, GFP_KERNEL, vmcs_config.order);
 	if (!pages)
 		return NULL;
 	vmcs = page_address(pages);
diff --git a/drivers/misc/sgi-xp/xpc_uv.c b/drivers/misc/sgi-xp/xpc_uv.c
index 95c8944..340b44d 100644
--- a/drivers/misc/sgi-xp/xpc_uv.c
+++ b/drivers/misc/sgi-xp/xpc_uv.c
@@ -239,7 +239,7 @@ xpc_create_gru_mq_uv(unsigned int mq_size, int cpu, char *irq_name,
 	mq->mmr_blade = uv_cpu_to_blade_id(cpu);
 
 	nid = cpu_to_node(cpu);
-	page = alloc_pages_exact_node(nid,
+	page = __alloc_pages_node(nid,
 				      GFP_KERNEL | __GFP_ZERO | __GFP_THISNODE,
 				      pg_order);
 	if (page == NULL) {
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 3bd64b1..d2c142b 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -303,20 +303,28 @@ __alloc_pages(gfp_t gfp_mask, unsigned int order,
 	return __alloc_pages_nodemask(gfp_mask, order, zonelist, NULL);
 }
 
-static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
-						unsigned int order)
+/*
+ * Allocate pages, preferring the node given as nid. The node must be valid and
+ * online. For more general interface, see alloc_pages_node().
+ */
+static inline struct page *
+__alloc_pages_node(int nid, gfp_t gfp_mask, unsigned int order)
 {
-	/* Unknown node is current node */
-	if (nid < 0)
-		nid = numa_node_id();
+	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid));
 
 	return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
 }
 
-static inline struct page *alloc_pages_exact_node(int nid, gfp_t gfp_mask,
+/*
+ * Allocate pages, preferring the node given as nid. When nid == NUMA_NO_NODE,
+ * prefer the current CPU's node.
+ */
+static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
 						unsigned int order)
 {
-	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid));
+	/* Unknown node is current node */
+	if (nid < 0)
+		nid = numa_node_id();
 
 	return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
 }
@@ -357,7 +365,6 @@ extern unsigned long get_zeroed_page(gfp_t gfp_mask);
 
 void *alloc_pages_exact(size_t size, gfp_t gfp_mask);
 void free_pages_exact(void *virt, size_t size);
-/* This is different from alloc_pages_exact_node !!! */
 void * __meminit alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask);
 
 #define __get_free_page(gfp_mask) \
diff --git a/kernel/profile.c b/kernel/profile.c
index a7bcd28..99513e1 100644
--- a/kernel/profile.c
+++ b/kernel/profile.c
@@ -339,7 +339,7 @@ static int profile_cpu_callback(struct notifier_block *info,
 		node = cpu_to_mem(cpu);
 		per_cpu(cpu_profile_flip, cpu) = 0;
 		if (!per_cpu(cpu_profile_hits, cpu)[1]) {
-			page = alloc_pages_exact_node(node,
+			page = __alloc_pages_node(node,
 					GFP_KERNEL | __GFP_ZERO,
 					0);
 			if (!page)
@@ -347,7 +347,7 @@ static int profile_cpu_callback(struct notifier_block *info,
 			per_cpu(cpu_profile_hits, cpu)[1] = page_address(page);
 		}
 		if (!per_cpu(cpu_profile_hits, cpu)[0]) {
-			page = alloc_pages_exact_node(node,
+			page = __alloc_pages_node(node,
 					GFP_KERNEL | __GFP_ZERO,
 					0);
 			if (!page)
@@ -543,14 +543,14 @@ static int create_hash_tables(void)
 		int node = cpu_to_mem(cpu);
 		struct page *page;
 
-		page = alloc_pages_exact_node(node,
+		page = __alloc_pages_node(node,
 				GFP_KERNEL | __GFP_ZERO | __GFP_THISNODE,
 				0);
 		if (!page)
 			goto out_cleanup;
 		per_cpu(cpu_profile_hits, cpu)[1]
 				= (struct profile_hit *)page_address(page);
-		page = alloc_pages_exact_node(node,
+		page = __alloc_pages_node(node,
 				GFP_KERNEL | __GFP_ZERO | __GFP_THISNODE,
 				0);
 		if (!page)
diff --git a/mm/filemap.c b/mm/filemap.c
index 204fd1c..b510a0d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -674,7 +674,7 @@ struct page *__page_cache_alloc(gfp_t gfp)
 		do {
 			cpuset_mems_cookie = read_mems_allowed_begin();
 			n = cpuset_mem_spread_node();
-			page = alloc_pages_exact_node(n, gfp, 0);
+			page = __alloc_pages_node(n, gfp, 0);
 		} while (!page && read_mems_allowed_retry(cpuset_mems_cookie));
 
 		return page;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index aa58a32..56355f2 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2469,7 +2469,7 @@ khugepaged_alloc_page(struct page **hpage, gfp_t gfp, struct mm_struct *mm,
 	 */
 	up_read(&mm->mmap_sem);
 
-	*hpage = alloc_pages_exact_node(node, gfp, HPAGE_PMD_ORDER);
+	*hpage = __alloc_pages_node(node, gfp, HPAGE_PMD_ORDER);
 	if (unlikely(!*hpage)) {
 		count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
 		*hpage = ERR_PTR(-ENOMEM);
@@ -2568,9 +2568,7 @@ static void collapse_huge_page(struct mm_struct *mm,
 
 	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
 
-	/* Only allocate from the target node */
-	gfp = alloc_hugepage_gfpmask(khugepaged_defrag(), __GFP_OTHER_NODE) |
-		__GFP_THISNODE;
+	gfp = alloc_hugepage_gfpmask(khugepaged_defrag(), 0);
 
 	/* release the mmap_sem read lock. */
 	new_page = khugepaged_alloc_page(hpage, gfp, mm, vma, address, node);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index e83fce5..4920bcb 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1331,7 +1331,7 @@ static struct page *alloc_fresh_huge_page_node(struct hstate *h, int nid)
 {
 	struct page *page;
 
-	page = alloc_pages_exact_node(nid,
+	page = __alloc_pages_node(nid,
 		htlb_alloc_mask(h)|__GFP_COMP|__GFP_THISNODE|
 						__GFP_REPEAT|__GFP_NOWARN,
 		huge_page_order(h));
@@ -1483,7 +1483,7 @@ static struct page *alloc_buddy_huge_page(struct hstate *h, int nid)
 				   __GFP_REPEAT|__GFP_NOWARN,
 				   huge_page_order(h));
 	else
-		page = alloc_pages_exact_node(nid,
+		page = __alloc_pages_node(nid,
 			htlb_alloc_mask(h)|__GFP_COMP|__GFP_THISNODE|
 			__GFP_REPEAT|__GFP_NOWARN, huge_page_order(h));
 
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 9700539..839f934 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1468,7 +1468,7 @@ static struct page *new_page(struct page *p, unsigned long private, int **x)
 		return alloc_huge_page_node(page_hstate(compound_head(p)),
 						   nid);
 	else
-		return alloc_pages_exact_node(nid, GFP_HIGHUSER_MOVABLE, 0);
+		return __alloc_pages_node(nid, GFP_HIGHUSER_MOVABLE, 0);
 }
 
 /*
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index d6f2cae..87a1779 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -942,7 +942,7 @@ static struct page *new_node_page(struct page *page, unsigned long node, int **x
 		return alloc_huge_page_node(page_hstate(compound_head(page)),
 					node);
 	else
-		return alloc_pages_exact_node(node, GFP_HIGHUSER_MOVABLE |
+		return __alloc_pages_node(node, GFP_HIGHUSER_MOVABLE |
 						    __GFP_THISNODE, 0);
 }
 
@@ -1998,7 +1998,7 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
 		nmask = policy_nodemask(gfp, pol);
 		if (!nmask || node_isset(hpage_node, *nmask)) {
 			mpol_cond_put(pol);
-			page = alloc_pages_exact_node(hpage_node,
+			page = __alloc_pages_node(hpage_node,
 						gfp | __GFP_THISNODE, order);
 			goto out;
 		}
diff --git a/mm/migrate.c b/mm/migrate.c
index d86cec0..cd673c8 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1195,7 +1195,7 @@ static struct page *new_page_node(struct page *p, unsigned long private,
 		return alloc_huge_page_node(page_hstate(compound_head(p)),
 					pm->node);
 	else
-		return alloc_pages_exact_node(pm->node,
+		return __alloc_pages_node(pm->node,
 				GFP_HIGHUSER_MOVABLE | __GFP_THISNODE, 0);
 }
 
@@ -1555,7 +1555,7 @@ static struct page *alloc_misplaced_dst_page(struct page *page,
 	int nid = (int) data;
 	struct page *newpage;
 
-	newpage = alloc_pages_exact_node(nid,
+	newpage = __alloc_pages_node(nid,
 					 (GFP_HIGHUSER_MOVABLE |
 					  __GFP_THISNODE | __GFP_NOMEMALLOC |
 					  __GFP_NORETRY | __GFP_NOWARN) &
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4b220cb..88d2ee9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3511,8 +3511,6 @@ EXPORT_SYMBOL(alloc_pages_exact);
  *
  * Like alloc_pages_exact(), but try to allocate on node nid first before falling
  * back.
- * Note this is not alloc_pages_exact_node() which allocates on a specific node,
- * but is not exact.
  */
 void * __meminit alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask)
 {
diff --git a/mm/slab.c b/mm/slab.c
index 4c5910f..1783eda 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1595,7 +1595,7 @@ static struct page *kmem_getpages(struct kmem_cache *cachep, gfp_t flags,
 	if (memcg_charge_slab(cachep, flags, cachep->gfporder))
 		return NULL;
 
-	page = alloc_pages_exact_node(nodeid, flags | __GFP_NOTRACK, cachep->gfporder);
+	page = __alloc_pages_node(nodeid, flags | __GFP_NOTRACK, cachep->gfporder);
 	if (!page) {
 		memcg_uncharge_slab(cachep, cachep->gfporder);
 		slab_out_of_memory(cachep, flags, nodeid);
diff --git a/mm/slob.c b/mm/slob.c
index 165bbd3..42e6e7e 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -44,10 +44,9 @@
  *
  * NUMA support in SLOB is fairly simplistic, pushing most of the real
  * logic down to the page allocator, and simply doing the node accounting
- * on the upper levels. In the event that a node id is explicitly
- * provided, alloc_pages_exact_node() with the specified node id is used
- * instead. The common case (or when the node id isn't explicitly provided)
- * will default to the current node, as per numa_node_id().
+ * on the upper levels. In the event that a node id is explicitly provided
+ * it is preferred for the allocation. The common case (or when the node id
+ * isn't explicitly provided) will default to the current CPU's node.
  *
  * Node aware pages are still inserted in to the global freelist, and
  * these are scanned for by matching against the node id encoded in the
@@ -191,12 +190,7 @@ static void *slob_new_pages(gfp_t gfp, int order, int node)
 {
 	void *page;
 
-#ifdef CONFIG_NUMA
-	if (node != NUMA_NO_NODE)
-		page = alloc_pages_exact_node(node, gfp, order);
-	else
-#endif
-		page = alloc_pages(gfp, order);
+	page = alloc_pages_node(node, gfp, order);
 
 	if (!page)
 		return NULL;
diff --git a/mm/slub.c b/mm/slub.c
index 257283f..b48ad97 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1336,7 +1336,7 @@ static inline struct page *alloc_slab_page(struct kmem_cache *s,
 	if (node == NUMA_NO_NODE)
 		page = alloc_pages(flags, order);
 	else
-		page = alloc_pages_exact_node(node, flags, order);
+		page = __alloc_pages_node(node, flags, order);
 
 	if (!page)
 		memcg_uncharge_slab(s, order);
-- 
2.4.6

* [PATCH v3 2/3] mm: unify checks in alloc_pages_node() and __alloc_pages_node()
  2015-07-30 16:34 [PATCH v3 1/3] mm: rename alloc_pages_exact_node to __alloc_pages_node Vlastimil Babka
@ 2015-07-30 16:34 ` Vlastimil Babka
  2015-07-30 17:35   ` Johannes Weiner
  2015-07-30 17:59   ` Christoph Lameter
  2015-07-30 16:34 ` [PATCH v3 3/3] mm: use numa_mem_id() in alloc_pages_node() Vlastimil Babka
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 14+ messages in thread
From: Vlastimil Babka @ 2015-07-30 16:34 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: linux-kernel, Mel Gorman, David Rientjes, Greg Thelen,
	Aneesh Kumar K.V, Christoph Lameter, Pekka Enberg, Joonsoo Kim,
	Naoya Horiguchi, Johannes Weiner, linux-ia64, linuxppc-dev,
	cbe-oss-dev, kvm, Vlastimil Babka

Perform the same debug checks in alloc_pages_node() as are done in
__alloc_pages_node(), by making the former function a wrapper of the latter
one.

In addition to better diagnostics in DEBUG_VM builds for situations which
have already been fatal (e.g. an out-of-bounds node id), there are two visible
changes for potential existing buggy callers of alloc_pages_node():

- calling alloc_pages_node() with any negative nid (e.g. due to arithmetic
  overflow) was treated as passing NUMA_NO_NODE, and a fallback to the local
  node was applied. This will now be fatal (see the sketch below).
- calling alloc_pages_node() with an offline node will now be checked in
  DEBUG_VM builds. Since it's not fatal if the node has been previously online,
  and this patch may expose some existing buggy callers, change the VM_BUG_ON
  in __alloc_pages_node() to VM_WARN_ON.
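
A sketch of the kind of caller the stricter checks are meant to catch (purely
hypothetical code, not from this series; 'cpu' and broken_cpu_to_node() are
made-up placeholders):

	struct page *page;
	/* Suppose a buggy mapping yields a negative value other than
	 * NUMA_NO_NODE (-1), e.g. -2: */
	int nid = broken_cpu_to_node(cpu);

	/* Before: any nid < 0 silently fell back to the local node.
	 * After this patch: only NUMA_NO_NODE gets the fallback, and other
	 * negative values trip the VM_BUG_ON() in __alloc_pages_node()
	 * on DEBUG_VM builds. */
	page = alloc_pages_node(nid, GFP_KERNEL, 0);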

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
---
v3: I think this is enough for what patch 4/4 in v2 tried to provide on top,
    so there's no patch 4/4 anymore. The DEBUG_VM-only fixup didn't seem
    justified to me.

 include/linux/gfp.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index d2c142b..4a12cae2 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -310,23 +310,23 @@ __alloc_pages(gfp_t gfp_mask, unsigned int order,
 static inline struct page *
 __alloc_pages_node(int nid, gfp_t gfp_mask, unsigned int order)
 {
-	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid));
+	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
+	VM_WARN_ON(!node_online(nid));
 
 	return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
 }
 
 /*
  * Allocate pages, preferring the node given as nid. When nid == NUMA_NO_NODE,
- * prefer the current CPU's node.
+ * prefer the current CPU's node. Otherwise node must be valid and online.
  */
 static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
 						unsigned int order)
 {
-	/* Unknown node is current node */
-	if (nid < 0)
+	if (nid == NUMA_NO_NODE)
 		nid = numa_node_id();
 
-	return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
+	return __alloc_pages_node(nid, gfp_mask, order);
 }
 
 #ifdef CONFIG_NUMA
-- 
2.4.6

* [PATCH v3 3/3] mm: use numa_mem_id() in alloc_pages_node()
  2015-07-30 16:34 [PATCH v3 1/3] mm: rename alloc_pages_exact_node to __alloc_pages_node Vlastimil Babka
  2015-07-30 16:34 ` [PATCH v3 2/3] mm: unify checks in alloc_pages_node() and __alloc_pages_node() Vlastimil Babka
@ 2015-07-30 16:34 ` Vlastimil Babka
  2015-07-30 17:41   ` Johannes Weiner
  2015-07-30 17:59   ` Christoph Lameter
  2015-07-30 17:33 ` [PATCH v3 1/3] mm: rename alloc_pages_exact_node to __alloc_pages_node Johannes Weiner
  2015-07-30 17:58 ` Christoph Lameter
  3 siblings, 2 replies; 14+ messages in thread
From: Vlastimil Babka @ 2015-07-30 16:34 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: linux-kernel, Mel Gorman, David Rientjes, Greg Thelen,
	Aneesh Kumar K.V, Christoph Lameter, Pekka Enberg, Joonsoo Kim,
	Naoya Horiguchi, Johannes Weiner, linux-ia64, linuxppc-dev,
	cbe-oss-dev, kvm, Vlastimil Babka, Mel Gorman

numa_mem_id() is able to handle allocation from CPUs on memory-less nodes,
so it's a more robust fallback than the currently used numa_node_id().

Suggested-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
---
 include/linux/gfp.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 4a12cae2..f92cbd2 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -318,13 +318,14 @@ __alloc_pages_node(int nid, gfp_t gfp_mask, unsigned int order)
 
 /*
  * Allocate pages, preferring the node given as nid. When nid == NUMA_NO_NODE,
- * prefer the current CPU's node. Otherwise node must be valid and online.
+ * prefer the current CPU's closest node. Otherwise node must be valid and
+ * online.
  */
 static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
 						unsigned int order)
 {
 	if (nid == NUMA_NO_NODE)
-		nid = numa_node_id();
+		nid = numa_mem_id();
 
 	return __alloc_pages_node(nid, gfp_mask, order);
 }
-- 
2.4.6

* Re: [PATCH v3 1/3] mm: rename alloc_pages_exact_node to __alloc_pages_node
  2015-07-30 16:34 [PATCH v3 1/3] mm: rename alloc_pages_exact_node to __alloc_pages_node Vlastimil Babka
  2015-07-30 16:34 ` [PATCH v3 2/3] mm: unify checks in alloc_pages_node() and __alloc_pages_node() Vlastimil Babka
  2015-07-30 16:34 ` [PATCH v3 3/3] mm: use numa_mem_id() in alloc_pages_node() Vlastimil Babka
@ 2015-07-30 17:33 ` Johannes Weiner
  2015-07-30 17:58 ` Christoph Lameter
  3 siblings, 0 replies; 14+ messages in thread
From: Johannes Weiner @ 2015-07-30 17:33 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Andrew Morton, linux-kernel, Mel Gorman,
	David Rientjes, Greg Thelen, Aneesh Kumar K.V, Christoph Lameter,
	Pekka Enberg, Joonsoo Kim, Naoya Horiguchi, linux-ia64,
	linuxppc-dev, cbe-oss-dev, kvm, Tony Luck, Fenghua Yu,
	Arnd Bergmann, Benjamin Herrenschmidt, Paul Mackerras,
	Gleb Natapov, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Cliff Whickman, Michael Ellerman, Robin Holt

On Thu, Jul 30, 2015 at 06:34:29PM +0200, Vlastimil Babka wrote:
> The function alloc_pages_exact_node() was introduced in 6484eb3e2a81 ("page
> allocator: do not check NUMA node ID when the caller knows the node is valid")
> as an optimized variant of alloc_pages_node() that doesn't fall back to current
> node for nid == NUMA_NO_NODE. Unfortunately the name of the function can easily
> suggest that the allocation is restricted to the given node and fails
> otherwise. In truth, the node is only preferred, unless __GFP_THISNODE is
> passed among the gfp flags.
> 
> The misleading name has led to mistakes in the past, see 5265047ac301 ("mm,
> thp: really limit transparent hugepage allocation to local node") and
> b360edb43f8e ("mm, mempolicy: migrate_to_node should only migrate to node").
> 
> Another issue with the name is that there's a family of alloc_pages_exact*()
> functions where 'exact' means exact size (instead of page order), which leads
> to more confusion.
> 
> To prevent further mistakes, this patch effectively renames
> alloc_pages_exact_node() to __alloc_pages_node() to better convey that it's
> an optimized variant of alloc_pages_node() not intended for general usage.
> Both functions get described in comments.
> 
> It has been also considered to really provide a convenience function for
> allocations restricted to a node, but the major opinion seems to be that
> __GFP_THISNODE already provides that functionality and we shouldn't duplicate
> the API needlessly. The number of users would be small anyway.
> 
> Existing callers of alloc_pages_exact_node() are simply converted to call
> __alloc_pages_node(), with two exceptions. sba_alloc_coherent() and
> slob_new_pages() both open-code the check for NUMA_NO_NODE, so they are
> converted to use alloc_pages_node() instead. This means they no longer perform
> some VM_BUG_ON checks, and since the current check for nid in
> alloc_pages_node() uses a 'nid < 0' comparison (which includes NUMA_NO_NODE),
> it may hide wrong values which would be previously exposed. Both differences
> will be rectified by the next patch.
> 
> To sum up, this patch makes no functional changes, except temporarily hiding
> potentially buggy callers. Restricting the checks in alloc_pages_node() is
> left for the next patch which can in turn expose more existing buggy callers.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Greg Thelen <gthelen@google.com>
> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Pekka Enberg <penberg@kernel.org>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: Fenghua Yu <fenghua.yu@intel.com>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Acked-by: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Gleb Natapov <gleb@kernel.org>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Cliff Whickman <cpw@sgi.com>
> Acked-by: Robin Holt <robinmholt@gmail.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

* Re: [PATCH v3 2/3] mm: unify checks in alloc_pages_node() and __alloc_pages_node()
  2015-07-30 16:34 ` [PATCH v3 2/3] mm: unify checks in alloc_pages_node() and __alloc_pages_node() Vlastimil Babka
@ 2015-07-30 17:35   ` Johannes Weiner
  2015-07-30 17:59   ` Christoph Lameter
  1 sibling, 0 replies; 14+ messages in thread
From: Johannes Weiner @ 2015-07-30 17:35 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Andrew Morton, linux-kernel, Mel Gorman,
	David Rientjes, Greg Thelen, Aneesh Kumar K.V, Christoph Lameter,
	Pekka Enberg, Joonsoo Kim, Naoya Horiguchi, linux-ia64,
	linuxppc-dev, cbe-oss-dev, kvm

On Thu, Jul 30, 2015 at 06:34:30PM +0200, Vlastimil Babka wrote:
> Perform the same debug checks in alloc_pages_node() as are done in
> __alloc_pages_node(), by making the former function a wrapper of the latter
> one.
> 
> In addition to better diagnostics in DEBUG_VM builds for situations which
> have been already fatal (e.g. out-of-bounds node id), there are two visible
> changes for potential existing buggy callers of alloc_pages_node():
> 
> - calling alloc_pages_node() with any negative nid (e.g. due to arithmetic
>   overflow) was treated as passing NUMA_NO_NODE and fallback to local node was
>   applied. This will now be fatal.
> - calling alloc_pages_node() with an offline node will now be checked for
>   DEBUG_VM builds. Since it's not fatal if the node has been previously online,
>   and this patch may expose some existing buggy callers, change the VM_BUG_ON
>   in __alloc_pages_node() to VM_WARN_ON.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Acked-by: David Rientjes <rientjes@google.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

* Re: [PATCH v3 3/3] mm: use numa_mem_id() in alloc_pages_node()
  2015-07-30 16:34 ` [PATCH v3 3/3] mm: use numa_mem_id() in alloc_pages_node() Vlastimil Babka
@ 2015-07-30 17:41   ` Johannes Weiner
  2015-08-06  7:00     ` Vlastimil Babka
  2015-07-30 17:59   ` Christoph Lameter
  1 sibling, 1 reply; 14+ messages in thread
From: Johannes Weiner @ 2015-07-30 17:41 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Andrew Morton, linux-kernel, Mel Gorman,
	David Rientjes, Greg Thelen, Aneesh Kumar K.V, Christoph Lameter,
	Pekka Enberg, Joonsoo Kim, Naoya Horiguchi, linux-ia64,
	linuxppc-dev, cbe-oss-dev, kvm, Mel Gorman

On Thu, Jul 30, 2015 at 06:34:31PM +0200, Vlastimil Babka wrote:
> numa_mem_id() is able to handle allocation from CPUs on memory-less nodes,
> so it's a more robust fallback than the currently used numa_node_id().

Won't it fall through to the next closest memory node in the zonelist
anyway? Is this for callers doing NUMA_NO_NODE with __GFP_THISNODE?

* Re: [PATCH v3 1/3] mm: rename alloc_pages_exact_node to __alloc_pages_node
  2015-07-30 16:34 [PATCH v3 1/3] mm: rename alloc_pages_exact_node to __alloc_pages_node Vlastimil Babka
                   ` (2 preceding siblings ...)
  2015-07-30 17:33 ` [PATCH v3 1/3] mm: rename alloc_pages_exact_node to __alloc_pages_node Johannes Weiner
@ 2015-07-30 17:58 ` Christoph Lameter
  2015-07-30 19:59   ` Vlastimil Babka
  3 siblings, 1 reply; 14+ messages in thread
From: Christoph Lameter @ 2015-07-30 17:58 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Andrew Morton, linux-kernel, Mel Gorman,
	David Rientjes, Greg Thelen, Aneesh Kumar K.V, Pekka Enberg,
	Joonsoo Kim, Naoya Horiguchi, Johannes Weiner, linux-ia64,
	linuxppc-dev, cbe-oss-dev, kvm, Tony Luck, Fenghua Yu,
	Arnd Bergmann, Benjamin Herrenschmidt, Paul Mackerras,
	Gleb Natapov, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Cliff Whickman, Michael Ellerman, Robin Holt

On Thu, 30 Jul 2015, Vlastimil Babka wrote:

> --- a/mm/slob.c
> +++ b/mm/slob.c
>  	void *page;
>
> -#ifdef CONFIG_NUMA
> -	if (node != NUMA_NO_NODE)
> -		page = alloc_pages_exact_node(node, gfp, order);
> -	else
> -#endif
> -		page = alloc_pages(gfp, order);
> +	page = alloc_pages_node(node, gfp, order);

NAK. This is changing slob behavior. With no node specified it must use
alloc_pages because that obeys NUMA memory policies etc etc. It should not
force allocation from the current node, as happens here after the patch. See
the similar code in slub.c.
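
To make the difference concrete, an illustrative sketch (assuming CONFIG_NUMA,
node == NUMA_NO_NODE, and some gfp/order pair):

	struct page *page;

	/* Obeys the calling task's NUMA memory policy (e.g. interleave). */
	page = alloc_pages(gfp, order);

	/* Ignores the policy and starts from the local node's zonelist,
	 * which is what alloc_pages_node(NUMA_NO_NODE, ...) boils down to
	 * after the patch above. */
	page = alloc_pages_node(numa_node_id(), gfp, order);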

* Re: [PATCH v3 2/3] mm: unify checks in alloc_pages_node() and __alloc_pages_node()
  2015-07-30 16:34 ` [PATCH v3 2/3] mm: unify checks in alloc_pages_node() and __alloc_pages_node() Vlastimil Babka
  2015-07-30 17:35   ` Johannes Weiner
@ 2015-07-30 17:59   ` Christoph Lameter
  1 sibling, 0 replies; 14+ messages in thread
From: Christoph Lameter @ 2015-07-30 17:59 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Andrew Morton, linux-kernel, Mel Gorman,
	David Rientjes, Greg Thelen, Aneesh Kumar K.V, Pekka Enberg,
	Joonsoo Kim, Naoya Horiguchi, Johannes Weiner, linux-ia64,
	linuxppc-dev, cbe-oss-dev, kvm


Acked-by: Christoph Lameter <cl@linux.com>

* Re: [PATCH v3 3/3] mm: use numa_mem_id() in alloc_pages_node()
  2015-07-30 16:34 ` [PATCH v3 3/3] mm: use numa_mem_id() in alloc_pages_node() Vlastimil Babka
  2015-07-30 17:41   ` Johannes Weiner
@ 2015-07-30 17:59   ` Christoph Lameter
  1 sibling, 0 replies; 14+ messages in thread
From: Christoph Lameter @ 2015-07-30 17:59 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Andrew Morton, linux-kernel, Mel Gorman,
	David Rientjes, Greg Thelen, Aneesh Kumar K.V, Pekka Enberg,
	Joonsoo Kim, Naoya Horiguchi, Johannes Weiner, linux-ia64,
	linuxppc-dev, cbe-oss-dev, kvm, Mel Gorman

On Thu, 30 Jul 2015, Vlastimil Babka wrote:

> numa_mem_id() is able to handle allocation from CPUs on memory-less nodes,
> so it's a more robust fallback than the currently used numa_node_id().
>
> Suggested-by: Christoph Lameter <cl@linux.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Acked-by: David Rientjes <rientjes@google.com>
> Acked-by: Mel Gorman <mgorman@techsingularity.net>

You can add my ack too if it helps.

Acked-by: Christoph Lameter <cl@linux.com>

* Re: [PATCH v3 1/3] mm: rename alloc_pages_exact_node to __alloc_pages_node
  2015-07-30 17:58 ` Christoph Lameter
@ 2015-07-30 19:59   ` Vlastimil Babka
  2015-07-30 20:07     ` Christoph Lameter
  2015-07-31 21:25     ` David Rientjes
  0 siblings, 2 replies; 14+ messages in thread
From: Vlastimil Babka @ 2015-07-30 19:59 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-mm, Andrew Morton, linux-kernel, Mel Gorman,
	David Rientjes, Greg Thelen, Aneesh Kumar K.V, Pekka Enberg,
	Joonsoo Kim, Naoya Horiguchi, Johannes Weiner, linux-ia64,
	linuxppc-dev, cbe-oss-dev, kvm, Tony Luck, Fenghua Yu,
	Arnd Bergmann, Benjamin Herrenschmidt, Paul Mackerras,
	Gleb Natapov, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Cliff Whickman, Michael Ellerman, Robin Holt

On 07/30/2015 07:58 PM, Christoph Lameter wrote:
> On Thu, 30 Jul 2015, Vlastimil Babka wrote:
> 
>> --- a/mm/slob.c
>> +++ b/mm/slob.c
>>  	void *page;
>>
>> -#ifdef CONFIG_NUMA
>> -	if (node != NUMA_NO_NODE)
>> -		page = alloc_pages_exact_node(node, gfp, order);
>> -	else
>> -#endif
>> -		page = alloc_pages(gfp, order);
>> +	page = alloc_pages_node(node, gfp, order);
> 
> NAK. This is changing slob behavior. With no node specified it must use
> alloc_pages because that obeys NUMA memory policies etc etc. It should not
> force allocation from the current node like what is happening here after
> the patch. See the code in slub.c that is similar.
 
Doh, somehow I convinced myself that there's #else and alloc_pages() is only
used for !CONFIG_NUMA so it doesn't matter. Here's a fixed version.

------8<------
From: Vlastimil Babka <vbabka@suse.cz>
Date: Fri, 24 Jul 2015 15:49:47 +0200
Subject: [PATCH v3 1/3] mm: rename alloc_pages_exact_node to
 __alloc_pages_node

The function alloc_pages_exact_node() was introduced in 6484eb3e2a81 ("page
allocator: do not check NUMA node ID when the caller knows the node is valid")
as an optimized variant of alloc_pages_node() that doesn't fall back to current
node for nid == NUMA_NO_NODE. Unfortunately the name of the function can easily
suggest that the allocation is restricted to the given node and fails
otherwise. In truth, the node is only preferred, unless __GFP_THISNODE is
passed among the gfp flags.

The misleading name has led to mistakes in the past, see 5265047ac301 ("mm,
thp: really limit transparent hugepage allocation to local node") and
b360edb43f8e ("mm, mempolicy: migrate_to_node should only migrate to node").

Another issue with the name is that there's a family of alloc_pages_exact*()
functions where 'exact' means exact size (instead of page order), which leads
to more confusion.

To prevent further mistakes, this patch effectively renames
alloc_pages_exact_node() to __alloc_pages_node() to better convey that it's
an optimized variant of alloc_pages_node() not intended for general usage.
Both functions get described in comments.

It has also been considered to really provide a convenience function for
allocations restricted to a node, but the major opinion seems to be that
__GFP_THISNODE already provides that functionality and we shouldn't duplicate
the API needlessly. The number of users would be small anyway.

Existing callers of alloc_pages_exact_node() are simply converted to call
__alloc_pages_node(), with the exception of sba_alloc_coherent() which
open-codes the check for NUMA_NO_NODE, so it is converted to use
alloc_pages_node() instead. This means it no longer performs some VM_BUG_ON
checks, and since the current check for nid in alloc_pages_node() uses a 'nid <
0' comparison (which includes NUMA_NO_NODE), it may hide wrong values which
would previously have been exposed. Both differences will be rectified by the next
patch.

To sum up, this patch makes no functional changes, except temporarily hiding
potentially buggy callers. Restricting the checks in alloc_pages_node() is
left for the next patch which can in turn expose more existing buggy callers.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Cliff Whickman <cpw@sgi.com>
Acked-by: Robin Holt <robinmholt@gmail.com>
---
 arch/ia64/hp/common/sba_iommu.c   |  6 +-----
 arch/ia64/kernel/uncached.c       |  2 +-
 arch/ia64/sn/pci/pci_dma.c        |  2 +-
 arch/powerpc/platforms/cell/ras.c |  2 +-
 arch/x86/kvm/vmx.c                |  2 +-
 drivers/misc/sgi-xp/xpc_uv.c      |  2 +-
 include/linux/gfp.h               | 23 +++++++++++++++--------
 kernel/profile.c                  |  8 ++++----
 mm/filemap.c                      |  2 +-
 mm/huge_memory.c                  |  6 ++----
 mm/hugetlb.c                      |  4 ++--
 mm/memory-failure.c               |  2 +-
 mm/mempolicy.c                    |  4 ++--
 mm/migrate.c                      |  4 ++--
 mm/page_alloc.c                   |  2 --
 mm/slab.c                         |  2 +-
 mm/slob.c                         |  4 ++--
 mm/slub.c                         |  2 +-
 18 files changed, 39 insertions(+), 40 deletions(-)

diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common/sba_iommu.c
index 344387a..a6d6190 100644
--- a/arch/ia64/hp/common/sba_iommu.c
+++ b/arch/ia64/hp/common/sba_iommu.c
@@ -1140,13 +1140,9 @@ sba_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle,
 
 #ifdef CONFIG_NUMA
 	{
-		int node = ioc->node;
 		struct page *page;
 
-		if (node == NUMA_NO_NODE)
-			node = numa_node_id();
-
-		page = alloc_pages_exact_node(node, flags, get_order(size));
+		page = alloc_pages_node(ioc->node, flags, get_order(size));
 		if (unlikely(!page))
 			return NULL;
 
diff --git a/arch/ia64/kernel/uncached.c b/arch/ia64/kernel/uncached.c
index 20e8a9b..f3976da 100644
--- a/arch/ia64/kernel/uncached.c
+++ b/arch/ia64/kernel/uncached.c
@@ -97,7 +97,7 @@ static int uncached_add_chunk(struct uncached_pool *uc_pool, int nid)
 
 	/* attempt to allocate a granule's worth of cached memory pages */
 
-	page = alloc_pages_exact_node(nid,
+	page = __alloc_pages_node(nid,
 				GFP_KERNEL | __GFP_ZERO | __GFP_THISNODE,
 				IA64_GRANULE_SHIFT-PAGE_SHIFT);
 	if (!page) {
diff --git a/arch/ia64/sn/pci/pci_dma.c b/arch/ia64/sn/pci/pci_dma.c
index d0853e8..8f59907 100644
--- a/arch/ia64/sn/pci/pci_dma.c
+++ b/arch/ia64/sn/pci/pci_dma.c
@@ -92,7 +92,7 @@ static void *sn_dma_alloc_coherent(struct device *dev, size_t size,
 	 */
 	node = pcibus_to_node(pdev->bus);
 	if (likely(node >=0)) {
-		struct page *p = alloc_pages_exact_node(node,
+		struct page *p = __alloc_pages_node(node,
 						flags, get_order(size));
 
 		if (likely(p))
diff --git a/arch/powerpc/platforms/cell/ras.c b/arch/powerpc/platforms/cell/ras.c
index e865d74..2d4f60c 100644
--- a/arch/powerpc/platforms/cell/ras.c
+++ b/arch/powerpc/platforms/cell/ras.c
@@ -123,7 +123,7 @@ static int __init cbe_ptcal_enable_on_node(int nid, int order)
 
 	area->nid = nid;
 	area->order = order;
-	area->pages = alloc_pages_exact_node(area->nid,
+	area->pages = __alloc_pages_node(area->nid,
 						GFP_KERNEL|__GFP_THISNODE,
 						area->order);
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 0dbeec1..881286b 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3150,7 +3150,7 @@ static struct vmcs *alloc_vmcs_cpu(int cpu)
 	struct page *pages;
 	struct vmcs *vmcs;
 
-	pages = alloc_pages_exact_node(node, GFP_KERNEL, vmcs_config.order);
+	pages = __alloc_pages_node(node, GFP_KERNEL, vmcs_config.order);
 	if (!pages)
 		return NULL;
 	vmcs = page_address(pages);
diff --git a/drivers/misc/sgi-xp/xpc_uv.c b/drivers/misc/sgi-xp/xpc_uv.c
index 95c8944..340b44d 100644
--- a/drivers/misc/sgi-xp/xpc_uv.c
+++ b/drivers/misc/sgi-xp/xpc_uv.c
@@ -239,7 +239,7 @@ xpc_create_gru_mq_uv(unsigned int mq_size, int cpu, char *irq_name,
 	mq->mmr_blade = uv_cpu_to_blade_id(cpu);
 
 	nid = cpu_to_node(cpu);
-	page = alloc_pages_exact_node(nid,
+	page = __alloc_pages_node(nid,
 				      GFP_KERNEL | __GFP_ZERO | __GFP_THISNODE,
 				      pg_order);
 	if (page == NULL) {
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 3bd64b1..d2c142b 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -303,20 +303,28 @@ __alloc_pages(gfp_t gfp_mask, unsigned int order,
 	return __alloc_pages_nodemask(gfp_mask, order, zonelist, NULL);
 }
 
-static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
-						unsigned int order)
+/*
+ * Allocate pages, preferring the node given as nid. The node must be valid and
+ * online. For more general interface, see alloc_pages_node().
+ */
+static inline struct page *
+__alloc_pages_node(int nid, gfp_t gfp_mask, unsigned int order)
 {
-	/* Unknown node is current node */
-	if (nid < 0)
-		nid = numa_node_id();
+	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid));
 
 	return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
 }
 
-static inline struct page *alloc_pages_exact_node(int nid, gfp_t gfp_mask,
+/*
+ * Allocate pages, preferring the node given as nid. When nid == NUMA_NO_NODE,
+ * prefer the current CPU's node.
+ */
+static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
 						unsigned int order)
 {
-	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid));
+	/* Unknown node is current node */
+	if (nid < 0)
+		nid = numa_node_id();
 
 	return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
 }
@@ -357,7 +365,6 @@ extern unsigned long get_zeroed_page(gfp_t gfp_mask);
 
 void *alloc_pages_exact(size_t size, gfp_t gfp_mask);
 void free_pages_exact(void *virt, size_t size);
-/* This is different from alloc_pages_exact_node !!! */
 void * __meminit alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask);
 
 #define __get_free_page(gfp_mask) \
diff --git a/kernel/profile.c b/kernel/profile.c
index a7bcd28..99513e1 100644
--- a/kernel/profile.c
+++ b/kernel/profile.c
@@ -339,7 +339,7 @@ static int profile_cpu_callback(struct notifier_block *info,
 		node = cpu_to_mem(cpu);
 		per_cpu(cpu_profile_flip, cpu) = 0;
 		if (!per_cpu(cpu_profile_hits, cpu)[1]) {
-			page = alloc_pages_exact_node(node,
+			page = __alloc_pages_node(node,
 					GFP_KERNEL | __GFP_ZERO,
 					0);
 			if (!page)
@@ -347,7 +347,7 @@ static int profile_cpu_callback(struct notifier_block *info,
 			per_cpu(cpu_profile_hits, cpu)[1] = page_address(page);
 		}
 		if (!per_cpu(cpu_profile_hits, cpu)[0]) {
-			page = alloc_pages_exact_node(node,
+			page = __alloc_pages_node(node,
 					GFP_KERNEL | __GFP_ZERO,
 					0);
 			if (!page)
@@ -543,14 +543,14 @@ static int create_hash_tables(void)
 		int node = cpu_to_mem(cpu);
 		struct page *page;
 
-		page = alloc_pages_exact_node(node,
+		page = __alloc_pages_node(node,
 				GFP_KERNEL | __GFP_ZERO | __GFP_THISNODE,
 				0);
 		if (!page)
 			goto out_cleanup;
 		per_cpu(cpu_profile_hits, cpu)[1]
 				= (struct profile_hit *)page_address(page);
-		page = alloc_pages_exact_node(node,
+		page = __alloc_pages_node(node,
 				GFP_KERNEL | __GFP_ZERO | __GFP_THISNODE,
 				0);
 		if (!page)
diff --git a/mm/filemap.c b/mm/filemap.c
index 204fd1c..b510a0d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -674,7 +674,7 @@ struct page *__page_cache_alloc(gfp_t gfp)
 		do {
 			cpuset_mems_cookie = read_mems_allowed_begin();
 			n = cpuset_mem_spread_node();
-			page = alloc_pages_exact_node(n, gfp, 0);
+			page = __alloc_pages_node(n, gfp, 0);
 		} while (!page && read_mems_allowed_retry(cpuset_mems_cookie));
 
 		return page;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index aa58a32..56355f2 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2469,7 +2469,7 @@ khugepaged_alloc_page(struct page **hpage, gfp_t gfp, struct mm_struct *mm,
 	 */
 	up_read(&mm->mmap_sem);
 
-	*hpage = alloc_pages_exact_node(node, gfp, HPAGE_PMD_ORDER);
+	*hpage = __alloc_pages_node(node, gfp, HPAGE_PMD_ORDER);
 	if (unlikely(!*hpage)) {
 		count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
 		*hpage = ERR_PTR(-ENOMEM);
@@ -2568,9 +2568,7 @@ static void collapse_huge_page(struct mm_struct *mm,
 
 	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
 
-	/* Only allocate from the target node */
-	gfp = alloc_hugepage_gfpmask(khugepaged_defrag(), __GFP_OTHER_NODE) |
-		__GFP_THISNODE;
+	gfp = alloc_hugepage_gfpmask(khugepaged_defrag(), 0);
 
 	/* release the mmap_sem read lock. */
 	new_page = khugepaged_alloc_page(hpage, gfp, mm, vma, address, node);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index e83fce5..4920bcb 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1331,7 +1331,7 @@ static struct page *alloc_fresh_huge_page_node(struct hstate *h, int nid)
 {
 	struct page *page;
 
-	page = alloc_pages_exact_node(nid,
+	page = __alloc_pages_node(nid,
 		htlb_alloc_mask(h)|__GFP_COMP|__GFP_THISNODE|
 						__GFP_REPEAT|__GFP_NOWARN,
 		huge_page_order(h));
@@ -1483,7 +1483,7 @@ static struct page *alloc_buddy_huge_page(struct hstate *h, int nid)
 				   __GFP_REPEAT|__GFP_NOWARN,
 				   huge_page_order(h));
 	else
-		page = alloc_pages_exact_node(nid,
+		page = __alloc_pages_node(nid,
 			htlb_alloc_mask(h)|__GFP_COMP|__GFP_THISNODE|
 			__GFP_REPEAT|__GFP_NOWARN, huge_page_order(h));
 
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 9700539..839f934 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1468,7 +1468,7 @@ static struct page *new_page(struct page *p, unsigned long private, int **x)
 		return alloc_huge_page_node(page_hstate(compound_head(p)),
 						   nid);
 	else
-		return alloc_pages_exact_node(nid, GFP_HIGHUSER_MOVABLE, 0);
+		return __alloc_pages_node(nid, GFP_HIGHUSER_MOVABLE, 0);
 }
 
 /*
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index d6f2cae..87a1779 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -942,7 +942,7 @@ static struct page *new_node_page(struct page *page, unsigned long node, int **x
 		return alloc_huge_page_node(page_hstate(compound_head(page)),
 					node);
 	else
-		return alloc_pages_exact_node(node, GFP_HIGHUSER_MOVABLE |
+		return __alloc_pages_node(node, GFP_HIGHUSER_MOVABLE |
 						    __GFP_THISNODE, 0);
 }
 
@@ -1998,7 +1998,7 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
 		nmask = policy_nodemask(gfp, pol);
 		if (!nmask || node_isset(hpage_node, *nmask)) {
 			mpol_cond_put(pol);
-			page = alloc_pages_exact_node(hpage_node,
+			page = __alloc_pages_node(hpage_node,
 						gfp | __GFP_THISNODE, order);
 			goto out;
 		}
diff --git a/mm/migrate.c b/mm/migrate.c
index d86cec0..cd673c8 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1195,7 +1195,7 @@ static struct page *new_page_node(struct page *p, unsigned long private,
 		return alloc_huge_page_node(page_hstate(compound_head(p)),
 					pm->node);
 	else
-		return alloc_pages_exact_node(pm->node,
+		return __alloc_pages_node(pm->node,
 				GFP_HIGHUSER_MOVABLE | __GFP_THISNODE, 0);
 }
 
@@ -1555,7 +1555,7 @@ static struct page *alloc_misplaced_dst_page(struct page *page,
 	int nid = (int) data;
 	struct page *newpage;
 
-	newpage = alloc_pages_exact_node(nid,
+	newpage = __alloc_pages_node(nid,
 					 (GFP_HIGHUSER_MOVABLE |
 					  __GFP_THISNODE | __GFP_NOMEMALLOC |
 					  __GFP_NORETRY | __GFP_NOWARN) &
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4b220cb..88d2ee9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3511,8 +3511,6 @@ EXPORT_SYMBOL(alloc_pages_exact);
  *
  * Like alloc_pages_exact(), but try to allocate on node nid first before falling
  * back.
- * Note this is not alloc_pages_exact_node() which allocates on a specific node,
- * but is not exact.
  */
 void * __meminit alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask)
 {
diff --git a/mm/slab.c b/mm/slab.c
index 4c5910f..1783eda 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1595,7 +1595,7 @@ static struct page *kmem_getpages(struct kmem_cache *cachep, gfp_t flags,
 	if (memcg_charge_slab(cachep, flags, cachep->gfporder))
 		return NULL;
 
-	page = alloc_pages_exact_node(nodeid, flags | __GFP_NOTRACK, cachep->gfporder);
+	page = __alloc_pages_node(nodeid, flags | __GFP_NOTRACK, cachep->gfporder);
 	if (!page) {
 		memcg_uncharge_slab(cachep, cachep->gfporder);
 		slab_out_of_memory(cachep, flags, nodeid);
diff --git a/mm/slob.c b/mm/slob.c
index 165bbd3..0d7e5df 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -45,7 +45,7 @@
  * NUMA support in SLOB is fairly simplistic, pushing most of the real
  * logic down to the page allocator, and simply doing the node accounting
  * on the upper levels. In the event that a node id is explicitly
- * provided, alloc_pages_exact_node() with the specified node id is used
+ * provided, __alloc_pages_node() with the specified node id is used
  * instead. The common case (or when the node id isn't explicitly provided)
  * will default to the current node, as per numa_node_id().
  *
@@ -193,7 +193,7 @@ static void *slob_new_pages(gfp_t gfp, int order, int node)
 
 #ifdef CONFIG_NUMA
 	if (node != NUMA_NO_NODE)
-		page = alloc_pages_exact_node(node, gfp, order);
+		page = __alloc_pages_node(node, gfp, order);
 	else
 #endif
 		page = alloc_pages(gfp, order);
diff --git a/mm/slub.c b/mm/slub.c
index 257283f..b48ad97 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1336,7 +1336,7 @@ static inline struct page *alloc_slab_page(struct kmem_cache *s,
 	if (node == NUMA_NO_NODE)
 		page = alloc_pages(flags, order);
 	else
-		page = alloc_pages_exact_node(node, flags, order);
+		page = __alloc_pages_node(node, flags, order);
 
 	if (!page)
 		memcg_uncharge_slab(s, order);
-- 
2.4.6

* Re: [PATCH v3 1/3] mm: rename alloc_pages_exact_node to __alloc_pages_node
  2015-07-30 19:59   ` Vlastimil Babka
@ 2015-07-30 20:07     ` Christoph Lameter
  2015-07-31 21:25     ` David Rientjes
  1 sibling, 0 replies; 14+ messages in thread
From: Christoph Lameter @ 2015-07-30 20:07 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Andrew Morton, linux-kernel, Mel Gorman,
	David Rientjes, Greg Thelen, Aneesh Kumar K.V, Pekka Enberg,
	Joonsoo Kim, Naoya Horiguchi, Johannes Weiner, linux-ia64,
	linuxppc-dev, cbe-oss-dev, kvm, Tony Luck, Fenghua Yu,
	Arnd Bergmann, Benjamin Herrenschmidt, Paul Mackerras,
	Gleb Natapov, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Cliff Whickman, Michael Ellerman, Robin Holt

On Thu, 30 Jul 2015, Vlastimil Babka wrote:

> > NAK. This is changing slob behavior. With no node specified it must use
> > alloc_pages because that obeys NUMA memory policies etc etc. It should not
> > force allocation from the current node like what is happening here after
> > the patch. See the code in slub.c that is similar.
>
> Doh, somehow I convinced myself that there's #else and alloc_pages() is only
> used for !CONFIG_NUMA so it doesn't matter. Here's a fixed version.

Acked-by: Christoph Lameter <cl@linux.com>

* Re: [PATCH v3 1/3] mm: rename alloc_pages_exact_node to __alloc_pages_node
  2015-07-30 19:59   ` Vlastimil Babka
  2015-07-30 20:07     ` Christoph Lameter
@ 2015-07-31 21:25     ` David Rientjes
  2015-07-31 21:45       ` Vlastimil Babka
  1 sibling, 1 reply; 14+ messages in thread
From: David Rientjes @ 2015-07-31 21:25 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Christoph Lameter, linux-mm, Andrew Morton, linux-kernel,
	Mel Gorman, Greg Thelen, Aneesh Kumar K.V, Pekka Enberg,
	Joonsoo Kim, Naoya Horiguchi, Johannes Weiner, linux-ia64,
	linuxppc-dev, cbe-oss-dev, kvm, Tony Luck, Fenghua Yu,
	Arnd Bergmann, Benjamin Herrenschmidt, Paul Mackerras,
	Gleb Natapov, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Cliff Whickman, Michael Ellerman, Robin Holt

On Thu, 30 Jul 2015, Vlastimil Babka wrote:

> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index aa58a32..56355f2 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2469,7 +2469,7 @@ khugepaged_alloc_page(struct page **hpage, gfp_t gfp, struct mm_struct *mm,
>  	 */
>  	up_read(&mm->mmap_sem);
>  
> -	*hpage = alloc_pages_exact_node(node, gfp, HPAGE_PMD_ORDER);
> +	*hpage = __alloc_pages_node(node, gfp, HPAGE_PMD_ORDER);
>  	if (unlikely(!*hpage)) {
>  		count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
>  		*hpage = ERR_PTR(-ENOMEM);
> @@ -2568,9 +2568,7 @@ static void collapse_huge_page(struct mm_struct *mm,
>  
>  	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
>  
> -	/* Only allocate from the target node */
> -	gfp = alloc_hugepage_gfpmask(khugepaged_defrag(), __GFP_OTHER_NODE) |
> -		__GFP_THISNODE;
> +	gfp = alloc_hugepage_gfpmask(khugepaged_defrag(), 0);
>  
>  	/* release the mmap_sem read lock. */
>  	new_page = khugepaged_alloc_page(hpage, gfp, mm, vma, address, node);

Hmm, where is the __GFP_THISNODE enforcement in khugepaged_alloc_page() 
that is removed in collapse_huge_page()?  I also don't see what happened 
to the __GFP_OTHER_NODE.

* Re: [PATCH v3 1/3] mm: rename alloc_pages_exact_node to __alloc_pages_node
  2015-07-31 21:25     ` David Rientjes
@ 2015-07-31 21:45       ` Vlastimil Babka
  0 siblings, 0 replies; 14+ messages in thread
From: Vlastimil Babka @ 2015-07-31 21:45 UTC (permalink / raw)
  To: David Rientjes, Andrew Morton
  Cc: Christoph Lameter, linux-mm, linux-kernel, Mel Gorman,
	Greg Thelen, Aneesh Kumar K.V, Pekka Enberg, Joonsoo Kim,
	Naoya Horiguchi, Johannes Weiner, linux-ia64, linuxppc-dev,
	cbe-oss-dev, kvm, Tony Luck, Fenghua Yu, Arnd Bergmann,
	Benjamin Herrenschmidt, Paul Mackerras, Gleb Natapov,
	Paolo Bonzini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Cliff Whickman, Michael Ellerman, Robin Holt

On 31.7.2015 23:25, David Rientjes wrote:
> On Thu, 30 Jul 2015, Vlastimil Babka wrote:
> 
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index aa58a32..56355f2 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -2469,7 +2469,7 @@ khugepaged_alloc_page(struct page **hpage, gfp_t gfp, struct mm_struct *mm,
>>  	 */
>>  	up_read(&mm->mmap_sem);
>>  
>> -	*hpage = alloc_pages_exact_node(node, gfp, HPAGE_PMD_ORDER);
>> +	*hpage = __alloc_pages_node(node, gfp, HPAGE_PMD_ORDER);
>>  	if (unlikely(!*hpage)) {
>>  		count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
>>  		*hpage = ERR_PTR(-ENOMEM);
>> @@ -2568,9 +2568,7 @@ static void collapse_huge_page(struct mm_struct *mm,
>>  
>>  	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
>>  
>> -	/* Only allocate from the target node */
>> -	gfp = alloc_hugepage_gfpmask(khugepaged_defrag(), __GFP_OTHER_NODE) |
>> -		__GFP_THISNODE;
>> +	gfp = alloc_hugepage_gfpmask(khugepaged_defrag(), 0);
>>  
>>  	/* release the mmap_sem read lock. */
>>  	new_page = khugepaged_alloc_page(hpage, gfp, mm, vma, address, node);
> 
> Hmm, where is the __GFP_THISNODE enforcement in khugepaged_alloc_page() 
> that is removed in collapse_huge_page()?  I also don't see what happened 
> to the __GFP_OTHER_NODE.

Crap, I messed up with git, this hunk was supposed to be gone. Thanks for
noticing. Please apply without the collapse_huge_page hunk, or tell me to resend
once more.

* Re: [PATCH v3 3/3] mm: use numa_mem_id() in alloc_pages_node()
  2015-07-30 17:41   ` Johannes Weiner
@ 2015-08-06  7:00     ` Vlastimil Babka
  0 siblings, 0 replies; 14+ messages in thread
From: Vlastimil Babka @ 2015-08-06  7:00 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: linux-mm, Andrew Morton, linux-kernel, Mel Gorman,
	David Rientjes, Greg Thelen, Aneesh Kumar K.V, Christoph Lameter,
	Pekka Enberg, Joonsoo Kim, Naoya Horiguchi, linux-ia64,
	linuxppc-dev, cbe-oss-dev, kvm, Mel Gorman

On 07/30/2015 07:41 PM, Johannes Weiner wrote:
> On Thu, Jul 30, 2015 at 06:34:31PM +0200, Vlastimil Babka wrote:
>> numa_mem_id() is able to handle allocation from CPUs on memory-less nodes,
>> so it's a more robust fallback than the currently used numa_node_id().
>
> Won't it fall through to the next closest memory node in the zonelist
> anyway?

Right, I would expect the zonelist of a memoryless node to be the same as
that of the closest node. Documentation/vm/numa seems to agree.

> Is this for callers doing NUMA_NO_NODE with __GFP_THISNODE?

I guess that's the only scenario where that matters, yeah. And there 
might well be no such caller now, but maybe some will sneak in without 
the author testing on a system with a memoryless node.

Note that with !CONFIG_HAVE_MEMORYLESS_NODES, numa_mem_id() just does 
numa_node_id().

So yeah I think "a more robust fallback" is correct :) But let's put it 
explicitly in changelog then:

----8<----

alloc_pages_node() might fail when called with NUMA_NO_NODE and 
__GFP_THISNODE on a CPU belonging to a memoryless node. To make the 
local-node fallback more robust and prevent such situations, use 
numa_mem_id(), which was introduced for similar scenarios in the slab 
context.
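
A hypothetical caller that illustrates the scenario (sketch only):

	struct page *page;

	/* On a CPU attached to a memoryless node:
	 *  - with numa_node_id(), the NUMA_NO_NODE fallback pins the request
	 *    to a node with no memory, so __GFP_THISNODE makes it fail;
	 *  - with numa_mem_id(), the request lands on the nearest node that
	 *    actually has memory. */
	page = alloc_pages_node(NUMA_NO_NODE,
				GFP_KERNEL | __GFP_THISNODE, 0);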
