* [RFC v2 1/4] mm: make alloc_pages_exact_node pass __GFP_THISNODE
@ 2015-07-24 14:45 ` Vlastimil Babka
From: Vlastimil Babka @ 2015-07-24 14:45 UTC
  To: linux-mm
  Cc: linux-kernel, Andrew Morton, Mel Gorman, David Rientjes,
	Greg Thelen, Aneesh Kumar K.V, Christoph Lameter, Pekka Enberg,
	Joonsoo Kim, Naoya Horiguchi, Vlastimil Babka

The function alloc_pages_exact_node() was introduced in 6484eb3e2a81 ("page
allocator: do not check NUMA node ID when the caller knows the node is valid")
as an optimized variant of alloc_pages_node() that doesn't allow the node id
to be -1. Unfortunately, the name of the function can easily suggest that the
allocation is restricted to the given node and fails otherwise. In truth, the
node is only preferred, unless __GFP_THISNODE is passed among the gfp flags.

The misleading name has led to mistakes in the past; see 5265047ac301 ("mm,
thp: really limit transparent hugepage allocation to local node") and
b360edb43f8e ("mm, mempolicy: migrate_to_node should only migrate to node").

To prevent further mistakes and to provide a convenience function for
allocations truly restricted to a node, this patch makes
alloc_pages_exact_node() pass __GFP_THISNODE to that effect. The previous
implementation of alloc_pages_exact_node() is copied as __alloc_pages_node(),
whose name implies it's an optimized variant of alloc_pages_node() not
intended for general usage. All three functions are now described by comments
in gfp.h.
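
For illustration, a sketch (not part of the patch; it assumes only the
gfp.h declarations touched below) of the intended division of labor
after this change:

	struct page *page;
	int nid = 0;	/* hypothetical valid, online node */

	/* nid may also be NUMA_NO_NODE; the node is only preferred: */
	page = alloc_pages_node(nid, GFP_KERNEL, 0);

	/* nid must be valid and online; the node is still only
	 * preferred, but the NUMA_NO_NODE check is skipped: */
	page = __alloc_pages_node(nid, GFP_KERNEL, 0);

	/* nid must be valid and online; the allocation either comes
	 * from that node or fails, as __GFP_THISNODE is now added
	 * internally: */
	page = alloc_pages_exact_node(nid, GFP_KERNEL, 0);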

Existing callers of alloc_pages_exact_node() are adjusted as follows:
- those that explicitly pass __GFP_THISNODE keep calling
  alloc_pages_exact_node(), but the flag is removed from the call
- others are converted to call __alloc_pages_node(). Some may still pass
  __GFP_THISNODE if they serve as wrappers that get gfp_flags from higher
  layers.

The one exception is sba_alloc_coherent(), which open-codes the check for
nid == -1, so it is converted to use alloc_pages_node() instead. This means
it no longer performs some VM_BUG_ON checks, but otherwise the whole patch
makes no functional changes.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
I've dropped non-mm guys from CC for now and marked as RFC until we agree on
the API.

 arch/ia64/hp/common/sba_iommu.c   |  6 +-----
 arch/ia64/kernel/uncached.c       |  2 +-
 arch/ia64/sn/pci/pci_dma.c        |  2 +-
 arch/powerpc/platforms/cell/ras.c |  2 +-
 arch/x86/kvm/vmx.c                |  2 +-
 drivers/misc/sgi-xp/xpc_uv.c      |  2 +-
 include/linux/gfp.h               | 23 +++++++++++++++++++++++
 kernel/profile.c                  |  8 ++++----
 mm/filemap.c                      |  2 +-
 mm/hugetlb.c                      |  7 +++----
 mm/memory-failure.c               |  2 +-
 mm/mempolicy.c                    |  6 ++----
 mm/migrate.c                      |  6 ++----
 mm/slab.c                         |  2 +-
 mm/slob.c                         |  2 +-
 mm/slub.c                         |  2 +-
 16 files changed, 45 insertions(+), 31 deletions(-)

diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common/sba_iommu.c
index 344387a..a6d6190 100644
--- a/arch/ia64/hp/common/sba_iommu.c
+++ b/arch/ia64/hp/common/sba_iommu.c
@@ -1140,13 +1140,9 @@ sba_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle,
 
 #ifdef CONFIG_NUMA
 	{
-		int node = ioc->node;
 		struct page *page;
 
-		if (node == NUMA_NO_NODE)
-			node = numa_node_id();
-
-		page = alloc_pages_exact_node(node, flags, get_order(size));
+		page = alloc_pages_node(ioc->node, flags, get_order(size));
 		if (unlikely(!page))
 			return NULL;
 
diff --git a/arch/ia64/kernel/uncached.c b/arch/ia64/kernel/uncached.c
index 20e8a9b..b187c87 100644
--- a/arch/ia64/kernel/uncached.c
+++ b/arch/ia64/kernel/uncached.c
@@ -98,7 +98,7 @@ static int uncached_add_chunk(struct uncached_pool *uc_pool, int nid)
 	/* attempt to allocate a granule's worth of cached memory pages */
 
 	page = alloc_pages_exact_node(nid,
-				GFP_KERNEL | __GFP_ZERO | __GFP_THISNODE,
+				GFP_KERNEL | __GFP_ZERO,
 				IA64_GRANULE_SHIFT-PAGE_SHIFT);
 	if (!page) {
 		mutex_unlock(&uc_pool->add_chunk_mutex);
diff --git a/arch/ia64/sn/pci/pci_dma.c b/arch/ia64/sn/pci/pci_dma.c
index d0853e8..8f59907 100644
--- a/arch/ia64/sn/pci/pci_dma.c
+++ b/arch/ia64/sn/pci/pci_dma.c
@@ -92,7 +92,7 @@ static void *sn_dma_alloc_coherent(struct device *dev, size_t size,
 	 */
 	node = pcibus_to_node(pdev->bus);
 	if (likely(node >=0)) {
-		struct page *p = alloc_pages_exact_node(node,
+		struct page *p = __alloc_pages_node(node,
 						flags, get_order(size));
 
 		if (likely(p))
diff --git a/arch/powerpc/platforms/cell/ras.c b/arch/powerpc/platforms/cell/ras.c
index e865d74..ff5ae13 100644
--- a/arch/powerpc/platforms/cell/ras.c
+++ b/arch/powerpc/platforms/cell/ras.c
@@ -124,7 +124,7 @@ static int __init cbe_ptcal_enable_on_node(int nid, int order)
 	area->nid = nid;
 	area->order = order;
 	area->pages = alloc_pages_exact_node(area->nid,
-						GFP_KERNEL|__GFP_THISNODE,
+						GFP_KERNEL,
 						area->order);
 
 	if (!area->pages) {
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 2d73807..8c7f3b0 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3158,7 +3158,7 @@ static struct vmcs *alloc_vmcs_cpu(int cpu)
 	struct page *pages;
 	struct vmcs *vmcs;
 
-	pages = alloc_pages_exact_node(node, GFP_KERNEL, vmcs_config.order);
+	pages = __alloc_pages_node(node, GFP_KERNEL, vmcs_config.order);
 	if (!pages)
 		return NULL;
 	vmcs = page_address(pages);
diff --git a/drivers/misc/sgi-xp/xpc_uv.c b/drivers/misc/sgi-xp/xpc_uv.c
index 95c8944..a4758cd 100644
--- a/drivers/misc/sgi-xp/xpc_uv.c
+++ b/drivers/misc/sgi-xp/xpc_uv.c
@@ -240,7 +240,7 @@ xpc_create_gru_mq_uv(unsigned int mq_size, int cpu, char *irq_name,
 
 	nid = cpu_to_node(cpu);
 	page = alloc_pages_exact_node(nid,
-				      GFP_KERNEL | __GFP_ZERO | __GFP_THISNODE,
+				      GFP_KERNEL | __GFP_ZERO,
 				      pg_order);
 	if (page == NULL) {
 		dev_err(xpc_part, "xpc_create_gru_mq_uv() failed to alloc %d "
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 15928f0..c50848e 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -300,6 +300,22 @@ __alloc_pages(gfp_t gfp_mask, unsigned int order,
 	return __alloc_pages_nodemask(gfp_mask, order, zonelist, NULL);
 }
 
+/*
+ * An optimized version of alloc_pages_node(), to be only used in places where
+ * the overhead of the check for nid == -1 could matter.
+ */
+static inline struct page *
+__alloc_pages_node(int nid, gfp_t gfp_mask, unsigned int order)
+{
+	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid));
+
+	return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
+}
+
+/*
+ * Allocate pages, preferring the node given as nid. When nid equals -1,
+ * prefer the current CPU's node.
+ */
 static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
 						unsigned int order)
 {
@@ -310,11 +326,18 @@ static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
 	return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
 }
 
+/*
+ * Allocate pages, restricting the allocation to the node given as nid. The
+ * node must be valid and online. This is achieved by adding __GFP_THISNODE
+ * to gfp_mask.
+ */
 static inline struct page *alloc_pages_exact_node(int nid, gfp_t gfp_mask,
 						unsigned int order)
 {
 	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid));
 
+	gfp_mask |= __GFP_THISNODE;
+
 	return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
 }
 
diff --git a/kernel/profile.c b/kernel/profile.c
index a7bcd28..30a9404 100644
--- a/kernel/profile.c
+++ b/kernel/profile.c
@@ -339,7 +339,7 @@ static int profile_cpu_callback(struct notifier_block *info,
 		node = cpu_to_mem(cpu);
 		per_cpu(cpu_profile_flip, cpu) = 0;
 		if (!per_cpu(cpu_profile_hits, cpu)[1]) {
-			page = alloc_pages_exact_node(node,
+			page = __alloc_pages_node(node,
 					GFP_KERNEL | __GFP_ZERO,
 					0);
 			if (!page)
@@ -347,7 +347,7 @@ static int profile_cpu_callback(struct notifier_block *info,
 			per_cpu(cpu_profile_hits, cpu)[1] = page_address(page);
 		}
 		if (!per_cpu(cpu_profile_hits, cpu)[0]) {
-			page = alloc_pages_exact_node(node,
+			page = __alloc_pages_node(node,
 					GFP_KERNEL | __GFP_ZERO,
 					0);
 			if (!page)
@@ -544,14 +544,14 @@ static int create_hash_tables(void)
 		struct page *page;
 
 		page = alloc_pages_exact_node(node,
-				GFP_KERNEL | __GFP_ZERO | __GFP_THISNODE,
+				GFP_KERNEL | __GFP_ZERO,
 				0);
 		if (!page)
 			goto out_cleanup;
 		per_cpu(cpu_profile_hits, cpu)[1]
 				= (struct profile_hit *)page_address(page);
 		page = alloc_pages_exact_node(node,
-				GFP_KERNEL | __GFP_ZERO | __GFP_THISNODE,
+				GFP_KERNEL | __GFP_ZERO,
 				0);
 		if (!page)
 			goto out_cleanup;
diff --git a/mm/filemap.c b/mm/filemap.c
index 6bf5e42..5a7d4e2 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -648,7 +648,7 @@ struct page *__page_cache_alloc(gfp_t gfp)
 		do {
 			cpuset_mems_cookie = read_mems_allowed_begin();
 			n = cpuset_mem_spread_node();
-			page = alloc_pages_exact_node(n, gfp, 0);
+			page = __alloc_pages_node(n, gfp, 0);
 		} while (!page && read_mems_allowed_retry(cpuset_mems_cookie));
 
 		return page;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 271e443..156d8d7 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1089,8 +1089,7 @@ static struct page *alloc_fresh_huge_page_node(struct hstate *h, int nid)
 	struct page *page;
 
 	page = alloc_pages_exact_node(nid,
-		htlb_alloc_mask(h)|__GFP_COMP|__GFP_THISNODE|
-						__GFP_REPEAT|__GFP_NOWARN,
+		htlb_alloc_mask(h)|__GFP_COMP|__GFP_REPEAT|__GFP_NOWARN,
 		huge_page_order(h));
 	if (page) {
 		if (arch_prepare_hugepage(page)) {
@@ -1251,8 +1250,8 @@ static struct page *alloc_buddy_huge_page(struct hstate *h, int nid)
 				   huge_page_order(h));
 	else
 		page = alloc_pages_exact_node(nid,
-			htlb_alloc_mask(h)|__GFP_COMP|__GFP_THISNODE|
-			__GFP_REPEAT|__GFP_NOWARN, huge_page_order(h));
+			htlb_alloc_mask(h)|__GFP_COMP|__GFP_REPEAT|
+			__GFP_NOWARN, huge_page_order(h));
 
 	if (page && arch_prepare_hugepage(page)) {
 		__free_pages(page, huge_page_order(h));
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 501820c..b783bc5 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1503,7 +1503,7 @@ static struct page *new_page(struct page *p, unsigned long private, int **x)
 		return alloc_huge_page_node(page_hstate(compound_head(p)),
 						   nid);
 	else
-		return alloc_pages_exact_node(nid, GFP_HIGHUSER_MOVABLE, 0);
+		return __alloc_pages_node(nid, GFP_HIGHUSER_MOVABLE, 0);
 }
 
 /*
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 7477432..4547960 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -945,8 +945,7 @@ static struct page *new_node_page(struct page *page, unsigned long node, int **x
 		return alloc_huge_page_node(page_hstate(compound_head(page)),
 					node);
 	else
-		return alloc_pages_exact_node(node, GFP_HIGHUSER_MOVABLE |
-						    __GFP_THISNODE, 0);
+		return alloc_pages_exact_node(node, GFP_HIGHUSER_MOVABLE, 0);
 }
 
 /*
@@ -1986,8 +1985,7 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
 		nmask = policy_nodemask(gfp, pol);
 		if (!nmask || node_isset(node, *nmask)) {
 			mpol_cond_put(pol);
-			page = alloc_pages_exact_node(node,
-						gfp | __GFP_THISNODE, order);
+			page = alloc_pages_exact_node(node, gfp, order);
 			goto out;
 		}
 	}
diff --git a/mm/migrate.c b/mm/migrate.c
index f53838f..d139222 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1554,10 +1554,8 @@ static struct page *alloc_misplaced_dst_page(struct page *page,
 	struct page *newpage;
 
 	newpage = alloc_pages_exact_node(nid,
-					 (GFP_HIGHUSER_MOVABLE |
-					  __GFP_THISNODE | __GFP_NOMEMALLOC |
-					  __GFP_NORETRY | __GFP_NOWARN) &
-					 ~GFP_IOFS, 0);
+				(GFP_HIGHUSER_MOVABLE | __GFP_NOMEMALLOC |
+				 __GFP_NORETRY | __GFP_NOWARN) & ~GFP_IOFS, 0);
 
 	return newpage;
 }
diff --git a/mm/slab.c b/mm/slab.c
index 7eb38dd..5f49e63 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1594,7 +1594,7 @@ static struct page *kmem_getpages(struct kmem_cache *cachep, gfp_t flags,
 	if (memcg_charge_slab(cachep, flags, cachep->gfporder))
 		return NULL;
 
-	page = alloc_pages_exact_node(nodeid, flags | __GFP_NOTRACK, cachep->gfporder);
+	page = __alloc_pages_node(nodeid, flags | __GFP_NOTRACK, cachep->gfporder);
 	if (!page) {
 		memcg_uncharge_slab(cachep, cachep->gfporder);
 		slab_out_of_memory(cachep, flags, nodeid);
diff --git a/mm/slob.c b/mm/slob.c
index 4765f65..10d8e02 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -193,7 +193,7 @@ static void *slob_new_pages(gfp_t gfp, int order, int node)
 
 #ifdef CONFIG_NUMA
 	if (node != NUMA_NO_NODE)
-		page = alloc_pages_exact_node(node, gfp, order);
+		page = __alloc_pages_node(node, gfp, order);
 	else
 #endif
 		page = alloc_pages(gfp, order);
diff --git a/mm/slub.c b/mm/slub.c
index 54c0876..0486343 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1323,7 +1323,7 @@ static inline struct page *alloc_slab_page(struct kmem_cache *s,
 	if (node == NUMA_NO_NODE)
 		page = alloc_pages(flags, order);
 	else
-		page = alloc_pages_exact_node(node, flags, order);
+		page = __alloc_pages_node(node, flags, order);
 
 	if (!page)
 		memcg_uncharge_slab(s, order);
-- 
2.4.6


* [RFC v2 2/4] mm: unify checks in alloc_pages_node family of functions
  2015-07-24 14:45 ` Vlastimil Babka
@ 2015-07-24 14:45   ` Vlastimil Babka
From: Vlastimil Babka @ 2015-07-24 14:45 UTC
  To: linux-mm
  Cc: linux-kernel, Andrew Morton, Mel Gorman, David Rientjes,
	Greg Thelen, Aneesh Kumar K.V, Christoph Lameter, Pekka Enberg,
	Joonsoo Kim, Naoya Horiguchi, Vlastimil Babka

Perform the same debug checks in alloc_pages_node() as are done in
alloc_pages_exact_node() and __alloc_pages_node() by making the latter
function the inner core of the former ones.

Change the !node_online(nid) check from a VM_BUG_ON to a VM_WARN_ON, since an
offline node is not fatal here and this patch may expose some buggy callers.
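
For context, a sketch of the usual mmdebug.h semantics this relies on
(assuming CONFIG_DEBUG_VM; both checks generate no code otherwise):

	/* Becomes BUG_ON(cond): an out-of-range nid kills the task. */
	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);

	/* Becomes WARN_ON(cond): an offline nid prints a backtrace but
	 * execution continues; the allocation attempt on the offline
	 * node then simply fails rather than crashing. */
	VM_WARN_ON(!node_online(nid));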

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/gfp.h | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index c50848e..54c3ee7 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -307,7 +307,8 @@ __alloc_pages(gfp_t gfp_mask, unsigned int order,
 static inline struct page *
 __alloc_pages_node(int nid, gfp_t gfp_mask, unsigned int order)
 {
-	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid));
+	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
+	VM_WARN_ON(!node_online(nid));
 
 	return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
 }
@@ -319,11 +320,11 @@ __alloc_pages_node(int nid, gfp_t gfp_mask, unsigned int order)
 static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
 						unsigned int order)
 {
-	/* Unknown node is current node */
-	if (nid < 0)
+	/* Unknown node is current (or closest) node */
+	if (nid == NUMA_NO_NODE)
 		nid = numa_node_id();
 
-	return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
+	return __alloc_pages_node(nid, gfp_mask, order);
 }
 
 /*
@@ -334,11 +335,7 @@ static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
 static inline struct page *alloc_pages_exact_node(int nid, gfp_t gfp_mask,
 						unsigned int order)
 {
-	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid));
-
-	gfp_mask |= __GFP_THISNODE;
-
-	return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
+	return __alloc_pages_node(nid, gfp_mask | __GFP_THISNODE, order);
 }
 
 #ifdef CONFIG_NUMA
-- 
2.4.6


* [RFC v2 3/4] mm: use numa_mem_id in alloc_pages_node()
  2015-07-24 14:45 ` Vlastimil Babka
@ 2015-07-24 14:45   ` Vlastimil Babka
From: Vlastimil Babka @ 2015-07-24 14:45 UTC
  To: linux-mm
  Cc: linux-kernel, Andrew Morton, Mel Gorman, David Rientjes,
	Greg Thelen, Aneesh Kumar K.V, Christoph Lameter, Pekka Enberg,
	Joonsoo Kim, Naoya Horiguchi, Vlastimil Babka

numa_mem_id() is able to handle allocation from CPUs on memory-less nodes,
so it's a more robust fallback than the currently used numa_node_id().
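
A sketch of the difference, assuming a hypothetical topology where the
current CPU sits on node 1, which has CPUs but no memory, and node 0 is
its nearest node with memory:

	int nid;

	nid = numa_node_id();	/* 1: the CPU's own node */
	nid = numa_mem_id();	/* 0: nearest node that has memory */

	/* With this patch, alloc_pages_node(NUMA_NO_NODE, gfp, order)
	 * resolves to node 0, a node the allocator can satisfy the
	 * request from directly. */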

Suggested-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/gfp.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 54c3ee7..531c72d 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -322,7 +322,7 @@ static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
 {
 	/* Unknown node is current (or closest) node */
 	if (nid == NUMA_NO_NODE)
-		nid = numa_node_id();
+		nid = numa_mem_id();
 
 	return __alloc_pages_node(nid, gfp_mask, order);
 }
-- 
2.4.6


* [RFC v2 4/4] mm: fallback for offline nodes in alloc_pages_node
  2015-07-24 14:45 ` Vlastimil Babka
@ 2015-07-24 14:45   ` Vlastimil Babka
From: Vlastimil Babka @ 2015-07-24 14:45 UTC
  To: linux-mm
  Cc: linux-kernel, Andrew Morton, Mel Gorman, David Rientjes,
	Greg Thelen, Aneesh Kumar K.V, Christoph Lameter, Pekka Enberg,
	Joonsoo Kim, Naoya Horiguchi, Vlastimil Babka

alloc_pages_node() now warns when an offline node is passed. Make it fall
back to the local (or nearest) node, as if NUMA_NO_NODE had been passed, but
keep the VM_WARN_ON warning.

Suggested-by: David Rientjes <rientjes@google.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
David's suggested if(VM_WARN_ON(...)) doesn't work that way, hence this more
involved and awkward syntax.
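
For reference, a sketch of the mmdebug.h definitions behind this
(paraphrased, not the literal source):

	#ifdef CONFIG_DEBUG_VM
	#define VM_WARN_ON(cond)	WARN_ON(cond)
	#else
	/* Only type-checks cond; expands to a void expression with no
	 * runtime value, so it can't gate an if () branch. */
	#define VM_WARN_ON(cond)	BUILD_BUG_ON_INVALID(cond)
	#endif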

 include/linux/gfp.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 531c72d..104a027 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -321,8 +321,12 @@ static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
 						unsigned int order)
 {
 	/* Unknown node is current (or closest) node */
-	if (nid == NUMA_NO_NODE)
+	if (nid == NUMA_NO_NODE) {
 		nid = numa_mem_id();
+	} else if (!node_online(nid)) {
+		VM_WARN_ON(!node_online(nid));
+		nid = numa_mem_id();
+	}
 
 	return __alloc_pages_node(nid, gfp_mask, order);
 }
-- 
2.4.6


* Re: [RFC v2 4/4] mm: fallback for offline nodes in alloc_pages_node
  2015-07-24 14:45   ` Vlastimil Babka
@ 2015-07-24 15:48     ` Christoph Lameter
From: Christoph Lameter @ 2015-07-24 15:48 UTC
  To: Vlastimil Babka
  Cc: linux-mm, linux-kernel, Andrew Morton, Mel Gorman,
	David Rientjes, Greg Thelen, Aneesh Kumar K.V, Pekka Enberg,
	Joonsoo Kim, Naoya Horiguchi

On Fri, 24 Jul 2015, Vlastimil Babka wrote:

> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 531c72d..104a027 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -321,8 +321,12 @@ static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
>  						unsigned int order)
>  {
>  	/* Unknown node is current (or closest) node */
> -	if (nid == NUMA_NO_NODE)
> +	if (nid == NUMA_NO_NODE) {
>  		nid = numa_mem_id();
> +	} else if (!node_online(nid)) {
> +		VM_WARN_ON(!node_online(nid));
> +		nid = numa_mem_id();
> +	}

I would think you would only want this for debugging purposes. The
overwhelming majority of hardware out there has no memory
onlining/offlining capability after all and this adds the overhead to each
call to alloc_pages_node.

Make this depend on CONFIG_VM_DEBUG or some such thing?


* Re: [RFC v2 4/4] mm: fallback for offline nodes in alloc_pages_node
  2015-07-24 15:48     ` Christoph Lameter
@ 2015-07-24 19:54       ` David Rientjes
From: David Rientjes @ 2015-07-24 19:54 UTC
  To: Christoph Lameter
  Cc: Vlastimil Babka, linux-mm, linux-kernel, Andrew Morton,
	Mel Gorman, Greg Thelen, Aneesh Kumar K.V, Pekka Enberg,
	Joonsoo Kim, Naoya Horiguchi

On Fri, 24 Jul 2015, Christoph Lameter wrote:

> On Fri, 24 Jul 2015, Vlastimil Babka wrote:
> 
> > diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> > index 531c72d..104a027 100644
> > --- a/include/linux/gfp.h
> > +++ b/include/linux/gfp.h
> > @@ -321,8 +321,12 @@ static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
> >  						unsigned int order)
> >  {
> >  	/* Unknown node is current (or closest) node */
> > -	if (nid == NUMA_NO_NODE)
> > +	if (nid == NUMA_NO_NODE) {
> >  		nid = numa_mem_id();
> > +	} else if (!node_online(nid)) {
> > +		VM_WARN_ON(!node_online(nid));
> > +		nid = numa_mem_id();
> > +	}
> 
> I would think you would only want this for debugging purposes. The
> overwhelming majority of hardware out there has no memory
> onlining/offlining capability after all and this adds the overhead to each
> call to alloc_pages_node.
> 
> Make this depend on CONFIG_VM_DEBUG or some such thing?
> 

Yeah, the suggestion was for VM_WARN_ON() in the conditional, but the 
placement has changed somewhat because of the new __alloc_pages_node().  I 
think

	else if (VM_WARN_ON(!node_online(nid)))
		nid = numa_mem_id();

should be fine since it only triggers for CONFIG_DEBUG_VM.

* Re: [RFC v2 1/4] mm: make alloc_pages_exact_node pass __GFP_THISNODE
  2015-07-24 14:45 ` Vlastimil Babka
@ 2015-07-24 20:08   ` David Rientjes
From: David Rientjes @ 2015-07-24 20:08 UTC
  To: Vlastimil Babka
  Cc: linux-mm, linux-kernel, Andrew Morton, Mel Gorman, Greg Thelen,
	Aneesh Kumar K.V, Christoph Lameter, Pekka Enberg, Joonsoo Kim,
	Naoya Horiguchi

On Fri, 24 Jul 2015, Vlastimil Babka wrote:

> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 15928f0..c50848e 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -300,6 +300,22 @@ __alloc_pages(gfp_t gfp_mask, unsigned int order,
>  	return __alloc_pages_nodemask(gfp_mask, order, zonelist, NULL);
>  }
>  
> +/*
> + * An optimized version of alloc_pages_node(), to be only used in places where
> + * the overhead of the check for nid == -1 could matter.

We don't actually check for nid == -1, or nid == NUMA_NO_NODE, in any of 
the functions.  I would just state that nid must be valid and possible to 
allocate from when passed to this function.

> + */
> +static inline struct page *
> +__alloc_pages_node(int nid, gfp_t gfp_mask, unsigned int order)
> +{
> +	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid));
> +
> +	return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
> +}
> +
> +/*
> + * Allocate pages, preferring the node given as nid. When nid equals -1,
> + * prefer the current CPU's node.
> + */

We've done quite a bit of work to refer only to NUMA_NO_NODE, so we'd like 
to avoid hardcoded -1 anywhere we can.

>  static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
>  						unsigned int order)
>  {
> @@ -310,11 +326,18 @@ static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
>  	return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
>  }
>  
> +/*
> + * Allocate pages, restricting the allocation to the node given as nid. The
> + * node must be valid and online. This is achieved by adding __GFP_THISNODE
> + * to gfp_mask.

Not sure we need to point out that __GFP_THISNODE does this; it stands out
pretty well in the function already :)

> + */
>  static inline struct page *alloc_pages_exact_node(int nid, gfp_t gfp_mask,
>  						unsigned int order)
>  {
>  	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid));
>  
> +	gfp_mask |= __GFP_THISNODE;
> +
>  	return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
>  }
>  
[snip]

I assume you looked at the collapse_huge_page() case and decided that it 
needs no modification since the gfp mask is used later for other calls?

> diff --git a/mm/migrate.c b/mm/migrate.c
> index f53838f..d139222 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1554,10 +1554,8 @@ static struct page *alloc_misplaced_dst_page(struct page *page,
>  	struct page *newpage;
>  
>  	newpage = alloc_pages_exact_node(nid,
> -					 (GFP_HIGHUSER_MOVABLE |
> -					  __GFP_THISNODE | __GFP_NOMEMALLOC |
> -					  __GFP_NORETRY | __GFP_NOWARN) &
> -					 ~GFP_IOFS, 0);
> +				(GFP_HIGHUSER_MOVABLE | __GFP_NOMEMALLOC |
> +				 __GFP_NORETRY | __GFP_NOWARN) & ~GFP_IOFS, 0);
>  
>  	return newpage;
>  }
[snip]

What about the alloc_pages_exact_node() in new_page_node()?

* Re: [RFC v2 2/4] mm: unify checks in alloc_pages_node family of functions
  2015-07-24 14:45   ` Vlastimil Babka
@ 2015-07-24 20:09     ` David Rientjes
From: David Rientjes @ 2015-07-24 20:09 UTC
  To: Vlastimil Babka
  Cc: linux-mm, linux-kernel, Andrew Morton, Mel Gorman, Greg Thelen,
	Aneesh Kumar K.V, Christoph Lameter, Pekka Enberg, Joonsoo Kim,
	Naoya Horiguchi

On Fri, 24 Jul 2015, Vlastimil Babka wrote:

> Perform the same debug checks in alloc_pages_node() as are done in
> alloc_pages_exact_node() and __alloc_pages_node() by making the latter
> function the inner core of the former ones.
> 
> Change the !node_online(nid) check from VM_BUG_ON to VM_WARN_ON since it's not
> fatal and this patch may expose some buggy callers.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Acked-by: David Rientjes <rientjes@google.com>

* Re: [RFC v2 3/4] mm: use numa_mem_id in alloc_pages_node()
  2015-07-24 14:45   ` Vlastimil Babka
@ 2015-07-24 20:09     ` David Rientjes
From: David Rientjes @ 2015-07-24 20:09 UTC
  To: Vlastimil Babka
  Cc: linux-mm, linux-kernel, Andrew Morton, Mel Gorman, Greg Thelen,
	Aneesh Kumar K.V, Christoph Lameter, Pekka Enberg, Joonsoo Kim,
	Naoya Horiguchi

On Fri, 24 Jul 2015, Vlastimil Babka wrote:

> numa_mem_id() is able to handle allocation from CPU's on memory-less nodes,
> so it's a more robust fallback than the currently used numa_node_id().
> 
> Suggested-by: Christoph Lameter <cl@linux.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Acked-by: David Rientjes <rientjes@google.com>

* Re: [RFC v2 4/4] mm: fallback for offline nodes in alloc_pages_node
  2015-07-24 19:54       ` David Rientjes
@ 2015-07-24 20:39         ` Vlastimil Babka
From: Vlastimil Babka @ 2015-07-24 20:39 UTC
  To: David Rientjes, Christoph Lameter
  Cc: linux-mm, linux-kernel, Andrew Morton, Mel Gorman, Greg Thelen,
	Aneesh Kumar K.V, Pekka Enberg, Joonsoo Kim, Naoya Horiguchi

On 24.7.2015 21:54, David Rientjes wrote:
> On Fri, 24 Jul 2015, Christoph Lameter wrote:
> 
>> On Fri, 24 Jul 2015, Vlastimil Babka wrote:
>>
>>> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
>>> index 531c72d..104a027 100644
>>> --- a/include/linux/gfp.h
>>> +++ b/include/linux/gfp.h
>>> @@ -321,8 +321,12 @@ static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
>>>  						unsigned int order)
>>>  {
>>>  	/* Unknown node is current (or closest) node */
>>> -	if (nid == NUMA_NO_NODE)
>>> +	if (nid == NUMA_NO_NODE) {
>>>  		nid = numa_mem_id();
>>> +	} else if (!node_online(nid)) {
>>> +		VM_WARN_ON(!node_online(nid));
>>> +		nid = numa_mem_id();
>>> +	}
>>
>> I would think you would only want this for debugging purposes. The
>> overwhelming majority of hardware out there has no memory
>> onlining/offlining capability after all and this adds the overhead to each
>> call to alloc_pages_node.
>>
>> Make this depend on CONFIG_VM_DEBUG or some such thing?
>>
> 
> Yeah, the suggestion was for VM_WARN_ON() in the conditional, but the 
> placement has changed somewhat because of the new __alloc_pages_node().  I 
> think
> 
> 	else if (VM_WARN_ON(!node_online(nid)))
> 		nid = numa_mem_id();
> 
> should be fine since it only triggers for CONFIG_DEBUG_VM.

Um, from your original suggestion I thought you assumed that the condition
inside VM_WARN_ON() is evaluated regardless of CONFIG_DEBUG_VM, and that the
option only controls whether a warning is generated. That is how BUG_ON()
works, but VM_WARN_ON() (and VM_BUG_ON()) works differently: IIUC, with
!CONFIG_DEBUG_VM, VM_WARN_ON() will always evaluate to false. I didn't think
you would suggest that the "nid = numa_mem_id()" fixup for !node_online(nid)
should happen only on CONFIG_DEBUG_VM kernels, but it seems that you do?
I would understand if the fixup (correcting an offline node to one that's
online) was done regardless of DEBUG_VM, with DEBUG_VM only switching between
a silent and a noisy fixup. But having a debug option alter the outcome seems
wrong.
Am I correct that passing an offline node is not fatal, just that the zonelist
will be empty and the allocation will fail? Then without DEBUG_VM it would
silently fail, and with DEBUG_VM it would warn, but succeed on another node.
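
(A minimal sketch of the semantics I mean here -- not the exact kernel
source:

	#ifdef CONFIG_DEBUG_VM
	#define VM_WARN_ON(cond)	WARN_ON(cond)	/* evaluates cond, warns if true */
	#else
	#define VM_WARN_ON(cond)	({ BUILD_BUG_ON_INVALID(cond); 0; })	/* type-check only; always 0 */
	#endif

so an "else if (VM_WARN_ON(!node_online(nid)))" branch compiles away
entirely without DEBUG_VM.)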

So either we do the fixup regardless of DEBUG_VM, or we drop this patch, since
the VM_WARN_ON(!node_online(nid)) is already done in __alloc_pages_node()
thanks to patch 2/4?

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC v2 1/4] mm: make alloc_pages_exact_node pass __GFP_THISNODE
  2015-07-24 20:08   ` David Rientjes
@ 2015-07-24 20:52     ` Vlastimil Babka
  -1 siblings, 0 replies; 40+ messages in thread
From: Vlastimil Babka @ 2015-07-24 20:52 UTC (permalink / raw)
  To: David Rientjes
  Cc: linux-mm, linux-kernel, Andrew Morton, Mel Gorman, Greg Thelen,
	Aneesh Kumar K.V, Christoph Lameter, Pekka Enberg, Joonsoo Kim,
	Naoya Horiguchi

On 24.7.2015 22:08, David Rientjes wrote:
> On Fri, 24 Jul 2015, Vlastimil Babka wrote:
> 
>> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
>> index 15928f0..c50848e 100644
>> --- a/include/linux/gfp.h
>> +++ b/include/linux/gfp.h
>> @@ -300,6 +300,22 @@ __alloc_pages(gfp_t gfp_mask, unsigned int order,
>>  	return __alloc_pages_nodemask(gfp_mask, order, zonelist, NULL);
>>  }
>>  
>> +/*
>> + * An optimized version of alloc_pages_node(), to be only used in places where
>> + * the overhead of the check for nid == -1 could matter.
> 
> We don't actually check for nid == -1, or nid == NUMA_NO_NODE, in any of 
> the functions.  I would just state that nid must be valid and possible to 
> allocate from when passed to this function.

OK

>> + */
>> +static inline struct page *
>> +__alloc_pages_node(int nid, gfp_t gfp_mask, unsigned int order)
>> +{
>> +	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid));
>> +
>> +	return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
>> +}
>> +
>> +/*
>> + * Allocate pages, preferring the node given as nid. When nid equals -1,
>> + * prefer the current CPU's node.
>> + */
> 
> We've done quite a bit of work to refer only to NUMA_NO_NODE, so we'd like 
> to avoid hardcoded -1 anywhere we can.

OK
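
(Something along these lines for the comment, then -- exact wording to be
settled in the next version:

	/*
	 * Allocate pages, preferring the node given as nid. When nid equals
	 * NUMA_NO_NODE, prefer the current CPU's (or closest) node.
	 */
)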

>>  static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
>>  						unsigned int order)
>>  {
>> @@ -310,11 +326,18 @@ static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
>>  	return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
>>  }
>>  
>> +/*
>> + * Allocate pages, restricting the allocation to the node given as nid. The
>> + * node must be valid and online. This is achieved by adding __GFP_THISNODE
>> + * to gfp_mask.
> 
> Not sure we need to point out that __GPF_THISNODE does this, it stands out 
> pretty well in the function already :)

Right.

>> + */
>>  static inline struct page *alloc_pages_exact_node(int nid, gfp_t gfp_mask,
>>  						unsigned int order)
>>  {
>>  	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid));
>>  
>> +	gfp_mask |= __GFP_THISNODE;
>> +
>>  	return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
>>  }
>>  
> [snip]
> 
> I assume you looked at the collapse_huge_page() case and decided that it 
> needs no modification since the gfp mask is used later for other calls?

Yeah. Not that the memcg charge parts would seem to care about __GFP_THISNODE,
though.

>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index f53838f..d139222 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1554,10 +1554,8 @@ static struct page *alloc_misplaced_dst_page(struct page *page,
>>  	struct page *newpage;
>>  
>>  	newpage = alloc_pages_exact_node(nid,
>> -					 (GFP_HIGHUSER_MOVABLE |
>> -					  __GFP_THISNODE | __GFP_NOMEMALLOC |
>> -					  __GFP_NORETRY | __GFP_NOWARN) &
>> -					 ~GFP_IOFS, 0);
>> +				(GFP_HIGHUSER_MOVABLE | __GFP_NOMEMALLOC |
>> +				 __GFP_NORETRY | __GFP_NOWARN) & ~GFP_IOFS, 0);
>>  
>>  	return newpage;
>>  }
> [snip]
> 
> What about the alloc_pages_exact_node() in new_page_node()?

Oops, seems I missed that one. So the API seems ok otherwise?
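
(For the record, the call I missed looks roughly like this in mm/migrate.c --
quoting from memory, to be checked against the tree:

	static struct page *new_page_node(struct page *p, unsigned long private,
					  int **result)
	{
		struct page_to_node *pm = (struct page_to_node *)private;
		...
		return alloc_pages_exact_node(pm->node,
					GFP_HIGHUSER_MOVABLE | __GFP_THISNODE, 0);
	}

so with the new semantics the explicit __GFP_THISNODE would simply be
dropped there, same as in alloc_misplaced_dst_page().)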


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC v2 4/4] mm: fallback for offline nodes in alloc_pages_node
  2015-07-24 20:39         ` Vlastimil Babka
@ 2015-07-24 23:06           ` David Rientjes
  -1 siblings, 0 replies; 40+ messages in thread
From: David Rientjes @ 2015-07-24 23:06 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Christoph Lameter, linux-mm, linux-kernel, Andrew Morton,
	Mel Gorman, Greg Thelen, Aneesh Kumar K.V, Pekka Enberg,
	Joonsoo Kim, Naoya Horiguchi

On Fri, 24 Jul 2015, Vlastimil Babka wrote:

> >>> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> >>> index 531c72d..104a027 100644
> >>> --- a/include/linux/gfp.h
> >>> +++ b/include/linux/gfp.h
> >>> @@ -321,8 +321,12 @@ static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
> >>>  						unsigned int order)
> >>>  {
> >>>  	/* Unknown node is current (or closest) node */
> >>> -	if (nid == NUMA_NO_NODE)
> >>> +	if (nid == NUMA_NO_NODE) {
> >>>  		nid = numa_mem_id();
> >>> +	} else if (!node_online(nid)) {
> >>> +		VM_WARN_ON(!node_online(nid));
> >>> +		nid = numa_mem_id();
> >>> +	}
> >>
> >> I would think you would only want this for debugging purposes. The
> >> overwhelming majority of hardware out there has no memory
> >> onlining/offlining capability after all and this adds the overhead to each
> >> call to alloc_pages_node.
> >>
> >> Make this depend on CONFIG_VM_DEBUG or some such thing?
> >>
> > 
> > Yeah, the suggestion was for VM_WARN_ON() in the conditional, but the 
> > placement has changed somewhat because of the new __alloc_pages_node().  I 
> > think
> > 
> > 	else if (VM_WARN_ON(!node_online(nid)))
> > 		nid = numa_mem_id();
> > 
> > should be fine since it only triggers for CONFIG_DEBUG_VM.
> 
> Um, so on your original suggestion I thought that you assumed that the condition
> inside VM_WARN_ON is evaluated regardless of CONFIG_DEBUG_VM, it just will or
> will not generate a warning. Which is how BUG_ON works, but VM_WARN_ON (and
> VM_BUG_ON) doesn't. IIUC VM_WARN_ON() with !CONFIG_DEBUG_VM will always be false.

Right, that's what Christoph is also suggesting.  VM_WARN_ON without 
CONFIG_DEBUG_VM should permit the compiler to check the expression but not 
generate any code, and we don't want to check node_online() here on every 
allocation; it's only a debugging measure.
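
(So the end result would be something like this -- a sketch combining the
quoted diff with the suggestion above:

	static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
						unsigned int order)
	{
		/* Unknown node is current (or closest) node */
		if (nid == NUMA_NO_NODE)
			nid = numa_mem_id();
		else if (VM_WARN_ON(!node_online(nid)))
			nid = numa_mem_id();	/* fixup only with CONFIG_DEBUG_VM */

		return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
	}
)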

> Because I didn't think you would suggest the "nid = numa_mem_id()" for
> !node_online(nid) fixup would happen only for CONFIG_DEBUG_VM kernels. But it
> seems that you do suggest that? I would understand if the fixup (correcting an
> offline node to some that's online) was done regardless of DEBUG_VM, and
> DEBUG_VM just switched between silent and noisy fixup. But having a debug option
> alter the outcome seems wrong?

Hmm, not sure why this is surprising, I don't expect people to deploy 
production kernels with CONFIG_DEBUG_VM enabled, it's far too expensive.  
I was expecting they would enable it for, well... debug :)

In that case, if nid is a valid node but offline, then the nid = 
numa_mem_id() fixup seems fine to allow the kernel to continue debugging.

When a node is offlined as a result of memory hotplug, the pgdat doesn't 
get freed so it can be onlined later.  Thus, alloc_pages_node() with an 
offline node and !CONFIG_DEBUG_VM may not panic.  If it does, this can 
probably be removed because we're covered.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC v2 1/4] mm: make alloc_pages_exact_node pass __GFP_THISNODE
  2015-07-24 20:52     ` Vlastimil Babka
@ 2015-07-24 23:09       ` David Rientjes
  -1 siblings, 0 replies; 40+ messages in thread
From: David Rientjes @ 2015-07-24 23:09 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, linux-kernel, Andrew Morton, Mel Gorman, Greg Thelen,
	Aneesh Kumar K.V, Christoph Lameter, Pekka Enberg, Joonsoo Kim,
	Naoya Horiguchi

On Fri, 24 Jul 2015, Vlastimil Babka wrote:

> > I assume you looked at the collapse_huge_page() case and decided that it 
> > needs no modification since the gfp mask is used later for other calls?
> 
> Yeah. Not that the memcg charge parts would seem to care about __GFP_THISNODE,
> though.
> 

Hmm, not sure that memcg would ever care about __GFP_THISNODE.  I wonder 
if it makes more sense to remove setting __GFP_THISNODE in 
collapse_huge_page()?  khugepaged_alloc_page() seems fine with the new 
alloc_pages_exact_node() semantics.

> >> diff --git a/mm/migrate.c b/mm/migrate.c
> >> index f53838f..d139222 100644
> >> --- a/mm/migrate.c
> >> +++ b/mm/migrate.c
> >> @@ -1554,10 +1554,8 @@ static struct page *alloc_misplaced_dst_page(struct page *page,
> >>  	struct page *newpage;
> >>  
> >>  	newpage = alloc_pages_exact_node(nid,
> >> -					 (GFP_HIGHUSER_MOVABLE |
> >> -					  __GFP_THISNODE | __GFP_NOMEMALLOC |
> >> -					  __GFP_NORETRY | __GFP_NOWARN) &
> >> -					 ~GFP_IOFS, 0);
> >> +				(GFP_HIGHUSER_MOVABLE | __GFP_NOMEMALLOC |
> >> +				 __GFP_NORETRY | __GFP_NOWARN) & ~GFP_IOFS, 0);
> >>  
> >>  	return newpage;
> >>  }
> > [snip]
> > 
> > What about the alloc_pages_exact_node() in new_page_node()?
> 
> Oops, seems I missed that one. So the API seems ok otherwise?
> 

Yup!  And I believe that this patch doesn't cause any regression after the 
new_page_node() issue is fixed.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC v2 4/4] mm: fallback for offline nodes in alloc_pages_node
  2015-07-24 23:06           ` David Rientjes
@ 2015-07-27 11:29             ` Vlastimil Babka
  -1 siblings, 0 replies; 40+ messages in thread
From: Vlastimil Babka @ 2015-07-27 11:29 UTC (permalink / raw)
  To: David Rientjes
  Cc: Christoph Lameter, linux-mm, linux-kernel, Andrew Morton,
	Mel Gorman, Greg Thelen, Aneesh Kumar K.V, Pekka Enberg,
	Joonsoo Kim, Naoya Horiguchi

On 07/25/2015 01:06 AM, David Rientjes wrote:
> On Fri, 24 Jul 2015, Vlastimil Babka wrote:
>
>> Because I didn't think you would suggest the "nid = numa_mem_id()" for
>> !node_online(nid) fixup would happen only for CONFIG_DEBUG_VM kernels. But it
>> seems that you do suggest that? I would understand if the fixup (correcting an
>> offline node to some that's online) was done regardless of DEBUG_VM, and
>> DEBUG_VM just switched between silent and noisy fixup. But having a debug option
>> alter the outcome seems wrong?
>
> Hmm, not sure why this is surprising, I don't expect people to deploy
> production kernels with CONFIG_DEBUG_VM enabled, it's far too expensive.
> I was expecting they would enable it for, well... debug :)

But is there any other place that does such thing for debug builds?

> In that case, if nid is a valid node but offline, then the nid =
> numa_mem_id() fixup seems fine to allow the kernel to continue debugging.
>
> When a node is offlined as a result of memory hotplug, the pgdat doesn't
> get freed so it can be onlined later.  Thus, alloc_pages_node() with an
> offline node and !CONFIG_DEBUG_VM may not panic.  If it does, this can
> probably be removed because we're covered.

I've checked, but I can't say I understand the hotplug code completely... 
it seems there are two cases:
- the node was never online, and the nid passed to alloc_pages_node() is 
clearly bogus. Then there's no pgdat and it should crash on a NULL pointer 
dereference. The VM_WARN_ON() in __alloc_pages_node() will already catch 
this and provide more details as to what caused the crash. A fixup would 
allow us to "continue debugging", but having e.g. a crashdump configured 
for inspection seems a better way to debug than letting the kernel 
continue?
- the node has been online in the past, so a nid pointing to an offline 
node might be due to a race with offlining. It shouldn't crash, and most 
likely the zonelist, which is left untouched by the offlining (AFAICS), 
will allow fallback to other nodes. Unless there is a nodemask or 
__GFP_THISNODE, in which case the allocation fails. Again, the 
VM_WARN_ON() in __alloc_pages_node() will warn us already. I doubt the 
fixup is needed here?

So I would just drop this patch. We already have the debug warning in 
__alloc_pages_node(), and a fixup is imho just confusing.
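
(I.e. after patch 2/4, __alloc_pages_node() already looks roughly like this
-- a sketch, per the earlier patches in this thread:

	static inline struct page *
	__alloc_pages_node(int nid, gfp_t gfp_mask, unsigned int order)
	{
		VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
		VM_WARN_ON(!node_online(nid));

		return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
	}

so callers passing an offline nid already get the warning on DEBUG_VM
kernels either way.)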

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC v2 1/4] mm: make alloc_pages_exact_node pass __GFP_THISNODE
  2015-07-24 14:45 ` Vlastimil Babka
@ 2015-07-27 15:39   ` Johannes Weiner
  -1 siblings, 0 replies; 40+ messages in thread
From: Johannes Weiner @ 2015-07-27 15:39 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, linux-kernel, Andrew Morton, Mel Gorman,
	David Rientjes, Greg Thelen, Aneesh Kumar K.V, Christoph Lameter,
	Pekka Enberg, Joonsoo Kim, Naoya Horiguchi

On Fri, Jul 24, 2015 at 04:45:23PM +0200, Vlastimil Babka wrote:
> @@ -310,11 +326,18 @@ static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
>  	return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
>  }
>  
> +/*
> + * Allocate pages, restricting the allocation to the node given as nid. The
> + * node must be valid and online. This is achieved by adding __GFP_THISNODE
> + * to gfp_mask.
> + */
>  static inline struct page *alloc_pages_exact_node(int nid, gfp_t gfp_mask,
>  						unsigned int order)
>  {
>  	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid));
>  
> +	gfp_mask |= __GFP_THISNODE;
> +
>  	return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
>  }

The "exact" name is currently ambiguous within the allocator API, and
it's bad that we have _exact_node() and _exact_nid() with entirely
different meanings. It'd be good to make "thisnode" refer to specific
and exclusive node requests, and "exact" to mean page allocation
chunks that are not in powers of two.
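
(For illustration, the colliding signatures as they stand, roughly:

	void *alloc_pages_exact(size_t size, gfp_t gfp_mask);
			/* "exact" = exact byte size; surplus tail pages freed */
	void *alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask);
	struct page *alloc_pages_exact_node(int nid, gfp_t gfp_mask,
					    unsigned int order);
			/* "exact" = exactly this node, after this patch */
)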

Would you consider renaming this function to alloc_pages_thisnode() as
part of this series?

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC v2 1/4] mm: make alloc_pages_exact_node pass __GFP_THISNODE
  2015-07-27 15:39   ` Johannes Weiner
@ 2015-07-27 15:47     ` Vlastimil Babka
  -1 siblings, 0 replies; 40+ messages in thread
From: Vlastimil Babka @ 2015-07-27 15:47 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: linux-mm, linux-kernel, Andrew Morton, Mel Gorman,
	David Rientjes, Greg Thelen, Aneesh Kumar K.V, Christoph Lameter,
	Pekka Enberg, Joonsoo Kim, Naoya Horiguchi

On 07/27/2015 05:39 PM, Johannes Weiner wrote:
> On Fri, Jul 24, 2015 at 04:45:23PM +0200, Vlastimil Babka wrote:
>> @@ -310,11 +326,18 @@ static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
>>   	return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
>>   }
>>
>> +/*
>> + * Allocate pages, restricting the allocation to the node given as nid. The
>> + * node must be valid and online. This is achieved by adding __GFP_THISNODE
>> + * to gfp_mask.
>> + */
>>   static inline struct page *alloc_pages_exact_node(int nid, gfp_t gfp_mask,
>>   						unsigned int order)
>>   {
>>   	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid));
>>
>> +	gfp_mask |= __GFP_THISNODE;
>> +
>>   	return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
>>   }
>
> The "exact" name is currently ambiguous within the allocator API, and
> it's bad that we have _exact_node() and _exact_nid() with entirely
> different meanings. It'd be good to make "thisnode" refer to specific
> and exclusive node requests, and "exact" to mean page allocation
> chunks that are not in powers of two.

Ugh, good point.

> Would you consider renaming this function to alloc_pages_thisnode() as
> part of this series?

Sure, let's do it properly while at it. Yet "thisnode" is a somewhat 
misleading name, as it might imply the CPU's local node. The same applies 
to __GFP_THISNODE. So maybe find a better name for both? restrict_node? 
single_node?


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC v2 1/4] mm: make alloc_pages_exact_node pass __GFP_THISNODE
  2015-07-24 14:45 ` Vlastimil Babka
@ 2015-07-29 13:30   ` Mel Gorman
  -1 siblings, 0 replies; 40+ messages in thread
From: Mel Gorman @ 2015-07-29 13:30 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, linux-kernel, Andrew Morton, David Rientjes,
	Greg Thelen, Aneesh Kumar K.V, Christoph Lameter, Pekka Enberg,
	Joonsoo Kim, Naoya Horiguchi

On Fri, Jul 24, 2015 at 04:45:23PM +0200, Vlastimil Babka wrote:
> The function alloc_pages_exact_node() was introduced in 6484eb3e2a81 ("page
> allocator: do not check NUMA node ID when the caller knows the node is valid")

No gold stars for that one.

> as an optimized variant of alloc_pages_node(), that doesn't allow the node id
> to be -1. Unfortunately the name of the function can easily suggest that the
> allocation is restricted to the given node and fails otherwise. In truth, the
> node is only preferred, unless __GFP_THISNODE is passed among the gfp flags.
> 
> The misleading name has led to mistakes in the past, see 5265047ac301 ("mm,
> thp: really limit transparent hugepage allocation to local node") and
> b360edb43f8e ("mm, mempolicy: migrate_to_node should only migrate to node").
> 
> To prevent further mistakes and provide a convenience function for allocations
> truly restricted to a node, this patch makes alloc_pages_exact_node() pass
> __GFP_THISNODE to that effect. The previous implementation of

Changing what we have now is a good idea. What you have is a solid
improvement in my view, but I see there are a few different suggestions
in the thread. Based on that I think it makes sense to just destroy
alloc_pages_exact_node. In the future "exact" in the allocator API will
mean "exactly this number of pages". Use your __alloc_pages_node helper
and specify __GFP_THISNODE if the caller requires that specific node.
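
(I.e. a caller that truly needs node-local memory would do, sketched:

	page = __alloc_pages_node(nid, gfp_mask | __GFP_THISNODE, order);

and everything else keeps using plain alloc_pages_node().)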

> alloc_pages_exact_node() is copied as __alloc_pages_node(), which implies it's
> an optimized variant of alloc_pages_node() not intended for general usage.
> All three functions are described in the comment.
> 
> Existing callers of alloc_pages_exact_node() are adjusted as follows:
> - those that explicitly pass __GFP_THISNODE keep calling
>   alloc_pages_exact_node(), but the flag is removed from the call

__alloc_pages_node(__GFP_THISNODE) would be harder to get wrong in the future

> - others are converted to call __alloc_pages_node(). Some may still pass
>   __GFP_THISNODE if they serve as wrappers that get gfp_flags from higher
>   layers.
> 
> There's an exception in sba_alloc_coherent(), which open-codes the check for
> nid == -1, so it is converted to use alloc_pages_node() instead. This means
> it no longer performs some VM_BUG_ON checks, but otherwise the whole patch
> makes no functional changes.
> 

In general, checks for -1 should go away, particularly with new patches.
Use NUMA_NO_NODE.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC v2 3/4] mm: use numa_mem_id in alloc_pages_node()
  2015-07-24 14:45   ` Vlastimil Babka
@ 2015-07-29 13:31     ` Mel Gorman
  -1 siblings, 0 replies; 40+ messages in thread
From: Mel Gorman @ 2015-07-29 13:31 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, linux-kernel, Andrew Morton, Mel Gorman,
	David Rientjes, Greg Thelen, Aneesh Kumar K.V, Christoph Lameter,
	Pekka Enberg, Joonsoo Kim, Naoya Horiguchi

On Fri, Jul 24, 2015 at 04:45:25PM +0200, Vlastimil Babka wrote:
> numa_mem_id() is able to handle allocation from CPUs on memory-less nodes,
> so it's a more robust fallback than the currently used numa_node_id().
> 
> Suggested-by: Christoph Lameter <cl@linux.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Acked-by: Mel Gorman <mgorman@techsingularity.net>

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC v2 1/4] mm: make alloc_pages_exact_node pass __GFP_THISNODE
  2015-07-29 13:30   ` Mel Gorman
@ 2015-07-30 14:33     ` Christoph Lameter
  -1 siblings, 0 replies; 40+ messages in thread
From: Christoph Lameter @ 2015-07-30 14:33 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Vlastimil Babka, linux-mm, linux-kernel, Andrew Morton,
	David Rientjes, Greg Thelen, Aneesh Kumar K.V, Pekka Enberg,
	Joonsoo Kim, Naoya Horiguchi

On Wed, 29 Jul 2015, Mel Gorman wrote:

> The change of what we have now is a good idea. What you have is a solid
> improvement in my view but I see there are a few different suggestions
> in the thread. Based on that I think it makes sense to just destroy
> alloc_pages_exact_node. In the future "exact" in the allocator API will
> mean "exactly this number of pages". Use your __alloc_pages_node helper
> and specify __GFP_THISNODE if the caller requires that specific node.

Yes please.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC v2 1/4] mm: make alloc_pages_exact_node pass __GFP_THISNODE
  2015-07-29 13:30   ` Mel Gorman
@ 2015-07-30 15:14     ` Johannes Weiner
  -1 siblings, 0 replies; 40+ messages in thread
From: Johannes Weiner @ 2015-07-30 15:14 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Vlastimil Babka, linux-mm, linux-kernel, Andrew Morton,
	David Rientjes, Greg Thelen, Aneesh Kumar K.V, Christoph Lameter,
	Pekka Enberg, Joonsoo Kim, Naoya Horiguchi

On Wed, Jul 29, 2015 at 02:30:43PM +0100, Mel Gorman wrote:
> The change of what we have now is a good idea. What you have is a solid
> improvement in my view but I see there are a few different suggestions
> in the thread. Based on that I think it makes sense to just destroy
> alloc_pages_exact_node. In the future "exact" in the allocator API will
> mean "exactly this number of pages". Use your __alloc_pages_node helper
> and specify __GFP_THISNODE if the caller requires that specific node.

+1

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread

Thread overview: 40+ messages
2015-07-24 14:45 [RFC v2 1/4] mm: make alloc_pages_exact_node pass __GFP_THISNODE Vlastimil Babka
2015-07-24 14:45 ` [RFC v2 2/4] mm: unify checks in alloc_pages_node family of functions Vlastimil Babka
2015-07-24 20:09   ` David Rientjes
2015-07-24 14:45 ` [RFC v2 3/4] mm: use numa_mem_id in alloc_pages_node() Vlastimil Babka
2015-07-24 20:09   ` David Rientjes
2015-07-29 13:31   ` Mel Gorman
2015-07-24 14:45 ` [RFC v2 4/4] mm: fallback for offline nodes in alloc_pages_node Vlastimil Babka
2015-07-24 15:48   ` Christoph Lameter
2015-07-24 19:54     ` David Rientjes
2015-07-24 20:39       ` Vlastimil Babka
2015-07-24 23:06         ` David Rientjes
2015-07-27 11:29           ` Vlastimil Babka
2015-07-24 20:08 ` [RFC v2 1/4] mm: make alloc_pages_exact_node pass __GFP_THISNODE David Rientjes
2015-07-24 20:52   ` Vlastimil Babka
2015-07-24 23:09     ` David Rientjes
2015-07-27 15:39 ` Johannes Weiner
2015-07-27 15:47   ` Vlastimil Babka
2015-07-29 13:30 ` Mel Gorman
2015-07-30 14:33   ` Christoph Lameter
2015-07-30 15:14   ` Johannes Weiner
