All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/5] Refactor free_area_init_core and add free_area_init_core_hotplug
@ 2018-07-25 22:01 osalvador
  2018-07-25 22:01 ` [PATCH v3 1/5] mm/page_alloc: Move ifdefery out of free_area_init_core osalvador
                   ` (4 more replies)
  0 siblings, 5 replies; 21+ messages in thread
From: osalvador @ 2018-07-25 22:01 UTC (permalink / raw)
  To: akpm
  Cc: mhocko, vbabka, pasha.tatashin, mgorman, aaron.lu,
	iamjoonsoo.kim, linux-mm, linux-kernel, dan.j.williams,
	Oscar Salvador

From: Oscar Salvador <osalvador@suse.de>

This patchset does three things:

 1) Clean ups/refactor free_area_init_core/free_area_init_node
    by moving the ifdefery out of the functions.
 2) Move the pgdat/zone initialization in free_area_init_core to its
    own function.
 3) Introduce free_area_init_core_hotplug, a small subset of free_area_init_core,
    which is only called from memhotlug code path.
    In this way, we have:

    free_area_init_core: called during early initialization
    free_area_init_core_hotplug: called whenever a new node was allocated (memhotplug path)

Oscar Salvador (4):
  mm/page_alloc: Move ifdefery out of free_area_init_core
  mm/page_alloc: Inline function to handle
    CONFIG_DEFERRED_STRUCT_PAGE_INIT
  mm/page_alloc: Move initialization of node and zones to an own
    function
  mm/page_alloc: Introduce memhotplug version of free_area_init_core

Pavel Tatashin (1):
  mm: access zone->node via zone_to_nid() and zone_set_nid()

 include/linux/mm.h     |  10 +---
 include/linux/mmzone.h |  26 +++++++---
 mm/memory_hotplug.c    |  23 ++++-----
 mm/mempolicy.c         |   4 +-
 mm/mm_init.c           |   9 +---
 mm/page_alloc.c        | 132 +++++++++++++++++++++++++++++++++++--------------
 6 files changed, 129 insertions(+), 75 deletions(-)

-- 
2.13.6


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v3 1/5] mm/page_alloc: Move ifdefery out of free_area_init_core
  2018-07-25 22:01 [PATCH v3 0/5] Refactor free_area_init_core and add free_area_init_core_hotplug osalvador
@ 2018-07-25 22:01 ` osalvador
  2018-07-25 22:01 ` [PATCH v3 2/5] mm: access zone->node via zone_to_nid() and zone_set_nid() osalvador
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 21+ messages in thread
From: osalvador @ 2018-07-25 22:01 UTC (permalink / raw)
  To: akpm
  Cc: mhocko, vbabka, pasha.tatashin, mgorman, aaron.lu,
	iamjoonsoo.kim, linux-mm, linux-kernel, dan.j.williams,
	Oscar Salvador

From: Oscar Salvador <osalvador@suse.de>

Moving the #ifdefs out of the function makes it easier to follow.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
---
 mm/page_alloc.c | 50 +++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 37 insertions(+), 13 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e357189cd24a..8a73305f7c55 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6206,6 +6206,37 @@ static unsigned long __paginginit calc_memmap_size(unsigned long spanned_pages,
 	return PAGE_ALIGN(pages * sizeof(struct page)) >> PAGE_SHIFT;
 }
 
+#ifdef CONFIG_NUMA_BALANCING
+static void pgdat_init_numabalancing(struct pglist_data *pgdat)
+{
+	spin_lock_init(&pgdat->numabalancing_migrate_lock);
+	pgdat->numabalancing_migrate_nr_pages = 0;
+	pgdat->numabalancing_migrate_next_window = jiffies;
+}
+#else
+static void pgdat_init_numabalancing(struct pglist_data *pgdat) {}
+#endif
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static void pgdat_init_split_queue(struct pglist_data *pgdat)
+{
+	spin_lock_init(&pgdat->split_queue_lock);
+	INIT_LIST_HEAD(&pgdat->split_queue);
+	pgdat->split_queue_len = 0;
+}
+#else
+static void pgdat_init_split_queue(struct pglist_data *pgdat) {}
+#endif
+
+#ifdef CONFIG_COMPACTION
+static void pgdat_init_kcompactd(struct pglist_data *pgdat)
+{
+	init_waitqueue_head(&pgdat->kcompactd_wait);
+}
+#else
+static void pgdat_init_kcompactd(struct pglist_data *pgdat) {}
+#endif
+
 /*
  * Set up the zone data structures:
  *   - mark all pages reserved
@@ -6220,21 +6251,14 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
 	int nid = pgdat->node_id;
 
 	pgdat_resize_init(pgdat);
-#ifdef CONFIG_NUMA_BALANCING
-	spin_lock_init(&pgdat->numabalancing_migrate_lock);
-	pgdat->numabalancing_migrate_nr_pages = 0;
-	pgdat->numabalancing_migrate_next_window = jiffies;
-#endif
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-	spin_lock_init(&pgdat->split_queue_lock);
-	INIT_LIST_HEAD(&pgdat->split_queue);
-	pgdat->split_queue_len = 0;
-#endif
+
+	pgdat_init_numabalancing(pgdat);
+	pgdat_init_split_queue(pgdat);
+	pgdat_init_kcompactd(pgdat);
+
 	init_waitqueue_head(&pgdat->kswapd_wait);
 	init_waitqueue_head(&pgdat->pfmemalloc_wait);
-#ifdef CONFIG_COMPACTION
-	init_waitqueue_head(&pgdat->kcompactd_wait);
-#endif
+
 	pgdat_page_ext_init(pgdat);
 	spin_lock_init(&pgdat->lru_lock);
 	lruvec_init(node_lruvec(pgdat));
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 2/5] mm: access zone->node via zone_to_nid() and zone_set_nid()
  2018-07-25 22:01 [PATCH v3 0/5] Refactor free_area_init_core and add free_area_init_core_hotplug osalvador
  2018-07-25 22:01 ` [PATCH v3 1/5] mm/page_alloc: Move ifdefery out of free_area_init_core osalvador
@ 2018-07-25 22:01 ` osalvador
  2018-07-26  8:05   ` Michal Hocko
  2018-07-25 22:01 ` [PATCH v3 3/5] mm/page_alloc: Inline function to handle CONFIG_DEFERRED_STRUCT_PAGE_INIT osalvador
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 21+ messages in thread
From: osalvador @ 2018-07-25 22:01 UTC (permalink / raw)
  To: akpm
  Cc: mhocko, vbabka, pasha.tatashin, mgorman, aaron.lu,
	iamjoonsoo.kim, linux-mm, linux-kernel, dan.j.williams,
	Oscar Salvador

From: Pavel Tatashin <pasha.tatashin@oracle.com>

zone->node is configured only when CONFIG_NUMA=y, so it is a good idea to
have inline functions to access this field in order to avoid ifdef's in
c files.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
---
 include/linux/mm.h     |  9 ---------
 include/linux/mmzone.h | 26 ++++++++++++++++++++------
 mm/mempolicy.c         |  4 ++--
 mm/mm_init.c           |  9 ++-------
 mm/page_alloc.c        | 10 ++++------
 5 files changed, 28 insertions(+), 30 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 726e71475144..6954ad183159 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -940,15 +940,6 @@ static inline int page_zone_id(struct page *page)
 	return (page->flags >> ZONEID_PGSHIFT) & ZONEID_MASK;
 }
 
-static inline int zone_to_nid(struct zone *zone)
-{
-#ifdef CONFIG_NUMA
-	return zone->node;
-#else
-	return 0;
-#endif
-}
-
 #ifdef NODE_NOT_IN_PAGE_FLAGS
 extern int page_to_nid(const struct page *page);
 #else
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index ae1a034c3e2c..17fdff3bfb41 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -842,6 +842,25 @@ static inline bool populated_zone(struct zone *zone)
 	return zone->present_pages;
 }
 
+#ifdef CONFIG_NUMA
+static inline int zone_to_nid(struct zone *zone)
+{
+	return zone->node;
+}
+
+static inline void zone_set_nid(struct zone *zone, int nid)
+{
+	zone->node = nid;
+}
+#else
+static inline int zone_to_nid(struct zone *zone)
+{
+	return 0;
+}
+
+static inline void zone_set_nid(struct zone *zone, int nid) {}
+#endif
+
 extern int movable_zone;
 
 #ifdef CONFIG_HIGHMEM
@@ -957,12 +976,7 @@ static inline int zonelist_zone_idx(struct zoneref *zoneref)
 
 static inline int zonelist_node_idx(struct zoneref *zoneref)
 {
-#ifdef CONFIG_NUMA
-	/* zone_to_nid not available in this context */
-	return zoneref->zone->node;
-#else
-	return 0;
-#endif /* CONFIG_NUMA */
+	return zone_to_nid(zoneref->zone);
 }
 
 struct zoneref *__next_zones_zonelist(struct zoneref *z,
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index f0fcf70bcec7..8c1c09b3852a 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1784,7 +1784,7 @@ unsigned int mempolicy_slab_node(void)
 		zonelist = &NODE_DATA(node)->node_zonelists[ZONELIST_FALLBACK];
 		z = first_zones_zonelist(zonelist, highest_zoneidx,
 							&policy->v.nodes);
-		return z->zone ? z->zone->node : node;
+		return z->zone ? zone_to_nid(z->zone) : node;
 	}
 
 	default:
@@ -2326,7 +2326,7 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
 				node_zonelist(numa_node_id(), GFP_HIGHUSER),
 				gfp_zone(GFP_HIGHUSER),
 				&pol->v.nodes);
-		polnid = z->zone->node;
+		polnid = zone_to_nid(z->zone);
 		break;
 
 	default:
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 5b72266b4b03..6838a530789b 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -53,13 +53,8 @@ void __init mminit_verify_zonelist(void)
 				zone->name);
 
 			/* Iterate the zonelist */
-			for_each_zone_zonelist(zone, z, zonelist, zoneid) {
-#ifdef CONFIG_NUMA
-				pr_cont("%d:%s ", zone->node, zone->name);
-#else
-				pr_cont("0:%s ", zone->name);
-#endif /* CONFIG_NUMA */
-			}
+			for_each_zone_zonelist(zone, z, zonelist, zoneid)
+				pr_cont("%d:%s ", zone_to_nid(zone), zone->name);
 			pr_cont("\n");
 		}
 	}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8a73305f7c55..10b754fba5fa 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2909,10 +2909,10 @@ static inline void zone_statistics(struct zone *preferred_zone, struct zone *z)
 	if (!static_branch_likely(&vm_numa_stat_key))
 		return;
 
-	if (z->node != numa_node_id())
+	if (zone_to_nid(z) != numa_node_id())
 		local_stat = NUMA_OTHER;
 
-	if (z->node == preferred_zone->node)
+	if (zone_to_nid(z) == zone_to_nid(preferred_zone))
 		__inc_numa_state(z, NUMA_HIT);
 	else {
 		__inc_numa_state(z, NUMA_MISS);
@@ -5287,7 +5287,7 @@ int local_memory_node(int node)
 	z = first_zones_zonelist(node_zonelist(node, GFP_KERNEL),
 				   gfp_zone(GFP_KERNEL),
 				   NULL);
-	return z->zone->node;
+	return zone_to_nid(z->zone);
 }
 #endif
 
@@ -6311,9 +6311,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
 		 * And all highmem pages will be managed by the buddy system.
 		 */
 		zone->managed_pages = freesize;
-#ifdef CONFIG_NUMA
-		zone->node = nid;
-#endif
+		zone_set_nid(zone, nid);
 		zone->name = zone_names[j];
 		zone->zone_pgdat = pgdat;
 		spin_lock_init(&zone->lock);
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 3/5] mm/page_alloc: Inline function to handle CONFIG_DEFERRED_STRUCT_PAGE_INIT
  2018-07-25 22:01 [PATCH v3 0/5] Refactor free_area_init_core and add free_area_init_core_hotplug osalvador
  2018-07-25 22:01 ` [PATCH v3 1/5] mm/page_alloc: Move ifdefery out of free_area_init_core osalvador
  2018-07-25 22:01 ` [PATCH v3 2/5] mm: access zone->node via zone_to_nid() and zone_set_nid() osalvador
@ 2018-07-25 22:01 ` osalvador
  2018-07-26 15:17   ` Pavel Tatashin
  2018-07-25 22:01 ` [PATCH v3 4/5] mm/page_alloc: Move initialization of node and zones to an own function osalvador
  2018-07-25 22:01 ` [PATCH v3 5/5] mm/page_alloc: Introduce memhotplug version of free_area_init_core osalvador
  4 siblings, 1 reply; 21+ messages in thread
From: osalvador @ 2018-07-25 22:01 UTC (permalink / raw)
  To: akpm
  Cc: mhocko, vbabka, pasha.tatashin, mgorman, aaron.lu,
	iamjoonsoo.kim, linux-mm, linux-kernel, dan.j.williams,
	Oscar Salvador

From: Oscar Salvador <osalvador@suse.de>

Let us move the code between CONFIG_DEFERRED_STRUCT_PAGE_INIT
to an inline function.
Not having an ifdef in the function makes the code more readable.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
---
 mm/page_alloc.c | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 10b754fba5fa..4e84a17a5030 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6376,6 +6376,21 @@ static void __ref alloc_node_mem_map(struct pglist_data *pgdat)
 static void __ref alloc_node_mem_map(struct pglist_data *pgdat) { }
 #endif /* CONFIG_FLAT_NODE_MEM_MAP */
 
+#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
+static inline void pgdat_set_deferred_range(pg_data_t *pgdat)
+{
+	/*
+	 * We start only with one section of pages, more pages are added as
+	 * needed until the rest of deferred pages are initialized.
+	 */
+	pgdat->static_init_pgcnt = min_t(unsigned long, PAGES_PER_SECTION,
+						pgdat->node_spanned_pages);
+	pgdat->first_deferred_pfn = ULONG_MAX;
+}
+#else
+static inline void pgdat_set_deferred_range(pg_data_t *pgdat) {}
+#endif
+
 void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
 		unsigned long node_start_pfn, unsigned long *zholes_size)
 {
@@ -6401,16 +6416,8 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
 				  zones_size, zholes_size);
 
 	alloc_node_mem_map(pgdat);
+	pgdat_set_deferred_range(pgdat);
 
-#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
-	/*
-	 * We start only with one section of pages, more pages are added as
-	 * needed until the rest of deferred pages are initialized.
-	 */
-	pgdat->static_init_pgcnt = min_t(unsigned long, PAGES_PER_SECTION,
-					 pgdat->node_spanned_pages);
-	pgdat->first_deferred_pfn = ULONG_MAX;
-#endif
 	free_area_init_core(pgdat);
 }
 
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 4/5] mm/page_alloc: Move initialization of node and zones to an own function
  2018-07-25 22:01 [PATCH v3 0/5] Refactor free_area_init_core and add free_area_init_core_hotplug osalvador
                   ` (2 preceding siblings ...)
  2018-07-25 22:01 ` [PATCH v3 3/5] mm/page_alloc: Inline function to handle CONFIG_DEFERRED_STRUCT_PAGE_INIT osalvador
@ 2018-07-25 22:01 ` osalvador
  2018-07-26  8:12   ` Michal Hocko
  2018-07-25 22:01 ` [PATCH v3 5/5] mm/page_alloc: Introduce memhotplug version of free_area_init_core osalvador
  4 siblings, 1 reply; 21+ messages in thread
From: osalvador @ 2018-07-25 22:01 UTC (permalink / raw)
  To: akpm
  Cc: mhocko, vbabka, pasha.tatashin, mgorman, aaron.lu,
	iamjoonsoo.kim, linux-mm, linux-kernel, dan.j.williams,
	Oscar Salvador

From: Oscar Salvador <osalvador@suse.de>

Currently, whenever a new node is created/re-used from the memhotplug path,
we call free_area_init_node()->free_area_init_core().
But there is some code that we do not really need to run when we are coming
from such path.

free_area_init_core() performs the following actions:

1) Initializes pgdat internals, such as spinlock, waitqueues and more.
2) Account # nr_all_pages and nr_kernel_pages. These values are used later on
   when creating hash tables.
3) Account number of managed_pages per zone, substracting dma_reserved and memmap pages.
4) Initializes some fields of the zone structure data
5) Calls init_currently_empty_zone to initialize all the freelists
6) Calls memmap_init to initialize all pages belonging to certain zone

When called from memhotplug path, free_area_init_core() only performs actions #1 and #4.

Action #2 is pointless as the zones do not have any pages since either the node was freed,
or we are re-using it, eitherway all zones belonging to this node should have 0 pages.
For the same reason, action #3 results always in manages_pages being 0.

Action #5 and #6 are performed later on when onlining the pages:
 online_pages()->move_pfn_range_to_zone()->init_currently_empty_zone()
 online_pages()->move_pfn_range_to_zone()->memmap_init_zone()

This patch moves the node/zone initializtion to their own function, so it allows us
to create a small version of free_area_init_core(next patch), where we only perform:

1) Initialization of pgdat internals, such as spinlock, waitqueues and more
4) Initialization of some fields of the zone structure data

This patch only introduces these two functions.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 mm/page_alloc.c | 50 ++++++++++++++++++++++++++++++--------------------
 1 file changed, 30 insertions(+), 20 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4e84a17a5030..a455dc85da19 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6237,21 +6237,9 @@ static void pgdat_init_kcompactd(struct pglist_data *pgdat)
 static void pgdat_init_kcompactd(struct pglist_data *pgdat) {}
 #endif
 
-/*
- * Set up the zone data structures:
- *   - mark all pages reserved
- *   - mark all memory queues empty
- *   - clear the memory bitmaps
- *
- * NOTE: pgdat should get zeroed by caller.
- */
-static void __paginginit free_area_init_core(struct pglist_data *pgdat)
+static void __paginginit pgdat_init_internals(struct pglist_data *pgdat)
 {
-	enum zone_type j;
-	int nid = pgdat->node_id;
-
 	pgdat_resize_init(pgdat);
-
 	pgdat_init_numabalancing(pgdat);
 	pgdat_init_split_queue(pgdat);
 	pgdat_init_kcompactd(pgdat);
@@ -6262,7 +6250,35 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
 	pgdat_page_ext_init(pgdat);
 	spin_lock_init(&pgdat->lru_lock);
 	lruvec_init(node_lruvec(pgdat));
+}
+
+static void zone_init_internals(struct zone *zone, enum zone_type idx, int nid,
+							unsigned long remaining_pages)
+{
+	zone->managed_pages = remaining_pages;
+	zone_set_nid(zone, nid);
+	zone->name = zone_names[idx];
+	zone->zone_pgdat = NODE_DATA(nid);
+	spin_lock_init(&zone->lock);
+	zone_seqlock_init(zone);
+	zone_pcp_init(zone);
+}
+
+/*
+ * Set up the zone data structures:
+ *   - mark all pages reserved
+ *   - mark all memory queues empty
+ *   - clear the memory bitmaps
+ *
+ * NOTE: pgdat should get zeroed by caller.
+ * NOTE: this function is only called during early init.
+ */
+static void __paginginit free_area_init_core(struct pglist_data *pgdat)
+{
+	enum zone_type j;
+	int nid = pgdat->node_id;
 
+	pgdat_init_internals(pgdat);
 	pgdat->per_cpu_nodestats = &boot_nodestats;
 
 	for (j = 0; j < MAX_NR_ZONES; j++) {
@@ -6310,13 +6326,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
 		 * when the bootmem allocator frees pages into the buddy system.
 		 * And all highmem pages will be managed by the buddy system.
 		 */
-		zone->managed_pages = freesize;
-		zone_set_nid(zone, nid);
-		zone->name = zone_names[j];
-		zone->zone_pgdat = pgdat;
-		spin_lock_init(&zone->lock);
-		zone_seqlock_init(zone);
-		zone_pcp_init(zone);
+		zone_init_internals(zone, j, nid, freesize);
 
 		if (!size)
 			continue;
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 5/5] mm/page_alloc: Introduce memhotplug version of free_area_init_core
  2018-07-25 22:01 [PATCH v3 0/5] Refactor free_area_init_core and add free_area_init_core_hotplug osalvador
                   ` (3 preceding siblings ...)
  2018-07-25 22:01 ` [PATCH v3 4/5] mm/page_alloc: Move initialization of node and zones to an own function osalvador
@ 2018-07-25 22:01 ` osalvador
  2018-07-26  7:38   ` Oscar Salvador
  2018-07-26 12:02   ` Oscar Salvador
  4 siblings, 2 replies; 21+ messages in thread
From: osalvador @ 2018-07-25 22:01 UTC (permalink / raw)
  To: akpm
  Cc: mhocko, vbabka, pasha.tatashin, mgorman, aaron.lu,
	iamjoonsoo.kim, linux-mm, linux-kernel, dan.j.williams,
	Oscar Salvador

From: Oscar Salvador <osalvador@suse.de>

Currently, we call free_area_init_node() from the memhotplug path.
In there, we set some pgdat's fields, and call calculate_node_totalpages().
calculate_node_totalpages() calculates the # of pages the node has.

Since the node is either new, or we are re-using it, the zones belonging to
this node should not have any pages, so there is no point to calculate this now.

Actually, we set the values to 0 later on with the calls to:

reset_node_managed_pages()
reset_node_present_pages()

The # of pages per zone and the # of pages per zone will be calculated later on
when onlining the pages:

online_pages()->move_pfn_range()->move_pfn_range_to_zone()->resize_zone_range()
online_pages()->move_pfn_range()->move_pfn_range_to_zone()->resize_pgdat_range()

This introduces the memhotplug version of free_area_init_core and makes
hotadd_new_pgdat use it.

This function will only be called from memhotplug path:

hotadd_new_pgdat()->free_area_init_core_hotplug().

free_area_init_core_hotplug() performs only a subset of the actions that
free_area_init_core_hotplug() does.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 include/linux/mm.h  |  1 +
 mm/memory_hotplug.c | 23 +++++++++--------------
 mm/page_alloc.c     | 19 +++++++++++++++++++
 3 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 6954ad183159..20430becd908 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2001,6 +2001,7 @@ extern void free_area_init(unsigned long * zones_size);
 extern void free_area_init_node(int nid, unsigned long * zones_size,
 		unsigned long zone_start_pfn, unsigned long *zholes_size);
 extern void free_initmem(void);
+extern void free_area_init_core_hotplug(int nid);
 
 /*
  * Free reserved pages within range [PAGE_ALIGN(start), end & PAGE_MASK)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 4eb6e824a80c..bef8a3f7a760 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -978,12 +978,11 @@ static void reset_node_present_pages(pg_data_t *pgdat)
 	pgdat->node_present_pages = 0;
 }
 
+
 /* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */
 static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start)
 {
 	struct pglist_data *pgdat;
-	unsigned long zones_size[MAX_NR_ZONES] = {0};
-	unsigned long zholes_size[MAX_NR_ZONES] = {0};
 	unsigned long start_pfn = PFN_DOWN(start);
 
 	pgdat = NODE_DATA(nid);
@@ -1006,8 +1005,11 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start)
 
 	/* we can use NODE_DATA(nid) from here */
 
+	pgdat->node_id = nid;
+	pgdat->node_start_pfn = start_pfn;
+
 	/* init node's zones as empty zones, we don't have any present pages.*/
-	free_area_init_node(nid, zones_size, start_pfn, zholes_size);
+	free_area_init_core_hotplug(nid);
 	pgdat->per_cpu_nodestats = alloc_percpu(struct per_cpu_nodestat);
 
 	/*
@@ -1017,18 +1019,11 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start)
 	build_all_zonelists(pgdat);
 
 	/*
-	 * zone->managed_pages is set to an approximate value in
-	 * free_area_init_core(), which will cause
-	 * /sys/device/system/node/nodeX/meminfo has wrong data.
-	 * So reset it to 0 before any memory is onlined.
-	 */
+         * When memory is hot-added, all the memory is in offline state. So
+         * clear all zones' present_pages because they will be updated in
+         * online_pages() and offline_pages().
+         */
 	reset_node_managed_pages(pgdat);
-
-	/*
-	 * When memory is hot-added, all the memory is in offline state. So
-	 * clear all zones' present_pages because they will be updated in
-	 * online_pages() and offline_pages().
-	 */
 	reset_node_present_pages(pgdat);
 
 	return pgdat;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a455dc85da19..a36b4db26b50 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6265,6 +6265,25 @@ static void zone_init_internals(struct zone *zone, enum zone_type idx, int nid,
 }
 
 /*
+ * Set up the zone data structures
+ * - init pgdat internals
+ * - init all zones belonging to this node
+ *
+ * NOTE: this function is only called during memory hotplug
+ */
+void __paginginit free_area_init_core_hotplug(int nid)
+{
+	enum zone_type j;
+	pg_data_t *pgdat = NODE_DATA(nid);
+
+	pgdat_init_internals(pgdat);
+	for(j = 0; j < MAX_NR_ZONES; j++) {
+		struct zone *zone = pgdat->node_zones + j;
+		zone_init_internals(zone, j, nid, 0);
+	}
+}
+
+/*
  * Set up the zone data structures:
  *   - mark all pages reserved
  *   - mark all memory queues empty
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 5/5] mm/page_alloc: Introduce memhotplug version of free_area_init_core
  2018-07-25 22:01 ` [PATCH v3 5/5] mm/page_alloc: Introduce memhotplug version of free_area_init_core osalvador
@ 2018-07-26  7:38   ` Oscar Salvador
  2018-07-26 12:02   ` Oscar Salvador
  1 sibling, 0 replies; 21+ messages in thread
From: Oscar Salvador @ 2018-07-26  7:38 UTC (permalink / raw)
  To: akpm
  Cc: mhocko, vbabka, pasha.tatashin, mgorman, aaron.lu,
	iamjoonsoo.kim, linux-mm, linux-kernel, dan.j.williams,
	Oscar Salvador

On Thu, Jul 26, 2018 at 12:01:44AM +0200, osalvador@techadventures.net wrote:
>  extern void free_initmem(void);
> +extern void free_area_init_core_hotplug(int nid);

The declaration should be wrapped with an #ifdef CONFIG_MEMORY_HOTPLUG.

> +void __paginginit free_area_init_core_hotplug(int nid)
> +{
> +	enum zone_type j;
> +	pg_data_t *pgdat = NODE_DATA(nid);
> +
> +	pgdat_init_internals(pgdat);
> +	for(j = 0; j < MAX_NR_ZONES; j++) {
> +		struct zone *zone = pgdat->node_zones + j;
> +		zone_init_internals(zone, j, nid, 0);
> +	}
> +}

The same applies here

-- 
Oscar Salvador
SUSE L3

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 2/5] mm: access zone->node via zone_to_nid() and zone_set_nid()
  2018-07-25 22:01 ` [PATCH v3 2/5] mm: access zone->node via zone_to_nid() and zone_set_nid() osalvador
@ 2018-07-26  8:05   ` Michal Hocko
  2018-07-26  8:12     ` Oscar Salvador
  0 siblings, 1 reply; 21+ messages in thread
From: Michal Hocko @ 2018-07-26  8:05 UTC (permalink / raw)
  To: osalvador
  Cc: akpm, vbabka, pasha.tatashin, mgorman, aaron.lu, iamjoonsoo.kim,
	linux-mm, linux-kernel, dan.j.williams, Oscar Salvador

On Thu 26-07-18 00:01:41, osalvador@techadventures.net wrote:
> From: Pavel Tatashin <pasha.tatashin@oracle.com>
> 
> zone->node is configured only when CONFIG_NUMA=y, so it is a good idea to
> have inline functions to access this field in order to avoid ifdef's in
> c files.
> 
> Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> Reviewed-by: Oscar Salvador <osalvador@suse.de>

My previous [1] question is not addressed in the changelog but I will
not insist. If there is any reason to resubmit this then please
consider.

Acked-by: Michal Hocko <mhocko@suse.com>

[1] http://lkml.kernel.org/r/20180719134018.GB7193@dhcp22.suse.cz

> ---
>  include/linux/mm.h     |  9 ---------
>  include/linux/mmzone.h | 26 ++++++++++++++++++++------
>  mm/mempolicy.c         |  4 ++--
>  mm/mm_init.c           |  9 ++-------
>  mm/page_alloc.c        | 10 ++++------
>  5 files changed, 28 insertions(+), 30 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 726e71475144..6954ad183159 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -940,15 +940,6 @@ static inline int page_zone_id(struct page *page)
>  	return (page->flags >> ZONEID_PGSHIFT) & ZONEID_MASK;
>  }
>  
> -static inline int zone_to_nid(struct zone *zone)
> -{
> -#ifdef CONFIG_NUMA
> -	return zone->node;
> -#else
> -	return 0;
> -#endif
> -}
> -
>  #ifdef NODE_NOT_IN_PAGE_FLAGS
>  extern int page_to_nid(const struct page *page);
>  #else
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index ae1a034c3e2c..17fdff3bfb41 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -842,6 +842,25 @@ static inline bool populated_zone(struct zone *zone)
>  	return zone->present_pages;
>  }
>  
> +#ifdef CONFIG_NUMA
> +static inline int zone_to_nid(struct zone *zone)
> +{
> +	return zone->node;
> +}
> +
> +static inline void zone_set_nid(struct zone *zone, int nid)
> +{
> +	zone->node = nid;
> +}
> +#else
> +static inline int zone_to_nid(struct zone *zone)
> +{
> +	return 0;
> +}
> +
> +static inline void zone_set_nid(struct zone *zone, int nid) {}
> +#endif
> +
>  extern int movable_zone;
>  
>  #ifdef CONFIG_HIGHMEM
> @@ -957,12 +976,7 @@ static inline int zonelist_zone_idx(struct zoneref *zoneref)
>  
>  static inline int zonelist_node_idx(struct zoneref *zoneref)
>  {
> -#ifdef CONFIG_NUMA
> -	/* zone_to_nid not available in this context */
> -	return zoneref->zone->node;
> -#else
> -	return 0;
> -#endif /* CONFIG_NUMA */
> +	return zone_to_nid(zoneref->zone);
>  }
>  
>  struct zoneref *__next_zones_zonelist(struct zoneref *z,
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index f0fcf70bcec7..8c1c09b3852a 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -1784,7 +1784,7 @@ unsigned int mempolicy_slab_node(void)
>  		zonelist = &NODE_DATA(node)->node_zonelists[ZONELIST_FALLBACK];
>  		z = first_zones_zonelist(zonelist, highest_zoneidx,
>  							&policy->v.nodes);
> -		return z->zone ? z->zone->node : node;
> +		return z->zone ? zone_to_nid(z->zone) : node;
>  	}
>  
>  	default:
> @@ -2326,7 +2326,7 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
>  				node_zonelist(numa_node_id(), GFP_HIGHUSER),
>  				gfp_zone(GFP_HIGHUSER),
>  				&pol->v.nodes);
> -		polnid = z->zone->node;
> +		polnid = zone_to_nid(z->zone);
>  		break;
>  
>  	default:
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index 5b72266b4b03..6838a530789b 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -53,13 +53,8 @@ void __init mminit_verify_zonelist(void)
>  				zone->name);
>  
>  			/* Iterate the zonelist */
> -			for_each_zone_zonelist(zone, z, zonelist, zoneid) {
> -#ifdef CONFIG_NUMA
> -				pr_cont("%d:%s ", zone->node, zone->name);
> -#else
> -				pr_cont("0:%s ", zone->name);
> -#endif /* CONFIG_NUMA */
> -			}
> +			for_each_zone_zonelist(zone, z, zonelist, zoneid)
> +				pr_cont("%d:%s ", zone_to_nid(zone), zone->name);
>  			pr_cont("\n");
>  		}
>  	}
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 8a73305f7c55..10b754fba5fa 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2909,10 +2909,10 @@ static inline void zone_statistics(struct zone *preferred_zone, struct zone *z)
>  	if (!static_branch_likely(&vm_numa_stat_key))
>  		return;
>  
> -	if (z->node != numa_node_id())
> +	if (zone_to_nid(z) != numa_node_id())
>  		local_stat = NUMA_OTHER;
>  
> -	if (z->node == preferred_zone->node)
> +	if (zone_to_nid(z) == zone_to_nid(preferred_zone))
>  		__inc_numa_state(z, NUMA_HIT);
>  	else {
>  		__inc_numa_state(z, NUMA_MISS);
> @@ -5287,7 +5287,7 @@ int local_memory_node(int node)
>  	z = first_zones_zonelist(node_zonelist(node, GFP_KERNEL),
>  				   gfp_zone(GFP_KERNEL),
>  				   NULL);
> -	return z->zone->node;
> +	return zone_to_nid(z->zone);
>  }
>  #endif
>  
> @@ -6311,9 +6311,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
>  		 * And all highmem pages will be managed by the buddy system.
>  		 */
>  		zone->managed_pages = freesize;
> -#ifdef CONFIG_NUMA
> -		zone->node = nid;
> -#endif
> +		zone_set_nid(zone, nid);
>  		zone->name = zone_names[j];
>  		zone->zone_pgdat = pgdat;
>  		spin_lock_init(&zone->lock);
> -- 
> 2.13.6
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 4/5] mm/page_alloc: Move initialization of node and zones to an own function
  2018-07-25 22:01 ` [PATCH v3 4/5] mm/page_alloc: Move initialization of node and zones to an own function osalvador
@ 2018-07-26  8:12   ` Michal Hocko
  2018-07-26 15:35     ` Pavel Tatashin
  0 siblings, 1 reply; 21+ messages in thread
From: Michal Hocko @ 2018-07-26  8:12 UTC (permalink / raw)
  To: osalvador
  Cc: akpm, vbabka, pasha.tatashin, mgorman, aaron.lu, iamjoonsoo.kim,
	linux-mm, linux-kernel, dan.j.williams, Oscar Salvador

On Thu 26-07-18 00:01:43, osalvador@techadventures.net wrote:
> From: Oscar Salvador <osalvador@suse.de>
> 
> Currently, whenever a new node is created/re-used from the memhotplug path,
> we call free_area_init_node()->free_area_init_core().
> But there is some code that we do not really need to run when we are coming
> from such path.
> 
> free_area_init_core() performs the following actions:
> 
> 1) Initializes pgdat internals, such as spinlock, waitqueues and more.
> 2) Account # nr_all_pages and nr_kernel_pages. These values are used later on
>    when creating hash tables.
> 3) Account number of managed_pages per zone, substracting dma_reserved and memmap pages.
> 4) Initializes some fields of the zone structure data
> 5) Calls init_currently_empty_zone to initialize all the freelists
> 6) Calls memmap_init to initialize all pages belonging to certain zone
> 
> When called from memhotplug path, free_area_init_core() only performs actions #1 and #4.
> 
> Action #2 is pointless as the zones do not have any pages since either the node was freed,
> or we are re-using it, eitherway all zones belonging to this node should have 0 pages.
> For the same reason, action #3 results always in manages_pages being 0.
> 
> Action #5 and #6 are performed later on when onlining the pages:
>  online_pages()->move_pfn_range_to_zone()->init_currently_empty_zone()
>  online_pages()->move_pfn_range_to_zone()->memmap_init_zone()
> 
> This patch moves the node/zone initializtion to their own function, so it allows us
> to create a small version of free_area_init_core(next patch), where we only perform:
> 
> 1) Initialization of pgdat internals, such as spinlock, waitqueues and more
> 4) Initialization of some fields of the zone structure data
> 
> This patch only introduces these two functions.

OK, this looks definitely better. I will have to check that all the
required state is initialized properly. Considering the above
explanation I would simply fold the follow up patch into this one. It is
not so large it would get hard to review and you would make it clear why
the work is done.

> +/*
> + * Set up the zone data structures:
> + *   - mark all pages reserved
> + *   - mark all memory queues empty
> + *   - clear the memory bitmaps
> + *
> + * NOTE: pgdat should get zeroed by caller.
> + * NOTE: this function is only called during early init.
> + */
> +static void __paginginit free_area_init_core(struct pglist_data *pgdat)

now that this function is called only from the early init code we can
make it s@__paginginit@__init@ AFAICS.

> +{
> +	enum zone_type j;
> +	int nid = pgdat->node_id;
>  
> +	pgdat_init_internals(pgdat);
>  	pgdat->per_cpu_nodestats = &boot_nodestats;
>  
>  	for (j = 0; j < MAX_NR_ZONES; j++) {
> @@ -6310,13 +6326,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
>  		 * when the bootmem allocator frees pages into the buddy system.
>  		 * And all highmem pages will be managed by the buddy system.
>  		 */
> -		zone->managed_pages = freesize;
> -		zone_set_nid(zone, nid);
> -		zone->name = zone_names[j];
> -		zone->zone_pgdat = pgdat;
> -		spin_lock_init(&zone->lock);
> -		zone_seqlock_init(zone);
> -		zone_pcp_init(zone);
> +		zone_init_internals(zone, j, nid, freesize);
>  
>  		if (!size)
>  			continue;
> -- 
> 2.13.6
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 2/5] mm: access zone->node via zone_to_nid() and zone_set_nid()
  2018-07-26  8:05   ` Michal Hocko
@ 2018-07-26  8:12     ` Oscar Salvador
  2018-07-26 15:14         ` Pavel Tatashin
  0 siblings, 1 reply; 21+ messages in thread
From: Oscar Salvador @ 2018-07-26  8:12 UTC (permalink / raw)
  To: Michal Hocko
  Cc: akpm, vbabka, pasha.tatashin, mgorman, aaron.lu, iamjoonsoo.kim,
	linux-mm, linux-kernel, dan.j.williams, Oscar Salvador

On Thu, Jul 26, 2018 at 10:05:00AM +0200, Michal Hocko wrote:
> On Thu 26-07-18 00:01:41, osalvador@techadventures.net wrote:
> > From: Pavel Tatashin <pasha.tatashin@oracle.com>
> > 
> > zone->node is configured only when CONFIG_NUMA=y, so it is a good idea to
> > have inline functions to access this field in order to avoid ifdef's in
> > c files.
> > 
> > Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
> > Signed-off-by: Oscar Salvador <osalvador@suse.de>
> > Reviewed-by: Oscar Salvador <osalvador@suse.de>
> 
> My previous [1] question is not addressed in the changelog but I will
> not insist. If there is any reason to resubmit this then please
> consider.

Oh, sorry, I missed that.
If I resubmit a new version, I can include the information about
opengrok, although it would be better if Pavel comments on it,
as I have no clue about the software.

If not, maybe Andrew can grab it?

Thanks
-- 
Oscar Salvador
SUSE L3

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 5/5] mm/page_alloc: Introduce memhotplug version of free_area_init_core
  2018-07-25 22:01 ` [PATCH v3 5/5] mm/page_alloc: Introduce memhotplug version of free_area_init_core osalvador
  2018-07-26  7:38   ` Oscar Salvador
@ 2018-07-26 12:02   ` Oscar Salvador
  1 sibling, 0 replies; 21+ messages in thread
From: Oscar Salvador @ 2018-07-26 12:02 UTC (permalink / raw)
  To: akpm
  Cc: mhocko, vbabka, pasha.tatashin, mgorman, aaron.lu,
	iamjoonsoo.kim, linux-mm, linux-kernel, dan.j.williams,
	Oscar Salvador

On Thu, Jul 26, 2018 at 12:01:44AM +0200, osalvador@techadventures.net wrote:
> -	 */
> +         * When memory is hot-added, all the memory is in offline state. So
> +         * clear all zones' present_pages because they will be updated in
> +         * online_pages() and offline_pages().
> +         */

Sigh..., I should have run checkpatch. Tabs are missing there


> +void __paginginit free_area_init_core_hotplug(int nid)
> +{
> +	enum zone_type j;
> +	pg_data_t *pgdat = NODE_DATA(nid);
> +
> +	pgdat_init_internals(pgdat);
> +	for(j = 0; j < MAX_NR_ZONES; j++) {

And missing a space here.

Sorry, I will fix all this up in the next re-submission once I got feedback. 

Thanks
-- 
Oscar Salvador
SUSE L3

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 2/5] mm: access zone->node via zone_to_nid() and zone_set_nid()
  2018-07-26  8:12     ` Oscar Salvador
@ 2018-07-26 15:14         ` Pavel Tatashin
  0 siblings, 0 replies; 21+ messages in thread
From: Pavel Tatashin @ 2018-07-26 15:14 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: Michal Hocko, akpm, vbabka, mgorman, aaron.lu, iamjoonsoo.kim,
	linux-mm, linux-kernel, dan.j.williams, Oscar Salvador

Hi Oscar,

Below is updated patch, with comment about OpenGrok and Acked-by Michal added.

Thank you,
Pavel

From cca1b083d78d0ff99cce6dfaf12f6380d76390c7 Mon Sep 17 00:00:00 2001
From: Pavel Tatashin <pasha.tatashin@oracle.com>
Date: Thu, 26 Jul 2018 00:01:41 +0200
Subject: [PATCH] mm: access zone->node via zone_to_nid() and zone_set_nid()

zone->node is configured only when CONFIG_NUMA=y, so it is a good idea to
have inline functions to access this field in order to avoid ifdef's in
c files.

OpenGrok was used to find places where zone->node is accessed. A public one
is available here: http://src.illumos.org/source/

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
---
 include/linux/mm.h     |  9 ---------
 include/linux/mmzone.h | 26 ++++++++++++++++++++------
 mm/mempolicy.c         |  4 ++--
 mm/mm_init.c           |  9 ++-------
 mm/page_alloc.c        | 10 ++++------
 5 files changed, 28 insertions(+), 30 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index fc14a4bec36d..446fc3e6a85b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -942,15 +942,6 @@ static inline int page_zone_id(struct page *page)
 	return (page->flags >> ZONEID_PGSHIFT) & ZONEID_MASK;
 }
 
-static inline int zone_to_nid(struct zone *zone)
-{
-#ifdef CONFIG_NUMA
-	return zone->node;
-#else
-	return 0;
-#endif
-}
-
 #ifdef NODE_NOT_IN_PAGE_FLAGS
 extern int page_to_nid(const struct page *page);
 #else
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 83b1d11e90eb..805b990b27ab 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -841,6 +841,25 @@ static inline bool populated_zone(struct zone *zone)
 	return zone->present_pages;
 }
 
+#ifdef CONFIG_NUMA
+static inline int zone_to_nid(struct zone *zone)
+{
+	return zone->node;
+}
+
+static inline void zone_set_nid(struct zone *zone, int nid)
+{
+	zone->node = nid;
+}
+#else
+static inline int zone_to_nid(struct zone *zone)
+{
+	return 0;
+}
+
+static inline void zone_set_nid(struct zone *zone, int nid) {}
+#endif
+
 extern int movable_zone;
 
 #ifdef CONFIG_HIGHMEM
@@ -956,12 +975,7 @@ static inline int zonelist_zone_idx(struct zoneref *zoneref)
 
 static inline int zonelist_node_idx(struct zoneref *zoneref)
 {
-#ifdef CONFIG_NUMA
-	/* zone_to_nid not available in this context */
-	return zoneref->zone->node;
-#else
-	return 0;
-#endif /* CONFIG_NUMA */
+	return zone_to_nid(zoneref->zone);
 }
 
 struct zoneref *__next_zones_zonelist(struct zoneref *z,
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index f0fcf70bcec7..8c1c09b3852a 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1784,7 +1784,7 @@ unsigned int mempolicy_slab_node(void)
 		zonelist = &NODE_DATA(node)->node_zonelists[ZONELIST_FALLBACK];
 		z = first_zones_zonelist(zonelist, highest_zoneidx,
 							&policy->v.nodes);
-		return z->zone ? z->zone->node : node;
+		return z->zone ? zone_to_nid(z->zone) : node;
 	}
 
 	default:
@@ -2326,7 +2326,7 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
 				node_zonelist(numa_node_id(), GFP_HIGHUSER),
 				gfp_zone(GFP_HIGHUSER),
 				&pol->v.nodes);
-		polnid = z->zone->node;
+		polnid = zone_to_nid(z->zone);
 		break;
 
 	default:
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 5b72266b4b03..6838a530789b 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -53,13 +53,8 @@ void __init mminit_verify_zonelist(void)
 				zone->name);
 
 			/* Iterate the zonelist */
-			for_each_zone_zonelist(zone, z, zonelist, zoneid) {
-#ifdef CONFIG_NUMA
-				pr_cont("%d:%s ", zone->node, zone->name);
-#else
-				pr_cont("0:%s ", zone->name);
-#endif /* CONFIG_NUMA */
-			}
+			for_each_zone_zonelist(zone, z, zonelist, zoneid)
+				pr_cont("%d:%s ", zone_to_nid(zone), zone->name);
 			pr_cont("\n");
 		}
 	}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2dec8056a091..1e6c377ef7d9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2908,10 +2908,10 @@ static inline void zone_statistics(struct zone *preferred_zone, struct zone *z)
 	if (!static_branch_likely(&vm_numa_stat_key))
 		return;
 
-	if (z->node != numa_node_id())
+	if (zone_to_nid(z) != numa_node_id())
 		local_stat = NUMA_OTHER;
 
-	if (z->node == preferred_zone->node)
+	if (zone_to_nid(z) == zone_to_nid(preferred_zone))
 		__inc_numa_state(z, NUMA_HIT);
 	else {
 		__inc_numa_state(z, NUMA_MISS);
@@ -5277,7 +5277,7 @@ int local_memory_node(int node)
 	z = first_zones_zonelist(node_zonelist(node, GFP_KERNEL),
 				   gfp_zone(GFP_KERNEL),
 				   NULL);
-	return z->zone->node;
+	return zone_to_nid(z->zone);
 }
 #endif
 
@@ -6277,9 +6277,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
 		 * And all highmem pages will be managed by the buddy system.
 		 */
 		zone->managed_pages = freesize;
-#ifdef CONFIG_NUMA
-		zone->node = nid;
-#endif
+		zone_set_nid(zone, nid);
 		zone->name = zone_names[j];
 		zone->zone_pgdat = pgdat;
 		spin_lock_init(&zone->lock);
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 2/5] mm: access zone->node via zone_to_nid() and zone_set_nid()
@ 2018-07-26 15:14         ` Pavel Tatashin
  0 siblings, 0 replies; 21+ messages in thread
From: Pavel Tatashin @ 2018-07-26 15:14 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: Michal Hocko, akpm, vbabka, mgorman, aaron.lu, iamjoonsoo.kim,
	linux-mm, linux-kernel, dan.j.williams, Oscar Salvador

Hi Oscar,

Below is updated patch, with comment about OpenGrok and Acked-by Michal added.

Thank you,
Pavel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 3/5] mm/page_alloc: Inline function to handle CONFIG_DEFERRED_STRUCT_PAGE_INIT
  2018-07-25 22:01 ` [PATCH v3 3/5] mm/page_alloc: Inline function to handle CONFIG_DEFERRED_STRUCT_PAGE_INIT osalvador
@ 2018-07-26 15:17   ` Pavel Tatashin
  0 siblings, 0 replies; 21+ messages in thread
From: Pavel Tatashin @ 2018-07-26 15:17 UTC (permalink / raw)
  To: osalvador
  Cc: akpm, mhocko, vbabka, mgorman, aaron.lu, iamjoonsoo.kim,
	linux-mm, linux-kernel, dan.j.williams, Oscar Salvador

On 18-07-26 00:01:42, osalvador@techadventures.net wrote:
> From: Oscar Salvador <osalvador@suse.de>
> 
> Let us move the code between CONFIG_DEFERRED_STRUCT_PAGE_INIT
> to an inline function.
> Not having an ifdef in the function makes the code more readable.
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> Acked-by: Michal Hocko <mhocko@suse.com>

Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 4/5] mm/page_alloc: Move initialization of node and zones to an own function
  2018-07-26  8:12   ` Michal Hocko
@ 2018-07-26 15:35     ` Pavel Tatashin
  2018-07-26 19:15       ` Oscar Salvador
  0 siblings, 1 reply; 21+ messages in thread
From: Pavel Tatashin @ 2018-07-26 15:35 UTC (permalink / raw)
  To: mhocko
  Cc: osalvador, Andrew Morton, Vlastimil Babka, mgorman, aaron.lu,
	iamjoonsoo.kim, Linux Memory Management List, LKML,
	dan.j.williams, osalvador

> OK, this looks definitely better. I will have to check that all the
> required state is initialized properly. Considering the above
> explanation I would simply fold the follow up patch into this one. It is
> not so large it would get hard to review and you would make it clear why
> the work is done.

I will review this work, once Oscar combines patches 4 & 5 as Michal suggested.


>
> > +/*
> > + * Set up the zone data structures:
> > + *   - mark all pages reserved
> > + *   - mark all memory queues empty
> > + *   - clear the memory bitmaps
> > + *
> > + * NOTE: pgdat should get zeroed by caller.
> > + * NOTE: this function is only called during early init.
> > + */
> > +static void __paginginit free_area_init_core(struct pglist_data *pgdat)
>
> now that this function is called only from the early init code we can
> make it s@__paginginit@__init@ AFAICS.

True, in patch 5. Also, zone_init_internals() should be marked as __paginginit.

Thank you,
Pavel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 2/5] mm: access zone->node via zone_to_nid() and zone_set_nid()
  2018-07-26 15:14         ` Pavel Tatashin
  (?)
@ 2018-07-26 16:43         ` Michal Hocko
  2018-07-26 17:18           ` Pavel Tatashin
  -1 siblings, 1 reply; 21+ messages in thread
From: Michal Hocko @ 2018-07-26 16:43 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: Oscar Salvador, akpm, vbabka, mgorman, aaron.lu, iamjoonsoo.kim,
	linux-mm, linux-kernel, dan.j.williams, Oscar Salvador

On Thu 26-07-18 11:14:20, Pavel Tatashin wrote:
> Hi Oscar,
> 
> Below is updated patch, with comment about OpenGrok and Acked-by Michal added.
> 
> Thank you,
> Pavel
> 
> >From cca1b083d78d0ff99cce6dfaf12f6380d76390c7 Mon Sep 17 00:00:00 2001
> From: Pavel Tatashin <pasha.tatashin@oracle.com>
> Date: Thu, 26 Jul 2018 00:01:41 +0200
> Subject: [PATCH] mm: access zone->node via zone_to_nid() and zone_set_nid()
> 
> zone->node is configured only when CONFIG_NUMA=y, so it is a good idea to
> have inline functions to access this field in order to avoid ifdef's in
> c files.
> 
> OpenGrok was used to find places where zone->node is accessed. A public one
> is available here: http://src.illumos.org/source/

I assume that tool uses some pattern matching or similar so steps to use
the tool to get your results would be more helpful. This is basically
the same thing as coccinelle generated patches.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 2/5] mm: access zone->node via zone_to_nid() and zone_set_nid()
  2018-07-26 16:43         ` Michal Hocko
@ 2018-07-26 17:18           ` Pavel Tatashin
  2018-07-26 17:52             ` Michal Hocko
  0 siblings, 1 reply; 21+ messages in thread
From: Pavel Tatashin @ 2018-07-26 17:18 UTC (permalink / raw)
  To: mhocko
  Cc: osalvador, Andrew Morton, Vlastimil Babka, mgorman, aaron.lu,
	iamjoonsoo.kim, Linux Memory Management List, LKML,
	dan.j.williams, osalvador

> > OpenGrok was used to find places where zone->node is accessed. A public one
> > is available here: http://src.illumos.org/source/
>
> I assume that tool uses some pattern matching or similar so steps to use
> the tool to get your results would be more helpful. This is basically
> the same thing as coccinelle generated patches.

OpenGrok is very easy to use, it is source browser, similar to cscope
except obviously you can't edit the browsed code. I could have used
cscope just as well here.

Pavel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 2/5] mm: access zone->node via zone_to_nid() and zone_set_nid()
  2018-07-26 17:18           ` Pavel Tatashin
@ 2018-07-26 17:52             ` Michal Hocko
  2018-07-26 17:55               ` Pavel Tatashin
  0 siblings, 1 reply; 21+ messages in thread
From: Michal Hocko @ 2018-07-26 17:52 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: osalvador, Andrew Morton, Vlastimil Babka, mgorman, aaron.lu,
	iamjoonsoo.kim, Linux Memory Management List, LKML,
	dan.j.williams, osalvador

On Thu 26-07-18 13:18:46, Pavel Tatashin wrote:
> > > OpenGrok was used to find places where zone->node is accessed. A public one
> > > is available here: http://src.illumos.org/source/
> >
> > I assume that tool uses some pattern matching or similar so steps to use
> > the tool to get your results would be more helpful. This is basically
> > the same thing as coccinelle generated patches.
> 
> OpenGrok is very easy to use, it is source browser, similar to cscope
> except obviously you can't edit the browsed code. I could have used
> cscope just as well here.

OK, then I misunderstood. I thought it was some kind of c aware grep
that found all the usage for you. If this is cscope like then it is not
worth mentioning in the changelog. 
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 2/5] mm: access zone->node via zone_to_nid() and zone_set_nid()
  2018-07-26 17:52             ` Michal Hocko
@ 2018-07-26 17:55               ` Pavel Tatashin
  2018-07-26 19:15                 ` Oscar Salvador
  0 siblings, 1 reply; 21+ messages in thread
From: Pavel Tatashin @ 2018-07-26 17:55 UTC (permalink / raw)
  To: mhocko
  Cc: osalvador, Andrew Morton, Vlastimil Babka, mgorman, aaron.lu,
	iamjoonsoo.kim, Linux Memory Management List, LKML,
	dan.j.williams, osalvador

On Thu, Jul 26, 2018 at 1:52 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Thu 26-07-18 13:18:46, Pavel Tatashin wrote:
> > > > OpenGrok was used to find places where zone->node is accessed. A public one
> > > > is available here: http://src.illumos.org/source/
> > >
> > > I assume that tool uses some pattern matching or similar so steps to use
> > > the tool to get your results would be more helpful. This is basically
> > > the same thing as coccinelle generated patches.
> >
> > OpenGrok is very easy to use, it is source browser, similar to cscope
> > except obviously you can't edit the browsed code. I could have used
> > cscope just as well here.
>
> OK, then I misunderstood. I thought it was some kind of c aware grep
> that found all the usage for you. If this is cscope like then it is not
> worth mentioning in the changelog.

That's what I thought :) Oscar, will you remove the comment about
opengrok, or should I paste a new patch?

Thank you,
Pavel

> --
> Michal Hocko
> SUSE Labs
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 4/5] mm/page_alloc: Move initialization of node and zones to an own function
  2018-07-26 15:35     ` Pavel Tatashin
@ 2018-07-26 19:15       ` Oscar Salvador
  0 siblings, 0 replies; 21+ messages in thread
From: Oscar Salvador @ 2018-07-26 19:15 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: mhocko, Andrew Morton, Vlastimil Babka, mgorman, aaron.lu,
	iamjoonsoo.kim, Linux Memory Management List, LKML,
	dan.j.williams, osalvador

On Thu, Jul 26, 2018 at 11:35:35AM -0400, Pavel Tatashin wrote:
> > OK, this looks definitely better. I will have to check that all the
> > required state is initialized properly. Considering the above
> > explanation I would simply fold the follow up patch into this one. It is
> > not so large it would get hard to review and you would make it clear why
> > the work is done.
> 
> I will review this work, once Oscar combines patches 4 & 5 as Michal suggested.

I will send a new version tomorrow with some fixups and patch4 and patch5 joined.

Thanks
-- 
Oscar Salvador
SUSE L3

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 2/5] mm: access zone->node via zone_to_nid() and zone_set_nid()
  2018-07-26 17:55               ` Pavel Tatashin
@ 2018-07-26 19:15                 ` Oscar Salvador
  0 siblings, 0 replies; 21+ messages in thread
From: Oscar Salvador @ 2018-07-26 19:15 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: mhocko, Andrew Morton, Vlastimil Babka, mgorman, aaron.lu,
	iamjoonsoo.kim, Linux Memory Management List, LKML,
	dan.j.williams, osalvador

On Thu, Jul 26, 2018 at 01:55:34PM -0400, Pavel Tatashin wrote:
> On Thu, Jul 26, 2018 at 1:52 PM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Thu 26-07-18 13:18:46, Pavel Tatashin wrote:
> > > > > OpenGrok was used to find places where zone->node is accessed. A public one
> > > > > is available here: http://src.illumos.org/source/
> > > >
> > > > I assume that tool uses some pattern matching or similar so steps to use
> > > > the tool to get your results would be more helpful. This is basically
> > > > the same thing as coccinelle generated patches.
> > >
> > > OpenGrok is very easy to use, it is source browser, similar to cscope
> > > except obviously you can't edit the browsed code. I could have used
> > > cscope just as well here.
> >
> > OK, then I misunderstood. I thought it was some kind of c aware grep
> > that found all the usage for you. If this is cscope like then it is not
> > worth mentioning in the changelog.
> 
> That's what I thought :) Oscar, will you remove the comment about
> opengrok, or should I paste a new patch?

No worries, I will remove the comment ;-).

Thanks
-- 
Oscar Salvador
SUSE L3

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2018-07-26 19:16 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-25 22:01 [PATCH v3 0/5] Refactor free_area_init_core and add free_area_init_core_hotplug osalvador
2018-07-25 22:01 ` [PATCH v3 1/5] mm/page_alloc: Move ifdefery out of free_area_init_core osalvador
2018-07-25 22:01 ` [PATCH v3 2/5] mm: access zone->node via zone_to_nid() and zone_set_nid() osalvador
2018-07-26  8:05   ` Michal Hocko
2018-07-26  8:12     ` Oscar Salvador
2018-07-26 15:14       ` Pavel Tatashin
2018-07-26 15:14         ` Pavel Tatashin
2018-07-26 16:43         ` Michal Hocko
2018-07-26 17:18           ` Pavel Tatashin
2018-07-26 17:52             ` Michal Hocko
2018-07-26 17:55               ` Pavel Tatashin
2018-07-26 19:15                 ` Oscar Salvador
2018-07-25 22:01 ` [PATCH v3 3/5] mm/page_alloc: Inline function to handle CONFIG_DEFERRED_STRUCT_PAGE_INIT osalvador
2018-07-26 15:17   ` Pavel Tatashin
2018-07-25 22:01 ` [PATCH v3 4/5] mm/page_alloc: Move initialization of node and zones to an own function osalvador
2018-07-26  8:12   ` Michal Hocko
2018-07-26 15:35     ` Pavel Tatashin
2018-07-26 19:15       ` Oscar Salvador
2018-07-25 22:01 ` [PATCH v3 5/5] mm/page_alloc: Introduce memhotplug version of free_area_init_core osalvador
2018-07-26  7:38   ` Oscar Salvador
2018-07-26 12:02   ` Oscar Salvador

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.