All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/5] Refactor free_area_init_node/free_area_init_core
@ 2018-07-19 13:27 osalvador
  2018-07-19 13:27 ` [PATCH v2 1/5] mm/page_alloc: Move ifdefery out of free_area_init_core osalvador
                   ` (4 more replies)
  0 siblings, 5 replies; 23+ messages in thread
From: osalvador @ 2018-07-19 13:27 UTC (permalink / raw)
  To: akpm
  Cc: pasha.tatashin, mhocko, vbabka, aaron.lu, iamjoonsoo.kim,
	linux-kernel, linux-mm, Oscar Salvador

From: Oscar Salvador <osalvador@suse.de>

This patchset pretends to make free_area_init_core more readable by
moving the ifdefery to inline functions, and while we are at it,
it optimizes the function a little bit (better explained in patch 3).

Oscar Salvador (4):
  mm/page_alloc: Move ifdefery out of free_area_init_core
  mm/page_alloc: Optimize free_area_init_core
  mm/page_alloc: Inline function to handle
    CONFIG_DEFERRED_STRUCT_PAGE_INIT
  mm/page_alloc: Only call pgdat_set_deferred_range when the system
    boots

Pavel Tatashin (1):
  mm: access zone->node via zone_to_nid() and zone_set_nid()

 include/linux/mm.h     |   9 ---
 include/linux/mmzone.h |  26 ++++++--
 mm/mempolicy.c         |   4 +-
 mm/mm_init.c           |   9 +--
 mm/page_alloc.c        | 159 +++++++++++++++++++++++++++++--------------------
 5 files changed, 120 insertions(+), 87 deletions(-)

-- 
2.13.6


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v2 1/5] mm/page_alloc: Move ifdefery out of free_area_init_core
  2018-07-19 13:27 [PATCH v2 0/5] Refactor free_area_init_node/free_area_init_core osalvador
@ 2018-07-19 13:27 ` osalvador
  2018-07-19 13:27 ` [PATCH v2 2/5] mm: access zone->node via zone_to_nid() and zone_set_nid() osalvador
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 23+ messages in thread
From: osalvador @ 2018-07-19 13:27 UTC (permalink / raw)
  To: akpm
  Cc: pasha.tatashin, mhocko, vbabka, aaron.lu, iamjoonsoo.kim,
	linux-kernel, linux-mm, Oscar Salvador

From: Oscar Salvador <osalvador@suse.de>

Moving the #ifdefs out of the function makes it easier to follow.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
---
 mm/page_alloc.c | 50 +++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 37 insertions(+), 13 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e357189cd24a..8a73305f7c55 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6206,6 +6206,37 @@ static unsigned long __paginginit calc_memmap_size(unsigned long spanned_pages,
 	return PAGE_ALIGN(pages * sizeof(struct page)) >> PAGE_SHIFT;
 }
 
+#ifdef CONFIG_NUMA_BALANCING
+static void pgdat_init_numabalancing(struct pglist_data *pgdat)
+{
+	spin_lock_init(&pgdat->numabalancing_migrate_lock);
+	pgdat->numabalancing_migrate_nr_pages = 0;
+	pgdat->numabalancing_migrate_next_window = jiffies;
+}
+#else
+static void pgdat_init_numabalancing(struct pglist_data *pgdat) {}
+#endif
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static void pgdat_init_split_queue(struct pglist_data *pgdat)
+{
+	spin_lock_init(&pgdat->split_queue_lock);
+	INIT_LIST_HEAD(&pgdat->split_queue);
+	pgdat->split_queue_len = 0;
+}
+#else
+static void pgdat_init_split_queue(struct pglist_data *pgdat) {}
+#endif
+
+#ifdef CONFIG_COMPACTION
+static void pgdat_init_kcompactd(struct pglist_data *pgdat)
+{
+	init_waitqueue_head(&pgdat->kcompactd_wait);
+}
+#else
+static void pgdat_init_kcompactd(struct pglist_data *pgdat) {}
+#endif
+
 /*
  * Set up the zone data structures:
  *   - mark all pages reserved
@@ -6220,21 +6251,14 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
 	int nid = pgdat->node_id;
 
 	pgdat_resize_init(pgdat);
-#ifdef CONFIG_NUMA_BALANCING
-	spin_lock_init(&pgdat->numabalancing_migrate_lock);
-	pgdat->numabalancing_migrate_nr_pages = 0;
-	pgdat->numabalancing_migrate_next_window = jiffies;
-#endif
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-	spin_lock_init(&pgdat->split_queue_lock);
-	INIT_LIST_HEAD(&pgdat->split_queue);
-	pgdat->split_queue_len = 0;
-#endif
+
+	pgdat_init_numabalancing(pgdat);
+	pgdat_init_split_queue(pgdat);
+	pgdat_init_kcompactd(pgdat);
+
 	init_waitqueue_head(&pgdat->kswapd_wait);
 	init_waitqueue_head(&pgdat->pfmemalloc_wait);
-#ifdef CONFIG_COMPACTION
-	init_waitqueue_head(&pgdat->kcompactd_wait);
-#endif
+
 	pgdat_page_ext_init(pgdat);
 	spin_lock_init(&pgdat->lru_lock);
 	lruvec_init(node_lruvec(pgdat));
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 2/5] mm: access zone->node via zone_to_nid() and zone_set_nid()
  2018-07-19 13:27 [PATCH v2 0/5] Refactor free_area_init_node/free_area_init_core osalvador
  2018-07-19 13:27 ` [PATCH v2 1/5] mm/page_alloc: Move ifdefery out of free_area_init_core osalvador
@ 2018-07-19 13:27 ` osalvador
  2018-07-19 13:40   ` Michal Hocko
  2018-07-19 13:27 ` [PATCH v2 3/5] mm/page_alloc: Optimize free_area_init_core osalvador
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 23+ messages in thread
From: osalvador @ 2018-07-19 13:27 UTC (permalink / raw)
  To: akpm
  Cc: pasha.tatashin, mhocko, vbabka, aaron.lu, iamjoonsoo.kim,
	linux-kernel, linux-mm, Oscar Salvador

From: Pavel Tatashin <pasha.tatashin@oracle.com>

zone->node is configured only when CONFIG_NUMA=y, so it is a good idea to
have inline functions to access this field in order to avoid ifdef's in
c files.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
---
 include/linux/mm.h     |  9 ---------
 include/linux/mmzone.h | 26 ++++++++++++++++++++------
 mm/mempolicy.c         |  4 ++--
 mm/mm_init.c           |  9 ++-------
 mm/page_alloc.c        | 10 ++++------
 5 files changed, 28 insertions(+), 30 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 726e71475144..6954ad183159 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -940,15 +940,6 @@ static inline int page_zone_id(struct page *page)
 	return (page->flags >> ZONEID_PGSHIFT) & ZONEID_MASK;
 }
 
-static inline int zone_to_nid(struct zone *zone)
-{
-#ifdef CONFIG_NUMA
-	return zone->node;
-#else
-	return 0;
-#endif
-}
-
 #ifdef NODE_NOT_IN_PAGE_FLAGS
 extern int page_to_nid(const struct page *page);
 #else
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index ae1a034c3e2c..17fdff3bfb41 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -842,6 +842,25 @@ static inline bool populated_zone(struct zone *zone)
 	return zone->present_pages;
 }
 
+#ifdef CONFIG_NUMA
+static inline int zone_to_nid(struct zone *zone)
+{
+	return zone->node;
+}
+
+static inline void zone_set_nid(struct zone *zone, int nid)
+{
+	zone->node = nid;
+}
+#else
+static inline int zone_to_nid(struct zone *zone)
+{
+	return 0;
+}
+
+static inline void zone_set_nid(struct zone *zone, int nid) {}
+#endif
+
 extern int movable_zone;
 
 #ifdef CONFIG_HIGHMEM
@@ -957,12 +976,7 @@ static inline int zonelist_zone_idx(struct zoneref *zoneref)
 
 static inline int zonelist_node_idx(struct zoneref *zoneref)
 {
-#ifdef CONFIG_NUMA
-	/* zone_to_nid not available in this context */
-	return zoneref->zone->node;
-#else
-	return 0;
-#endif /* CONFIG_NUMA */
+	return zone_to_nid(zoneref->zone);
 }
 
 struct zoneref *__next_zones_zonelist(struct zoneref *z,
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index f0fcf70bcec7..8c1c09b3852a 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1784,7 +1784,7 @@ unsigned int mempolicy_slab_node(void)
 		zonelist = &NODE_DATA(node)->node_zonelists[ZONELIST_FALLBACK];
 		z = first_zones_zonelist(zonelist, highest_zoneidx,
 							&policy->v.nodes);
-		return z->zone ? z->zone->node : node;
+		return z->zone ? zone_to_nid(z->zone) : node;
 	}
 
 	default:
@@ -2326,7 +2326,7 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
 				node_zonelist(numa_node_id(), GFP_HIGHUSER),
 				gfp_zone(GFP_HIGHUSER),
 				&pol->v.nodes);
-		polnid = z->zone->node;
+		polnid = zone_to_nid(z->zone);
 		break;
 
 	default:
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 5b72266b4b03..6838a530789b 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -53,13 +53,8 @@ void __init mminit_verify_zonelist(void)
 				zone->name);
 
 			/* Iterate the zonelist */
-			for_each_zone_zonelist(zone, z, zonelist, zoneid) {
-#ifdef CONFIG_NUMA
-				pr_cont("%d:%s ", zone->node, zone->name);
-#else
-				pr_cont("0:%s ", zone->name);
-#endif /* CONFIG_NUMA */
-			}
+			for_each_zone_zonelist(zone, z, zonelist, zoneid)
+				pr_cont("%d:%s ", zone_to_nid(zone), zone->name);
 			pr_cont("\n");
 		}
 	}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8a73305f7c55..10b754fba5fa 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2909,10 +2909,10 @@ static inline void zone_statistics(struct zone *preferred_zone, struct zone *z)
 	if (!static_branch_likely(&vm_numa_stat_key))
 		return;
 
-	if (z->node != numa_node_id())
+	if (zone_to_nid(z) != numa_node_id())
 		local_stat = NUMA_OTHER;
 
-	if (z->node == preferred_zone->node)
+	if (zone_to_nid(z) == zone_to_nid(preferred_zone))
 		__inc_numa_state(z, NUMA_HIT);
 	else {
 		__inc_numa_state(z, NUMA_MISS);
@@ -5287,7 +5287,7 @@ int local_memory_node(int node)
 	z = first_zones_zonelist(node_zonelist(node, GFP_KERNEL),
 				   gfp_zone(GFP_KERNEL),
 				   NULL);
-	return z->zone->node;
+	return zone_to_nid(z->zone);
 }
 #endif
 
@@ -6311,9 +6311,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
 		 * And all highmem pages will be managed by the buddy system.
 		 */
 		zone->managed_pages = freesize;
-#ifdef CONFIG_NUMA
-		zone->node = nid;
-#endif
+		zone_set_nid(zone, nid);
 		zone->name = zone_names[j];
 		zone->zone_pgdat = pgdat;
 		spin_lock_init(&zone->lock);
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 3/5] mm/page_alloc: Optimize free_area_init_core
  2018-07-19 13:27 [PATCH v2 0/5] Refactor free_area_init_node/free_area_init_core osalvador
  2018-07-19 13:27 ` [PATCH v2 1/5] mm/page_alloc: Move ifdefery out of free_area_init_core osalvador
  2018-07-19 13:27 ` [PATCH v2 2/5] mm: access zone->node via zone_to_nid() and zone_set_nid() osalvador
@ 2018-07-19 13:27 ` osalvador
  2018-07-19 13:44   ` Michal Hocko
  2018-07-19 13:27 ` [PATCH v2 4/5] mm/page_alloc: Inline function to handle CONFIG_DEFERRED_STRUCT_PAGE_INIT osalvador
  2018-07-19 13:27 ` [PATCH v2 5/5] mm/page_alloc: Only call pgdat_set_deferred_range when the system boots osalvador
  4 siblings, 1 reply; 23+ messages in thread
From: osalvador @ 2018-07-19 13:27 UTC (permalink / raw)
  To: akpm
  Cc: pasha.tatashin, mhocko, vbabka, aaron.lu, iamjoonsoo.kim,
	linux-kernel, linux-mm, Oscar Salvador

From: Oscar Salvador <osalvador@suse.de>

In free_area_init_core we calculate the amount of managed pages
we are left with, by substracting the memmap pages and the pages
reserved for dma.
With the values left, we also account the total of kernel pages and
the total of pages.

Since memmap pages are calculated from zone->spanned_pages,
let us only do these calculcations whenever zone->spanned_pages is greather
than 0.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 mm/page_alloc.c | 73 ++++++++++++++++++++++++++++++---------------------------
 1 file changed, 38 insertions(+), 35 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 10b754fba5fa..f7a6f4e13f41 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6237,6 +6237,40 @@ static void pgdat_init_kcompactd(struct pglist_data *pgdat)
 static void pgdat_init_kcompactd(struct pglist_data *pgdat) {}
 #endif
 
+static unsigned long calc_remaining_pages(enum zone_type type, unsigned long freesize,
+								unsigned long size)
+{
+	unsigned long memmap_pages = calc_memmap_size(size, freesize);
+
+	if(!is_highmem_idx(type)) {
+		if (freesize >= memmap_pages) {
+			freesize -= memmap_pages;
+			if (memmap_pages)
+				printk(KERN_DEBUG
+					"  %s zone: %lu pages used for memmap\n",
+					zone_names[type], memmap_pages);
+		} else
+			pr_warn("  %s zone: %lu pages exceeds freesize %lu\n",
+				zone_names[type], memmap_pages, freesize);
+	}
+
+	/* Account for reserved pages */
+	if (type == 0 && freesize > dma_reserve) {
+		freesize -= dma_reserve;
+		printk(KERN_DEBUG "  %s zone: %lu pages reserved\n",
+		zone_names[0], dma_reserve);
+	}
+
+	if (!is_highmem_idx(type))
+		nr_kernel_pages += freesize;
+	/* Charge for highmem memmap if there are enough kernel pages */
+	else if (nr_kernel_pages > memmap_pages * 2)
+		nr_kernel_pages -= memmap_pages;
+	nr_all_pages += freesize;
+
+	return freesize;
+}
+
 /*
  * Set up the zone data structures:
  *   - mark all pages reserved
@@ -6267,43 +6301,12 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
 
 	for (j = 0; j < MAX_NR_ZONES; j++) {
 		struct zone *zone = pgdat->node_zones + j;
-		unsigned long size, freesize, memmap_pages;
+		unsigned long size = zone->spanned_pages;
+		unsigned long freesize = zone->present_pages;
 		unsigned long zone_start_pfn = zone->zone_start_pfn;
 
-		size = zone->spanned_pages;
-		freesize = zone->present_pages;
-
-		/*
-		 * Adjust freesize so that it accounts for how much memory
-		 * is used by this zone for memmap. This affects the watermark
-		 * and per-cpu initialisations
-		 */
-		memmap_pages = calc_memmap_size(size, freesize);
-		if (!is_highmem_idx(j)) {
-			if (freesize >= memmap_pages) {
-				freesize -= memmap_pages;
-				if (memmap_pages)
-					printk(KERN_DEBUG
-					       "  %s zone: %lu pages used for memmap\n",
-					       zone_names[j], memmap_pages);
-			} else
-				pr_warn("  %s zone: %lu pages exceeds freesize %lu\n",
-					zone_names[j], memmap_pages, freesize);
-		}
-
-		/* Account for reserved pages */
-		if (j == 0 && freesize > dma_reserve) {
-			freesize -= dma_reserve;
-			printk(KERN_DEBUG "  %s zone: %lu pages reserved\n",
-					zone_names[0], dma_reserve);
-		}
-
-		if (!is_highmem_idx(j))
-			nr_kernel_pages += freesize;
-		/* Charge for highmem memmap if there are enough kernel pages */
-		else if (nr_kernel_pages > memmap_pages * 2)
-			nr_kernel_pages -= memmap_pages;
-		nr_all_pages += freesize;
+		if (size)
+			freesize = calc_remaining_pages(j, freesize, size);
 
 		/*
 		 * Set an approximate value for lowmem here, it will be adjusted
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 4/5] mm/page_alloc: Inline function to handle CONFIG_DEFERRED_STRUCT_PAGE_INIT
  2018-07-19 13:27 [PATCH v2 0/5] Refactor free_area_init_node/free_area_init_core osalvador
                   ` (2 preceding siblings ...)
  2018-07-19 13:27 ` [PATCH v2 3/5] mm/page_alloc: Optimize free_area_init_core osalvador
@ 2018-07-19 13:27 ` osalvador
  2018-07-19 13:45   ` Michal Hocko
  2018-07-19 13:27 ` [PATCH v2 5/5] mm/page_alloc: Only call pgdat_set_deferred_range when the system boots osalvador
  4 siblings, 1 reply; 23+ messages in thread
From: osalvador @ 2018-07-19 13:27 UTC (permalink / raw)
  To: akpm
  Cc: pasha.tatashin, mhocko, vbabka, aaron.lu, iamjoonsoo.kim,
	linux-kernel, linux-mm, Oscar Salvador

From: Oscar Salvador <osalvador@suse.de>

Let us move the code between CONFIG_DEFERRED_STRUCT_PAGE_INIT
to an inline function.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 mm/page_alloc.c | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f7a6f4e13f41..d77bc2a7ec2c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6379,6 +6379,21 @@ static void __ref alloc_node_mem_map(struct pglist_data *pgdat)
 static void __ref alloc_node_mem_map(struct pglist_data *pgdat) { }
 #endif /* CONFIG_FLAT_NODE_MEM_MAP */
 
+#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
+static inline void pgdat_set_deferred_range(pg_data_t *pgdat)
+{
+	/*
+	 * We start only with one section of pages, more pages are added as
+	 * needed until the rest of deferred pages are initialized.
+	 */
+	pgdat->static_init_pgcnt = min_t(unsigned long, PAGES_PER_SECTION,
+						pgdat->node_spanned_pages);
+	pgdat->first_deferred_pfn = ULONG_MAX;
+}
+#else
+static inline void pgdat_set_deferred_range(pg_data_t *pgdat) {}
+#endif
+
 void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
 		unsigned long node_start_pfn, unsigned long *zholes_size)
 {
@@ -6404,16 +6419,8 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
 				  zones_size, zholes_size);
 
 	alloc_node_mem_map(pgdat);
+	pgdat_set_deferred_range(pgdat);
 
-#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
-	/*
-	 * We start only with one section of pages, more pages are added as
-	 * needed until the rest of deferred pages are initialized.
-	 */
-	pgdat->static_init_pgcnt = min_t(unsigned long, PAGES_PER_SECTION,
-					 pgdat->node_spanned_pages);
-	pgdat->first_deferred_pfn = ULONG_MAX;
-#endif
 	free_area_init_core(pgdat);
 }
 
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 5/5] mm/page_alloc: Only call pgdat_set_deferred_range when the system boots
  2018-07-19 13:27 [PATCH v2 0/5] Refactor free_area_init_node/free_area_init_core osalvador
                   ` (3 preceding siblings ...)
  2018-07-19 13:27 ` [PATCH v2 4/5] mm/page_alloc: Inline function to handle CONFIG_DEFERRED_STRUCT_PAGE_INIT osalvador
@ 2018-07-19 13:27 ` osalvador
  2018-07-19 13:46   ` Michal Hocko
  4 siblings, 1 reply; 23+ messages in thread
From: osalvador @ 2018-07-19 13:27 UTC (permalink / raw)
  To: akpm
  Cc: pasha.tatashin, mhocko, vbabka, aaron.lu, iamjoonsoo.kim,
	linux-kernel, linux-mm, Oscar Salvador

From: Oscar Salvador <osalvador@suse.de>

We should only care about deferred initialization when booting.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 mm/page_alloc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d77bc2a7ec2c..5911b64a88ab 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6419,7 +6419,8 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
 				  zones_size, zholes_size);
 
 	alloc_node_mem_map(pgdat);
-	pgdat_set_deferred_range(pgdat);
+	if (system_state == SYSTEM_BOOTING)
+		pgdat_set_deferred_range(pgdat);
 
 	free_area_init_core(pgdat);
 }
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 2/5] mm: access zone->node via zone_to_nid() and zone_set_nid()
  2018-07-19 13:27 ` [PATCH v2 2/5] mm: access zone->node via zone_to_nid() and zone_set_nid() osalvador
@ 2018-07-19 13:40   ` Michal Hocko
  2018-07-19 13:44     ` Pavel Tatashin
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2018-07-19 13:40 UTC (permalink / raw)
  To: osalvador
  Cc: akpm, pasha.tatashin, vbabka, aaron.lu, iamjoonsoo.kim,
	linux-kernel, linux-mm, Oscar Salvador

On Thu 19-07-18 15:27:37, osalvador@techadventures.net wrote:
> From: Pavel Tatashin <pasha.tatashin@oracle.com>
> 
> zone->node is configured only when CONFIG_NUMA=y, so it is a good idea to
> have inline functions to access this field in order to avoid ifdef's in
> c files.

Is this a manual find & replace or did you use some scripts?

The change makes sense, but I haven't checked that all the places are
replaced properly. If not we can replace them later.

> Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> Reviewed-by: Oscar Salvador <osalvador@suse.de>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  include/linux/mm.h     |  9 ---------
>  include/linux/mmzone.h | 26 ++++++++++++++++++++------
>  mm/mempolicy.c         |  4 ++--
>  mm/mm_init.c           |  9 ++-------
>  mm/page_alloc.c        | 10 ++++------
>  5 files changed, 28 insertions(+), 30 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 726e71475144..6954ad183159 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -940,15 +940,6 @@ static inline int page_zone_id(struct page *page)
>  	return (page->flags >> ZONEID_PGSHIFT) & ZONEID_MASK;
>  }
>  
> -static inline int zone_to_nid(struct zone *zone)
> -{
> -#ifdef CONFIG_NUMA
> -	return zone->node;
> -#else
> -	return 0;
> -#endif
> -}
> -
>  #ifdef NODE_NOT_IN_PAGE_FLAGS
>  extern int page_to_nid(const struct page *page);
>  #else
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index ae1a034c3e2c..17fdff3bfb41 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -842,6 +842,25 @@ static inline bool populated_zone(struct zone *zone)
>  	return zone->present_pages;
>  }
>  
> +#ifdef CONFIG_NUMA
> +static inline int zone_to_nid(struct zone *zone)
> +{
> +	return zone->node;
> +}
> +
> +static inline void zone_set_nid(struct zone *zone, int nid)
> +{
> +	zone->node = nid;
> +}
> +#else
> +static inline int zone_to_nid(struct zone *zone)
> +{
> +	return 0;
> +}
> +
> +static inline void zone_set_nid(struct zone *zone, int nid) {}
> +#endif
> +
>  extern int movable_zone;
>  
>  #ifdef CONFIG_HIGHMEM
> @@ -957,12 +976,7 @@ static inline int zonelist_zone_idx(struct zoneref *zoneref)
>  
>  static inline int zonelist_node_idx(struct zoneref *zoneref)
>  {
> -#ifdef CONFIG_NUMA
> -	/* zone_to_nid not available in this context */
> -	return zoneref->zone->node;
> -#else
> -	return 0;
> -#endif /* CONFIG_NUMA */
> +	return zone_to_nid(zoneref->zone);
>  }
>  
>  struct zoneref *__next_zones_zonelist(struct zoneref *z,
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index f0fcf70bcec7..8c1c09b3852a 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -1784,7 +1784,7 @@ unsigned int mempolicy_slab_node(void)
>  		zonelist = &NODE_DATA(node)->node_zonelists[ZONELIST_FALLBACK];
>  		z = first_zones_zonelist(zonelist, highest_zoneidx,
>  							&policy->v.nodes);
> -		return z->zone ? z->zone->node : node;
> +		return z->zone ? zone_to_nid(z->zone) : node;
>  	}
>  
>  	default:
> @@ -2326,7 +2326,7 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
>  				node_zonelist(numa_node_id(), GFP_HIGHUSER),
>  				gfp_zone(GFP_HIGHUSER),
>  				&pol->v.nodes);
> -		polnid = z->zone->node;
> +		polnid = zone_to_nid(z->zone);
>  		break;
>  
>  	default:
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index 5b72266b4b03..6838a530789b 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -53,13 +53,8 @@ void __init mminit_verify_zonelist(void)
>  				zone->name);
>  
>  			/* Iterate the zonelist */
> -			for_each_zone_zonelist(zone, z, zonelist, zoneid) {
> -#ifdef CONFIG_NUMA
> -				pr_cont("%d:%s ", zone->node, zone->name);
> -#else
> -				pr_cont("0:%s ", zone->name);
> -#endif /* CONFIG_NUMA */
> -			}
> +			for_each_zone_zonelist(zone, z, zonelist, zoneid)
> +				pr_cont("%d:%s ", zone_to_nid(zone), zone->name);
>  			pr_cont("\n");
>  		}
>  	}
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 8a73305f7c55..10b754fba5fa 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2909,10 +2909,10 @@ static inline void zone_statistics(struct zone *preferred_zone, struct zone *z)
>  	if (!static_branch_likely(&vm_numa_stat_key))
>  		return;
>  
> -	if (z->node != numa_node_id())
> +	if (zone_to_nid(z) != numa_node_id())
>  		local_stat = NUMA_OTHER;
>  
> -	if (z->node == preferred_zone->node)
> +	if (zone_to_nid(z) == zone_to_nid(preferred_zone))
>  		__inc_numa_state(z, NUMA_HIT);
>  	else {
>  		__inc_numa_state(z, NUMA_MISS);
> @@ -5287,7 +5287,7 @@ int local_memory_node(int node)
>  	z = first_zones_zonelist(node_zonelist(node, GFP_KERNEL),
>  				   gfp_zone(GFP_KERNEL),
>  				   NULL);
> -	return z->zone->node;
> +	return zone_to_nid(z->zone);
>  }
>  #endif
>  
> @@ -6311,9 +6311,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
>  		 * And all highmem pages will be managed by the buddy system.
>  		 */
>  		zone->managed_pages = freesize;
> -#ifdef CONFIG_NUMA
> -		zone->node = nid;
> -#endif
> +		zone_set_nid(zone, nid);
>  		zone->name = zone_names[j];
>  		zone->zone_pgdat = pgdat;
>  		spin_lock_init(&zone->lock);
> -- 
> 2.13.6
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 2/5] mm: access zone->node via zone_to_nid() and zone_set_nid()
  2018-07-19 13:40   ` Michal Hocko
@ 2018-07-19 13:44     ` Pavel Tatashin
  2018-07-19 13:47       ` Michal Hocko
  0 siblings, 1 reply; 23+ messages in thread
From: Pavel Tatashin @ 2018-07-19 13:44 UTC (permalink / raw)
  To: Michal Hocko, osalvador
  Cc: akpm, vbabka, aaron.lu, iamjoonsoo.kim, linux-kernel, linux-mm,
	Oscar Salvador



On 07/19/2018 09:40 AM, Michal Hocko wrote:
> On Thu 19-07-18 15:27:37, osalvador@techadventures.net wrote:
>> From: Pavel Tatashin <pasha.tatashin@oracle.com>
>>
>> zone->node is configured only when CONFIG_NUMA=y, so it is a good idea to
>> have inline functions to access this field in order to avoid ifdef's in
>> c files.
> 
> Is this a manual find & replace or did you use some scripts?

I used opengrok:

http://src.illumos.org/source/search?q=%22zone-%3Enode%22&defs=&refs=&path=&hist=&project=linux-master

http://src.illumos.org/source/search?q=%22z-%3Enode%22&defs=&refs=&path=&hist=&project=linux-master

> 
> The change makes sense, but I haven't checked that all the places are
> replaced properly. If not we can replace them later.
> 
>> Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
>> Signed-off-by: Oscar Salvador <osalvador@suse.de>
>> Reviewed-by: Oscar Salvador <osalvador@suse.de>
> 
> Acked-by: Michal Hocko <mhocko@suse.com>

Thank you,
Pavel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 3/5] mm/page_alloc: Optimize free_area_init_core
  2018-07-19 13:27 ` [PATCH v2 3/5] mm/page_alloc: Optimize free_area_init_core osalvador
@ 2018-07-19 13:44   ` Michal Hocko
  2018-07-19 14:03     ` Oscar Salvador
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2018-07-19 13:44 UTC (permalink / raw)
  To: osalvador
  Cc: akpm, pasha.tatashin, vbabka, aaron.lu, iamjoonsoo.kim,
	linux-kernel, linux-mm, Oscar Salvador

On Thu 19-07-18 15:27:38, osalvador@techadventures.net wrote:
> From: Oscar Salvador <osalvador@suse.de>
> 
> In free_area_init_core we calculate the amount of managed pages
> we are left with, by substracting the memmap pages and the pages
> reserved for dma.
> With the values left, we also account the total of kernel pages and
> the total of pages.
> 
> Since memmap pages are calculated from zone->spanned_pages,
> let us only do these calculcations whenever zone->spanned_pages is greather
> than 0.

But why do we care? How do we test this? In other words, why is this
worth merging?

> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> ---
>  mm/page_alloc.c | 73 ++++++++++++++++++++++++++++++---------------------------
>  1 file changed, 38 insertions(+), 35 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 10b754fba5fa..f7a6f4e13f41 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6237,6 +6237,40 @@ static void pgdat_init_kcompactd(struct pglist_data *pgdat)
>  static void pgdat_init_kcompactd(struct pglist_data *pgdat) {}
>  #endif
>  
> +static unsigned long calc_remaining_pages(enum zone_type type, unsigned long freesize,
> +								unsigned long size)
> +{
> +	unsigned long memmap_pages = calc_memmap_size(size, freesize);
> +
> +	if(!is_highmem_idx(type)) {
> +		if (freesize >= memmap_pages) {
> +			freesize -= memmap_pages;
> +			if (memmap_pages)
> +				printk(KERN_DEBUG
> +					"  %s zone: %lu pages used for memmap\n",
> +					zone_names[type], memmap_pages);
> +		} else
> +			pr_warn("  %s zone: %lu pages exceeds freesize %lu\n",
> +				zone_names[type], memmap_pages, freesize);
> +	}
> +
> +	/* Account for reserved pages */
> +	if (type == 0 && freesize > dma_reserve) {
> +		freesize -= dma_reserve;
> +		printk(KERN_DEBUG "  %s zone: %lu pages reserved\n",
> +		zone_names[0], dma_reserve);
> +	}
> +
> +	if (!is_highmem_idx(type))
> +		nr_kernel_pages += freesize;
> +	/* Charge for highmem memmap if there are enough kernel pages */
> +	else if (nr_kernel_pages > memmap_pages * 2)
> +		nr_kernel_pages -= memmap_pages;
> +	nr_all_pages += freesize;
> +
> +	return freesize;
> +}
> +
>  /*
>   * Set up the zone data structures:
>   *   - mark all pages reserved
> @@ -6267,43 +6301,12 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
>  
>  	for (j = 0; j < MAX_NR_ZONES; j++) {
>  		struct zone *zone = pgdat->node_zones + j;
> -		unsigned long size, freesize, memmap_pages;
> +		unsigned long size = zone->spanned_pages;
> +		unsigned long freesize = zone->present_pages;
>  		unsigned long zone_start_pfn = zone->zone_start_pfn;
>  
> -		size = zone->spanned_pages;
> -		freesize = zone->present_pages;
> -
> -		/*
> -		 * Adjust freesize so that it accounts for how much memory
> -		 * is used by this zone for memmap. This affects the watermark
> -		 * and per-cpu initialisations
> -		 */
> -		memmap_pages = calc_memmap_size(size, freesize);
> -		if (!is_highmem_idx(j)) {
> -			if (freesize >= memmap_pages) {
> -				freesize -= memmap_pages;
> -				if (memmap_pages)
> -					printk(KERN_DEBUG
> -					       "  %s zone: %lu pages used for memmap\n",
> -					       zone_names[j], memmap_pages);
> -			} else
> -				pr_warn("  %s zone: %lu pages exceeds freesize %lu\n",
> -					zone_names[j], memmap_pages, freesize);
> -		}
> -
> -		/* Account for reserved pages */
> -		if (j == 0 && freesize > dma_reserve) {
> -			freesize -= dma_reserve;
> -			printk(KERN_DEBUG "  %s zone: %lu pages reserved\n",
> -					zone_names[0], dma_reserve);
> -		}
> -
> -		if (!is_highmem_idx(j))
> -			nr_kernel_pages += freesize;
> -		/* Charge for highmem memmap if there are enough kernel pages */
> -		else if (nr_kernel_pages > memmap_pages * 2)
> -			nr_kernel_pages -= memmap_pages;
> -		nr_all_pages += freesize;
> +		if (size)
> +			freesize = calc_remaining_pages(j, freesize, size);
>  
>  		/*
>  		 * Set an approximate value for lowmem here, it will be adjusted
> -- 
> 2.13.6
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 4/5] mm/page_alloc: Inline function to handle CONFIG_DEFERRED_STRUCT_PAGE_INIT
  2018-07-19 13:27 ` [PATCH v2 4/5] mm/page_alloc: Inline function to handle CONFIG_DEFERRED_STRUCT_PAGE_INIT osalvador
@ 2018-07-19 13:45   ` Michal Hocko
  0 siblings, 0 replies; 23+ messages in thread
From: Michal Hocko @ 2018-07-19 13:45 UTC (permalink / raw)
  To: osalvador
  Cc: akpm, pasha.tatashin, vbabka, aaron.lu, iamjoonsoo.kim,
	linux-kernel, linux-mm, Oscar Salvador

On Thu 19-07-18 15:27:39, osalvador@techadventures.net wrote:
> From: Oscar Salvador <osalvador@suse.de>
> 
> Let us move the code between CONFIG_DEFERRED_STRUCT_PAGE_INIT
> to an inline function.

... to remove an ugly ifdef in the code.

Please always mention _why_ in the changelog. I can clearly see what
you've done in the diff.

> Signed-off-by: Oscar Salvador <osalvador@suse.de>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/page_alloc.c | 25 ++++++++++++++++---------
>  1 file changed, 16 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index f7a6f4e13f41..d77bc2a7ec2c 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6379,6 +6379,21 @@ static void __ref alloc_node_mem_map(struct pglist_data *pgdat)
>  static void __ref alloc_node_mem_map(struct pglist_data *pgdat) { }
>  #endif /* CONFIG_FLAT_NODE_MEM_MAP */
>  
> +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
> +static inline void pgdat_set_deferred_range(pg_data_t *pgdat)
> +{
> +	/*
> +	 * We start only with one section of pages, more pages are added as
> +	 * needed until the rest of deferred pages are initialized.
> +	 */
> +	pgdat->static_init_pgcnt = min_t(unsigned long, PAGES_PER_SECTION,
> +						pgdat->node_spanned_pages);
> +	pgdat->first_deferred_pfn = ULONG_MAX;
> +}
> +#else
> +static inline void pgdat_set_deferred_range(pg_data_t *pgdat) {}
> +#endif
> +
>  void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
>  		unsigned long node_start_pfn, unsigned long *zholes_size)
>  {
> @@ -6404,16 +6419,8 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
>  				  zones_size, zholes_size);
>  
>  	alloc_node_mem_map(pgdat);
> +	pgdat_set_deferred_range(pgdat);
>  
> -#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
> -	/*
> -	 * We start only with one section of pages, more pages are added as
> -	 * needed until the rest of deferred pages are initialized.
> -	 */
> -	pgdat->static_init_pgcnt = min_t(unsigned long, PAGES_PER_SECTION,
> -					 pgdat->node_spanned_pages);
> -	pgdat->first_deferred_pfn = ULONG_MAX;
> -#endif
>  	free_area_init_core(pgdat);
>  }
>  
> -- 
> 2.13.6
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 5/5] mm/page_alloc: Only call pgdat_set_deferred_range when the system boots
  2018-07-19 13:27 ` [PATCH v2 5/5] mm/page_alloc: Only call pgdat_set_deferred_range when the system boots osalvador
@ 2018-07-19 13:46   ` Michal Hocko
  2018-07-19 13:58     ` Oscar Salvador
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2018-07-19 13:46 UTC (permalink / raw)
  To: osalvador
  Cc: akpm, pasha.tatashin, vbabka, aaron.lu, iamjoonsoo.kim,
	linux-kernel, linux-mm, Oscar Salvador

On Thu 19-07-18 15:27:40, osalvador@techadventures.net wrote:
> From: Oscar Salvador <osalvador@suse.de>
> 
> We should only care about deferred initialization when booting.

Again why is this worth doing?
 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> ---
>  mm/page_alloc.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d77bc2a7ec2c..5911b64a88ab 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6419,7 +6419,8 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
>  				  zones_size, zholes_size);
>  
>  	alloc_node_mem_map(pgdat);
> -	pgdat_set_deferred_range(pgdat);
> +	if (system_state == SYSTEM_BOOTING)
> +		pgdat_set_deferred_range(pgdat);
>  
>  	free_area_init_core(pgdat);
>  }
> -- 
> 2.13.6
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 2/5] mm: access zone->node via zone_to_nid() and zone_set_nid()
  2018-07-19 13:44     ` Pavel Tatashin
@ 2018-07-19 13:47       ` Michal Hocko
  0 siblings, 0 replies; 23+ messages in thread
From: Michal Hocko @ 2018-07-19 13:47 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: osalvador, akpm, vbabka, aaron.lu, iamjoonsoo.kim, linux-kernel,
	linux-mm, Oscar Salvador

On Thu 19-07-18 09:44:09, Pavel Tatashin wrote:
> 
> 
> On 07/19/2018 09:40 AM, Michal Hocko wrote:
> > On Thu 19-07-18 15:27:37, osalvador@techadventures.net wrote:
> >> From: Pavel Tatashin <pasha.tatashin@oracle.com>
> >>
> >> zone->node is configured only when CONFIG_NUMA=y, so it is a good idea to
> >> have inline functions to access this field in order to avoid ifdef's in
> >> c files.
> > 
> > Is this a manual find & replace or did you use some scripts?
> 
> I used opengrok:
> 
> http://src.illumos.org/source/search?q=%22zone-%3Enode%22&defs=&refs=&path=&hist=&project=linux-master
> 
> http://src.illumos.org/source/search?q=%22z-%3Enode%22&defs=&refs=&path=&hist=&project=linux-master

Then it is good to mention that in the changelog so that people might
use the same tool locally and compare the result or even learn about the
tool ;)
 
> > The change makes sense, but I haven't checked that all the places are
> > replaced properly. If not we can replace them later.
> > 
> >> Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
> >> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> >> Reviewed-by: Oscar Salvador <osalvador@suse.de>
> > 
> > Acked-by: Michal Hocko <mhocko@suse.com>
> 
> Thank you,
> Pavel

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 5/5] mm/page_alloc: Only call pgdat_set_deferred_range when the system boots
  2018-07-19 13:46   ` Michal Hocko
@ 2018-07-19 13:58     ` Oscar Salvador
  2018-07-19 14:03       ` Michal Hocko
  0 siblings, 1 reply; 23+ messages in thread
From: Oscar Salvador @ 2018-07-19 13:58 UTC (permalink / raw)
  To: Michal Hocko
  Cc: akpm, pasha.tatashin, vbabka, aaron.lu, iamjoonsoo.kim,
	linux-kernel, linux-mm, Oscar Salvador

On Thu, Jul 19, 2018 at 03:46:22PM +0200, Michal Hocko wrote:
> On Thu 19-07-18 15:27:40, osalvador@techadventures.net wrote:
> > From: Oscar Salvador <osalvador@suse.de>
> > 
> > We should only care about deferred initialization when booting.
> 
> Again why is this worth doing?

Well, it is not a big win if that is what you meant.

Those two fields are only being used when dealing with deferred pages,
which only happens at boot time.

If later on, free_area_init_node gets called from memhotplug code,
we will also set the fields, although they will not be used.

Is this a problem? No, but I think it is more clear from the code if we
see when this is called.
So I would say it was only for code consistency.

If you think this this is not worth, I am ok with dropping it.

Thanks
-- 
Oscar Salvador
SUSE L3

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 5/5] mm/page_alloc: Only call pgdat_set_deferred_range when the system boots
  2018-07-19 13:58     ` Oscar Salvador
@ 2018-07-19 14:03       ` Michal Hocko
  2018-07-19 14:27         ` Pavel Tatashin
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2018-07-19 14:03 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: akpm, pasha.tatashin, vbabka, aaron.lu, iamjoonsoo.kim,
	linux-kernel, linux-mm, Oscar Salvador

On Thu 19-07-18 15:58:59, Oscar Salvador wrote:
> On Thu, Jul 19, 2018 at 03:46:22PM +0200, Michal Hocko wrote:
> > On Thu 19-07-18 15:27:40, osalvador@techadventures.net wrote:
> > > From: Oscar Salvador <osalvador@suse.de>
> > > 
> > > We should only care about deferred initialization when booting.
> > 
> > Again why is this worth doing?
> 
> Well, it is not a big win if that is what you meant.
> 
> Those two fields are only being used when dealing with deferred pages,
> which only happens at boot time.
> 
> If later on, free_area_init_node gets called from memhotplug code,
> we will also set the fields, although they will not be used.
> 
> Is this a problem? No, but I think it is more clear from the code if we
> see when this is called.
> So I would say it was only for code consistency.

Then put it to the changelog.

> If you think this this is not worth, I am ok with dropping it.

I am not really sure. I am not a big fan of SYSTEM_BOOTING global
thingy so I would rather not spread its usage.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 3/5] mm/page_alloc: Optimize free_area_init_core
  2018-07-19 13:44   ` Michal Hocko
@ 2018-07-19 14:03     ` Oscar Salvador
  2018-07-19 15:15       ` Michal Hocko
  0 siblings, 1 reply; 23+ messages in thread
From: Oscar Salvador @ 2018-07-19 14:03 UTC (permalink / raw)
  To: Michal Hocko
  Cc: akpm, pasha.tatashin, vbabka, aaron.lu, iamjoonsoo.kim,
	linux-kernel, linux-mm, Oscar Salvador

On Thu, Jul 19, 2018 at 03:44:17PM +0200, Michal Hocko wrote:
> On Thu 19-07-18 15:27:38, osalvador@techadventures.net wrote:
> > From: Oscar Salvador <osalvador@suse.de>
> > 
> > In free_area_init_core we calculate the amount of managed pages
> > we are left with, by substracting the memmap pages and the pages
> > reserved for dma.
> > With the values left, we also account the total of kernel pages and
> > the total of pages.
> > 
> > Since memmap pages are calculated from zone->spanned_pages,
> > let us only do these calculcations whenever zone->spanned_pages is greather
> > than 0.
> 
> But why do we care? How do we test this? In other words, why is this
> worth merging?
 
Uhm, unless the values are going to be updated, why do we want to go through all
comparasions/checks?
I thought it was a nice thing to have the chance to skip that block unless we are going to
update the counters.

Again, if you think this only adds complexity and no good, I can drop it.

Thanks
-- 
Oscar Salvador
SUSE L3

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 5/5] mm/page_alloc: Only call pgdat_set_deferred_range when the system boots
  2018-07-19 14:03       ` Michal Hocko
@ 2018-07-19 14:27         ` Pavel Tatashin
  2018-07-19 15:01           ` Oscar Salvador
  0 siblings, 1 reply; 23+ messages in thread
From: Pavel Tatashin @ 2018-07-19 14:27 UTC (permalink / raw)
  To: mhocko
  Cc: osalvador, Andrew Morton, Vlastimil Babka, aaron.lu,
	iamjoonsoo.kim, LKML, Linux Memory Management List, osalvador

On Thu, Jul 19, 2018 at 10:03 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Thu 19-07-18 15:58:59, Oscar Salvador wrote:
> > On Thu, Jul 19, 2018 at 03:46:22PM +0200, Michal Hocko wrote:
> > > On Thu 19-07-18 15:27:40, osalvador@techadventures.net wrote:
> > > > From: Oscar Salvador <osalvador@suse.de>
> > > >
> > > > We should only care about deferred initialization when booting.
> > >
> > > Again why is this worth doing?
> >
> > Well, it is not a big win if that is what you meant.
> >
> > Those two fields are only being used when dealing with deferred pages,
> > which only happens at boot time.
> >
> > If later on, free_area_init_node gets called from memhotplug code,
> > we will also set the fields, although they will not be used.
> >
> > Is this a problem? No, but I think it is more clear from the code if we
> > see when this is called.
> > So I would say it was only for code consistency.
>
> Then put it to the changelog.
>
> > If you think this this is not worth, I am ok with dropping it.
>
> I am not really sure. I am not a big fan of SYSTEM_BOOTING global
> thingy so I would rather not spread its usage.

I agree, I do not think this patch is necessary. Calling
pgdat_set_deferred_range() does not hurt in hotplug context, and it is
cheap too. SYSTEM_BOOTING sometimes useful, but it is better to use it
only where necessary, where without this "if" we will encounter some
bugs.

Thank you,
Pavel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 5/5] mm/page_alloc: Only call pgdat_set_deferred_range when the system boots
  2018-07-19 14:27         ` Pavel Tatashin
@ 2018-07-19 15:01           ` Oscar Salvador
  0 siblings, 0 replies; 23+ messages in thread
From: Oscar Salvador @ 2018-07-19 15:01 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: mhocko, Andrew Morton, Vlastimil Babka, aaron.lu, iamjoonsoo.kim,
	LKML, Linux Memory Management List, osalvador

On Thu, Jul 19, 2018 at 10:27:44AM -0400, Pavel Tatashin wrote:
> On Thu, Jul 19, 2018 at 10:03 AM Michal Hocko <mhocko@kernel.org> wrote:
> > I am not really sure. I am not a big fan of SYSTEM_BOOTING global
> > thingy so I would rather not spread its usage.
> 
> I agree, I do not think this patch is necessary. Calling
> pgdat_set_deferred_range() does not hurt in hotplug context, and it is
> cheap too. SYSTEM_BOOTING sometimes useful, but it is better to use it
> only where necessary, where without this "if" we will encounter some
> bugs.

Ok, let us drop it then ;-).

Thanks
-- 
Oscar Salvador
SUSE L3

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 3/5] mm/page_alloc: Optimize free_area_init_core
  2018-07-19 14:03     ` Oscar Salvador
@ 2018-07-19 15:15       ` Michal Hocko
  2018-07-19 20:52         ` Oscar Salvador
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2018-07-19 15:15 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: akpm, pasha.tatashin, vbabka, aaron.lu, iamjoonsoo.kim,
	linux-kernel, linux-mm, Oscar Salvador

On Thu 19-07-18 16:03:27, Oscar Salvador wrote:
> On Thu, Jul 19, 2018 at 03:44:17PM +0200, Michal Hocko wrote:
> > On Thu 19-07-18 15:27:38, osalvador@techadventures.net wrote:
> > > From: Oscar Salvador <osalvador@suse.de>
> > > 
> > > In free_area_init_core we calculate the amount of managed pages
> > > we are left with, by substracting the memmap pages and the pages
> > > reserved for dma.
> > > With the values left, we also account the total of kernel pages and
> > > the total of pages.
> > > 
> > > Since memmap pages are calculated from zone->spanned_pages,
> > > let us only do these calculcations whenever zone->spanned_pages is greather
> > > than 0.
> > 
> > But why do we care? How do we test this? In other words, why is this
> > worth merging?
>  
> Uhm, unless the values are going to be updated, why do we want to go through all
> comparasions/checks?
> I thought it was a nice thing to have the chance to skip that block unless we are going to
> update the counters.
> 
> Again, if you think this only adds complexity and no good, I can drop it.

Your changelog doesn't really explain the motivation. Does the change
help performance? Is this a pure cleanup?

The function is certainly not an example of beauty. It is more an
example of changes done on top of older ones without much thinking. But
I do not see your change would make it so much better. I would consider
it a much nicer cleanup if it was split into logical units each doing
one specific thing.

Btw. are you sure this change is correct? E.g.
		/*
		 * Set an approximate value for lowmem here, it will be adjusted
		 * when the bootmem allocator frees pages into the buddy system.
		 * And all highmem pages will be managed by the buddy system.
		 */
		zone->managed_pages = is_highmem_idx(j) ? realsize : freesize;

expects freesize to be calculated properly and just from quick reading
the code I do not see why skipping other adjustments is ok for size > 0.
Maybe this is OK, I dunno and my brain is already heading few days off
but a real cleanup wouldn't even make me think what the heck is going on
here.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 3/5] mm/page_alloc: Optimize free_area_init_core
  2018-07-19 15:15       ` Michal Hocko
@ 2018-07-19 20:52         ` Oscar Salvador
  2018-07-20 10:03           ` Oscar Salvador
  2018-07-23  8:30           ` Michal Hocko
  0 siblings, 2 replies; 23+ messages in thread
From: Oscar Salvador @ 2018-07-19 20:52 UTC (permalink / raw)
  To: Michal Hocko
  Cc: akpm, pasha.tatashin, vbabka, aaron.lu, iamjoonsoo.kim,
	linux-kernel, linux-mm, Oscar Salvador

On Thu, Jul 19, 2018 at 05:15:55PM +0200, Michal Hocko wrote:
> Your changelog doesn't really explain the motivation. Does the change
> help performance? Is this a pure cleanup?

Hi Michal,

Sorry to not have explained this better from the very beginning.

It should help a bit in performance terms as we would be skipping those
condition checks and assignations for zones that do not have any pages.
It is not a huge win, but I think that skipping code we do not really need to run
is worh to have.

> The function is certainly not an example of beauty. It is more an
> example of changes done on top of older ones without much thinking. But
> I do not see your change would make it so much better. I would consider
> it a much nicer cleanup if it was split into logical units each doing
> one specific thing.

About the cleanup, I thought that moving that block of code to a separate function
would make the code easier to follow.
If you think that this is still not enough, I can try to split it and see the outcome.

> Btw. are you sure this change is correct? E.g.
> 		/*
> 		 * Set an approximate value for lowmem here, it will be adjusted
> 		 * when the bootmem allocator frees pages into the buddy system.
> 		 * And all highmem pages will be managed by the buddy system.
> 		 */
> 		zone->managed_pages = is_highmem_idx(j) ? realsize : freesize;
> 
> expects freesize to be calculated properly and just from quick reading
> the code I do not see why skipping other adjustments is ok for size > 0.
> Maybe this is OK, I dunno and my brain is already heading few days off
> but a real cleanup wouldn't even make me think what the heck is going on
> here.

This changed in commit e69438596bb3e97809e76be315e54a4a444f4797.
Current code does not have "realsize" anymore.

Thanks
-- 
Oscar Salvador
SUSE L3

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 3/5] mm/page_alloc: Optimize free_area_init_core
  2018-07-19 20:52         ` Oscar Salvador
@ 2018-07-20 10:03           ` Oscar Salvador
  2018-07-23  8:35             ` Michal Hocko
  2018-07-23  8:30           ` Michal Hocko
  1 sibling, 1 reply; 23+ messages in thread
From: Oscar Salvador @ 2018-07-20 10:03 UTC (permalink / raw)
  To: Michal Hocko
  Cc: akpm, pasha.tatashin, vbabka, aaron.lu, iamjoonsoo.kim,
	linux-kernel, linux-mm, Oscar Salvador

On Thu, Jul 19, 2018 at 10:52:35PM +0200, Oscar Salvador wrote:
> On Thu, Jul 19, 2018 at 05:15:55PM +0200, Michal Hocko wrote:
> > Your changelog doesn't really explain the motivation. Does the change
> > help performance? Is this a pure cleanup?
> 
> Hi Michal,
> 
> Sorry to not have explained this better from the very beginning.
> 
> It should help a bit in performance terms as we would be skipping those
> condition checks and assignations for zones that do not have any pages.
> It is not a huge win, but I think that skipping code we do not really need to run
> is worh to have.
> 
> > The function is certainly not an example of beauty. It is more an
> > example of changes done on top of older ones without much thinking. But
> > I do not see your change would make it so much better. I would consider
> > it a much nicer cleanup if it was split into logical units each doing
> > one specific thing.
> 
> About the cleanup, I thought that moving that block of code to a separate function
> would make the code easier to follow.
> If you think that this is still not enough, I can try to split it and see the outcome.

I tried to split it innto three logical blocks:

- Substract memmap pages
- Substract dma reserves
- Account kernel pages (nr_kernel_pages and nr_total_pages)

Is this something that makes sense to you:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 10b754fba5fa..1397dcdd4a3c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6237,6 +6237,47 @@ static void pgdat_init_kcompactd(struct pglist_data *pgdat)
 static void pgdat_init_kcompactd(struct pglist_data *pgdat) {}
 #endif
 
+static void account_kernel_pages(enum zone_type j, unsigned long freesize,
+						unsigned long memmap_pages)
+{
+	if (!is_highmem_idx(j))
+		nr_kernel_pages += freesize;
+	/* Charge for highmem memmap if there are enough kernel pages */
+	else if (nr_kernel_pages > memmap_pages * 2)
+		 nr_kernel_pages -= memmap_pages;
+	nr_all_pages += freesize;
+}
+
+static unsigned long substract_dma_reserves(unsigned long freesize)
+{
+	/* Account for reserved pages */
+	if (freesize > dma_reserve) {
+		freesize -= dma_reserve;
+		printk(KERN_DEBUG "  %s zone: %lu pages reserved\n",
+					zone_names[0], dma_reserve);
+	}
+
+	return freesize;
+}
+
+static unsigned long substract_memmap_pages(unsigned long freesize, unsigned long memmap_pages)
+{
+	/*
+	 * Adjust freesize so that it accounts for how much memory
+	 * is used by this zone for memmap. This affects the watermark
+	 * and per-cpu initialisations
+	 */
+	if (freesize >= memmap_pages) {
+		freesize -= memmap_pages;
+		if (memmap_pages)
+			printk(KERN_DEBUG "  %s zone: %lu pages used for memmap\n",
+							zone_names[j], memmap_pages);
+	} else
+		pr_warn("  %s zone: %lu pages exceeds freesize %lu\n",
+				zone_names[j], memmap_pages, freesize);
+	return freesize;
+}
+
 /*
  * Set up the zone data structures:
  *   - mark all pages reserved
@@ -6267,44 +6308,20 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
 
 	for (j = 0; j < MAX_NR_ZONES; j++) {
 		struct zone *zone = pgdat->node_zones + j;
-		unsigned long size, freesize, memmap_pages;
+		unsigned long size = zone->spanned_pages
+		unsigned long freesize = zone->present_pages;
 		unsigned long zone_start_pfn = zone->zone_start_pfn;
 
-		size = zone->spanned_pages;
-		freesize = zone->present_pages;
-
-		/*
-		 * Adjust freesize so that it accounts for how much memory
-		 * is used by this zone for memmap. This affects the watermark
-		 * and per-cpu initialisations
-		 */
-		memmap_pages = calc_memmap_size(size, freesize);
-		if (!is_highmem_idx(j)) {
-			if (freesize >= memmap_pages) {
-				freesize -= memmap_pages;
-				if (memmap_pages)
-					printk(KERN_DEBUG
-					       "  %s zone: %lu pages used for memmap\n",
-					       zone_names[j], memmap_pages);
-			} else
-				pr_warn("  %s zone: %lu pages exceeds freesize %lu\n",
-					zone_names[j], memmap_pages, freesize);
-		}
+		if (size) {
+			unsigned long memmap_pages = calc_memmap_size(size, freesize);
+			if (!is_highmem_idx(j))
+				freesize =  substract_memmap_pages(freesize, memmap_pages);
 
-		/* Account for reserved pages */
-		if (j == 0 && freesize > dma_reserve) {
-			freesize -= dma_reserve;
-			printk(KERN_DEBUG "  %s zone: %lu pages reserved\n",
-					zone_names[0], dma_reserve);
+			if (j == ZONE_DMA)
+				freesize = substract_dma_reserves(freesize);
+			account_kernel_pages(j, freesize, memmap_pages);
 		}
 
-		if (!is_highmem_idx(j))
-			nr_kernel_pages += freesize;
-		/* Charge for highmem memmap if there are enough kernel pages */
-		else if (nr_kernel_pages > memmap_pages * 2)
-			nr_kernel_pages -= memmap_pages;
-		nr_all_pages += freesize;

Thanks
-- 
Oscar Salvador
SUSE L3

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 3/5] mm/page_alloc: Optimize free_area_init_core
  2018-07-19 20:52         ` Oscar Salvador
  2018-07-20 10:03           ` Oscar Salvador
@ 2018-07-23  8:30           ` Michal Hocko
  1 sibling, 0 replies; 23+ messages in thread
From: Michal Hocko @ 2018-07-23  8:30 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: akpm, pasha.tatashin, vbabka, aaron.lu, iamjoonsoo.kim,
	linux-kernel, linux-mm, Oscar Salvador

On Thu 19-07-18 22:52:35, Oscar Salvador wrote:
> On Thu, Jul 19, 2018 at 05:15:55PM +0200, Michal Hocko wrote:
> > Your changelog doesn't really explain the motivation. Does the change
> > help performance? Is this a pure cleanup?
> 
> Hi Michal,
> 
> Sorry to not have explained this better from the very beginning.
> 
> It should help a bit in performance terms as we would be skipping those
> condition checks and assignations for zones that do not have any pages.
> It is not a huge win, but I think that skipping code we do not really need to run
> is worh to have.

It is always preferable to have numbers when you are claiming
performance benefits.

> 
> > The function is certainly not an example of beauty. It is more an
> > example of changes done on top of older ones without much thinking. But
> > I do not see your change would make it so much better. I would consider
> > it a much nicer cleanup if it was split into logical units each doing
> > one specific thing.
> 
> About the cleanup, I thought that moving that block of code to a separate function
> would make the code easier to follow.
> If you think that this is still not enough, I can try to split it and see the outcome.

Well, on the other hand, is the change really worth it? Moving code
around is not free either. Any git blame tracking to understand why the
code is done this way will be harder.

But just to make it clear, I do not want to discourage you from touching
this area. I just believe that if you really want to do that then the
result shouldn't just tweak one specific case and tweak it. It would be
much more appreciated if the new code had much better structure and less
hacks.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 3/5] mm/page_alloc: Optimize free_area_init_core
  2018-07-20 10:03           ` Oscar Salvador
@ 2018-07-23  8:35             ` Michal Hocko
  2018-07-23 11:42               ` Oscar Salvador
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2018-07-23  8:35 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: akpm, pasha.tatashin, vbabka, aaron.lu, iamjoonsoo.kim,
	linux-kernel, linux-mm, Oscar Salvador

On Fri 20-07-18 12:03:27, Oscar Salvador wrote:
> On Thu, Jul 19, 2018 at 10:52:35PM +0200, Oscar Salvador wrote:
> > On Thu, Jul 19, 2018 at 05:15:55PM +0200, Michal Hocko wrote:
> > > Your changelog doesn't really explain the motivation. Does the change
> > > help performance? Is this a pure cleanup?
> > 
> > Hi Michal,
> > 
> > Sorry to not have explained this better from the very beginning.
> > 
> > It should help a bit in performance terms as we would be skipping those
> > condition checks and assignations for zones that do not have any pages.
> > It is not a huge win, but I think that skipping code we do not really need to run
> > is worh to have.
> > 
> > > The function is certainly not an example of beauty. It is more an
> > > example of changes done on top of older ones without much thinking. But
> > > I do not see your change would make it so much better. I would consider
> > > it a much nicer cleanup if it was split into logical units each doing
> > > one specific thing.
> > 
> > About the cleanup, I thought that moving that block of code to a separate function
> > would make the code easier to follow.
> > If you think that this is still not enough, I can try to split it and see the outcome.
> 
> I tried to split it innto three logical blocks:
> 
> - Substract memmap pages
> - Substract dma reserves
> - Account kernel pages (nr_kernel_pages and nr_total_pages)

No, I do not think this is much better. Why do we need to separate those
functions out? I think you are too focused on the current function
without a broader context. Think about it. We have two code paths.
Early initialization and the hotplug. The two are subtly different in
some aspects. Maybe reusing free_area_init_core is the wrong thing and
we should have a dedicated subset of this function. This would make the
code more clear probably. You wouldn't have to think which part of
free_area_init_core is special and what has to be done if this function
was to be used in a different context. See my point?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 3/5] mm/page_alloc: Optimize free_area_init_core
  2018-07-23  8:35             ` Michal Hocko
@ 2018-07-23 11:42               ` Oscar Salvador
  0 siblings, 0 replies; 23+ messages in thread
From: Oscar Salvador @ 2018-07-23 11:42 UTC (permalink / raw)
  To: Michal Hocko
  Cc: akpm, pasha.tatashin, vbabka, aaron.lu, iamjoonsoo.kim,
	linux-kernel, linux-mm, Oscar Salvador

On Mon, Jul 23, 2018 at 10:35:19AM +0200, Michal Hocko wrote:
> No, I do not think this is much better. Why do we need to separate those
> functions out? I think you are too focused on the current function
> without a broader context. Think about it. We have two code paths.
> Early initialization and the hotplug. The two are subtly different in
> some aspects. Maybe reusing free_area_init_core is the wrong thing and
> we should have a dedicated subset of this function. This would make the
> code more clear probably. You wouldn't have to think which part of
> free_area_init_core is special and what has to be done if this function
> was to be used in a different context. See my point?

Yes, I see your point now.
I will think about it with a wider approach.

Thanks
-- 
Oscar Salvador
SUSE L3

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2018-07-23 11:42 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-19 13:27 [PATCH v2 0/5] Refactor free_area_init_node/free_area_init_core osalvador
2018-07-19 13:27 ` [PATCH v2 1/5] mm/page_alloc: Move ifdefery out of free_area_init_core osalvador
2018-07-19 13:27 ` [PATCH v2 2/5] mm: access zone->node via zone_to_nid() and zone_set_nid() osalvador
2018-07-19 13:40   ` Michal Hocko
2018-07-19 13:44     ` Pavel Tatashin
2018-07-19 13:47       ` Michal Hocko
2018-07-19 13:27 ` [PATCH v2 3/5] mm/page_alloc: Optimize free_area_init_core osalvador
2018-07-19 13:44   ` Michal Hocko
2018-07-19 14:03     ` Oscar Salvador
2018-07-19 15:15       ` Michal Hocko
2018-07-19 20:52         ` Oscar Salvador
2018-07-20 10:03           ` Oscar Salvador
2018-07-23  8:35             ` Michal Hocko
2018-07-23 11:42               ` Oscar Salvador
2018-07-23  8:30           ` Michal Hocko
2018-07-19 13:27 ` [PATCH v2 4/5] mm/page_alloc: Inline function to handle CONFIG_DEFERRED_STRUCT_PAGE_INIT osalvador
2018-07-19 13:45   ` Michal Hocko
2018-07-19 13:27 ` [PATCH v2 5/5] mm/page_alloc: Only call pgdat_set_deferred_range when the system boots osalvador
2018-07-19 13:46   ` Michal Hocko
2018-07-19 13:58     ` Oscar Salvador
2018-07-19 14:03       ` Michal Hocko
2018-07-19 14:27         ` Pavel Tatashin
2018-07-19 15:01           ` Oscar Salvador

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.