From: Michal Hocko <mhocko@kernel.org>
To: <linux-mm@kvack.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@suse.de>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	LKML <linux-kernel@vger.kernel.org>,
	Michal Hocko <mhocko@suse.com>
Subject: [PATCH 6/9] mm, page_alloc: simplify zonelist initialization
Date: Fri, 14 Jul 2017 10:00:03 +0200	[thread overview]
Message-ID: <20170714080006.7250-7-mhocko@kernel.org> (raw)
In-Reply-To: <20170714080006.7250-1-mhocko@kernel.org>

From: Michal Hocko <mhocko@suse.com>

build_zonelists gradually builds zonelists from the nearest to the most
distant node. As we do not know how many populated zones we will have in
each node, we rely on the _zonerefs to terminate the initialized part of
the zonelist with a NULL zone. While this is functionally correct, it is
quite suboptimal because updaters cannot be allowed to race with zonelist
users: a racing reader could see an empty zonelist and fail the
allocation, or hit the OOM killer in the worst case.
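
To make the failure mode concrete, here is a minimal reader-side sketch
(plain C, not the kernel's actual iterator; struct zone is left opaque
and the suitable() callback is a made-up stand-in for the allocator's
zone filtering):

	#include <stdbool.h>
	#include <stddef.h>

	struct zone;				/* opaque for this sketch */

	struct zoneref {
		struct zone *zone;		/* NULL terminates the initialized part */
		int zone_idx;
	};

	static struct zone *first_suitable_zone(struct zoneref *refs,
						bool (*suitable)(struct zone *))
	{
		int i;

		/* walk the zonelist until the NULL terminator */
		for (i = 0; refs[i].zone != NULL; i++)
			if (suitable(refs[i].zone))
				return refs[i].zone;

		/*
		 * With the current scheme an updater first writes NULL into
		 * _zonerefs[0]; a reader racing with that sees an empty list,
		 * falls through to here and the allocation fails outright.
		 */
		return NULL;
	}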

We can do much better, though. We can store the node ordering in the
already existing node_order array, hand that array to
build_zonelists_in_node_order and do the whole initialization in a single
pass. Zonelist consumers might still see a half-initialized state, but
that is much more tolerable because the list will never be empty; in the
worst case they would either see some zone twice or skip over some
zone(s), which shouldn't lead to immediate failures.
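
Compressed into a standalone sketch, the new flow implemented by the
hunks below looks roughly like this (build_node() is a hypothetical
helper standing in for build_zonelists_node(), and the zoneref layout is
simplified):

	struct zone;
	struct zoneref {
		struct zone *zone;
		int zone_idx;
	};

	/* fills refs starting at idx for node nid, returns the next free idx */
	int build_node(int nid, struct zoneref *refs, int idx);

	static void rebuild_fallback(struct zoneref *refs,
				     const int *node_order, int nr_nodes)
	{
		int i, idx = 0;

		/* node order was computed up front; fill the list in one go */
		for (i = 0; i < nr_nodes; i++)
			idx = build_node(node_order[i], refs, idx);

		/* terminate only once, after the whole list has been filled */
		refs[idx].zone = NULL;
		refs[idx].zone_idx = 0;
	}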

This patch doesn't introduce any functional change yet; it is merely
preparatory work for later changes.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/page_alloc.c | 42 ++++++++++++++++++------------------------
 1 file changed, 18 insertions(+), 24 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 00e117922b3f..78bd62418380 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4913,17 +4913,20 @@ static int find_next_best_node(int node, nodemask_t *used_node_mask)
  * This results in maximum locality--normal zone overflows into local
  * DMA zone, if any--but risks exhausting DMA zone.
  */
-static void build_zonelists_in_node_order(pg_data_t *pgdat, int node)
+static void build_zonelists_in_node_order(pg_data_t *pgdat, int *node_order)
 {
-	int j;
 	struct zonelist *zonelist;
+	int i, zoneref_idx = 0;
 
 	zonelist = &pgdat->node_zonelists[ZONELIST_FALLBACK];
-	for (j = 0; zonelist->_zonerefs[j].zone != NULL; j++)
-		;
-	j = build_zonelists_node(NODE_DATA(node), zonelist, j);
-	zonelist->_zonerefs[j].zone = NULL;
-	zonelist->_zonerefs[j].zone_idx = 0;
+
+	for (i = 0; i < MAX_NUMNODES; i++) {
+		pg_data_t *node = NODE_DATA(node_order[i]);
+
+		zoneref_idx = build_zonelists_node(node, zonelist, zoneref_idx);
+	}
+	zonelist->_zonerefs[zoneref_idx].zone = NULL;
+	zonelist->_zonerefs[zoneref_idx].zone_idx = 0;
 }
 
 /*
@@ -4931,13 +4934,13 @@ static void build_zonelists_in_node_order(pg_data_t *pgdat, int node)
  */
 static void build_thisnode_zonelists(pg_data_t *pgdat)
 {
-	int j;
 	struct zonelist *zonelist;
+	int zoneref_idx = 0;
 
 	zonelist = &pgdat->node_zonelists[ZONELIST_NOFALLBACK];
-	j = build_zonelists_node(pgdat, zonelist, 0);
-	zonelist->_zonerefs[j].zone = NULL;
-	zonelist->_zonerefs[j].zone_idx = 0;
+	zoneref_idx = build_zonelists_node(pgdat, zonelist, zoneref_idx);
+	zonelist->_zonerefs[zoneref_idx].zone = NULL;
+	zonelist->_zonerefs[zoneref_idx].zone_idx = 0;
 }
 
 /*
@@ -4946,21 +4949,13 @@ static void build_thisnode_zonelists(pg_data_t *pgdat)
  * exhausted, but results in overflowing to remote node while memory
  * may still exist in local DMA zone.
  */
-static int node_order[MAX_NUMNODES];
 
 static void build_zonelists(pg_data_t *pgdat)
 {
-	int i, node, load;
+	static int node_order[MAX_NUMNODES];
+	int node, load, i = 0;
 	nodemask_t used_mask;
 	int local_node, prev_node;
-	struct zonelist *zonelist;
-
-	/* initialize zonelists */
-	for (i = 0; i < MAX_ZONELISTS; i++) {
-		zonelist = pgdat->node_zonelists + i;
-		zonelist->_zonerefs[0].zone = NULL;
-		zonelist->_zonerefs[0].zone_idx = 0;
-	}
 
 	/* NUMA-aware ordering of nodes */
 	local_node = pgdat->node_id;
@@ -4969,8 +4964,6 @@ static void build_zonelists(pg_data_t *pgdat)
 	nodes_clear(used_mask);
 
 	memset(node_order, 0, sizeof(node_order));
-	i = 0;
-
 	while ((node = find_next_best_node(local_node, &used_mask)) >= 0) {
 		/*
 		 * We don't want to pressure a particular node.
@@ -4981,11 +4974,12 @@ static void build_zonelists(pg_data_t *pgdat)
 		    node_distance(local_node, prev_node))
 			node_load[node] = load;
 
+		node_order[i++] = node;
 		prev_node = node;
 		load--;
-		build_zonelists_in_node_order(pgdat, node);
 	}
 
+	build_zonelists_in_node_order(pgdat, node_order);
 	build_thisnode_zonelists(pgdat);
 }
 
-- 
2.11.0

Thread overview: 117+ messages
2017-07-14  7:59 [PATCH 0/9] cleanup zonelists initialization Michal Hocko
2017-07-14  7:59 ` [PATCH 1/9] mm, page_alloc: rip out ZONELIST_ORDER_ZONE Michal Hocko
2017-07-14  9:36   ` Mel Gorman
2017-07-14 10:47     ` Michal Hocko
2017-07-14 11:16       ` Mel Gorman
2017-07-14 11:38         ` Michal Hocko
2017-07-14 12:56           ` Mel Gorman
2017-07-14 13:01             ` Mel Gorman
2017-07-14 13:08             ` Michal Hocko
2017-07-19  9:33   ` Vlastimil Babka
2017-07-19 13:44     ` Michal Hocko
2017-07-14  7:59 ` [PATCH 2/9] mm, page_alloc: remove boot pageset initialization from memory hotplug Michal Hocko
2017-07-14  9:39   ` Mel Gorman
2017-07-19 13:15   ` Vlastimil Babka
2017-07-14  8:00 ` [PATCH 3/9] mm, page_alloc: do not set_cpu_numa_mem on empty nodes initialization Michal Hocko
2017-07-14  9:48   ` Mel Gorman
2017-07-14 10:50     ` Michal Hocko
2017-07-14 12:32       ` Mel Gorman
2017-07-14 12:39         ` Michal Hocko
2017-07-14 12:56           ` Mel Gorman
2017-07-19 13:19   ` Vlastimil Babka
2017-07-14  8:00 ` [PATCH 4/9] mm, memory_hotplug: drop zone from build_all_zonelists Michal Hocko
2017-07-19 13:33   ` Vlastimil Babka
2017-07-20  8:15     ` Michal Hocko
2017-07-14  8:00 ` [PATCH 5/9] mm, memory_hotplug: remove explicit build_all_zonelists from try_online_node Michal Hocko
2017-07-14 12:14   ` Michal Hocko
2017-07-20  6:13   ` Vlastimil Babka
2017-07-14  8:00 ` [PATCH 6/9] mm, page_alloc: simplify zonelist initialization Michal Hocko [this message]
2017-07-14  9:55   ` Mel Gorman
2017-07-14 10:51     ` Michal Hocko
2017-07-14 12:46   ` Mel Gorman
2017-07-14 13:02     ` Michal Hocko
2017-07-14 14:18       ` Mel Gorman
2017-07-17  6:06         ` Michal Hocko
2017-07-17  8:07           ` Mel Gorman
2017-07-17  8:19             ` Michal Hocko
2017-07-17  8:58               ` Mel Gorman
2017-07-17  9:15                 ` Michal Hocko
2017-07-20  6:55   ` Vlastimil Babka
2017-07-20  7:19     ` Michal Hocko
2017-07-14  8:00 ` [PATCH 7/9] mm, page_alloc: remove stop_machine from build_all_zonelists Michal Hocko
2017-07-14  9:59   ` Mel Gorman
2017-07-14 11:00     ` Michal Hocko
2017-07-14 12:47       ` Mel Gorman
2017-07-14 11:29   ` Vlastimil Babka
2017-07-14 11:43     ` Michal Hocko
2017-07-14 11:45       ` Michal Hocko
2017-07-20  6:16         ` Vlastimil Babka
2017-07-20  7:24   ` Vlastimil Babka
2017-07-20  9:21     ` Michal Hocko
2017-07-14  8:00 ` [PATCH 8/9] mm, memory_hotplug: get rid of zonelists_mutex Michal Hocko
2017-07-14  8:00 ` [PATCH 9/9] mm, sparse, page_ext: drop ugly N_HIGH_MEMORY branches for allocations Michal Hocko
2017-07-20  8:04   ` Vlastimil Babka
2017-07-21 14:39 [PATCH -v1 0/9] cleanup zonelists initialization Michal Hocko
2017-07-21 14:39 ` [PATCH 6/9] mm, page_alloc: simplify zonelist initialization Michal Hocko
2017-07-24  9:25   ` Vlastimil Babka
