linux-kernel.vger.kernel.org archive mirror
* [PATCH] mm/migrate: move node demotion code to near its user
@ 2021-12-06  3:12 Huang Ying
  2021-12-06  5:42 ` Baolin Wang
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Huang Ying @ 2021-12-06  3:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Huang Ying, Dave Hansen, Yang Shi,
	Zi Yan, Oscar Salvador, Michal Hocko, Wei Xu, David Rientjes,
	Dan Williams, David Hildenbrand, Greg Thelen, Keith Busch,
	Yang Shi, Baolin Wang

Now, node_demotion and next_demotion_node() are placed between
__unmap_and_move() and unmap_and_move().  This hurts code
readability.  So move them near their users in the file.  There is
no functionality change in this patch.

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Wei Xu <weixugc@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 mm/migrate.c | 265 +++++++++++++++++++++++++--------------------------
 1 file changed, 132 insertions(+), 133 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index c503ef1f4360..d487a399253b 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1083,139 +1083,6 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 	return rc;
 }
 
-
-/*
- * node_demotion[] example:
- *
- * Consider a system with two sockets.  Each socket has
- * three classes of memory attached: fast, medium and slow.
- * Each memory class is placed in its own NUMA node.  The
- * CPUs are placed in the node with the "fast" memory.  The
- * 6 NUMA nodes (0-5) might be split among the sockets like
- * this:
- *
- *	Socket A: 0, 1, 2
- *	Socket B: 3, 4, 5
- *
- * When Node 0 fills up, its memory should be migrated to
- * Node 1.  When Node 1 fills up, it should be migrated to
- * Node 2.  The migration paths start on the nodes with the
- * processors (since allocations default to this node) and
- * fast memory, progress through medium and end with the
- * slow memory:
- *
- *	0 -> 1 -> 2 -> stop
- *	3 -> 4 -> 5 -> stop
- *
- * This is represented in the node_demotion[] like this:
- *
- *	{  nr=1, nodes[0]=1 }, // Node 0 migrates to 1
- *	{  nr=1, nodes[0]=2 }, // Node 1 migrates to 2
- *	{  nr=0, nodes[0]=-1 }, // Node 2 does not migrate
- *	{  nr=1, nodes[0]=4 }, // Node 3 migrates to 4
- *	{  nr=1, nodes[0]=5 }, // Node 4 migrates to 5
- *	{  nr=0, nodes[0]=-1 }, // Node 5 does not migrate
- *
- * Moreover, some systems may have multiple slow memory nodes.
- * Suppose a system has one socket with 3 memory nodes: node 0
- * is the fast memory type, nodes 1 and 2 are both the slow
- * memory type, and the distance from the fast node to each
- * slow node is the same.  So the migration path should be:
- *
- *	0 -> 1/2 -> stop
- *
- * This is represented in the node_demotion[] like this:
- *	{ nr=2, {nodes[0]=1, nodes[1]=2} }, // Node 0 migrates to node 1 and node 2
- *	{ nr=0, nodes[0]=-1, }, // Node 1 does not migrate
- *	{ nr=0, nodes[0]=-1, }, // Node 2 does not migrate
- */
-
-/*
- * Writes to this array occur without locking.  Cycles are
- * not allowed: Node X demotes to Y which demotes to X...
- *
- * If multiple reads are performed, a single rcu_read_lock()
- * must be held over all reads to ensure that no cycles are
- * observed.
- */
-#define DEFAULT_DEMOTION_TARGET_NODES 15
-
-#if MAX_NUMNODES < DEFAULT_DEMOTION_TARGET_NODES
-#define DEMOTION_TARGET_NODES	(MAX_NUMNODES - 1)
-#else
-#define DEMOTION_TARGET_NODES	DEFAULT_DEMOTION_TARGET_NODES
-#endif
-
-struct demotion_nodes {
-	unsigned short nr;
-	short nodes[DEMOTION_TARGET_NODES];
-};
-
-static struct demotion_nodes *node_demotion __read_mostly;
-
-/**
- * next_demotion_node() - Get the next node in the demotion path
- * @node: The starting node from which to look up the next node
- *
- * Return: node id for next memory node in the demotion path hierarchy
- * from @node; NUMA_NO_NODE if @node is terminal.  This does not keep
- * @node online or guarantee that it *continues* to be the next demotion
- * target.
- */
-int next_demotion_node(int node)
-{
-	struct demotion_nodes *nd;
-	unsigned short target_nr, index;
-	int target;
-
-	if (!node_demotion)
-		return NUMA_NO_NODE;
-
-	nd = &node_demotion[node];
-
-	/*
-	 * node_demotion[] is updated without excluding this
-	 * function from running.  RCU doesn't provide any
-	 * compiler barriers, so the READ_ONCE() is required
-	 * to avoid compiler reordering or read merging.
-	 *
-	 * Make sure to use RCU over entire code blocks if
-	 * node_demotion[] reads need to be consistent.
-	 */
-	rcu_read_lock();
-	target_nr = READ_ONCE(nd->nr);
-
-	switch (target_nr) {
-	case 0:
-		target = NUMA_NO_NODE;
-		goto out;
-	case 1:
-		index = 0;
-		break;
-	default:
-		/*
-		 * If there are multiple target nodes, just select one
-		 * target node randomly.
-		 *
-		 * Round-robin selection would also work, but it would
-		 * need another field in node_demotion[] to record the
-		 * last selected target node, and updates of that field
-		 * may cause cache ping-pong.  Per-CPU data could avoid
-		 * the caching issue but seems more complicated.  So
-		 * selecting a target node randomly seems better for
-		 * now.
-		 */
-		index = get_random_int() % target_nr;
-		break;
-	}
-
-	target = READ_ONCE(nd->nodes[index]);
-
-out:
-	rcu_read_unlock();
-	return target;
-}
-
 /*
  * Obtain the lock on page, remove all ptes and migrate the page
  * to the newly allocated page in newpage.
@@ -3035,6 +2902,138 @@ void migrate_vma_finalize(struct migrate_vma *migrate)
 EXPORT_SYMBOL(migrate_vma_finalize);
 #endif /* CONFIG_DEVICE_PRIVATE */
 
+/*
+ * node_demotion[] example:
+ *
+ * Consider a system with two sockets.  Each socket has
+ * three classes of memory attached: fast, medium and slow.
+ * Each memory class is placed in its own NUMA node.  The
+ * CPUs are placed in the node with the "fast" memory.  The
+ * 6 NUMA nodes (0-5) might be split among the sockets like
+ * this:
+ *
+ *	Socket A: 0, 1, 2
+ *	Socket B: 3, 4, 5
+ *
+ * When Node 0 fills up, its memory should be migrated to
+ * Node 1.  When Node 1 fills up, it should be migrated to
+ * Node 2.  The migration paths start on the nodes with the
+ * processors (since allocations default to this node) and
+ * fast memory, progress through medium and end with the
+ * slow memory:
+ *
+ *	0 -> 1 -> 2 -> stop
+ *	3 -> 4 -> 5 -> stop
+ *
+ * This is represented in the node_demotion[] like this:
+ *
+ *	{  nr=1, nodes[0]=1 }, // Node 0 migrates to 1
+ *	{  nr=1, nodes[0]=2 }, // Node 1 migrates to 2
+ *	{  nr=0, nodes[0]=-1 }, // Node 2 does not migrate
+ *	{  nr=1, nodes[0]=4 }, // Node 3 migrates to 4
+ *	{  nr=1, nodes[0]=5 }, // Node 4 migrates to 5
+ *	{  nr=0, nodes[0]=-1 }, // Node 5 does not migrate
+ *
+ * Moreover, some systems may have multiple slow memory nodes.
+ * Suppose a system has one socket with 3 memory nodes: node 0
+ * is the fast memory type, nodes 1 and 2 are both the slow
+ * memory type, and the distance from the fast node to each
+ * slow node is the same.  So the migration path should be:
+ *
+ *	0 -> 1/2 -> stop
+ *
+ * This is represented in the node_demotion[] like this:
+ *	{ nr=2, {nodes[0]=1, nodes[1]=2} }, // Node 0 migrates to node 1 and node 2
+ *	{ nr=0, nodes[0]=-1, }, // Node 1 does not migrate
+ *	{ nr=0, nodes[0]=-1, }, // Node 2 does not migrate
+ */
+
+/*
+ * Writes to this array occur without locking.  Cycles are
+ * not allowed: Node X demotes to Y which demotes to X...
+ *
+ * If multiple reads are performed, a single rcu_read_lock()
+ * must be held over all reads to ensure that no cycles are
+ * observed.
+ */
+#define DEFAULT_DEMOTION_TARGET_NODES 15
+
+#if MAX_NUMNODES < DEFAULT_DEMOTION_TARGET_NODES
+#define DEMOTION_TARGET_NODES	(MAX_NUMNODES - 1)
+#else
+#define DEMOTION_TARGET_NODES	DEFAULT_DEMOTION_TARGET_NODES
+#endif
+
+struct demotion_nodes {
+	unsigned short nr;
+	short nodes[DEMOTION_TARGET_NODES];
+};
+
+static struct demotion_nodes *node_demotion __read_mostly;
+
+/**
+ * next_demotion_node() - Get the next node in the demotion path
+ * @node: The starting node from which to look up the next node
+ *
+ * Return: node id for next memory node in the demotion path hierarchy
+ * from @node; NUMA_NO_NODE if @node is terminal.  This does not keep
+ * @node online or guarantee that it *continues* to be the next demotion
+ * target.
+ */
+int next_demotion_node(int node)
+{
+	struct demotion_nodes *nd;
+	unsigned short target_nr, index;
+	int target;
+
+	if (!node_demotion)
+		return NUMA_NO_NODE;
+
+	nd = &node_demotion[node];
+
+	/*
+	 * node_demotion[] is updated without excluding this
+	 * function from running.  RCU doesn't provide any
+	 * compiler barriers, so the READ_ONCE() is required
+	 * to avoid compiler reordering or read merging.
+	 *
+	 * Make sure to use RCU over entire code blocks if
+	 * node_demotion[] reads need to be consistent.
+	 */
+	rcu_read_lock();
+	target_nr = READ_ONCE(nd->nr);
+
+	switch (target_nr) {
+	case 0:
+		target = NUMA_NO_NODE;
+		goto out;
+	case 1:
+		index = 0;
+		break;
+	default:
+		/*
+		 * If there are multiple target nodes, just select one
+		 * target node randomly.
+		 *
+		 * Round-robin selection would also work, but it would
+		 * need another field in node_demotion[] to record the
+		 * last selected target node, and updates of that field
+		 * may cause cache ping-pong.  Per-CPU data could avoid
+		 * the caching issue but seems more complicated.  So
+		 * selecting a target node randomly seems better for
+		 * now.
+		 */
+		index = get_random_int() % target_nr;
+		break;
+	}
+
+	target = READ_ONCE(nd->nodes[index]);
+
+out:
+	rcu_read_unlock();
+	return target;
+}
+
 #if defined(CONFIG_HOTPLUG_CPU)
 /* Disable reclaim-based migration. */
 static void __disable_all_migrate_targets(void)
-- 
2.30.2
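
For context on how the moved code is consumed: next_demotion_node() is
called from the reclaim-driven demotion path in mm/vmscan.c.  Below is
a simplified sketch of that calling pattern, modeled on the
demote_page_list()/alloc_demote_page() helpers that lived in
mm/vmscan.c around this time; treat it as illustrative rather than the
exact upstream source.

	/*
	 * Simplified sketch (illustrative, not the exact upstream code):
	 * try to demote the pages on @demote_pages from @pgdat to the
	 * next node in its demotion path, if it has one.
	 */
	static unsigned int demote_page_list(struct list_head *demote_pages,
					     struct pglist_data *pgdat)
	{
		int target_nid = next_demotion_node(pgdat->node_id);
		unsigned int nr_succeeded = 0;

		if (list_empty(demote_pages))
			return 0;

		/* @pgdat is a terminal node: no demotion target below it. */
		if (target_nid == NUMA_NO_NODE)
			return 0;

		/* Demotion ignores all cpuset and mempolicy settings. */
		migrate_pages(demote_pages, alloc_demote_page, NULL,
			      target_nid, MIGRATE_ASYNC, MR_DEMOTION,
			      &nr_succeeded);

		return nr_succeeded;
	}

Note that a caller makes a single next_demotion_node() call per batch
of pages: the RCU read side and the READ_ONCE() handling are contained
entirely within next_demotion_node(), so callers only need to cope
with a NUMA_NO_NODE result.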



* Re: [PATCH] mm/migrate: move node demotion code to near its user
  2021-12-06  3:12 [PATCH] mm/migrate: move node demotion code to near its user Huang Ying
@ 2021-12-06  5:42 ` Baolin Wang
  2021-12-06 18:43 ` Yang Shi
  2021-12-06 22:12 ` Wei Xu
  2 siblings, 0 replies; 4+ messages in thread
From: Baolin Wang @ 2021-12-06  5:42 UTC (permalink / raw)
  To: Huang Ying, Andrew Morton
  Cc: linux-mm, linux-kernel, Dave Hansen, Yang Shi, Zi Yan,
	Oscar Salvador, Michal Hocko, Wei Xu, David Rientjes,
	Dan Williams, David Hildenbrand, Greg Thelen, Keith Busch,
	Yang Shi



On 2021/12/6 11:12, Huang Ying wrote:
> Now, node_demotion and next_demotion_node() are placed between
> __unmap_and_move() and unmap_and_move().  This hurts code
> readability.  So move them near their users in the file.  There is
> no functionality change in this patch.

LGTM.
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>


* Re: [PATCH] mm/migrate: move node demotion code to near its user
  2021-12-06  3:12 [PATCH] mm/migrate: move node demotion code to near its user Huang Ying
  2021-12-06  5:42 ` Baolin Wang
@ 2021-12-06 18:43 ` Yang Shi
  2021-12-06 22:12 ` Wei Xu
  2 siblings, 0 replies; 4+ messages in thread
From: Yang Shi @ 2021-12-06 18:43 UTC (permalink / raw)
  To: Huang Ying
  Cc: Andrew Morton, Linux MM, Linux Kernel Mailing List, Dave Hansen,
	Zi Yan, Oscar Salvador, Michal Hocko, Wei Xu, David Rientjes,
	Dan Williams, David Hildenbrand, Greg Thelen, Keith Busch,
	Yang Shi, Baolin Wang

On Sun, Dec 5, 2021 at 7:12 PM Huang Ying <ying.huang@intel.com> wrote:
>
> Now, node_demotion and next_demotion_node() are placed between
> __unmap_and_move() and unmap_and_move().  This hurts code
> readability.  So move them near their users in the file.  There is
> no functionality change in this patch.

Reviewed-by: Yang Shi <shy828301@gmail.com>


* Re: [PATCH] mm/migrate: move node demotion code to near its user
  2021-12-06  3:12 [PATCH] mm/migrate: move node demotion code to near its user Huang Ying
  2021-12-06  5:42 ` Baolin Wang
  2021-12-06 18:43 ` Yang Shi
@ 2021-12-06 22:12 ` Wei Xu
  2 siblings, 0 replies; 4+ messages in thread
From: Wei Xu @ 2021-12-06 22:12 UTC (permalink / raw)
  To: Huang Ying
  Cc: Andrew Morton, linux-mm, linux-kernel, Dave Hansen, Yang Shi,
	Zi Yan, Oscar Salvador, Michal Hocko, David Rientjes,
	Dan Williams, David Hildenbrand, Greg Thelen, Keith Busch,
	Yang Shi, Baolin Wang

On Sun, Dec 5, 2021 at 7:12 PM Huang Ying <ying.huang@intel.com> wrote:
>
> Now, node_demotion and next_demotion_node() are placed between
> __unmap_and_move() and unmap_and_move().  This hurts code
> readability.  So move them near their users in the file.  There is
> no functionality change in this patch.

Reviewed-by: Wei Xu <weixugc@google.com>

