linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] mm/page_alloc: add same penalty is enough to get round-robin order
@ 2022-01-23  1:35 Wei Yang
  2022-01-23  1:35 ` [PATCH 2/2] mm/page_alloc: add penalty to local_node Wei Yang
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Wei Yang @ 2022-01-23  1:35 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, linux-kernel, Wei Yang

To make node order in round-robin in the same distance group, we add a
penalty to the first node we got in each round.

To get a round-robin order in the same distance group, we don't need to
decrease the penalty since:

  * find_next_best_node() always iterates node in the same order
  * distance matters more then penalty in find_next_best_node()
  * in nodes with the same distance, the first one would be picked up

So it is fine to increase same penalty when we get the first node in the
same distance group.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
---
 mm/page_alloc.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c5952749ad40..f27afd517652 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6245,13 +6245,12 @@ static void build_thisnode_zonelists(pg_data_t *pgdat)
 static void build_zonelists(pg_data_t *pgdat)
 {
 	static int node_order[MAX_NUMNODES];
-	int node, load, nr_nodes = 0;
+	int node, nr_nodes = 0;
 	nodemask_t used_mask = NODE_MASK_NONE;
 	int local_node, prev_node;
 
 	/* NUMA-aware ordering of nodes */
 	local_node = pgdat->node_id;
-	load = nr_online_nodes;
 	prev_node = local_node;
 
 	memset(node_order, 0, sizeof(node_order));
@@ -6263,11 +6262,10 @@ static void build_zonelists(pg_data_t *pgdat)
 		 */
 		if (node_distance(local_node, node) !=
 		    node_distance(local_node, prev_node))
-			node_load[node] += load;
+			node_load[node] += nr_online_nodes;
 
 		node_order[nr_nodes++] = node;
 		prev_node = node;
-		load--;
 	}
 
 	build_zonelists_in_node_order(pgdat, node_order, nr_nodes);
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/2] mm/page_alloc: add penalty to local_node
  2022-01-23  1:35 [PATCH 1/2] mm/page_alloc: add same penalty is enough to get round-robin order Wei Yang
@ 2022-01-23  1:35 ` Wei Yang
  2022-03-21 21:17 ` [PATCH 1/2] mm/page_alloc: add same penalty is enough to get round-robin order Andrew Morton
  2022-04-05 17:11 ` Vlastimil Babka
  2 siblings, 0 replies; 7+ messages in thread
From: Wei Yang @ 2022-01-23  1:35 UTC (permalink / raw)
  To: akpm
  Cc: linux-mm, linux-kernel, Wei Yang, Krupa Ramakrishnan,
	KAMEZAWA Hiroyuki, Michal Hocko

'Commit 54d032ced983 ("mm/page_alloc: use accumulated load when building
node fallback list")' fix a bug on zonelist order. This let me think
about what would happen if we have a node system with the following
distance matrix.

   Node 0  1  2  3  4  5  6  7
   ----------------------------
   0    10 12 12 12 32 32 32 32
   1    12 10 12 12 32 32 32 32
   2    12 12 10 12 32 32 32 32
   3    12 12 12 10 32 32 32 32
   4    32 32 32 32 10 12 12 12
   5    32 32 32 32 12 10 12 12
   6    32 32 32 32 12 12 10 12
   7    32 32 32 32 12 12 12 10

Unfortunately for this case, the node fallback list gets built like this:

   Node Fallback list
   ---------------------
    0:   0  1  2  3  4  5  6  7
    1:   1  0  2  3  5  6  7  4
    2:   2  3  0  1  6  7  4  5
    3:   3  2  0  1  7  4  5  6
    4:   4  5  6  7  0  1  2  3
    5:   5  4  6  7  1  2  3  0
    6:   6  7  4  5  2  3  0  1
    7:   7  6  4  5  3  0  1  2

We found the order in diagonal block is not expected. The reason is we
don't penalty local node.

After penalty local node, the node fallback list gets built like this:

   Node Fallback list
   ---------------------
   0:   0  1  2  3  4  5  6  7
   1:   1  2  3  0  5  6  7  4
   2:   2  3  0  1  6  7  4  5
   3:   3  0  1  2  7  4  5  6
   4:   4  5  6  7  0  1  2  3
   5:   5  6  7  4  1  2  3  0
   6:   6  7  4  5  2  3  0  1
   7:   7  4  5  6  3  0  1  2

Now the fallback list is in round-robin order.

I am not very familiar with the node distance pattern, while I tried the
following distance matrix. Both of them works with this change.

   Node 0  1  2  3
   ----------------
   0    10 10 10 10
   1    10 10 10 10
   2    10 10 10 10
   3    10 10 10 10

   Node 0  1  2  3  4  5  6  7
   ----------------------------
   0    10 10 10 10 32 32 32 32
   1    10 10 10 10 32 32 32 32
   2    10 10 10 10 32 32 32 32
   3    10 10 10 10 32 32 32 32
   4    32 32 32 32 10 10 10 10
   5    32 32 32 32 10 10 10 10
   6    32 32 32 32 10 10 10 10
   7    32 32 32 32 10 10 10 10

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
CC: Krupa Ramakrishnan <krupa.ramakrishnan@amd.com>
CC: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
CC: Michal Hocko <mhocko@suse.com>
---
 mm/page_alloc.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f27afd517652..0cc25429a17e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6260,8 +6260,9 @@ static void build_zonelists(pg_data_t *pgdat)
 		 * So adding penalty to the first node in same
 		 * distance group to make it round-robin.
 		 */
-		if (node_distance(local_node, node) !=
-		    node_distance(local_node, prev_node))
+		if ((node_distance(local_node, node) !=
+		    node_distance(local_node, prev_node)) ||
+		    node == local_node)
 			node_load[node] += nr_online_nodes;
 
 		node_order[nr_nodes++] = node;
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] mm/page_alloc: add same penalty is enough to get round-robin order
  2022-01-23  1:35 [PATCH 1/2] mm/page_alloc: add same penalty is enough to get round-robin order Wei Yang
  2022-01-23  1:35 ` [PATCH 2/2] mm/page_alloc: add penalty to local_node Wei Yang
@ 2022-03-21 21:17 ` Andrew Morton
  2022-03-26 15:48   ` Wei Yang
  2022-04-05 17:11 ` Vlastimil Babka
  2 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2022-03-21 21:17 UTC (permalink / raw)
  To: Wei Yang; +Cc: linux-mm, linux-kernel

Could someone please help review this series?

Thanks.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] mm/page_alloc: add same penalty is enough to get round-robin order
  2022-03-21 21:17 ` [PATCH 1/2] mm/page_alloc: add same penalty is enough to get round-robin order Andrew Morton
@ 2022-03-26 15:48   ` Wei Yang
  0 siblings, 0 replies; 7+ messages in thread
From: Wei Yang @ 2022-03-26 15:48 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Wei Yang, linux-mm, linux-kernel

On Mon, Mar 21, 2022 at 02:17:23PM -0700, Andrew Morton wrote:
>Could someone please help review this series?
>

Andrew,

Sorry for my mistake, I forgot one element in my emulation code for Patch 2.
Current kernel could handle the situation described in commit log properly.

So Patch 2 is not necessary. Sorry for the disturbance.

>Thanks.

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] mm/page_alloc: add same penalty is enough to get round-robin order
  2022-01-23  1:35 [PATCH 1/2] mm/page_alloc: add same penalty is enough to get round-robin order Wei Yang
  2022-01-23  1:35 ` [PATCH 2/2] mm/page_alloc: add penalty to local_node Wei Yang
  2022-03-21 21:17 ` [PATCH 1/2] mm/page_alloc: add same penalty is enough to get round-robin order Andrew Morton
@ 2022-04-05 17:11 ` Vlastimil Babka
  2022-04-06 23:47   ` Wei Yang
  2 siblings, 1 reply; 7+ messages in thread
From: Vlastimil Babka @ 2022-04-05 17:11 UTC (permalink / raw)
  To: Wei Yang, akpm; +Cc: linux-mm, linux-kernel

On 1/23/22 02:35, Wei Yang wrote:
> To make node order in round-robin in the same distance group, we add a
> penalty to the first node we got in each round.
> 
> To get a round-robin order in the same distance group, we don't need to
> decrease the penalty since:
> 
>   * find_next_best_node() always iterates node in the same order
>   * distance matters more then penalty in find_next_best_node()
>   * in nodes with the same distance, the first one would be picked up
> 
> So it is fine to increase same penalty when we get the first node in the
> same distance group.

With that logic I'm not even sure if we need nr_online_nodes as penalty or
it could be just 1. Would you know?

> 
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> ---
>  mm/page_alloc.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c5952749ad40..f27afd517652 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6245,13 +6245,12 @@ static void build_thisnode_zonelists(pg_data_t *pgdat)
>  static void build_zonelists(pg_data_t *pgdat)
>  {
>  	static int node_order[MAX_NUMNODES];
> -	int node, load, nr_nodes = 0;
> +	int node, nr_nodes = 0;
>  	nodemask_t used_mask = NODE_MASK_NONE;
>  	int local_node, prev_node;
>  
>  	/* NUMA-aware ordering of nodes */
>  	local_node = pgdat->node_id;
> -	load = nr_online_nodes;
>  	prev_node = local_node;
>  
>  	memset(node_order, 0, sizeof(node_order));
> @@ -6263,11 +6262,10 @@ static void build_zonelists(pg_data_t *pgdat)
>  		 */
>  		if (node_distance(local_node, node) !=
>  		    node_distance(local_node, prev_node))
> -			node_load[node] += load;
> +			node_load[node] += nr_online_nodes;
>  
>  		node_order[nr_nodes++] = node;
>  		prev_node = node;
> -		load--;
>  	}
>  
>  	build_zonelists_in_node_order(pgdat, node_order, nr_nodes);


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] mm/page_alloc: add same penalty is enough to get round-robin order
  2022-04-05 17:11 ` Vlastimil Babka
@ 2022-04-06 23:47   ` Wei Yang
  2022-04-07  9:53     ` Vlastimil Babka
  0 siblings, 1 reply; 7+ messages in thread
From: Wei Yang @ 2022-04-06 23:47 UTC (permalink / raw)
  To: Vlastimil Babka; +Cc: Wei Yang, akpm, linux-mm, linux-kernel

On Tue, Apr 05, 2022 at 07:11:12PM +0200, Vlastimil Babka wrote:
>On 1/23/22 02:35, Wei Yang wrote:
>> To make node order in round-robin in the same distance group, we add a
>> penalty to the first node we got in each round.
>> 
>> To get a round-robin order in the same distance group, we don't need to
>> decrease the penalty since:
>> 
>>   * find_next_best_node() always iterates node in the same order
>>   * distance matters more then penalty in find_next_best_node()
>>   * in nodes with the same distance, the first one would be picked up
>> 
>> So it is fine to increase same penalty when we get the first node in the
>> same distance group.
>
>With that logic I'm not even sure if we need nr_online_nodes as penalty or
>it could be just 1. Would you know?

Yes, it has the same effect.

[    0.031849] Fallback order for Node 0: 0 1 2 3 4 5 6 7
[    0.031854] Fallback order for Node 1: 1 2 3 0 5 6 7 4
[    0.031857] Fallback order for Node 2: 2 3 0 1 6 7 4 5
[    0.031860] Fallback order for Node 3: 3 0 1 2 7 4 5 6
[    0.031864] Fallback order for Node 4: 4 5 6 7 0 1 2 3
[    0.031867] Fallback order for Node 5: 5 6 7 4 1 2 3 0
[    0.031870] Fallback order for Node 6: 6 7 4 5 2 3 0 1
[    0.031873] Fallback order for Node 7: 7 4 5 6 3 0 1 2

Do you prefer to set it to 1?

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] mm/page_alloc: add same penalty is enough to get round-robin order
  2022-04-06 23:47   ` Wei Yang
@ 2022-04-07  9:53     ` Vlastimil Babka
  0 siblings, 0 replies; 7+ messages in thread
From: Vlastimil Babka @ 2022-04-07  9:53 UTC (permalink / raw)
  To: Wei Yang; +Cc: akpm, linux-mm, linux-kernel, Oscar Salvador, David Hildenbrand

On 4/7/22 01:47, Wei Yang wrote:
> On Tue, Apr 05, 2022 at 07:11:12PM +0200, Vlastimil Babka wrote:
>>On 1/23/22 02:35, Wei Yang wrote:
>>> To make node order in round-robin in the same distance group, we add a
>>> penalty to the first node we got in each round.
>>> 
>>> To get a round-robin order in the same distance group, we don't need to
>>> decrease the penalty since:
>>> 
>>>   * find_next_best_node() always iterates node in the same order
>>>   * distance matters more then penalty in find_next_best_node()
>>>   * in nodes with the same distance, the first one would be picked up
>>> 
>>> So it is fine to increase same penalty when we get the first node in the
>>> same distance group.
>>
>>With that logic I'm not even sure if we need nr_online_nodes as penalty or
>>it could be just 1. Would you know?
> 
> Yes, it has the same effect.

Good.

> [    0.031849] Fallback order for Node 0: 0 1 2 3 4 5 6 7
> [    0.031854] Fallback order for Node 1: 1 2 3 0 5 6 7 4
> [    0.031857] Fallback order for Node 2: 2 3 0 1 6 7 4 5
> [    0.031860] Fallback order for Node 3: 3 0 1 2 7 4 5 6
> [    0.031864] Fallback order for Node 4: 4 5 6 7 0 1 2 3
> [    0.031867] Fallback order for Node 5: 5 6 7 4 1 2 3 0
> [    0.031870] Fallback order for Node 6: 6 7 4 5 2 3 0 1
> [    0.031873] Fallback order for Node 7: 7 4 5 6 3 0 1 2
> 
> Do you prefer to set it to 1?

Yeah I think it's worth simplyfing as much as feasible, so the code is more
obvious. I think we can also then remove the MAX_NODE_LOAD #define and usage.

Also please Cc at least Oscar and David (added to Cc now) on v2 as they have
been active in memory hotplug area recently.

Thanks,
Vlastimil

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-04-07  9:53 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-23  1:35 [PATCH 1/2] mm/page_alloc: add same penalty is enough to get round-robin order Wei Yang
2022-01-23  1:35 ` [PATCH 2/2] mm/page_alloc: add penalty to local_node Wei Yang
2022-03-21 21:17 ` [PATCH 1/2] mm/page_alloc: add same penalty is enough to get round-robin order Andrew Morton
2022-03-26 15:48   ` Wei Yang
2022-04-05 17:11 ` Vlastimil Babka
2022-04-06 23:47   ` Wei Yang
2022-04-07  9:53     ` Vlastimil Babka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).