* [PATCH v2 0/2] Support multiple target nodes demotion @ 2021-11-11 7:48 Baolin Wang 2021-11-11 7:48 ` [PATCH v2 1/2] mm: migrate: " Baolin Wang 2021-11-11 7:48 ` [PATCH v2 2/2] mm: migrate: Allocate the node_demotion structure dynamically Baolin Wang 0 siblings, 2 replies; 8+ messages in thread From: Baolin Wang @ 2021-11-11 7:48 UTC (permalink / raw) To: akpm, ying.huang, dave.hansen Cc: ziy, osalvador, shy828301, baolin.wang, zhongjiang-ali, xlpang, linux-mm, linux-kernel Hi, This patch set is used to support multiple target nodes demotion if a system has multiple slow memory nodes. Please help to review. Thanks. Changes from v1: - Add a new patch to allocate the node_demotion dynamically. - Update some comments. - Simplify some variables' name. Changes from RFC v2: - Change to 'short' type for target nodes array. - Remove nodemask instead selecting target node directly. - Add WARN_ONCE() if the target nodes exceed the maximum value. Changes from RFC v1: - Re-define the node_demotion structure. - Set up multiple target nodes by validating the node distance. - Add more comments. Baolin Wang (2): mm: migrate: Support multiple target nodes demotion mm: migrate: Allocate the node_demotion structure dynamically mm/migrate.c | 158 +++++++++++++++++++++++++++++++++++++++++++++-------------- 1 file changed, 122 insertions(+), 36 deletions(-) -- 1.8.3.1 ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v2 1/2] mm: migrate: Support multiple target nodes demotion 2021-11-11 7:48 [PATCH v2 0/2] Support multiple target nodes demotion Baolin Wang @ 2021-11-11 7:48 ` Baolin Wang 2021-11-11 8:20 ` Huang, Ying 2021-11-11 7:48 ` [PATCH v2 2/2] mm: migrate: Allocate the node_demotion structure dynamically Baolin Wang 1 sibling, 1 reply; 8+ messages in thread From: Baolin Wang @ 2021-11-11 7:48 UTC (permalink / raw) To: akpm, ying.huang, dave.hansen Cc: ziy, osalvador, shy828301, baolin.wang, zhongjiang-ali, xlpang, linux-mm, linux-kernel We have some machines with multiple memory types like below, which have one fast (DRAM) memory node and two slow (persistent memory) memory nodes. According to current node demotion policy, if node 0 fills up, its memory should be migrated to node 1, when node 1 fills up, its memory will be migrated to node 2: node 0 -> node 1 -> node 2 ->stop. But this is not efficient and suitbale memory migration route for our machine with multiple slow memory nodes. Since the distance between node 0 to node 1 and node 0 to node 2 is equal, and memory migration between slow memory nodes will increase persistent memory bandwidth greatly, which will hurt the whole system's performance. Thus for this case, we can treat the slow memory node 1 and node 2 as a whole slow memory region, and we should migrate memory from node 0 to node 1 and node 2 if node 0 fills up. This patch changes the node_demotion data structure to support multiple target nodes, and establishes the migration path to support multiple target nodes with validating if the node distance is the best or not. available: 3 nodes (0-2) node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 node 0 size: 62153 MB node 0 free: 55135 MB node 1 cpus: node 1 size: 127007 MB node 1 free: 126930 MB node 2 cpus: node 2 size: 126968 MB node 2 free: 126878 MB node distances: node 0 1 2 0: 10 20 20 1: 20 10 20 2: 20 20 10 Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> --- mm/migrate.c | 138 +++++++++++++++++++++++++++++++++++++++++++---------------- 1 file changed, 102 insertions(+), 36 deletions(-) diff --git a/mm/migrate.c b/mm/migrate.c index cf25b00..126e9e6 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -50,6 +50,7 @@ #include <linux/ptrace.h> #include <linux/oom.h> #include <linux/memory.h> +#include <linux/random.h> #include <asm/tlbflush.h> @@ -1119,12 +1120,25 @@ static int __unmap_and_move(struct page *page, struct page *newpage, * * This is represented in the node_demotion[] like this: * - * { 1, // Node 0 migrates to 1 - * 2, // Node 1 migrates to 2 - * -1, // Node 2 does not migrate - * 4, // Node 3 migrates to 4 - * 5, // Node 4 migrates to 5 - * -1} // Node 5 does not migrate + * { nr=1, nodes[0]=1 }, // Node 0 migrates to 1 + * { nr=1, nodes[0]=2 }, // Node 1 migrates to 2 + * { nr=0, nodes[0]=-1 }, // Node 2 does not migrate + * { nr=1, nodes[0]=4 }, // Node 3 migrates to 4 + * { nr=1, nodes[0]=5 }, // Node 4 migrates to 5 + * { nr=0, nodes[0]=-1 }, // Node 5 does not migrate + * + * Moreover some systems may have multiple slow memory nodes. + * Suppose a system has one socket with 3 memory nodes, node 0 + * is fast memory type, and node 1/2 both are slow memory + * type, and the distance between fast memory node and slow + * memory node is same. So the migration path should be: + * + * 0 -> 1/2 -> stop + * + * This is represented in the node_demotion[] like this: + * { nr=2, {nodes[0]=1, nodes[1]=2} }, // Node 0 migrates to node 1 and node 2 + * { nr=0, nodes[0]=-1, }, // Node 1 dose not migrate + * { nr=0, nodes[0]=-1, }, // Node 2 does not migrate */ /* @@ -1135,8 +1149,13 @@ static int __unmap_and_move(struct page *page, struct page *newpage, * must be held over all reads to ensure that no cycles are * observed. */ -static int node_demotion[MAX_NUMNODES] __read_mostly = - {[0 ... MAX_NUMNODES - 1] = NUMA_NO_NODE}; +#define DEFAULT_DEMOTION_TARGET_NODES 15 +struct demotion_nodes { + unsigned short nr; + short nodes[DEFAULT_DEMOTION_TARGET_NODES]; +}; + +static struct demotion_nodes node_demotion[MAX_NUMNODES] __read_mostly; /** * next_demotion_node() - Get the next node in the demotion path @@ -1149,6 +1168,8 @@ static int __unmap_and_move(struct page *page, struct page *newpage, */ int next_demotion_node(int node) { + struct demotion_nodes *nd = &node_demotion[node]; + unsigned short target_nr, index; int target; /* @@ -1161,9 +1182,25 @@ int next_demotion_node(int node) * node_demotion[] reads need to be consistent. */ rcu_read_lock(); - target = READ_ONCE(node_demotion[node]); - rcu_read_unlock(); + target_nr = READ_ONCE(nd->nr); + + if (target_nr == 0) { + target = NUMA_NO_NODE; + goto out; + } else if (target_nr == 1) { + index = 0; + } else { + /* + * If there are multiple target nodes, just select one + * target node randomly. + */ + index = get_random_int() % target_nr; + } + + target = READ_ONCE(nd->nodes[index]); +out: + rcu_read_unlock(); return target; } @@ -2974,10 +3011,13 @@ void migrate_vma_finalize(struct migrate_vma *migrate) /* Disable reclaim-based migration. */ static void __disable_all_migrate_targets(void) { - int node; + int node, i; - for_each_online_node(node) - node_demotion[node] = NUMA_NO_NODE; + for_each_online_node(node) { + node_demotion[node].nr = 0; + for (i = 0; i < DEFAULT_DEMOTION_TARGET_NODES; i++) + node_demotion[node].nodes[i] = NUMA_NO_NODE; + } } static void disable_all_migrate_targets(void) @@ -3004,26 +3044,35 @@ static void disable_all_migrate_targets(void) * Failing here is OK. It might just indicate * being at the end of a chain. */ -static int establish_migrate_target(int node, nodemask_t *used) +static int establish_migrate_target(int node, nodemask_t *used, + int best_distance) { - int migration_target; + int migration_target, index, val; + struct demotion_nodes *nd = &node_demotion[node]; + + migration_target = find_next_best_node(node, used); + if (migration_target == NUMA_NO_NODE) + return NUMA_NO_NODE; /* - * Can not set a migration target on a - * node with it already set. - * - * No need for READ_ONCE() here since this - * in the write path for node_demotion[]. - * This should be the only thread writing. + * If the node has been set a migration target node before, + * which means it's the best distance between them. Still + * check if this node can be demoted to other target nodes + * if they have a same best distance. */ - if (node_demotion[node] != NUMA_NO_NODE) - return NUMA_NO_NODE; + if (best_distance != -1) { + val = node_distance(node, migration_target); + if (val > best_distance) + return NUMA_NO_NODE; + } - migration_target = find_next_best_node(node, used); - if (migration_target == NUMA_NO_NODE) + index = nd->nr; + if (WARN_ONCE(index >= DEFAULT_DEMOTION_TARGET_NODES, + "Exceeds maximum demotion target nodes\n")) return NUMA_NO_NODE; - node_demotion[node] = migration_target; + nd->nodes[index] = migration_target; + nd->nr++; return migration_target; } @@ -3039,7 +3088,9 @@ static int establish_migrate_target(int node, nodemask_t *used) * * The difference here is that cycles must be avoided. If * node0 migrates to node1, then neither node1, nor anything - * node1 migrates to can migrate to node0. + * node1 migrates to can migrate to node0. Also one node can + * be migrated to multiple nodes if the target nodes all have + * a same best-distance against the source node. * * This function can run simultaneously with readers of * node_demotion[]. However, it can not run simultaneously @@ -3051,7 +3102,7 @@ static void __set_migration_target_nodes(void) nodemask_t next_pass = NODE_MASK_NONE; nodemask_t this_pass = NODE_MASK_NONE; nodemask_t used_targets = NODE_MASK_NONE; - int node; + int node, best_distance; /* * Avoid any oddities like cycles that could occur @@ -3080,18 +3131,33 @@ static void __set_migration_target_nodes(void) * multiple source nodes to share a destination. */ nodes_or(used_targets, used_targets, this_pass); - for_each_node_mask(node, this_pass) { - int target_node = establish_migrate_target(node, &used_targets); - if (target_node == NUMA_NO_NODE) - continue; + for_each_node_mask(node, this_pass) { + best_distance = -1; /* - * Visit targets from this pass in the next pass. - * Eventually, every node will have been part of - * a pass, and will become set in 'used_targets'. + * Try to set up the migration path for the node, and the target + * migration nodes can be multiple, so doing a loop to find all + * the target nodes if they all have a best node distance. */ - node_set(target_node, next_pass); + do { + int target_node = + establish_migrate_target(node, &used_targets, + best_distance); + + if (target_node == NUMA_NO_NODE) + break; + + if (best_distance == -1) + best_distance = node_distance(node, target_node); + + /* + * Visit targets from this pass in the next pass. + * Eventually, every node will have been part of + * a pass, and will become set in 'used_targets'. + */ + node_set(target_node, next_pass); + } while (1); } /* * 'next_pass' contains nodes which became migration -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH v2 1/2] mm: migrate: Support multiple target nodes demotion 2021-11-11 7:48 ` [PATCH v2 1/2] mm: migrate: " Baolin Wang @ 2021-11-11 8:20 ` Huang, Ying 2021-11-11 10:52 ` Baolin Wang 0 siblings, 1 reply; 8+ messages in thread From: Huang, Ying @ 2021-11-11 8:20 UTC (permalink / raw) To: Baolin Wang Cc: akpm, dave.hansen, ziy, osalvador, shy828301, zhongjiang-ali, xlpang, linux-mm, linux-kernel Baolin Wang <baolin.wang@linux.alibaba.com> writes: > We have some machines with multiple memory types like below, which > have one fast (DRAM) memory node and two slow (persistent memory) memory > nodes. According to current node demotion policy, if node 0 fills up, > its memory should be migrated to node 1, when node 1 fills up, its > memory will be migrated to node 2: node 0 -> node 1 -> node 2 ->stop. > > But this is not efficient and suitbale memory migration route > for our machine with multiple slow memory nodes. Since the distance > between node 0 to node 1 and node 0 to node 2 is equal, and memory > migration between slow memory nodes will increase persistent memory > bandwidth greatly, which will hurt the whole system's performance. > > Thus for this case, we can treat the slow memory node 1 and node 2 > as a whole slow memory region, and we should migrate memory from > node 0 to node 1 and node 2 if node 0 fills up. > > This patch changes the node_demotion data structure to support multiple > target nodes, and establishes the migration path to support multiple > target nodes with validating if the node distance is the best or not. > > available: 3 nodes (0-2) > node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 > node 0 size: 62153 MB > node 0 free: 55135 MB > node 1 cpus: > node 1 size: 127007 MB > node 1 free: 126930 MB > node 2 cpus: > node 2 size: 126968 MB > node 2 free: 126878 MB > node distances: > node 0 1 2 > 0: 10 20 20 > 1: 20 10 20 > 2: 20 20 10 > > Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> > --- > mm/migrate.c | 138 +++++++++++++++++++++++++++++++++++++++++++---------------- > 1 file changed, 102 insertions(+), 36 deletions(-) > > diff --git a/mm/migrate.c b/mm/migrate.c > index cf25b00..126e9e6 100644 > --- a/mm/migrate.c > +++ b/mm/migrate.c > @@ -50,6 +50,7 @@ > #include <linux/ptrace.h> > #include <linux/oom.h> > #include <linux/memory.h> > +#include <linux/random.h> > > #include <asm/tlbflush.h> > > @@ -1119,12 +1120,25 @@ static int __unmap_and_move(struct page *page, struct page *newpage, > * > * This is represented in the node_demotion[] like this: > * > - * { 1, // Node 0 migrates to 1 > - * 2, // Node 1 migrates to 2 > - * -1, // Node 2 does not migrate > - * 4, // Node 3 migrates to 4 > - * 5, // Node 4 migrates to 5 > - * -1} // Node 5 does not migrate > + * { nr=1, nodes[0]=1 }, // Node 0 migrates to 1 > + * { nr=1, nodes[0]=2 }, // Node 1 migrates to 2 > + * { nr=0, nodes[0]=-1 }, // Node 2 does not migrate > + * { nr=1, nodes[0]=4 }, // Node 3 migrates to 4 > + * { nr=1, nodes[0]=5 }, // Node 4 migrates to 5 > + * { nr=0, nodes[0]=-1 }, // Node 5 does not migrate > + * > + * Moreover some systems may have multiple slow memory nodes. > + * Suppose a system has one socket with 3 memory nodes, node 0 > + * is fast memory type, and node 1/2 both are slow memory > + * type, and the distance between fast memory node and slow > + * memory node is same. So the migration path should be: > + * > + * 0 -> 1/2 -> stop > + * > + * This is represented in the node_demotion[] like this: > + * { nr=2, {nodes[0]=1, nodes[1]=2} }, // Node 0 migrates to node 1 and node 2 > + * { nr=0, nodes[0]=-1, }, // Node 1 dose not migrate > + * { nr=0, nodes[0]=-1, }, // Node 2 does not migrate > */ > > /* > @@ -1135,8 +1149,13 @@ static int __unmap_and_move(struct page *page, struct page *newpage, > * must be held over all reads to ensure that no cycles are > * observed. > */ > -static int node_demotion[MAX_NUMNODES] __read_mostly = > - {[0 ... MAX_NUMNODES - 1] = NUMA_NO_NODE}; > +#define DEFAULT_DEMOTION_TARGET_NODES 15 > +struct demotion_nodes { > + unsigned short nr; > + short nodes[DEFAULT_DEMOTION_TARGET_NODES]; > +}; > + > +static struct demotion_nodes node_demotion[MAX_NUMNODES] __read_mostly; > > /** > * next_demotion_node() - Get the next node in the demotion path > @@ -1149,6 +1168,8 @@ static int __unmap_and_move(struct page *page, struct page *newpage, > */ > int next_demotion_node(int node) > { > + struct demotion_nodes *nd = &node_demotion[node]; > + unsigned short target_nr, index; > int target; > > /* > @@ -1161,9 +1182,25 @@ int next_demotion_node(int node) > * node_demotion[] reads need to be consistent. > */ > rcu_read_lock(); > - target = READ_ONCE(node_demotion[node]); > - rcu_read_unlock(); > + target_nr = READ_ONCE(nd->nr); > + > + if (target_nr == 0) { > + target = NUMA_NO_NODE; > + goto out; > + } else if (target_nr == 1) { > + index = 0; > + } else { > + /* > + * If there are multiple target nodes, just select one > + * target node randomly. > + */ > + index = get_random_int() % target_nr; > + } How about use "switch" here? Best Regards, Huang, Ying > + > + target = READ_ONCE(nd->nodes[index]); > > +out: > + rcu_read_unlock(); > return target; > } > [snip] ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 1/2] mm: migrate: Support multiple target nodes demotion 2021-11-11 8:20 ` Huang, Ying @ 2021-11-11 10:52 ` Baolin Wang 0 siblings, 0 replies; 8+ messages in thread From: Baolin Wang @ 2021-11-11 10:52 UTC (permalink / raw) To: Huang, Ying Cc: akpm, dave.hansen, ziy, osalvador, shy828301, zhongjiang-ali, xlpang, linux-mm, linux-kernel On 2021/11/11 16:20, Huang, Ying wrote: > Baolin Wang <baolin.wang@linux.alibaba.com> writes: > >> We have some machines with multiple memory types like below, which >> have one fast (DRAM) memory node and two slow (persistent memory) memory >> nodes. According to current node demotion policy, if node 0 fills up, >> its memory should be migrated to node 1, when node 1 fills up, its >> memory will be migrated to node 2: node 0 -> node 1 -> node 2 ->stop. >> >> But this is not efficient and suitbale memory migration route >> for our machine with multiple slow memory nodes. Since the distance >> between node 0 to node 1 and node 0 to node 2 is equal, and memory >> migration between slow memory nodes will increase persistent memory >> bandwidth greatly, which will hurt the whole system's performance. >> >> Thus for this case, we can treat the slow memory node 1 and node 2 >> as a whole slow memory region, and we should migrate memory from >> node 0 to node 1 and node 2 if node 0 fills up. >> >> This patch changes the node_demotion data structure to support multiple >> target nodes, and establishes the migration path to support multiple >> target nodes with validating if the node distance is the best or not. >> >> available: 3 nodes (0-2) >> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 >> node 0 size: 62153 MB >> node 0 free: 55135 MB >> node 1 cpus: >> node 1 size: 127007 MB >> node 1 free: 126930 MB >> node 2 cpus: >> node 2 size: 126968 MB >> node 2 free: 126878 MB >> node distances: >> node 0 1 2 >> 0: 10 20 20 >> 1: 20 10 20 >> 2: 20 20 10 >> >> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> >> --- >> mm/migrate.c | 138 +++++++++++++++++++++++++++++++++++++++++++---------------- >> 1 file changed, 102 insertions(+), 36 deletions(-) >> >> diff --git a/mm/migrate.c b/mm/migrate.c >> index cf25b00..126e9e6 100644 >> --- a/mm/migrate.c >> +++ b/mm/migrate.c >> @@ -50,6 +50,7 @@ >> #include <linux/ptrace.h> >> #include <linux/oom.h> >> #include <linux/memory.h> >> +#include <linux/random.h> >> >> #include <asm/tlbflush.h> >> >> @@ -1119,12 +1120,25 @@ static int __unmap_and_move(struct page *page, struct page *newpage, >> * >> * This is represented in the node_demotion[] like this: >> * >> - * { 1, // Node 0 migrates to 1 >> - * 2, // Node 1 migrates to 2 >> - * -1, // Node 2 does not migrate >> - * 4, // Node 3 migrates to 4 >> - * 5, // Node 4 migrates to 5 >> - * -1} // Node 5 does not migrate >> + * { nr=1, nodes[0]=1 }, // Node 0 migrates to 1 >> + * { nr=1, nodes[0]=2 }, // Node 1 migrates to 2 >> + * { nr=0, nodes[0]=-1 }, // Node 2 does not migrate >> + * { nr=1, nodes[0]=4 }, // Node 3 migrates to 4 >> + * { nr=1, nodes[0]=5 }, // Node 4 migrates to 5 >> + * { nr=0, nodes[0]=-1 }, // Node 5 does not migrate >> + * >> + * Moreover some systems may have multiple slow memory nodes. >> + * Suppose a system has one socket with 3 memory nodes, node 0 >> + * is fast memory type, and node 1/2 both are slow memory >> + * type, and the distance between fast memory node and slow >> + * memory node is same. So the migration path should be: >> + * >> + * 0 -> 1/2 -> stop >> + * >> + * This is represented in the node_demotion[] like this: >> + * { nr=2, {nodes[0]=1, nodes[1]=2} }, // Node 0 migrates to node 1 and node 2 >> + * { nr=0, nodes[0]=-1, }, // Node 1 dose not migrate >> + * { nr=0, nodes[0]=-1, }, // Node 2 does not migrate >> */ >> >> /* >> @@ -1135,8 +1149,13 @@ static int __unmap_and_move(struct page *page, struct page *newpage, >> * must be held over all reads to ensure that no cycles are >> * observed. >> */ >> -static int node_demotion[MAX_NUMNODES] __read_mostly = >> - {[0 ... MAX_NUMNODES - 1] = NUMA_NO_NODE}; >> +#define DEFAULT_DEMOTION_TARGET_NODES 15 >> +struct demotion_nodes { >> + unsigned short nr; >> + short nodes[DEFAULT_DEMOTION_TARGET_NODES]; >> +}; >> + >> +static struct demotion_nodes node_demotion[MAX_NUMNODES] __read_mostly; >> >> /** >> * next_demotion_node() - Get the next node in the demotion path >> @@ -1149,6 +1168,8 @@ static int __unmap_and_move(struct page *page, struct page *newpage, >> */ >> int next_demotion_node(int node) >> { >> + struct demotion_nodes *nd = &node_demotion[node]; >> + unsigned short target_nr, index; >> int target; >> >> /* >> @@ -1161,9 +1182,25 @@ int next_demotion_node(int node) >> * node_demotion[] reads need to be consistent. >> */ >> rcu_read_lock(); >> - target = READ_ONCE(node_demotion[node]); >> - rcu_read_unlock(); >> + target_nr = READ_ONCE(nd->nr); >> + >> + if (target_nr == 0) { >> + target = NUMA_NO_NODE; >> + goto out; >> + } else if (target_nr == 1) { >> + index = 0; >> + } else { >> + /* >> + * If there are multiple target nodes, just select one >> + * target node randomly. >> + */ >> + index = get_random_int() % target_nr; >> + } > > How about use "switch" here? Sure, will do. Thanks. ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v2 2/2] mm: migrate: Allocate the node_demotion structure dynamically 2021-11-11 7:48 [PATCH v2 0/2] Support multiple target nodes demotion Baolin Wang 2021-11-11 7:48 ` [PATCH v2 1/2] mm: migrate: " Baolin Wang @ 2021-11-11 7:48 ` Baolin Wang 2021-11-11 8:51 ` Huang, Ying 1 sibling, 1 reply; 8+ messages in thread From: Baolin Wang @ 2021-11-11 7:48 UTC (permalink / raw) To: akpm, ying.huang, dave.hansen Cc: ziy, osalvador, shy828301, baolin.wang, zhongjiang-ali, xlpang, linux-mm, linux-kernel For the worst case (MAX_NUMNODES=1024), the node_demotion structure can consume 32k bytes, which appears too large, so we can change to allocate node_demotion dynamically at initialization time. Meanwhile allocating the target demotion nodes array dynamically to select a suitable size according to the MAX_NUMNODES. Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> --- mm/migrate.c | 38 +++++++++++++++++++++++++++++--------- 1 file changed, 29 insertions(+), 9 deletions(-) diff --git a/mm/migrate.c b/mm/migrate.c index 126e9e6..0145b38 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -1152,10 +1152,11 @@ static int __unmap_and_move(struct page *page, struct page *newpage, #define DEFAULT_DEMOTION_TARGET_NODES 15 struct demotion_nodes { unsigned short nr; - short nodes[DEFAULT_DEMOTION_TARGET_NODES]; + short nodes[]; }; -static struct demotion_nodes node_demotion[MAX_NUMNODES] __read_mostly; +static struct demotion_nodes *node_demotion[MAX_NUMNODES] __read_mostly; +static unsigned short target_nodes_max; /** * next_demotion_node() - Get the next node in the demotion path @@ -1168,10 +1169,13 @@ struct demotion_nodes { */ int next_demotion_node(int node) { - struct demotion_nodes *nd = &node_demotion[node]; + struct demotion_nodes *nd = node_demotion[node]; unsigned short target_nr, index; int target; + if (!nd) + return NUMA_NO_NODE; + /* * node_demotion[] is updated without excluding this * function from running. RCU doesn't provide any @@ -3014,9 +3018,9 @@ static void __disable_all_migrate_targets(void) int node, i; for_each_online_node(node) { - node_demotion[node].nr = 0; - for (i = 0; i < DEFAULT_DEMOTION_TARGET_NODES; i++) - node_demotion[node].nodes[i] = NUMA_NO_NODE; + node_demotion[node]->nr = 0; + for (i = 0; i < target_nodes_max; i++) + node_demotion[node]->nodes[i] = NUMA_NO_NODE; } } @@ -3048,7 +3052,10 @@ static int establish_migrate_target(int node, nodemask_t *used, int best_distance) { int migration_target, index, val; - struct demotion_nodes *nd = &node_demotion[node]; + struct demotion_nodes *nd = node_demotion[node]; + + if (WARN_ONCE(!nd, "Can not set up migration path for node:%d\n", node)) + return NUMA_NO_NODE; migration_target = find_next_best_node(node, used); if (migration_target == NUMA_NO_NODE) @@ -3067,7 +3074,7 @@ static int establish_migrate_target(int node, nodemask_t *used, } index = nd->nr; - if (WARN_ONCE(index >= DEFAULT_DEMOTION_TARGET_NODES, + if (WARN_ONCE(index >= target_nodes_max, "Exceeds maximum demotion target nodes\n")) return NUMA_NO_NODE; @@ -3256,7 +3263,20 @@ static int migration_offline_cpu(unsigned int cpu) static int __init migrate_on_reclaim_init(void) { - int ret; + struct demotion_nodes *nd; + int ret, node; + + /* Keep the maximum target demotion nodes are less than MAX_NUMNODES. */ + target_nodes_max = min_t(unsigned short, DEFAULT_DEMOTION_TARGET_NODES, + MAX_NUMNODES - 1); + for_each_node(node) { + nd = kmalloc(struct_size(nd, nodes, target_nodes_max), + GFP_KERNEL); + if (!nd) + continue; + + node_demotion[node] = nd; + } ret = cpuhp_setup_state_nocalls(CPUHP_MM_DEMOTION_DEAD, "mm/demotion:offline", NULL, migration_offline_cpu); -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH v2 2/2] mm: migrate: Allocate the node_demotion structure dynamically 2021-11-11 7:48 ` [PATCH v2 2/2] mm: migrate: Allocate the node_demotion structure dynamically Baolin Wang @ 2021-11-11 8:51 ` Huang, Ying 2021-11-11 11:20 ` Baolin Wang 0 siblings, 1 reply; 8+ messages in thread From: Huang, Ying @ 2021-11-11 8:51 UTC (permalink / raw) To: Baolin Wang Cc: akpm, dave.hansen, ziy, osalvador, shy828301, zhongjiang-ali, xlpang, linux-mm, linux-kernel Baolin Wang <baolin.wang@linux.alibaba.com> writes: > For the worst case (MAX_NUMNODES=1024), the node_demotion structure can > consume 32k bytes, which appears too large, so we can change to allocate > node_demotion dynamically at initialization time. Meanwhile allocating > the target demotion nodes array dynamically to select a suitable size > according to the MAX_NUMNODES. > > Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> > --- > mm/migrate.c | 38 +++++++++++++++++++++++++++++--------- > 1 file changed, 29 insertions(+), 9 deletions(-) > > diff --git a/mm/migrate.c b/mm/migrate.c > index 126e9e6..0145b38 100644 > --- a/mm/migrate.c > +++ b/mm/migrate.c > @@ -1152,10 +1152,11 @@ static int __unmap_and_move(struct page *page, struct page *newpage, > #define DEFAULT_DEMOTION_TARGET_NODES 15 > struct demotion_nodes { > unsigned short nr; > - short nodes[DEFAULT_DEMOTION_TARGET_NODES]; > + short nodes[]; > }; > > -static struct demotion_nodes node_demotion[MAX_NUMNODES] __read_mostly; > +static struct demotion_nodes *node_demotion[MAX_NUMNODES] __read_mostly; > +static unsigned short target_nodes_max; I think we can use something as below, #if MAX_NUMNODES < DEFAULT_DEMOTION_TARGET_NODES #define DEMOTION_TARGET_NODES (MAX_NUMNODES - 1) #else #define DEMOTION_TARGET_NODES DEFAULT_DEMOTION_TARGET_NODES #endif static struct demotion_nodes *node_demotion; Then we can allocate nr_node_ids * sizeof(struct demotion_nodes) for node_demotion. Best Regards, Huang, Ying [snip] ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 2/2] mm: migrate: Allocate the node_demotion structure dynamically 2021-11-11 8:51 ` Huang, Ying @ 2021-11-11 11:20 ` Baolin Wang 2021-11-11 23:39 ` Huang, Ying 0 siblings, 1 reply; 8+ messages in thread From: Baolin Wang @ 2021-11-11 11:20 UTC (permalink / raw) To: Huang, Ying Cc: akpm, dave.hansen, ziy, osalvador, shy828301, zhongjiang-ali, xlpang, linux-mm, linux-kernel On 2021/11/11 16:51, Huang, Ying wrote: > Baolin Wang <baolin.wang@linux.alibaba.com> writes: > >> For the worst case (MAX_NUMNODES=1024), the node_demotion structure can >> consume 32k bytes, which appears too large, so we can change to allocate >> node_demotion dynamically at initialization time. Meanwhile allocating >> the target demotion nodes array dynamically to select a suitable size >> according to the MAX_NUMNODES. >> >> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> >> --- >> mm/migrate.c | 38 +++++++++++++++++++++++++++++--------- >> 1 file changed, 29 insertions(+), 9 deletions(-) >> >> diff --git a/mm/migrate.c b/mm/migrate.c >> index 126e9e6..0145b38 100644 >> --- a/mm/migrate.c >> +++ b/mm/migrate.c >> @@ -1152,10 +1152,11 @@ static int __unmap_and_move(struct page *page, struct page *newpage, >> #define DEFAULT_DEMOTION_TARGET_NODES 15 >> struct demotion_nodes { >> unsigned short nr; >> - short nodes[DEFAULT_DEMOTION_TARGET_NODES]; >> + short nodes[]; >> }; >> >> -static struct demotion_nodes node_demotion[MAX_NUMNODES] __read_mostly; >> +static struct demotion_nodes *node_demotion[MAX_NUMNODES] __read_mostly; >> +static unsigned short target_nodes_max; > > I think we can use something as below, > > #if MAX_NUMNODES < DEFAULT_DEMOTION_TARGET_NODES > #define DEMOTION_TARGET_NODES (MAX_NUMNODES - 1) > #else > #define DEMOTION_TARGET_NODES DEFAULT_DEMOTION_TARGET_NODES > #endif Yes, looks better. > > static struct demotion_nodes *node_demotion; > > Then we can allocate nr_node_ids * sizeof(struct demotion_nodes) for node_demotion. Yeah, this is simple. The reason I want to declare the structure like "struct demotion_nodes *node_demotion[MAX_NUMNODES]" is that, we can validate the non-possible nodes which are invalid to demote memory, and in case the node_demotion[nid] is failed to be allocated which can be validated, though this is unlikely. However, I agree with you to keep things simple now and can be merged into patch 1. Will do in next version. Thanks. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 2/2] mm: migrate: Allocate the node_demotion structure dynamically 2021-11-11 11:20 ` Baolin Wang @ 2021-11-11 23:39 ` Huang, Ying 0 siblings, 0 replies; 8+ messages in thread From: Huang, Ying @ 2021-11-11 23:39 UTC (permalink / raw) To: Baolin Wang Cc: akpm, dave.hansen, ziy, osalvador, shy828301, zhongjiang-ali, xlpang, linux-mm, linux-kernel Baolin Wang <baolin.wang@linux.alibaba.com> writes: > On 2021/11/11 16:51, Huang, Ying wrote: >> Baolin Wang <baolin.wang@linux.alibaba.com> writes: >> >>> For the worst case (MAX_NUMNODES=1024), the node_demotion structure can >>> consume 32k bytes, which appears too large, so we can change to allocate >>> node_demotion dynamically at initialization time. Meanwhile allocating >>> the target demotion nodes array dynamically to select a suitable size >>> according to the MAX_NUMNODES. >>> >>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> >>> --- >>> mm/migrate.c | 38 +++++++++++++++++++++++++++++--------- >>> 1 file changed, 29 insertions(+), 9 deletions(-) >>> >>> diff --git a/mm/migrate.c b/mm/migrate.c >>> index 126e9e6..0145b38 100644 >>> --- a/mm/migrate.c >>> +++ b/mm/migrate.c >>> @@ -1152,10 +1152,11 @@ static int __unmap_and_move(struct page *page, struct page *newpage, >>> #define DEFAULT_DEMOTION_TARGET_NODES 15 >>> struct demotion_nodes { >>> unsigned short nr; >>> - short nodes[DEFAULT_DEMOTION_TARGET_NODES]; >>> + short nodes[]; >>> }; >>> -static struct demotion_nodes node_demotion[MAX_NUMNODES] >>> __read_mostly; >>> +static struct demotion_nodes *node_demotion[MAX_NUMNODES] __read_mostly; >>> +static unsigned short target_nodes_max; >> I think we can use something as below, >> #if MAX_NUMNODES < DEFAULT_DEMOTION_TARGET_NODES >> #define DEMOTION_TARGET_NODES (MAX_NUMNODES - 1) >> #else >> #define DEMOTION_TARGET_NODES DEFAULT_DEMOTION_TARGET_NODES >> #endif > > Yes, looks better. > >> static struct demotion_nodes *node_demotion; >> Then we can allocate nr_node_ids * sizeof(struct demotion_nodes) for >> node_demotion. > > Yeah, this is simple. The reason I want to declare the structure like > "struct demotion_nodes *node_demotion[MAX_NUMNODES]" is that, we can > validate the non-possible nodes which are invalid to demote memory, > and in case the node_demotion[nid] is failed to be allocated which can > be validated, though this is unlikely. In case allocation failure, we can still check "node_demotion == NULL". Best Regards, Huang, Ying > However, I agree with you to > keep things simple now and can be merged into patch 1. Will do in next > version. Thanks. ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2021-11-11 23:39 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-11-11 7:48 [PATCH v2 0/2] Support multiple target nodes demotion Baolin Wang 2021-11-11 7:48 ` [PATCH v2 1/2] mm: migrate: " Baolin Wang 2021-11-11 8:20 ` Huang, Ying 2021-11-11 10:52 ` Baolin Wang 2021-11-11 7:48 ` [PATCH v2 2/2] mm: migrate: Allocate the node_demotion structure dynamically Baolin Wang 2021-11-11 8:51 ` Huang, Ying 2021-11-11 11:20 ` Baolin Wang 2021-11-11 23:39 ` Huang, Ying
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).