* [PATCH v2] hugetlb: Support node specified when using cma for gigantic hugepages
@ 2021-10-14  6:08 Baolin Wang
  2021-10-14 21:48 ` Mike Kravetz
  0 siblings, 1 reply; 3+ messages in thread
From: Baolin Wang @ 2021-10-14  6:08 UTC (permalink / raw)
  To: akpm, mike.kravetz
  Cc: mhocko, guro, corbet, yaozhenguo1, baolin.wang, linux-mm,
	linux-kernel, linux-doc

Currently the size of the CMA area for runtime allocation of gigantic
hugepages is balanced across all online nodes, but we also want to be
able to specify the CMA size per node, or reserve it on only one node
in some cases, similar to commit 86acc55c3d32 ("hugetlbfs: extend the
definition of hugepages parameter to support node allocation")[1].

For example, on some multi-node systems the amount of memory per node
can differ, so allocating the same CMA size on every node is not
suitable for the low-memory nodes. Meanwhile, some workloads, such as
the DPDK case mentioned by Zhenguo in patch [1], only need hugepages on
one node.

On the other hand, we have some machines with multiple types of
memory, such as DRAM and PMEM (persistent memory). On such a system we
may want to place all the hugepages on the DRAM node only, or specify
the proportion between the DRAM and PMEM nodes, to tune the performance
of the workloads.

Thus this patch adds a node format for the 'hugetlb_cma' parameter to
support specifying the CMA size per node. An example is as follows:

hugetlb_cma=0:5G,2:5G

which means reserving a 5G CMA area on node 0 and node 2 respectively.
Users should then use the node-specific sysfs files to allocate
gigantic hugepages on the nodes where a CMA size was specified.
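
For illustration, a complete workflow could look like the following.
Only the hugetlb_cma= syntax comes from this patch; the 1GB huge page
size is an x86_64 assumption and the page counts are arbitrary:

  # kernel command line
  hugetlb_cma=0:5G,2:5G

  # after boot, request gigantic pages only on the nodes with a reservation
  echo 5 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
  echo 5 > /sys/devices/system/node/node2/hugepages/hugepages-1048576kB/nr_hugepages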

[1]
https://lkml.kernel.org/r/20211005054729.86457-1-yaozhenguo1@gmail.com

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
Changes from v1:
 - Update the commit log.
 - Avoid changing the behavior for 'balanced' gigantic huge page pool
 allocations.
 - Catch the invalid node specified in hugetlb_cma_reserve().
 - Validate the size of CMA for each node in hugetlb_cma_reserve().
---
 Documentation/admin-guide/kernel-parameters.txt |  6 +-
 mm/hugetlb.c                                    | 98 ++++++++++++++++++++++---
 2 files changed, 93 insertions(+), 11 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 3ad8e9d0..a147faa5 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1587,8 +1587,10 @@
 			registers.  Default set by CONFIG_HPET_MMAP_DEFAULT.
 
 	hugetlb_cma=	[HW,CMA] The size of a CMA area used for allocation
-			of gigantic hugepages.
-			Format: nn[KMGTPE]
+			of gigantic hugepages. Or using node format, the size
+			of a CMA area per node can be specified.
+			Format: nn[KMGTPE] or (node format)
+				<node>:nn[KMGTPE][,<node>:nn[KMGTPE]]
 
 			Reserve a CMA area of given size and allocate gigantic
 			hugepages using the CMA allocator. If enabled, the
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6d2f4c2..ac9afc2 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -50,6 +50,7 @@
 
 #ifdef CONFIG_CMA
 static struct cma *hugetlb_cma[MAX_NUMNODES];
+static unsigned long hugetlb_cma_size_in_node[MAX_NUMNODES] __initdata;
 static bool hugetlb_cma_page(struct page *page, unsigned int order)
 {
 	return cma_pages_valid(hugetlb_cma[page_to_nid(page)], page,
@@ -62,6 +63,7 @@ static bool hugetlb_cma_page(struct page *page, unsigned int order)
 }
 #endif
 static unsigned long hugetlb_cma_size __initdata;
+static nodemask_t hugetlb_cma_nodes_allowed = NODE_MASK_NONE;
 
 /*
  * Minimum page order among possible hugepage sizes, set to a proper value
@@ -3508,7 +3510,16 @@ static ssize_t __nr_hugepages_store_common(bool obey_mempolicy,
 		/*
 		 * Node specific request.  count adjustment happens in
 		 * set_max_huge_pages() after acquiring hugetlb_lock.
+		 *
+		 * If we've specified the size of CMA area per node for
+		 * gigantic hugepages, should catch the warning if the
+		 * nid is not in the 'hugetlb_cma_nodes_allowed' nodemask.
 		 */
+		if (hstate_is_gigantic(h) &&
+		    !nodes_empty(hugetlb_cma_nodes_allowed) &&
+		    !node_isset(nid, hugetlb_cma_nodes_allowed))
+			pr_warn("hugetlb_cma: no reservation on this node %d\n", nid);
+
 		init_nodemask_of_node(&nodes_allowed, nid);
 		n_mask = &nodes_allowed;
 	}
@@ -6745,7 +6756,38 @@ void hugetlb_unshare_all_pmds(struct vm_area_struct *vma)
 
 static int __init cmdline_parse_hugetlb_cma(char *p)
 {
-	hugetlb_cma_size = memparse(p, &p);
+	int nid, count = 0;
+	unsigned long tmp;
+	char *s = p;
+
+	while (*s) {
+		if (sscanf(s, "%lu%n", &tmp, &count) != 1)
+			break;
+
+		if (s[count] == ':') {
+			nid = tmp;
+			if (nid < 0 || nid >= MAX_NUMNODES)
+				break;
+
+			s += count + 1;
+			tmp = memparse(s, &s);
+			hugetlb_cma_size_in_node[nid] = tmp;
+			hugetlb_cma_size += tmp;
+
+			/*
+			 * Skip the separator if have one, otherwise
+			 * break the parsing.
+			 */
+			if (*s == ',')
+				s++;
+			else
+				break;
+		} else {
+			hugetlb_cma_size = memparse(p, &p);
+			break;
+		}
+	}
+
 	return 0;
 }
 
@@ -6754,6 +6796,7 @@ static int __init cmdline_parse_hugetlb_cma(char *p)
 void __init hugetlb_cma_reserve(int order)
 {
 	unsigned long size, reserved, per_node;
+	bool node_specific_cma_alloc = false;
 	int nid;
 
 	cma_reserve_called = true;
@@ -6761,26 +6804,61 @@ void __init hugetlb_cma_reserve(int order)
 	if (!hugetlb_cma_size)
 		return;
 
+	for (nid = 0; nid < MAX_NUMNODES; nid++) {
+		if (hugetlb_cma_size_in_node[nid] == 0)
+			continue;
+
+		if (!node_state(nid, N_ONLINE)) {
+			pr_warn("hugetlb_cma: invalid node %d specified\n", nid);
+			hugetlb_cma_size -= hugetlb_cma_size_in_node[nid];
+			hugetlb_cma_size_in_node[nid] = 0;
+			continue;
+		}
+
+		if (hugetlb_cma_size_in_node[nid] < (PAGE_SIZE << order)) {
+			pr_warn("hugetlb_cma: cma area of node %d should be at least %lu MiB\n",
+				nid, (PAGE_SIZE << order) / SZ_1M);
+			hugetlb_cma_size -= hugetlb_cma_size_in_node[nid];
+			hugetlb_cma_size_in_node[nid] = 0;
+		} else {
+			node_specific_cma_alloc = true;
+		}
+	}
+
+	/* Validate the CMA size again in case some invalid nodes specified. */
+	if (!hugetlb_cma_size)
+		return;
+
 	if (hugetlb_cma_size < (PAGE_SIZE << order)) {
 		pr_warn("hugetlb_cma: cma area should be at least %lu MiB\n",
 			(PAGE_SIZE << order) / SZ_1M);
 		return;
 	}
 
-	/*
-	 * If 3 GB area is requested on a machine with 4 numa nodes,
-	 * let's allocate 1 GB on first three nodes and ignore the last one.
-	 */
-	per_node = DIV_ROUND_UP(hugetlb_cma_size, nr_online_nodes);
-	pr_info("hugetlb_cma: reserve %lu MiB, up to %lu MiB per node\n",
-		hugetlb_cma_size / SZ_1M, per_node / SZ_1M);
+	if (!node_specific_cma_alloc) {
+		/*
+		 * If 3 GB area is requested on a machine with 4 numa nodes,
+		 * let's allocate 1 GB on first three nodes and ignore the last one.
+		 */
+		per_node = DIV_ROUND_UP(hugetlb_cma_size, nr_online_nodes);
+		pr_info("hugetlb_cma: reserve %lu MiB, up to %lu MiB per node\n",
+			hugetlb_cma_size / SZ_1M, per_node / SZ_1M);
+	}
 
 	reserved = 0;
 	for_each_node_state(nid, N_ONLINE) {
 		int res;
 		char name[CMA_MAX_NAME];
 
-		size = min(per_node, hugetlb_cma_size - reserved);
+		if (node_specific_cma_alloc) {
+			if (hugetlb_cma_size_in_node[nid] == 0)
+				continue;
+
+			size = hugetlb_cma_size_in_node[nid];
+		} else {
+			size = min(per_node, hugetlb_cma_size - reserved);
+		}
+
 		size = round_up(size, PAGE_SIZE << order);
 
 		snprintf(name, sizeof(name), "hugetlb%d", nid);
@@ -6799,6 +6877,8 @@ void __init hugetlb_cma_reserve(int order)
 			continue;
 		}
 
+		if (node_specific_cma_alloc)
+			node_set(nid, hugetlb_cma_nodes_allowed);
 		reserved += size;
 		pr_info("hugetlb_cma: reserved %lu MiB on node %d\n",
 			size / SZ_1M, nid);
-- 
1.8.3.1




* Re: [PATCH v2] hugetlb: Support node specified when using cma for gigantic hugepages
  2021-10-14  6:08 [PATCH v2] hugetlb: Support node specified when using cma for gigantic hugepages Baolin Wang
@ 2021-10-14 21:48 ` Mike Kravetz
  2021-10-15  1:15   ` Baolin Wang
  0 siblings, 1 reply; 3+ messages in thread
From: Mike Kravetz @ 2021-10-14 21:48 UTC (permalink / raw)
  To: Baolin Wang, akpm
  Cc: mhocko, guro, corbet, yaozhenguo1, linux-mm, linux-kernel, linux-doc

On 10/13/21 11:08 PM, Baolin Wang wrote:
> Now the size of CMA area for gigantic hugepages runtime allocation is
> balanced for all online nodes, but we also want to specify the size of
> CMA per-node, or only one node in some cases, which are similar with
> commit 86acc55c3d32 ("hugetlbfs: extend the definition of hugepages
> parameter to support node allocation")[1].

I would not include the commit hash here.  IIUC, this can change as it
is moved to Linus' tree in the next merge window.

> For example, on some multi-nodes systems, each node's memory can be
> different, allocating the same size of CMA for each node is not suitable
> for the low-memory nodes. Meanwhile some workloads like DPDK mentioned by
> Zhenguo in patch [1] only need hugepages in one node.
> 
> On the other hand, we have some machines with multiple types of memory,
> like DRAM and PMEM (persistent memory). On this system, we may want to
> specify all the hugepages only on DRAM node, or specify the proportion
> of DRAM node and PMEM node, to tuning the performance of the workloads.
> 
> Thus this patch adds node format for 'hugetlb_cma' parameter to support
> specifying the size of CMA per-node. An example is as follows:
> 
> hugetlb_cma=0:5G,2:5G
> 
> which means allocating 5G size of CMA area on node 0 and node 2
> respectively. And the users should use the node specific sysfs file to
> allocate the gigantic hugepages if specified the CMA size on that node.
> 
> [1]
> https://lkml.kernel.org/r/20211005054729.86457-1-yaozhenguo1@gmail.com
> 
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
> Changes from v1:
>  - Update the commit log.
>  - Avoid changing the behavior for 'balanced' gigantic huge page pool
>  allocations.
>  - Catch the invalid node specified in hugetlb_cma_reserve().
>  - Validate the size of CMA for each node in hugetlb_cma_reserve().
> ---
>  Documentation/admin-guide/kernel-parameters.txt |  6 +-
>  mm/hugetlb.c                                    | 98 ++++++++++++++++++++++---
>  2 files changed, 93 insertions(+), 11 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 3ad8e9d0..a147faa5 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -1587,8 +1587,10 @@
>  			registers.  Default set by CONFIG_HPET_MMAP_DEFAULT.
>  
>  	hugetlb_cma=	[HW,CMA] The size of a CMA area used for allocation
> -			of gigantic hugepages.
> -			Format: nn[KMGTPE]
> +			of gigantic hugepages. Or using node format, the size
> +			of a CMA area per node can be specified.
> +			Format: nn[KMGTPE] or (node format)
> +				<node>:nn[KMGTPE][,<node>:nn[KMGTPE]]
>  
>  			Reserve a CMA area of given size and allocate gigantic
>  			hugepages using the CMA allocator. If enabled, the
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 6d2f4c2..ac9afc2 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -50,6 +50,7 @@
>  
>  #ifdef CONFIG_CMA
>  static struct cma *hugetlb_cma[MAX_NUMNODES];
> +static unsigned long hugetlb_cma_size_in_node[MAX_NUMNODES] __initdata;
>  static bool hugetlb_cma_page(struct page *page, unsigned int order)
>  {
>  	return cma_pages_valid(hugetlb_cma[page_to_nid(page)], page,
> @@ -62,6 +63,7 @@ static bool hugetlb_cma_page(struct page *page, unsigned int order)
>  }
>  #endif
>  static unsigned long hugetlb_cma_size __initdata;
> +static nodemask_t hugetlb_cma_nodes_allowed = NODE_MASK_NONE;
>  
>  /*
>   * Minimum page order among possible hugepage sizes, set to a proper value
> @@ -3508,7 +3510,16 @@ static ssize_t __nr_hugepages_store_common(bool obey_mempolicy,
>  		/*
>  		 * Node specific request.  count adjustment happens in
>  		 * set_max_huge_pages() after acquiring hugetlb_lock.
> +		 *
> +		 * If we've specified the size of CMA area per node for
> +		 * gigantic hugepages, should catch the warning if the
> +		 * nid is not in the 'hugetlb_cma_nodes_allowed' nodemask.
>  		 */
> +		if (hstate_is_gigantic(h) &&
> +		    !nodes_empty(hugetlb_cma_nodes_allowed) &&
> +		    !node_isset(nid, hugetlb_cma_nodes_allowed))
> +			pr_warn("hugetlb_cma: no reservation on this node %d\n", nid);
> +

I would prefer to drop this code and hugetlb_cma_nodes_allowed.  Why?

CMA is an alternative allocation mechanism for gigantic pages.  The
allocator will fall back to the normal allocator (alloc_contig_pages) if
allocation from CMA fails.

This warning implies that the user 'forgot' to reserve CMA on the
specified node, or is perhaps allocating gigantic pages on the wrong
node.  We cannot be sure this is the case.

I agree that in most cases when a user requests node specific CMA
reservations, they will likely want to perform gigantic page allocations
on the same nodes.  However, that may not always be the case and in such
cases the warning could be confusing.

We do not print warnings today when allocating huge pages via the
proc/sysfs interfaces.  We should not add one unless there is a very
good reason.
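
For reference, the existing fallback path has roughly the following
shape (a simplified sketch of alloc_gigantic_page(); the real code also
handles NUMA_NO_NODE and scans other nodes' CMA areas when
__GFP_THISNODE is not set):

static struct page *alloc_gigantic_page(struct hstate *h, gfp_t gfp_mask,
					int nid, nodemask_t *nodemask)
{
	unsigned long nr_pages = pages_per_huge_page(h);

#ifdef CONFIG_CMA
	if (hugetlb_cma[nid]) {
		struct page *page = cma_alloc(hugetlb_cma[nid], nr_pages,
					      huge_page_order(h), true);

		/* A node without a CMA reservation simply skips this... */
		if (page)
			return page;
	}
#endif
	/* ...and falls back to the normal contiguous allocator. */
	return alloc_contig_pages(nr_pages, gfp_mask, nid, nodemask);
}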

>  		init_nodemask_of_node(&nodes_allowed, nid);
>  		n_mask = &nodes_allowed;
>  	}
> @@ -6745,7 +6756,38 @@ void hugetlb_unshare_all_pmds(struct vm_area_struct *vma)
>  
>  static int __init cmdline_parse_hugetlb_cma(char *p)
>  {
> -	hugetlb_cma_size = memparse(p, &p);
> +	int nid, count = 0;
> +	unsigned long tmp;
> +	char *s = p;
> +
> +	while (*s) {
> +		if (sscanf(s, "%lu%n", &tmp, &count) != 1)
> +			break;
> +
> +		if (s[count] == ':') {
> +			nid = tmp;
> +			if (nid < 0 || nid >= MAX_NUMNODES)
> +				break;
> +
> +			s += count + 1;
> +			tmp = memparse(s, &s);
> +			hugetlb_cma_size_in_node[nid] = tmp;
> +			hugetlb_cma_size += tmp;
> +
> +			/*
> +			 * Skip the separator if have one, otherwise
> +			 * break the parsing.
> +			 */
> +			if (*s == ',')
> +				s++;
> +			else
> +				break;
> +		} else {
> +			hugetlb_cma_size = memparse(p, &p);
> +			break;
> +		}
> +	}
> +
>  	return 0;
>  }
>  
> @@ -6754,6 +6796,7 @@ static int __init cmdline_parse_hugetlb_cma(char *p)
>  void __init hugetlb_cma_reserve(int order)
>  {
>  	unsigned long size, reserved, per_node;
> +	bool node_specific_cma_alloc = false;
>  	int nid;
>  
>  	cma_reserve_called = true;
> @@ -6761,26 +6804,61 @@ void __init hugetlb_cma_reserve(int order)
>  	if (!hugetlb_cma_size)
>  		return;
>  
> +	for (nid = 0; nid < MAX_NUMNODES; nid++) {
> +		if (hugetlb_cma_size_in_node[nid] == 0)
> +			continue;
> +
> +		if (!node_state(nid, N_ONLINE)) {
> +			pr_warn("hugetlb_cma: invalid node %d specified\n", nid);
> +			hugetlb_cma_size -= hugetlb_cma_size_in_node[nid];
> +			hugetlb_cma_size_in_node[nid] = 0;
> +			continue;
> +		}
> +
> +		if (hugetlb_cma_size_in_node[nid] < (PAGE_SIZE << order)) {
> +			pr_warn("hugetlb_cma: cma area of node %d should be at least %lu MiB\n",
> +				nid, (PAGE_SIZE << order) / SZ_1M);
> +			hugetlb_cma_size -= hugetlb_cma_size_in_node[nid];
> +			hugetlb_cma_size_in_node[nid] = 0;
> +		} else {
> +			node_specific_cma_alloc = true;
> +		}
> +	}
> +
> +	/* Validate the CMA size again in case some invalid nodes specified. */
> +	if (!hugetlb_cma_size)
> +		return;
> +
>  	if (hugetlb_cma_size < (PAGE_SIZE << order)) {
>  		pr_warn("hugetlb_cma: cma area should be at least %lu MiB\n",
>  			(PAGE_SIZE << order) / SZ_1M);
>  		return;
>  	}

The series "hugetlb: add demote/split page functionality"
https://lore.kernel.org/linux-mm/20211007181918.136982-1-mike.kravetz@oracle.com/T/#mcb25f5edaa235b93dd0d0b8fb81ba15f0317feeb
is in Andrew's tree and has modified the above to set hugetlb_cma_size
to 0 before returning.

Code in that series uses the variable hugetlb_cma_size to determine if
CMA was reserved and can possibly be used for huge pages.  If no CMA is
reserved in this routine, it must be set to 0.

The code below should be fine as it checks 'reserved' at the end of
the routine and sets hugetlb_cma_size to zero if !reserved before returning.

Mostly wanted to point out the context conflict with Andrew's tree.  He
or you will need to fix this for the patch to apply.
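
Concretely, the rebased check would need to clear the size before the
early return, roughly like this (a sketch of the expected shape, not
the exact context in Andrew's tree):

	if (hugetlb_cma_size < (PAGE_SIZE << order)) {
		pr_warn("hugetlb_cma: cma area should be at least %lu MiB\n",
			(PAGE_SIZE << order) / SZ_1M);
		/* demote code uses hugetlb_cma_size to tell if CMA was reserved */
		hugetlb_cma_size = 0;
		return;
	}
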
-- 
Mike Kravetz

>  
> -	/*
> -	 * If 3 GB area is requested on a machine with 4 numa nodes,
> -	 * let's allocate 1 GB on first three nodes and ignore the last one.
> -	 */
> -	per_node = DIV_ROUND_UP(hugetlb_cma_size, nr_online_nodes);
> -	pr_info("hugetlb_cma: reserve %lu MiB, up to %lu MiB per node\n",
> -		hugetlb_cma_size / SZ_1M, per_node / SZ_1M);
> +	if (!node_specific_cma_alloc) {
> +		/*
> +		 * If 3 GB area is requested on a machine with 4 numa nodes,
> +		 * let's allocate 1 GB on first three nodes and ignore the last one.
> +		 */
> +		per_node = DIV_ROUND_UP(hugetlb_cma_size, nr_online_nodes);
> +		pr_info("hugetlb_cma: reserve %lu MiB, up to %lu MiB per node\n",
> +			hugetlb_cma_size / SZ_1M, per_node / SZ_1M);
> +	}
>  
>  	reserved = 0;
>  	for_each_node_state(nid, N_ONLINE) {
>  		int res;
>  		char name[CMA_MAX_NAME];
>  
> -		size = min(per_node, hugetlb_cma_size - reserved);
> +		if (node_specific_cma_alloc) {
> +			if (hugetlb_cma_size_in_node[nid] == 0)
> +				continue;
> +
> +			size = hugetlb_cma_size_in_node[nid];
> +		} else {
> +			size = min(per_node, hugetlb_cma_size - reserved);
> +		}
> +
>  		size = round_up(size, PAGE_SIZE << order);
>  
>  		snprintf(name, sizeof(name), "hugetlb%d", nid);
> @@ -6799,6 +6877,8 @@ void __init hugetlb_cma_reserve(int order)
>  			continue;
>  		}
>  
> +		if (node_specific_cma_alloc)
> +			node_set(nid, hugetlb_cma_nodes_allowed);
>  		reserved += size;
>  		pr_info("hugetlb_cma: reserved %lu MiB on node %d\n",
>  			size / SZ_1M, nid);
> 



* Re: [PATCH v2] hugetlb: Support node specified when using cma for gigantic hugepages
  2021-10-14 21:48 ` Mike Kravetz
@ 2021-10-15  1:15   ` Baolin Wang
  0 siblings, 0 replies; 3+ messages in thread
From: Baolin Wang @ 2021-10-15  1:15 UTC (permalink / raw)
  To: Mike Kravetz, akpm
  Cc: mhocko, guro, corbet, yaozhenguo1, linux-mm, linux-kernel, linux-doc



On 2021/10/15 5:48, Mike Kravetz wrote:
> On 10/13/21 11:08 PM, Baolin Wang wrote:
>> Now the size of CMA area for gigantic hugepages runtime allocation is
>> balanced for all online nodes, but we also want to specify the size of
>> CMA per-node, or only one node in some cases, which are similar with
>> commit 86acc55c3d32 ("hugetlbfs: extend the definition of hugepages
>> parameter to support node allocation")[1].
> 
> I would not include the commit hash here.  IIUC, this can change as it
> is moved to Linus' tree in the next merge window.

Sure.

> 
>> For example, on some multi-nodes systems, each node's memory can be
>> different, allocating the same size of CMA for each node is not suitable
>> for the low-memory nodes. Meanwhile some workloads like DPDK mentioned by
>> Zhenguo in patch [1] only need hugepages in one node.
>>
>> On the other hand, we have some machines with multiple types of memory,
>> like DRAM and PMEM (persistent memory). On this system, we may want to
>> specify all the hugepages only on DRAM node, or specify the proportion
>> of DRAM node and PMEM node, to tuning the performance of the workloads.
>>
>> Thus this patch adds node format for 'hugetlb_cma' parameter to support
>> specifying the size of CMA per-node. An example is as follows:
>>
>> hugetlb_cma=0:5G,2:5G
>>
>> which means allocating 5G size of CMA area on node 0 and node 2
>> respectively. And the users should use the node specific sysfs file to
>> allocate the gigantic hugepages if specified the CMA size on that node.
>>
>> [1]
>> https://lkml.kernel.org/r/20211005054729.86457-1-yaozhenguo1@gmail.com
>>
>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> ---
>> Changes from v1:
>>   - Update the commit log.
>>   - Avoid changing the behavior for 'balanced' gigantic huge page pool
>>   allocations.
>>   - Catch the invalid node specified in hugetlb_cma_reserve().
>>   - Validate the size of CMA for each node in hugetlb_cma_reserve().
>> ---
>>   Documentation/admin-guide/kernel-parameters.txt |  6 +-
>>   mm/hugetlb.c                                    | 98 ++++++++++++++++++++++---
>>   2 files changed, 93 insertions(+), 11 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index 3ad8e9d0..a147faa5 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -1587,8 +1587,10 @@
>>   			registers.  Default set by CONFIG_HPET_MMAP_DEFAULT.
>>   
>>   	hugetlb_cma=	[HW,CMA] The size of a CMA area used for allocation
>> -			of gigantic hugepages.
>> -			Format: nn[KMGTPE]
>> +			of gigantic hugepages. Or using node format, the size
>> +			of a CMA area per node can be specified.
>> +			Format: nn[KMGTPE] or (node format)
>> +				<node>:nn[KMGTPE][,<node>:nn[KMGTPE]]
>>   
>>   			Reserve a CMA area of given size and allocate gigantic
>>   			hugepages using the CMA allocator. If enabled, the
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 6d2f4c2..ac9afc2 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -50,6 +50,7 @@
>>   
>>   #ifdef CONFIG_CMA
>>   static struct cma *hugetlb_cma[MAX_NUMNODES];
>> +static unsigned long hugetlb_cma_size_in_node[MAX_NUMNODES] __initdata;
>>   static bool hugetlb_cma_page(struct page *page, unsigned int order)
>>   {
>>   	return cma_pages_valid(hugetlb_cma[page_to_nid(page)], page,
>> @@ -62,6 +63,7 @@ static bool hugetlb_cma_page(struct page *page, unsigned int order)
>>   }
>>   #endif
>>   static unsigned long hugetlb_cma_size __initdata;
>> +static nodemask_t hugetlb_cma_nodes_allowed = NODE_MASK_NONE;
>>   
>>   /*
>>    * Minimum page order among possible hugepage sizes, set to a proper value
>> @@ -3508,7 +3510,16 @@ static ssize_t __nr_hugepages_store_common(bool obey_mempolicy,
>>   		/*
>>   		 * Node specific request.  count adjustment happens in
>>   		 * set_max_huge_pages() after acquiring hugetlb_lock.
>> +		 *
>> +		 * If we've specified the size of CMA area per node for
>> +		 * gigantic hugepages, should catch the warning if the
>> +		 * nid is not in the 'hugetlb_cma_nodes_allowed' nodemask.
>>   		 */
>> +		if (hstate_is_gigantic(h) &&
>> +		    !nodes_empty(hugetlb_cma_nodes_allowed) &&
>> +		    !node_isset(nid, hugetlb_cma_nodes_allowed))
>> +			pr_warn("hugetlb_cma: no reservation on this node %d\n", nid);
>> +
> 
> I would prefer to drop this code and hugetlb_cma_nodes_allowed.  Why?
> 
> CMA is an alternative allocation mechanism for gigantic pages.  The
> allocator will fall back to the normal allocator (alloc_contig_pages) if
> allocation from CMA fails.

Yes.

> This warning implies that the user 'forgot' to reserve CMA on the
> specified node, or is perhaps allocating gigantic pages on the wrong
> node.  We can not be sure this is the case.
> 
> I agree that in most cases when a user requests node specific CMA
> reservations, they will likely want to perform gigantic page allocations
> on the same nodes.  However, that may not always be the case and in such
> cases the warning could be confusing.
> 
> We do not print warnings today when allocating huge pages via the
> proc/sysfs interfaces.  We should not add one unless there is a very
> good reason.

OK. Will remove this in next version.

> 
>>   		init_nodemask_of_node(&nodes_allowed, nid);
>>   		n_mask = &nodes_allowed;
>>   	}
>> @@ -6745,7 +6756,38 @@ void hugetlb_unshare_all_pmds(struct vm_area_struct *vma)
>>   
>>   static int __init cmdline_parse_hugetlb_cma(char *p)
>>   {
>> -	hugetlb_cma_size = memparse(p, &p);
>> +	int nid, count = 0;
>> +	unsigned long tmp;
>> +	char *s = p;
>> +
>> +	while (*s) {
>> +		if (sscanf(s, "%lu%n", &tmp, &count) != 1)
>> +			break;
>> +
>> +		if (s[count] == ':') {
>> +			nid = tmp;
>> +			if (nid < 0 || nid >= MAX_NUMNODES)
>> +				break;
>> +
>> +			s += count + 1;
>> +			tmp = memparse(s, &s);
>> +			hugetlb_cma_size_in_node[nid] = tmp;
>> +			hugetlb_cma_size += tmp;
>> +
>> +			/*
>> +			 * Skip the separator if have one, otherwise
>> +			 * break the parsing.
>> +			 */
>> +			if (*s == ',')
>> +				s++;
>> +			else
>> +				break;
>> +		} else {
>> +			hugetlb_cma_size = memparse(p, &p);
>> +			break;
>> +		}
>> +	}
>> +
>>   	return 0;
>>   }
>>   
>> @@ -6754,6 +6796,7 @@ static int __init cmdline_parse_hugetlb_cma(char *p)
>>   void __init hugetlb_cma_reserve(int order)
>>   {
>>   	unsigned long size, reserved, per_node;
>> +	bool node_specific_cma_alloc = false;
>>   	int nid;
>>   
>>   	cma_reserve_called = true;
>> @@ -6761,26 +6804,61 @@ void __init hugetlb_cma_reserve(int order)
>>   	if (!hugetlb_cma_size)
>>   		return;
>>   
>> +	for (nid = 0; nid < MAX_NUMNODES; nid++) {
>> +		if (hugetlb_cma_size_in_node[nid] == 0)
>> +			continue;
>> +
>> +		if (!node_state(nid, N_ONLINE)) {
>> +			pr_warn("hugetlb_cma: invalid node %d specified\n", nid);
>> +			hugetlb_cma_size -= hugetlb_cma_size_in_node[nid];
>> +			hugetlb_cma_size_in_node[nid] = 0;
>> +			continue;
>> +		}
>> +
>> +		if (hugetlb_cma_size_in_node[nid] < (PAGE_SIZE << order)) {
>> +			pr_warn("hugetlb_cma: cma area of node %d should be at least %lu MiB\n",
>> +				nid, (PAGE_SIZE << order) / SZ_1M);
>> +			hugetlb_cma_size -= hugetlb_cma_size_in_node[nid];
>> +			hugetlb_cma_size_in_node[nid] = 0;
>> +		} else {
>> +			node_specific_cma_alloc = true;
>> +		}
>> +	}
>> +
>> +	/* Validate the CMA size again in case some invalid nodes specified. */
>> +	if (!hugetlb_cma_size)
>> +		return;
>> +
>>   	if (hugetlb_cma_size < (PAGE_SIZE << order)) {
>>   		pr_warn("hugetlb_cma: cma area should be at least %lu MiB\n",
>>   			(PAGE_SIZE << order) / SZ_1M);
>>   		return;
>>   	}
> 
> The series "hugetlb: add demote/split page functionality"
> https://lore.kernel.org/linux-mm/20211007181918.136982-1-mike.kravetz@oracle.com/T/#mcb25f5edaa235b93dd0d0b8fb81ba15f0317feeb
> is in Andrew's tree and has modified the above to set hugetlb_cma_size
> to 0 before returning.
> 
> Code in that series uses the variable hugetlb_cma_size to determine if
> CMA was reserved and can possibly be used for huge pages.  If no CMA is
> reserved in this routine, it must be set to 0.
> 
> The code below should be fine as it checks 'reserved' at the end of
> the routine and sets hugetlb_cma_size to zero if !reserved before returning.
> 
> Mostly wanted to point out the context conflict with Andrew's tree.  He
> or you will need to fix this for the patch to apply.

Thanks, I will rebase the code in next version.


