From: Mike Kravetz <mike.kravetz@oracle.com>
To: Roman Gushchin <guro@fb.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,
	linux-mm@kvack.org, kernel-team@fb.com,
	linux-kernel@vger.kernel.org, Rik van Riel <riel@surriel.com>
Subject: Re: [PATCH v2] mm: hugetlb: optionally allocate gigantic hugepages using cma
Date: Tue, 10 Mar 2020 11:33:47 -0700	[thread overview]
Message-ID: <4b78a8a9-7b5a-eb62-acaa-2677e615bea1@oracle.com> (raw)
In-Reply-To: <20200310180558.GD85000@carbon.dhcp.thefacebook.com>

On 3/10/20 11:05 AM, Roman Gushchin wrote:
> On Tue, Mar 10, 2020 at 10:27:01AM -0700, Mike Kravetz wrote:
>> On 3/9/20 5:25 PM, Roman Gushchin wrote:
>>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>>> index a74262c71484..ceeb06ddfd41 100644
>>> --- a/arch/x86/kernel/setup.c
>>> +++ b/arch/x86/kernel/setup.c
>>> @@ -16,6 +16,7 @@
>>>  #include <linux/pci.h>
>>>  #include <linux/root_dev.h>
>>>  #include <linux/sfi.h>
>>> +#include <linux/hugetlb.h>
>>>  #include <linux/tboot.h>
>>>  #include <linux/usb/xhci-dbgp.h>
>>>  
>>> @@ -1158,6 +1159,8 @@ void __init setup_arch(char **cmdline_p)
>>>  	initmem_init();
>>>  	dma_contiguous_reserve(max_pfn_mapped << PAGE_SHIFT);
>>>  
>>> +	hugetlb_cma_reserve();
>>> +
>>
>> I know this is called from arch specific code here to fit in with the timing
>> of CMA setup/reservation calls.  However, there really is nothing architecture
>> specific about this functionality.  It would be great IMO if we could make
>> this architecture independent.  However, I cannot think of a straightforward
>> way to do this.
> 
> I agree. Unfortunately I have no better idea than having an arch-dependent hook.
> 
>>
>>>  	/*
>>>  	 * Reserve memory for crash kernel after SRAT is parsed so that it
>>>  	 * won't consume hotpluggable memory.
>> <snip>
>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> <snip>
>>> +void __init hugetlb_cma_reserve(void)
>>> +{
>>> +	unsigned long totalpages = 0;
>>> +	unsigned long start_pfn, end_pfn;
>>> +	phys_addr_t size;
>>> +	int nid, i, res;
>>> +
>>> +	if (!hugetlb_cma_size && !hugetlb_cma_percent)
>>> +		return;
>>> +
>>> +	if (hugetlb_cma_percent) {
>>> +		for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn,
>>> +				       NULL)
>>> +			totalpages += end_pfn - start_pfn;
>>> +
>>> +		size = PAGE_SIZE * (hugetlb_cma_percent * 100 * totalpages) /
>>> +			10000UL;
>>> +	} else {
>>> +		size = hugetlb_cma_size;
>>> +	}
>>> +
>>> +	pr_info("hugetlb_cma: reserve %llu, %llu per node\n", size,
>>> +		size / nr_online_nodes);
>>> +
>>> +	size /= nr_online_nodes;
>>> +
>>> +	for_each_node_state(nid, N_ONLINE) {
>>> +		unsigned long min_pfn = 0, max_pfn = 0;
>>> +
>>> +		for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
>>> +			if (!min_pfn)
>>> +				min_pfn = start_pfn;
>>> +			max_pfn = end_pfn;
>>> +		}
>>> +
>>> +		res = cma_declare_contiguous(PFN_PHYS(min_pfn), size,
>>> +					     PFN_PHYS(max_pfn), (1UL << 30),
>>
>> The alignment is hard-coded for the x86 gigantic page size.  If this supports
>> more architectures or becomes arch-independent, we will need to determine
>> what this alignment should be.  Perhaps an arch-specific callback to get
>> the alignment for gigantic pages.  That will require a little thought as
>> some arches support multiple gigantic page sizes.
> 
> Good point!
> Should we take the biggest possible size as a reference?
> Or the smallest (larger than MAX_ORDER)?

As mentioned, it is pretty simple for architectures like x86 that only
have one gigantic page size.  Just a random thought, but since
hugetlb_cma_reserve is called from arch-specific code, perhaps the arch
code could pass in at least the alignment as an argument to this function?
That way we can somewhat push the issue to the architectures.  For example,
powerpc supports a 16GB gigantic page size, but I believe those pages are
allocated via firmware somehow.  So, powerpc would not pass 16G as the
alignment.  In this case, setup of the CMA area is somewhat architecture
dependent.  So, perhaps the call from arch-specific code is not such a bad idea.
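
To illustrate, something like this (a rough, untested sketch; using
PUD_SIZE as the x86 value is just my example, not something in the patch):

	/* mm/hugetlb.c: take the gigantic page alignment from the arch */
	void __init hugetlb_cma_reserve(phys_addr_t align);

	/* arch/x86/kernel/setup.c: gigantic pages are 1GB, i.e. PUD sized */
	hugetlb_cma_reserve(PUD_SIZE);

An arch like powerpc, where the gigantic pages come from firmware, would
then simply not make the call at all.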

With that in mind, we may want some type of coordination between the
arch-specific and arch-independent code.  Right now,
cmdline_parse_hugetlb_cma will accept a hugetlb_cma command line option
without complaint even if the architecture never calls hugetlb_cma_reserve.
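
One (untested) way to catch that -- the flag and the late_initcall are my
own sketch, not something from the patch:

	static bool hugetlb_cma_reserve_called __initdata;

	void __init hugetlb_cma_reserve(void)
	{
		hugetlb_cma_reserve_called = true;
		/* existing reservation logic follows */
	}

	/* late_initcall runs well after setup_arch(), so the flag is settled */
	static int __init hugetlb_cma_check(void)
	{
		if ((hugetlb_cma_size || hugetlb_cma_percent) &&
		    !hugetlb_cma_reserve_called)
			pr_warn("hugetlb_cma: not supported on this architecture\n");
		return 0;
	}
	late_initcall(hugetlb_cma_check);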

Just a nit, but cma_declare_contiguous is going to round the size up to the
alignment.  So, the actual reserved size may not match what is printed with,
+		pr_info("hugetlb_cma: successfully reserved %llu on node %d\n",
+			size, nid);

I found this out by testing the code and specifying hugetlb_cma=2M.  The
messages in the log were:
	kernel: hugetlb_cma: reserve 2097152, 1048576 per node
	kernel: hugetlb_cma: successfully reserved 1048576 on node 0
	kernel: hugetlb_cma: successfully reserved 1048576 on node 1
But, it really reserved 1GB per node.
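
The fix could be as simple as printing the size CMA actually set aside,
after the call succeeds.  An untested sketch -- the hugetlb_cma[] array
name is my own, and cma_get_size() returns the rounded-up size that was
actually reserved:

	static struct cma *hugetlb_cma[MAX_NUMNODES];

	res = cma_declare_contiguous(PFN_PHYS(min_pfn), size,
				     PFN_PHYS(max_pfn), (1UL << 30),
				     0, false, "hugetlb",
				     &hugetlb_cma[nid]);
	if (!res)
		pr_info("hugetlb_cma: reserved %lu on node %d\n",
			cma_get_size(hugetlb_cma[nid]), nid);

With that, the hugetlb_cma=2M test above would have reported the 1GB per
node that was actually reserved.
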
-- 
Mike Kravetz

