From: Roman Gushchin <guro@fb.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>, <linux-mm@kvack.org>,
	<kernel-team@fb.com>, <linux-kernel@vger.kernel.org>,
	Rik van Riel <riel@surriel.com>
Subject: Re: [PATCH v2] mm: hugetlb: optionally allocate gigantic hugepages using cma
Date: Tue, 10 Mar 2020 12:19:06 -0700
Message-ID: <20200310191906.GA96999@carbon.dhcp.thefacebook.com>
In-Reply-To: <4b78a8a9-7b5a-eb62-acaa-2677e615bea1@oracle.com>

On Tue, Mar 10, 2020 at 11:33:47AM -0700, Mike Kravetz wrote:
> On 3/10/20 11:05 AM, Roman Gushchin wrote:
> > On Tue, Mar 10, 2020 at 10:27:01AM -0700, Mike Kravetz wrote:
> >> On 3/9/20 5:25 PM, Roman Gushchin wrote:
> >>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> >>> index a74262c71484..ceeb06ddfd41 100644
> >>> --- a/arch/x86/kernel/setup.c
> >>> +++ b/arch/x86/kernel/setup.c
> >>> @@ -16,6 +16,7 @@
> >>>  #include <linux/pci.h>
> >>>  #include <linux/root_dev.h>
> >>>  #include <linux/sfi.h>
> >>> +#include <linux/hugetlb.h>
> >>>  #include <linux/tboot.h>
> >>>  #include <linux/usb/xhci-dbgp.h>
> >>>  
> >>> @@ -1158,6 +1159,8 @@ void __init setup_arch(char **cmdline_p)
> >>>  	initmem_init();
> >>>  	dma_contiguous_reserve(max_pfn_mapped << PAGE_SHIFT);
> >>>  
> >>> +	hugetlb_cma_reserve();
> >>> +
> >>
> >> I know this is called from arch specific code here to fit in with the timing
> >> of CMA setup/reservation calls.  However, there really is nothing architecture
> >> specific about this functionality.  It would be great IMO if we could make
> >> this architecture independent.  However, I cannot think of a straightforward
> >> way to do this.
> > 
> > I agree. Unfortunately I have no better idea than having an arch-dependent hook.
> > 
> >>
> >>>  	/*
> >>>  	 * Reserve memory for crash kernel after SRAT is parsed so that it
> >>>  	 * won't consume hotpluggable memory.
> >> <snip>
> >>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> >> <snip>
> >>> +void __init hugetlb_cma_reserve(void)
> >>> +{
> >>> +	unsigned long totalpages = 0;
> >>> +	unsigned long start_pfn, end_pfn;
> >>> +	phys_addr_t size;
> >>> +	int nid, i, res;
> >>> +
> >>> +	if (!hugetlb_cma_size && !hugetlb_cma_percent)
> >>> +		return;
> >>> +
> >>> +	if (hugetlb_cma_percent) {
> >>> +		for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn,
> >>> +				       NULL)
> >>> +			totalpages += end_pfn - start_pfn;
> >>> +
> >>> +		size = PAGE_SIZE * (hugetlb_cma_percent * 100 * totalpages) /
> >>> +			10000UL;
> >>> +	} else {
> >>> +		size = hugetlb_cma_size;
> >>> +	}
> >>> +
> >>> +	pr_info("hugetlb_cma: reserve %llu, %llu per node\n", size,
> >>> +		size / nr_online_nodes);
> >>> +
> >>> +	size /= nr_online_nodes;
> >>> +
> >>> +	for_each_node_state(nid, N_ONLINE) {
> >>> +		unsigned long min_pfn = 0, max_pfn = 0;
> >>> +
> >>> +		for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
> >>> +			if (!min_pfn)
> >>> +				min_pfn = start_pfn;
> >>> +			max_pfn = end_pfn;
> >>> +		}
> >>> +
> >>> +		res = cma_declare_contiguous(PFN_PHYS(min_pfn), size,
> >>> +					     PFN_PHYS(max_pfn), (1UL << 30),
> >>
> >> The alignment is hard coded for x86 gigantic page size.  If this supports
> >> more architectures or becomes arch independent we will need to determine
> >> what this alignment should be.  Perhaps an arch specific call back to get
> >> the alignment for gigantic pages.  That will require a little thought as
> >> some arch's support multiple gigantic page sizes.
> > 
> > Good point!
> > Should we take the biggest possible size as a reference?
> > Or the smallest (larger than MAX_ORDER)?
> 
> As mentioned, it is pretty simple for architectures like x86 that only
> have one gigantic page size.  Just a random thought, but since
> hugetlb_cma_reserve is called from arch specific code perhaps the arch
> code could pass in at least alignment as an argument to this function?
> That way we can somewhat push the issue to the architectures.  For example,
> power supports a 16GB gigantic page size but I believe those are allocated
> via firmware somehow.  So, they would not pass 16G as alignment.  In this
> case setup of the CMA area is somewhat architecture dependent.  So, perhaps
> the call from arch specific code is not such a bad idea.
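
Just to illustrate what passing the alignment in could look like; the
"order" parameter and the x86 call site below are only a sketch, not
part of the posted patch:

	/* arch call site, e.g. arch/x86/kernel/setup.c (1GB pages on x86-64) */
	hugetlb_cma_reserve(PUD_SHIFT - PAGE_SHIFT);

	/* mm/hugetlb.c */
	void __init hugetlb_cma_reserve(int order)
	{
		/* replaces the hard-coded (1UL << 30) alignment below */
		phys_addr_t align = PAGE_SIZE << order;

		/* rest of the reservation logic unchanged */
	}

An architecture whose gigantic pages come from firmware (like the 16GB
case on power) could then simply not wire this call up at all.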
> 
> With that in mind, we may want some type of coordination between arch
> specific and independent code.  Right now, cmdline_parse_hugetlb_cma
> will accept a hugetlb_cma command line option without complaint even
> if the architecture does not call hugetlb_cma_reserve.
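
One way to catch that (purely an illustrative sketch; the flag and the
placement of the check in hugetlb_init() are hypothetical) would be for
the generic code to remember whether the architecture wired the call up
and warn later:

	static bool hugetlb_cma_reserve_called __initdata;

	void __init hugetlb_cma_reserve(void)
	{
		hugetlb_cma_reserve_called = true;
		/* existing reservation logic */
	}

	static int __init hugetlb_init(void)
	{
		if ((hugetlb_cma_size || hugetlb_cma_percent) &&
		    !hugetlb_cma_reserve_called)
			pr_warn("hugetlb_cma: not supported on this architecture, option ignored\n");
		/* rest of hugetlb_init() */
	}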
> 
> Just a nit, but cma_declare_contiguous is going to round up size by
> alignment.  So, the actual reserved size may not match what is printed
> with,
> +		pr_info("hugetlb_cma: successfully reserved %llu on node %d\n",
> +			size, nid);
> 
> I found this out by testing code and specifying hugetlb_cma=2M.  Messages
> in log were:
> 	kernel: hugetlb_cma: reserve 2097152, 1048576 per node
> 	kernel: hugetlb_cma: successfully reserved 1048576 on node 0
> 	kernel: hugetlb_cma: successfully reserved 1048576 on node 1
> But, it really reserved 1GB per node.

Good point! If the passed size is too small to cover a single huge page,
we should probably print a warning and bail out.

Will fix in the next version.
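
Roughly along these lines (just a sketch, the final form may differ;
"gigantic_page_size" and "hugetlb_cma[nid]" are placeholders for
whatever the next version ends up using):

	/* bail out if a node's share cannot hold even one gigantic page */
	if (size < gigantic_page_size) {
		pr_warn("hugetlb_cma: reservation %llu too small, skipped\n",
			(u64)size);
		return;
	}

	/*
	 * After a successful cma_declare_contiguous(), report the real
	 * size: the request is rounded up to the alignment, so it can be
	 * larger than what was asked for.
	 */
	pr_info("hugetlb_cma: reserved %lu bytes on node %d\n",
		cma_get_size(hugetlb_cma[nid]), nid);

That would also make the log match the 1GB per node that actually gets
reserved in your hugetlb_cma=2M example.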

Thanks!

