From: Andreas Schaufler <andreas.schaufler@gmx.de>
To: Mike Kravetz <mike.kravetz@oracle.com>, Roman Gushchin <guro@fb.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Michal Hocko <mhocko@kernel.org>,
linux-mm@kvack.org, kernel-team@fb.com,
linux-kernel@vger.kernel.org, Rik van Riel <riel@surriel.com>
Subject: Re: [PATCH v2] mm: hugetlb: optionally allocate gigantic hugepages using cma
Date: Tue, 10 Mar 2020 19:54:58 +0100 [thread overview]
Message-ID: <1b5ada08-1baf-4f2d-3ecd-b940e7ee0e0d@gmx.de> (raw)
In-Reply-To: <4b78a8a9-7b5a-eb62-acaa-2677e615bea1@oracle.com>
Hi all,
sorry for bumping into this discussion, but this is exactly the feature
I am currently looking for on ARM64. You may browse under
linux-arm-msm@vger.kernel.org for a message I posted there recently.
I read somewhere in this thread that it should be easily portable to
other architectures than x86. I am ready to give it a try, but I would
need a few pointers where to hook in in the ARM64 code.
Thanks and best regards
Andreas
On 10.03.2020 19:33, Mike Kravetz wrote:
> On 3/10/20 11:05 AM, Roman Gushchin wrote:
>> On Tue, Mar 10, 2020 at 10:27:01AM -0700, Mike Kravetz wrote:
>>> On 3/9/20 5:25 PM, Roman Gushchin wrote:
>>>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>>>> index a74262c71484..ceeb06ddfd41 100644
>>>> --- a/arch/x86/kernel/setup.c
>>>> +++ b/arch/x86/kernel/setup.c
>>>> @@ -16,6 +16,7 @@
>>>> #include <linux/pci.h>
>>>> #include <linux/root_dev.h>
>>>> #include <linux/sfi.h>
>>>> +#include <linux/hugetlb.h>
>>>> #include <linux/tboot.h>
>>>> #include <linux/usb/xhci-dbgp.h>
>>>>
>>>> @@ -1158,6 +1159,8 @@ void __init setup_arch(char **cmdline_p)
>>>> initmem_init();
>>>> dma_contiguous_reserve(max_pfn_mapped << PAGE_SHIFT);
>>>>
>>>> + hugetlb_cma_reserve();
>>>> +
>>>
>>> I know this is called from arch specific code here to fit in with the timing
>>> of CMA setup/reservation calls. However, there really is nothing architecture
>>> specific about this functionality. It would be great IMO if we could make
>>> this architecture independent. However, I can not think of a straight forward
>>> way to do this.
>>
>> I agree. Unfortunately I have no better idea than having an arch-dependent hook.
>>
>>>
>>>> /*
>>>> * Reserve memory for crash kernel after SRAT is parsed so that it
>>>> * won't consume hotpluggable memory.
>>> <snip>
>>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>> <snip>
>>>> +void __init hugetlb_cma_reserve(void)
>>>> +{
>>>> + unsigned long totalpages = 0;
>>>> + unsigned long start_pfn, end_pfn;
>>>> + phys_addr_t size;
>>>> + int nid, i, res;
>>>> +
>>>> + if (!hugetlb_cma_size && !hugetlb_cma_percent)
>>>> + return;
>>>> +
>>>> + if (hugetlb_cma_percent) {
>>>> + for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn,
>>>> + NULL)
>>>> + totalpages += end_pfn - start_pfn;
>>>> +
>>>> + size = PAGE_SIZE * (hugetlb_cma_percent * 100 * totalpages) /
>>>> + 10000UL;
>>>> + } else {
>>>> + size = hugetlb_cma_size;
>>>> + }
>>>> +
>>>> + pr_info("hugetlb_cma: reserve %llu, %llu per node\n", size,
>>>> + size / nr_online_nodes);
>>>> +
>>>> + size /= nr_online_nodes;
>>>> +
>>>> + for_each_node_state(nid, N_ONLINE) {
>>>> + unsigned long min_pfn = 0, max_pfn = 0;
>>>> +
>>>> + for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
>>>> + if (!min_pfn)
>>>> + min_pfn = start_pfn;
>>>> + max_pfn = end_pfn;
>>>> + }
>>>> +
>>>> + res = cma_declare_contiguous(PFN_PHYS(min_pfn), size,
>>>> + PFN_PHYS(max_pfn), (1UL << 30),
>>>
>>> The alignment is hard coded for x86 gigantic page size. If this supports
>>> more architectures or becomes arch independent we will need to determine
>>> what this alignment should be. Perhaps an arch specific call back to get
>>> the alignment for gigantic pages. That will require a little thought as
>>> some arch's support multiple gigantic page sizes.
>>
>> Good point!
>> Should we take the biggest possible size as a reference?
>> Or the smallest (larger than MAX_ORDER)?
>
> As mentioned, it is pretty simple for architectures like x86 that only
> have one gigantic page size. Just a random thought, but since
> hugetlb_cma_reserve is called from arch specific code perhaps the arch
> code could pass in at least alignment as an argument to this function?
> That way we can somewhat push the issue to the architectures. For example,
> power supports 16GB gigantic page size but I believe they are allocated
> via firmware somehow. So, they would not pass 16G as alignment. In this
> case setup of the CMA area is somewhat architecture dependent. So, perhaps
> the call from arch specific code is not such a bad idea.
>
> With that in mind, we may want some type of coordination between arch
> specific and independent code. Right now, cmdline_parse_hugetlb_cma
> is will accept a hugetlb_cma command line option without complaint even
> if the architecture does not call hugetlb_cma_reserve.
>
> Just a nit, but cma_declare_contiguous if going to round up size by alignment. So, the actual reserved size may not match what is printed with,
> + pr_info("hugetlb_cma: successfully reserved %llu on node %d\n",
> + size, nid);
>
> I found this out by testing code and specifying hugetlb_cma=2M. Messages
> in log were:
> kernel: hugetlb_cma: reserve 2097152, 1048576 per node
> kernel: hugetlb_cma: successfully reserved 1048576 on node 0
> kernel: hugetlb_cma: successfully reserved 1048576 on node 1
> But, it really reserved 1GB per node.
>
next prev parent reply other threads:[~2020-03-10 18:55 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-03-10 0:25 [PATCH v2] mm: hugetlb: optionally allocate gigantic hugepages using cma Roman Gushchin
2020-03-10 0:30 ` Roman Gushchin
2020-03-10 8:45 ` Michal Hocko
2020-03-10 17:25 ` Roman Gushchin
2020-03-10 17:37 ` Michal Hocko
2020-03-16 1:08 ` Rik van Riel
2020-03-10 17:38 ` Mike Kravetz
2020-03-10 17:42 ` Michal Hocko
2020-03-10 9:01 ` Michal Hocko
2020-03-10 17:30 ` Roman Gushchin
2020-03-10 17:39 ` Michal Hocko
2020-03-10 17:58 ` Roman Gushchin
2020-03-10 17:27 ` Mike Kravetz
2020-03-10 18:05 ` Roman Gushchin
2020-03-10 18:22 ` Rik van Riel
2020-03-10 18:33 ` Mike Kravetz
2020-03-10 18:54 ` Andreas Schaufler [this message]
2020-03-10 18:56 ` Roman Gushchin
2020-03-10 19:00 ` Andreas Schaufler
2020-03-10 19:19 ` Roman Gushchin
2020-03-10 19:36 ` Michal Hocko
2020-03-10 19:46 ` Rik van Riel
2020-03-10 20:11 ` Mike Kravetz
2020-03-10 20:15 ` Rik van Riel
2020-03-10 20:29 ` Mike Kravetz
2020-03-10 20:38 ` Rik van Riel
2020-03-10 20:29 ` Roman Gushchin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1b5ada08-1baf-4f2d-3ecd-b940e7ee0e0d@gmx.de \
--to=andreas.schaufler@gmx.de \
--cc=akpm@linux-foundation.org \
--cc=guro@fb.com \
--cc=hannes@cmpxchg.org \
--cc=kernel-team@fb.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=mike.kravetz@oracle.com \
--cc=riel@surriel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).