From: Hushijie <hushijie3@huawei.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: "will@kernel.org" <will@kernel.org>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Nixiaoming <nixiaoming@huawei.com>,
"wangxu (AE)" <wangxu72@huawei.com>,
"Wangkefeng (OS Kernel Lab)" <wangkefeng.wang@huawei.com>,
yangerkun <yangerkun@huawei.com>,
"Wangle (RTOS FAE)" <wangle6@huawei.com>,
"Chengang (L)" <cg.chen@huawei.com>,
"Chenjie (K)" <chenjie6@huawei.com>,
"Huangjianhui (Alex)" <alex.huangjianhui@huawei.com>
Subject: Re: [PATCH v5] hugetlbfs: Get unmapped area below TASK_UNMAPPED_BASE for hugetlbfs
Date: Sat, 16 May 2020 07:47:35 +0000
Message-ID: <87d618c778584d2386c0f0c81be6319b@huawei.com>
>On 5/14/20 7:31 AM, Shijie Hu wrote:
>> Here is a final patch to solve that hugetlb_get_unmapped_area() can't
>> get unmapped area below mmap base for huge pages based on a few previous
>> discussions and patches from me.
>>
>> I'm so sorry. When sending the v2 and v3 patches, I forgot to cc:
>> linux-mm@kvack.org and linux-kernel@vger.kernel.org, so there are no
>> records of those two patches on https://lkml.org/lkml/.
>>
>> Patch V1: https://lkml.org/lkml/2020/4/27/440
>> Patch V4: https://lkml.org/lkml/2020/5/13/980
>>
>> Changes in V2:
>> * Follow Mike's suggestions: move the hugetlb_get_unmapped_area() routines
>> from "fs/hugetlbfs/inode.c" to "arch/arm64/mm/hugetlbpage.c" without
>> changing core code.
>> * Add an mmap_is_legacy() function, and only fall back to the bottom-up
>> function in non-legacy mode.
>>
>> Changes in V3:
>> * Add the *bottomup() and *topdown() functions, and check whether
>> mm->get_unmapped_area is equal to arch_get_unmapped_area() instead of
>> checking mmap_is_legacy() to determine which function should be used.
>>
>> Changes in V4:
>> * Follow the suggestions of Will and Mike, and move this patch back to
>> core code.
>>
>> Changes in V5:
>> * Fix kbuild test error.
>>
>> ------
>>
>> In a 32-bit program running on the arm64 architecture, when the address
>> space below the mmap base is completely exhausted, shmat() for huge pages
>> returns ENOMEM, while shmat() for normal pages can still succeed in
>> non-legacy mode. This is inconsistent.
>>
>> For normal pages, the get_unmapped_area() calling flow is:
>> => mm->get_unmapped_area()
>> in legacy mode:
>> => arch_get_unmapped_area()
>> => vm_unmapped_area()
>> in non-legacy mode:
>> => arch_get_unmapped_area_topdown()
>> => vm_unmapped_area()
>>
>> For huge pages, the get_unmapped_area() calling flow is:
>> => file->f_op->get_unmapped_area()
>> => hugetlb_get_unmapped_area()
>> => vm_unmapped_area()
>>
>> To solve this issue, hugetlb_get_unmapped_area() only needs to take the
>> same path as mm->get_unmapped_area(). Add the *bottomup() and *topdown()
>> functions, and check the current mm->get_unmapped_area() to decide which
>> one to use: if mm->get_unmapped_area is equal to arch_get_unmapped_area(),
>> hugetlb_get_unmapped_area() calls the bottomup routine, otherwise it calls
>> the topdown routine.
>>
>> Signed-off-by: Shijie Hu <hushijie3@huawei.com>
>> Reported-by: kbuild test robot <lkp@intel.com>
>> ---
>> fs/hugetlbfs/inode.c | 62 +++++++++++++++++++++++++++++++++++++++++++++-------
>> 1 file changed, 54 insertions(+), 8 deletions(-)
>>
>> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
>> index 991c60c7ffe0..61418380f492 100644
>> --- a/fs/hugetlbfs/inode.c
>> +++ b/fs/hugetlbfs/inode.c
>> @@ -38,6 +38,7 @@
>> #include <linux/uio.h>
>>
>> #include <linux/uaccess.h>
>> +#include <linux/sched/mm.h>
>>
>> static const struct super_operations hugetlbfs_ops;
>> static const struct address_space_operations hugetlbfs_aops;
>> @@ -191,13 +192,60 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
>>
>> #ifndef HAVE_ARCH_HUGETLB_UNMAPPED_AREA
>> static unsigned long
>> +hugetlb_get_unmapped_area_bottomup(struct file *file, unsigned long addr,
>> + unsigned long len, unsigned long pgoff, unsigned long flags)
>> +{
>> + struct hstate *h = hstate_file(file);
>> + struct vm_unmapped_area_info info;
>> +
>> + info.flags = 0;
>> + info.length = len;
>> + info.low_limit = current->mm->mmap_base;
>> + info.high_limit = TASK_SIZE;
>> + info.align_mask = PAGE_MASK & ~huge_page_mask(h);
>> + info.align_offset = 0;
>> + return vm_unmapped_area(&info);
>> +}
>> +
>> +static unsigned long
>> +hugetlb_get_unmapped_area_topdown(struct file *file, unsigned long addr,
>> + unsigned long len, unsigned long pgoff, unsigned long flags)
>> +{
>> + struct hstate *h = hstate_file(file);
>> + struct vm_unmapped_area_info info;
>> +
>> + info.flags = VM_UNMAPPED_AREA_TOPDOWN;
>> + info.length = len;
>> + info.low_limit = max(PAGE_SIZE, mmap_min_addr);
>> + info.high_limit = current->mm->mmap_base;
>> + info.align_mask = PAGE_MASK & ~huge_page_mask(h);
>> + info.align_offset = 0;
>> + addr = vm_unmapped_area(&info);
>> +
>> + /*
>> + * A failed mmap() very likely causes application failure,
>> + * so fall back to the bottom-up function here. This scenario
>> + * can happen with large stack limits and large mmap()
>> + * allocations.
>> + */
>> + if (unlikely(offset_in_page(addr))) {
>> + VM_BUG_ON(addr != -ENOMEM);
>> + info.flags = 0;
>> + info.low_limit = current->mm->mmap_base;
>> + info.high_limit = TASK_SIZE;
>> + addr = vm_unmapped_area(&info);
>> + }
>> +
>> + return addr;
>> +}
>> +
>> +static unsigned long
>> hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
>> unsigned long len, unsigned long pgoff, unsigned long flags)
>> {
>> struct mm_struct *mm = current->mm;
>> struct vm_area_struct *vma;
>> struct hstate *h = hstate_file(file);
>> - struct vm_unmapped_area_info info;
>>
>> if (len & ~huge_page_mask(h))
>> return -EINVAL;
>> @@ -218,13 +266,11 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
>> return addr;
>> }
>>
>> - info.flags = 0;
>> - info.length = len;
>> - info.low_limit = TASK_UNMAPPED_BASE;
>> - info.high_limit = TASK_SIZE;
>> - info.align_mask = PAGE_MASK & ~huge_page_mask(h);
>> - info.align_offset = 0;
>> - return vm_unmapped_area(&info);
>> + if (mm->get_unmapped_area == arch_get_unmapped_area)
>> + return hugetlb_get_unmapped_area_bottomup(file, addr, len,
>> + pgoff, flags);
>> + return hugetlb_get_unmapped_area_topdown(file, addr, len,
>> + pgoff, flags);
>
>I like this code using the value of mm->get_unmapped_area to determine
>which routine to call. It is used by a few architectures. However, I
>noticed that on at least one architecture (powerpc) mm->get_unmapped_area
>may be assigned to routines other than arch_get_unmapped_area or
>arch_get_unmapped_area_topdown. In such a case, we would call the 'new'
>topdown routine. I would prefer that we call the bottomup routine in this
>default case.
>
>In reality, this does not impact powerpc as that architecture has its
>own hugetlb_get_unmapped_area routine.
>
Yes, I also noticed this before. powerpc uses radix__arch_get_unmapped_area*()
when CONFIG_PPC_RADIX_MMU is set to 'y' and radix_enabled() returns true.
However, since powerpc implements its own hugetlb_get_unmapped_area(), this
patch actually has no effect on powerpc.
>Because of this, I suggest we add a comment above this code and switch
>the if/else order. For example,
>
>+ /*
>+ * Use mm->get_unmapped_area value as a hint to use topdown routine.
>+ * If architectures have special needs, they should define their own
>+ * version of hugetlb_get_unmapped_area.
>+ */
>+ if (mm->get_unmapped_area == arch_get_unmapped_area_topdown)
>+ return hugetlb_get_unmapped_area_topdown(file, addr, len,
>+ pgoff, flags);
>+ return hugetlb_get_unmapped_area_bottomup(file, addr, len,
>+ pgoff, flags);
>
>Thoughts?
>--
>Mike Kravetz
>
I agree with you. Switching the if/else order is a clever idea. In the case
where mm->get_unmapped_area() is neither arch_get_unmapped_area() nor
arch_get_unmapped_area_topdown(), it is indeed more appropriate to make the
bottomup routine the default behavior.
May I put the code and comment you showed above into patch v6 and add
"Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>" to it?
--
Shijie Hu
>
>> }
>> #endif
>>
>>