linux-kernel.vger.kernel.org archive mirror
* [PATCH v5] hugetlbfs: Get unmapped area below TASK_UNMAPPED_BASE for hugetlbfs
@ 2020-05-14 14:31 Shijie Hu
  2020-05-15 18:44 ` Mike Kravetz
  0 siblings, 1 reply; 5+ messages in thread
From: Shijie Hu @ 2020-05-14 14:31 UTC (permalink / raw)
  To: mike.kravetz
  Cc: will, akpm, linux-mm, linux-kernel, nixiaoming, wangxu72,
	wangkefeng.wang, yangerkun, wangle6, cg.chen, chenjie6,
	alex.huangjianhui

Here is the final patch, based on a few previous discussions and patches
from me, to fix the problem that hugetlb_get_unmapped_area() cannot get
an unmapped area below the mmap base for huge pages.

I'm sorry: when sending the v2 and v3 patches, I forgot to cc
linux-mm@kvack.org and linux-kernel@vger.kernel.org, so there are no
records of those two patches on https://lkml.org/lkml/.

Patch V1: https://lkml.org/lkml/2020/4/27/440
Patch V4: https://lkml.org/lkml/2020/5/13/980

Changes in V2:
* Follow Mike's suggestions: move the hugetlb_get_unmapped_area()
routines from "fs/hugetlbfs/inode.c" to "arch/arm64/mm/hugetlbpage.c",
without changing core code.
* Add an mmap_is_legacy() function, and only fall back to the bottom-up
function in non-legacy mode.

Changes in V3:
* Add two functions, *bottomup() and *topdown(), and check whether
mm->get_unmapped_area is equal to arch_get_unmapped_area() instead of
checking mmap_is_legacy() to determine which function should be used.

Changes in V4:
* Following the suggestions of Will and Mike, move this patch back into
core code.

Changes in V5:
* Fix kbuild test error.

------

In a 32-bit program running on the arm64 architecture, when the address
space below the mmap base is completely exhausted, shmat() for huge pages
returns ENOMEM, but shmat() for normal pages can still succeed in
non-legacy mode. This inconsistency seems unfair.

For normal pages, get_unmapped_area() calling flows are:
    => mm->get_unmapped_area()
	if on legacy mode,
	    => arch_get_unmapped_area()
			=> vm_unmapped_area()
	if on no-legacy mode,
	    => arch_get_unmapped_area_topdown()
			=> vm_unmapped_area()

For huge pages, get_unmapped_area() calling flows are:
    => file->f_op->get_unmapped_area()
		=> hugetlb_get_unmapped_area()
			=> vm_unmapped_area()

To solve this issue, hugetlb_get_unmapped_area() only needs to take the
same path as mm->get_unmapped_area(). Add two functions, *bottomup() and
*topdown(), and check the current mm->get_unmapped_area to decide which
one to use: if mm->get_unmapped_area is equal to arch_get_unmapped_area(),
hugetlb_get_unmapped_area() calls the bottom-up routine; otherwise it
calls the top-down routine.

Signed-off-by: Shijie Hu <hushijie3@huawei.com>
Reported-by: kbuild test robot <lkp@intel.com>
---
 fs/hugetlbfs/inode.c | 62 +++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 54 insertions(+), 8 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 991c60c7ffe0..61418380f492 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -38,6 +38,7 @@
 #include <linux/uio.h>
 
 #include <linux/uaccess.h>
+#include <linux/sched/mm.h>
 
 static const struct super_operations hugetlbfs_ops;
 static const struct address_space_operations hugetlbfs_aops;
@@ -191,13 +192,60 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
 
 #ifndef HAVE_ARCH_HUGETLB_UNMAPPED_AREA
 static unsigned long
+hugetlb_get_unmapped_area_bottomup(struct file *file, unsigned long addr,
+		unsigned long len, unsigned long pgoff, unsigned long flags)
+{
+	struct hstate *h = hstate_file(file);
+	struct vm_unmapped_area_info info;
+
+	info.flags = 0;
+	info.length = len;
+	info.low_limit = current->mm->mmap_base;
+	info.high_limit = TASK_SIZE;
+	info.align_mask = PAGE_MASK & ~huge_page_mask(h);
+	info.align_offset = 0;
+	return vm_unmapped_area(&info);
+}
+
+static unsigned long
+hugetlb_get_unmapped_area_topdown(struct file *file, unsigned long addr,
+		unsigned long len, unsigned long pgoff, unsigned long flags)
+{
+	struct hstate *h = hstate_file(file);
+	struct vm_unmapped_area_info info;
+
+	info.flags = VM_UNMAPPED_AREA_TOPDOWN;
+	info.length = len;
+	info.low_limit = max(PAGE_SIZE, mmap_min_addr);
+	info.high_limit = current->mm->mmap_base;
+	info.align_mask = PAGE_MASK & ~huge_page_mask(h);
+	info.align_offset = 0;
+	addr = vm_unmapped_area(&info);
+
+	/*
+	 * A failed mmap() very likely causes application failure,
+	 * so fall back to the bottom-up function here. This scenario
+	 * can happen with large stack limits and large mmap()
+	 * allocations.
+	 */
+	if (unlikely(offset_in_page(addr))) {
+		VM_BUG_ON(addr != -ENOMEM);
+		info.flags = 0;
+		info.low_limit = current->mm->mmap_base;
+		info.high_limit = TASK_SIZE;
+		addr = vm_unmapped_area(&info);
+	}
+
+	return addr;
+}
+
+static unsigned long
 hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
 		unsigned long len, unsigned long pgoff, unsigned long flags)
 {
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
 	struct hstate *h = hstate_file(file);
-	struct vm_unmapped_area_info info;
 
 	if (len & ~huge_page_mask(h))
 		return -EINVAL;
@@ -218,13 +266,11 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
 			return addr;
 	}
 
-	info.flags = 0;
-	info.length = len;
-	info.low_limit = TASK_UNMAPPED_BASE;
-	info.high_limit = TASK_SIZE;
-	info.align_mask = PAGE_MASK & ~huge_page_mask(h);
-	info.align_offset = 0;
-	return vm_unmapped_area(&info);
+	if (mm->get_unmapped_area == arch_get_unmapped_area)
+		return hugetlb_get_unmapped_area_bottomup(file, addr, len,
+				pgoff, flags);
+	return hugetlb_get_unmapped_area_topdown(file, addr, len,
+			pgoff, flags);
 }
 #endif
 
-- 
2.12.3



* Re: [PATCH v5] hugetlbfs: Get unmapped area below TASK_UNMAPPED_BASE for hugetlbfs
  2020-05-14 14:31 [PATCH v5] hugetlbfs: Get unmapped area below TASK_UNMAPPED_BASE for hugetlbfs Shijie Hu
@ 2020-05-15 18:44 ` Mike Kravetz
  0 siblings, 0 replies; 5+ messages in thread
From: Mike Kravetz @ 2020-05-15 18:44 UTC (permalink / raw)
  To: Shijie Hu
  Cc: will, akpm, linux-mm, linux-kernel, nixiaoming, wangxu72,
	wangkefeng.wang, yangerkun, wangle6, cg.chen, chenjie6,
	alex.huangjianhui

On 5/14/20 7:31 AM, Shijie Hu wrote:
> [...]
> +	if (mm->get_unmapped_area == arch_get_unmapped_area)
> +		return hugetlb_get_unmapped_area_bottomup(file, addr, len,
> +				pgoff, flags);
> +	return hugetlb_get_unmapped_area_topdown(file, addr, len,
> +			pgoff, flags);

I like this code using the value of mm->get_unmapped_area to determine
which routine to call.  It is used by a few architectures.  However, I
noticed that on at least one architecture (powerpc), mm->get_unmapped_area
may be assigned to routines other than arch_get_unmapped_area or
arch_get_unmapped_area_topdown.  In such a case, we would call the 'new'
topdown routine.  I would prefer that we call the bottomup routine in this
default case.

In reality, this does not impact powerpc, as that architecture has its
own hugetlb_get_unmapped_area routine.

Because of this, I suggest we add a comment above this code and switch
the if/else order.  For example,

+       /*
+        * Use mm->get_unmapped_area value as a hint to use topdown routine.
+        * If architectures have special needs, they should define their own
+        * version of hugetlb_get_unmapped_area.
+        */
+       if (mm->get_unmapped_area == arch_get_unmapped_area_topdown)
+               return hugetlb_get_unmapped_area_topdown(file, addr, len,
+                               pgoff, flags);
+       return hugetlb_get_unmapped_area_bottomup(file, addr, len,
+                       pgoff, flags);


Thoughts?
-- 
Mike Kravetz




* Re: [PATCH v5] hugetlbfs: Get unmapped area below TASK_UNMAPPED_BASE for hugetlbfs
@ 2020-05-18  6:23 Hushijie
  0 siblings, 0 replies; 5+ messages in thread
From: Hushijie @ 2020-05-18  6:23 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: will, akpm, linux-mm, linux-kernel, Nixiaoming, wangxu (AE),
	Wangkefeng (OS Kernel Lab), yangerkun, Wangle (RTOS FAE),
	Chengang (L), Chenjie (K), Huangjianhui (Alex)

>On 5/16/20 12:47 AM, Hushijie wrote:
>>> On 5/14/20 7:31 AM, Shijie Hu wrote:
>>>> +	if (mm->get_unmapped_area == arch_get_unmapped_area)
>>>> +		return hugetlb_get_unmapped_area_bottomup(file, addr, len,
>>>> +				pgoff, flags);
>>>> +	return hugetlb_get_unmapped_area_topdown(file, addr, len,
>>>> +			pgoff, flags);
>>>
>>> I like this code using the value of mm->get_unmapped_area to determine
>>> which routine to call.  It is used by a few architectures.   However, I
>>> noticed that on at least one architecture (powerpc) mm->get_unmapped_area
>>> may be assigned to routines other than arch_get_unmapped_area or
>>> arch_get_unmapped_area_topdown.  In such a case, we would call the 'new'
>>> topdown routine.  I would prefer that we call the bottomup routine in this
>>> default case.
>>>
>>> In reality, this does not impact powerpc as that architecture has it's
>>> own hugetlb_get_unmapped_area routine.
>>>
>> 
>> Yes, I also noticed this before, powerpc uses radix__arch_get_unmapped_area*() 
>> when CONFIG_PPC_RADIX_MMU opened as 'y' and radix_enabled() returns 
>> true. However, powerpc implemented its own hugetlb_get_unmapped_area(). This
>> patch actually has no effect on powerpc.
>> 
>>> Because of this, I suggest we add a comment above this code and switch
>>> the if/else order.  For example,
>>>
>>> +       /*
>>> +        * Use mm->get_unmapped_area value as a hint to use topdown routine.
>>> +        * If architectures have special needs, they should define their own
>>> +        * version of hugetlb_get_unmapped_area.
>>> +        */
>>> +       if (mm->get_unmapped_area == arch_get_unmapped_area_topdown)
>>> +               return hugetlb_get_unmapped_area_topdown(file, addr, len,
>>> +                               pgoff, flags);
>>> +       return hugetlb_get_unmapped_area_bottomup(file, addr, len,
>>> +                       pgoff, flags);
>>>
>>> Thoughts?
>>> -- 
>>> Mike Kravetz
>>>
>> I agree with you. It's clever to switch the if/else order. If there is such
>> a case, mm->get_unmapped_area() is neither arch_get_unmapped_area() nor
>> arch_get_unmapped_area_topdown(), it is indeed more appropriate to make the
>> bottomup routine as the default behavior.
>> 
>> May I put this code and comment you show above into patch v6 and add 
>> "Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>" to it?
>
>Feel free to add this code and my Signed-off-by.
>
>I assume this still works for your use case.  Correct?
>-- 
>Mike Kravetz
>

Yes, it still works for our use case.

Thanks for your replies and suggestions. I will submit patch v6 later.
--
Shijie Hu


* Re: [PATCH v5] hugetlbfs: Get unmapped area below TASK_UNMAPPED_BASE for hugetlbfs
  2020-05-16  7:47 Hushijie
@ 2020-05-18  4:16 ` Mike Kravetz
  0 siblings, 0 replies; 5+ messages in thread
From: Mike Kravetz @ 2020-05-18  4:16 UTC (permalink / raw)
  To: Hushijie
  Cc: will, akpm, linux-mm, linux-kernel, Nixiaoming, wangxu (AE),
	Wangkefeng (OS Kernel Lab), yangerkun, Wangle (RTOS FAE),
	Chengang (L), Chenjie (K), Huangjianhui (Alex)

On 5/16/20 12:47 AM, Hushijie wrote:
>> On 5/14/20 7:31 AM, Shijie Hu wrote:
>>> +	if (mm->get_unmapped_area == arch_get_unmapped_area)
>>> +		return hugetlb_get_unmapped_area_bottomup(file, addr, len,
>>> +				pgoff, flags);
>>> +	return hugetlb_get_unmapped_area_topdown(file, addr, len,
>>> +			pgoff, flags);
>>
>> I like this code using the value of mm->get_unmapped_area to determine
>> which routine to call.  It is used by a few architectures.   However, I
>> noticed that on at least one architecture (powerpc) mm->get_unmapped_area
>> may be assigned to routines other than arch_get_unmapped_area or
>> arch_get_unmapped_area_topdown.  In such a case, we would call the 'new'
>> topdown routine.  I would prefer that we call the bottomup routine in this
>> default case.
>>
>> In reality, this does not impact powerpc as that architecture has it's
>> own hugetlb_get_unmapped_area routine.
>>
> 
> Yes, I also noticed this before, powerpc uses radix__arch_get_unmapped_area*() 
> when CONFIG_PPC_RADIX_MMU opened as 'y' and radix_enabled() returns 
> true. However, powerpc implemented its own hugetlb_get_unmapped_area(). This
> patch actually has no effect on powerpc.
> 
>> Because of this, I suggest we add a comment above this code and switch
>> the if/else order.  For example,
>>
>> +       /*
>> +        * Use mm->get_unmapped_area value as a hint to use topdown routine.
>> +        * If architectures have special needs, they should define their own
>> +        * version of hugetlb_get_unmapped_area.
>> +        */
>> +       if (mm->get_unmapped_area == arch_get_unmapped_area_topdown)
>> +               return hugetlb_get_unmapped_area_topdown(file, addr, len,
>> +                               pgoff, flags);
>> +       return hugetlb_get_unmapped_area_bottomup(file, addr, len,
>> +                       pgoff, flags);
>>
>> Thoughts?
>> -- 
>> Mike Kravetz
>>
> I agree with you. It's clever to switch the if/else order. If there is such
> a case, mm->get_unmapped_area() is neither arch_get_unmapped_area() nor
> arch_get_unmapped_area_topdown(), it is indeed more appropriate to make the
> bottomup routine as the default behavior.
> 
> May I put this code and comment you show above into patch v6 and add 
> "Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>" to it?

Feel free to add this code and my Signed-off-by.

I assume this still works for your use case.  Correct?
-- 
Mike Kravetz


* Re: [PATCH v5] hugetlbfs: Get unmapped area below TASK_UNMAPPED_BASE for hugetlbfs
@ 2020-05-16  7:47 Hushijie
  2020-05-18  4:16 ` Mike Kravetz
  0 siblings, 1 reply; 5+ messages in thread
From: Hushijie @ 2020-05-16  7:47 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: will, akpm, linux-mm, linux-kernel, Nixiaoming, wangxu (AE),
	Wangkefeng (OS Kernel Lab), yangerkun, Wangle (RTOS FAE),
	Chengang (L), Chenjie (K), Huangjianhui (Alex)

>On 5/14/20 7:31 AM, Shijie Hu wrote:
>> [...]
>> +	if (mm->get_unmapped_area == arch_get_unmapped_area)
>> +		return hugetlb_get_unmapped_area_bottomup(file, addr, len,
>> +				pgoff, flags);
>> +	return hugetlb_get_unmapped_area_topdown(file, addr, len,
>> +			pgoff, flags);
>
>I like this code using the value of mm->get_unmapped_area to determine
>which routine to call.  It is used by a few architectures.   However, I
>noticed that on at least one architecture (powerpc) mm->get_unmapped_area
>may be assigned to routines other than arch_get_unmapped_area or
>arch_get_unmapped_area_topdown.  In such a case, we would call the 'new'
>topdown routine.  I would prefer that we call the bottomup routine in this
>default case.
>
>In reality, this does not impact powerpc as that architecture has it's
>own hugetlb_get_unmapped_area routine.
>

Yes, I also noticed this before. powerpc uses radix__arch_get_unmapped_area*()
when CONFIG_PPC_RADIX_MMU is set to 'y' and radix_enabled() returns true.
However, powerpc implements its own hugetlb_get_unmapped_area(), so this
patch actually has no effect on powerpc.

>Because of this, I suggest we add a comment above this code and switch
>the if/else order.  For example,
>
>+       /*
>+        * Use mm->get_unmapped_area value as a hint to use topdown routine.
>+        * If architectures have special needs, they should define their own
>+        * version of hugetlb_get_unmapped_area.
>+        */
>+       if (mm->get_unmapped_area == arch_get_unmapped_area_topdown)
>+               return hugetlb_get_unmapped_area_topdown(file, addr, len,
>+                               pgoff, flags);
>+       return hugetlb_get_unmapped_area_bottomup(file, addr, len,
>+                       pgoff, flags);
>
>Thoughts?
>-- 
>Mike Kravetz
>
I agree with you. It's clever to switch the if/else order. If there is such
a case, where mm->get_unmapped_area() is neither arch_get_unmapped_area() nor
arch_get_unmapped_area_topdown(), it is indeed more appropriate to make the
bottom-up routine the default behavior.

May I put the code and comment you showed above into patch v6 and add
"Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>" to it?
-- 
Shijie Hu



