Re: [PATCH] mm: allow checking length for hugetlb mapping in mmap()

From: "Li Xinhai" <lixinhai.lxh@gmail.com>
To: "John Hubbard" <jhubbard@nvidia.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>
Cc: "Linux API" <linux-api@vger.kernel.org>,
	akpm <akpm@linux-foundation.org>,
	"Mike Kravetz" <mike.kravetz@oracle.com>
Subject: Re: [PATCH] mm: allow checking length for hugetlb mapping in mmap()
Date: Sun, 29 Mar 2020 16:09:36 +0800	[thread overview]
Message-ID: <2020032916093522557671@gmail.com> (raw)
In-Reply-To: f8b5b647-9041-8127-925c-1c8dcb508f24@nvidia.com

On 2020-03-29 at 11:53 John Hubbard wrote:
>On 3/28/20 8:08 PM, Li Xinhai wrote:
>> In current code, the vma related call of hugetlb mapping, except mmap,
>> are all consider not correctly aligned length as invalid parameter,
>> including mprotect,munmap, mlock, etc., by checking through
>> hugetlb_vm_op_split. So, user will see failure, after successfully call
>> mmap, although using same length parameter to other mapping syscall.
>>
>> It is desirable for all hugetlb mapping calls have consistent behavior,
>> without mmap as exception(which round up length to align underlying
>> hugepage size). In current Documentation/admin-guide/mm/hugetlbpage.rst,
>> the description is:
>> "
>> Syscalls that operate on memory backed by hugetlb pages only have their
>> lengths aligned to the native page size of the processor; they will
>> normally fail with errno set to EINVAL or exclude hugetlb pages that
>> extend beyond the length if not hugepage aligned. For example, munmap(2)
>> will fail if memory is backed by a hugetlb page and the length is smaller
>> than the hugepage size.
>> "
>> which express the consistent behavior.
>
>
>Missing here is a description of what the patch actually does...
> 

right, more statement can be added like:
"
After this patch, all hugetlb mapping related syscall wil only align
length parameter to the native page size of the processor. For mmap(),
hugetlb_get_unmmaped_area() will set errno to EINVAL if length is not
aligned to underlying hugepage size.
"

>>
>> Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Mike Kravetz <mike.kravetz@oracle.com>
>> Cc: John Hubbard <jhubbard@nvidia.com>
>> ---
>> changes:
>> 0. patch which introduce new flag for mmap()
>>     The new flag should be avoided.
>> https://lore.kernel.org/linux-mm/1585313944-8627-1-git-send-email-lixinhai.lxh@gmail.com/
>>
>>   mm/mmap.c | 8 --------
>>   1 file changed, 8 deletions(-)
>>
>> diff --git a/mm/mmap.c b/mm/mmap.c
>> index d681a20..b2aa102 100644
>> --- a/mm/mmap.c
>> +++ b/mm/mmap.c
>> @@ -1560,20 +1560,12 @@ unsigned long ksys_mmap_pgoff(unsigned long addr, unsigned long len,
>>   file = fget(fd);
>>   if (!file)
>>   return -EBADF;
>> -	if (is_file_hugepages(file))
>> -	len = ALIGN(len, huge_page_size(hstate_file(file)));
>
>
>...and it looks like this is simply removing the forced alignment. And not adding
>any error case for non-aligned cases. So now I'm not immediately sure what happens if a
>non-aligned address is passed in.
>
>I would have expected to see either error checking or an ALIGN call here, but now both
>are gone, so I'm lost and confused. :)
> 

After this patch, the alignement will only on "native page size of the processor" as done in
do_mmap(). Then, following the code path, checking further by hugetlb_get_unmmaped_area()
according to underlying hugepage size.

>
>thanks,
>--
>John Hubbard
>NVIDIA
>
>>   retval = -EINVAL;
>>   if (unlikely(flags & MAP_HUGETLB && !is_file_hugepages(file)))
>>   goto out_fput;
>>   } else if (flags & MAP_HUGETLB) {
>>   struct user_struct *user = NULL;
>> -	struct hstate *hs;
>>  
>> -	hs = hstate_sizelog((flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK);
>> -	if (!hs)
>> -	return -EINVAL;
>> -
>> -	len = ALIGN(len, huge_page_size(hs));
>>   /*
>>   * VM_NORESERVE is used because the reservations will be
>>   * taken when vm_ops->mmap() is called
>>