linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] hugetlbfs: fix offset overflow in huegtlbfs mmap
@ 2017-04-11 22:51 Mike Kravetz
  2017-04-12  3:13 ` Hillf Danton
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Mike Kravetz @ 2017-04-11 22:51 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Vegard Nossum, Dmitry Vyukov, Hillf Danton, Michal Hocko,
	Kirill A . Shutemov, Andrey Ryabinin, Naoya Horiguchi,
	Andrew Morton, Mike Kravetz

If mmap() maps a file, it can be passed an offset into the file at
which the mapping is to start.  Offset could be a negative value when
represented as a loff_t.  The offset plus length will be used to
update the file size (i_size) which is also a loff_t.  Validate the
value of offset and offset + length to make sure they do not overflow
and appear as negative.

Found by syzcaller with commit ff8c0c53c475 ("mm/hugetlb.c: don't call
region_abort if region_chg fails") applied.  Prior to this commit, the
overflow would still occur but we would luckily return ENOMEM.
To reproduce:
mmap(0, 0x2000, 0, 0x40021, 0xffffffffffffffffULL, 0x8000000000000000ULL);

Resulted in,
kernel BUG at mm/hugetlb.c:742!
Call Trace:
 hugetlbfs_evict_inode+0x80/0xa0
 ? hugetlbfs_setattr+0x3c0/0x3c0
 evict+0x24a/0x620
 iput+0x48f/0x8c0
 dentry_unlink_inode+0x31f/0x4d0
 __dentry_kill+0x292/0x5e0
 dput+0x730/0x830
 __fput+0x438/0x720
 ____fput+0x1a/0x20
 task_work_run+0xfe/0x180
 exit_to_usermode_loop+0x133/0x150
 syscall_return_slowpath+0x184/0x1c0
 entry_SYSCALL_64_fastpath+0xab/0xad

Reported-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 fs/hugetlbfs/inode.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 7163fe0..dde8613 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -136,17 +136,26 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
 	vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND;
 	vma->vm_ops = &hugetlb_vm_ops;
 
+	/*
+	 * Offset passed to mmap (before page shift) could have been
+	 * negative when represented as a (l)off_t.
+	 */
+	if (((loff_t)vma->vm_pgoff << PAGE_SHIFT) < 0)
+		return -EINVAL;
+
 	if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))
 		return -EINVAL;
 
 	vma_len = (loff_t)(vma->vm_end - vma->vm_start);
+	len = vma_len + ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
+	/* check for overflow */
+	if (len < vma_len)
+		return -EINVAL;
 
 	inode_lock(inode);
 	file_accessed(file);
 
 	ret = -ENOMEM;
-	len = vma_len + ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
-
 	if (hugetlb_reserve_pages(inode,
 				vma->vm_pgoff >> huge_page_order(h),
 				len >> huge_page_shift(h), vma,
@@ -155,7 +164,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
 
 	ret = 0;
 	if (vma->vm_flags & VM_WRITE && inode->i_size < len)
-		inode->i_size = len;
+		i_size_write(inode, len);
 out:
 	inode_unlock(inode);
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] hugetlbfs: fix offset overflow in huegtlbfs mmap
  2017-04-11 22:51 [PATCH] hugetlbfs: fix offset overflow in huegtlbfs mmap Mike Kravetz
@ 2017-04-12  3:13 ` Hillf Danton
  2017-04-12  8:58 ` Vegard Nossum
  2017-04-14  3:32 ` Naoya Horiguchi
  2 siblings, 0 replies; 7+ messages in thread
From: Hillf Danton @ 2017-04-12  3:13 UTC (permalink / raw)
  To: 'Mike Kravetz', linux-mm, linux-kernel
  Cc: 'Vegard Nossum', 'Dmitry Vyukov',
	'Michal Hocko', 'Kirill A . Shutemov',
	'Andrey Ryabinin', 'Naoya Horiguchi',
	'Andrew Morton'

On April 12, 2017 6:52 AM Mike Kravetz wrote: 
> 
> If mmap() maps a file, it can be passed an offset into the file at
> which the mapping is to start.  Offset could be a negative value when
> represented as a loff_t.  The offset plus length will be used to
> update the file size (i_size) which is also a loff_t.  Validate the
> value of offset and offset + length to make sure they do not overflow
> and appear as negative.
> 
> Found by syzcaller with commit ff8c0c53c475 ("mm/hugetlb.c: don't call
> region_abort if region_chg fails") applied.  Prior to this commit, the
> overflow would still occur but we would luckily return ENOMEM.
> To reproduce:
> mmap(0, 0x2000, 0, 0x40021, 0xffffffffffffffffULL, 0x8000000000000000ULL);
> 
> Resulted in,
> kernel BUG at mm/hugetlb.c:742!
> Call Trace:
>  hugetlbfs_evict_inode+0x80/0xa0
>  ? hugetlbfs_setattr+0x3c0/0x3c0
>  evict+0x24a/0x620
>  iput+0x48f/0x8c0
>  dentry_unlink_inode+0x31f/0x4d0
>  __dentry_kill+0x292/0x5e0
>  dput+0x730/0x830
>  __fput+0x438/0x720
>  ____fput+0x1a/0x20
>  task_work_run+0xfe/0x180
>  exit_to_usermode_loop+0x133/0x150
>  syscall_return_slowpath+0x184/0x1c0
>  entry_SYSCALL_64_fastpath+0xab/0xad
> 
> Reported-by: Vegard Nossum <vegard.nossum@gmail.com>
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> ---

Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] hugetlbfs: fix offset overflow in huegtlbfs mmap
  2017-04-11 22:51 [PATCH] hugetlbfs: fix offset overflow in huegtlbfs mmap Mike Kravetz
  2017-04-12  3:13 ` Hillf Danton
@ 2017-04-12  8:58 ` Vegard Nossum
  2017-04-12 20:04   ` Mike Kravetz
  2017-04-14  3:32 ` Naoya Horiguchi
  2 siblings, 1 reply; 7+ messages in thread
From: Vegard Nossum @ 2017-04-12  8:58 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Linux Memory Management List, LKML, Dmitry Vyukov, Hillf Danton,
	Michal Hocko, Kirill A . Shutemov, Andrey Ryabinin,
	Naoya Horiguchi, Andrew Morton

On 12 April 2017 at 00:51, Mike Kravetz <mike.kravetz@oracle.com> wrote:
> If mmap() maps a file, it can be passed an offset into the file at
> which the mapping is to start.  Offset could be a negative value when
> represented as a loff_t.  The offset plus length will be used to
> update the file size (i_size) which is also a loff_t.  Validate the
> value of offset and offset + length to make sure they do not overflow
> and appear as negative.
>
> Found by syzcaller with commit ff8c0c53c475 ("mm/hugetlb.c: don't call
> region_abort if region_chg fails") applied.  Prior to this commit, the
> overflow would still occur but we would luckily return ENOMEM.
> To reproduce:
> mmap(0, 0x2000, 0, 0x40021, 0xffffffffffffffffULL, 0x8000000000000000ULL);
>
> Resulted in,
> kernel BUG at mm/hugetlb.c:742!
> Call Trace:
>  hugetlbfs_evict_inode+0x80/0xa0
>  ? hugetlbfs_setattr+0x3c0/0x3c0
>  evict+0x24a/0x620
>  iput+0x48f/0x8c0
>  dentry_unlink_inode+0x31f/0x4d0
>  __dentry_kill+0x292/0x5e0
>  dput+0x730/0x830
>  __fput+0x438/0x720
>  ____fput+0x1a/0x20
>  task_work_run+0xfe/0x180
>  exit_to_usermode_loop+0x133/0x150
>  syscall_return_slowpath+0x184/0x1c0
>  entry_SYSCALL_64_fastpath+0xab/0xad
>
> Reported-by: Vegard Nossum <vegard.nossum@gmail.com>

Please use <vegard.nossum@oracle.com> if possible :-)

> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> ---
>  fs/hugetlbfs/inode.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 7163fe0..dde8613 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -136,17 +136,26 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
>         vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND;
>         vma->vm_ops = &hugetlb_vm_ops;
>
> +       /*
> +        * Offset passed to mmap (before page shift) could have been
> +        * negative when represented as a (l)off_t.
> +        */
> +       if (((loff_t)vma->vm_pgoff << PAGE_SHIFT) < 0)
> +               return -EINVAL;
> +

This is strictly speaking undefined behaviour in C and would get
flagged by e.g. UBSAN. The kernel does compile with
-fno-strict-overflow when supported, though, so maybe it's more of a
theoretical issue.

Another thing: wouldn't we want to detect all truncations, not just
the ones that happen to end up negative?

For example (with -fno-strict-overflow), (0x12345678 << 12) ==
0x45678000, which is still a positive integer, but obviously
truncated.

We can easily avoid the UB by moving the cast out (since ->vm_pgoff is
unsigned and unsigned shifts are always defined IIRC), but that still
doesn't reliably detect the positive-result truncation/overflow.

>         if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))
>                 return -EINVAL;
>
>         vma_len = (loff_t)(vma->vm_end - vma->vm_start);
> +       len = vma_len + ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
> +       /* check for overflow */
> +       if (len < vma_len)
> +               return -EINVAL;

Also strictly speaking UB. You can avoid it by casting vma_len to
unsigned and dropping the loff_t cast, but it's admittedly somewhat
verbose. There also isn't an "unsigned loff_t" AFAIK, but don't we
have some helpers to safely check for overflows? Surely this isn't the
only place that does loff_t arithmetic.

>
>         inode_lock(inode);
>         file_accessed(file);
>
>         ret = -ENOMEM;
> -       len = vma_len + ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
> -
>         if (hugetlb_reserve_pages(inode,
>                                 vma->vm_pgoff >> huge_page_order(h),
>                                 len >> huge_page_shift(h), vma,
> @@ -155,7 +164,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
>
>         ret = 0;
>         if (vma->vm_flags & VM_WRITE && inode->i_size < len)
> -               inode->i_size = len;
> +               i_size_write(inode, len);
>  out:
>         inode_unlock(inode);

This hunk seems a bit out of place in the sense that I don't see how
it relates to the overflow checking. I think this either belongs in a
separate patch or it deserves a mention in the changelog.


Vegard

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] hugetlbfs: fix offset overflow in huegtlbfs mmap
  2017-04-12  8:58 ` Vegard Nossum
@ 2017-04-12 20:04   ` Mike Kravetz
  0 siblings, 0 replies; 7+ messages in thread
From: Mike Kravetz @ 2017-04-12 20:04 UTC (permalink / raw)
  To: Vegard Nossum
  Cc: Linux Memory Management List, LKML, Dmitry Vyukov, Hillf Danton,
	Michal Hocko, Kirill A . Shutemov, Andrey Ryabinin,
	Naoya Horiguchi, Andrew Morton

On 04/12/2017 01:58 AM, Vegard Nossum wrote:
> On 12 April 2017 at 00:51, Mike Kravetz <mike.kravetz@oracle.com> wrote:
>> If mmap() maps a file, it can be passed an offset into the file at
>> which the mapping is to start.  Offset could be a negative value when
>> represented as a loff_t.  The offset plus length will be used to
>> update the file size (i_size) which is also a loff_t.  Validate the
>> value of offset and offset + length to make sure they do not overflow
>> and appear as negative.
>>
>> Found by syzcaller with commit ff8c0c53c475 ("mm/hugetlb.c: don't call
>> region_abort if region_chg fails") applied.  Prior to this commit, the
>> overflow would still occur but we would luckily return ENOMEM.
>> To reproduce:
>> mmap(0, 0x2000, 0, 0x40021, 0xffffffffffffffffULL, 0x8000000000000000ULL);
>>
>> Resulted in,
>> kernel BUG at mm/hugetlb.c:742!
>> Call Trace:
>>  hugetlbfs_evict_inode+0x80/0xa0
>>  ? hugetlbfs_setattr+0x3c0/0x3c0
>>  evict+0x24a/0x620
>>  iput+0x48f/0x8c0
>>  dentry_unlink_inode+0x31f/0x4d0
>>  __dentry_kill+0x292/0x5e0
>>  dput+0x730/0x830
>>  __fput+0x438/0x720
>>  ____fput+0x1a/0x20
>>  task_work_run+0xfe/0x180
>>  exit_to_usermode_loop+0x133/0x150
>>  syscall_return_slowpath+0x184/0x1c0
>>  entry_SYSCALL_64_fastpath+0xab/0xad
>>
>> Reported-by: Vegard Nossum <vegard.nossum@gmail.com>
> 
> Please use <vegard.nossum@oracle.com> if possible :-)
> 
>> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
>> ---
>>  fs/hugetlbfs/inode.c | 15 ++++++++++++---
>>  1 file changed, 12 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
>> index 7163fe0..dde8613 100644
>> --- a/fs/hugetlbfs/inode.c
>> +++ b/fs/hugetlbfs/inode.c
>> @@ -136,17 +136,26 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
>>         vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND;
>>         vma->vm_ops = &hugetlb_vm_ops;
>>
>> +       /*
>> +        * Offset passed to mmap (before page shift) could have been
>> +        * negative when represented as a (l)off_t.
>> +        */
>> +       if (((loff_t)vma->vm_pgoff << PAGE_SHIFT) < 0)
>> +               return -EINVAL;
>> +
> 
> This is strictly speaking undefined behaviour in C and would get
> flagged by e.g. UBSAN. The kernel does compile with
> -fno-strict-overflow when supported, though, so maybe it's more of a
> theoretical issue.
> 
> Another thing: wouldn't we want to detect all truncations, not just
> the ones that happen to end up negative?
> 
> For example (with -fno-strict-overflow), (0x12345678 << 12) ==
> 0x45678000, which is still a positive integer, but obviously
> truncated.
> 
> We can easily avoid the UB by moving the cast out (since ->vm_pgoff is
> unsigned and unsigned shifts are always defined IIRC), but that still
> doesn't reliably detect the positive-result truncation/overflow.

The value in vm_pgoff was indirectly provided by the user.  This is the
'off_t offset' value provided in the mmap system call.  Before, getting
to this hugetlbfs mmap routine the value is shifted right (>> PAGE_SHIFT)
so that the value in bytes provided by the user is converted to a page
offset.  This shift right is done with an unsigned type so there is no
sign extension.  As a result, I do not think we have to worry about
anything but the negative check here.  Let me know if my thinking is
not valid.

As for the undefined behavior, I guess we can do the shift on the unsigned
type and then cast to loff_t.  The reason for using loff_t is that this
value may eventually be used in the calculation of a value assigned to
i_size which is of this type.

In the patch, I just used the same cast/assignment that previously existed
in the code.  It could be changed to,

if (((loff_t)(vma->vm_pgoff << PAGE_SHIFT)) < 0)

>>         if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))
>>                 return -EINVAL;
>>
>>         vma_len = (loff_t)(vma->vm_end - vma->vm_start);
>> +       len = vma_len + ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
>> +       /* check for overflow */
>> +       if (len < vma_len)
>> +               return -EINVAL;
> 
> Also strictly speaking UB. You can avoid it by casting vma_len to
> unsigned and dropping the loff_t cast, but it's admittedly somewhat
> verbose. There also isn't an "unsigned loff_t" AFAIK, but don't we
> have some helpers to safely check for overflows? Surely this isn't the
> only place that does loff_t arithmetic.

I came up empty in my search for helpers.  Actually, I spent more time
trying to figure out how this was handled in other filesystems.  Then I
quickly discovered that hugetlbfs is 'special' and appears to be the
only one which has to deal with this situation.

The only 'similar code' is in the vfs layer when the offset argument to
fallocate is validated.  There it is a simple check for negative value.

I am not sure if it make much sense to eliminate the shifting of signed
values in this patch.  Certainly, this is strictly UB as you say.  But,
after calculating these values the loff_t values are once again shifted
only a few lines later.  I'm afraid this happens is several places in
this code.

>>
>>         inode_lock(inode);
>>         file_accessed(file);
>>
>>         ret = -ENOMEM;
>> -       len = vma_len + ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
>> -
>>         if (hugetlb_reserve_pages(inode,
>>                                 vma->vm_pgoff >> huge_page_order(h),
>>                                 len >> huge_page_shift(h), vma,
>> @@ -155,7 +164,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
>>
>>         ret = 0;
>>         if (vma->vm_flags & VM_WRITE && inode->i_size < len)
>> -               inode->i_size = len;
>> +               i_size_write(inode, len);
>>  out:
>>         inode_unlock(inode);
> 
> This hunk seems a bit out of place in the sense that I don't see how
> it relates to the overflow checking. I think this either belongs in a
> separate patch or it deserves a mention in the changelog.

When looking at this bug, I was really concerned that i_size might be set
to a negative value.  It was my hope that the helper routine inode_newsize_ok
would validate the value, but it does not.  I left in the use of this helper,
since the code was already being changed.  I can change as people see fit.

-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] hugetlbfs: fix offset overflow in huegtlbfs mmap
  2017-04-11 22:51 [PATCH] hugetlbfs: fix offset overflow in huegtlbfs mmap Mike Kravetz
  2017-04-12  3:13 ` Hillf Danton
  2017-04-12  8:58 ` Vegard Nossum
@ 2017-04-14  3:32 ` Naoya Horiguchi
  2017-04-15 22:58   ` Mike Kravetz
  2 siblings, 1 reply; 7+ messages in thread
From: Naoya Horiguchi @ 2017-04-14  3:32 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-mm, linux-kernel, Vegard Nossum, Dmitry Vyukov,
	Hillf Danton, Michal Hocko, Kirill A . Shutemov, Andrey Ryabinin,
	Andrew Morton

On Tue, Apr 11, 2017 at 03:51:58PM -0700, Mike Kravetz wrote:
...
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 7163fe0..dde8613 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -136,17 +136,26 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
>  	vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND;
>  	vma->vm_ops = &hugetlb_vm_ops;
>  
> +	/*
> +	 * Offset passed to mmap (before page shift) could have been
> +	 * negative when represented as a (l)off_t.
> +	 */
> +	if (((loff_t)vma->vm_pgoff << PAGE_SHIFT) < 0)
> +		return -EINVAL;
> +
>  	if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))
>  		return -EINVAL;
>  
>  	vma_len = (loff_t)(vma->vm_end - vma->vm_start);
> +	len = vma_len + ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
> +	/* check for overflow */
> +	if (len < vma_len)
> +		return -EINVAL;

Andrew sent this patch to Linus today, so I know it's a little too late, but
I think that getting len directly from vma like below might be a simpler fix.

  len = (loff_t)(vma->vm_end - vma->vm_start + (vma->vm_pgoff << PAGE_SHIFT)); 

This shouldn't overflow because vma->vm_{end|start|pgoff} are unsigned long,
but if worried you can add VM_BUG_ON_VMA(len < 0, vma).

Thanks,
Naoya Horiguchi

>  
>  	inode_lock(inode);
>  	file_accessed(file);
>  
>  	ret = -ENOMEM;
> -	len = vma_len + ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
> -
>  	if (hugetlb_reserve_pages(inode,
>  				vma->vm_pgoff >> huge_page_order(h),
>  				len >> huge_page_shift(h), vma,
> @@ -155,7 +164,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
>  
>  	ret = 0;
>  	if (vma->vm_flags & VM_WRITE && inode->i_size < len)
> -		inode->i_size = len;
> +		i_size_write(inode, len);
>  out:
>  	inode_unlock(inode);
>  
> -- 
> 2.7.4
> 
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] hugetlbfs: fix offset overflow in huegtlbfs mmap
  2017-04-14  3:32 ` Naoya Horiguchi
@ 2017-04-15 22:58   ` Mike Kravetz
  2017-04-16 23:43     ` Naoya Horiguchi
  0 siblings, 1 reply; 7+ messages in thread
From: Mike Kravetz @ 2017-04-15 22:58 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, linux-kernel, Vegard Nossum, Dmitry Vyukov,
	Hillf Danton, Michal Hocko, Kirill A . Shutemov, Andrey Ryabinin,
	Andrew Morton

On 04/13/2017 08:32 PM, Naoya Horiguchi wrote:
> On Tue, Apr 11, 2017 at 03:51:58PM -0700, Mike Kravetz wrote:
> ...
>> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
>> index 7163fe0..dde8613 100644
>> --- a/fs/hugetlbfs/inode.c
>> +++ b/fs/hugetlbfs/inode.c
>> @@ -136,17 +136,26 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
>>  	vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND;
>>  	vma->vm_ops = &hugetlb_vm_ops;
>>  
>> +	/*
>> +	 * Offset passed to mmap (before page shift) could have been
>> +	 * negative when represented as a (l)off_t.
>> +	 */
>> +	if (((loff_t)vma->vm_pgoff << PAGE_SHIFT) < 0)
>> +		return -EINVAL;
>> +
>>  	if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))
>>  		return -EINVAL;
>>  
>>  	vma_len = (loff_t)(vma->vm_end - vma->vm_start);
>> +	len = vma_len + ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
>> +	/* check for overflow */
>> +	if (len < vma_len)
>> +		return -EINVAL;
> 
> Andrew sent this patch to Linus today, so I know it's a little too late, but
> I think that getting len directly from vma like below might be a simpler fix.
> 
>   len = (loff_t)(vma->vm_end - vma->vm_start + (vma->vm_pgoff << PAGE_SHIFT)); 
> 
> This shouldn't overflow because vma->vm_{end|start|pgoff} are unsigned long,
> but if worried you can add VM_BUG_ON_VMA(len < 0, vma).

Thanks Naoya,

I am pretty sure the checks are necessary.  You are correct in that
vma->vm_{end|start|pgoff} are unsigned long.  However,  pgoff can be
a REALLY big value that becomes negative when shifted.

Note that pgoff is simply the off_t offset value passed from the user cast
to unsigned long and shifted right by PAGE_SHIFT.  There is nothing to
prevent a user from passing a 'signed' negative value.  In the reproducer
provided, the value passed from user space is 0x8000000000000000ULL.

-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] hugetlbfs: fix offset overflow in huegtlbfs mmap
  2017-04-15 22:58   ` Mike Kravetz
@ 2017-04-16 23:43     ` Naoya Horiguchi
  0 siblings, 0 replies; 7+ messages in thread
From: Naoya Horiguchi @ 2017-04-16 23:43 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-mm, linux-kernel, Vegard Nossum, Dmitry Vyukov,
	Hillf Danton, Michal Hocko, Kirill A . Shutemov, Andrey Ryabinin,
	Andrew Morton

On Sat, Apr 15, 2017 at 03:58:59PM -0700, Mike Kravetz wrote:
> On 04/13/2017 08:32 PM, Naoya Horiguchi wrote:
> > On Tue, Apr 11, 2017 at 03:51:58PM -0700, Mike Kravetz wrote:
> > ...
> >> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> >> index 7163fe0..dde8613 100644
> >> --- a/fs/hugetlbfs/inode.c
> >> +++ b/fs/hugetlbfs/inode.c
> >> @@ -136,17 +136,26 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
> >>  	vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND;
> >>  	vma->vm_ops = &hugetlb_vm_ops;
> >>  
> >> +	/*
> >> +	 * Offset passed to mmap (before page shift) could have been
> >> +	 * negative when represented as a (l)off_t.
> >> +	 */
> >> +	if (((loff_t)vma->vm_pgoff << PAGE_SHIFT) < 0)
> >> +		return -EINVAL;
> >> +
> >>  	if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))
> >>  		return -EINVAL;
> >>  
> >>  	vma_len = (loff_t)(vma->vm_end - vma->vm_start);
> >> +	len = vma_len + ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
> >> +	/* check for overflow */
> >> +	if (len < vma_len)
> >> +		return -EINVAL;
> > 
> > Andrew sent this patch to Linus today, so I know it's a little too late, but
> > I think that getting len directly from vma like below might be a simpler fix.
> > 
> >   len = (loff_t)(vma->vm_end - vma->vm_start + (vma->vm_pgoff << PAGE_SHIFT)); 
> > 
> > This shouldn't overflow because vma->vm_{end|start|pgoff} are unsigned long,
> > but if worried you can add VM_BUG_ON_VMA(len < 0, vma).
> 
> Thanks Naoya,
> 
> I am pretty sure the checks are necessary.  You are correct in that
> vma->vm_{end|start|pgoff} are unsigned long.  However,  pgoff can be
> a REALLY big value that becomes negative when shifted.
> 
> Note that pgoff is simply the off_t offset value passed from the user cast
> to unsigned long and shifted right by PAGE_SHIFT.  There is nothing to
> prevent a user from passing a 'signed' negative value.  In the reproducer
> provided, the value passed from user space is 0x8000000000000000ULL.

OK, thank you for explanation. You're right.

- Naoya

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-04-16 23:47 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-11 22:51 [PATCH] hugetlbfs: fix offset overflow in huegtlbfs mmap Mike Kravetz
2017-04-12  3:13 ` Hillf Danton
2017-04-12  8:58 ` Vegard Nossum
2017-04-12 20:04   ` Mike Kravetz
2017-04-14  3:32 ` Naoya Horiguchi
2017-04-15 22:58   ` Mike Kravetz
2017-04-16 23:43     ` Naoya Horiguchi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).