linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] hugetlbfs: check for pgoff value overflow
       [not found] <20180306133135.4dc344e478d98f0e29f47698@linux-foundation.org>
@ 2018-03-07 23:59 ` Mike Kravetz
  2018-03-08  1:35   ` Yisheng Xie
  2018-03-08 21:05 ` [PATCH v2] " Mike Kravetz
  2018-03-09  0:27 ` [PATCH v3] " Mike Kravetz
  2 siblings, 1 reply; 13+ messages in thread
From: Mike Kravetz @ 2018-03-07 23:59 UTC (permalink / raw)
  To: linux-mm, linux-kernel, bugzilla-daemon
  Cc: Michal Hocko, Kirill A . Shutemov, Hillf Danton, Nic Losby,
	Andrew Morton, Mike Kravetz

A vma with vm_pgoff large enough to overflow a loff_t type when
converted to a byte offset can be passed via the remap_file_pages
system call.  The hugetlbfs mmap routine uses the byte offset to
calculate reservations and file size.

A sequence such as:
  mmap(0x20a00000, 0x600000, 0, 0x66033, -1, 0);
  remap_file_pages(0x20a00000, 0x600000, 0, 0x20000000000000, 0);
will result in the following when task exits/file closed,
  kernel BUG at mm/hugetlb.c:749!
Call Trace:
  hugetlbfs_evict_inode+0x2f/0x40
  evict+0xcb/0x190
  __dentry_kill+0xcb/0x150
  __fput+0x164/0x1e0
  task_work_run+0x84/0xa0
  exit_to_usermode_loop+0x7d/0x80
  do_syscall_64+0x18b/0x190
  entry_SYSCALL_64_after_hwframe+0x3d/0xa2

The overflowed pgoff value causes hugetlbfs to try to set up a
mapping with a negative range (end < start) that leaves invalid
state which causes the BUG.

Reported-by: Nic Losby <blurbdust@gmail.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 fs/hugetlbfs/inode.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 8fe1b0aa2896..cb288dec5564 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -127,12 +127,13 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
 	vma->vm_ops = &hugetlb_vm_ops;
 
 	/*
-	 * Offset passed to mmap (before page shift) could have been
-	 * negative when represented as a (l)off_t.
+	 * page based offset in vm_pgoff could be sufficiently large to
+	 * overflow a (l)off_t when converted to byte offset.
 	 */
-	if (((loff_t)vma->vm_pgoff << PAGE_SHIFT) < 0)
+	if (vma->vm_pgoff && ((loff_t)vma->vm_pgoff << PAGE_SHIFT) <= 0)
 		return -EINVAL;
 
+	/* must be huge page aligned */
 	if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))
 		return -EINVAL;
 
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] hugetlbfs: check for pgoff value overflow
  2018-03-07 23:59 ` [PATCH] hugetlbfs: check for pgoff value overflow Mike Kravetz
@ 2018-03-08  1:35   ` Yisheng Xie
  2018-03-08  4:25     ` Mike Kravetz
  0 siblings, 1 reply; 13+ messages in thread
From: Yisheng Xie @ 2018-03-08  1:35 UTC (permalink / raw)
  To: Mike Kravetz, linux-mm, linux-kernel, bugzilla-daemon
  Cc: Michal Hocko, Kirill A . Shutemov, Hillf Danton, Nic Losby,
	Andrew Morton

Hi Mike,

On 2018/3/8 7:59, Mike Kravetz wrote:
> A vma with vm_pgoff large enough to overflow a loff_t type when
> converted to a byte offset can be passed via the remap_file_pages
> system call.  The hugetlbfs mmap routine uses the byte offset to
> calculate reservations and file size.
> 
> A sequence such as:
>   mmap(0x20a00000, 0x600000, 0, 0x66033, -1, 0);
>   remap_file_pages(0x20a00000, 0x600000, 0, 0x20000000000000, 0);
> will result in the following when task exits/file closed,
>   kernel BUG at mm/hugetlb.c:749!
> Call Trace:
>   hugetlbfs_evict_inode+0x2f/0x40
>   evict+0xcb/0x190
>   __dentry_kill+0xcb/0x150
>   __fput+0x164/0x1e0
>   task_work_run+0x84/0xa0
>   exit_to_usermode_loop+0x7d/0x80
>   do_syscall_64+0x18b/0x190
>   entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> 
> The overflowed pgoff value causes hugetlbfs to try to set up a
> mapping with a negative range (end < start) that leaves invalid
> state which causes the BUG.
> 
> Reported-by: Nic Losby <blurbdust@gmail.com>
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> ---
>  fs/hugetlbfs/inode.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 8fe1b0aa2896..cb288dec5564 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -127,12 +127,13 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
>  	vma->vm_ops = &hugetlb_vm_ops;
>  
>  	/*
> -	 * Offset passed to mmap (before page shift) could have been
> -	 * negative when represented as a (l)off_t.
> +	 * page based offset in vm_pgoff could be sufficiently large to
> +	 * overflow a (l)off_t when converted to byte offset.
>  	 */
> -	if (((loff_t)vma->vm_pgoff << PAGE_SHIFT) < 0)
> +	if (vma->vm_pgoff && ((loff_t)vma->vm_pgoff << PAGE_SHIFT) <= 0)
>  		return -EINVAL;

This seems still no the right fix, taking the following case as an example:
 mmap(0x20a00000, 0x600000, 0, 0x66033, -1, 0);
 remap_file_pages(0x20a00000, 0x600000, 0, 0x0020001000000000, 0);

You should just check the highest PAGE_SHIFT+1 bits of pgoff in you want check
at this point, right?

However, region_chg makes me a litter puzzle that when its return value < 0, sometime
adds_in_progress is added like this case, while sometime it is not. so why not just
change at the beginning of region_chg ?
	if (f > t)
		return -EINVAL;

Thanks
Yisheng
>  
> +	/* must be huge page aligned */
>  	if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))
>  		return -EINVAL;
>  
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] hugetlbfs: check for pgoff value overflow
  2018-03-08  1:35   ` Yisheng Xie
@ 2018-03-08  4:25     ` Mike Kravetz
  2018-03-08 21:03       ` Mike Kravetz
  0 siblings, 1 reply; 13+ messages in thread
From: Mike Kravetz @ 2018-03-08  4:25 UTC (permalink / raw)
  To: Yisheng Xie, linux-mm, linux-kernel, bugzilla-daemon
  Cc: Michal Hocko, Kirill A . Shutemov, Hillf Danton, Nic Losby,
	Andrew Morton

On 03/07/2018 05:35 PM, Yisheng Xie wrote:
> Hi Mike,
> 
> On 2018/3/8 7:59, Mike Kravetz wrote:
>> A vma with vm_pgoff large enough to overflow a loff_t type when
>> converted to a byte offset can be passed via the remap_file_pages
>> system call.  The hugetlbfs mmap routine uses the byte offset to
>> calculate reservations and file size.
>>
>> A sequence such as:
>>   mmap(0x20a00000, 0x600000, 0, 0x66033, -1, 0);
>>   remap_file_pages(0x20a00000, 0x600000, 0, 0x20000000000000, 0);
>> will result in the following when task exits/file closed,
>>   kernel BUG at mm/hugetlb.c:749!
>> Call Trace:
>>   hugetlbfs_evict_inode+0x2f/0x40
>>   evict+0xcb/0x190
>>   __dentry_kill+0xcb/0x150
>>   __fput+0x164/0x1e0
>>   task_work_run+0x84/0xa0
>>   exit_to_usermode_loop+0x7d/0x80
>>   do_syscall_64+0x18b/0x190
>>   entry_SYSCALL_64_after_hwframe+0x3d/0xa2
>>
>> The overflowed pgoff value causes hugetlbfs to try to set up a
>> mapping with a negative range (end < start) that leaves invalid
>> state which causes the BUG.
>>
>> Reported-by: Nic Losby <blurbdust@gmail.com>
>> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
>> ---
>>  fs/hugetlbfs/inode.c | 7 ++++---
>>  1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
>> index 8fe1b0aa2896..cb288dec5564 100644
>> --- a/fs/hugetlbfs/inode.c
>> +++ b/fs/hugetlbfs/inode.c
>> @@ -127,12 +127,13 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
>>  	vma->vm_ops = &hugetlb_vm_ops;
>>  
>>  	/*
>> -	 * Offset passed to mmap (before page shift) could have been
>> -	 * negative when represented as a (l)off_t.
>> +	 * page based offset in vm_pgoff could be sufficiently large to
>> +	 * overflow a (l)off_t when converted to byte offset.
>>  	 */
>> -	if (((loff_t)vma->vm_pgoff << PAGE_SHIFT) < 0)
>> +	if (vma->vm_pgoff && ((loff_t)vma->vm_pgoff << PAGE_SHIFT) <= 0)
>>  		return -EINVAL;
> 
> This seems still no the right fix, taking the following case as an example:
>  mmap(0x20a00000, 0x600000, 0, 0x66033, -1, 0);
>  remap_file_pages(0x20a00000, 0x600000, 0, 0x0020001000000000, 0);
> 
> You should just check the highest PAGE_SHIFT+1 bits of pgoff in you want check
> at this point, right?

Yes, thank you!
That would be the correct check and also much simpler.  Something like,

	unsigned long ovfl_mask;

	ovfl_mask = (1UL << (PAGE_SHIFT + 1)) - 1;
	ovfl_mask <<= ((sizeof(unsigned long) * BITS_PER_BYTE) -
		      (PAGE_SHIFT + 1));
	if (vma->vm_pgoff & ovfl_mask)
		return -EINVAL;



> However, region_chg makes me a litter puzzle that when its return value < 0, sometime
> adds_in_progress is added like this case, while sometime it is not. so why not just
> change at the beginning of region_chg ?
> 	if (f > t)
> 		return -EINVAL;

If region_chg returns a value < 0, this indicates an error and adds_in_progress
should not be incremented.  In the case of this bug, region_chg was passed
values where f > t.  Of course, this should never happen.  But, because it
assumed f <= t, it returned a negative count needed huge page reservations.
The calling code interpreted the negative value as an error and a subsequent
region_add or region_abort.

I am not opposed to adding the suggested "if (f > t)".  However, the
region tracking routines are simple helpers only used by the hugetlbfs
code and the assumption is that they are being called correctly.  As
such, I would prefer to leave off the check.  But, this is the second
time they have been called incorrectly due to insufficient argument
checking.  If we do add this to region_chg, I would also add the check
to all region_* routines for consistency.

I will send out a V2 of this patch tomorrow with the corrected overflow
checking and possibly checks added to the region_* routines.
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] hugetlbfs: check for pgoff value overflow
  2018-03-08  4:25     ` Mike Kravetz
@ 2018-03-08 21:03       ` Mike Kravetz
  0 siblings, 0 replies; 13+ messages in thread
From: Mike Kravetz @ 2018-03-08 21:03 UTC (permalink / raw)
  To: Yisheng Xie, linux-mm, linux-kernel, bugzilla-daemon
  Cc: Michal Hocko, Kirill A . Shutemov, Nic Losby, Andrew Morton

On 03/07/2018 08:25 PM, Mike Kravetz wrote:
> On 03/07/2018 05:35 PM, Yisheng Xie wrote:
>> However, region_chg makes me a litter puzzle that when its return value < 0, sometime
>> adds_in_progress is added like this case, while sometime it is not. so why not just
>> change at the beginning of region_chg ?
>> 	if (f > t)
>> 		return -EINVAL;
> 
> If region_chg returns a value < 0, this indicates an error and adds_in_progress
> should not be incremented.  In the case of this bug, region_chg was passed
> values where f > t.  Of course, this should never happen.  But, because it
> assumed f <= t, it returned a negative count needed huge page reservations.
> The calling code interpreted the negative value as an error and a subsequent
> region_add or region_abort.
> 
> I am not opposed to adding the suggested "if (f > t)".  However, the
> region tracking routines are simple helpers only used by the hugetlbfs
> code and the assumption is that they are being called correctly.  As
> such, I would prefer to leave off the check.  But, this is the second
> time they have been called incorrectly due to insufficient argument
> checking.  If we do add this to region_chg, I would also add the check
> to all region_* routines for consistency.

I really did not want to add the (f > t) check to the region_* routines.
As mentioned we should never encounter this condition.  Adding the check
here says that we missed discovering an error at higher levels.  Therefore,
I went back and examined the callers of region_chg.  There are only 2:
hugetlb_reserve_pages and __vma_reservation_common.  hugetlb_reserve_pages
is called to set up a reservation for a mapping.  __vma_reservation_common
is called to check on an existing reservation, and only operates on a
single huge page.  With this in mind, a check in hugetlb_reserve_pages
would be sufficient.  Therefore, I added an explicit check to that routine
and printed a warning if ever encountered.

> I will send out a V2 of this patch tomorrow with the corrected overflow
> checking and possibly checks added to the region_* routines.

v2 will be sent shortly.  In v2 I Cc stable as this is an issue for
stable branches as well.

-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2] hugetlbfs: check for pgoff value overflow
       [not found] <20180306133135.4dc344e478d98f0e29f47698@linux-foundation.org>
  2018-03-07 23:59 ` [PATCH] hugetlbfs: check for pgoff value overflow Mike Kravetz
@ 2018-03-08 21:05 ` Mike Kravetz
  2018-03-08 22:15   ` Andrew Morton
  2018-03-09  0:27 ` [PATCH v3] " Mike Kravetz
  2 siblings, 1 reply; 13+ messages in thread
From: Mike Kravetz @ 2018-03-08 21:05 UTC (permalink / raw)
  To: linux-mm, linux-kernel, bugzilla-daemon
  Cc: Michal Hocko, Kirill A . Shutemov, Nic Losby, Yisheng Xie,
	Andrew Morton, Mike Kravetz, stable

A vma with vm_pgoff large enough to overflow a loff_t type when
converted to a byte offset can be passed via the remap_file_pages
system call.  The hugetlbfs mmap routine uses the byte offset to
calculate reservations and file size.

A sequence such as:
  mmap(0x20a00000, 0x600000, 0, 0x66033, -1, 0);
  remap_file_pages(0x20a00000, 0x600000, 0, 0x20000000000000, 0);
will result in the following when task exits/file closed,
  kernel BUG at mm/hugetlb.c:749!
Call Trace:
  hugetlbfs_evict_inode+0x2f/0x40
  evict+0xcb/0x190
  __dentry_kill+0xcb/0x150
  __fput+0x164/0x1e0
  task_work_run+0x84/0xa0
  exit_to_usermode_loop+0x7d/0x80
  do_syscall_64+0x18b/0x190
  entry_SYSCALL_64_after_hwframe+0x3d/0xa2

The overflowed pgoff value causes hugetlbfs to try to set up a
mapping with a negative range (end < start) that leaves invalid
state which causes the BUG.

The previous overflow fix to this code was incomplete and did not
take the remap_file_pages system call into account.

Fixes: 045c7a3f53d9 ("hugetlbfs: fix offset overflow in hugetlbfs mmap")
Cc: <stable@vger.kernel.org>
Reported-by: Nic Losby <blurbdust@gmail.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
Changes in v2
  * Use bitmask for overflow check as suggested by Yisheng Xie
  * Add explicit (from > to) check when setting up reservations
  * Cc stable

 fs/hugetlbfs/inode.c | 11 ++++++++---
 mm/hugetlb.c         |  6 ++++++
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 8fe1b0aa2896..dafffa6affae 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -111,6 +111,7 @@ static void huge_pagevec_release(struct pagevec *pvec)
 static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
 {
 	struct inode *inode = file_inode(file);
+	unsigned long ovfl_mask;
 	loff_t len, vma_len;
 	int ret;
 	struct hstate *h = hstate_file(file);
@@ -127,12 +128,16 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
 	vma->vm_ops = &hugetlb_vm_ops;
 
 	/*
-	 * Offset passed to mmap (before page shift) could have been
-	 * negative when represented as a (l)off_t.
+	 * page based offset in vm_pgoff could be sufficiently large to
+	 * overflow a (l)off_t when converted to byte offset.
 	 */
-	if (((loff_t)vma->vm_pgoff << PAGE_SHIFT) < 0)
+	ovfl_mask = (1UL << (PAGE_SHIFT + 1)) - 1;
+	ovfl_mask <<= ((sizeof(unsigned long) * BITS_PER_BYTE) -
+		       (PAGE_SHIFT + 1));
+	if (vma->vm_pgoff & ovfl_mask)
 		return -EINVAL;
 
+	/* must be huge page aligned */
 	if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))
 		return -EINVAL;
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 7c204e3d132b..8eeade0a0b7a 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4374,6 +4374,12 @@ int hugetlb_reserve_pages(struct inode *inode,
 	struct resv_map *resv_map;
 	long gbl_reserve;
 
+	/* This should never happen */
+	if (from > to) {
+		VM_WARN(1, "%s called with a negative range\n", __func__);
+		return -EINVAL;
+	}
+
 	/*
 	 * Only apply hugepage reservation if asked. At fault time, an
 	 * attempt will be made for VM_NORESERVE to allocate a page
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] hugetlbfs: check for pgoff value overflow
  2018-03-08 21:05 ` [PATCH v2] " Mike Kravetz
@ 2018-03-08 22:15   ` Andrew Morton
  2018-03-08 23:37     ` Mike Kravetz
  0 siblings, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2018-03-08 22:15 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-mm, linux-kernel, bugzilla-daemon, Michal Hocko,
	Kirill A . Shutemov, Nic Losby, Yisheng Xie, stable

On Thu,  8 Mar 2018 13:05:02 -0800 Mike Kravetz <mike.kravetz@oracle.com> wrote:

> A vma with vm_pgoff large enough to overflow a loff_t type when
> converted to a byte offset can be passed via the remap_file_pages
> system call.  The hugetlbfs mmap routine uses the byte offset to
> calculate reservations and file size.
> 
> A sequence such as:
>   mmap(0x20a00000, 0x600000, 0, 0x66033, -1, 0);
>   remap_file_pages(0x20a00000, 0x600000, 0, 0x20000000000000, 0);
> will result in the following when task exits/file closed,
>   kernel BUG at mm/hugetlb.c:749!
> Call Trace:
>   hugetlbfs_evict_inode+0x2f/0x40
>   evict+0xcb/0x190
>   __dentry_kill+0xcb/0x150
>   __fput+0x164/0x1e0
>   task_work_run+0x84/0xa0
>   exit_to_usermode_loop+0x7d/0x80
>   do_syscall_64+0x18b/0x190
>   entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> 
> The overflowed pgoff value causes hugetlbfs to try to set up a
> mapping with a negative range (end < start) that leaves invalid
> state which causes the BUG.
> 
> The previous overflow fix to this code was incomplete and did not
> take the remap_file_pages system call into account.
> 
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -111,6 +111,7 @@ static void huge_pagevec_release(struct pagevec *pvec)
>  static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
>  {
>  	struct inode *inode = file_inode(file);
> +	unsigned long ovfl_mask;
>  	loff_t len, vma_len;
>  	int ret;
>  	struct hstate *h = hstate_file(file);
> @@ -127,12 +128,16 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
>  	vma->vm_ops = &hugetlb_vm_ops;
>  
>  	/*
> -	 * Offset passed to mmap (before page shift) could have been
> -	 * negative when represented as a (l)off_t.
> +	 * page based offset in vm_pgoff could be sufficiently large to
> +	 * overflow a (l)off_t when converted to byte offset.
>  	 */
> -	if (((loff_t)vma->vm_pgoff << PAGE_SHIFT) < 0)
> +	ovfl_mask = (1UL << (PAGE_SHIFT + 1)) - 1;
> +	ovfl_mask <<= ((sizeof(unsigned long) * BITS_PER_BYTE) -
> +		       (PAGE_SHIFT + 1));

That's a compile-time constant.  The compiler will indeed generate an
immediate load, but I think it would be better to make the code look
more like we know that it's a constant, if you get what I mean. 
Something like

/*
 * If a pgoff_t is to be converted to a byte index, this is the max value it
 * can have to avoid overflow in that conversion.
 */
#define PGOFF_T_MAX	<long string of crap>

And I bet that this constant could be used elsewhere - surely it's a
very common thing to be checking for.


Also, the expression seems rather complicated.  Why are we adding 1 to
PAGE_SHIFT?  Isn't there a logical way of using PAGE_MASK?

The resulting constant is 0xfff8000000000000 on 64-bit.  We could just
use along the lines of

	1UL << (BITS_PER_LONG - PAGE_SHIFT - 1)

But why the -1?  We should be able to handle a pgoff_t of
0xfff0000000000000 in this code?


Also, we later to

	len = vma_len + ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
	/* check for overflow */
	if (len < vma_len)
		return -EINVAL;

which is ungainly: even if we passed the PGOFF_T_MAX test, there can
still be an overflow which we still must check for.  Is that avoidable?
Probably not...

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] hugetlbfs: check for pgoff value overflow
  2018-03-08 22:15   ` Andrew Morton
@ 2018-03-08 23:37     ` Mike Kravetz
  2018-03-08 23:53       ` Andrew Morton
  0 siblings, 1 reply; 13+ messages in thread
From: Mike Kravetz @ 2018-03-08 23:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, bugzilla-daemon, Michal Hocko,
	Kirill A . Shutemov, Nic Losby, Yisheng Xie, stable

On 03/08/2018 02:15 PM, Andrew Morton wrote:
> On Thu,  8 Mar 2018 13:05:02 -0800 Mike Kravetz <mike.kravetz@oracle.com> wrote:
> 
>> A vma with vm_pgoff large enough to overflow a loff_t type when
>> converted to a byte offset can be passed via the remap_file_pages
>> system call.  The hugetlbfs mmap routine uses the byte offset to
>> calculate reservations and file size.
>>
>> A sequence such as:
>>   mmap(0x20a00000, 0x600000, 0, 0x66033, -1, 0);
>>   remap_file_pages(0x20a00000, 0x600000, 0, 0x20000000000000, 0);
>> will result in the following when task exits/file closed,
>>   kernel BUG at mm/hugetlb.c:749!
>> Call Trace:
>>   hugetlbfs_evict_inode+0x2f/0x40
>>   evict+0xcb/0x190
>>   __dentry_kill+0xcb/0x150
>>   __fput+0x164/0x1e0
>>   task_work_run+0x84/0xa0
>>   exit_to_usermode_loop+0x7d/0x80
>>   do_syscall_64+0x18b/0x190
>>   entry_SYSCALL_64_after_hwframe+0x3d/0xa2
>>
>> The overflowed pgoff value causes hugetlbfs to try to set up a
>> mapping with a negative range (end < start) that leaves invalid
>> state which causes the BUG.
>>
>> The previous overflow fix to this code was incomplete and did not
>> take the remap_file_pages system call into account.
>>
>> --- a/fs/hugetlbfs/inode.c
>> +++ b/fs/hugetlbfs/inode.c
>> @@ -111,6 +111,7 @@ static void huge_pagevec_release(struct pagevec *pvec)
>>  static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
>>  {
>>  	struct inode *inode = file_inode(file);
>> +	unsigned long ovfl_mask;
>>  	loff_t len, vma_len;
>>  	int ret;
>>  	struct hstate *h = hstate_file(file);
>> @@ -127,12 +128,16 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
>>  	vma->vm_ops = &hugetlb_vm_ops;
>>  
>>  	/*
>> -	 * Offset passed to mmap (before page shift) could have been
>> -	 * negative when represented as a (l)off_t.
>> +	 * page based offset in vm_pgoff could be sufficiently large to
>> +	 * overflow a (l)off_t when converted to byte offset.
>>  	 */
>> -	if (((loff_t)vma->vm_pgoff << PAGE_SHIFT) < 0)
>> +	ovfl_mask = (1UL << (PAGE_SHIFT + 1)) - 1;
>> +	ovfl_mask <<= ((sizeof(unsigned long) * BITS_PER_BYTE) -
>> +		       (PAGE_SHIFT + 1));
> 
> That's a compile-time constant.  The compiler will indeed generate an
> immediate load, but I think it would be better to make the code look
> more like we know that it's a constant, if you get what I mean. 
> Something like
> 
> /*
>  * If a pgoff_t is to be converted to a byte index, this is the max value it
>  * can have to avoid overflow in that conversion.
>  */
> #define PGOFF_T_MAX	<long string of crap>

Ok

> And I bet that this constant could be used elsewhere - surely it's a
> very common thing to be checking for.
> 
> 
> Also, the expression seems rather complicated.  Why are we adding 1 to
> PAGE_SHIFT?  Isn't there a logical way of using PAGE_MASK?

The + 1 is there because this value will eventually be converted to
a loff_t which is signed.  So, we need to take that sign bit into
account or we could end up with a negative value.

For PAGE_SHIFT == 12, PAGE_MASK is 0xfffffffffffff000.  Our target
mask is  0xfff8000000000000 (for the sign bit).  So, we could do
PAGE_MASK << (BITS_PER_LONG - (2 * PAGE_SHIFT) - 1)

This legacy hugetlbfs code may be a little different than other areas
in the use of loff_t.  When doing some previous work in this area, I
did not find enough common used to make this more general purpose.  See,
https://lkml.org/lkml/2017/4/12/793

> The resulting constant is 0xfff8000000000000 on 64-bit.  We could just
> use along the lines of
> 
> 	1UL << (BITS_PER_LONG - PAGE_SHIFT - 1)

Ah yes, BITS_PER_LONG is better than (sizeof(unsigned long) * BITS_PER_BYTE

> But why the -1?  We should be able to handle a pgoff_t of
> 0xfff0000000000000 in this code?

I'm not exactly sure what you are asking/suggesting here and in the line
above.  It is because of the conversion to a signed value that we have to
go with 0xfff8000000000000 instead of 0xfff0000000000000.

Here are a couple options for computing the mask.  I changed the name
you suggested to make it more obvious that the mask is being used to
check for loff_t overflow.

If we want to explicitly comptue the mask as in code above.
#define PGOFF_LOFFT_MAX \
	(((1UL << (PAGE_SHIFT + 1)) - 1) <<  (BITS_PER_LONG - (PAGE_SHIFT + 1)))

Or, we use PAGE_MASK
#define PGOFF_LOFFT_MAX (PAGE_MASK << (BITS_PER_LONG - (2 * PAGE_SHIFT) - 1))

In either case, we need a big comment explaining the mask and
how we have that extra bit +/- 1 because the offset will be converted
to a signed value.
	
> Also, we later to
> 
> 	len = vma_len + ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
> 	/* check for overflow */
> 	if (len < vma_len)
> 		return -EINVAL;
> 
> which is ungainly: even if we passed the PGOFF_T_MAX test, there can
> still be an overflow which we still must check for.  Is that avoidable?
> Probably not...

Yes, it is required.  That check takes into account the length argument
which is added to page offset.  So, yes you can pass the first check and
fail this one.

-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] hugetlbfs: check for pgoff value overflow
  2018-03-08 23:37     ` Mike Kravetz
@ 2018-03-08 23:53       ` Andrew Morton
  0 siblings, 0 replies; 13+ messages in thread
From: Andrew Morton @ 2018-03-08 23:53 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-mm, linux-kernel, bugzilla-daemon, Michal Hocko,
	Kirill A . Shutemov, Nic Losby, Yisheng Xie, stable

On Thu, 8 Mar 2018 15:37:57 -0800 Mike Kravetz <mike.kravetz@oracle.com> wrote:

> Here are a couple options for computing the mask.  I changed the name
> you suggested to make it more obvious that the mask is being used to
> check for loff_t overflow.
> 
> If we want to explicitly comptue the mask as in code above.
> #define PGOFF_LOFFT_MAX \
> 	(((1UL << (PAGE_SHIFT + 1)) - 1) <<  (BITS_PER_LONG - (PAGE_SHIFT + 1)))
> 
> Or, we use PAGE_MASK
> #define PGOFF_LOFFT_MAX (PAGE_MASK << (BITS_PER_LONG - (2 * PAGE_SHIFT) - 1))

Sounds good.

> In either case, we need a big comment explaining the mask and
> how we have that extra bit +/- 1 because the offset will be converted
> to a signed value.

Yup.

> > Also, we later to
> > 
> > 	len = vma_len + ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
> > 	/* check for overflow */
> > 	if (len < vma_len)
> > 		return -EINVAL;
> > 
> > which is ungainly: even if we passed the PGOFF_T_MAX test, there can
> > still be an overflow which we still must check for.  Is that avoidable?
> > Probably not...
> 
> Yes, it is required.  That check takes into account the length argument
> which is added to page offset.  So, yes you can pass the first check and
> fail this one.

Well I was sort of wondering if both checks could be done in a single
operation, but I guess not.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v3] hugetlbfs: check for pgoff value overflow
       [not found] <20180306133135.4dc344e478d98f0e29f47698@linux-foundation.org>
  2018-03-07 23:59 ` [PATCH] hugetlbfs: check for pgoff value overflow Mike Kravetz
  2018-03-08 21:05 ` [PATCH v2] " Mike Kravetz
@ 2018-03-09  0:27 ` Mike Kravetz
  2018-03-09  1:05   ` Andrew Morton
  2018-03-16 10:17   ` Michal Hocko
  2 siblings, 2 replies; 13+ messages in thread
From: Mike Kravetz @ 2018-03-09  0:27 UTC (permalink / raw)
  To: linux-mm, linux-kernel, bugzilla-daemon
  Cc: Michal Hocko, Kirill A . Shutemov, Nic Losby, Yisheng Xie,
	Andrew Morton, Mike Kravetz, stable

A vma with vm_pgoff large enough to overflow a loff_t type when
converted to a byte offset can be passed via the remap_file_pages
system call.  The hugetlbfs mmap routine uses the byte offset to
calculate reservations and file size.

A sequence such as:
  mmap(0x20a00000, 0x600000, 0, 0x66033, -1, 0);
  remap_file_pages(0x20a00000, 0x600000, 0, 0x20000000000000, 0);
will result in the following when task exits/file closed,
  kernel BUG at mm/hugetlb.c:749!
Call Trace:
  hugetlbfs_evict_inode+0x2f/0x40
  evict+0xcb/0x190
  __dentry_kill+0xcb/0x150
  __fput+0x164/0x1e0
  task_work_run+0x84/0xa0
  exit_to_usermode_loop+0x7d/0x80
  do_syscall_64+0x18b/0x190
  entry_SYSCALL_64_after_hwframe+0x3d/0xa2

The overflowed pgoff value causes hugetlbfs to try to set up a
mapping with a negative range (end < start) that leaves invalid
state which causes the BUG.

The previous overflow fix to this code was incomplete and did not
take the remap_file_pages system call into account.

Fixes: 045c7a3f53d9 ("hugetlbfs: fix offset overflow in hugetlbfs mmap")
Cc: <stable@vger.kernel.org>
Reported-by: Nic Losby <blurbdust@gmail.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
Changes in v3
  * Use a simpler mask computation as suggested by Andrew Morton
Changes in v2
  * Use bitmask for overflow check as suggested by Yisheng Xie
  * Add explicit (from > to) check when setting up reservations
  * Cc stable

 fs/hugetlbfs/inode.c | 16 +++++++++++++---
 mm/hugetlb.c         |  6 ++++++
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 8fe1b0aa2896..e46117dc006a 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -108,6 +108,15 @@ static void huge_pagevec_release(struct pagevec *pvec)
 	pagevec_reinit(pvec);
 }
 
+/*
+ * Mask used when checking the page offset value passed in via system
+ * calls.  This value will be converted to a loff_t which is signed.
+ * Therefore, we want to check the upper PAGE_SHIFT + 1 bits of the
+ * value.  The extra bit (- 1 in the shift value) is to take the sign
+ * bit into account.
+ */
+#define PGOFF_LOFFT_MAX (PAGE_MASK << (BITS_PER_LONG - (2 * PAGE_SHIFT) - 1))
+
 static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
 {
 	struct inode *inode = file_inode(file);
@@ -127,12 +136,13 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
 	vma->vm_ops = &hugetlb_vm_ops;
 
 	/*
-	 * Offset passed to mmap (before page shift) could have been
-	 * negative when represented as a (l)off_t.
+	 * page based offset in vm_pgoff could be sufficiently large to
+	 * overflow a (l)off_t when converted to byte offset.
 	 */
-	if (((loff_t)vma->vm_pgoff << PAGE_SHIFT) < 0)
+	if (vma->vm_pgoff & PGOFF_LOFFT_MAX)
 		return -EINVAL;
 
+	/* must be huge page aligned */
 	if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))
 		return -EINVAL;
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 7c204e3d132b..8eeade0a0b7a 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4374,6 +4374,12 @@ int hugetlb_reserve_pages(struct inode *inode,
 	struct resv_map *resv_map;
 	long gbl_reserve;
 
+	/* This should never happen */
+	if (from > to) {
+		VM_WARN(1, "%s called with a negative range\n", __func__);
+		return -EINVAL;
+	}
+
 	/*
 	 * Only apply hugepage reservation if asked. At fault time, an
 	 * attempt will be made for VM_NORESERVE to allocate a page
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] hugetlbfs: check for pgoff value overflow
  2018-03-09  0:27 ` [PATCH v3] " Mike Kravetz
@ 2018-03-09  1:05   ` Andrew Morton
  2018-03-16 10:17   ` Michal Hocko
  1 sibling, 0 replies; 13+ messages in thread
From: Andrew Morton @ 2018-03-09  1:05 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-mm, linux-kernel, bugzilla-daemon, Michal Hocko,
	Kirill A . Shutemov, Nic Losby, Yisheng Xie, stable

On Thu,  8 Mar 2018 16:27:26 -0800 Mike Kravetz <mike.kravetz@oracle.com> wrote:

> A vma with vm_pgoff large enough to overflow a loff_t type when
> converted to a byte offset can be passed via the remap_file_pages
> system call.  The hugetlbfs mmap routine uses the byte offset to
> calculate reservations and file size.
> 
> A sequence such as:
>   mmap(0x20a00000, 0x600000, 0, 0x66033, -1, 0);
>   remap_file_pages(0x20a00000, 0x600000, 0, 0x20000000000000, 0);
> will result in the following when task exits/file closed,
>   kernel BUG at mm/hugetlb.c:749!
> Call Trace:
>   hugetlbfs_evict_inode+0x2f/0x40
>   evict+0xcb/0x190
>   __dentry_kill+0xcb/0x150
>   __fput+0x164/0x1e0
>   task_work_run+0x84/0xa0
>   exit_to_usermode_loop+0x7d/0x80
>   do_syscall_64+0x18b/0x190
>   entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> 
> The overflowed pgoff value causes hugetlbfs to try to set up a
> mapping with a negative range (end < start) that leaves invalid
> state which causes the BUG.
> 
> The previous overflow fix to this code was incomplete and did not
> take the remap_file_pages system call into account.
> 
> ...
>
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4374,6 +4374,12 @@ int hugetlb_reserve_pages(struct inode *inode,
>  	struct resv_map *resv_map;
>  	long gbl_reserve;
>  
> +	/* This should never happen */
> +	if (from > to) {
> +		VM_WARN(1, "%s called with a negative range\n", __func__);
> +		return -EINVAL;
> +	}

This is tidier, and that comment is a bit too obvious to live ;)

--- a/mm/hugetlb.c~hugetlbfs-check-for-pgoff-value-overflow-v3-fix
+++ a/mm/hugetlb.c
@@ -18,6 +18,7 @@
 #include <linux/bootmem.h>
 #include <linux/sysfs.h>
 #include <linux/slab.h>
+#include <linux/mmdebug.h>
 #include <linux/sched/signal.h>
 #include <linux/rmap.h>
 #include <linux/string_helpers.h>
@@ -4374,11 +4375,8 @@ int hugetlb_reserve_pages(struct inode *
 	struct resv_map *resv_map;
 	long gbl_reserve;
 
-	/* This should never happen */
-	if (from > to) {
-		VM_WARN(1, "%s called with a negative range\n", __func__);
+	if (VM_WARN(from > to, "%s called with a negative range\n", __func__))
 		return -EINVAL;
-	}
 
 	/*
 	 * Only apply hugepage reservation if asked. At fault time, an

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] hugetlbfs: check for pgoff value overflow
  2018-03-09  0:27 ` [PATCH v3] " Mike Kravetz
  2018-03-09  1:05   ` Andrew Morton
@ 2018-03-16 10:17   ` Michal Hocko
  2018-03-16 16:19     ` Mike Kravetz
  1 sibling, 1 reply; 13+ messages in thread
From: Michal Hocko @ 2018-03-16 10:17 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-mm, linux-kernel, bugzilla-daemon, Kirill A . Shutemov,
	Nic Losby, Yisheng Xie, Andrew Morton, stable

On Thu 08-03-18 16:27:26, Mike Kravetz wrote:
> A vma with vm_pgoff large enough to overflow a loff_t type when
> converted to a byte offset can be passed via the remap_file_pages
> system call.  The hugetlbfs mmap routine uses the byte offset to
> calculate reservations and file size.
> 
> A sequence such as:
>   mmap(0x20a00000, 0x600000, 0, 0x66033, -1, 0);
>   remap_file_pages(0x20a00000, 0x600000, 0, 0x20000000000000, 0);
> will result in the following when task exits/file closed,
>   kernel BUG at mm/hugetlb.c:749!
> Call Trace:
>   hugetlbfs_evict_inode+0x2f/0x40
>   evict+0xcb/0x190
>   __dentry_kill+0xcb/0x150
>   __fput+0x164/0x1e0
>   task_work_run+0x84/0xa0
>   exit_to_usermode_loop+0x7d/0x80
>   do_syscall_64+0x18b/0x190
>   entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> 
> The overflowed pgoff value causes hugetlbfs to try to set up a
> mapping with a negative range (end < start) that leaves invalid
> state which causes the BUG.
> 
> The previous overflow fix to this code was incomplete and did not
> take the remap_file_pages system call into account.
> 
> Fixes: 045c7a3f53d9 ("hugetlbfs: fix offset overflow in hugetlbfs mmap")
> Cc: <stable@vger.kernel.org>
> Reported-by: Nic Losby <blurbdust@gmail.com>
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>

OK, looks good to me. Hairy but seems to be the easiest way around this.
Acked-by: Michal Hocko <mhocko@suse.com>

> ---
> Changes in v3
>   * Use a simpler mask computation as suggested by Andrew Morton
> Changes in v2
>   * Use bitmask for overflow check as suggested by Yisheng Xie
>   * Add explicit (from > to) check when setting up reservations
>   * Cc stable
> 
>  fs/hugetlbfs/inode.c | 16 +++++++++++++---
>  mm/hugetlb.c         |  6 ++++++
>  2 files changed, 19 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 8fe1b0aa2896..e46117dc006a 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -108,6 +108,15 @@ static void huge_pagevec_release(struct pagevec *pvec)
>  	pagevec_reinit(pvec);
>  }
>  
> +/*
> + * Mask used when checking the page offset value passed in via system
> + * calls.  This value will be converted to a loff_t which is signed.
> + * Therefore, we want to check the upper PAGE_SHIFT + 1 bits of the
> + * value.  The extra bit (- 1 in the shift value) is to take the sign
> + * bit into account.
> + */
> +#define PGOFF_LOFFT_MAX (PAGE_MASK << (BITS_PER_LONG - (2 * PAGE_SHIFT) - 1))
> +
>  static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
>  {
>  	struct inode *inode = file_inode(file);
> @@ -127,12 +136,13 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
>  	vma->vm_ops = &hugetlb_vm_ops;
>  
>  	/*
> -	 * Offset passed to mmap (before page shift) could have been
> -	 * negative when represented as a (l)off_t.
> +	 * page based offset in vm_pgoff could be sufficiently large to
> +	 * overflow a (l)off_t when converted to byte offset.
>  	 */
> -	if (((loff_t)vma->vm_pgoff << PAGE_SHIFT) < 0)
> +	if (vma->vm_pgoff & PGOFF_LOFFT_MAX)
>  		return -EINVAL;
>  
> +	/* must be huge page aligned */
>  	if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))
>  		return -EINVAL;
>  
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 7c204e3d132b..8eeade0a0b7a 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4374,6 +4374,12 @@ int hugetlb_reserve_pages(struct inode *inode,
>  	struct resv_map *resv_map;
>  	long gbl_reserve;
>  
> +	/* This should never happen */
> +	if (from > to) {
> +		VM_WARN(1, "%s called with a negative range\n", __func__);
> +		return -EINVAL;
> +	}
> +
>  	/*
>  	 * Only apply hugepage reservation if asked. At fault time, an
>  	 * attempt will be made for VM_NORESERVE to allocate a page
> -- 
> 2.13.6

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] hugetlbfs: check for pgoff value overflow
  2018-03-16 10:17   ` Michal Hocko
@ 2018-03-16 16:19     ` Mike Kravetz
  2018-03-16 16:53       ` Michal Hocko
  0 siblings, 1 reply; 13+ messages in thread
From: Mike Kravetz @ 2018-03-16 16:19 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, linux-kernel, bugzilla-daemon, Kirill A . Shutemov,
	Nic Losby, Yisheng Xie, Andrew Morton, stable

On 03/16/2018 03:17 AM, Michal Hocko wrote:
> On Thu 08-03-18 16:27:26, Mike Kravetz wrote:
> 
> OK, looks good to me. Hairy but seems to be the easiest way around this.
> Acked-by: Michal Hocko <mhocko@suse.com>
> 
<snip>
>> +/*
>> + * Mask used when checking the page offset value passed in via system
>> + * calls.  This value will be converted to a loff_t which is signed.
>> + * Therefore, we want to check the upper PAGE_SHIFT + 1 bits of the
>> + * value.  The extra bit (- 1 in the shift value) is to take the sign
>> + * bit into account.
>> + */
>> +#define PGOFF_LOFFT_MAX (PAGE_MASK << (BITS_PER_LONG - (2 * PAGE_SHIFT) - 1))

Thanks Michal,

However, kbuild found a problem with this definition on certain configs.
Consider a config where,
BITS_PER_LONG = 32 (32bit config)
PAGE_SHIFT = 16 (64K pages)
This results in the negative shift value.
Not something I would not immediately think of, but a valid config.

The definition has been changed to,
#define PGOFF_LOFFT_MAX \
	(((1UL << (PAGE_SHIFT + 1)) - 1) <<  (BITS_PER_LONG - (PAGE_SHIFT + 1)))

as discussed here,
http://lkml.kernel.org/r/432fb2a3-b729-9c3a-7d60-890b8f9b10dd@oracle.com
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] hugetlbfs: check for pgoff value overflow
  2018-03-16 16:19     ` Mike Kravetz
@ 2018-03-16 16:53       ` Michal Hocko
  0 siblings, 0 replies; 13+ messages in thread
From: Michal Hocko @ 2018-03-16 16:53 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-mm, linux-kernel, bugzilla-daemon, Kirill A . Shutemov,
	Nic Losby, Yisheng Xie, Andrew Morton, stable

On Fri 16-03-18 09:19:07, Mike Kravetz wrote:
> On 03/16/2018 03:17 AM, Michal Hocko wrote:
> > On Thu 08-03-18 16:27:26, Mike Kravetz wrote:
> > 
> > OK, looks good to me. Hairy but seems to be the easiest way around this.
> > Acked-by: Michal Hocko <mhocko@suse.com>
> > 
> <snip>
> >> +/*
> >> + * Mask used when checking the page offset value passed in via system
> >> + * calls.  This value will be converted to a loff_t which is signed.
> >> + * Therefore, we want to check the upper PAGE_SHIFT + 1 bits of the
> >> + * value.  The extra bit (- 1 in the shift value) is to take the sign
> >> + * bit into account.
> >> + */
> >> +#define PGOFF_LOFFT_MAX (PAGE_MASK << (BITS_PER_LONG - (2 * PAGE_SHIFT) - 1))
> 
> Thanks Michal,
> 
> However, kbuild found a problem with this definition on certain configs.
> Consider a config where,
> BITS_PER_LONG = 32 (32bit config)
> PAGE_SHIFT = 16 (64K pages)
> This results in the negative shift value.
> Not something I would not immediately think of, but a valid config.

Well, 64K pages on 32b doesn't sound even remotely sane to me but what
ever.

> The definition has been changed to,
> #define PGOFF_LOFFT_MAX \
> 	(((1UL << (PAGE_SHIFT + 1)) - 1) <<  (BITS_PER_LONG - (PAGE_SHIFT + 1)))
> 
> as discussed here,
> http://lkml.kernel.org/r/432fb2a3-b729-9c3a-7d60-890b8f9b10dd@oracle.com

This looks more wild but seems correct as well. You can keep my acked-by

Thanks!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2018-03-16 16:53 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20180306133135.4dc344e478d98f0e29f47698@linux-foundation.org>
2018-03-07 23:59 ` [PATCH] hugetlbfs: check for pgoff value overflow Mike Kravetz
2018-03-08  1:35   ` Yisheng Xie
2018-03-08  4:25     ` Mike Kravetz
2018-03-08 21:03       ` Mike Kravetz
2018-03-08 21:05 ` [PATCH v2] " Mike Kravetz
2018-03-08 22:15   ` Andrew Morton
2018-03-08 23:37     ` Mike Kravetz
2018-03-08 23:53       ` Andrew Morton
2018-03-09  0:27 ` [PATCH v3] " Mike Kravetz
2018-03-09  1:05   ` Andrew Morton
2018-03-16 10:17   ` Michal Hocko
2018-03-16 16:19     ` Mike Kravetz
2018-03-16 16:53       ` Michal Hocko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).