All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [Bug 199037] New: Kernel bug at mm/hugetlb.c:741
       [not found] <bug-199037-27@https.bugzilla.kernel.org/>
@ 2018-03-06 21:31 ` Andrew Morton
  2018-03-06 21:46   ` Mike Kravetz
                     ` (3 more replies)
  0 siblings, 4 replies; 21+ messages in thread
From: Andrew Morton @ 2018-03-06 21:31 UTC (permalink / raw)
  To: Kirill A. Shutemov, Mike Kravetz, Michal Hocko
  Cc: bugzilla-daemon, linux-mm, blurbdust


(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Tue, 06 Mar 2018 21:11:50 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=199037
> 
>             Bug ID: 199037
>            Summary: Kernel bug at mm/hugetlb.c:741
>            Product: Memory Management
>            Version: 2.5
>     Kernel Version: 4.16.0-rc3
>           Hardware: All
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Page Allocator
>           Assignee: akpm@linux-foundation.org
>           Reporter: blurbdust@gmail.com
>         Regression: No
> 
> Created attachment 274595
>   --> https://bugzilla.kernel.org/attachment.cgi?id=274595&action=edit
> crash.c
> 
> Hello,
> I apologize as this is my first time reporting a bug. When I compile and run
> the attached file it crashes the latest kernel running in QEMU. Call trace
> here: https://pastebin.com/1mMQvH0E
> 
> Let me know if you have any questions.
> 

Thanks for the report.

That's VM_BUG_ON(resv_map->adds_in_progress) in resv_map_release().

Do you know if earlier kernel versions are affected?

It looks quite bisectable.  Does the crash happen every time the test
program is run?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bug 199037] New: Kernel bug at mm/hugetlb.c:741
  2018-03-06 21:31 ` [Bug 199037] New: Kernel bug at mm/hugetlb.c:741 Andrew Morton
@ 2018-03-06 21:46   ` Mike Kravetz
  2018-03-07  0:31     ` Mike Kravetz
  2018-03-07 23:59   ` [PATCH] hugetlbfs: check for pgoff value overflow Mike Kravetz
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 21+ messages in thread
From: Mike Kravetz @ 2018-03-06 21:46 UTC (permalink / raw)
  To: Andrew Morton, Kirill A. Shutemov, Michal Hocko
  Cc: bugzilla-daemon, linux-mm, blurbdust

On 03/06/2018 01:31 PM, Andrew Morton wrote:
> 
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
> 
> On Tue, 06 Mar 2018 21:11:50 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:
> 
>> https://bugzilla.kernel.org/show_bug.cgi?id=199037
>>
>>             Bug ID: 199037
>>            Summary: Kernel bug at mm/hugetlb.c:741
>>            Product: Memory Management
>>            Version: 2.5
>>     Kernel Version: 4.16.0-rc3
>>           Hardware: All
>>                 OS: Linux
>>               Tree: Mainline
>>             Status: NEW
>>           Severity: normal
>>           Priority: P1
>>          Component: Page Allocator
>>           Assignee: akpm@linux-foundation.org
>>           Reporter: blurbdust@gmail.com
>>         Regression: No
>>
>> Created attachment 274595
>>   --> https://bugzilla.kernel.org/attachment.cgi?id=274595&action=edit
>> crash.c
>>
>> Hello,
>> I apologize as this is my first time reporting a bug. When I compile and run
>> the attached file it crashes the latest kernel running in QEMU. Call trace
>> here: https://pastebin.com/1mMQvH0E
>>
>> Let me know if you have any questions.
>>
> 
> Thanks for the report.
> 
> That's VM_BUG_ON(resv_map->adds_in_progress) in resv_map_release().
> 
> Do you know if earlier kernel versions are affected?
> 
> It looks quite bisectable.  Does the crash happen every time the test
> program is run?

I'll take a look.  There was a previous bug in this area:
ff8c0c53: mm/hugetlb.c: don't call region_abort if region_chg fails

-- 
Mike Kravetz

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bug 199037] New: Kernel bug at mm/hugetlb.c:741
  2018-03-06 21:46   ` Mike Kravetz
@ 2018-03-07  0:31     ` Mike Kravetz
  2018-03-07  2:49       ` Nic Losby
  2018-03-07  4:19       ` Mike Kravetz
  0 siblings, 2 replies; 21+ messages in thread
From: Mike Kravetz @ 2018-03-07  0:31 UTC (permalink / raw)
  To: Andrew Morton, Kirill A. Shutemov, Michal Hocko
  Cc: bugzilla-daemon, linux-mm, blurbdust

On 03/06/2018 01:46 PM, Mike Kravetz wrote:
> On 03/06/2018 01:31 PM, Andrew Morton wrote:
>>
>> That's VM_BUG_ON(resv_map->adds_in_progress) in resv_map_release().
>>
>> Do you know if earlier kernel versions are affected?
>>
>> It looks quite bisectable.  Does the crash happen every time the test
>> program is run?
> 
> I'll take a look.  There was a previous bug in this area:
> ff8c0c53: mm/hugetlb.c: don't call region_abort if region_chg fails

This is similar to the issue addressed in 045c7a3f ("fix offset overflow
in hugetlbfs mmap").  The problem here is that the pgoff argument passed
to remap_file_pages() is 0x20000000000000.  In the process of converting
this to a page offset and putting it in vm_pgoff, and then converting back
to bytes to compute mapping length we end up with 0.  We ultimately end
up passing (from,to) page offsets into hugetlbfs where from is greater
than to. :( This confuses the heck out the the huge page reservation code
as the 'negative' range looks like an error and we never complete the
reservation process and leave the 'adds_in_progress'.

This issue has existed for a long time.  The VM_BUG_ON just happens to
catch the situation which was previously not reported or had some other
side effect.  Commit 045c7a3f tried to catch these overflow issues when
converting types, but obviously missed this one.  I can easily add a test
for this specific value/condition, but want to think about it a little
more and see if there is a better way to catch all of these.

-- 
Mike Kravetz

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bug 199037] New: Kernel bug at mm/hugetlb.c:741
  2018-03-07  0:31     ` Mike Kravetz
@ 2018-03-07  2:49       ` Nic Losby
  2018-03-07  4:19       ` Mike Kravetz
  1 sibling, 0 replies; 21+ messages in thread
From: Nic Losby @ 2018-03-07  2:49 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Andrew Morton, Kirill A. Shutemov, Michal Hocko, bugzilla-daemon,
	linux-mm

[-- Attachment #1: Type: text/plain, Size: 1937 bytes --]

Awesome. Let me know if you need anything else from me. I can keep testing
kernel versions if requested.

Getting a CVE is something that is high on my bucket list. Even though this
is only Denial of Service at best, what are the chances this would be
assigned a CVE?

On Tue, Mar 6, 2018 at 6:31 PM, Mike Kravetz <mike.kravetz@oracle.com>
wrote:

> On 03/06/2018 01:46 PM, Mike Kravetz wrote:
> > On 03/06/2018 01:31 PM, Andrew Morton wrote:
> >>
> >> That's VM_BUG_ON(resv_map->adds_in_progress) in resv_map_release().
> >>
> >> Do you know if earlier kernel versions are affected?
> >>
> >> It looks quite bisectable.  Does the crash happen every time the test
> >> program is run?
> >
> > I'll take a look.  There was a previous bug in this area:
> > ff8c0c53: mm/hugetlb.c: don't call region_abort if region_chg fails
>
> This is similar to the issue addressed in 045c7a3f ("fix offset overflow
> in hugetlbfs mmap").  The problem here is that the pgoff argument passed
> to remap_file_pages() is 0x20000000000000.  In the process of converting
> this to a page offset and putting it in vm_pgoff, and then converting back
> to bytes to compute mapping length we end up with 0.  We ultimately end
> up passing (from,to) page offsets into hugetlbfs where from is greater
> than to. :( This confuses the heck out the the huge page reservation code
> as the 'negative' range looks like an error and we never complete the
> reservation process and leave the 'adds_in_progress'.
>
> This issue has existed for a long time.  The VM_BUG_ON just happens to
> catch the situation which was previously not reported or had some other
> side effect.  Commit 045c7a3f tried to catch these overflow issues when
> converting types, but obviously missed this one.  I can easily add a test
> for this specific value/condition, but want to think about it a little
> more and see if there is a better way to catch all of these.
>
> --
> Mike Kravetz
>

[-- Attachment #2: Type: text/html, Size: 2548 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bug 199037] New: Kernel bug at mm/hugetlb.c:741
  2018-03-07  0:31     ` Mike Kravetz
  2018-03-07  2:49       ` Nic Losby
@ 2018-03-07  4:19       ` Mike Kravetz
  2018-03-07 16:39         ` Mike Kravetz
  1 sibling, 1 reply; 21+ messages in thread
From: Mike Kravetz @ 2018-03-07  4:19 UTC (permalink / raw)
  To: Andrew Morton, Kirill A. Shutemov, Michal Hocko
  Cc: bugzilla-daemon, linux-mm, blurbdust

On 03/06/2018 04:31 PM, Mike Kravetz wrote:
> On 03/06/2018 01:46 PM, Mike Kravetz wrote:
>> On 03/06/2018 01:31 PM, Andrew Morton wrote:
>>>
>>> That's VM_BUG_ON(resv_map->adds_in_progress) in resv_map_release().
>>>
>>> Do you know if earlier kernel versions are affected?
>>>
>>> It looks quite bisectable.  Does the crash happen every time the test
>>> program is run?
>>
>> I'll take a look.  There was a previous bug in this area:
>> ff8c0c53: mm/hugetlb.c: don't call region_abort if region_chg fails
> 
> This is similar to the issue addressed in 045c7a3f ("fix offset overflow
> in hugetlbfs mmap").  The problem here is that the pgoff argument passed
> to remap_file_pages() is 0x20000000000000.  In the process of converting
> this to a page offset and putting it in vm_pgoff, and then converting back
> to bytes to compute mapping length we end up with 0.  We ultimately end
> up passing (from,to) page offsets into hugetlbfs where from is greater
> than to. :( This confuses the heck out the the huge page reservation code
> as the 'negative' range looks like an error and we never complete the
> reservation process and leave the 'adds_in_progress'.
> 
> This issue has existed for a long time.  The VM_BUG_ON just happens to
> catch the situation which was previously not reported or had some other
> side effect.  Commit 045c7a3f tried to catch these overflow issues when
> converting types, but obviously missed this one.  I can easily add a test
> for this specific value/condition, but want to think about it a little
> more and see if there is a better way to catch all of these.

Well, I instrumented hugetlbfs_file_mmap when called via the remap_file_pages
system call path.  Upon entry, vma->vm_pgoff is 0x20000000000000 which is
the same as the value of the argument pgoff passed to the system call.
vm_pgoff really should be a page offset (i.e. 0x20000000000000 >> PAGE_SHIFT).
So, there is also an issue earlier in the remap_file_pages system call
sequence.

For mmap(), there are architecture specific system call entry points that
do the 'offset >> PAGE_SHIFT' before passing on the value to arch independent
routines.  For remap_file_pages, it looks like sparc is the only arch which
has such a routine.  I know remap_file_pages is deprecated, but could it
really be broken that badly on all architectures but sparc?  Perhaps nobody
really uses it?

To fix, we could add arch specific entry points for all architectures.  But,
that seems like a bunch of effort for a system call that perhaps nobody is
using.  The other option is to remove the sparc entry point, and do the
'pgoff >> PAGE_SHIFT' in the arch independent code.

Thoughts?
-- 
Mike Kravetz

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bug 199037] New: Kernel bug at mm/hugetlb.c:741
  2018-03-07  4:19       ` Mike Kravetz
@ 2018-03-07 16:39         ` Mike Kravetz
  0 siblings, 0 replies; 21+ messages in thread
From: Mike Kravetz @ 2018-03-07 16:39 UTC (permalink / raw)
  To: Andrew Morton, Kirill A. Shutemov, Michal Hocko
  Cc: bugzilla-daemon, linux-mm, blurbdust

On 03/06/2018 08:19 PM, Mike Kravetz wrote:
> On 03/06/2018 04:31 PM, Mike Kravetz wrote:
>> On 03/06/2018 01:46 PM, Mike Kravetz wrote:
>>> On 03/06/2018 01:31 PM, Andrew Morton wrote:
>>>>
>>>> That's VM_BUG_ON(resv_map->adds_in_progress) in resv_map_release().
>>>>
>>>> Do you know if earlier kernel versions are affected?
>>>>
>>>> It looks quite bisectable.  Does the crash happen every time the test
>>>> program is run?
>>>
>>> I'll take a look.  There was a previous bug in this area:
>>> ff8c0c53: mm/hugetlb.c: don't call region_abort if region_chg fails
>>
>> This is similar to the issue addressed in 045c7a3f ("fix offset overflow
>> in hugetlbfs mmap").  The problem here is that the pgoff argument passed
>> to remap_file_pages() is 0x20000000000000.  In the process of converting
>> this to a page offset and putting it in vm_pgoff, and then converting back
>> to bytes to compute mapping length we end up with 0.  We ultimately end
>> up passing (from,to) page offsets into hugetlbfs where from is greater
>> than to. :( This confuses the heck out the the huge page reservation code
>> as the 'negative' range looks like an error and we never complete the
>> reservation process and leave the 'adds_in_progress'.
>>
>> This issue has existed for a long time.  The VM_BUG_ON just happens to
>> catch the situation which was previously not reported or had some other
>> side effect.  Commit 045c7a3f tried to catch these overflow issues when
>> converting types, but obviously missed this one.  I can easily add a test
>> for this specific value/condition, but want to think about it a little
>> more and see if there is a better way to catch all of these.
> 
> Well, I instrumented hugetlbfs_file_mmap when called via the remap_file_pages
> system call path.  Upon entry, vma->vm_pgoff is 0x20000000000000 which is
> the same as the value of the argument pgoff passed to the system call.
> vm_pgoff really should be a page offset (i.e. 0x20000000000000 >> PAGE_SHIFT).
> So, there is also an issue earlier in the remap_file_pages system call
> sequence.

My mistake.  The pgoff argument to remap_file_pages is a page offset in page
size units.  So, there should be no '>> PAGE_SHIFT' of the argument.

The hugetlbfs code wants to convert vm_pgoff to a byte offset by
'<< PAGE_SHIFT'.  This is what overflows and gets us into trouble.

My first thought is to simply check for this overflow in remap_file_pages.
Other code within the kernel converts vm_pgoff to a byte offset and I am
not sure they could handle/expect an overflow.

-- 
Mike Kravetz

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH] hugetlbfs: check for pgoff value overflow
  2018-03-06 21:31 ` [Bug 199037] New: Kernel bug at mm/hugetlb.c:741 Andrew Morton
  2018-03-06 21:46   ` Mike Kravetz
@ 2018-03-07 23:59   ` Mike Kravetz
  2018-03-08  0:57     ` Nic Losby
  2018-03-08  1:35       ` Yisheng Xie
  2018-03-08 21:05   ` [PATCH v2] " Mike Kravetz
  2018-03-09  0:27   ` [PATCH v3] " Mike Kravetz
  3 siblings, 2 replies; 21+ messages in thread
From: Mike Kravetz @ 2018-03-07 23:59 UTC (permalink / raw)
  To: linux-mm, linux-kernel, bugzilla-daemon
  Cc: Michal Hocko, Kirill A . Shutemov, Hillf Danton, Nic Losby,
	Andrew Morton, Mike Kravetz

A vma with vm_pgoff large enough to overflow a loff_t type when
converted to a byte offset can be passed via the remap_file_pages
system call.  The hugetlbfs mmap routine uses the byte offset to
calculate reservations and file size.

A sequence such as:
  mmap(0x20a00000, 0x600000, 0, 0x66033, -1, 0);
  remap_file_pages(0x20a00000, 0x600000, 0, 0x20000000000000, 0);
will result in the following when task exits/file closed,
  kernel BUG at mm/hugetlb.c:749!
Call Trace:
  hugetlbfs_evict_inode+0x2f/0x40
  evict+0xcb/0x190
  __dentry_kill+0xcb/0x150
  __fput+0x164/0x1e0
  task_work_run+0x84/0xa0
  exit_to_usermode_loop+0x7d/0x80
  do_syscall_64+0x18b/0x190
  entry_SYSCALL_64_after_hwframe+0x3d/0xa2

The overflowed pgoff value causes hugetlbfs to try to set up a
mapping with a negative range (end < start) that leaves invalid
state which causes the BUG.

Reported-by: Nic Losby <blurbdust@gmail.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 fs/hugetlbfs/inode.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 8fe1b0aa2896..cb288dec5564 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -127,12 +127,13 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
 	vma->vm_ops = &hugetlb_vm_ops;
 
 	/*
-	 * Offset passed to mmap (before page shift) could have been
-	 * negative when represented as a (l)off_t.
+	 * page based offset in vm_pgoff could be sufficiently large to
+	 * overflow a (l)off_t when converted to byte offset.
 	 */
-	if (((loff_t)vma->vm_pgoff << PAGE_SHIFT) < 0)
+	if (vma->vm_pgoff && ((loff_t)vma->vm_pgoff << PAGE_SHIFT) <= 0)
 		return -EINVAL;
 
+	/* must be huge page aligned */
 	if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))
 		return -EINVAL;
 
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] hugetlbfs: check for pgoff value overflow
  2018-03-07 23:59   ` [PATCH] hugetlbfs: check for pgoff value overflow Mike Kravetz
@ 2018-03-08  0:57     ` Nic Losby
  2018-03-08  1:35       ` Yisheng Xie
  1 sibling, 0 replies; 21+ messages in thread
From: Nic Losby @ 2018-03-08  0:57 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-mm, linux-kernel, bugzilla-daemon, Michal Hocko,
	Kirill A . Shutemov, Hillf Danton, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 2216 bytes --]

Confirmed as patched! The kernel no longer crashes after the patch was
applied. Thank you for your time!

On Wed, Mar 7, 2018 at 5:59 PM, Mike Kravetz <mike.kravetz@oracle.com>
wrote:

> A vma with vm_pgoff large enough to overflow a loff_t type when
> converted to a byte offset can be passed via the remap_file_pages
> system call.  The hugetlbfs mmap routine uses the byte offset to
> calculate reservations and file size.
>
> A sequence such as:
>   mmap(0x20a00000, 0x600000, 0, 0x66033, -1, 0);
>   remap_file_pages(0x20a00000, 0x600000, 0, 0x20000000000000, 0);
> will result in the following when task exits/file closed,
>   kernel BUG at mm/hugetlb.c:749!
> Call Trace:
>   hugetlbfs_evict_inode+0x2f/0x40
>   evict+0xcb/0x190
>   __dentry_kill+0xcb/0x150
>   __fput+0x164/0x1e0
>   task_work_run+0x84/0xa0
>   exit_to_usermode_loop+0x7d/0x80
>   do_syscall_64+0x18b/0x190
>   entry_SYSCALL_64_after_hwframe+0x3d/0xa2
>
> The overflowed pgoff value causes hugetlbfs to try to set up a
> mapping with a negative range (end < start) that leaves invalid
> state which causes the BUG.
>
> Reported-by: Nic Losby <blurbdust@gmail.com>
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> ---
>  fs/hugetlbfs/inode.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 8fe1b0aa2896..cb288dec5564 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -127,12 +127,13 @@ static int hugetlbfs_file_mmap(struct file *file,
> struct vm_area_struct *vma)
>         vma->vm_ops = &hugetlb_vm_ops;
>
>         /*
> -        * Offset passed to mmap (before page shift) could have been
> -        * negative when represented as a (l)off_t.
> +        * page based offset in vm_pgoff could be sufficiently large to
> +        * overflow a (l)off_t when converted to byte offset.
>          */
> -       if (((loff_t)vma->vm_pgoff << PAGE_SHIFT) < 0)
> +       if (vma->vm_pgoff && ((loff_t)vma->vm_pgoff << PAGE_SHIFT) <= 0)
>                 return -EINVAL;
>
> +       /* must be huge page aligned */
>         if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))
>                 return -EINVAL;
>
> --
> 2.13.6
>
>

[-- Attachment #2: Type: text/html, Size: 2983 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] hugetlbfs: check for pgoff value overflow
  2018-03-07 23:59   ` [PATCH] hugetlbfs: check for pgoff value overflow Mike Kravetz
@ 2018-03-08  1:35       ` Yisheng Xie
  2018-03-08  1:35       ` Yisheng Xie
  1 sibling, 0 replies; 21+ messages in thread
From: Yisheng Xie @ 2018-03-08  1:35 UTC (permalink / raw)
  To: Mike Kravetz, linux-mm, linux-kernel, bugzilla-daemon
  Cc: Michal Hocko, Kirill A . Shutemov, Hillf Danton, Nic Losby,
	Andrew Morton

Hi Mike,

On 2018/3/8 7:59, Mike Kravetz wrote:
> A vma with vm_pgoff large enough to overflow a loff_t type when
> converted to a byte offset can be passed via the remap_file_pages
> system call.  The hugetlbfs mmap routine uses the byte offset to
> calculate reservations and file size.
> 
> A sequence such as:
>   mmap(0x20a00000, 0x600000, 0, 0x66033, -1, 0);
>   remap_file_pages(0x20a00000, 0x600000, 0, 0x20000000000000, 0);
> will result in the following when task exits/file closed,
>   kernel BUG at mm/hugetlb.c:749!
> Call Trace:
>   hugetlbfs_evict_inode+0x2f/0x40
>   evict+0xcb/0x190
>   __dentry_kill+0xcb/0x150
>   __fput+0x164/0x1e0
>   task_work_run+0x84/0xa0
>   exit_to_usermode_loop+0x7d/0x80
>   do_syscall_64+0x18b/0x190
>   entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> 
> The overflowed pgoff value causes hugetlbfs to try to set up a
> mapping with a negative range (end < start) that leaves invalid
> state which causes the BUG.
> 
> Reported-by: Nic Losby <blurbdust@gmail.com>
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> ---
>  fs/hugetlbfs/inode.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 8fe1b0aa2896..cb288dec5564 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -127,12 +127,13 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
>  	vma->vm_ops = &hugetlb_vm_ops;
>  
>  	/*
> -	 * Offset passed to mmap (before page shift) could have been
> -	 * negative when represented as a (l)off_t.
> +	 * page based offset in vm_pgoff could be sufficiently large to
> +	 * overflow a (l)off_t when converted to byte offset.
>  	 */
> -	if (((loff_t)vma->vm_pgoff << PAGE_SHIFT) < 0)
> +	if (vma->vm_pgoff && ((loff_t)vma->vm_pgoff << PAGE_SHIFT) <= 0)
>  		return -EINVAL;

This seems still no the right fix, taking the following case as an example:
 mmap(0x20a00000, 0x600000, 0, 0x66033, -1, 0);
 remap_file_pages(0x20a00000, 0x600000, 0, 0x0020001000000000, 0);

You should just check the highest PAGE_SHIFT+1 bits of pgoff in you want check
at this point, right?

However, region_chg makes me a litter puzzle that when its return value < 0, sometime
adds_in_progress is added like this case, while sometime it is not. so why not just
change at the beginning of region_chg ?
	if (f > t)
		return -EINVAL;

Thanks
Yisheng
>  
> +	/* must be huge page aligned */
>  	if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))
>  		return -EINVAL;
>  
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] hugetlbfs: check for pgoff value overflow
@ 2018-03-08  1:35       ` Yisheng Xie
  0 siblings, 0 replies; 21+ messages in thread
From: Yisheng Xie @ 2018-03-08  1:35 UTC (permalink / raw)
  To: Mike Kravetz, linux-mm, linux-kernel, bugzilla-daemon
  Cc: Michal Hocko, Kirill A . Shutemov, Hillf Danton, Nic Losby,
	Andrew Morton

Hi Mike,

On 2018/3/8 7:59, Mike Kravetz wrote:
> A vma with vm_pgoff large enough to overflow a loff_t type when
> converted to a byte offset can be passed via the remap_file_pages
> system call.  The hugetlbfs mmap routine uses the byte offset to
> calculate reservations and file size.
> 
> A sequence such as:
>   mmap(0x20a00000, 0x600000, 0, 0x66033, -1, 0);
>   remap_file_pages(0x20a00000, 0x600000, 0, 0x20000000000000, 0);
> will result in the following when task exits/file closed,
>   kernel BUG at mm/hugetlb.c:749!
> Call Trace:
>   hugetlbfs_evict_inode+0x2f/0x40
>   evict+0xcb/0x190
>   __dentry_kill+0xcb/0x150
>   __fput+0x164/0x1e0
>   task_work_run+0x84/0xa0
>   exit_to_usermode_loop+0x7d/0x80
>   do_syscall_64+0x18b/0x190
>   entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> 
> The overflowed pgoff value causes hugetlbfs to try to set up a
> mapping with a negative range (end < start) that leaves invalid
> state which causes the BUG.
> 
> Reported-by: Nic Losby <blurbdust@gmail.com>
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> ---
>  fs/hugetlbfs/inode.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 8fe1b0aa2896..cb288dec5564 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -127,12 +127,13 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
>  	vma->vm_ops = &hugetlb_vm_ops;
>  
>  	/*
> -	 * Offset passed to mmap (before page shift) could have been
> -	 * negative when represented as a (l)off_t.
> +	 * page based offset in vm_pgoff could be sufficiently large to
> +	 * overflow a (l)off_t when converted to byte offset.
>  	 */
> -	if (((loff_t)vma->vm_pgoff << PAGE_SHIFT) < 0)
> +	if (vma->vm_pgoff && ((loff_t)vma->vm_pgoff << PAGE_SHIFT) <= 0)
>  		return -EINVAL;

This seems still no the right fix, taking the following case as an example:
 mmap(0x20a00000, 0x600000, 0, 0x66033, -1, 0);
 remap_file_pages(0x20a00000, 0x600000, 0, 0x0020001000000000, 0);

You should just check the highest PAGE_SHIFT+1 bits of pgoff in you want check
at this point, right?

However, region_chg makes me a litter puzzle that when its return value < 0, sometime
adds_in_progress is added like this case, while sometime it is not. so why not just
change at the beginning of region_chg ?
	if (f > t)
		return -EINVAL;

Thanks
Yisheng
>  
> +	/* must be huge page aligned */
>  	if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))
>  		return -EINVAL;
>  
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] hugetlbfs: check for pgoff value overflow
  2018-03-08  1:35       ` Yisheng Xie
  (?)
@ 2018-03-08  4:25       ` Mike Kravetz
  2018-03-08 21:03         ` Mike Kravetz
  -1 siblings, 1 reply; 21+ messages in thread
From: Mike Kravetz @ 2018-03-08  4:25 UTC (permalink / raw)
  To: Yisheng Xie, linux-mm, linux-kernel, bugzilla-daemon
  Cc: Michal Hocko, Kirill A . Shutemov, Hillf Danton, Nic Losby,
	Andrew Morton

On 03/07/2018 05:35 PM, Yisheng Xie wrote:
> Hi Mike,
> 
> On 2018/3/8 7:59, Mike Kravetz wrote:
>> A vma with vm_pgoff large enough to overflow a loff_t type when
>> converted to a byte offset can be passed via the remap_file_pages
>> system call.  The hugetlbfs mmap routine uses the byte offset to
>> calculate reservations and file size.
>>
>> A sequence such as:
>>   mmap(0x20a00000, 0x600000, 0, 0x66033, -1, 0);
>>   remap_file_pages(0x20a00000, 0x600000, 0, 0x20000000000000, 0);
>> will result in the following when task exits/file closed,
>>   kernel BUG at mm/hugetlb.c:749!
>> Call Trace:
>>   hugetlbfs_evict_inode+0x2f/0x40
>>   evict+0xcb/0x190
>>   __dentry_kill+0xcb/0x150
>>   __fput+0x164/0x1e0
>>   task_work_run+0x84/0xa0
>>   exit_to_usermode_loop+0x7d/0x80
>>   do_syscall_64+0x18b/0x190
>>   entry_SYSCALL_64_after_hwframe+0x3d/0xa2
>>
>> The overflowed pgoff value causes hugetlbfs to try to set up a
>> mapping with a negative range (end < start) that leaves invalid
>> state which causes the BUG.
>>
>> Reported-by: Nic Losby <blurbdust@gmail.com>
>> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
>> ---
>>  fs/hugetlbfs/inode.c | 7 ++++---
>>  1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
>> index 8fe1b0aa2896..cb288dec5564 100644
>> --- a/fs/hugetlbfs/inode.c
>> +++ b/fs/hugetlbfs/inode.c
>> @@ -127,12 +127,13 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
>>  	vma->vm_ops = &hugetlb_vm_ops;
>>  
>>  	/*
>> -	 * Offset passed to mmap (before page shift) could have been
>> -	 * negative when represented as a (l)off_t.
>> +	 * page based offset in vm_pgoff could be sufficiently large to
>> +	 * overflow a (l)off_t when converted to byte offset.
>>  	 */
>> -	if (((loff_t)vma->vm_pgoff << PAGE_SHIFT) < 0)
>> +	if (vma->vm_pgoff && ((loff_t)vma->vm_pgoff << PAGE_SHIFT) <= 0)
>>  		return -EINVAL;
> 
> This seems still no the right fix, taking the following case as an example:
>  mmap(0x20a00000, 0x600000, 0, 0x66033, -1, 0);
>  remap_file_pages(0x20a00000, 0x600000, 0, 0x0020001000000000, 0);
> 
> You should just check the highest PAGE_SHIFT+1 bits of pgoff in you want check
> at this point, right?

Yes, thank you!
That would be the correct check and also much simpler.  Something like,

	unsigned long ovfl_mask;

	ovfl_mask = (1UL << (PAGE_SHIFT + 1)) - 1;
	ovfl_mask <<= ((sizeof(unsigned long) * BITS_PER_BYTE) -
		      (PAGE_SHIFT + 1));
	if (vma->vm_pgoff & ovfl_mask)
		return -EINVAL;



> However, region_chg makes me a litter puzzle that when its return value < 0, sometime
> adds_in_progress is added like this case, while sometime it is not. so why not just
> change at the beginning of region_chg ?
> 	if (f > t)
> 		return -EINVAL;

If region_chg returns a value < 0, this indicates an error and adds_in_progress
should not be incremented.  In the case of this bug, region_chg was passed
values where f > t.  Of course, this should never happen.  But, because it
assumed f <= t, it returned a negative count needed huge page reservations.
The calling code interpreted the negative value as an error and a subsequent
region_add or region_abort.

I am not opposed to adding the suggested "if (f > t)".  However, the
region tracking routines are simple helpers only used by the hugetlbfs
code and the assumption is that they are being called correctly.  As
such, I would prefer to leave off the check.  But, this is the second
time they have been called incorrectly due to insufficient argument
checking.  If we do add this to region_chg, I would also add the check
to all region_* routines for consistency.

I will send out a V2 of this patch tomorrow with the corrected overflow
checking and possibly checks added to the region_* routines.
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] hugetlbfs: check for pgoff value overflow
  2018-03-08  4:25       ` Mike Kravetz
@ 2018-03-08 21:03         ` Mike Kravetz
  0 siblings, 0 replies; 21+ messages in thread
From: Mike Kravetz @ 2018-03-08 21:03 UTC (permalink / raw)
  To: Yisheng Xie, linux-mm, linux-kernel, bugzilla-daemon
  Cc: Michal Hocko, Kirill A . Shutemov, Nic Losby, Andrew Morton

On 03/07/2018 08:25 PM, Mike Kravetz wrote:
> On 03/07/2018 05:35 PM, Yisheng Xie wrote:
>> However, region_chg makes me a litter puzzle that when its return value < 0, sometime
>> adds_in_progress is added like this case, while sometime it is not. so why not just
>> change at the beginning of region_chg ?
>> 	if (f > t)
>> 		return -EINVAL;
> 
> If region_chg returns a value < 0, this indicates an error and adds_in_progress
> should not be incremented.  In the case of this bug, region_chg was passed
> values where f > t.  Of course, this should never happen.  But, because it
> assumed f <= t, it returned a negative count needed huge page reservations.
> The calling code interpreted the negative value as an error and a subsequent
> region_add or region_abort.
> 
> I am not opposed to adding the suggested "if (f > t)".  However, the
> region tracking routines are simple helpers only used by the hugetlbfs
> code and the assumption is that they are being called correctly.  As
> such, I would prefer to leave off the check.  But, this is the second
> time they have been called incorrectly due to insufficient argument
> checking.  If we do add this to region_chg, I would also add the check
> to all region_* routines for consistency.

I really did not want to add the (f > t) check to the region_* routines.
As mentioned we should never encounter this condition.  Adding the check
here says that we missed discovering an error at higher levels.  Therefore,
I went back and examined the callers of region_chg.  There are only 2:
hugetlb_reserve_pages and __vma_reservation_common.  hugetlb_reserve_pages
is called to set up a reservation for a mapping.  __vma_reservation_common
is called to check on an existing reservation, and only operates on a
single huge page.  With this in mind, a check in hugetlb_reserve_pages
would be sufficient.  Therefore, I added an explicit check to that routine
and printed a warning if ever encountered.

> I will send out a V2 of this patch tomorrow with the corrected overflow
> checking and possibly checks added to the region_* routines.

v2 will be sent shortly.  In v2 I Cc stable as this is an issue for
stable branches as well.

-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2] hugetlbfs: check for pgoff value overflow
  2018-03-06 21:31 ` [Bug 199037] New: Kernel bug at mm/hugetlb.c:741 Andrew Morton
  2018-03-06 21:46   ` Mike Kravetz
  2018-03-07 23:59   ` [PATCH] hugetlbfs: check for pgoff value overflow Mike Kravetz
@ 2018-03-08 21:05   ` Mike Kravetz
  2018-03-08 22:15     ` Andrew Morton
  2018-03-09  0:27   ` [PATCH v3] " Mike Kravetz
  3 siblings, 1 reply; 21+ messages in thread
From: Mike Kravetz @ 2018-03-08 21:05 UTC (permalink / raw)
  To: linux-mm, linux-kernel, bugzilla-daemon
  Cc: Michal Hocko, Kirill A . Shutemov, Nic Losby, Yisheng Xie,
	Andrew Morton, Mike Kravetz, stable

A vma with vm_pgoff large enough to overflow a loff_t type when
converted to a byte offset can be passed via the remap_file_pages
system call.  The hugetlbfs mmap routine uses the byte offset to
calculate reservations and file size.

A sequence such as:
  mmap(0x20a00000, 0x600000, 0, 0x66033, -1, 0);
  remap_file_pages(0x20a00000, 0x600000, 0, 0x20000000000000, 0);
will result in the following when task exits/file closed,
  kernel BUG at mm/hugetlb.c:749!
Call Trace:
  hugetlbfs_evict_inode+0x2f/0x40
  evict+0xcb/0x190
  __dentry_kill+0xcb/0x150
  __fput+0x164/0x1e0
  task_work_run+0x84/0xa0
  exit_to_usermode_loop+0x7d/0x80
  do_syscall_64+0x18b/0x190
  entry_SYSCALL_64_after_hwframe+0x3d/0xa2

The overflowed pgoff value causes hugetlbfs to try to set up a
mapping with a negative range (end < start) that leaves invalid
state which causes the BUG.

The previous overflow fix to this code was incomplete and did not
take the remap_file_pages system call into account.

Fixes: 045c7a3f53d9 ("hugetlbfs: fix offset overflow in hugetlbfs mmap")
Cc: <stable@vger.kernel.org>
Reported-by: Nic Losby <blurbdust@gmail.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
Changes in v2
  * Use bitmask for overflow check as suggested by Yisheng Xie
  * Add explicit (from > to) check when setting up reservations
  * Cc stable

 fs/hugetlbfs/inode.c | 11 ++++++++---
 mm/hugetlb.c         |  6 ++++++
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 8fe1b0aa2896..dafffa6affae 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -111,6 +111,7 @@ static void huge_pagevec_release(struct pagevec *pvec)
 static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
 {
 	struct inode *inode = file_inode(file);
+	unsigned long ovfl_mask;
 	loff_t len, vma_len;
 	int ret;
 	struct hstate *h = hstate_file(file);
@@ -127,12 +128,16 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
 	vma->vm_ops = &hugetlb_vm_ops;
 
 	/*
-	 * Offset passed to mmap (before page shift) could have been
-	 * negative when represented as a (l)off_t.
+	 * page based offset in vm_pgoff could be sufficiently large to
+	 * overflow a (l)off_t when converted to byte offset.
 	 */
-	if (((loff_t)vma->vm_pgoff << PAGE_SHIFT) < 0)
+	ovfl_mask = (1UL << (PAGE_SHIFT + 1)) - 1;
+	ovfl_mask <<= ((sizeof(unsigned long) * BITS_PER_BYTE) -
+		       (PAGE_SHIFT + 1));
+	if (vma->vm_pgoff & ovfl_mask)
 		return -EINVAL;
 
+	/* must be huge page aligned */
 	if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))
 		return -EINVAL;
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 7c204e3d132b..8eeade0a0b7a 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4374,6 +4374,12 @@ int hugetlb_reserve_pages(struct inode *inode,
 	struct resv_map *resv_map;
 	long gbl_reserve;
 
+	/* This should never happen */
+	if (from > to) {
+		VM_WARN(1, "%s called with a negative range\n", __func__);
+		return -EINVAL;
+	}
+
 	/*
 	 * Only apply hugepage reservation if asked. At fault time, an
 	 * attempt will be made for VM_NORESERVE to allocate a page
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] hugetlbfs: check for pgoff value overflow
  2018-03-08 21:05   ` [PATCH v2] " Mike Kravetz
@ 2018-03-08 22:15     ` Andrew Morton
  2018-03-08 23:37       ` Mike Kravetz
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2018-03-08 22:15 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-mm, linux-kernel, bugzilla-daemon, Michal Hocko,
	Kirill A . Shutemov, Nic Losby, Yisheng Xie, stable

On Thu,  8 Mar 2018 13:05:02 -0800 Mike Kravetz <mike.kravetz@oracle.com> wrote:

> A vma with vm_pgoff large enough to overflow a loff_t type when
> converted to a byte offset can be passed via the remap_file_pages
> system call.  The hugetlbfs mmap routine uses the byte offset to
> calculate reservations and file size.
> 
> A sequence such as:
>   mmap(0x20a00000, 0x600000, 0, 0x66033, -1, 0);
>   remap_file_pages(0x20a00000, 0x600000, 0, 0x20000000000000, 0);
> will result in the following when task exits/file closed,
>   kernel BUG at mm/hugetlb.c:749!
> Call Trace:
>   hugetlbfs_evict_inode+0x2f/0x40
>   evict+0xcb/0x190
>   __dentry_kill+0xcb/0x150
>   __fput+0x164/0x1e0
>   task_work_run+0x84/0xa0
>   exit_to_usermode_loop+0x7d/0x80
>   do_syscall_64+0x18b/0x190
>   entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> 
> The overflowed pgoff value causes hugetlbfs to try to set up a
> mapping with a negative range (end < start) that leaves invalid
> state which causes the BUG.
> 
> The previous overflow fix to this code was incomplete and did not
> take the remap_file_pages system call into account.
> 
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -111,6 +111,7 @@ static void huge_pagevec_release(struct pagevec *pvec)
>  static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
>  {
>  	struct inode *inode = file_inode(file);
> +	unsigned long ovfl_mask;
>  	loff_t len, vma_len;
>  	int ret;
>  	struct hstate *h = hstate_file(file);
> @@ -127,12 +128,16 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
>  	vma->vm_ops = &hugetlb_vm_ops;
>  
>  	/*
> -	 * Offset passed to mmap (before page shift) could have been
> -	 * negative when represented as a (l)off_t.
> +	 * page based offset in vm_pgoff could be sufficiently large to
> +	 * overflow a (l)off_t when converted to byte offset.
>  	 */
> -	if (((loff_t)vma->vm_pgoff << PAGE_SHIFT) < 0)
> +	ovfl_mask = (1UL << (PAGE_SHIFT + 1)) - 1;
> +	ovfl_mask <<= ((sizeof(unsigned long) * BITS_PER_BYTE) -
> +		       (PAGE_SHIFT + 1));

That's a compile-time constant.  The compiler will indeed generate an
immediate load, but I think it would be better to make the code look
more like we know that it's a constant, if you get what I mean. 
Something like

/*
 * If a pgoff_t is to be converted to a byte index, this is the max value it
 * can have to avoid overflow in that conversion.
 */
#define PGOFF_T_MAX	<long string of crap>

And I bet that this constant could be used elsewhere - surely it's a
very common thing to be checking for.


Also, the expression seems rather complicated.  Why are we adding 1 to
PAGE_SHIFT?  Isn't there a logical way of using PAGE_MASK?

The resulting constant is 0xfff8000000000000 on 64-bit.  We could just
use along the lines of

	1UL << (BITS_PER_LONG - PAGE_SHIFT - 1)

But why the -1?  We should be able to handle a pgoff_t of
0xfff0000000000000 in this code?


Also, we later to

	len = vma_len + ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
	/* check for overflow */
	if (len < vma_len)
		return -EINVAL;

which is ungainly: even if we passed the PGOFF_T_MAX test, there can
still be an overflow which we still must check for.  Is that avoidable?
Probably not...

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] hugetlbfs: check for pgoff value overflow
  2018-03-08 22:15     ` Andrew Morton
@ 2018-03-08 23:37       ` Mike Kravetz
  2018-03-08 23:53         ` Andrew Morton
  0 siblings, 1 reply; 21+ messages in thread
From: Mike Kravetz @ 2018-03-08 23:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, bugzilla-daemon, Michal Hocko,
	Kirill A . Shutemov, Nic Losby, Yisheng Xie, stable

On 03/08/2018 02:15 PM, Andrew Morton wrote:
> On Thu,  8 Mar 2018 13:05:02 -0800 Mike Kravetz <mike.kravetz@oracle.com> wrote:
> 
>> A vma with vm_pgoff large enough to overflow a loff_t type when
>> converted to a byte offset can be passed via the remap_file_pages
>> system call.  The hugetlbfs mmap routine uses the byte offset to
>> calculate reservations and file size.
>>
>> A sequence such as:
>>   mmap(0x20a00000, 0x600000, 0, 0x66033, -1, 0);
>>   remap_file_pages(0x20a00000, 0x600000, 0, 0x20000000000000, 0);
>> will result in the following when task exits/file closed,
>>   kernel BUG at mm/hugetlb.c:749!
>> Call Trace:
>>   hugetlbfs_evict_inode+0x2f/0x40
>>   evict+0xcb/0x190
>>   __dentry_kill+0xcb/0x150
>>   __fput+0x164/0x1e0
>>   task_work_run+0x84/0xa0
>>   exit_to_usermode_loop+0x7d/0x80
>>   do_syscall_64+0x18b/0x190
>>   entry_SYSCALL_64_after_hwframe+0x3d/0xa2
>>
>> The overflowed pgoff value causes hugetlbfs to try to set up a
>> mapping with a negative range (end < start) that leaves invalid
>> state which causes the BUG.
>>
>> The previous overflow fix to this code was incomplete and did not
>> take the remap_file_pages system call into account.
>>
>> --- a/fs/hugetlbfs/inode.c
>> +++ b/fs/hugetlbfs/inode.c
>> @@ -111,6 +111,7 @@ static void huge_pagevec_release(struct pagevec *pvec)
>>  static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
>>  {
>>  	struct inode *inode = file_inode(file);
>> +	unsigned long ovfl_mask;
>>  	loff_t len, vma_len;
>>  	int ret;
>>  	struct hstate *h = hstate_file(file);
>> @@ -127,12 +128,16 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
>>  	vma->vm_ops = &hugetlb_vm_ops;
>>  
>>  	/*
>> -	 * Offset passed to mmap (before page shift) could have been
>> -	 * negative when represented as a (l)off_t.
>> +	 * page based offset in vm_pgoff could be sufficiently large to
>> +	 * overflow a (l)off_t when converted to byte offset.
>>  	 */
>> -	if (((loff_t)vma->vm_pgoff << PAGE_SHIFT) < 0)
>> +	ovfl_mask = (1UL << (PAGE_SHIFT + 1)) - 1;
>> +	ovfl_mask <<= ((sizeof(unsigned long) * BITS_PER_BYTE) -
>> +		       (PAGE_SHIFT + 1));
> 
> That's a compile-time constant.  The compiler will indeed generate an
> immediate load, but I think it would be better to make the code look
> more like we know that it's a constant, if you get what I mean. 
> Something like
> 
> /*
>  * If a pgoff_t is to be converted to a byte index, this is the max value it
>  * can have to avoid overflow in that conversion.
>  */
> #define PGOFF_T_MAX	<long string of crap>

Ok

> And I bet that this constant could be used elsewhere - surely it's a
> very common thing to be checking for.
> 
> 
> Also, the expression seems rather complicated.  Why are we adding 1 to
> PAGE_SHIFT?  Isn't there a logical way of using PAGE_MASK?

The + 1 is there because this value will eventually be converted to
a loff_t which is signed.  So, we need to take that sign bit into
account or we could end up with a negative value.

For PAGE_SHIFT == 12, PAGE_MASK is 0xfffffffffffff000.  Our target
mask is  0xfff8000000000000 (for the sign bit).  So, we could do
PAGE_MASK << (BITS_PER_LONG - (2 * PAGE_SHIFT) - 1)

This legacy hugetlbfs code may be a little different than other areas
in the use of loff_t.  When doing some previous work in this area, I
did not find enough common used to make this more general purpose.  See,
https://lkml.org/lkml/2017/4/12/793

> The resulting constant is 0xfff8000000000000 on 64-bit.  We could just
> use along the lines of
> 
> 	1UL << (BITS_PER_LONG - PAGE_SHIFT - 1)

Ah yes, BITS_PER_LONG is better than (sizeof(unsigned long) * BITS_PER_BYTE

> But why the -1?  We should be able to handle a pgoff_t of
> 0xfff0000000000000 in this code?

I'm not exactly sure what you are asking/suggesting here and in the line
above.  It is because of the conversion to a signed value that we have to
go with 0xfff8000000000000 instead of 0xfff0000000000000.

Here are a couple options for computing the mask.  I changed the name
you suggested to make it more obvious that the mask is being used to
check for loff_t overflow.

If we want to explicitly comptue the mask as in code above.
#define PGOFF_LOFFT_MAX \
	(((1UL << (PAGE_SHIFT + 1)) - 1) <<  (BITS_PER_LONG - (PAGE_SHIFT + 1)))

Or, we use PAGE_MASK
#define PGOFF_LOFFT_MAX (PAGE_MASK << (BITS_PER_LONG - (2 * PAGE_SHIFT) - 1))

In either case, we need a big comment explaining the mask and
how we have that extra bit +/- 1 because the offset will be converted
to a signed value.
	
> Also, we later to
> 
> 	len = vma_len + ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
> 	/* check for overflow */
> 	if (len < vma_len)
> 		return -EINVAL;
> 
> which is ungainly: even if we passed the PGOFF_T_MAX test, there can
> still be an overflow which we still must check for.  Is that avoidable?
> Probably not...

Yes, it is required.  That check takes into account the length argument
which is added to page offset.  So, yes you can pass the first check and
fail this one.

-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] hugetlbfs: check for pgoff value overflow
  2018-03-08 23:37       ` Mike Kravetz
@ 2018-03-08 23:53         ` Andrew Morton
  0 siblings, 0 replies; 21+ messages in thread
From: Andrew Morton @ 2018-03-08 23:53 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-mm, linux-kernel, bugzilla-daemon, Michal Hocko,
	Kirill A . Shutemov, Nic Losby, Yisheng Xie, stable

On Thu, 8 Mar 2018 15:37:57 -0800 Mike Kravetz <mike.kravetz@oracle.com> wrote:

> Here are a couple options for computing the mask.  I changed the name
> you suggested to make it more obvious that the mask is being used to
> check for loff_t overflow.
> 
> If we want to explicitly comptue the mask as in code above.
> #define PGOFF_LOFFT_MAX \
> 	(((1UL << (PAGE_SHIFT + 1)) - 1) <<  (BITS_PER_LONG - (PAGE_SHIFT + 1)))
> 
> Or, we use PAGE_MASK
> #define PGOFF_LOFFT_MAX (PAGE_MASK << (BITS_PER_LONG - (2 * PAGE_SHIFT) - 1))

Sounds good.

> In either case, we need a big comment explaining the mask and
> how we have that extra bit +/- 1 because the offset will be converted
> to a signed value.

Yup.

> > Also, we later to
> > 
> > 	len = vma_len + ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
> > 	/* check for overflow */
> > 	if (len < vma_len)
> > 		return -EINVAL;
> > 
> > which is ungainly: even if we passed the PGOFF_T_MAX test, there can
> > still be an overflow which we still must check for.  Is that avoidable?
> > Probably not...
> 
> Yes, it is required.  That check takes into account the length argument
> which is added to page offset.  So, yes you can pass the first check and
> fail this one.

Well I was sort of wondering if both checks could be done in a single
operation, but I guess not.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v3] hugetlbfs: check for pgoff value overflow
  2018-03-06 21:31 ` [Bug 199037] New: Kernel bug at mm/hugetlb.c:741 Andrew Morton
                     ` (2 preceding siblings ...)
  2018-03-08 21:05   ` [PATCH v2] " Mike Kravetz
@ 2018-03-09  0:27   ` Mike Kravetz
  2018-03-09  1:05     ` Andrew Morton
  2018-03-16 10:17     ` Michal Hocko
  3 siblings, 2 replies; 21+ messages in thread
From: Mike Kravetz @ 2018-03-09  0:27 UTC (permalink / raw)
  To: linux-mm, linux-kernel, bugzilla-daemon
  Cc: Michal Hocko, Kirill A . Shutemov, Nic Losby, Yisheng Xie,
	Andrew Morton, Mike Kravetz, stable

A vma with vm_pgoff large enough to overflow a loff_t type when
converted to a byte offset can be passed via the remap_file_pages
system call.  The hugetlbfs mmap routine uses the byte offset to
calculate reservations and file size.

A sequence such as:
  mmap(0x20a00000, 0x600000, 0, 0x66033, -1, 0);
  remap_file_pages(0x20a00000, 0x600000, 0, 0x20000000000000, 0);
will result in the following when task exits/file closed,
  kernel BUG at mm/hugetlb.c:749!
Call Trace:
  hugetlbfs_evict_inode+0x2f/0x40
  evict+0xcb/0x190
  __dentry_kill+0xcb/0x150
  __fput+0x164/0x1e0
  task_work_run+0x84/0xa0
  exit_to_usermode_loop+0x7d/0x80
  do_syscall_64+0x18b/0x190
  entry_SYSCALL_64_after_hwframe+0x3d/0xa2

The overflowed pgoff value causes hugetlbfs to try to set up a
mapping with a negative range (end < start) that leaves invalid
state which causes the BUG.

The previous overflow fix to this code was incomplete and did not
take the remap_file_pages system call into account.

Fixes: 045c7a3f53d9 ("hugetlbfs: fix offset overflow in hugetlbfs mmap")
Cc: <stable@vger.kernel.org>
Reported-by: Nic Losby <blurbdust@gmail.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
Changes in v3
  * Use a simpler mask computation as suggested by Andrew Morton
Changes in v2
  * Use bitmask for overflow check as suggested by Yisheng Xie
  * Add explicit (from > to) check when setting up reservations
  * Cc stable

 fs/hugetlbfs/inode.c | 16 +++++++++++++---
 mm/hugetlb.c         |  6 ++++++
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 8fe1b0aa2896..e46117dc006a 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -108,6 +108,15 @@ static void huge_pagevec_release(struct pagevec *pvec)
 	pagevec_reinit(pvec);
 }
 
+/*
+ * Mask used when checking the page offset value passed in via system
+ * calls.  This value will be converted to a loff_t which is signed.
+ * Therefore, we want to check the upper PAGE_SHIFT + 1 bits of the
+ * value.  The extra bit (- 1 in the shift value) is to take the sign
+ * bit into account.
+ */
+#define PGOFF_LOFFT_MAX (PAGE_MASK << (BITS_PER_LONG - (2 * PAGE_SHIFT) - 1))
+
 static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
 {
 	struct inode *inode = file_inode(file);
@@ -127,12 +136,13 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
 	vma->vm_ops = &hugetlb_vm_ops;
 
 	/*
-	 * Offset passed to mmap (before page shift) could have been
-	 * negative when represented as a (l)off_t.
+	 * page based offset in vm_pgoff could be sufficiently large to
+	 * overflow a (l)off_t when converted to byte offset.
 	 */
-	if (((loff_t)vma->vm_pgoff << PAGE_SHIFT) < 0)
+	if (vma->vm_pgoff & PGOFF_LOFFT_MAX)
 		return -EINVAL;
 
+	/* must be huge page aligned */
 	if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))
 		return -EINVAL;
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 7c204e3d132b..8eeade0a0b7a 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4374,6 +4374,12 @@ int hugetlb_reserve_pages(struct inode *inode,
 	struct resv_map *resv_map;
 	long gbl_reserve;
 
+	/* This should never happen */
+	if (from > to) {
+		VM_WARN(1, "%s called with a negative range\n", __func__);
+		return -EINVAL;
+	}
+
 	/*
 	 * Only apply hugepage reservation if asked. At fault time, an
 	 * attempt will be made for VM_NORESERVE to allocate a page
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v3] hugetlbfs: check for pgoff value overflow
  2018-03-09  0:27   ` [PATCH v3] " Mike Kravetz
@ 2018-03-09  1:05     ` Andrew Morton
  2018-03-16 10:17     ` Michal Hocko
  1 sibling, 0 replies; 21+ messages in thread
From: Andrew Morton @ 2018-03-09  1:05 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-mm, linux-kernel, bugzilla-daemon, Michal Hocko,
	Kirill A . Shutemov, Nic Losby, Yisheng Xie, stable

On Thu,  8 Mar 2018 16:27:26 -0800 Mike Kravetz <mike.kravetz@oracle.com> wrote:

> A vma with vm_pgoff large enough to overflow a loff_t type when
> converted to a byte offset can be passed via the remap_file_pages
> system call.  The hugetlbfs mmap routine uses the byte offset to
> calculate reservations and file size.
> 
> A sequence such as:
>   mmap(0x20a00000, 0x600000, 0, 0x66033, -1, 0);
>   remap_file_pages(0x20a00000, 0x600000, 0, 0x20000000000000, 0);
> will result in the following when task exits/file closed,
>   kernel BUG at mm/hugetlb.c:749!
> Call Trace:
>   hugetlbfs_evict_inode+0x2f/0x40
>   evict+0xcb/0x190
>   __dentry_kill+0xcb/0x150
>   __fput+0x164/0x1e0
>   task_work_run+0x84/0xa0
>   exit_to_usermode_loop+0x7d/0x80
>   do_syscall_64+0x18b/0x190
>   entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> 
> The overflowed pgoff value causes hugetlbfs to try to set up a
> mapping with a negative range (end < start) that leaves invalid
> state which causes the BUG.
> 
> The previous overflow fix to this code was incomplete and did not
> take the remap_file_pages system call into account.
> 
> ...
>
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4374,6 +4374,12 @@ int hugetlb_reserve_pages(struct inode *inode,
>  	struct resv_map *resv_map;
>  	long gbl_reserve;
>  
> +	/* This should never happen */
> +	if (from > to) {
> +		VM_WARN(1, "%s called with a negative range\n", __func__);
> +		return -EINVAL;
> +	}

This is tidier, and that comment is a bit too obvious to live ;)

--- a/mm/hugetlb.c~hugetlbfs-check-for-pgoff-value-overflow-v3-fix
+++ a/mm/hugetlb.c
@@ -18,6 +18,7 @@
 #include <linux/bootmem.h>
 #include <linux/sysfs.h>
 #include <linux/slab.h>
+#include <linux/mmdebug.h>
 #include <linux/sched/signal.h>
 #include <linux/rmap.h>
 #include <linux/string_helpers.h>
@@ -4374,11 +4375,8 @@ int hugetlb_reserve_pages(struct inode *
 	struct resv_map *resv_map;
 	long gbl_reserve;
 
-	/* This should never happen */
-	if (from > to) {
-		VM_WARN(1, "%s called with a negative range\n", __func__);
+	if (VM_WARN(from > to, "%s called with a negative range\n", __func__))
 		return -EINVAL;
-	}
 
 	/*
 	 * Only apply hugepage reservation if asked. At fault time, an

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3] hugetlbfs: check for pgoff value overflow
  2018-03-09  0:27   ` [PATCH v3] " Mike Kravetz
  2018-03-09  1:05     ` Andrew Morton
@ 2018-03-16 10:17     ` Michal Hocko
  2018-03-16 16:19       ` Mike Kravetz
  1 sibling, 1 reply; 21+ messages in thread
From: Michal Hocko @ 2018-03-16 10:17 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-mm, linux-kernel, bugzilla-daemon, Kirill A . Shutemov,
	Nic Losby, Yisheng Xie, Andrew Morton, stable

On Thu 08-03-18 16:27:26, Mike Kravetz wrote:
> A vma with vm_pgoff large enough to overflow a loff_t type when
> converted to a byte offset can be passed via the remap_file_pages
> system call.  The hugetlbfs mmap routine uses the byte offset to
> calculate reservations and file size.
> 
> A sequence such as:
>   mmap(0x20a00000, 0x600000, 0, 0x66033, -1, 0);
>   remap_file_pages(0x20a00000, 0x600000, 0, 0x20000000000000, 0);
> will result in the following when task exits/file closed,
>   kernel BUG at mm/hugetlb.c:749!
> Call Trace:
>   hugetlbfs_evict_inode+0x2f/0x40
>   evict+0xcb/0x190
>   __dentry_kill+0xcb/0x150
>   __fput+0x164/0x1e0
>   task_work_run+0x84/0xa0
>   exit_to_usermode_loop+0x7d/0x80
>   do_syscall_64+0x18b/0x190
>   entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> 
> The overflowed pgoff value causes hugetlbfs to try to set up a
> mapping with a negative range (end < start) that leaves invalid
> state which causes the BUG.
> 
> The previous overflow fix to this code was incomplete and did not
> take the remap_file_pages system call into account.
> 
> Fixes: 045c7a3f53d9 ("hugetlbfs: fix offset overflow in hugetlbfs mmap")
> Cc: <stable@vger.kernel.org>
> Reported-by: Nic Losby <blurbdust@gmail.com>
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>

OK, looks good to me. Hairy but seems to be the easiest way around this.
Acked-by: Michal Hocko <mhocko@suse.com>

> ---
> Changes in v3
>   * Use a simpler mask computation as suggested by Andrew Morton
> Changes in v2
>   * Use bitmask for overflow check as suggested by Yisheng Xie
>   * Add explicit (from > to) check when setting up reservations
>   * Cc stable
> 
>  fs/hugetlbfs/inode.c | 16 +++++++++++++---
>  mm/hugetlb.c         |  6 ++++++
>  2 files changed, 19 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 8fe1b0aa2896..e46117dc006a 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -108,6 +108,15 @@ static void huge_pagevec_release(struct pagevec *pvec)
>  	pagevec_reinit(pvec);
>  }
>  
> +/*
> + * Mask used when checking the page offset value passed in via system
> + * calls.  This value will be converted to a loff_t which is signed.
> + * Therefore, we want to check the upper PAGE_SHIFT + 1 bits of the
> + * value.  The extra bit (- 1 in the shift value) is to take the sign
> + * bit into account.
> + */
> +#define PGOFF_LOFFT_MAX (PAGE_MASK << (BITS_PER_LONG - (2 * PAGE_SHIFT) - 1))
> +
>  static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
>  {
>  	struct inode *inode = file_inode(file);
> @@ -127,12 +136,13 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
>  	vma->vm_ops = &hugetlb_vm_ops;
>  
>  	/*
> -	 * Offset passed to mmap (before page shift) could have been
> -	 * negative when represented as a (l)off_t.
> +	 * page based offset in vm_pgoff could be sufficiently large to
> +	 * overflow a (l)off_t when converted to byte offset.
>  	 */
> -	if (((loff_t)vma->vm_pgoff << PAGE_SHIFT) < 0)
> +	if (vma->vm_pgoff & PGOFF_LOFFT_MAX)
>  		return -EINVAL;
>  
> +	/* must be huge page aligned */
>  	if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))
>  		return -EINVAL;
>  
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 7c204e3d132b..8eeade0a0b7a 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4374,6 +4374,12 @@ int hugetlb_reserve_pages(struct inode *inode,
>  	struct resv_map *resv_map;
>  	long gbl_reserve;
>  
> +	/* This should never happen */
> +	if (from > to) {
> +		VM_WARN(1, "%s called with a negative range\n", __func__);
> +		return -EINVAL;
> +	}
> +
>  	/*
>  	 * Only apply hugepage reservation if asked. At fault time, an
>  	 * attempt will be made for VM_NORESERVE to allocate a page
> -- 
> 2.13.6

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3] hugetlbfs: check for pgoff value overflow
  2018-03-16 10:17     ` Michal Hocko
@ 2018-03-16 16:19       ` Mike Kravetz
  2018-03-16 16:53         ` Michal Hocko
  0 siblings, 1 reply; 21+ messages in thread
From: Mike Kravetz @ 2018-03-16 16:19 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, linux-kernel, bugzilla-daemon, Kirill A . Shutemov,
	Nic Losby, Yisheng Xie, Andrew Morton, stable

On 03/16/2018 03:17 AM, Michal Hocko wrote:
> On Thu 08-03-18 16:27:26, Mike Kravetz wrote:
> 
> OK, looks good to me. Hairy but seems to be the easiest way around this.
> Acked-by: Michal Hocko <mhocko@suse.com>
> 
<snip>
>> +/*
>> + * Mask used when checking the page offset value passed in via system
>> + * calls.  This value will be converted to a loff_t which is signed.
>> + * Therefore, we want to check the upper PAGE_SHIFT + 1 bits of the
>> + * value.  The extra bit (- 1 in the shift value) is to take the sign
>> + * bit into account.
>> + */
>> +#define PGOFF_LOFFT_MAX (PAGE_MASK << (BITS_PER_LONG - (2 * PAGE_SHIFT) - 1))

Thanks Michal,

However, kbuild found a problem with this definition on certain configs.
Consider a config where,
BITS_PER_LONG = 32 (32bit config)
PAGE_SHIFT = 16 (64K pages)
This results in the negative shift value.
Not something I would not immediately think of, but a valid config.

The definition has been changed to,
#define PGOFF_LOFFT_MAX \
	(((1UL << (PAGE_SHIFT + 1)) - 1) <<  (BITS_PER_LONG - (PAGE_SHIFT + 1)))

as discussed here,
http://lkml.kernel.org/r/432fb2a3-b729-9c3a-7d60-890b8f9b10dd@oracle.com
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3] hugetlbfs: check for pgoff value overflow
  2018-03-16 16:19       ` Mike Kravetz
@ 2018-03-16 16:53         ` Michal Hocko
  0 siblings, 0 replies; 21+ messages in thread
From: Michal Hocko @ 2018-03-16 16:53 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-mm, linux-kernel, bugzilla-daemon, Kirill A . Shutemov,
	Nic Losby, Yisheng Xie, Andrew Morton, stable

On Fri 16-03-18 09:19:07, Mike Kravetz wrote:
> On 03/16/2018 03:17 AM, Michal Hocko wrote:
> > On Thu 08-03-18 16:27:26, Mike Kravetz wrote:
> > 
> > OK, looks good to me. Hairy but seems to be the easiest way around this.
> > Acked-by: Michal Hocko <mhocko@suse.com>
> > 
> <snip>
> >> +/*
> >> + * Mask used when checking the page offset value passed in via system
> >> + * calls.  This value will be converted to a loff_t which is signed.
> >> + * Therefore, we want to check the upper PAGE_SHIFT + 1 bits of the
> >> + * value.  The extra bit (- 1 in the shift value) is to take the sign
> >> + * bit into account.
> >> + */
> >> +#define PGOFF_LOFFT_MAX (PAGE_MASK << (BITS_PER_LONG - (2 * PAGE_SHIFT) - 1))
> 
> Thanks Michal,
> 
> However, kbuild found a problem with this definition on certain configs.
> Consider a config where,
> BITS_PER_LONG = 32 (32bit config)
> PAGE_SHIFT = 16 (64K pages)
> This results in the negative shift value.
> Not something I would not immediately think of, but a valid config.

Well, 64K pages on 32b doesn't sound even remotely sane to me but what
ever.

> The definition has been changed to,
> #define PGOFF_LOFFT_MAX \
> 	(((1UL << (PAGE_SHIFT + 1)) - 1) <<  (BITS_PER_LONG - (PAGE_SHIFT + 1)))
> 
> as discussed here,
> http://lkml.kernel.org/r/432fb2a3-b729-9c3a-7d60-890b8f9b10dd@oracle.com

This looks more wild but seems correct as well. You can keep my acked-by

Thanks!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2018-03-16 16:53 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <bug-199037-27@https.bugzilla.kernel.org/>
2018-03-06 21:31 ` [Bug 199037] New: Kernel bug at mm/hugetlb.c:741 Andrew Morton
2018-03-06 21:46   ` Mike Kravetz
2018-03-07  0:31     ` Mike Kravetz
2018-03-07  2:49       ` Nic Losby
2018-03-07  4:19       ` Mike Kravetz
2018-03-07 16:39         ` Mike Kravetz
2018-03-07 23:59   ` [PATCH] hugetlbfs: check for pgoff value overflow Mike Kravetz
2018-03-08  0:57     ` Nic Losby
2018-03-08  1:35     ` Yisheng Xie
2018-03-08  1:35       ` Yisheng Xie
2018-03-08  4:25       ` Mike Kravetz
2018-03-08 21:03         ` Mike Kravetz
2018-03-08 21:05   ` [PATCH v2] " Mike Kravetz
2018-03-08 22:15     ` Andrew Morton
2018-03-08 23:37       ` Mike Kravetz
2018-03-08 23:53         ` Andrew Morton
2018-03-09  0:27   ` [PATCH v3] " Mike Kravetz
2018-03-09  1:05     ` Andrew Morton
2018-03-16 10:17     ` Michal Hocko
2018-03-16 16:19       ` Mike Kravetz
2018-03-16 16:53         ` Michal Hocko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.