[PATCH RESEND] mm/hugetlb: Don't call region_abort if region

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH RESEND] mm/hugetlb: Don't call region_abort if region_chg fails
@ 2017-03-29 21:08 Mike Kravetz
  2017-03-29 21:17 ` Andrew Morton
  2017-04-10 21:38 ` Vegard Nossum
  0 siblings, 2 replies; 6+ messages in thread
From: Mike Kravetz @ 2017-03-29 21:08 UTC (permalink / raw)
  To: Andrew Morton, linux-mm, linux-kernel
  Cc: Dmitry Vyukov, Hillf Danton, Michal Hocko, Kirill A . Shutemov,
	Andrey Ryabinin, Naoya Horiguchi, Mike Kravetz

Resending because of typo in Andrew's e-mail when first sent

Changes to hugetlbfs reservation maps is a two step process.  The first
step is a call to region_chg to determine what needs to be changed, and
prepare that change.  This should be followed by a call to call to
region_add to commit the change, or region_abort to abort the change.

The error path in hugetlb_reserve_pages called region_abort after a
failed call to region_chg.  As a result, the adds_in_progress counter
in the reservation map is off by 1.  This is caught by a VM_BUG_ON
in resv_map_release when the reservation map is freed.

syzkaller fuzzer found this bug, that resulted in the following:

 kernel BUG at mm/hugetlb.c:742!
 Call Trace:
  hugetlbfs_evict_inode+0x7b/0xa0 fs/hugetlbfs/inode.c:493
  evict+0x481/0x920 fs/inode.c:553
  iput_final fs/inode.c:1515 [inline]
  iput+0x62b/0xa20 fs/inode.c:1542
  hugetlb_file_setup+0x593/0x9f0 fs/hugetlbfs/inode.c:1306
  newseg+0x422/0xd30 ipc/shm.c:575
  ipcget_new ipc/util.c:285 [inline]
  ipcget+0x21e/0x580 ipc/util.c:639
  SYSC_shmget ipc/shm.c:673 [inline]
  SyS_shmget+0x158/0x230 ipc/shm.c:657
  entry_SYSCALL_64_fastpath+0x1f/0xc2
 RIP: resv_map_release+0x265/0x330 mm/hugetlb.c:742

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
---
 mm/hugetlb.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c7025c1..c65d45c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4233,7 +4233,9 @@ int hugetlb_reserve_pages(struct inode *inode,
 	return 0;
 out_err:
 	if (!vma || vma->vm_flags & VM_MAYSHARE)
-		region_abort(resv_map, from, to);
+		/* Don't call region_abort if region_chg failed */
+		if (chg >= 0)
+			region_abort(resv_map, from, to);
 	if (vma && is_vma_resv_set(vma, HPAGE_RESV_OWNER))
 		kref_put(&resv_map->refs, resv_map_release);
 	return ret;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH RESEND] mm/hugetlb: Don't call region_abort if region_chg fails
  2017-03-29 21:08 [PATCH RESEND] mm/hugetlb: Don't call region_abort if region_chg fails Mike Kravetz
@ 2017-03-29 21:17 ` Andrew Morton
  2017-03-30 12:28   ` Dmitry Vyukov
  2017-04-10 21:38 ` Vegard Nossum
  1 sibling, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2017-03-29 21:17 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-mm, linux-kernel, Dmitry Vyukov, Hillf Danton,
	Michal Hocko, Kirill A . Shutemov, Andrey Ryabinin,
	Naoya Horiguchi

On Wed, 29 Mar 2017 14:08:02 -0700 Mike Kravetz <mike.kravetz@oracle.com> wrote:

> Resending because of typo in Andrew's e-mail when first sent
> 
> Changes to hugetlbfs reservation maps is a two step process.  The first
> step is a call to region_chg to determine what needs to be changed, and
> prepare that change.  This should be followed by a call to call to
> region_add to commit the change, or region_abort to abort the change.
> 
> The error path in hugetlb_reserve_pages called region_abort after a
> failed call to region_chg.  As a result, the adds_in_progress counter
> in the reservation map is off by 1.  This is caught by a VM_BUG_ON
> in resv_map_release when the reservation map is freed.
> 
> syzkaller fuzzer found this bug, that resulted in the following:

I'll change the above to

: syzkaller fuzzer (when using an injected kmalloc failure) found this bug,
: that resulted in the following:

it's important, because this bug won't be triggered (at all easily, at
least) in real-world workloads.

>  kernel BUG at mm/hugetlb.c:742!
>  Call Trace:
>   hugetlbfs_evict_inode+0x7b/0xa0 fs/hugetlbfs/inode.c:493
>   evict+0x481/0x920 fs/inode.c:553
>   iput_final fs/inode.c:1515 [inline]
>   iput+0x62b/0xa20 fs/inode.c:1542
>   hugetlb_file_setup+0x593/0x9f0 fs/hugetlbfs/inode.c:1306
>   newseg+0x422/0xd30 ipc/shm.c:575
>   ipcget_new ipc/util.c:285 [inline]
>   ipcget+0x21e/0x580 ipc/util.c:639
>   SYSC_shmget ipc/shm.c:673 [inline]
>   SyS_shmget+0x158/0x230 ipc/shm.c:657
>   entry_SYSCALL_64_fastpath+0x1f/0xc2
>  RIP: resv_map_release+0x265/0x330 mm/hugetlb.c:742

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH RESEND] mm/hugetlb: Don't call region_abort if region_chg fails
  2017-03-29 21:17 ` Andrew Morton
@ 2017-03-30 12:28   ` Dmitry Vyukov
  2017-03-30 20:20     ` Mike Kravetz
  0 siblings, 1 reply; 6+ messages in thread
From: Dmitry Vyukov @ 2017-03-30 12:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mike Kravetz, linux-mm, LKML, Hillf Danton, Michal Hocko,
	Kirill A . Shutemov, Andrey Ryabinin, Naoya Horiguchi

On Wed, Mar 29, 2017 at 11:17 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Wed, 29 Mar 2017 14:08:02 -0700 Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
>> Resending because of typo in Andrew's e-mail when first sent
>>
>> Changes to hugetlbfs reservation maps is a two step process.  The first
>> step is a call to region_chg to determine what needs to be changed, and
>> prepare that change.  This should be followed by a call to call to
>> region_add to commit the change, or region_abort to abort the change.
>>
>> The error path in hugetlb_reserve_pages called region_abort after a
>> failed call to region_chg.  As a result, the adds_in_progress counter
>> in the reservation map is off by 1.  This is caught by a VM_BUG_ON
>> in resv_map_release when the reservation map is freed.
>>
>> syzkaller fuzzer found this bug, that resulted in the following:
>
> I'll change the above to
>
> : syzkaller fuzzer (when using an injected kmalloc failure) found this bug,
> : that resulted in the following:
>
> it's important, because this bug won't be triggered (at all easily, at
> least) in real-world workloads.

I wonder if memory-constrained cgroups make such bugs much easier to trigger.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH RESEND] mm/hugetlb: Don't call region_abort if region_chg fails
  2017-03-30 12:28   ` Dmitry Vyukov
@ 2017-03-30 20:20     ` Mike Kravetz
  0 siblings, 0 replies; 6+ messages in thread
From: Mike Kravetz @ 2017-03-30 20:20 UTC (permalink / raw)
  To: Dmitry Vyukov, Andrew Morton
  Cc: linux-mm, LKML, Hillf Danton, Michal Hocko, Kirill A . Shutemov,
	Andrey Ryabinin, Naoya Horiguchi

On 03/30/2017 05:28 AM, Dmitry Vyukov wrote:
> On Wed, Mar 29, 2017 at 11:17 PM, Andrew Morton
> <akpm@linux-foundation.org> wrote:
>> On Wed, 29 Mar 2017 14:08:02 -0700 Mike Kravetz <mike.kravetz@oracle.com> wrote:
>>
>>>
>>> syzkaller fuzzer found this bug, that resulted in the following:
>>
>> I'll change the above to
>>
>> : syzkaller fuzzer (when using an injected kmalloc failure) found this bug,
>> : that resulted in the following:
>>
>> it's important, because this bug won't be triggered (at all easily, at
>> least) in real-world workloads.
> 
> I wonder if memory-constrained cgroups make such bugs much easier to trigger.
> 

I think you might expose some bugs with memory-constrained cgroups.  However,
it is unlikely you could trigger this bug using that method.

In this bug the injected kmalloc failure was for a 32 byte allocation.  My
guess is that it would be very very unlikely/lucky to have the allocations
done by other routines on the stack succeed, and have this 32 byte allocation
fail.   

-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH RESEND] mm/hugetlb: Don't call region_abort if region_chg fails
  2017-03-29 21:08 [PATCH RESEND] mm/hugetlb: Don't call region_abort if region_chg fails Mike Kravetz
  2017-03-29 21:17 ` Andrew Morton
@ 2017-04-10 21:38 ` Vegard Nossum
  2017-04-10 23:37   ` Mike Kravetz
  1 sibling, 1 reply; 6+ messages in thread
From: Vegard Nossum @ 2017-04-10 21:38 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Andrew Morton, Linux Memory Management List, LKML, Dmitry Vyukov,
	Hillf Danton, Michal Hocko, Kirill A . Shutemov, Andrey Ryabinin,
	Naoya Horiguchi

On 29 March 2017 at 23:08, Mike Kravetz <mike.kravetz@oracle.com> wrote:
> Changes to hugetlbfs reservation maps is a two step process.  The first
> step is a call to region_chg to determine what needs to be changed, and
> prepare that change.  This should be followed by a call to call to
> region_add to commit the change, or region_abort to abort the change.
>
> The error path in hugetlb_reserve_pages called region_abort after a
> failed call to region_chg.  As a result, the adds_in_progress counter
> in the reservation map is off by 1.  This is caught by a VM_BUG_ON
> in resv_map_release when the reservation map is freed.
>
> syzkaller fuzzer found this bug, that resulted in the following:
>
>  kernel BUG at mm/hugetlb.c:742!
>  Call Trace:
>   hugetlbfs_evict_inode+0x7b/0xa0 fs/hugetlbfs/inode.c:493
>   evict+0x481/0x920 fs/inode.c:553
>   iput_final fs/inode.c:1515 [inline]
>   iput+0x62b/0xa20 fs/inode.c:1542
>   hugetlb_file_setup+0x593/0x9f0 fs/hugetlbfs/inode.c:1306
>   newseg+0x422/0xd30 ipc/shm.c:575
>   ipcget_new ipc/util.c:285 [inline]
>   ipcget+0x21e/0x580 ipc/util.c:639
>   SYSC_shmget ipc/shm.c:673 [inline]
>   SyS_shmget+0x158/0x230 ipc/shm.c:657
>   entry_SYSCALL_64_fastpath+0x1f/0xc2
>  RIP: resv_map_release+0x265/0x330 mm/hugetlb.c:742
>
> Reported-by: Dmitry Vyukov <dvyukov@google.com>
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
> ---
>  mm/hugetlb.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index c7025c1..c65d45c 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4233,7 +4233,9 @@ int hugetlb_reserve_pages(struct inode *inode,
>         return 0;
>  out_err:
>         if (!vma || vma->vm_flags & VM_MAYSHARE)
> -               region_abort(resv_map, from, to);
> +               /* Don't call region_abort if region_chg failed */
> +               if (chg >= 0)
> +                       region_abort(resv_map, from, to);
>         if (vma && is_vma_resv_set(vma, HPAGE_RESV_OWNER))
>                 kref_put(&resv_map->refs, resv_map_release);
>         return ret;

Hi guys,

I'm running into this on latest linus/master:

kernel BUG at mm/hugetlb.c:742!
invalid opcode: 0000 [#1] SMP KASAN
CPU: 3 PID: 20281 Comm: syz-executor0 Not tainted 4.11.0-rc6 #335
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
Ubuntu-1.8.2-1ubuntu1 04/01/2014
task: ffff880064f30dc0 task.stack: ffff880065b38000
RIP: 0010:resv_map_release+0x1cb/0x270
RSP: 0018:ffff880065b3fc38 EFLAGS: 00010287
RAX: 0000000000010000 RBX: ffff88006b5fe418 RCX: ffffc90001b52000
RDX: 00000000000005de RSI: ffffffff8172026b RDI: ffff88006b5fe410
RBP: ffff880065b3fc78 R08: ffff880065b3f958 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: dffffc0000000000
R13: ffff88006b5fe418 R14: ffff88006b5fe418 R15: ffff88006b5fe418
FS:  00007f21647c5700(0000) GS:ffff88006d100000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000460750 CR3: 000000005d123000 CR4: 00000000000006e0
Call Trace:
 hugetlbfs_evict_inode+0x80/0xa0
 ? hugetlbfs_setattr+0x3c0/0x3c0
 evict+0x24a/0x620
 iput+0x48f/0x8c0
 dentry_unlink_inode+0x31f/0x4d0
 __dentry_kill+0x292/0x5e0
 dput+0x730/0x830
 __fput+0x438/0x720
 ____fput+0x1a/0x20
 task_work_run+0xfe/0x180
 exit_to_usermode_loop+0x133/0x150
 syscall_return_slowpath+0x184/0x1c0
 entry_SYSCALL_64_fastpath+0xab/0xad

To reproduce:

mmap(0, 0x2000, 0, 0x40031, 0xffffffffffffffffULL, 0x8000000000000000ULL);

Curiously enough, it's the patch from this thread (i.e. commit
ff8c0c53c47530ffea82c22a0a6df6332b56c957) that introduces it,
according to git bisect. Reverting the commit from linus/master fixes
the problem.

Also found by syzcaller (no fault injections this time).


Vegard

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH RESEND] mm/hugetlb: Don't call region_abort if region_chg fails
  2017-04-10 21:38 ` Vegard Nossum
@ 2017-04-10 23:37   ` Mike Kravetz
  0 siblings, 0 replies; 6+ messages in thread
From: Mike Kravetz @ 2017-04-10 23:37 UTC (permalink / raw)
  To: Vegard Nossum
  Cc: Andrew Morton, Linux Memory Management List, LKML, Dmitry Vyukov,
	Hillf Danton, Michal Hocko, Kirill A . Shutemov, Andrey Ryabinin,
	Naoya Horiguchi

On 04/10/2017 02:38 PM, Vegard Nossum wrote:
> On 29 March 2017 at 23:08, Mike Kravetz <mike.kravetz@oracle.com> wrote:
>> Changes to hugetlbfs reservation maps is a two step process.  The first
>> step is a call to region_chg to determine what needs to be changed, and
>> prepare that change.  This should be followed by a call to call to
>> region_add to commit the change, or region_abort to abort the change.
>>
>> The error path in hugetlb_reserve_pages called region_abort after a
>> failed call to region_chg.  As a result, the adds_in_progress counter
>> in the reservation map is off by 1.  This is caught by a VM_BUG_ON
>> in resv_map_release when the reservation map is freed.
>>
>> syzkaller fuzzer found this bug, that resulted in the following:
>>
>>  kernel BUG at mm/hugetlb.c:742!
>>  Call Trace:
>>   hugetlbfs_evict_inode+0x7b/0xa0 fs/hugetlbfs/inode.c:493
>>   evict+0x481/0x920 fs/inode.c:553
>>   iput_final fs/inode.c:1515 [inline]
>>   iput+0x62b/0xa20 fs/inode.c:1542
>>   hugetlb_file_setup+0x593/0x9f0 fs/hugetlbfs/inode.c:1306
>>   newseg+0x422/0xd30 ipc/shm.c:575
>>   ipcget_new ipc/util.c:285 [inline]
>>   ipcget+0x21e/0x580 ipc/util.c:639
>>   SYSC_shmget ipc/shm.c:673 [inline]
>>   SyS_shmget+0x158/0x230 ipc/shm.c:657
>>   entry_SYSCALL_64_fastpath+0x1f/0xc2
>>  RIP: resv_map_release+0x265/0x330 mm/hugetlb.c:742
>>
>> Reported-by: Dmitry Vyukov <dvyukov@google.com>
>> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
>> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
>> ---
>>  mm/hugetlb.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index c7025c1..c65d45c 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -4233,7 +4233,9 @@ int hugetlb_reserve_pages(struct inode *inode,
>>         return 0;
>>  out_err:
>>         if (!vma || vma->vm_flags & VM_MAYSHARE)
>> -               region_abort(resv_map, from, to);
>> +               /* Don't call region_abort if region_chg failed */
>> +               if (chg >= 0)
>> +                       region_abort(resv_map, from, to);
>>         if (vma && is_vma_resv_set(vma, HPAGE_RESV_OWNER))
>>                 kref_put(&resv_map->refs, resv_map_release);
>>         return ret;
> 
> Hi guys,
> 
> I'm running into this on latest linus/master:
> 
> kernel BUG at mm/hugetlb.c:742!
> invalid opcode: 0000 [#1] SMP KASAN
> CPU: 3 PID: 20281 Comm: syz-executor0 Not tainted 4.11.0-rc6 #335
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> Ubuntu-1.8.2-1ubuntu1 04/01/2014
> task: ffff880064f30dc0 task.stack: ffff880065b38000
> RIP: 0010:resv_map_release+0x1cb/0x270
> RSP: 0018:ffff880065b3fc38 EFLAGS: 00010287
> RAX: 0000000000010000 RBX: ffff88006b5fe418 RCX: ffffc90001b52000
> RDX: 00000000000005de RSI: ffffffff8172026b RDI: ffff88006b5fe410
> RBP: ffff880065b3fc78 R08: ffff880065b3f958 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: dffffc0000000000
> R13: ffff88006b5fe418 R14: ffff88006b5fe418 R15: ffff88006b5fe418
> FS:  00007f21647c5700(0000) GS:ffff88006d100000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000460750 CR3: 000000005d123000 CR4: 00000000000006e0
> Call Trace:
>  hugetlbfs_evict_inode+0x80/0xa0
>  ? hugetlbfs_setattr+0x3c0/0x3c0
>  evict+0x24a/0x620
>  iput+0x48f/0x8c0
>  dentry_unlink_inode+0x31f/0x4d0
>  __dentry_kill+0x292/0x5e0
>  dput+0x730/0x830
>  __fput+0x438/0x720
>  ____fput+0x1a/0x20
>  task_work_run+0xfe/0x180
>  exit_to_usermode_loop+0x133/0x150
>  syscall_return_slowpath+0x184/0x1c0
>  entry_SYSCALL_64_fastpath+0xab/0xad
> 
> To reproduce:
> 
> mmap(0, 0x2000, 0, 0x40031, 0xffffffffffffffffULL, 0x8000000000000000ULL);
> 
> Curiously enough, it's the patch from this thread (i.e. commit
> ff8c0c53c47530ffea82c22a0a6df6332b56c957) that introduces it,
> according to git bisect. Reverting the commit from linus/master fixes
> the problem.

Thanks for finding this.

I do not think commit ff8c0c53 is the root cause of this BUG/issue.

Due to the very high offset (0x8000000000000000ULL) passed to mmap, there
is some overflow and/or truncation of values happening before getting to
the hugetlbfs reservation code.  The routine hugetlb_reserve_pages() is
passed a negative page offset value (from=4398046511104, to=-4398046511103).
Bad!!!  The routine region_chg() takes these values to determine how many
reservations are needed and calculates/returns a negative value.  This
appears as an error.  So, the code from commit ff8c0c53 prevents the call
to region_abort(), adds_in_progress does not get decremented and we hit the
BUG.

We should have never calculated and acted upon negative page offsets.  It
was just 'lucky' that things appeared to work before this commit.  I have
not yet determined all the things that could have gone wrong when passing
around these incorrect values.

I believe commit ff8c0c53 should remain.  I will start working on a fix
to this overflow and/or truncation of page offsets.

-- 
Mike Kravetz

> 
> Also found by syzcaller (no fault injections this time).
> 
> 
> Vegard
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2017-04-10 23:38 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-29 21:08 [PATCH RESEND] mm/hugetlb: Don't call region_abort if region_chg fails Mike Kravetz
2017-03-29 21:17 ` Andrew Morton
2017-03-30 12:28   ` Dmitry Vyukov
2017-03-30 20:20     ` Mike Kravetz
2017-04-10 21:38 ` Vegard Nossum
2017-04-10 23:37   ` Mike Kravetz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).