linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* UBSAN: Undefined behaviour in mm/page_alloc.c
@ 2018-11-09  4:09 Kyungtae Kim
  2018-11-09  8:42 ` Vlastimil Babka
  2018-11-09  8:43 ` Michal Hocko
  0 siblings, 2 replies; 22+ messages in thread
From: Kyungtae Kim @ 2018-11-09  4:09 UTC (permalink / raw)
  To: akpm, mhocko, pavel.tatashin, vbabka, osalvador, rppt, aaron.lu,
	iamjoonsoo.kim, alexander.h.duyck, mgorman
  Cc: lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm

We report a bug in v4.19-rc2 (4.20-rc1 as well, I guess):

kernel config: https://kt0755.github.io/etc/config_v2-4.19
repro: https://kt0755.github.io/etc/repro.c4074.c

In the middle of page request, this arose because order is too large to handle
 (mm/page_alloc.c:3119). It actually comes from that order is
controllable by user input
via raw_cmd_ioctl without its sanity check, thereby causing memory problem.
To stop it, we can use like MAX_ORDER for bounds check before using it.

=========================================
UBSAN: Undefined behaviour in mm/page_alloc.c:3117:19
shift exponent 51 is too large for 32-bit type 'int'
CPU: 0 PID: 6520 Comm: syz-executor1 Not tainted 4.19.0-rc2 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0xd2/0x148 lib/dump_stack.c:113
 ubsan_epilogue+0x12/0x94 lib/ubsan.c:159
 __ubsan_handle_shift_out_of_bounds+0x2b6/0x30b lib/ubsan.c:425
 __zone_watermark_ok+0x2c7/0x400 mm/page_alloc.c:3117
 zone_watermark_fast mm/page_alloc.c:3216 [inline]
 get_page_from_freelist+0xc49/0x44c0 mm/page_alloc.c:3300
 __alloc_pages_nodemask+0x21e/0x640 mm/page_alloc.c:4370
 alloc_pages_current+0xcc/0x210 mm/mempolicy.c:2093
 alloc_pages include/linux/gfp.h:509 [inline]
 __get_free_pages+0x12/0x60 mm/page_alloc.c:4414
 dma_mem_alloc+0x36/0x50 arch/x86/include/asm/floppy.h:156
 raw_cmd_copyin drivers/block/floppy.c:3159 [inline]
 raw_cmd_ioctl drivers/block/floppy.c:3206 [inline]
 fd_locked_ioctl+0xa00/0x2c10 drivers/block/floppy.c:3544
 fd_ioctl+0x40/0x60 drivers/block/floppy.c:3571
 __blkdev_driver_ioctl block/ioctl.c:303 [inline]
 blkdev_ioctl+0xb3c/0x1a30 block/ioctl.c:601
 block_ioctl+0x105/0x150 fs/block_dev.c:1883
 vfs_ioctl fs/ioctl.c:46 [inline]
 do_vfs_ioctl+0x1c0/0x1150 fs/ioctl.c:687
 ksys_ioctl+0x9e/0xb0 fs/ioctl.c:702
 __do_sys_ioctl fs/ioctl.c:709 [inline]
 __se_sys_ioctl fs/ioctl.c:707 [inline]
 __x64_sys_ioctl+0x7e/0xc0 fs/ioctl.c:707
 do_syscall_64+0xc4/0x510 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4497b9
Code: e8 8c 9f 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48
89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
01 f0 ff ff 0f 83 9b 6b fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fb5ef0e2c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007fb5ef0e36cc RCX: 00000000004497b9
RDX: 0000000020000040 RSI: 0000000000000258 RDI: 0000000000000014
RBP: 000000000071bea0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 0000000000005490 R14: 00000000006ed530 R15: 00007fb5ef0e3700
=========================================================


Thanks,
Kyungtae Kim

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
  2018-11-09  4:09 UBSAN: Undefined behaviour in mm/page_alloc.c Kyungtae Kim
@ 2018-11-09  8:42 ` Vlastimil Babka
  2018-11-09  8:43 ` Michal Hocko
  1 sibling, 0 replies; 22+ messages in thread
From: Vlastimil Babka @ 2018-11-09  8:42 UTC (permalink / raw)
  To: Kyungtae Kim, akpm, mhocko, pavel.tatashin, osalvador, rppt,
	aaron.lu, iamjoonsoo.kim, alexander.h.duyck, mgorman
  Cc: lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm,
	Konstantin Khlebnikov

On 11/9/18 5:09 AM, Kyungtae Kim wrote:
> We report a bug in v4.19-rc2 (4.20-rc1 as well, I guess):
> 
> kernel config: https://kt0755.github.io/etc/config_v2-4.19
> repro: https://kt0755.github.io/etc/repro.c4074.c
> 
> In the middle of page request, this arose because order is too large to handle
>  (mm/page_alloc.c:3119). It actually comes from that order is
> controllable by user input
> via raw_cmd_ioctl without its sanity check, thereby causing memory problem.
> To stop it, we can use like MAX_ORDER for bounds check before using it.

This together with [1] makes me rather convinced that we should really
move the check back from __alloc_pages_slowpath to
__alloc_pages_nodemask. It should be a single predictable branch with an
unlikely()?

[1]
https://lore.kernel.org/lkml/154109387197.925352.10499549042420271600.stgit@buzz/T/#u

> =========================================
> UBSAN: Undefined behaviour in mm/page_alloc.c:3117:19
> shift exponent 51 is too large for 32-bit type 'int'
> CPU: 0 PID: 6520 Comm: syz-executor1 Not tainted 4.19.0-rc2 #1
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0xd2/0x148 lib/dump_stack.c:113
>  ubsan_epilogue+0x12/0x94 lib/ubsan.c:159
>  __ubsan_handle_shift_out_of_bounds+0x2b6/0x30b lib/ubsan.c:425
>  __zone_watermark_ok+0x2c7/0x400 mm/page_alloc.c:3117
>  zone_watermark_fast mm/page_alloc.c:3216 [inline]
>  get_page_from_freelist+0xc49/0x44c0 mm/page_alloc.c:3300
>  __alloc_pages_nodemask+0x21e/0x640 mm/page_alloc.c:4370
>  alloc_pages_current+0xcc/0x210 mm/mempolicy.c:2093
>  alloc_pages include/linux/gfp.h:509 [inline]
>  __get_free_pages+0x12/0x60 mm/page_alloc.c:4414
>  dma_mem_alloc+0x36/0x50 arch/x86/include/asm/floppy.h:156
>  raw_cmd_copyin drivers/block/floppy.c:3159 [inline]
>  raw_cmd_ioctl drivers/block/floppy.c:3206 [inline]
>  fd_locked_ioctl+0xa00/0x2c10 drivers/block/floppy.c:3544
>  fd_ioctl+0x40/0x60 drivers/block/floppy.c:3571
>  __blkdev_driver_ioctl block/ioctl.c:303 [inline]
>  blkdev_ioctl+0xb3c/0x1a30 block/ioctl.c:601
>  block_ioctl+0x105/0x150 fs/block_dev.c:1883
>  vfs_ioctl fs/ioctl.c:46 [inline]
>  do_vfs_ioctl+0x1c0/0x1150 fs/ioctl.c:687
>  ksys_ioctl+0x9e/0xb0 fs/ioctl.c:702
>  __do_sys_ioctl fs/ioctl.c:709 [inline]
>  __se_sys_ioctl fs/ioctl.c:707 [inline]
>  __x64_sys_ioctl+0x7e/0xc0 fs/ioctl.c:707
>  do_syscall_64+0xc4/0x510 arch/x86/entry/common.c:290
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x4497b9
> Code: e8 8c 9f 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48
> 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
> 01 f0 ff ff 0f 83 9b 6b fc ff c3 66 2e 0f 1f 84 00 00 00 00
> RSP: 002b:00007fb5ef0e2c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> RAX: ffffffffffffffda RBX: 00007fb5ef0e36cc RCX: 00000000004497b9
> RDX: 0000000020000040 RSI: 0000000000000258 RDI: 0000000000000014
> RBP: 000000000071bea0 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
> R13: 0000000000005490 R14: 00000000006ed530 R15: 00007fb5ef0e3700
> =========================================================
> 
> 
> Thanks,
> Kyungtae Kim
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
  2018-11-09  4:09 UBSAN: Undefined behaviour in mm/page_alloc.c Kyungtae Kim
  2018-11-09  8:42 ` Vlastimil Babka
@ 2018-11-09  8:43 ` Michal Hocko
  2018-11-09  9:41   ` Tetsuo Handa
                     ` (2 more replies)
  1 sibling, 3 replies; 22+ messages in thread
From: Michal Hocko @ 2018-11-09  8:43 UTC (permalink / raw)
  To: Kyungtae Kim
  Cc: akpm, pavel.tatashin, vbabka, osalvador, rppt, aaron.lu,
	iamjoonsoo.kim, alexander.h.duyck, mgorman, lifeasageek,
	threeearcat, syzkaller, linux-kernel, linux-mm,
	Konstantin Khlebnikov

On Thu 08-11-18 23:09:23, Kyungtae Kim wrote:
> We report a bug in v4.19-rc2 (4.20-rc1 as well, I guess):
> 
> kernel config: https://kt0755.github.io/etc/config_v2-4.19
> repro: https://kt0755.github.io/etc/repro.c4074.c
> 
> In the middle of page request, this arose because order is too large to handle
>  (mm/page_alloc.c:3119). It actually comes from that order is
> controllable by user input
> via raw_cmd_ioctl without its sanity check, thereby causing memory problem.
> To stop it, we can use like MAX_ORDER for bounds check before using it.

Yes, we do only check the max order in the slow path. We have already
discussed something similar with Konstantin [1][2]. Basically kvmalloc
for a large size might get to the page allocator with an out of bound
order and warn during direct reclaim.

I am wondering whether really want to check for the order in the fast
path instead. I have hard time to imagine this could cause a measurable
impact.

The full patch is below

[1] http://lkml.kernel.org/r/154109387197.925352.10499549042420271600.stgit@buzz
[2] http://lkml.kernel.org/r/154106356066.887821.4649178319705436373.stgit@buzz

> 
> =========================================
> UBSAN: Undefined behaviour in mm/page_alloc.c:3117:19
> shift exponent 51 is too large for 32-bit type 'int'
> CPU: 0 PID: 6520 Comm: syz-executor1 Not tainted 4.19.0-rc2 #1
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0xd2/0x148 lib/dump_stack.c:113
>  ubsan_epilogue+0x12/0x94 lib/ubsan.c:159
>  __ubsan_handle_shift_out_of_bounds+0x2b6/0x30b lib/ubsan.c:425
>  __zone_watermark_ok+0x2c7/0x400 mm/page_alloc.c:3117
>  zone_watermark_fast mm/page_alloc.c:3216 [inline]
>  get_page_from_freelist+0xc49/0x44c0 mm/page_alloc.c:3300
>  __alloc_pages_nodemask+0x21e/0x640 mm/page_alloc.c:4370
>  alloc_pages_current+0xcc/0x210 mm/mempolicy.c:2093
>  alloc_pages include/linux/gfp.h:509 [inline]
>  __get_free_pages+0x12/0x60 mm/page_alloc.c:4414
>  dma_mem_alloc+0x36/0x50 arch/x86/include/asm/floppy.h:156
>  raw_cmd_copyin drivers/block/floppy.c:3159 [inline]
>  raw_cmd_ioctl drivers/block/floppy.c:3206 [inline]
>  fd_locked_ioctl+0xa00/0x2c10 drivers/block/floppy.c:3544
>  fd_ioctl+0x40/0x60 drivers/block/floppy.c:3571
>  __blkdev_driver_ioctl block/ioctl.c:303 [inline]
>  blkdev_ioctl+0xb3c/0x1a30 block/ioctl.c:601
>  block_ioctl+0x105/0x150 fs/block_dev.c:1883
>  vfs_ioctl fs/ioctl.c:46 [inline]
>  do_vfs_ioctl+0x1c0/0x1150 fs/ioctl.c:687
>  ksys_ioctl+0x9e/0xb0 fs/ioctl.c:702
>  __do_sys_ioctl fs/ioctl.c:709 [inline]
>  __se_sys_ioctl fs/ioctl.c:707 [inline]
>  __x64_sys_ioctl+0x7e/0xc0 fs/ioctl.c:707
>  do_syscall_64+0xc4/0x510 arch/x86/entry/common.c:290
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x4497b9
> Code: e8 8c 9f 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48
> 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
> 01 f0 ff ff 0f 83 9b 6b fc ff c3 66 2e 0f 1f 84 00 00 00 00
> RSP: 002b:00007fb5ef0e2c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> RAX: ffffffffffffffda RBX: 00007fb5ef0e36cc RCX: 00000000004497b9
> RDX: 0000000020000040 RSI: 0000000000000258 RDI: 0000000000000014
> RBP: 000000000071bea0 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
> R13: 0000000000005490 R14: 00000000006ed530 R15: 00007fb5ef0e3700
> =========================================================
> 
> 
> Thanks,
> Kyungtae Kim

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
  2018-11-09  8:43 ` Michal Hocko
@ 2018-11-09  9:41   ` Tetsuo Handa
  2018-11-09  9:56     ` Michal Hocko
  2018-11-09  9:56   ` Mel Gorman
  2018-11-13  9:43   ` Michal Hocko
  2 siblings, 1 reply; 22+ messages in thread
From: Tetsuo Handa @ 2018-11-09  9:41 UTC (permalink / raw)
  To: Michal Hocko, Kyungtae Kim
  Cc: akpm, pavel.tatashin, vbabka, osalvador, rppt, aaron.lu,
	iamjoonsoo.kim, alexander.h.duyck, mgorman, lifeasageek,
	threeearcat, syzkaller, linux-kernel, linux-mm,
	Konstantin Khlebnikov

On 2018/11/09 17:43, Michal Hocko wrote:
> @@ -4364,6 +4353,17 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
>  	gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
>  	struct alloc_context ac = { };
>  
> +	/*
> +	 * In the slowpath, we sanity check order to avoid ever trying to

Please keep the comment up to dated.
I don't like that comments in OOM code is outdated.

> +	 * reclaim >= MAX_ORDER areas which will never succeed. Callers may
> +	 * be using allocators in order of preference for an area that is
> +	 * too large.
> +	 */
> +	if (order >= MAX_ORDER) {

Also, why not to add BUG_ON(gfp_mask & __GFP_NOFAIL); here?

> +		WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
> +		return NULL;
> +	}
> +
>  	gfp_mask &= gfp_allowed_mask;
>  	alloc_mask = gfp_mask;
>  	if (!prepare_alloc_pages(gfp_mask, order, preferred_nid, nodemask, &ac, &alloc_mask, &alloc_flags))
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
  2018-11-09  9:41   ` Tetsuo Handa
@ 2018-11-09  9:56     ` Michal Hocko
  2018-11-09 10:07       ` Tetsuo Handa
                         ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Michal Hocko @ 2018-11-09  9:56 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Kyungtae Kim, akpm, pavel.tatashin, vbabka, osalvador, rppt,
	aaron.lu, iamjoonsoo.kim, alexander.h.duyck, mgorman,
	lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm,
	Konstantin Khlebnikov

On Fri 09-11-18 18:41:53, Tetsuo Handa wrote:
> On 2018/11/09 17:43, Michal Hocko wrote:
> > @@ -4364,6 +4353,17 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> >  	gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
> >  	struct alloc_context ac = { };
> >  
> > +	/*
> > +	 * In the slowpath, we sanity check order to avoid ever trying to
> 
> Please keep the comment up to dated.

Does this following look better?

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9fc10a1029cf..bf9aecba4222 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4354,10 +4354,8 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
 	struct alloc_context ac = { };
 
 	/*
-	 * In the slowpath, we sanity check order to avoid ever trying to
-	 * reclaim >= MAX_ORDER areas which will never succeed. Callers may
-	 * be using allocators in order of preference for an area that is
-	 * too large.
+	 * There are several places where we assume that the order value is sane
+	 * so bail out early if the request is out of bound.
 	 */
 	if (order >= MAX_ORDER) {
 		WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));

> I don't like that comments in OOM code is outdated.
> 
> > +	 * reclaim >= MAX_ORDER areas which will never succeed. Callers may
> > +	 * be using allocators in order of preference for an area that is
> > +	 * too large.
> > +	 */
> > +	if (order >= MAX_ORDER) {
> 
> Also, why not to add BUG_ON(gfp_mask & __GFP_NOFAIL); here?

Because we do not want to blow up the kernel just because of a stupid
usage of the allocator. Can you think of an example where it would
actually make any sense?

I would argue that such a theoretical abuse would blow up on an
unchecked NULL ptr access. Isn't that enough?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
  2018-11-09  8:43 ` Michal Hocko
  2018-11-09  9:41   ` Tetsuo Handa
@ 2018-11-09  9:56   ` Mel Gorman
  2018-11-13  9:43   ` Michal Hocko
  2 siblings, 0 replies; 22+ messages in thread
From: Mel Gorman @ 2018-11-09  9:56 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Kyungtae Kim, akpm, pavel.tatashin, vbabka, osalvador, rppt,
	aaron.lu, iamjoonsoo.kim, alexander.h.duyck, lifeasageek,
	threeearcat, syzkaller, linux-kernel, linux-mm,
	Konstantin Khlebnikov

On Fri, Nov 09, 2018 at 09:43:53AM +0100, Michal Hocko wrote:
> On Thu 08-11-18 23:09:23, Kyungtae Kim wrote:
> > We report a bug in v4.19-rc2 (4.20-rc1 as well, I guess):
> > 
> > kernel config: https://kt0755.github.io/etc/config_v2-4.19
> > repro: https://kt0755.github.io/etc/repro.c4074.c
> > 
> > In the middle of page request, this arose because order is too large to handle
> >  (mm/page_alloc.c:3119). It actually comes from that order is
> > controllable by user input
> > via raw_cmd_ioctl without its sanity check, thereby causing memory problem.
> > To stop it, we can use like MAX_ORDER for bounds check before using it.
> 
> Yes, we do only check the max order in the slow path. We have already
> discussed something similar with Konstantin [1][2]. Basically kvmalloc
> for a large size might get to the page allocator with an out of bound
> order and warn during direct reclaim.
> 
> I am wondering whether really want to check for the order in the fast
> path instead. I have hard time to imagine this could cause a measurable
> impact.
> 
> The full patch is below
> 
> [1] http://lkml.kernel.org/r/154109387197.925352.10499549042420271600.stgit@buzz
> [2] http://lkml.kernel.org/r/154106356066.887821.4649178319705436373.stgit@buzz
> 

I'm ok with such changes under the policy "there is no point being fast if
we're broken". It's unfortunate and I know the original microoptimisation
was mine but if the fast-path check ends up being a problem then I/we go
back to finding ways of making the page allocator faster from a fundamental
algorithmic point of view and not a microoptimisation approach. There is
potential fruit there, just none that is low-hanging.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
  2018-11-09  9:56     ` Michal Hocko
@ 2018-11-09 10:07       ` Tetsuo Handa
  2018-11-09 10:25         ` Michal Hocko
  2018-11-09 10:10       ` Vlastimil Babka
  2018-11-09 10:52       ` Balbir Singh
  2 siblings, 1 reply; 22+ messages in thread
From: Tetsuo Handa @ 2018-11-09 10:07 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Kyungtae Kim, akpm, pavel.tatashin, vbabka, osalvador, rppt,
	aaron.lu, iamjoonsoo.kim, alexander.h.duyck, mgorman,
	lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm,
	Konstantin Khlebnikov

On 2018/11/09 18:56, Michal Hocko wrote:
> Does this following look better?

Yes.

>> Also, why not to add BUG_ON(gfp_mask & __GFP_NOFAIL); here?
> 
> Because we do not want to blow up the kernel just because of a stupid
> usage of the allocator. Can you think of an example where it would
> actually make any sense?
> 
> I would argue that such a theoretical abuse would blow up on an
> unchecked NULL ptr access. Isn't that enough?

We after all can't avoid blowing up the kernel even if we don't add BUG_ON().
Stopping with BUG_ON() is saner than NULL pointer dereference messages.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
  2018-11-09  9:56     ` Michal Hocko
  2018-11-09 10:07       ` Tetsuo Handa
@ 2018-11-09 10:10       ` Vlastimil Babka
  2018-11-09 10:22         ` Michal Hocko
  2018-11-09 10:24         ` Tetsuo Handa
  2018-11-09 10:52       ` Balbir Singh
  2 siblings, 2 replies; 22+ messages in thread
From: Vlastimil Babka @ 2018-11-09 10:10 UTC (permalink / raw)
  To: Michal Hocko, Tetsuo Handa
  Cc: Kyungtae Kim, akpm, pavel.tatashin, osalvador, rppt, aaron.lu,
	iamjoonsoo.kim, alexander.h.duyck, mgorman, lifeasageek,
	threeearcat, syzkaller, linux-kernel, linux-mm,
	Konstantin Khlebnikov

On 11/9/18 10:56 AM, Michal Hocko wrote:
> On Fri 09-11-18 18:41:53, Tetsuo Handa wrote:
>> On 2018/11/09 17:43, Michal Hocko wrote:
>>> @@ -4364,6 +4353,17 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
>>>  	gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
>>>  	struct alloc_context ac = { };
>>>  
>>> +	/*
>>> +	 * In the slowpath, we sanity check order to avoid ever trying to
>>
>> Please keep the comment up to dated.
> 
> Does this following look better?
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 9fc10a1029cf..bf9aecba4222 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4354,10 +4354,8 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
>  	struct alloc_context ac = { };
>  
>  	/*
> -	 * In the slowpath, we sanity check order to avoid ever trying to
> -	 * reclaim >= MAX_ORDER areas which will never succeed. Callers may
> -	 * be using allocators in order of preference for an area that is
> -	 * too large.
> +	 * There are several places where we assume that the order value is sane
> +	 * so bail out early if the request is out of bound.
>  	 */
>  	if (order >= MAX_ORDER) {
>  		WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));

Looks ok, but I'd add unlikely(), although it doesn't currently seem to
make any difference.

You can add Acked-by: Vlastimil Babka <vbabka@suse.cz>

>> I don't like that comments in OOM code is outdated.
>>
>>> +	 * reclaim >= MAX_ORDER areas which will never succeed. Callers may
>>> +	 * be using allocators in order of preference for an area that is
>>> +	 * too large.
>>> +	 */
>>> +	if (order >= MAX_ORDER) {
>>
>> Also, why not to add BUG_ON(gfp_mask & __GFP_NOFAIL); here?
> 
> Because we do not want to blow up the kernel just because of a stupid
> usage of the allocator. Can you think of an example where it would
> actually make any sense?
> 
> I would argue that such a theoretical abuse would blow up on an
> unchecked NULL ptr access. Isn't that enough?

Agreed.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
  2018-11-09 10:10       ` Vlastimil Babka
@ 2018-11-09 10:22         ` Michal Hocko
  2018-11-09 10:24         ` Tetsuo Handa
  1 sibling, 0 replies; 22+ messages in thread
From: Michal Hocko @ 2018-11-09 10:22 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Tetsuo Handa, Kyungtae Kim, akpm, pavel.tatashin, osalvador,
	rppt, aaron.lu, iamjoonsoo.kim, alexander.h.duyck, mgorman,
	lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm,
	Konstantin Khlebnikov

On Fri 09-11-18 11:10:00, Vlastimil Babka wrote:
> On 11/9/18 10:56 AM, Michal Hocko wrote:
> > On Fri 09-11-18 18:41:53, Tetsuo Handa wrote:
> >> On 2018/11/09 17:43, Michal Hocko wrote:
> >>> @@ -4364,6 +4353,17 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> >>>  	gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
> >>>  	struct alloc_context ac = { };
> >>>  
> >>> +	/*
> >>> +	 * In the slowpath, we sanity check order to avoid ever trying to
> >>
> >> Please keep the comment up to dated.
> > 
> > Does this following look better?
> > 
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 9fc10a1029cf..bf9aecba4222 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -4354,10 +4354,8 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> >  	struct alloc_context ac = { };
> >  
> >  	/*
> > -	 * In the slowpath, we sanity check order to avoid ever trying to
> > -	 * reclaim >= MAX_ORDER areas which will never succeed. Callers may
> > -	 * be using allocators in order of preference for an area that is
> > -	 * too large.
> > +	 * There are several places where we assume that the order value is sane
> > +	 * so bail out early if the request is out of bound.
> >  	 */
> >  	if (order >= MAX_ORDER) {
> >  		WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
> 
> Looks ok, but I'd add unlikely(), although it doesn't currently seem to
> make any difference.
> 
> You can add Acked-by: Vlastimil Babka <vbabka@suse.cz>

OK, I have added both. Thanks!

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
  2018-11-09 10:10       ` Vlastimil Babka
  2018-11-09 10:22         ` Michal Hocko
@ 2018-11-09 10:24         ` Tetsuo Handa
  2018-11-09 10:28           ` Michal Hocko
  1 sibling, 1 reply; 22+ messages in thread
From: Tetsuo Handa @ 2018-11-09 10:24 UTC (permalink / raw)
  To: Vlastimil Babka, Michal Hocko
  Cc: Kyungtae Kim, akpm, pavel.tatashin, osalvador, rppt, aaron.lu,
	iamjoonsoo.kim, alexander.h.duyck, mgorman, lifeasageek,
	threeearcat, syzkaller, linux-kernel, linux-mm,
	Konstantin Khlebnikov

On 2018/11/09 19:10, Vlastimil Babka wrote:>>>> +	 * reclaim >= MAX_ORDER areas which will never succeed. Callers may
>>>> +	 * be using allocators in order of preference for an area that is
>>>> +	 * too large.
>>>> +	 */
>>>> +	if (order >= MAX_ORDER) {
>>>
>>> Also, why not to add BUG_ON(gfp_mask & __GFP_NOFAIL); here?
>>
>> Because we do not want to blow up the kernel just because of a stupid
>> usage of the allocator. Can you think of an example where it would
>> actually make any sense?
>>
>> I would argue that such a theoretical abuse would blow up on an
>> unchecked NULL ptr access. Isn't that enough?
> 
> Agreed.
> 

If someone has written a module with __GFP_NOFAIL for an architecture
where PAGE_SIZE == 2048KB, and someone else tried to use that module on
another architecture where PAGE_SIZE == 4KB. You are saying that
triggering NULL pointer dereference is a fault of that user's ignorance
about MM. You are saying that everyone knows internal of MM. Sad...

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
  2018-11-09 10:07       ` Tetsuo Handa
@ 2018-11-09 10:25         ` Michal Hocko
  0 siblings, 0 replies; 22+ messages in thread
From: Michal Hocko @ 2018-11-09 10:25 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Kyungtae Kim, akpm, pavel.tatashin, vbabka, osalvador, rppt,
	aaron.lu, iamjoonsoo.kim, alexander.h.duyck, mgorman,
	lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm,
	Konstantin Khlebnikov

On Fri 09-11-18 19:07:49, Tetsuo Handa wrote:
> On 2018/11/09 18:56, Michal Hocko wrote:
> > Does this following look better?
> 
> Yes.
> 
> >> Also, why not to add BUG_ON(gfp_mask & __GFP_NOFAIL); here?
> > 
> > Because we do not want to blow up the kernel just because of a stupid
> > usage of the allocator. Can you think of an example where it would
> > actually make any sense?
> > 
> > I would argue that such a theoretical abuse would blow up on an
> > unchecked NULL ptr access. Isn't that enough?
> 
> We after all can't avoid blowing up the kernel even if we don't add BUG_ON().
> Stopping with BUG_ON() is saner than NULL pointer dereference messages.

I disagree (strongly to be more explicit). You never know the context
the allocator is called from. We do not want to oops with a random state
(locks heled etc). If the access blows up in the user then be it, the
bug will be clear and to be fixed but BUG_ON on an invalid core kernel
function is just a bad idea. I believe Linus was quite explicit about it
and I fully agree with him.

Besides that this is really off-topic to the issue at hands. Don't you
think?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
  2018-11-09 10:24         ` Tetsuo Handa
@ 2018-11-09 10:28           ` Michal Hocko
  0 siblings, 0 replies; 22+ messages in thread
From: Michal Hocko @ 2018-11-09 10:28 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Vlastimil Babka, Kyungtae Kim, akpm, pavel.tatashin, osalvador,
	rppt, aaron.lu, iamjoonsoo.kim, alexander.h.duyck, mgorman,
	lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm,
	Konstantin Khlebnikov

On Fri 09-11-18 19:24:48, Tetsuo Handa wrote:
> On 2018/11/09 19:10, Vlastimil Babka wrote:>>>> +	 * reclaim >= MAX_ORDER areas which will never succeed. Callers may
> >>>> +	 * be using allocators in order of preference for an area that is
> >>>> +	 * too large.
> >>>> +	 */
> >>>> +	if (order >= MAX_ORDER) {
> >>>
> >>> Also, why not to add BUG_ON(gfp_mask & __GFP_NOFAIL); here?
> >>
> >> Because we do not want to blow up the kernel just because of a stupid
> >> usage of the allocator. Can you think of an example where it would
> >> actually make any sense?
> >>
> >> I would argue that such a theoretical abuse would blow up on an
> >> unchecked NULL ptr access. Isn't that enough?
> > 
> > Agreed.
> > 
> 
> If someone has written a module with __GFP_NOFAIL for an architecture
> where PAGE_SIZE == 2048KB, and someone else tried to use that module on
> another architecture where PAGE_SIZE == 4KB. You are saying that
> triggering NULL pointer dereference is a fault of that user's ignorance
> about MM. You are saying that everyone knows internal of MM. Sad...

What kind of argument is this? Seriously! We do consider GFP_NOFAIL
problematic even for !order-0 requests and warn appropriately. Talking
about anything getting close to MAX_ORDER is just a crazy talk. In any
case this is largely tangential to the issue reported here.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
  2018-11-09  9:56     ` Michal Hocko
  2018-11-09 10:07       ` Tetsuo Handa
  2018-11-09 10:10       ` Vlastimil Babka
@ 2018-11-09 10:52       ` Balbir Singh
  2 siblings, 0 replies; 22+ messages in thread
From: Balbir Singh @ 2018-11-09 10:52 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, Kyungtae Kim, akpm, pavel.tatashin, vbabka,
	osalvador, rppt, aaron.lu, iamjoonsoo.kim, alexander.h.duyck,
	mgorman, lifeasageek, threeearcat, syzkaller, linux-kernel,
	linux-mm, Konstantin Khlebnikov

On Fri, Nov 09, 2018 at 10:56:04AM +0100, Michal Hocko wrote:
> On Fri 09-11-18 18:41:53, Tetsuo Handa wrote:
> > On 2018/11/09 17:43, Michal Hocko wrote:
> > > @@ -4364,6 +4353,17 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> > >  	gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
> > >  	struct alloc_context ac = { };
> > >  
> > > +	/*
> > > +	 * In the slowpath, we sanity check order to avoid ever trying to
> > 
> > Please keep the comment up to dated.
> 
> Does this following look better?
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 9fc10a1029cf..bf9aecba4222 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4354,10 +4354,8 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
>  	struct alloc_context ac = { };
>  
>  	/*
> -	 * In the slowpath, we sanity check order to avoid ever trying to
> -	 * reclaim >= MAX_ORDER areas which will never succeed. Callers may
> -	 * be using allocators in order of preference for an area that is
> -	 * too large.
> +	 * There are several places where we assume that the order value is sane
> +	 * so bail out early if the request is out of bound.
>  	 */
>  	if (order >= MAX_ORDER) {
>  		WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));

if (unlikely()) might help

> 
> > I don't like that comments in OOM code is outdated.
> > 
> > > +	 * reclaim >= MAX_ORDER areas which will never succeed. Callers may
> > > +	 * be using allocators in order of preference for an area that is
> > > +	 * too large.
> > > +	 */
> > > +	if (order >= MAX_ORDER) {
> > 
> > Also, why not to add BUG_ON(gfp_mask & __GFP_NOFAIL); here?
> 
> Because we do not want to blow up the kernel just because of a stupid
> usage of the allocator. Can you think of an example where it would
> actually make any sense?
> 
> I would argue that such a theoretical abuse would blow up on an
> unchecked NULL ptr access. Isn't that enough?
> -- 

Balbir Singh.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
  2018-11-09  8:43 ` Michal Hocko
  2018-11-09  9:41   ` Tetsuo Handa
  2018-11-09  9:56   ` Mel Gorman
@ 2018-11-13  9:43   ` Michal Hocko
  2018-11-13 23:15     ` Andrew Morton
  2018-11-13 23:29     ` Andrew Morton
  2 siblings, 2 replies; 22+ messages in thread
From: Michal Hocko @ 2018-11-13  9:43 UTC (permalink / raw)
  To: akpm
  Cc: Kyungtae Kim, pavel.tatashin, vbabka, osalvador, rppt, aaron.lu,
	iamjoonsoo.kim, alexander.h.duyck, mgorman, lifeasageek,
	threeearcat, syzkaller, linux-kernel, linux-mm,
	Konstantin Khlebnikov

Andrew, could you take the patch please? This patch contains a comment
update as suggested by Tetsuo. I do not think there were any other
unresolved concerns.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
  2018-11-13  9:43   ` Michal Hocko
@ 2018-11-13 23:15     ` Andrew Morton
  2018-11-13 23:23       ` Vlastimil Babka
  2018-11-13 23:29     ` Andrew Morton
  1 sibling, 1 reply; 22+ messages in thread
From: Andrew Morton @ 2018-11-13 23:15 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Kyungtae Kim, pavel.tatashin, vbabka, osalvador, rppt, aaron.lu,
	iamjoonsoo.kim, alexander.h.duyck, mgorman, lifeasageek,
	threeearcat, syzkaller, linux-kernel, linux-mm,
	Konstantin Khlebnikov

On Tue, 13 Nov 2018 10:43:05 +0100 Michal Hocko <mhocko@kernel.org> wrote:

> Konstantin has noticed that kvmalloc might trigger the following warning
> [Thu Nov  1 08:43:56 2018] WARNING: CPU: 0 PID: 6676 at mm/vmstat.c:986 __fragmentation_index+0x54/0x60
> [...]
> [Thu Nov  1 08:43:56 2018] Call Trace:
> [Thu Nov  1 08:43:56 2018]  fragmentation_index+0x76/0x90
> [Thu Nov  1 08:43:56 2018]  compaction_suitable+0x4f/0xf0
> [Thu Nov  1 08:43:56 2018]  shrink_node+0x295/0x310
> [Thu Nov  1 08:43:56 2018]  node_reclaim+0x205/0x250
> [Thu Nov  1 08:43:56 2018]  get_page_from_freelist+0x649/0xad0
> [Thu Nov  1 08:43:56 2018]  ? get_page_from_freelist+0x2d4/0xad0
> [Thu Nov  1 08:43:56 2018]  ? release_sock+0x19/0x90
> [Thu Nov  1 08:43:56 2018]  ? do_ipv6_setsockopt.isra.5+0x10da/0x1290
> [Thu Nov  1 08:43:56 2018]  __alloc_pages_nodemask+0x12a/0x2a0
> [Thu Nov  1 08:43:56 2018]  kmalloc_large_node+0x47/0x90
> [Thu Nov  1 08:43:56 2018]  __kmalloc_node+0x22b/0x2e0
> [Thu Nov  1 08:43:56 2018]  kvmalloc_node+0x3e/0x70
> [Thu Nov  1 08:43:56 2018]  xt_alloc_table_info+0x3a/0x80 [x_tables]
> [Thu Nov  1 08:43:56 2018]  do_ip6t_set_ctl+0xcd/0x1c0 [ip6_tables]
> [Thu Nov  1 08:43:56 2018]  nf_setsockopt+0x44/0x60
> [Thu Nov  1 08:43:56 2018]  SyS_setsockopt+0x6f/0xc0
> [Thu Nov  1 08:43:56 2018]  do_syscall_64+0x67/0x120
> [Thu Nov  1 08:43:56 2018]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
>
> ...
> 
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4364,6 +4353,15 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
>  	gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
>  	struct alloc_context ac = { };
>  
> +	/*
> +	 * There are several places where we assume that the order value is sane
> +	 * so bail out early if the request is out of bound.
> +	 */
> +	if (unlikely(order >= MAX_ORDER)) {
> +		WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
> +		return NULL;
> +	}
> +

I know "everybody enables CONFIG_DEBUG_VM", but given this is fastpath,
we could help those who choose not to enable it by using

#ifdef CONFIG_DEBUG_VM
	if (WARN_ON_ONCE(order >= MAX_ORDER && !(gfp_mask & __GFP_NOWARN)))
		return NULL;
#endif

(Again curses 91241681c62 ("include/linux/mmdebug.h: make VM_WARN* non-rvals"))

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
  2018-11-13 23:15     ` Andrew Morton
@ 2018-11-13 23:23       ` Vlastimil Babka
  2018-11-13 23:32         ` Andrew Morton
  0 siblings, 1 reply; 22+ messages in thread
From: Vlastimil Babka @ 2018-11-13 23:23 UTC (permalink / raw)
  To: Andrew Morton, Michal Hocko
  Cc: Kyungtae Kim, pavel.tatashin, osalvador, rppt, aaron.lu,
	iamjoonsoo.kim, alexander.h.duyck, mgorman, lifeasageek,
	threeearcat, syzkaller, linux-kernel, linux-mm,
	Konstantin Khlebnikov

On 11/14/18 12:15 AM, Andrew Morton wrote:
> On Tue, 13 Nov 2018 10:43:05 +0100 Michal Hocko <mhocko@kernel.org> wrote:
> 
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -4364,6 +4353,15 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
>>  	gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
>>  	struct alloc_context ac = { };
>>  
>> +	/*
>> +	 * There are several places where we assume that the order value is sane
>> +	 * so bail out early if the request is out of bound.
>> +	 */
>> +	if (unlikely(order >= MAX_ORDER)) {
>> +		WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
>> +		return NULL;
>> +	}
>> +
> 
> I know "everybody enables CONFIG_DEBUG_VM", but given this is fastpath,
> we could help those who choose not to enable it by using
> 
> #ifdef CONFIG_DEBUG_VM
> 	if (WARN_ON_ONCE(order >= MAX_ORDER && !(gfp_mask & __GFP_NOWARN)))
> 		return NULL;
> #endif

Hmm, but that would mean there's still potential undefined behavior for
!CONFIG_DEBUG_VM, so I would prefer not to do it like that.

> 
> (Again curses 91241681c62 ("include/linux/mmdebug.h: make VM_WARN* non-rvals"))
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
  2018-11-13  9:43   ` Michal Hocko
  2018-11-13 23:15     ` Andrew Morton
@ 2018-11-13 23:29     ` Andrew Morton
  2018-11-14  7:10       ` Michal Hocko
  2018-11-16 23:43       ` Dmitry Vyukov
  1 sibling, 2 replies; 22+ messages in thread
From: Andrew Morton @ 2018-11-13 23:29 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Kyungtae Kim, pavel.tatashin, vbabka, osalvador, rppt, aaron.lu,
	iamjoonsoo.kim, alexander.h.duyck, mgorman, lifeasageek,
	threeearcat, syzkaller, linux-kernel, linux-mm,
	Konstantin Khlebnikov

On Tue, 13 Nov 2018 10:43:05 +0100 Michal Hocko <mhocko@kernel.org> wrote:

> From: Michal Hocko <mhocko@suse.com>
> Date: Fri, 9 Nov 2018 09:35:29 +0100
> Subject: [PATCH] mm, page_alloc: check for max order in hot path
> 
> Konstantin has noticed that kvmalloc might trigger the following warning
> [Thu Nov  1 08:43:56 2018] WARNING: CPU: 0 PID: 6676 at mm/vmstat.c:986 __fragmentation_index+0x54/0x60

um, wait...

> [...]
> [Thu Nov  1 08:43:56 2018] Call Trace:
> [Thu Nov  1 08:43:56 2018]  fragmentation_index+0x76/0x90
> [Thu Nov  1 08:43:56 2018]  compaction_suitable+0x4f/0xf0
> [Thu Nov  1 08:43:56 2018]  shrink_node+0x295/0x310
> [Thu Nov  1 08:43:56 2018]  node_reclaim+0x205/0x250
> [Thu Nov  1 08:43:56 2018]  get_page_from_freelist+0x649/0xad0
> [Thu Nov  1 08:43:56 2018]  ? get_page_from_freelist+0x2d4/0xad0
> [Thu Nov  1 08:43:56 2018]  ? release_sock+0x19/0x90
> [Thu Nov  1 08:43:56 2018]  ? do_ipv6_setsockopt.isra.5+0x10da/0x1290
> [Thu Nov  1 08:43:56 2018]  __alloc_pages_nodemask+0x12a/0x2a0
> [Thu Nov  1 08:43:56 2018]  kmalloc_large_node+0x47/0x90
> [Thu Nov  1 08:43:56 2018]  __kmalloc_node+0x22b/0x2e0
> [Thu Nov  1 08:43:56 2018]  kvmalloc_node+0x3e/0x70
> [Thu Nov  1 08:43:56 2018]  xt_alloc_table_info+0x3a/0x80 [x_tables]
> [Thu Nov  1 08:43:56 2018]  do_ip6t_set_ctl+0xcd/0x1c0 [ip6_tables]
> [Thu Nov  1 08:43:56 2018]  nf_setsockopt+0x44/0x60
> [Thu Nov  1 08:43:56 2018]  SyS_setsockopt+0x6f/0xc0
> [Thu Nov  1 08:43:56 2018]  do_syscall_64+0x67/0x120
> [Thu Nov  1 08:43:56 2018]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2

If kvalloc_node() is going to call kmalloc() without checking for a
huge allocation request then surely it should set __GFP_NOWARN.  And it
shouldn't bother at all if size > KMALLOC_MAX_SIZE, surely?  So
something like

--- a/mm/util.c~a
+++ a/mm/util.c
@@ -393,11 +393,16 @@ void *kvmalloc_node(size_t size, gfp_t f
 	void *ret;
 
 	/*
-	 * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
-	 * so the given set of flags has to be compatible.
+	 * vmalloc uses GFP_KERNEL for some internal allocations (e.g page
+	 * tables) so the given set of flags has to be compatible.
 	 */
-	if ((flags & GFP_KERNEL) != GFP_KERNEL)
+	if ((flags & GFP_KERNEL) != GFP_KERNEL) {
+		if (size > KMALLOC_MAX_SIZE)
+			return NULL;
+		if (size > PAGE_SIZE)
+			flags |= __GFP_NOWARN;
 		return kmalloc_node(size, flags, node);
+	}
 
 	/*
 	 * We want to attempt a large physically contiguous block first because


> the problem is that we only check for an out of bound order in the slow
> path and the node reclaim might happen from the fast path already. This
> is fixable by making sure that kvmalloc doesn't ever use kmalloc for
> requests that are larger than KMALLOC_MAX_SIZE but this also shows that
> the code is rather fragile. A recent UBSAN report just underlines that
> by the following report
> 
>  UBSAN: Undefined behaviour in mm/page_alloc.c:3117:19
>  shift exponent 51 is too large for 32-bit type 'int'
>  CPU: 0 PID: 6520 Comm: syz-executor1 Not tainted 4.19.0-rc2 #1
>  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>  Call Trace:
>   __dump_stack lib/dump_stack.c:77 [inline]
>   dump_stack+0xd2/0x148 lib/dump_stack.c:113
>   ubsan_epilogue+0x12/0x94 lib/ubsan.c:159
>   __ubsan_handle_shift_out_of_bounds+0x2b6/0x30b lib/ubsan.c:425
>   __zone_watermark_ok+0x2c7/0x400 mm/page_alloc.c:3117
>   zone_watermark_fast mm/page_alloc.c:3216 [inline]
>   get_page_from_freelist+0xc49/0x44c0 mm/page_alloc.c:3300
>   __alloc_pages_nodemask+0x21e/0x640 mm/page_alloc.c:4370
>   alloc_pages_current+0xcc/0x210 mm/mempolicy.c:2093
>   alloc_pages include/linux/gfp.h:509 [inline]
>   __get_free_pages+0x12/0x60 mm/page_alloc.c:4414
>   dma_mem_alloc+0x36/0x50 arch/x86/include/asm/floppy.h:156
>   raw_cmd_copyin drivers/block/floppy.c:3159 [inline]
>   raw_cmd_ioctl drivers/block/floppy.c:3206 [inline]
>   fd_locked_ioctl+0xa00/0x2c10 drivers/block/floppy.c:3544
>   fd_ioctl+0x40/0x60 drivers/block/floppy.c:3571
>   __blkdev_driver_ioctl block/ioctl.c:303 [inline]
>   blkdev_ioctl+0xb3c/0x1a30 block/ioctl.c:601
>   block_ioctl+0x105/0x150 fs/block_dev.c:1883
>   vfs_ioctl fs/ioctl.c:46 [inline]
>   do_vfs_ioctl+0x1c0/0x1150 fs/ioctl.c:687
>   ksys_ioctl+0x9e/0xb0 fs/ioctl.c:702
>   __do_sys_ioctl fs/ioctl.c:709 [inline]
>   __se_sys_ioctl fs/ioctl.c:707 [inline]
>   __x64_sys_ioctl+0x7e/0xc0 fs/ioctl.c:707
>   do_syscall_64+0xc4/0x510 arch/x86/entry/common.c:290
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe

And we could fix this in the floppy driver.

> Note that this is not a kvmalloc path. It is just that the fast path
> really depends on having sanitzed order as well. Therefore move the
> order check to the fast path.

But do we really need to do this?  Are there any other known potential
callsites?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
  2018-11-13 23:23       ` Vlastimil Babka
@ 2018-11-13 23:32         ` Andrew Morton
  2018-11-14  0:05           ` Vlastimil Babka
  0 siblings, 1 reply; 22+ messages in thread
From: Andrew Morton @ 2018-11-13 23:32 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Michal Hocko, Kyungtae Kim, pavel.tatashin, osalvador, rppt,
	aaron.lu, iamjoonsoo.kim, alexander.h.duyck, mgorman,
	lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm,
	Konstantin Khlebnikov

On Wed, 14 Nov 2018 00:23:28 +0100 Vlastimil Babka <vbabka@suse.cz> wrote:

> On 11/14/18 12:15 AM, Andrew Morton wrote:
> > On Tue, 13 Nov 2018 10:43:05 +0100 Michal Hocko <mhocko@kernel.org> wrote:
> > 
> >> --- a/mm/page_alloc.c
> >> +++ b/mm/page_alloc.c
> >> @@ -4364,6 +4353,15 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> >>  	gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
> >>  	struct alloc_context ac = { };
> >>  
> >> +	/*
> >> +	 * There are several places where we assume that the order value is sane
> >> +	 * so bail out early if the request is out of bound.
> >> +	 */
> >> +	if (unlikely(order >= MAX_ORDER)) {
> >> +		WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
> >> +		return NULL;
> >> +	}
> >> +
> > 
> > I know "everybody enables CONFIG_DEBUG_VM", but given this is fastpath,
> > we could help those who choose not to enable it by using
> > 
> > #ifdef CONFIG_DEBUG_VM
> > 	if (WARN_ON_ONCE(order >= MAX_ORDER && !(gfp_mask & __GFP_NOWARN)))
> > 		return NULL;
> > #endif
> 
> Hmm, but that would mean there's still potential undefined behavior for
> !CONFIG_DEBUG_VM, so I would prefer not to do it like that.
> 

What does "potential undefined behavior" mean here?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
  2018-11-13 23:32         ` Andrew Morton
@ 2018-11-14  0:05           ` Vlastimil Babka
  2018-11-19 23:57             ` Pavel Machek
  0 siblings, 1 reply; 22+ messages in thread
From: Vlastimil Babka @ 2018-11-14  0:05 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, Kyungtae Kim, pavel.tatashin, osalvador, rppt,
	aaron.lu, iamjoonsoo.kim, alexander.h.duyck, mgorman,
	lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm,
	Konstantin Khlebnikov

On 11/14/18 12:32 AM, Andrew Morton wrote:
> On Wed, 14 Nov 2018 00:23:28 +0100 Vlastimil Babka <vbabka@suse.cz> wrote:
> 
>> On 11/14/18 12:15 AM, Andrew Morton wrote:
>>> On Tue, 13 Nov 2018 10:43:05 +0100 Michal Hocko <mhocko@kernel.org> wrote:
>>>
>>>> --- a/mm/page_alloc.c
>>>> +++ b/mm/page_alloc.c
>>>> @@ -4364,6 +4353,15 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
>>>>  	gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
>>>>  	struct alloc_context ac = { };
>>>>  
>>>> +	/*
>>>> +	 * There are several places where we assume that the order value is sane
>>>> +	 * so bail out early if the request is out of bound.
>>>> +	 */
>>>> +	if (unlikely(order >= MAX_ORDER)) {
>>>> +		WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
>>>> +		return NULL;
>>>> +	}
>>>> +
>>>
>>> I know "everybody enables CONFIG_DEBUG_VM", but given this is fastpath,
>>> we could help those who choose not to enable it by using
>>>
>>> #ifdef CONFIG_DEBUG_VM
>>> 	if (WARN_ON_ONCE(order >= MAX_ORDER && !(gfp_mask & __GFP_NOWARN)))
>>> 		return NULL;
>>> #endif
>>
>> Hmm, but that would mean there's still potential undefined behavior for
>> !CONFIG_DEBUG_VM, so I would prefer not to do it like that.
>>
> 
> What does "potential undefined behavior" mean here?

I mean that it becomes undefined once a caller with order >= MAX_ORDER
appears. Worse if it's directly due to a userspace action, like in this
case.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
  2018-11-13 23:29     ` Andrew Morton
@ 2018-11-14  7:10       ` Michal Hocko
  2018-11-16 23:43       ` Dmitry Vyukov
  1 sibling, 0 replies; 22+ messages in thread
From: Michal Hocko @ 2018-11-14  7:10 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kyungtae Kim, pavel.tatashin, vbabka, osalvador, rppt, aaron.lu,
	iamjoonsoo.kim, alexander.h.duyck, mgorman, lifeasageek,
	threeearcat, syzkaller, linux-kernel, linux-mm,
	Konstantin Khlebnikov

On Tue 13-11-18 15:29:41, Andrew Morton wrote:
[...]
> But do we really need to do this?  Are there any other known potential
> callsites?

The main point is that the code as it stands is quite fragile, isn't it?
Fixing up all the callers is possible but can you actually think of a
reason why this would cause any measurable effect in the fast path?
The order argument is usually in a register and comparing it to a number
with unlikely branch should be hardly something visible.

Besides that we are talking few cycles at best compared to a fragile
code that got broken by accident without anybody noticing for quite some
time.

I vote for the maintainability over few cycles here. Should anybody find
this measurable we can rework the code by other means.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
  2018-11-13 23:29     ` Andrew Morton
  2018-11-14  7:10       ` Michal Hocko
@ 2018-11-16 23:43       ` Dmitry Vyukov
  1 sibling, 0 replies; 22+ messages in thread
From: Dmitry Vyukov @ 2018-11-16 23:43 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, Kyungtae Kim, pavel.tatashin, Vlastimil Babka,
	osalvador, Mike Rapoport, aaron.lu, Joonsoo Kim,
	alexander.h.duyck, Mel Gorman, lifeasageek, Dae R. Jeong,
	syzkaller, LKML, Linux-MM, Konstantin Khlebnikov

On Tue, Nov 13, 2018 at 3:29 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Tue, 13 Nov 2018 10:43:05 +0100 Michal Hocko <mhocko@kernel.org> wrote:
>
>> From: Michal Hocko <mhocko@suse.com>
>> Date: Fri, 9 Nov 2018 09:35:29 +0100
>> Subject: [PATCH] mm, page_alloc: check for max order in hot path
>>
>> Konstantin has noticed that kvmalloc might trigger the following warning
>> [Thu Nov  1 08:43:56 2018] WARNING: CPU: 0 PID: 6676 at mm/vmstat.c:986 __fragmentation_index+0x54/0x60
>
> um, wait...
>
>> [...]
>> [Thu Nov  1 08:43:56 2018] Call Trace:
>> [Thu Nov  1 08:43:56 2018]  fragmentation_index+0x76/0x90
>> [Thu Nov  1 08:43:56 2018]  compaction_suitable+0x4f/0xf0
>> [Thu Nov  1 08:43:56 2018]  shrink_node+0x295/0x310
>> [Thu Nov  1 08:43:56 2018]  node_reclaim+0x205/0x250
>> [Thu Nov  1 08:43:56 2018]  get_page_from_freelist+0x649/0xad0
>> [Thu Nov  1 08:43:56 2018]  ? get_page_from_freelist+0x2d4/0xad0
>> [Thu Nov  1 08:43:56 2018]  ? release_sock+0x19/0x90
>> [Thu Nov  1 08:43:56 2018]  ? do_ipv6_setsockopt.isra.5+0x10da/0x1290
>> [Thu Nov  1 08:43:56 2018]  __alloc_pages_nodemask+0x12a/0x2a0
>> [Thu Nov  1 08:43:56 2018]  kmalloc_large_node+0x47/0x90
>> [Thu Nov  1 08:43:56 2018]  __kmalloc_node+0x22b/0x2e0
>> [Thu Nov  1 08:43:56 2018]  kvmalloc_node+0x3e/0x70
>> [Thu Nov  1 08:43:56 2018]  xt_alloc_table_info+0x3a/0x80 [x_tables]
>> [Thu Nov  1 08:43:56 2018]  do_ip6t_set_ctl+0xcd/0x1c0 [ip6_tables]
>> [Thu Nov  1 08:43:56 2018]  nf_setsockopt+0x44/0x60
>> [Thu Nov  1 08:43:56 2018]  SyS_setsockopt+0x6f/0xc0
>> [Thu Nov  1 08:43:56 2018]  do_syscall_64+0x67/0x120
>> [Thu Nov  1 08:43:56 2018]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
>
> If kvalloc_node() is going to call kmalloc() without checking for a
> huge allocation request then surely it should set __GFP_NOWARN.


kmalloc won't warn about large allocations after "mm: don't warn about
large allocations for slab":
https://lkml.org/lkml/2018/9/27/1156
It will just return NULL. That was already the case for slub.


> And it
> shouldn't bother at all if size > KMALLOC_MAX_SIZE, surely?  So
> something like
>
> --- a/mm/util.c~a
> +++ a/mm/util.c
> @@ -393,11 +393,16 @@ void *kvmalloc_node(size_t size, gfp_t f
>         void *ret;
>
>         /*
> -        * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
> -        * so the given set of flags has to be compatible.
> +        * vmalloc uses GFP_KERNEL for some internal allocations (e.g page
> +        * tables) so the given set of flags has to be compatible.
>          */
> -       if ((flags & GFP_KERNEL) != GFP_KERNEL)
> +       if ((flags & GFP_KERNEL) != GFP_KERNEL) {
> +               if (size > KMALLOC_MAX_SIZE)
> +                       return NULL;
> +               if (size > PAGE_SIZE)
> +                       flags |= __GFP_NOWARN;
>                 return kmalloc_node(size, flags, node);
> +       }
>
>         /*
>          * We want to attempt a large physically contiguous block first because
>
>
>> the problem is that we only check for an out of bound order in the slow
>> path and the node reclaim might happen from the fast path already. This
>> is fixable by making sure that kvmalloc doesn't ever use kmalloc for
>> requests that are larger than KMALLOC_MAX_SIZE but this also shows that
>> the code is rather fragile. A recent UBSAN report just underlines that
>> by the following report
>>
>>  UBSAN: Undefined behaviour in mm/page_alloc.c:3117:19
>>  shift exponent 51 is too large for 32-bit type 'int'
>>  CPU: 0 PID: 6520 Comm: syz-executor1 Not tainted 4.19.0-rc2 #1
>>  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>>  Call Trace:
>>   __dump_stack lib/dump_stack.c:77 [inline]
>>   dump_stack+0xd2/0x148 lib/dump_stack.c:113
>>   ubsan_epilogue+0x12/0x94 lib/ubsan.c:159
>>   __ubsan_handle_shift_out_of_bounds+0x2b6/0x30b lib/ubsan.c:425
>>   __zone_watermark_ok+0x2c7/0x400 mm/page_alloc.c:3117
>>   zone_watermark_fast mm/page_alloc.c:3216 [inline]
>>   get_page_from_freelist+0xc49/0x44c0 mm/page_alloc.c:3300
>>   __alloc_pages_nodemask+0x21e/0x640 mm/page_alloc.c:4370
>>   alloc_pages_current+0xcc/0x210 mm/mempolicy.c:2093
>>   alloc_pages include/linux/gfp.h:509 [inline]
>>   __get_free_pages+0x12/0x60 mm/page_alloc.c:4414
>>   dma_mem_alloc+0x36/0x50 arch/x86/include/asm/floppy.h:156
>>   raw_cmd_copyin drivers/block/floppy.c:3159 [inline]
>>   raw_cmd_ioctl drivers/block/floppy.c:3206 [inline]
>>   fd_locked_ioctl+0xa00/0x2c10 drivers/block/floppy.c:3544
>>   fd_ioctl+0x40/0x60 drivers/block/floppy.c:3571
>>   __blkdev_driver_ioctl block/ioctl.c:303 [inline]
>>   blkdev_ioctl+0xb3c/0x1a30 block/ioctl.c:601
>>   block_ioctl+0x105/0x150 fs/block_dev.c:1883
>>   vfs_ioctl fs/ioctl.c:46 [inline]
>>   do_vfs_ioctl+0x1c0/0x1150 fs/ioctl.c:687
>>   ksys_ioctl+0x9e/0xb0 fs/ioctl.c:702
>>   __do_sys_ioctl fs/ioctl.c:709 [inline]
>>   __se_sys_ioctl fs/ioctl.c:707 [inline]
>>   __x64_sys_ioctl+0x7e/0xc0 fs/ioctl.c:707
>>   do_syscall_64+0xc4/0x510 arch/x86/entry/common.c:290
>>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> And we could fix this in the floppy driver.
>
>> Note that this is not a kvmalloc path. It is just that the fast path
>> really depends on having sanitzed order as well. Therefore move the
>> order check to the fast path.
>
> But do we really need to do this?  Are there any other known potential
> callsites?
>
> --
> You received this message because you are subscribed to the Google Groups "syzkaller" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller+unsubscribe@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
  2018-11-14  0:05           ` Vlastimil Babka
@ 2018-11-19 23:57             ` Pavel Machek
  0 siblings, 0 replies; 22+ messages in thread
From: Pavel Machek @ 2018-11-19 23:57 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Michal Hocko, Kyungtae Kim, pavel.tatashin,
	osalvador, rppt, aaron.lu, iamjoonsoo.kim, alexander.h.duyck,
	mgorman, lifeasageek, threeearcat, syzkaller, linux-kernel,
	linux-mm, Konstantin Khlebnikov

[-- Attachment #1: Type: text/plain, Size: 1647 bytes --]

Hi1

> >>>> --- a/mm/page_alloc.c
> >>>> +++ b/mm/page_alloc.c
> >>>> @@ -4364,6 +4353,15 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> >>>>  	gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
> >>>>  	struct alloc_context ac = { };
> >>>>  
> >>>> +	/*
> >>>> +	 * There are several places where we assume that the order value is sane
> >>>> +	 * so bail out early if the request is out of bound.
> >>>> +	 */
> >>>> +	if (unlikely(order >= MAX_ORDER)) {
> >>>> +		WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
> >>>> +		return NULL;
> >>>> +	}
> >>>> +
> >>>
> >>> I know "everybody enables CONFIG_DEBUG_VM", but given this is fastpath,
> >>> we could help those who choose not to enable it by using
> >>>
> >>> #ifdef CONFIG_DEBUG_VM
> >>> 	if (WARN_ON_ONCE(order >= MAX_ORDER && !(gfp_mask & __GFP_NOWARN)))
> >>> 		return NULL;
> >>> #endif
> >>
> >> Hmm, but that would mean there's still potential undefined behavior for
> >> !CONFIG_DEBUG_VM, so I would prefer not to do it like that.
> >>
> > 
> > What does "potential undefined behavior" mean here?
> 
> I mean that it becomes undefined once a caller with order >= MAX_ORDER
> appears. Worse if it's directly due to a userspace action, like in this
> case.

We should really check if value from userspace is sane _before_
passing it to alloc_pages(). Anything else is too fragile. Maybe
alloc_pages should do the second check, but...

									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2018-11-19 23:57 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-09  4:09 UBSAN: Undefined behaviour in mm/page_alloc.c Kyungtae Kim
2018-11-09  8:42 ` Vlastimil Babka
2018-11-09  8:43 ` Michal Hocko
2018-11-09  9:41   ` Tetsuo Handa
2018-11-09  9:56     ` Michal Hocko
2018-11-09 10:07       ` Tetsuo Handa
2018-11-09 10:25         ` Michal Hocko
2018-11-09 10:10       ` Vlastimil Babka
2018-11-09 10:22         ` Michal Hocko
2018-11-09 10:24         ` Tetsuo Handa
2018-11-09 10:28           ` Michal Hocko
2018-11-09 10:52       ` Balbir Singh
2018-11-09  9:56   ` Mel Gorman
2018-11-13  9:43   ` Michal Hocko
2018-11-13 23:15     ` Andrew Morton
2018-11-13 23:23       ` Vlastimil Babka
2018-11-13 23:32         ` Andrew Morton
2018-11-14  0:05           ` Vlastimil Babka
2018-11-19 23:57             ` Pavel Machek
2018-11-13 23:29     ` Andrew Morton
2018-11-14  7:10       ` Michal Hocko
2018-11-16 23:43       ` Dmitry Vyukov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).