linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* UBSAN: Undefined behaviour in mm/page_alloc.c
@ 2018-11-09  4:09 Kyungtae Kim
  2018-11-09  8:42 ` Vlastimil Babka
  2018-11-09  8:43 ` Michal Hocko
  0 siblings, 2 replies; 22+ messages in thread
From: Kyungtae Kim @ 2018-11-09  4:09 UTC (permalink / raw)
  To: akpm, mhocko, pavel.tatashin, vbabka, osalvador, rppt, aaron.lu,
	iamjoonsoo.kim, alexander.h.duyck, mgorman
  Cc: lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm

We report a bug in v4.19-rc2 (4.20-rc1 as well, I guess):

kernel config: https://kt0755.github.io/etc/config_v2-4.19
repro: https://kt0755.github.io/etc/repro.c4074.c

In the middle of page request, this arose because order is too large to handle
 (mm/page_alloc.c:3119). It actually comes from that order is
controllable by user input
via raw_cmd_ioctl without its sanity check, thereby causing memory problem.
To stop it, we can use like MAX_ORDER for bounds check before using it.

=========================================
UBSAN: Undefined behaviour in mm/page_alloc.c:3117:19
shift exponent 51 is too large for 32-bit type 'int'
CPU: 0 PID: 6520 Comm: syz-executor1 Not tainted 4.19.0-rc2 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0xd2/0x148 lib/dump_stack.c:113
 ubsan_epilogue+0x12/0x94 lib/ubsan.c:159
 __ubsan_handle_shift_out_of_bounds+0x2b6/0x30b lib/ubsan.c:425
 __zone_watermark_ok+0x2c7/0x400 mm/page_alloc.c:3117
 zone_watermark_fast mm/page_alloc.c:3216 [inline]
 get_page_from_freelist+0xc49/0x44c0 mm/page_alloc.c:3300
 __alloc_pages_nodemask+0x21e/0x640 mm/page_alloc.c:4370
 alloc_pages_current+0xcc/0x210 mm/mempolicy.c:2093
 alloc_pages include/linux/gfp.h:509 [inline]
 __get_free_pages+0x12/0x60 mm/page_alloc.c:4414
 dma_mem_alloc+0x36/0x50 arch/x86/include/asm/floppy.h:156
 raw_cmd_copyin drivers/block/floppy.c:3159 [inline]
 raw_cmd_ioctl drivers/block/floppy.c:3206 [inline]
 fd_locked_ioctl+0xa00/0x2c10 drivers/block/floppy.c:3544
 fd_ioctl+0x40/0x60 drivers/block/floppy.c:3571
 __blkdev_driver_ioctl block/ioctl.c:303 [inline]
 blkdev_ioctl+0xb3c/0x1a30 block/ioctl.c:601
 block_ioctl+0x105/0x150 fs/block_dev.c:1883
 vfs_ioctl fs/ioctl.c:46 [inline]
 do_vfs_ioctl+0x1c0/0x1150 fs/ioctl.c:687
 ksys_ioctl+0x9e/0xb0 fs/ioctl.c:702
 __do_sys_ioctl fs/ioctl.c:709 [inline]
 __se_sys_ioctl fs/ioctl.c:707 [inline]
 __x64_sys_ioctl+0x7e/0xc0 fs/ioctl.c:707
 do_syscall_64+0xc4/0x510 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4497b9
Code: e8 8c 9f 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48
89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
01 f0 ff ff 0f 83 9b 6b fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fb5ef0e2c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007fb5ef0e36cc RCX: 00000000004497b9
RDX: 0000000020000040 RSI: 0000000000000258 RDI: 0000000000000014
RBP: 000000000071bea0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 0000000000005490 R14: 00000000006ed530 R15: 00007fb5ef0e3700
=========================================================


Thanks,
Kyungtae Kim

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
  2018-11-09  4:09 UBSAN: Undefined behaviour in mm/page_alloc.c Kyungtae Kim
@ 2018-11-09  8:42 ` Vlastimil Babka
  2018-11-09  8:43 ` Michal Hocko
  1 sibling, 0 replies; 22+ messages in thread
From: Vlastimil Babka @ 2018-11-09  8:42 UTC (permalink / raw)
  To: Kyungtae Kim, akpm, mhocko, pavel.tatashin, osalvador, rppt,
	aaron.lu, iamjoonsoo.kim, alexander.h.duyck, mgorman
  Cc: lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm,
	Konstantin Khlebnikov

On 11/9/18 5:09 AM, Kyungtae Kim wrote:
> We report a bug in v4.19-rc2 (4.20-rc1 as well, I guess):
> 
> kernel config: https://kt0755.github.io/etc/config_v2-4.19
> repro: https://kt0755.github.io/etc/repro.c4074.c
> 
> In the middle of page request, this arose because order is too large to handle
>  (mm/page_alloc.c:3119). It actually comes from that order is
> controllable by user input
> via raw_cmd_ioctl without its sanity check, thereby causing memory problem.
> To stop it, we can use like MAX_ORDER for bounds check before using it.

This together with [1] makes me rather convinced that we should really
move the check back from __alloc_pages_slowpath to
__alloc_pages_nodemask. It should be a single predictable branch with an
unlikely()?

[1]
https://lore.kernel.org/lkml/154109387197.925352.10499549042420271600.stgit@buzz/T/#u

> =========================================
> UBSAN: Undefined behaviour in mm/page_alloc.c:3117:19
> shift exponent 51 is too large for 32-bit type 'int'
> CPU: 0 PID: 6520 Comm: syz-executor1 Not tainted 4.19.0-rc2 #1
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0xd2/0x148 lib/dump_stack.c:113
>  ubsan_epilogue+0x12/0x94 lib/ubsan.c:159
>  __ubsan_handle_shift_out_of_bounds+0x2b6/0x30b lib/ubsan.c:425
>  __zone_watermark_ok+0x2c7/0x400 mm/page_alloc.c:3117
>  zone_watermark_fast mm/page_alloc.c:3216 [inline]
>  get_page_from_freelist+0xc49/0x44c0 mm/page_alloc.c:3300
>  __alloc_pages_nodemask+0x21e/0x640 mm/page_alloc.c:4370
>  alloc_pages_current+0xcc/0x210 mm/mempolicy.c:2093
>  alloc_pages include/linux/gfp.h:509 [inline]
>  __get_free_pages+0x12/0x60 mm/page_alloc.c:4414
>  dma_mem_alloc+0x36/0x50 arch/x86/include/asm/floppy.h:156
>  raw_cmd_copyin drivers/block/floppy.c:3159 [inline]
>  raw_cmd_ioctl drivers/block/floppy.c:3206 [inline]
>  fd_locked_ioctl+0xa00/0x2c10 drivers/block/floppy.c:3544
>  fd_ioctl+0x40/0x60 drivers/block/floppy.c:3571
>  __blkdev_driver_ioctl block/ioctl.c:303 [inline]
>  blkdev_ioctl+0xb3c/0x1a30 block/ioctl.c:601
>  block_ioctl+0x105/0x150 fs/block_dev.c:1883
>  vfs_ioctl fs/ioctl.c:46 [inline]
>  do_vfs_ioctl+0x1c0/0x1150 fs/ioctl.c:687
>  ksys_ioctl+0x9e/0xb0 fs/ioctl.c:702
>  __do_sys_ioctl fs/ioctl.c:709 [inline]
>  __se_sys_ioctl fs/ioctl.c:707 [inline]
>  __x64_sys_ioctl+0x7e/0xc0 fs/ioctl.c:707
>  do_syscall_64+0xc4/0x510 arch/x86/entry/common.c:290
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x4497b9
> Code: e8 8c 9f 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48
> 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
> 01 f0 ff ff 0f 83 9b 6b fc ff c3 66 2e 0f 1f 84 00 00 00 00
> RSP: 002b:00007fb5ef0e2c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> RAX: ffffffffffffffda RBX: 00007fb5ef0e36cc RCX: 00000000004497b9
> RDX: 0000000020000040 RSI: 0000000000000258 RDI: 0000000000000014
> RBP: 000000000071bea0 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
> R13: 0000000000005490 R14: 00000000006ed530 R15: 00007fb5ef0e3700
> =========================================================
> 
> 
> Thanks,
> Kyungtae Kim
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
  2018-11-09  4:09 UBSAN: Undefined behaviour in mm/page_alloc.c Kyungtae Kim
  2018-11-09  8:42 ` Vlastimil Babka
@ 2018-11-09  8:43 ` Michal Hocko
  2018-11-09  9:41   ` Tetsuo Handa
                     ` (2 more replies)
  1 sibling, 3 replies; 22+ messages in thread
From: Michal Hocko @ 2018-11-09  8:43 UTC (permalink / raw)
  To: Kyungtae Kim
  Cc: akpm, pavel.tatashin, vbabka, osalvador, rppt, aaron.lu,
	iamjoonsoo.kim, alexander.h.duyck, mgorman, lifeasageek,
	threeearcat, syzkaller, linux-kernel, linux-mm,
	Konstantin Khlebnikov

On Thu 08-11-18 23:09:23, Kyungtae Kim wrote:
> We report a bug in v4.19-rc2 (4.20-rc1 as well, I guess):
> 
> kernel config: https://kt0755.github.io/etc/config_v2-4.19
> repro: https://kt0755.github.io/etc/repro.c4074.c
> 
> In the middle of page request, this arose because order is too large to handle
>  (mm/page_alloc.c:3119). It actually comes from that order is
> controllable by user input
> via raw_cmd_ioctl without its sanity check, thereby causing memory problem.
> To stop it, we can use like MAX_ORDER for bounds check before using it.

Yes, we do only check the max order in the slow path. We have already
discussed something similar with Konstantin [1][2]. Basically kvmalloc
for a large size might get to the page allocator with an out of bound
order and warn during direct reclaim.

I am wondering whether really want to check for the order in the fast
path instead. I have hard time to imagine this could cause a measurable
impact.

The full patch is below

[1] http://lkml.kernel.org/r/154109387197.925352.10499549042420271600.stgit@buzz
[2] http://lkml.kernel.org/r/154106356066.887821.4649178319705436373.stgit@buzz

> 
> =========================================
> UBSAN: Undefined behaviour in mm/page_alloc.c:3117:19
> shift exponent 51 is too large for 32-bit type 'int'
> CPU: 0 PID: 6520 Comm: syz-executor1 Not tainted 4.19.0-rc2 #1
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0xd2/0x148 lib/dump_stack.c:113
>  ubsan_epilogue+0x12/0x94 lib/ubsan.c:159
>  __ubsan_handle_shift_out_of_bounds+0x2b6/0x30b lib/ubsan.c:425
>  __zone_watermark_ok+0x2c7/0x400 mm/page_alloc.c:3117
>  zone_watermark_fast mm/page_alloc.c:3216 [inline]
>  get_page_from_freelist+0xc49/0x44c0 mm/page_alloc.c:3300
>  __alloc_pages_nodemask+0x21e/0x640 mm/page_alloc.c:4370
>  alloc_pages_current+0xcc/0x210 mm/mempolicy.c:2093
>  alloc_pages include/linux/gfp.h:509 [inline]
>  __get_free_pages+0x12/0x60 mm/page_alloc.c:4414
>  dma_mem_alloc+0x36/0x50 arch/x86/include/asm/floppy.h:156
>  raw_cmd_copyin drivers/block/floppy.c:3159 [inline]
>  raw_cmd_ioctl drivers/block/floppy.c:3206 [inline]
>  fd_locked_ioctl+0xa00/0x2c10 drivers/block/floppy.c:3544
>  fd_ioctl+0x40/0x60 drivers/block/floppy.c:3571
>  __blkdev_driver_ioctl block/ioctl.c:303 [inline]
>  blkdev_ioctl+0xb3c/0x1a30 block/ioctl.c:601
>  block_ioctl+0x105/0x150 fs/block_dev.c:1883
>  vfs_ioctl fs/ioctl.c:46 [inline]
>  do_vfs_ioctl+0x1c0/0x1150 fs/ioctl.c:687
>  ksys_ioctl+0x9e/0xb0 fs/ioctl.c:702
>  __do_sys_ioctl fs/ioctl.c:709 [inline]
>  __se_sys_ioctl fs/ioctl.c:707 [inline]
>  __x64_sys_ioctl+0x7e/0xc0 fs/ioctl.c:707
>  do_syscall_64+0xc4/0x510 arch/x86/entry/common.c:290
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x4497b9
> Code: e8 8c 9f 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48
> 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
> 01 f0 ff ff 0f 83 9b 6b fc ff c3 66 2e 0f 1f 84 00 00 00 00
> RSP: 002b:00007fb5ef0e2c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> RAX: ffffffffffffffda RBX: 00007fb5ef0e36cc RCX: 00000000004497b9
> RDX: 0000000020000040 RSI: 0000000000000258 RDI: 0000000000000014
> RBP: 000000000071bea0 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
> R13: 0000000000005490 R14: 00000000006ed530 R15: 00007fb5ef0e3700
> =========================================================
> 
> 
> Thanks,
> Kyungtae Kim


From 7110220512be16054f2c8ee16bdd076c77c2456c Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Fri, 9 Nov 2018 09:35:29 +0100
Subject: [PATCH] mm, page_alloc: check for max order in hot path

Konstantin has noticed that kvmalloc might trigger the following warning
[Thu Nov  1 08:43:56 2018] WARNING: CPU: 0 PID: 6676 at mm/vmstat.c:986 __fragmentation_index+0x54/0x60
[...]
[Thu Nov  1 08:43:56 2018] Call Trace:
[Thu Nov  1 08:43:56 2018]  fragmentation_index+0x76/0x90
[Thu Nov  1 08:43:56 2018]  compaction_suitable+0x4f/0xf0
[Thu Nov  1 08:43:56 2018]  shrink_node+0x295/0x310
[Thu Nov  1 08:43:56 2018]  node_reclaim+0x205/0x250
[Thu Nov  1 08:43:56 2018]  get_page_from_freelist+0x649/0xad0
[Thu Nov  1 08:43:56 2018]  ? get_page_from_freelist+0x2d4/0xad0
[Thu Nov  1 08:43:56 2018]  ? release_sock+0x19/0x90
[Thu Nov  1 08:43:56 2018]  ? do_ipv6_setsockopt.isra.5+0x10da/0x1290
[Thu Nov  1 08:43:56 2018]  __alloc_pages_nodemask+0x12a/0x2a0
[Thu Nov  1 08:43:56 2018]  kmalloc_large_node+0x47/0x90
[Thu Nov  1 08:43:56 2018]  __kmalloc_node+0x22b/0x2e0
[Thu Nov  1 08:43:56 2018]  kvmalloc_node+0x3e/0x70
[Thu Nov  1 08:43:56 2018]  xt_alloc_table_info+0x3a/0x80 [x_tables]
[Thu Nov  1 08:43:56 2018]  do_ip6t_set_ctl+0xcd/0x1c0 [ip6_tables]
[Thu Nov  1 08:43:56 2018]  nf_setsockopt+0x44/0x60
[Thu Nov  1 08:43:56 2018]  SyS_setsockopt+0x6f/0xc0
[Thu Nov  1 08:43:56 2018]  do_syscall_64+0x67/0x120
[Thu Nov  1 08:43:56 2018]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2

the problem is that we only check for an out of bound order in the slow
path and the node reclaim might happen from the fast path already. This
is fixable by making sure that kvmalloc doesn't ever use kmalloc for
requests that are larger than KMALLOC_MAX_SIZE but this also shows that
the code is rather fragile. A recent UBSAN report just underlines that
by the following report

 UBSAN: Undefined behaviour in mm/page_alloc.c:3117:19
 shift exponent 51 is too large for 32-bit type 'int'
 CPU: 0 PID: 6520 Comm: syz-executor1 Not tainted 4.19.0-rc2 #1
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0xd2/0x148 lib/dump_stack.c:113
  ubsan_epilogue+0x12/0x94 lib/ubsan.c:159
  __ubsan_handle_shift_out_of_bounds+0x2b6/0x30b lib/ubsan.c:425
  __zone_watermark_ok+0x2c7/0x400 mm/page_alloc.c:3117
  zone_watermark_fast mm/page_alloc.c:3216 [inline]
  get_page_from_freelist+0xc49/0x44c0 mm/page_alloc.c:3300
  __alloc_pages_nodemask+0x21e/0x640 mm/page_alloc.c:4370
  alloc_pages_current+0xcc/0x210 mm/mempolicy.c:2093
  alloc_pages include/linux/gfp.h:509 [inline]
  __get_free_pages+0x12/0x60 mm/page_alloc.c:4414
  dma_mem_alloc+0x36/0x50 arch/x86/include/asm/floppy.h:156
  raw_cmd_copyin drivers/block/floppy.c:3159 [inline]
  raw_cmd_ioctl drivers/block/floppy.c:3206 [inline]
  fd_locked_ioctl+0xa00/0x2c10 drivers/block/floppy.c:3544
  fd_ioctl+0x40/0x60 drivers/block/floppy.c:3571
  __blkdev_driver_ioctl block/ioctl.c:303 [inline]
  blkdev_ioctl+0xb3c/0x1a30 block/ioctl.c:601
  block_ioctl+0x105/0x150 fs/block_dev.c:1883
  vfs_ioctl fs/ioctl.c:46 [inline]
  do_vfs_ioctl+0x1c0/0x1150 fs/ioctl.c:687
  ksys_ioctl+0x9e/0xb0 fs/ioctl.c:702
  __do_sys_ioctl fs/ioctl.c:709 [inline]
  __se_sys_ioctl fs/ioctl.c:707 [inline]
  __x64_sys_ioctl+0x7e/0xc0 fs/ioctl.c:707
  do_syscall_64+0xc4/0x510 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Note that this is not a kvmalloc path. It is just that the fast path
really depends on having sanitzed order as well. Therefore move the
order check to the fast path.

Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reported-by: Kyungtae Kim <kt0755@gmail.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/page_alloc.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a919ba5cb3c8..9fc10a1029cf 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4060,17 +4060,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	unsigned int cpuset_mems_cookie;
 	int reserve_flags;
 
-	/*
-	 * In the slowpath, we sanity check order to avoid ever trying to
-	 * reclaim >= MAX_ORDER areas which will never succeed. Callers may
-	 * be using allocators in order of preference for an area that is
-	 * too large.
-	 */
-	if (order >= MAX_ORDER) {
-		WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
-		return NULL;
-	}
-
 	/*
 	 * We also sanity check to catch abuse of atomic reserves being used by
 	 * callers that are not in atomic context.
@@ -4364,6 +4353,17 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
 	gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
 	struct alloc_context ac = { };
 
+	/*
+	 * In the slowpath, we sanity check order to avoid ever trying to
+	 * reclaim >= MAX_ORDER areas which will never succeed. Callers may
+	 * be using allocators in order of preference for an area that is
+	 * too large.
+	 */
+	if (order >= MAX_ORDER) {
+		WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
+		return NULL;
+	}
+
 	gfp_mask &= gfp_allowed_mask;
 	alloc_mask = gfp_mask;
 	if (!prepare_alloc_pages(gfp_mask, order, preferred_nid, nodemask, &ac, &alloc_mask, &alloc_flags))
-- 
2.19.1

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
  2018-11-09  8:43 ` Michal Hocko
@ 2018-11-09  9:41   ` Tetsuo Handa
  2018-11-09  9:56     ` Michal Hocko
  2018-11-09  9:56   ` Mel Gorman
  2018-11-13  9:43   ` Michal Hocko
  2 siblings, 1 reply; 22+ messages in thread
From: Tetsuo Handa @ 2018-11-09  9:41 UTC (permalink / raw)
  To: Michal Hocko, Kyungtae Kim
  Cc: akpm, pavel.tatashin, vbabka, osalvador, rppt, aaron.lu,
	iamjoonsoo.kim, alexander.h.duyck, mgorman, lifeasageek,
	threeearcat, syzkaller, linux-kernel, linux-mm,
	Konstantin Khlebnikov

On 2018/11/09 17:43, Michal Hocko wrote:
> @@ -4364,6 +4353,17 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
>  	gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
>  	struct alloc_context ac = { };
>  
> +	/*
> +	 * In the slowpath, we sanity check order to avoid ever trying to

Please keep the comment up to dated.
I don't like that comments in OOM code is outdated.

> +	 * reclaim >= MAX_ORDER areas which will never succeed. Callers may
> +	 * be using allocators in order of preference for an area that is
> +	 * too large.
> +	 */
> +	if (order >= MAX_ORDER) {

Also, why not to add BUG_ON(gfp_mask & __GFP_NOFAIL); here?

> +		WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
> +		return NULL;
> +	}
> +
>  	gfp_mask &= gfp_allowed_mask;
>  	alloc_mask = gfp_mask;
>  	if (!prepare_alloc_pages(gfp_mask, order, preferred_nid, nodemask, &ac, &alloc_mask, &alloc_flags))
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
  2018-11-09  9:41   ` Tetsuo Handa
@ 2018-11-09  9:56     ` Michal Hocko
  2018-11-09 10:07       ` Tetsuo Handa
                         ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Michal Hocko @ 2018-11-09  9:56 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Kyungtae Kim, akpm, pavel.tatashin, vbabka, osalvador, rppt,
	aaron.lu, iamjoonsoo.kim, alexander.h.duyck, mgorman,
	lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm,
	Konstantin Khlebnikov

On Fri 09-11-18 18:41:53, Tetsuo Handa wrote:
> On 2018/11/09 17:43, Michal Hocko wrote:
> > @@ -4364,6 +4353,17 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> >  	gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
> >  	struct alloc_context ac = { };
> >  
> > +	/*
> > +	 * In the slowpath, we sanity check order to avoid ever trying to
> 
> Please keep the comment up to dated.

Does this following look better?

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9fc10a1029cf..bf9aecba4222 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4354,10 +4354,8 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
 	struct alloc_context ac = { };
 
 	/*
-	 * In the slowpath, we sanity check order to avoid ever trying to
-	 * reclaim >= MAX_ORDER areas which will never succeed. Callers may
-	 * be using allocators in order of preference for an area that is
-	 * too large.
+	 * There are several places where we assume that the order value is sane
+	 * so bail out early if the request is out of bound.
 	 */
 	if (order >= MAX_ORDER) {
 		WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));

> I don't like that comments in OOM code is outdated.
> 
> > +	 * reclaim >= MAX_ORDER areas which will never succeed. Callers may
> > +	 * be using allocators in order of preference for an area that is
> > +	 * too large.
> > +	 */
> > +	if (order >= MAX_ORDER) {
> 
> Also, why not to add BUG_ON(gfp_mask & __GFP_NOFAIL); here?

Because we do not want to blow up the kernel just because of a stupid
usage of the allocator. Can you think of an example where it would
actually make any sense?

I would argue that such a theoretical abuse would blow up on an
unchecked NULL ptr access. Isn't that enough?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
  2018-11-09  8:43 ` Michal Hocko
  2018-11-09  9:41   ` Tetsuo Handa
@ 2018-11-09  9:56   ` Mel Gorman
  2018-11-13  9:43   ` Michal Hocko
  2 siblings, 0 replies; 22+ messages in thread
From: Mel Gorman @ 2018-11-09  9:56 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Kyungtae Kim, akpm, pavel.tatashin, vbabka, osalvador, rppt,
	aaron.lu, iamjoonsoo.kim, alexander.h.duyck, lifeasageek,
	threeearcat, syzkaller, linux-kernel, linux-mm,
	Konstantin Khlebnikov

On Fri, Nov 09, 2018 at 09:43:53AM +0100, Michal Hocko wrote:
> On Thu 08-11-18 23:09:23, Kyungtae Kim wrote:
> > We report a bug in v4.19-rc2 (4.20-rc1 as well, I guess):
> > 
> > kernel config: https://kt0755.github.io/etc/config_v2-4.19
> > repro: https://kt0755.github.io/etc/repro.c4074.c
> > 
> > In the middle of page request, this arose because order is too large to handle
> >  (mm/page_alloc.c:3119). It actually comes from that order is
> > controllable by user input
> > via raw_cmd_ioctl without its sanity check, thereby causing memory problem.
> > To stop it, we can use like MAX_ORDER for bounds check before using it.
> 
> Yes, we do only check the max order in the slow path. We have already
> discussed something similar with Konstantin [1][2]. Basically kvmalloc
> for a large size might get to the page allocator with an out of bound
> order and warn during direct reclaim.
> 
> I am wondering whether really want to check for the order in the fast
> path instead. I have hard time to imagine this could cause a measurable
> impact.
> 
> The full patch is below
> 
> [1] http://lkml.kernel.org/r/154109387197.925352.10499549042420271600.stgit@buzz
> [2] http://lkml.kernel.org/r/154106356066.887821.4649178319705436373.stgit@buzz
> 

I'm ok with such changes under the policy "there is no point being fast if
we're broken". It's unfortunate and I know the original microoptimisation
was mine but if the fast-path check ends up being a problem then I/we go
back to finding ways of making the page allocator faster from a fundamental
algorithmic point of view and not a microoptimisation approach. There is
potential fruit there, just none that is low-hanging.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
  2018-11-09  9:56     ` Michal Hocko
@ 2018-11-09 10:07       ` Tetsuo Handa
  2018-11-09 10:25         ` Michal Hocko
  2018-11-09 10:10       ` Vlastimil Babka
  2018-11-09 10:52       ` Balbir Singh
  2 siblings, 1 reply; 22+ messages in thread
From: Tetsuo Handa @ 2018-11-09 10:07 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Kyungtae Kim, akpm, pavel.tatashin, vbabka, osalvador, rppt,
	aaron.lu, iamjoonsoo.kim, alexander.h.duyck, mgorman,
	lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm,
	Konstantin Khlebnikov

On 2018/11/09 18:56, Michal Hocko wrote:
> Does this following look better?

Yes.

>> Also, why not to add BUG_ON(gfp_mask & __GFP_NOFAIL); here?
> 
> Because we do not want to blow up the kernel just because of a stupid
> usage of the allocator. Can you think of an example where it would
> actually make any sense?
> 
> I would argue that such a theoretical abuse would blow up on an
> unchecked NULL ptr access. Isn't that enough?

We after all can't avoid blowing up the kernel even if we don't add BUG_ON().
Stopping with BUG_ON() is saner than NULL pointer dereference messages.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
  2018-11-09  9:56     ` Michal Hocko
  2018-11-09 10:07       ` Tetsuo Handa
@ 2018-11-09 10:10       ` Vlastimil Babka
  2018-11-09 10:22         ` Michal Hocko
  2018-11-09 10:24         ` Tetsuo Handa
  2018-11-09 10:52       ` Balbir Singh
  2 siblings, 2 replies; 22+ messages in thread
From: Vlastimil Babka @ 2018-11-09 10:10 UTC (permalink / raw)
  To: Michal Hocko, Tetsuo Handa
  Cc: Kyungtae Kim, akpm, pavel.tatashin, osalvador, rppt, aaron.lu,
	iamjoonsoo.kim, alexander.h.duyck, mgorman, lifeasageek,
	threeearcat, syzkaller, linux-kernel, linux-mm,
	Konstantin Khlebnikov

On 11/9/18 10:56 AM, Michal Hocko wrote:
> On Fri 09-11-18 18:41:53, Tetsuo Handa wrote:
>> On 2018/11/09 17:43, Michal Hocko wrote:
>>> @@ -4364,6 +4353,17 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
>>>  	gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
>>>  	struct alloc_context ac = { };
>>>  
>>> +	/*
>>> +	 * In the slowpath, we sanity check order to avoid ever trying to
>>
>> Please keep the comment up to dated.
> 
> Does this following look better?
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 9fc10a1029cf..bf9aecba4222 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4354,10 +4354,8 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
>  	struct alloc_context ac = { };
>  
>  	/*
> -	 * In the slowpath, we sanity check order to avoid ever trying to
> -	 * reclaim >= MAX_ORDER areas which will never succeed. Callers may
> -	 * be using allocators in order of preference for an area that is
> -	 * too large.
> +	 * There are several places where we assume that the order value is sane
> +	 * so bail out early if the request is out of bound.
>  	 */
>  	if (order >= MAX_ORDER) {
>  		WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));

Looks ok, but I'd add unlikely(), although it doesn't currently seem to
make any difference.

You can add Acked-by: Vlastimil Babka <vbabka@suse.cz>

>> I don't like that comments in OOM code is outdated.
>>
>>> +	 * reclaim >= MAX_ORDER areas which will never succeed. Callers may
>>> +	 * be using allocators in order of preference for an area that is
>>> +	 * too large.
>>> +	 */
>>> +	if (order >= MAX_ORDER) {
>>
>> Also, why not to add BUG_ON(gfp_mask & __GFP_NOFAIL); here?
> 
> Because we do not want to blow up the kernel just because of a stupid
> usage of the allocator. Can you think of an example where it would
> actually make any sense?
> 
> I would argue that such a theoretical abuse would blow up on an
> unchecked NULL ptr access. Isn't that enough?

Agreed.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
  2018-11-09 10:10       ` Vlastimil Babka
@ 2018-11-09 10:22         ` Michal Hocko
  2018-11-09 10:24         ` Tetsuo Handa
  1 sibling, 0 replies; 22+ messages in thread
From: Michal Hocko @ 2018-11-09 10:22 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Tetsuo Handa, Kyungtae Kim, akpm, pavel.tatashin, osalvador,
	rppt, aaron.lu, iamjoonsoo.kim, alexander.h.duyck, mgorman,
	lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm,
	Konstantin Khlebnikov

On Fri 09-11-18 11:10:00, Vlastimil Babka wrote:
> On 11/9/18 10:56 AM, Michal Hocko wrote:
> > On Fri 09-11-18 18:41:53, Tetsuo Handa wrote:
> >> On 2018/11/09 17:43, Michal Hocko wrote:
> >>> @@ -4364,6 +4353,17 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> >>>  	gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
> >>>  	struct alloc_context ac = { };
> >>>  
> >>> +	/*
> >>> +	 * In the slowpath, we sanity check order to avoid ever trying to
> >>
> >> Please keep the comment up to dated.
> > 
> > Does this following look better?
> > 
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 9fc10a1029cf..bf9aecba4222 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -4354,10 +4354,8 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> >  	struct alloc_context ac = { };
> >  
> >  	/*
> > -	 * In the slowpath, we sanity check order to avoid ever trying to
> > -	 * reclaim >= MAX_ORDER areas which will never succeed. Callers may
> > -	 * be using allocators in order of preference for an area that is
> > -	 * too large.
> > +	 * There are several places where we assume that the order value is sane
> > +	 * so bail out early if the request is out of bound.
> >  	 */
> >  	if (order >= MAX_ORDER) {
> >  		WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
> 
> Looks ok, but I'd add unlikely(), although it doesn't currently seem to
> make any difference.
> 
> You can add Acked-by: Vlastimil Babka <vbabka@suse.cz>

OK, I have added both. Thanks!

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
  2018-11-09 10:10       ` Vlastimil Babka
  2018-11-09 10:22         ` Michal Hocko
@ 2018-11-09 10:24         ` Tetsuo Handa
  2018-11-09 10:28           ` Michal Hocko
  1 sibling, 1 reply; 22+ messages in thread
From: Tetsuo Handa @ 2018-11-09 10:24 UTC (permalink / raw)
  To: Vlastimil Babka, Michal Hocko
  Cc: Kyungtae Kim, akpm, pavel.tatashin, osalvador, rppt, aaron.lu,
	iamjoonsoo.kim, alexander.h.duyck, mgorman, lifeasageek,
	threeearcat, syzkaller, linux-kernel, linux-mm,
	Konstantin Khlebnikov

On 2018/11/09 19:10, Vlastimil Babka wrote:>>>> +	 * reclaim >= MAX_ORDER areas which will never succeed. Callers may
>>>> +	 * be using allocators in order of preference for an area that is
>>>> +	 * too large.
>>>> +	 */
>>>> +	if (order >= MAX_ORDER) {
>>>
>>> Also, why not to add BUG_ON(gfp_mask & __GFP_NOFAIL); here?
>>
>> Because we do not want to blow up the kernel just because of a stupid
>> usage of the allocator. Can you think of an example where it would
>> actually make any sense?
>>
>> I would argue that such a theoretical abuse would blow up on an
>> unchecked NULL ptr access. Isn't that enough?
> 
> Agreed.
> 

If someone has written a module with __GFP_NOFAIL for an architecture
where PAGE_SIZE == 2048KB, and someone else tried to use that module on
another architecture where PAGE_SIZE == 4KB. You are saying that
triggering NULL pointer dereference is a fault of that user's ignorance
about MM. You are saying that everyone knows internal of MM. Sad...

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
  2018-11-09 10:07       ` Tetsuo Handa
@ 2018-11-09 10:25         ` Michal Hocko
  0 siblings, 0 replies; 22+ messages in thread
From: Michal Hocko @ 2018-11-09 10:25 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Kyungtae Kim, akpm, pavel.tatashin, vbabka, osalvador, rppt,
	aaron.lu, iamjoonsoo.kim, alexander.h.duyck, mgorman,
	lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm,
	Konstantin Khlebnikov

On Fri 09-11-18 19:07:49, Tetsuo Handa wrote:
> On 2018/11/09 18:56, Michal Hocko wrote:
> > Does this following look better?
> 
> Yes.
> 
> >> Also, why not to add BUG_ON(gfp_mask & __GFP_NOFAIL); here?
> > 
> > Because we do not want to blow up the kernel just because of a stupid
> > usage of the allocator. Can you think of an example where it would
> > actually make any sense?
> > 
> > I would argue that such a theoretical abuse would blow up on an
> > unchecked NULL ptr access. Isn't that enough?
> 
> We after all can't avoid blowing up the kernel even if we don't add BUG_ON().
> Stopping with BUG_ON() is saner than NULL pointer dereference messages.

I disagree (strongly to be more explicit). You never know the context
the allocator is called from. We do not want to oops with a random state
(locks heled etc). If the access blows up in the user then be it, the
bug will be clear and to be fixed but BUG_ON on an invalid core kernel
function is just a bad idea. I believe Linus was quite explicit about it
and I fully agree with him.

Besides that this is really off-topic to the issue at hands. Don't you
think?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
  2018-11-09 10:24         ` Tetsuo Handa
@ 2018-11-09 10:28           ` Michal Hocko
  0 siblings, 0 replies; 22+ messages in thread
From: Michal Hocko @ 2018-11-09 10:28 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Vlastimil Babka, Kyungtae Kim, akpm, pavel.tatashin, osalvador,
	rppt, aaron.lu, iamjoonsoo.kim, alexander.h.duyck, mgorman,
	lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm,
	Konstantin Khlebnikov

On Fri 09-11-18 19:24:48, Tetsuo Handa wrote:
> On 2018/11/09 19:10, Vlastimil Babka wrote:>>>> +	 * reclaim >= MAX_ORDER areas which will never succeed. Callers may
> >>>> +	 * be using allocators in order of preference for an area that is
> >>>> +	 * too large.
> >>>> +	 */
> >>>> +	if (order >= MAX_ORDER) {
> >>>
> >>> Also, why not to add BUG_ON(gfp_mask & __GFP_NOFAIL); here?
> >>
> >> Because we do not want to blow up the kernel just because of a stupid
> >> usage of the allocator. Can you think of an example where it would
> >> actually make any sense?
> >>
> >> I would argue that such a theoretical abuse would blow up on an
> >> unchecked NULL ptr access. Isn't that enough?
> > 
> > Agreed.
> > 
> 
> If someone has written a module with __GFP_NOFAIL for an architecture
> where PAGE_SIZE == 2048KB, and someone else tried to use that module on
> another architecture where PAGE_SIZE == 4KB. You are saying that
> triggering NULL pointer dereference is a fault of that user's ignorance
> about MM. You are saying that everyone knows internal of MM. Sad...

What kind of argument is this? Seriously! We do consider GFP_NOFAIL
problematic even for !order-0 requests and warn appropriately. Talking
about anything getting close to MAX_ORDER is just a crazy talk. In any
case this is largely tangential to the issue reported here.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
  2018-11-09  9:56     ` Michal Hocko
  2018-11-09 10:07       ` Tetsuo Handa
  2018-11-09 10:10       ` Vlastimil Babka
@ 2018-11-09 10:52       ` Balbir Singh
  2 siblings, 0 replies; 22+ messages in thread
From: Balbir Singh @ 2018-11-09 10:52 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, Kyungtae Kim, akpm, pavel.tatashin, vbabka,
	osalvador, rppt, aaron.lu, iamjoonsoo.kim, alexander.h.duyck,
	mgorman, lifeasageek, threeearcat, syzkaller, linux-kernel,
	linux-mm, Konstantin Khlebnikov

On Fri, Nov 09, 2018 at 10:56:04AM +0100, Michal Hocko wrote:
> On Fri 09-11-18 18:41:53, Tetsuo Handa wrote:
> > On 2018/11/09 17:43, Michal Hocko wrote:
> > > @@ -4364,6 +4353,17 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> > >  	gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
> > >  	struct alloc_context ac = { };
> > >  
> > > +	/*
> > > +	 * In the slowpath, we sanity check order to avoid ever trying to
> > 
> > Please keep the comment up to dated.
> 
> Does this following look better?
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 9fc10a1029cf..bf9aecba4222 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4354,10 +4354,8 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
>  	struct alloc_context ac = { };
>  
>  	/*
> -	 * In the slowpath, we sanity check order to avoid ever trying to
> -	 * reclaim >= MAX_ORDER areas which will never succeed. Callers may
> -	 * be using allocators in order of preference for an area that is
> -	 * too large.
> +	 * There are several places where we assume that the order value is sane
> +	 * so bail out early if the request is out of bound.
>  	 */
>  	if (order >= MAX_ORDER) {
>  		WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));

if (unlikely()) might help

> 
> > I don't like that comments in OOM code is outdated.
> > 
> > > +	 * reclaim >= MAX_ORDER areas which will never succeed. Callers may
> > > +	 * be using allocators in order of preference for an area that is
> > > +	 * too large.
> > > +	 */
> > > +	if (order >= MAX_ORDER) {
> > 
> > Also, why not to add BUG_ON(gfp_mask & __GFP_NOFAIL); here?
> 
> Because we do not want to blow up the kernel just because of a stupid
> usage of the allocator. Can you think of an example where it would
> actually make any sense?
> 
> I would argue that such a theoretical abuse would blow up on an
> unchecked NULL ptr access. Isn't that enough?
> -- 

Balbir Singh.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
  2018-11-09  8:43 ` Michal Hocko
  2018-11-09  9:41   ` Tetsuo Handa
  2018-11-09  9:56   ` Mel Gorman
@ 2018-11-13  9:43   ` Michal Hocko
  2018-11-13 23:15     ` Andrew Morton
  2018-11-13 23:29     ` Andrew Morton
  2 siblings, 2 replies; 22+ messages in thread
From: Michal Hocko @ 2018-11-13  9:43 UTC (permalink / raw)
  To: akpm
  Cc: Kyungtae Kim, pavel.tatashin, vbabka, osalvador, rppt, aaron.lu,
	iamjoonsoo.kim, alexander.h.duyck, mgorman, lifeasageek,
	threeearcat, syzkaller, linux-kernel, linux-mm,
	Konstantin Khlebnikov

Andrew, could you take the patch please? This patch contains a comment
update as suggested by Tetsuo. I do not think there were any other
unresolved concerns.


From 9ad6b1d9c07b18dd25a6af8cccbc56d1fbe6b922 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Fri, 9 Nov 2018 09:35:29 +0100
Subject: [PATCH] mm, page_alloc: check for max order in hot path

Konstantin has noticed that kvmalloc might trigger the following warning
[Thu Nov  1 08:43:56 2018] WARNING: CPU: 0 PID: 6676 at mm/vmstat.c:986 __fragmentation_index+0x54/0x60
[...]
[Thu Nov  1 08:43:56 2018] Call Trace:
[Thu Nov  1 08:43:56 2018]  fragmentation_index+0x76/0x90
[Thu Nov  1 08:43:56 2018]  compaction_suitable+0x4f/0xf0
[Thu Nov  1 08:43:56 2018]  shrink_node+0x295/0x310
[Thu Nov  1 08:43:56 2018]  node_reclaim+0x205/0x250
[Thu Nov  1 08:43:56 2018]  get_page_from_freelist+0x649/0xad0
[Thu Nov  1 08:43:56 2018]  ? get_page_from_freelist+0x2d4/0xad0
[Thu Nov  1 08:43:56 2018]  ? release_sock+0x19/0x90
[Thu Nov  1 08:43:56 2018]  ? do_ipv6_setsockopt.isra.5+0x10da/0x1290
[Thu Nov  1 08:43:56 2018]  __alloc_pages_nodemask+0x12a/0x2a0
[Thu Nov  1 08:43:56 2018]  kmalloc_large_node+0x47/0x90
[Thu Nov  1 08:43:56 2018]  __kmalloc_node+0x22b/0x2e0
[Thu Nov  1 08:43:56 2018]  kvmalloc_node+0x3e/0x70
[Thu Nov  1 08:43:56 2018]  xt_alloc_table_info+0x3a/0x80 [x_tables]
[Thu Nov  1 08:43:56 2018]  do_ip6t_set_ctl+0xcd/0x1c0 [ip6_tables]
[Thu Nov  1 08:43:56 2018]  nf_setsockopt+0x44/0x60
[Thu Nov  1 08:43:56 2018]  SyS_setsockopt+0x6f/0xc0
[Thu Nov  1 08:43:56 2018]  do_syscall_64+0x67/0x120
[Thu Nov  1 08:43:56 2018]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2

the problem is that we only check for an out of bound order in the slow
path and the node reclaim might happen from the fast path already. This
is fixable by making sure that kvmalloc doesn't ever use kmalloc for
requests that are larger than KMALLOC_MAX_SIZE but this also shows that
the code is rather fragile. A recent UBSAN report just underlines that
by the following report

 UBSAN: Undefined behaviour in mm/page_alloc.c:3117:19
 shift exponent 51 is too large for 32-bit type 'int'
 CPU: 0 PID: 6520 Comm: syz-executor1 Not tainted 4.19.0-rc2 #1
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0xd2/0x148 lib/dump_stack.c:113
  ubsan_epilogue+0x12/0x94 lib/ubsan.c:159
  __ubsan_handle_shift_out_of_bounds+0x2b6/0x30b lib/ubsan.c:425
  __zone_watermark_ok+0x2c7/0x400 mm/page_alloc.c:3117
  zone_watermark_fast mm/page_alloc.c:3216 [inline]
  get_page_from_freelist+0xc49/0x44c0 mm/page_alloc.c:3300
  __alloc_pages_nodemask+0x21e/0x640 mm/page_alloc.c:4370
  alloc_pages_current+0xcc/0x210 mm/mempolicy.c:2093
  alloc_pages include/linux/gfp.h:509 [inline]
  __get_free_pages+0x12/0x60 mm/page_alloc.c:4414
  dma_mem_alloc+0x36/0x50 arch/x86/include/asm/floppy.h:156
  raw_cmd_copyin drivers/block/floppy.c:3159 [inline]
  raw_cmd_ioctl drivers/block/floppy.c:3206 [inline]
  fd_locked_ioctl+0xa00/0x2c10 drivers/block/floppy.c:3544
  fd_ioctl+0x40/0x60 drivers/block/floppy.c:3571
  __blkdev_driver_ioctl block/ioctl.c:303 [inline]
  blkdev_ioctl+0xb3c/0x1a30 block/ioctl.c:601
  block_ioctl+0x105/0x150 fs/block_dev.c:1883
  vfs_ioctl fs/ioctl.c:46 [inline]
  do_vfs_ioctl+0x1c0/0x1150 fs/ioctl.c:687
  ksys_ioctl+0x9e/0xb0 fs/ioctl.c:702
  __do_sys_ioctl fs/ioctl.c:709 [inline]
  __se_sys_ioctl fs/ioctl.c:707 [inline]
  __x64_sys_ioctl+0x7e/0xc0 fs/ioctl.c:707
  do_syscall_64+0xc4/0x510 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Note that this is not a kvmalloc path. It is just that the fast path
really depends on having sanitzed order as well. Therefore move the
order check to the fast path.

Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reported-by: Kyungtae Kim <kt0755@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/page_alloc.c | 20 +++++++++-----------
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a919ba5cb3c8..a2b68c513e22 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4060,17 +4060,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	unsigned int cpuset_mems_cookie;
 	int reserve_flags;
 
-	/*
-	 * In the slowpath, we sanity check order to avoid ever trying to
-	 * reclaim >= MAX_ORDER areas which will never succeed. Callers may
-	 * be using allocators in order of preference for an area that is
-	 * too large.
-	 */
-	if (order >= MAX_ORDER) {
-		WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
-		return NULL;
-	}
-
 	/*
 	 * We also sanity check to catch abuse of atomic reserves being used by
 	 * callers that are not in atomic context.
@@ -4364,6 +4353,15 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
 	gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
 	struct alloc_context ac = { };
 
+	/*
+	 * There are several places where we assume that the order value is sane
+	 * so bail out early if the request is out of bound.
+	 */
+	if (unlikely(order >= MAX_ORDER)) {
+		WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
+		return NULL;
+	}
+
 	gfp_mask &= gfp_allowed_mask;
 	alloc_mask = gfp_mask;
 	if (!prepare_alloc_pages(gfp_mask, order, preferred_nid, nodemask, &ac, &alloc_mask, &alloc_flags))
-- 
2.19.1

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
  2018-11-13  9:43   ` Michal Hocko
@ 2018-11-13 23:15     ` Andrew Morton
  2018-11-13 23:23       ` Vlastimil Babka
  2018-11-13 23:29     ` Andrew Morton
  1 sibling, 1 reply; 22+ messages in thread
From: Andrew Morton @ 2018-11-13 23:15 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Kyungtae Kim, pavel.tatashin, vbabka, osalvador, rppt, aaron.lu,
	iamjoonsoo.kim, alexander.h.duyck, mgorman, lifeasageek,
	threeearcat, syzkaller, linux-kernel, linux-mm,
	Konstantin Khlebnikov

On Tue, 13 Nov 2018 10:43:05 +0100 Michal Hocko <mhocko@kernel.org> wrote:

> Konstantin has noticed that kvmalloc might trigger the following warning
> [Thu Nov  1 08:43:56 2018] WARNING: CPU: 0 PID: 6676 at mm/vmstat.c:986 __fragmentation_index+0x54/0x60
> [...]
> [Thu Nov  1 08:43:56 2018] Call Trace:
> [Thu Nov  1 08:43:56 2018]  fragmentation_index+0x76/0x90
> [Thu Nov  1 08:43:56 2018]  compaction_suitable+0x4f/0xf0
> [Thu Nov  1 08:43:56 2018]  shrink_node+0x295/0x310
> [Thu Nov  1 08:43:56 2018]  node_reclaim+0x205/0x250
> [Thu Nov  1 08:43:56 2018]  get_page_from_freelist+0x649/0xad0
> [Thu Nov  1 08:43:56 2018]  ? get_page_from_freelist+0x2d4/0xad0
> [Thu Nov  1 08:43:56 2018]  ? release_sock+0x19/0x90
> [Thu Nov  1 08:43:56 2018]  ? do_ipv6_setsockopt.isra.5+0x10da/0x1290
> [Thu Nov  1 08:43:56 2018]  __alloc_pages_nodemask+0x12a/0x2a0
> [Thu Nov  1 08:43:56 2018]  kmalloc_large_node+0x47/0x90
> [Thu Nov  1 08:43:56 2018]  __kmalloc_node+0x22b/0x2e0
> [Thu Nov  1 08:43:56 2018]  kvmalloc_node+0x3e/0x70
> [Thu Nov  1 08:43:56 2018]  xt_alloc_table_info+0x3a/0x80 [x_tables]
> [Thu Nov  1 08:43:56 2018]  do_ip6t_set_ctl+0xcd/0x1c0 [ip6_tables]
> [Thu Nov  1 08:43:56 2018]  nf_setsockopt+0x44/0x60
> [Thu Nov  1 08:43:56 2018]  SyS_setsockopt+0x6f/0xc0
> [Thu Nov  1 08:43:56 2018]  do_syscall_64+0x67/0x120
> [Thu Nov  1 08:43:56 2018]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
>
> ...
> 
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4364,6 +4353,15 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
>  	gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
>  	struct alloc_context ac = { };
>  
> +	/*
> +	 * There are several places where we assume that the order value is sane
> +	 * so bail out early if the request is out of bound.
> +	 */
> +	if (unlikely(order >= MAX_ORDER)) {
> +		WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
> +		return NULL;
> +	}
> +

I know "everybody enables CONFIG_DEBUG_VM", but given this is fastpath,
we could help those who choose not to enable it by using

#ifdef CONFIG_DEBUG_VM
	if (WARN_ON_ONCE(order >= MAX_ORDER && !(gfp_mask & __GFP_NOWARN)))
		return NULL;
#endif

(Again curses 91241681c62 ("include/linux/mmdebug.h: make VM_WARN* non-rvals"))

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
  2018-11-13 23:15     ` Andrew Morton
@ 2018-11-13 23:23       ` Vlastimil Babka
  2018-11-13 23:32         ` Andrew Morton
  0 siblings, 1 reply; 22+ messages in thread
From: Vlastimil Babka @ 2018-11-13 23:23 UTC (permalink / raw)
  To: Andrew Morton, Michal Hocko
  Cc: Kyungtae Kim, pavel.tatashin, osalvador, rppt, aaron.lu,
	iamjoonsoo.kim, alexander.h.duyck, mgorman, lifeasageek,
	threeearcat, syzkaller, linux-kernel, linux-mm,
	Konstantin Khlebnikov

On 11/14/18 12:15 AM, Andrew Morton wrote:
> On Tue, 13 Nov 2018 10:43:05 +0100 Michal Hocko <mhocko@kernel.org> wrote:
> 
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -4364,6 +4353,15 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
>>  	gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
>>  	struct alloc_context ac = { };
>>  
>> +	/*
>> +	 * There are several places where we assume that the order value is sane
>> +	 * so bail out early if the request is out of bound.
>> +	 */
>> +	if (unlikely(order >= MAX_ORDER)) {
>> +		WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
>> +		return NULL;
>> +	}
>> +
> 
> I know "everybody enables CONFIG_DEBUG_VM", but given this is fastpath,
> we could help those who choose not to enable it by using
> 
> #ifdef CONFIG_DEBUG_VM
> 	if (WARN_ON_ONCE(order >= MAX_ORDER && !(gfp_mask & __GFP_NOWARN)))
> 		return NULL;
> #endif

Hmm, but that would mean there's still potential undefined behavior for
!CONFIG_DEBUG_VM, so I would prefer not to do it like that.

> 
> (Again curses 91241681c62 ("include/linux/mmdebug.h: make VM_WARN* non-rvals"))
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
  2018-11-13  9:43   ` Michal Hocko
  2018-11-13 23:15     ` Andrew Morton
@ 2018-11-13 23:29     ` Andrew Morton
  2018-11-14  7:10       ` Michal Hocko
  2018-11-16 23:43       ` Dmitry Vyukov
  1 sibling, 2 replies; 22+ messages in thread
From: Andrew Morton @ 2018-11-13 23:29 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Kyungtae Kim, pavel.tatashin, vbabka, osalvador, rppt, aaron.lu,
	iamjoonsoo.kim, alexander.h.duyck, mgorman, lifeasageek,
	threeearcat, syzkaller, linux-kernel, linux-mm,
	Konstantin Khlebnikov

On Tue, 13 Nov 2018 10:43:05 +0100 Michal Hocko <mhocko@kernel.org> wrote:

> From: Michal Hocko <mhocko@suse.com>
> Date: Fri, 9 Nov 2018 09:35:29 +0100
> Subject: [PATCH] mm, page_alloc: check for max order in hot path
> 
> Konstantin has noticed that kvmalloc might trigger the following warning
> [Thu Nov  1 08:43:56 2018] WARNING: CPU: 0 PID: 6676 at mm/vmstat.c:986 __fragmentation_index+0x54/0x60

um, wait...

> [...]
> [Thu Nov  1 08:43:56 2018] Call Trace:
> [Thu Nov  1 08:43:56 2018]  fragmentation_index+0x76/0x90
> [Thu Nov  1 08:43:56 2018]  compaction_suitable+0x4f/0xf0
> [Thu Nov  1 08:43:56 2018]  shrink_node+0x295/0x310
> [Thu Nov  1 08:43:56 2018]  node_reclaim+0x205/0x250
> [Thu Nov  1 08:43:56 2018]  get_page_from_freelist+0x649/0xad0
> [Thu Nov  1 08:43:56 2018]  ? get_page_from_freelist+0x2d4/0xad0
> [Thu Nov  1 08:43:56 2018]  ? release_sock+0x19/0x90
> [Thu Nov  1 08:43:56 2018]  ? do_ipv6_setsockopt.isra.5+0x10da/0x1290
> [Thu Nov  1 08:43:56 2018]  __alloc_pages_nodemask+0x12a/0x2a0
> [Thu Nov  1 08:43:56 2018]  kmalloc_large_node+0x47/0x90
> [Thu Nov  1 08:43:56 2018]  __kmalloc_node+0x22b/0x2e0
> [Thu Nov  1 08:43:56 2018]  kvmalloc_node+0x3e/0x70
> [Thu Nov  1 08:43:56 2018]  xt_alloc_table_info+0x3a/0x80 [x_tables]
> [Thu Nov  1 08:43:56 2018]  do_ip6t_set_ctl+0xcd/0x1c0 [ip6_tables]
> [Thu Nov  1 08:43:56 2018]  nf_setsockopt+0x44/0x60
> [Thu Nov  1 08:43:56 2018]  SyS_setsockopt+0x6f/0xc0
> [Thu Nov  1 08:43:56 2018]  do_syscall_64+0x67/0x120
> [Thu Nov  1 08:43:56 2018]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2

If kvalloc_node() is going to call kmalloc() without checking for a
huge allocation request then surely it should set __GFP_NOWARN.  And it
shouldn't bother at all if size > KMALLOC_MAX_SIZE, surely?  So
something like

--- a/mm/util.c~a
+++ a/mm/util.c
@@ -393,11 +393,16 @@ void *kvmalloc_node(size_t size, gfp_t f
 	void *ret;
 
 	/*
-	 * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
-	 * so the given set of flags has to be compatible.
+	 * vmalloc uses GFP_KERNEL for some internal allocations (e.g page
+	 * tables) so the given set of flags has to be compatible.
 	 */
-	if ((flags & GFP_KERNEL) != GFP_KERNEL)
+	if ((flags & GFP_KERNEL) != GFP_KERNEL) {
+		if (size > KMALLOC_MAX_SIZE)
+			return NULL;
+		if (size > PAGE_SIZE)
+			flags |= __GFP_NOWARN;
 		return kmalloc_node(size, flags, node);
+	}
 
 	/*
 	 * We want to attempt a large physically contiguous block first because


> the problem is that we only check for an out of bound order in the slow
> path and the node reclaim might happen from the fast path already. This
> is fixable by making sure that kvmalloc doesn't ever use kmalloc for
> requests that are larger than KMALLOC_MAX_SIZE but this also shows that
> the code is rather fragile. A recent UBSAN report just underlines that
> by the following report
> 
>  UBSAN: Undefined behaviour in mm/page_alloc.c:3117:19
>  shift exponent 51 is too large for 32-bit type 'int'
>  CPU: 0 PID: 6520 Comm: syz-executor1 Not tainted 4.19.0-rc2 #1
>  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>  Call Trace:
>   __dump_stack lib/dump_stack.c:77 [inline]
>   dump_stack+0xd2/0x148 lib/dump_stack.c:113
>   ubsan_epilogue+0x12/0x94 lib/ubsan.c:159
>   __ubsan_handle_shift_out_of_bounds+0x2b6/0x30b lib/ubsan.c:425
>   __zone_watermark_ok+0x2c7/0x400 mm/page_alloc.c:3117
>   zone_watermark_fast mm/page_alloc.c:3216 [inline]
>   get_page_from_freelist+0xc49/0x44c0 mm/page_alloc.c:3300
>   __alloc_pages_nodemask+0x21e/0x640 mm/page_alloc.c:4370
>   alloc_pages_current+0xcc/0x210 mm/mempolicy.c:2093
>   alloc_pages include/linux/gfp.h:509 [inline]
>   __get_free_pages+0x12/0x60 mm/page_alloc.c:4414
>   dma_mem_alloc+0x36/0x50 arch/x86/include/asm/floppy.h:156
>   raw_cmd_copyin drivers/block/floppy.c:3159 [inline]
>   raw_cmd_ioctl drivers/block/floppy.c:3206 [inline]
>   fd_locked_ioctl+0xa00/0x2c10 drivers/block/floppy.c:3544
>   fd_ioctl+0x40/0x60 drivers/block/floppy.c:3571
>   __blkdev_driver_ioctl block/ioctl.c:303 [inline]
>   blkdev_ioctl+0xb3c/0x1a30 block/ioctl.c:601
>   block_ioctl+0x105/0x150 fs/block_dev.c:1883
>   vfs_ioctl fs/ioctl.c:46 [inline]
>   do_vfs_ioctl+0x1c0/0x1150 fs/ioctl.c:687
>   ksys_ioctl+0x9e/0xb0 fs/ioctl.c:702
>   __do_sys_ioctl fs/ioctl.c:709 [inline]
>   __se_sys_ioctl fs/ioctl.c:707 [inline]
>   __x64_sys_ioctl+0x7e/0xc0 fs/ioctl.c:707
>   do_syscall_64+0xc4/0x510 arch/x86/entry/common.c:290
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe

And we could fix this in the floppy driver.

> Note that this is not a kvmalloc path. It is just that the fast path
> really depends on having sanitzed order as well. Therefore move the
> order check to the fast path.

But do we really need to do this?  Are there any other known potential
callsites?


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
  2018-11-13 23:23       ` Vlastimil Babka
@ 2018-11-13 23:32         ` Andrew Morton
  2018-11-14  0:05           ` Vlastimil Babka
  0 siblings, 1 reply; 22+ messages in thread
From: Andrew Morton @ 2018-11-13 23:32 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Michal Hocko, Kyungtae Kim, pavel.tatashin, osalvador, rppt,
	aaron.lu, iamjoonsoo.kim, alexander.h.duyck, mgorman,
	lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm,
	Konstantin Khlebnikov

On Wed, 14 Nov 2018 00:23:28 +0100 Vlastimil Babka <vbabka@suse.cz> wrote:

> On 11/14/18 12:15 AM, Andrew Morton wrote:
> > On Tue, 13 Nov 2018 10:43:05 +0100 Michal Hocko <mhocko@kernel.org> wrote:
> > 
> >> --- a/mm/page_alloc.c
> >> +++ b/mm/page_alloc.c
> >> @@ -4364,6 +4353,15 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> >>  	gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
> >>  	struct alloc_context ac = { };
> >>  
> >> +	/*
> >> +	 * There are several places where we assume that the order value is sane
> >> +	 * so bail out early if the request is out of bound.
> >> +	 */
> >> +	if (unlikely(order >= MAX_ORDER)) {
> >> +		WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
> >> +		return NULL;
> >> +	}
> >> +
> > 
> > I know "everybody enables CONFIG_DEBUG_VM", but given this is fastpath,
> > we could help those who choose not to enable it by using
> > 
> > #ifdef CONFIG_DEBUG_VM
> > 	if (WARN_ON_ONCE(order >= MAX_ORDER && !(gfp_mask & __GFP_NOWARN)))
> > 		return NULL;
> > #endif
> 
> Hmm, but that would mean there's still potential undefined behavior for
> !CONFIG_DEBUG_VM, so I would prefer not to do it like that.
> 

What does "potential undefined behavior" mean here?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
  2018-11-13 23:32         ` Andrew Morton
@ 2018-11-14  0:05           ` Vlastimil Babka
  2018-11-19 23:57             ` Pavel Machek
  0 siblings, 1 reply; 22+ messages in thread
From: Vlastimil Babka @ 2018-11-14  0:05 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, Kyungtae Kim, pavel.tatashin, osalvador, rppt,
	aaron.lu, iamjoonsoo.kim, alexander.h.duyck, mgorman,
	lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm,
	Konstantin Khlebnikov

On 11/14/18 12:32 AM, Andrew Morton wrote:
> On Wed, 14 Nov 2018 00:23:28 +0100 Vlastimil Babka <vbabka@suse.cz> wrote:
> 
>> On 11/14/18 12:15 AM, Andrew Morton wrote:
>>> On Tue, 13 Nov 2018 10:43:05 +0100 Michal Hocko <mhocko@kernel.org> wrote:
>>>
>>>> --- a/mm/page_alloc.c
>>>> +++ b/mm/page_alloc.c
>>>> @@ -4364,6 +4353,15 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
>>>>  	gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
>>>>  	struct alloc_context ac = { };
>>>>  
>>>> +	/*
>>>> +	 * There are several places where we assume that the order value is sane
>>>> +	 * so bail out early if the request is out of bound.
>>>> +	 */
>>>> +	if (unlikely(order >= MAX_ORDER)) {
>>>> +		WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
>>>> +		return NULL;
>>>> +	}
>>>> +
>>>
>>> I know "everybody enables CONFIG_DEBUG_VM", but given this is fastpath,
>>> we could help those who choose not to enable it by using
>>>
>>> #ifdef CONFIG_DEBUG_VM
>>> 	if (WARN_ON_ONCE(order >= MAX_ORDER && !(gfp_mask & __GFP_NOWARN)))
>>> 		return NULL;
>>> #endif
>>
>> Hmm, but that would mean there's still potential undefined behavior for
>> !CONFIG_DEBUG_VM, so I would prefer not to do it like that.
>>
> 
> What does "potential undefined behavior" mean here?

I mean that it becomes undefined once a caller with order >= MAX_ORDER
appears. Worse if it's directly due to a userspace action, like in this
case.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
  2018-11-13 23:29     ` Andrew Morton
@ 2018-11-14  7:10       ` Michal Hocko
  2018-11-16 23:43       ` Dmitry Vyukov
  1 sibling, 0 replies; 22+ messages in thread
From: Michal Hocko @ 2018-11-14  7:10 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kyungtae Kim, pavel.tatashin, vbabka, osalvador, rppt, aaron.lu,
	iamjoonsoo.kim, alexander.h.duyck, mgorman, lifeasageek,
	threeearcat, syzkaller, linux-kernel, linux-mm,
	Konstantin Khlebnikov

On Tue 13-11-18 15:29:41, Andrew Morton wrote:
[...]
> But do we really need to do this?  Are there any other known potential
> callsites?

The main point is that the code as it stands is quite fragile, isn't it?
Fixing up all the callers is possible but can you actually think of a
reason why this would cause any measurable effect in the fast path?
The order argument is usually in a register and comparing it to a number
with unlikely branch should be hardly something visible.

Besides that we are talking few cycles at best compared to a fragile
code that got broken by accident without anybody noticing for quite some
time.

I vote for the maintainability over few cycles here. Should anybody find
this measurable we can rework the code by other means.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
  2018-11-13 23:29     ` Andrew Morton
  2018-11-14  7:10       ` Michal Hocko
@ 2018-11-16 23:43       ` Dmitry Vyukov
  1 sibling, 0 replies; 22+ messages in thread
From: Dmitry Vyukov @ 2018-11-16 23:43 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, Kyungtae Kim, pavel.tatashin, Vlastimil Babka,
	osalvador, Mike Rapoport, aaron.lu, Joonsoo Kim,
	alexander.h.duyck, Mel Gorman, lifeasageek, Dae R. Jeong,
	syzkaller, LKML, Linux-MM, Konstantin Khlebnikov

On Tue, Nov 13, 2018 at 3:29 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Tue, 13 Nov 2018 10:43:05 +0100 Michal Hocko <mhocko@kernel.org> wrote:
>
>> From: Michal Hocko <mhocko@suse.com>
>> Date: Fri, 9 Nov 2018 09:35:29 +0100
>> Subject: [PATCH] mm, page_alloc: check for max order in hot path
>>
>> Konstantin has noticed that kvmalloc might trigger the following warning
>> [Thu Nov  1 08:43:56 2018] WARNING: CPU: 0 PID: 6676 at mm/vmstat.c:986 __fragmentation_index+0x54/0x60
>
> um, wait...
>
>> [...]
>> [Thu Nov  1 08:43:56 2018] Call Trace:
>> [Thu Nov  1 08:43:56 2018]  fragmentation_index+0x76/0x90
>> [Thu Nov  1 08:43:56 2018]  compaction_suitable+0x4f/0xf0
>> [Thu Nov  1 08:43:56 2018]  shrink_node+0x295/0x310
>> [Thu Nov  1 08:43:56 2018]  node_reclaim+0x205/0x250
>> [Thu Nov  1 08:43:56 2018]  get_page_from_freelist+0x649/0xad0
>> [Thu Nov  1 08:43:56 2018]  ? get_page_from_freelist+0x2d4/0xad0
>> [Thu Nov  1 08:43:56 2018]  ? release_sock+0x19/0x90
>> [Thu Nov  1 08:43:56 2018]  ? do_ipv6_setsockopt.isra.5+0x10da/0x1290
>> [Thu Nov  1 08:43:56 2018]  __alloc_pages_nodemask+0x12a/0x2a0
>> [Thu Nov  1 08:43:56 2018]  kmalloc_large_node+0x47/0x90
>> [Thu Nov  1 08:43:56 2018]  __kmalloc_node+0x22b/0x2e0
>> [Thu Nov  1 08:43:56 2018]  kvmalloc_node+0x3e/0x70
>> [Thu Nov  1 08:43:56 2018]  xt_alloc_table_info+0x3a/0x80 [x_tables]
>> [Thu Nov  1 08:43:56 2018]  do_ip6t_set_ctl+0xcd/0x1c0 [ip6_tables]
>> [Thu Nov  1 08:43:56 2018]  nf_setsockopt+0x44/0x60
>> [Thu Nov  1 08:43:56 2018]  SyS_setsockopt+0x6f/0xc0
>> [Thu Nov  1 08:43:56 2018]  do_syscall_64+0x67/0x120
>> [Thu Nov  1 08:43:56 2018]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
>
> If kvalloc_node() is going to call kmalloc() without checking for a
> huge allocation request then surely it should set __GFP_NOWARN.


kmalloc won't warn about large allocations after "mm: don't warn about
large allocations for slab":
https://lkml.org/lkml/2018/9/27/1156
It will just return NULL. That was already the case for slub.


> And it
> shouldn't bother at all if size > KMALLOC_MAX_SIZE, surely?  So
> something like
>
> --- a/mm/util.c~a
> +++ a/mm/util.c
> @@ -393,11 +393,16 @@ void *kvmalloc_node(size_t size, gfp_t f
>         void *ret;
>
>         /*
> -        * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
> -        * so the given set of flags has to be compatible.
> +        * vmalloc uses GFP_KERNEL for some internal allocations (e.g page
> +        * tables) so the given set of flags has to be compatible.
>          */
> -       if ((flags & GFP_KERNEL) != GFP_KERNEL)
> +       if ((flags & GFP_KERNEL) != GFP_KERNEL) {
> +               if (size > KMALLOC_MAX_SIZE)
> +                       return NULL;
> +               if (size > PAGE_SIZE)
> +                       flags |= __GFP_NOWARN;
>                 return kmalloc_node(size, flags, node);
> +       }
>
>         /*
>          * We want to attempt a large physically contiguous block first because
>
>
>> the problem is that we only check for an out of bound order in the slow
>> path and the node reclaim might happen from the fast path already. This
>> is fixable by making sure that kvmalloc doesn't ever use kmalloc for
>> requests that are larger than KMALLOC_MAX_SIZE but this also shows that
>> the code is rather fragile. A recent UBSAN report just underlines that
>> by the following report
>>
>>  UBSAN: Undefined behaviour in mm/page_alloc.c:3117:19
>>  shift exponent 51 is too large for 32-bit type 'int'
>>  CPU: 0 PID: 6520 Comm: syz-executor1 Not tainted 4.19.0-rc2 #1
>>  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>>  Call Trace:
>>   __dump_stack lib/dump_stack.c:77 [inline]
>>   dump_stack+0xd2/0x148 lib/dump_stack.c:113
>>   ubsan_epilogue+0x12/0x94 lib/ubsan.c:159
>>   __ubsan_handle_shift_out_of_bounds+0x2b6/0x30b lib/ubsan.c:425
>>   __zone_watermark_ok+0x2c7/0x400 mm/page_alloc.c:3117
>>   zone_watermark_fast mm/page_alloc.c:3216 [inline]
>>   get_page_from_freelist+0xc49/0x44c0 mm/page_alloc.c:3300
>>   __alloc_pages_nodemask+0x21e/0x640 mm/page_alloc.c:4370
>>   alloc_pages_current+0xcc/0x210 mm/mempolicy.c:2093
>>   alloc_pages include/linux/gfp.h:509 [inline]
>>   __get_free_pages+0x12/0x60 mm/page_alloc.c:4414
>>   dma_mem_alloc+0x36/0x50 arch/x86/include/asm/floppy.h:156
>>   raw_cmd_copyin drivers/block/floppy.c:3159 [inline]
>>   raw_cmd_ioctl drivers/block/floppy.c:3206 [inline]
>>   fd_locked_ioctl+0xa00/0x2c10 drivers/block/floppy.c:3544
>>   fd_ioctl+0x40/0x60 drivers/block/floppy.c:3571
>>   __blkdev_driver_ioctl block/ioctl.c:303 [inline]
>>   blkdev_ioctl+0xb3c/0x1a30 block/ioctl.c:601
>>   block_ioctl+0x105/0x150 fs/block_dev.c:1883
>>   vfs_ioctl fs/ioctl.c:46 [inline]
>>   do_vfs_ioctl+0x1c0/0x1150 fs/ioctl.c:687
>>   ksys_ioctl+0x9e/0xb0 fs/ioctl.c:702
>>   __do_sys_ioctl fs/ioctl.c:709 [inline]
>>   __se_sys_ioctl fs/ioctl.c:707 [inline]
>>   __x64_sys_ioctl+0x7e/0xc0 fs/ioctl.c:707
>>   do_syscall_64+0xc4/0x510 arch/x86/entry/common.c:290
>>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> And we could fix this in the floppy driver.
>
>> Note that this is not a kvmalloc path. It is just that the fast path
>> really depends on having sanitzed order as well. Therefore move the
>> order check to the fast path.
>
> But do we really need to do this?  Are there any other known potential
> callsites?
>
> --
> You received this message because you are subscribed to the Google Groups "syzkaller" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller+unsubscribe@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
  2018-11-14  0:05           ` Vlastimil Babka
@ 2018-11-19 23:57             ` Pavel Machek
  0 siblings, 0 replies; 22+ messages in thread
From: Pavel Machek @ 2018-11-19 23:57 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Michal Hocko, Kyungtae Kim, pavel.tatashin,
	osalvador, rppt, aaron.lu, iamjoonsoo.kim, alexander.h.duyck,
	mgorman, lifeasageek, threeearcat, syzkaller, linux-kernel,
	linux-mm, Konstantin Khlebnikov

[-- Attachment #1: Type: text/plain, Size: 1647 bytes --]

Hi1

> >>>> --- a/mm/page_alloc.c
> >>>> +++ b/mm/page_alloc.c
> >>>> @@ -4364,6 +4353,15 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> >>>>  	gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
> >>>>  	struct alloc_context ac = { };
> >>>>  
> >>>> +	/*
> >>>> +	 * There are several places where we assume that the order value is sane
> >>>> +	 * so bail out early if the request is out of bound.
> >>>> +	 */
> >>>> +	if (unlikely(order >= MAX_ORDER)) {
> >>>> +		WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
> >>>> +		return NULL;
> >>>> +	}
> >>>> +
> >>>
> >>> I know "everybody enables CONFIG_DEBUG_VM", but given this is fastpath,
> >>> we could help those who choose not to enable it by using
> >>>
> >>> #ifdef CONFIG_DEBUG_VM
> >>> 	if (WARN_ON_ONCE(order >= MAX_ORDER && !(gfp_mask & __GFP_NOWARN)))
> >>> 		return NULL;
> >>> #endif
> >>
> >> Hmm, but that would mean there's still potential undefined behavior for
> >> !CONFIG_DEBUG_VM, so I would prefer not to do it like that.
> >>
> > 
> > What does "potential undefined behavior" mean here?
> 
> I mean that it becomes undefined once a caller with order >= MAX_ORDER
> appears. Worse if it's directly due to a userspace action, like in this
> case.

We should really check if value from userspace is sane _before_
passing it to alloc_pages(). Anything else is too fragile. Maybe
alloc_pages should do the second check, but...

									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2018-11-19 23:57 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-09  4:09 UBSAN: Undefined behaviour in mm/page_alloc.c Kyungtae Kim
2018-11-09  8:42 ` Vlastimil Babka
2018-11-09  8:43 ` Michal Hocko
2018-11-09  9:41   ` Tetsuo Handa
2018-11-09  9:56     ` Michal Hocko
2018-11-09 10:07       ` Tetsuo Handa
2018-11-09 10:25         ` Michal Hocko
2018-11-09 10:10       ` Vlastimil Babka
2018-11-09 10:22         ` Michal Hocko
2018-11-09 10:24         ` Tetsuo Handa
2018-11-09 10:28           ` Michal Hocko
2018-11-09 10:52       ` Balbir Singh
2018-11-09  9:56   ` Mel Gorman
2018-11-13  9:43   ` Michal Hocko
2018-11-13 23:15     ` Andrew Morton
2018-11-13 23:23       ` Vlastimil Babka
2018-11-13 23:32         ` Andrew Morton
2018-11-14  0:05           ` Vlastimil Babka
2018-11-19 23:57             ` Pavel Machek
2018-11-13 23:29     ` Andrew Morton
2018-11-14  7:10       ` Michal Hocko
2018-11-16 23:43       ` Dmitry Vyukov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).