* UBSAN: Undefined behaviour in mm/page_alloc.c @ 2018-11-09 4:09 Kyungtae Kim 2018-11-09 8:42 ` Vlastimil Babka 2018-11-09 8:43 ` Michal Hocko 0 siblings, 2 replies; 22+ messages in thread From: Kyungtae Kim @ 2018-11-09 4:09 UTC (permalink / raw) To: akpm, mhocko, pavel.tatashin, vbabka, osalvador, rppt, aaron.lu, iamjoonsoo.kim, alexander.h.duyck, mgorman Cc: lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm We report a bug in v4.19-rc2 (4.20-rc1 as well, I guess): kernel config: https://kt0755.github.io/etc/config_v2-4.19 repro: https://kt0755.github.io/etc/repro.c4074.c In the middle of page request, this arose because order is too large to handle (mm/page_alloc.c:3119). It actually comes from that order is controllable by user input via raw_cmd_ioctl without its sanity check, thereby causing memory problem. To stop it, we can use like MAX_ORDER for bounds check before using it. ========================================= UBSAN: Undefined behaviour in mm/page_alloc.c:3117:19 shift exponent 51 is too large for 32-bit type 'int' CPU: 0 PID: 6520 Comm: syz-executor1 Not tainted 4.19.0-rc2 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xd2/0x148 lib/dump_stack.c:113 ubsan_epilogue+0x12/0x94 lib/ubsan.c:159 __ubsan_handle_shift_out_of_bounds+0x2b6/0x30b lib/ubsan.c:425 __zone_watermark_ok+0x2c7/0x400 mm/page_alloc.c:3117 zone_watermark_fast mm/page_alloc.c:3216 [inline] get_page_from_freelist+0xc49/0x44c0 mm/page_alloc.c:3300 __alloc_pages_nodemask+0x21e/0x640 mm/page_alloc.c:4370 alloc_pages_current+0xcc/0x210 mm/mempolicy.c:2093 alloc_pages include/linux/gfp.h:509 [inline] __get_free_pages+0x12/0x60 mm/page_alloc.c:4414 dma_mem_alloc+0x36/0x50 arch/x86/include/asm/floppy.h:156 raw_cmd_copyin drivers/block/floppy.c:3159 [inline] raw_cmd_ioctl drivers/block/floppy.c:3206 [inline] fd_locked_ioctl+0xa00/0x2c10 drivers/block/floppy.c:3544 fd_ioctl+0x40/0x60 drivers/block/floppy.c:3571 __blkdev_driver_ioctl block/ioctl.c:303 [inline] blkdev_ioctl+0xb3c/0x1a30 block/ioctl.c:601 block_ioctl+0x105/0x150 fs/block_dev.c:1883 vfs_ioctl fs/ioctl.c:46 [inline] do_vfs_ioctl+0x1c0/0x1150 fs/ioctl.c:687 ksys_ioctl+0x9e/0xb0 fs/ioctl.c:702 __do_sys_ioctl fs/ioctl.c:709 [inline] __se_sys_ioctl fs/ioctl.c:707 [inline] __x64_sys_ioctl+0x7e/0xc0 fs/ioctl.c:707 do_syscall_64+0xc4/0x510 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x4497b9 Code: e8 8c 9f 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 9b 6b fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007fb5ef0e2c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007fb5ef0e36cc RCX: 00000000004497b9 RDX: 0000000020000040 RSI: 0000000000000258 RDI: 0000000000000014 RBP: 000000000071bea0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 0000000000005490 R14: 00000000006ed530 R15: 00007fb5ef0e3700 ========================================================= Thanks, Kyungtae Kim ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: UBSAN: Undefined behaviour in mm/page_alloc.c 2018-11-09 4:09 UBSAN: Undefined behaviour in mm/page_alloc.c Kyungtae Kim @ 2018-11-09 8:42 ` Vlastimil Babka 2018-11-09 8:43 ` Michal Hocko 1 sibling, 0 replies; 22+ messages in thread From: Vlastimil Babka @ 2018-11-09 8:42 UTC (permalink / raw) To: Kyungtae Kim, akpm, mhocko, pavel.tatashin, osalvador, rppt, aaron.lu, iamjoonsoo.kim, alexander.h.duyck, mgorman Cc: lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm, Konstantin Khlebnikov On 11/9/18 5:09 AM, Kyungtae Kim wrote: > We report a bug in v4.19-rc2 (4.20-rc1 as well, I guess): > > kernel config: https://kt0755.github.io/etc/config_v2-4.19 > repro: https://kt0755.github.io/etc/repro.c4074.c > > In the middle of page request, this arose because order is too large to handle > (mm/page_alloc.c:3119). It actually comes from that order is > controllable by user input > via raw_cmd_ioctl without its sanity check, thereby causing memory problem. > To stop it, we can use like MAX_ORDER for bounds check before using it. This together with [1] makes me rather convinced that we should really move the check back from __alloc_pages_slowpath to __alloc_pages_nodemask. It should be a single predictable branch with an unlikely()? [1] https://lore.kernel.org/lkml/154109387197.925352.10499549042420271600.stgit@buzz/T/#u > ========================================= > UBSAN: Undefined behaviour in mm/page_alloc.c:3117:19 > shift exponent 51 is too large for 32-bit type 'int' > CPU: 0 PID: 6520 Comm: syz-executor1 Not tainted 4.19.0-rc2 #1 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 > Call Trace: > __dump_stack lib/dump_stack.c:77 [inline] > dump_stack+0xd2/0x148 lib/dump_stack.c:113 > ubsan_epilogue+0x12/0x94 lib/ubsan.c:159 > __ubsan_handle_shift_out_of_bounds+0x2b6/0x30b lib/ubsan.c:425 > __zone_watermark_ok+0x2c7/0x400 mm/page_alloc.c:3117 > zone_watermark_fast mm/page_alloc.c:3216 [inline] > get_page_from_freelist+0xc49/0x44c0 mm/page_alloc.c:3300 > __alloc_pages_nodemask+0x21e/0x640 mm/page_alloc.c:4370 > alloc_pages_current+0xcc/0x210 mm/mempolicy.c:2093 > alloc_pages include/linux/gfp.h:509 [inline] > __get_free_pages+0x12/0x60 mm/page_alloc.c:4414 > dma_mem_alloc+0x36/0x50 arch/x86/include/asm/floppy.h:156 > raw_cmd_copyin drivers/block/floppy.c:3159 [inline] > raw_cmd_ioctl drivers/block/floppy.c:3206 [inline] > fd_locked_ioctl+0xa00/0x2c10 drivers/block/floppy.c:3544 > fd_ioctl+0x40/0x60 drivers/block/floppy.c:3571 > __blkdev_driver_ioctl block/ioctl.c:303 [inline] > blkdev_ioctl+0xb3c/0x1a30 block/ioctl.c:601 > block_ioctl+0x105/0x150 fs/block_dev.c:1883 > vfs_ioctl fs/ioctl.c:46 [inline] > do_vfs_ioctl+0x1c0/0x1150 fs/ioctl.c:687 > ksys_ioctl+0x9e/0xb0 fs/ioctl.c:702 > __do_sys_ioctl fs/ioctl.c:709 [inline] > __se_sys_ioctl fs/ioctl.c:707 [inline] > __x64_sys_ioctl+0x7e/0xc0 fs/ioctl.c:707 > do_syscall_64+0xc4/0x510 arch/x86/entry/common.c:290 > entry_SYSCALL_64_after_hwframe+0x49/0xbe > RIP: 0033:0x4497b9 > Code: e8 8c 9f 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 > 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d > 01 f0 ff ff 0f 83 9b 6b fc ff c3 66 2e 0f 1f 84 00 00 00 00 > RSP: 002b:00007fb5ef0e2c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 > RAX: ffffffffffffffda RBX: 00007fb5ef0e36cc RCX: 00000000004497b9 > RDX: 0000000020000040 RSI: 0000000000000258 RDI: 0000000000000014 > RBP: 000000000071bea0 R08: 0000000000000000 R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff > R13: 0000000000005490 R14: 00000000006ed530 R15: 00007fb5ef0e3700 > ========================================================= > > > Thanks, > Kyungtae Kim > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: UBSAN: Undefined behaviour in mm/page_alloc.c 2018-11-09 4:09 UBSAN: Undefined behaviour in mm/page_alloc.c Kyungtae Kim 2018-11-09 8:42 ` Vlastimil Babka @ 2018-11-09 8:43 ` Michal Hocko 2018-11-09 9:41 ` Tetsuo Handa ` (2 more replies) 1 sibling, 3 replies; 22+ messages in thread From: Michal Hocko @ 2018-11-09 8:43 UTC (permalink / raw) To: Kyungtae Kim Cc: akpm, pavel.tatashin, vbabka, osalvador, rppt, aaron.lu, iamjoonsoo.kim, alexander.h.duyck, mgorman, lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm, Konstantin Khlebnikov On Thu 08-11-18 23:09:23, Kyungtae Kim wrote: > We report a bug in v4.19-rc2 (4.20-rc1 as well, I guess): > > kernel config: https://kt0755.github.io/etc/config_v2-4.19 > repro: https://kt0755.github.io/etc/repro.c4074.c > > In the middle of page request, this arose because order is too large to handle > (mm/page_alloc.c:3119). It actually comes from that order is > controllable by user input > via raw_cmd_ioctl without its sanity check, thereby causing memory problem. > To stop it, we can use like MAX_ORDER for bounds check before using it. Yes, we do only check the max order in the slow path. We have already discussed something similar with Konstantin [1][2]. Basically kvmalloc for a large size might get to the page allocator with an out of bound order and warn during direct reclaim. I am wondering whether really want to check for the order in the fast path instead. I have hard time to imagine this could cause a measurable impact. The full patch is below [1] http://lkml.kernel.org/r/154109387197.925352.10499549042420271600.stgit@buzz [2] http://lkml.kernel.org/r/154106356066.887821.4649178319705436373.stgit@buzz > > ========================================= > UBSAN: Undefined behaviour in mm/page_alloc.c:3117:19 > shift exponent 51 is too large for 32-bit type 'int' > CPU: 0 PID: 6520 Comm: syz-executor1 Not tainted 4.19.0-rc2 #1 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 > Call Trace: > __dump_stack lib/dump_stack.c:77 [inline] > dump_stack+0xd2/0x148 lib/dump_stack.c:113 > ubsan_epilogue+0x12/0x94 lib/ubsan.c:159 > __ubsan_handle_shift_out_of_bounds+0x2b6/0x30b lib/ubsan.c:425 > __zone_watermark_ok+0x2c7/0x400 mm/page_alloc.c:3117 > zone_watermark_fast mm/page_alloc.c:3216 [inline] > get_page_from_freelist+0xc49/0x44c0 mm/page_alloc.c:3300 > __alloc_pages_nodemask+0x21e/0x640 mm/page_alloc.c:4370 > alloc_pages_current+0xcc/0x210 mm/mempolicy.c:2093 > alloc_pages include/linux/gfp.h:509 [inline] > __get_free_pages+0x12/0x60 mm/page_alloc.c:4414 > dma_mem_alloc+0x36/0x50 arch/x86/include/asm/floppy.h:156 > raw_cmd_copyin drivers/block/floppy.c:3159 [inline] > raw_cmd_ioctl drivers/block/floppy.c:3206 [inline] > fd_locked_ioctl+0xa00/0x2c10 drivers/block/floppy.c:3544 > fd_ioctl+0x40/0x60 drivers/block/floppy.c:3571 > __blkdev_driver_ioctl block/ioctl.c:303 [inline] > blkdev_ioctl+0xb3c/0x1a30 block/ioctl.c:601 > block_ioctl+0x105/0x150 fs/block_dev.c:1883 > vfs_ioctl fs/ioctl.c:46 [inline] > do_vfs_ioctl+0x1c0/0x1150 fs/ioctl.c:687 > ksys_ioctl+0x9e/0xb0 fs/ioctl.c:702 > __do_sys_ioctl fs/ioctl.c:709 [inline] > __se_sys_ioctl fs/ioctl.c:707 [inline] > __x64_sys_ioctl+0x7e/0xc0 fs/ioctl.c:707 > do_syscall_64+0xc4/0x510 arch/x86/entry/common.c:290 > entry_SYSCALL_64_after_hwframe+0x49/0xbe > RIP: 0033:0x4497b9 > Code: e8 8c 9f 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 > 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d > 01 f0 ff ff 0f 83 9b 6b fc ff c3 66 2e 0f 1f 84 00 00 00 00 > RSP: 002b:00007fb5ef0e2c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 > RAX: ffffffffffffffda RBX: 00007fb5ef0e36cc RCX: 00000000004497b9 > RDX: 0000000020000040 RSI: 0000000000000258 RDI: 0000000000000014 > RBP: 000000000071bea0 R08: 0000000000000000 R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff > R13: 0000000000005490 R14: 00000000006ed530 R15: 00007fb5ef0e3700 > ========================================================= > > > Thanks, > Kyungtae Kim ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: UBSAN: Undefined behaviour in mm/page_alloc.c 2018-11-09 8:43 ` Michal Hocko @ 2018-11-09 9:41 ` Tetsuo Handa 2018-11-09 9:56 ` Michal Hocko 2018-11-09 9:56 ` Mel Gorman 2018-11-13 9:43 ` Michal Hocko 2 siblings, 1 reply; 22+ messages in thread From: Tetsuo Handa @ 2018-11-09 9:41 UTC (permalink / raw) To: Michal Hocko, Kyungtae Kim Cc: akpm, pavel.tatashin, vbabka, osalvador, rppt, aaron.lu, iamjoonsoo.kim, alexander.h.duyck, mgorman, lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm, Konstantin Khlebnikov On 2018/11/09 17:43, Michal Hocko wrote: > @@ -4364,6 +4353,17 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid, > gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */ > struct alloc_context ac = { }; > > + /* > + * In the slowpath, we sanity check order to avoid ever trying to Please keep the comment up to dated. I don't like that comments in OOM code is outdated. > + * reclaim >= MAX_ORDER areas which will never succeed. Callers may > + * be using allocators in order of preference for an area that is > + * too large. > + */ > + if (order >= MAX_ORDER) { Also, why not to add BUG_ON(gfp_mask & __GFP_NOFAIL); here? > + WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN)); > + return NULL; > + } > + > gfp_mask &= gfp_allowed_mask; > alloc_mask = gfp_mask; > if (!prepare_alloc_pages(gfp_mask, order, preferred_nid, nodemask, &ac, &alloc_mask, &alloc_flags)) > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: UBSAN: Undefined behaviour in mm/page_alloc.c 2018-11-09 9:41 ` Tetsuo Handa @ 2018-11-09 9:56 ` Michal Hocko 2018-11-09 10:07 ` Tetsuo Handa ` (2 more replies) 0 siblings, 3 replies; 22+ messages in thread From: Michal Hocko @ 2018-11-09 9:56 UTC (permalink / raw) To: Tetsuo Handa Cc: Kyungtae Kim, akpm, pavel.tatashin, vbabka, osalvador, rppt, aaron.lu, iamjoonsoo.kim, alexander.h.duyck, mgorman, lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm, Konstantin Khlebnikov On Fri 09-11-18 18:41:53, Tetsuo Handa wrote: > On 2018/11/09 17:43, Michal Hocko wrote: > > @@ -4364,6 +4353,17 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid, > > gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */ > > struct alloc_context ac = { }; > > > > + /* > > + * In the slowpath, we sanity check order to avoid ever trying to > > Please keep the comment up to dated. Does this following look better? diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 9fc10a1029cf..bf9aecba4222 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -4354,10 +4354,8 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid, struct alloc_context ac = { }; /* - * In the slowpath, we sanity check order to avoid ever trying to - * reclaim >= MAX_ORDER areas which will never succeed. Callers may - * be using allocators in order of preference for an area that is - * too large. + * There are several places where we assume that the order value is sane + * so bail out early if the request is out of bound. */ if (order >= MAX_ORDER) { WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN)); > I don't like that comments in OOM code is outdated. > > > + * reclaim >= MAX_ORDER areas which will never succeed. Callers may > > + * be using allocators in order of preference for an area that is > > + * too large. > > + */ > > + if (order >= MAX_ORDER) { > > Also, why not to add BUG_ON(gfp_mask & __GFP_NOFAIL); here? Because we do not want to blow up the kernel just because of a stupid usage of the allocator. Can you think of an example where it would actually make any sense? I would argue that such a theoretical abuse would blow up on an unchecked NULL ptr access. Isn't that enough? -- Michal Hocko SUSE Labs ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: UBSAN: Undefined behaviour in mm/page_alloc.c 2018-11-09 9:56 ` Michal Hocko @ 2018-11-09 10:07 ` Tetsuo Handa 2018-11-09 10:25 ` Michal Hocko 2018-11-09 10:10 ` Vlastimil Babka 2018-11-09 10:52 ` Balbir Singh 2 siblings, 1 reply; 22+ messages in thread From: Tetsuo Handa @ 2018-11-09 10:07 UTC (permalink / raw) To: Michal Hocko Cc: Kyungtae Kim, akpm, pavel.tatashin, vbabka, osalvador, rppt, aaron.lu, iamjoonsoo.kim, alexander.h.duyck, mgorman, lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm, Konstantin Khlebnikov On 2018/11/09 18:56, Michal Hocko wrote: > Does this following look better? Yes. >> Also, why not to add BUG_ON(gfp_mask & __GFP_NOFAIL); here? > > Because we do not want to blow up the kernel just because of a stupid > usage of the allocator. Can you think of an example where it would > actually make any sense? > > I would argue that such a theoretical abuse would blow up on an > unchecked NULL ptr access. Isn't that enough? We after all can't avoid blowing up the kernel even if we don't add BUG_ON(). Stopping with BUG_ON() is saner than NULL pointer dereference messages. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: UBSAN: Undefined behaviour in mm/page_alloc.c 2018-11-09 10:07 ` Tetsuo Handa @ 2018-11-09 10:25 ` Michal Hocko 0 siblings, 0 replies; 22+ messages in thread From: Michal Hocko @ 2018-11-09 10:25 UTC (permalink / raw) To: Tetsuo Handa Cc: Kyungtae Kim, akpm, pavel.tatashin, vbabka, osalvador, rppt, aaron.lu, iamjoonsoo.kim, alexander.h.duyck, mgorman, lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm, Konstantin Khlebnikov On Fri 09-11-18 19:07:49, Tetsuo Handa wrote: > On 2018/11/09 18:56, Michal Hocko wrote: > > Does this following look better? > > Yes. > > >> Also, why not to add BUG_ON(gfp_mask & __GFP_NOFAIL); here? > > > > Because we do not want to blow up the kernel just because of a stupid > > usage of the allocator. Can you think of an example where it would > > actually make any sense? > > > > I would argue that such a theoretical abuse would blow up on an > > unchecked NULL ptr access. Isn't that enough? > > We after all can't avoid blowing up the kernel even if we don't add BUG_ON(). > Stopping with BUG_ON() is saner than NULL pointer dereference messages. I disagree (strongly to be more explicit). You never know the context the allocator is called from. We do not want to oops with a random state (locks heled etc). If the access blows up in the user then be it, the bug will be clear and to be fixed but BUG_ON on an invalid core kernel function is just a bad idea. I believe Linus was quite explicit about it and I fully agree with him. Besides that this is really off-topic to the issue at hands. Don't you think? -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: UBSAN: Undefined behaviour in mm/page_alloc.c 2018-11-09 9:56 ` Michal Hocko 2018-11-09 10:07 ` Tetsuo Handa @ 2018-11-09 10:10 ` Vlastimil Babka 2018-11-09 10:22 ` Michal Hocko 2018-11-09 10:24 ` Tetsuo Handa 2018-11-09 10:52 ` Balbir Singh 2 siblings, 2 replies; 22+ messages in thread From: Vlastimil Babka @ 2018-11-09 10:10 UTC (permalink / raw) To: Michal Hocko, Tetsuo Handa Cc: Kyungtae Kim, akpm, pavel.tatashin, osalvador, rppt, aaron.lu, iamjoonsoo.kim, alexander.h.duyck, mgorman, lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm, Konstantin Khlebnikov On 11/9/18 10:56 AM, Michal Hocko wrote: > On Fri 09-11-18 18:41:53, Tetsuo Handa wrote: >> On 2018/11/09 17:43, Michal Hocko wrote: >>> @@ -4364,6 +4353,17 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid, >>> gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */ >>> struct alloc_context ac = { }; >>> >>> + /* >>> + * In the slowpath, we sanity check order to avoid ever trying to >> >> Please keep the comment up to dated. > > Does this following look better? > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 9fc10a1029cf..bf9aecba4222 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -4354,10 +4354,8 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid, > struct alloc_context ac = { }; > > /* > - * In the slowpath, we sanity check order to avoid ever trying to > - * reclaim >= MAX_ORDER areas which will never succeed. Callers may > - * be using allocators in order of preference for an area that is > - * too large. > + * There are several places where we assume that the order value is sane > + * so bail out early if the request is out of bound. > */ > if (order >= MAX_ORDER) { > WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN)); Looks ok, but I'd add unlikely(), although it doesn't currently seem to make any difference. You can add Acked-by: Vlastimil Babka <vbabka@suse.cz> >> I don't like that comments in OOM code is outdated. >> >>> + * reclaim >= MAX_ORDER areas which will never succeed. Callers may >>> + * be using allocators in order of preference for an area that is >>> + * too large. >>> + */ >>> + if (order >= MAX_ORDER) { >> >> Also, why not to add BUG_ON(gfp_mask & __GFP_NOFAIL); here? > > Because we do not want to blow up the kernel just because of a stupid > usage of the allocator. Can you think of an example where it would > actually make any sense? > > I would argue that such a theoretical abuse would blow up on an > unchecked NULL ptr access. Isn't that enough? Agreed. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: UBSAN: Undefined behaviour in mm/page_alloc.c 2018-11-09 10:10 ` Vlastimil Babka @ 2018-11-09 10:22 ` Michal Hocko 2018-11-09 10:24 ` Tetsuo Handa 1 sibling, 0 replies; 22+ messages in thread From: Michal Hocko @ 2018-11-09 10:22 UTC (permalink / raw) To: Vlastimil Babka Cc: Tetsuo Handa, Kyungtae Kim, akpm, pavel.tatashin, osalvador, rppt, aaron.lu, iamjoonsoo.kim, alexander.h.duyck, mgorman, lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm, Konstantin Khlebnikov On Fri 09-11-18 11:10:00, Vlastimil Babka wrote: > On 11/9/18 10:56 AM, Michal Hocko wrote: > > On Fri 09-11-18 18:41:53, Tetsuo Handa wrote: > >> On 2018/11/09 17:43, Michal Hocko wrote: > >>> @@ -4364,6 +4353,17 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid, > >>> gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */ > >>> struct alloc_context ac = { }; > >>> > >>> + /* > >>> + * In the slowpath, we sanity check order to avoid ever trying to > >> > >> Please keep the comment up to dated. > > > > Does this following look better? > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > index 9fc10a1029cf..bf9aecba4222 100644 > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -4354,10 +4354,8 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid, > > struct alloc_context ac = { }; > > > > /* > > - * In the slowpath, we sanity check order to avoid ever trying to > > - * reclaim >= MAX_ORDER areas which will never succeed. Callers may > > - * be using allocators in order of preference for an area that is > > - * too large. > > + * There are several places where we assume that the order value is sane > > + * so bail out early if the request is out of bound. > > */ > > if (order >= MAX_ORDER) { > > WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN)); > > Looks ok, but I'd add unlikely(), although it doesn't currently seem to > make any difference. > > You can add Acked-by: Vlastimil Babka <vbabka@suse.cz> OK, I have added both. Thanks! -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: UBSAN: Undefined behaviour in mm/page_alloc.c 2018-11-09 10:10 ` Vlastimil Babka 2018-11-09 10:22 ` Michal Hocko @ 2018-11-09 10:24 ` Tetsuo Handa 2018-11-09 10:28 ` Michal Hocko 1 sibling, 1 reply; 22+ messages in thread From: Tetsuo Handa @ 2018-11-09 10:24 UTC (permalink / raw) To: Vlastimil Babka, Michal Hocko Cc: Kyungtae Kim, akpm, pavel.tatashin, osalvador, rppt, aaron.lu, iamjoonsoo.kim, alexander.h.duyck, mgorman, lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm, Konstantin Khlebnikov On 2018/11/09 19:10, Vlastimil Babka wrote:>>>> + * reclaim >= MAX_ORDER areas which will never succeed. Callers may >>>> + * be using allocators in order of preference for an area that is >>>> + * too large. >>>> + */ >>>> + if (order >= MAX_ORDER) { >>> >>> Also, why not to add BUG_ON(gfp_mask & __GFP_NOFAIL); here? >> >> Because we do not want to blow up the kernel just because of a stupid >> usage of the allocator. Can you think of an example where it would >> actually make any sense? >> >> I would argue that such a theoretical abuse would blow up on an >> unchecked NULL ptr access. Isn't that enough? > > Agreed. > If someone has written a module with __GFP_NOFAIL for an architecture where PAGE_SIZE == 2048KB, and someone else tried to use that module on another architecture where PAGE_SIZE == 4KB. You are saying that triggering NULL pointer dereference is a fault of that user's ignorance about MM. You are saying that everyone knows internal of MM. Sad... ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: UBSAN: Undefined behaviour in mm/page_alloc.c 2018-11-09 10:24 ` Tetsuo Handa @ 2018-11-09 10:28 ` Michal Hocko 0 siblings, 0 replies; 22+ messages in thread From: Michal Hocko @ 2018-11-09 10:28 UTC (permalink / raw) To: Tetsuo Handa Cc: Vlastimil Babka, Kyungtae Kim, akpm, pavel.tatashin, osalvador, rppt, aaron.lu, iamjoonsoo.kim, alexander.h.duyck, mgorman, lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm, Konstantin Khlebnikov On Fri 09-11-18 19:24:48, Tetsuo Handa wrote: > On 2018/11/09 19:10, Vlastimil Babka wrote:>>>> + * reclaim >= MAX_ORDER areas which will never succeed. Callers may > >>>> + * be using allocators in order of preference for an area that is > >>>> + * too large. > >>>> + */ > >>>> + if (order >= MAX_ORDER) { > >>> > >>> Also, why not to add BUG_ON(gfp_mask & __GFP_NOFAIL); here? > >> > >> Because we do not want to blow up the kernel just because of a stupid > >> usage of the allocator. Can you think of an example where it would > >> actually make any sense? > >> > >> I would argue that such a theoretical abuse would blow up on an > >> unchecked NULL ptr access. Isn't that enough? > > > > Agreed. > > > > If someone has written a module with __GFP_NOFAIL for an architecture > where PAGE_SIZE == 2048KB, and someone else tried to use that module on > another architecture where PAGE_SIZE == 4KB. You are saying that > triggering NULL pointer dereference is a fault of that user's ignorance > about MM. You are saying that everyone knows internal of MM. Sad... What kind of argument is this? Seriously! We do consider GFP_NOFAIL problematic even for !order-0 requests and warn appropriately. Talking about anything getting close to MAX_ORDER is just a crazy talk. In any case this is largely tangential to the issue reported here. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: UBSAN: Undefined behaviour in mm/page_alloc.c 2018-11-09 9:56 ` Michal Hocko 2018-11-09 10:07 ` Tetsuo Handa 2018-11-09 10:10 ` Vlastimil Babka @ 2018-11-09 10:52 ` Balbir Singh 2 siblings, 0 replies; 22+ messages in thread From: Balbir Singh @ 2018-11-09 10:52 UTC (permalink / raw) To: Michal Hocko Cc: Tetsuo Handa, Kyungtae Kim, akpm, pavel.tatashin, vbabka, osalvador, rppt, aaron.lu, iamjoonsoo.kim, alexander.h.duyck, mgorman, lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm, Konstantin Khlebnikov On Fri, Nov 09, 2018 at 10:56:04AM +0100, Michal Hocko wrote: > On Fri 09-11-18 18:41:53, Tetsuo Handa wrote: > > On 2018/11/09 17:43, Michal Hocko wrote: > > > @@ -4364,6 +4353,17 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid, > > > gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */ > > > struct alloc_context ac = { }; > > > > > > + /* > > > + * In the slowpath, we sanity check order to avoid ever trying to > > > > Please keep the comment up to dated. > > Does this following look better? > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 9fc10a1029cf..bf9aecba4222 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -4354,10 +4354,8 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid, > struct alloc_context ac = { }; > > /* > - * In the slowpath, we sanity check order to avoid ever trying to > - * reclaim >= MAX_ORDER areas which will never succeed. Callers may > - * be using allocators in order of preference for an area that is > - * too large. > + * There are several places where we assume that the order value is sane > + * so bail out early if the request is out of bound. > */ > if (order >= MAX_ORDER) { > WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN)); if (unlikely()) might help > > > I don't like that comments in OOM code is outdated. > > > > > + * reclaim >= MAX_ORDER areas which will never succeed. Callers may > > > + * be using allocators in order of preference for an area that is > > > + * too large. > > > + */ > > > + if (order >= MAX_ORDER) { > > > > Also, why not to add BUG_ON(gfp_mask & __GFP_NOFAIL); here? > > Because we do not want to blow up the kernel just because of a stupid > usage of the allocator. Can you think of an example where it would > actually make any sense? > > I would argue that such a theoretical abuse would blow up on an > unchecked NULL ptr access. Isn't that enough? > -- Balbir Singh. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: UBSAN: Undefined behaviour in mm/page_alloc.c 2018-11-09 8:43 ` Michal Hocko 2018-11-09 9:41 ` Tetsuo Handa @ 2018-11-09 9:56 ` Mel Gorman 2018-11-13 9:43 ` Michal Hocko 2 siblings, 0 replies; 22+ messages in thread From: Mel Gorman @ 2018-11-09 9:56 UTC (permalink / raw) To: Michal Hocko Cc: Kyungtae Kim, akpm, pavel.tatashin, vbabka, osalvador, rppt, aaron.lu, iamjoonsoo.kim, alexander.h.duyck, lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm, Konstantin Khlebnikov On Fri, Nov 09, 2018 at 09:43:53AM +0100, Michal Hocko wrote: > On Thu 08-11-18 23:09:23, Kyungtae Kim wrote: > > We report a bug in v4.19-rc2 (4.20-rc1 as well, I guess): > > > > kernel config: https://kt0755.github.io/etc/config_v2-4.19 > > repro: https://kt0755.github.io/etc/repro.c4074.c > > > > In the middle of page request, this arose because order is too large to handle > > (mm/page_alloc.c:3119). It actually comes from that order is > > controllable by user input > > via raw_cmd_ioctl without its sanity check, thereby causing memory problem. > > To stop it, we can use like MAX_ORDER for bounds check before using it. > > Yes, we do only check the max order in the slow path. We have already > discussed something similar with Konstantin [1][2]. Basically kvmalloc > for a large size might get to the page allocator with an out of bound > order and warn during direct reclaim. > > I am wondering whether really want to check for the order in the fast > path instead. I have hard time to imagine this could cause a measurable > impact. > > The full patch is below > > [1] http://lkml.kernel.org/r/154109387197.925352.10499549042420271600.stgit@buzz > [2] http://lkml.kernel.org/r/154106356066.887821.4649178319705436373.stgit@buzz > I'm ok with such changes under the policy "there is no point being fast if we're broken". It's unfortunate and I know the original microoptimisation was mine but if the fast-path check ends up being a problem then I/we go back to finding ways of making the page allocator faster from a fundamental algorithmic point of view and not a microoptimisation approach. There is potential fruit there, just none that is low-hanging. -- Mel Gorman SUSE Labs ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: UBSAN: Undefined behaviour in mm/page_alloc.c 2018-11-09 8:43 ` Michal Hocko 2018-11-09 9:41 ` Tetsuo Handa 2018-11-09 9:56 ` Mel Gorman @ 2018-11-13 9:43 ` Michal Hocko 2018-11-13 23:15 ` Andrew Morton 2018-11-13 23:29 ` Andrew Morton 2 siblings, 2 replies; 22+ messages in thread From: Michal Hocko @ 2018-11-13 9:43 UTC (permalink / raw) To: akpm Cc: Kyungtae Kim, pavel.tatashin, vbabka, osalvador, rppt, aaron.lu, iamjoonsoo.kim, alexander.h.duyck, mgorman, lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm, Konstantin Khlebnikov Andrew, could you take the patch please? This patch contains a comment update as suggested by Tetsuo. I do not think there were any other unresolved concerns. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: UBSAN: Undefined behaviour in mm/page_alloc.c 2018-11-13 9:43 ` Michal Hocko @ 2018-11-13 23:15 ` Andrew Morton 2018-11-13 23:23 ` Vlastimil Babka 2018-11-13 23:29 ` Andrew Morton 1 sibling, 1 reply; 22+ messages in thread From: Andrew Morton @ 2018-11-13 23:15 UTC (permalink / raw) To: Michal Hocko Cc: Kyungtae Kim, pavel.tatashin, vbabka, osalvador, rppt, aaron.lu, iamjoonsoo.kim, alexander.h.duyck, mgorman, lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm, Konstantin Khlebnikov On Tue, 13 Nov 2018 10:43:05 +0100 Michal Hocko <mhocko@kernel.org> wrote: > Konstantin has noticed that kvmalloc might trigger the following warning > [Thu Nov 1 08:43:56 2018] WARNING: CPU: 0 PID: 6676 at mm/vmstat.c:986 __fragmentation_index+0x54/0x60 > [...] > [Thu Nov 1 08:43:56 2018] Call Trace: > [Thu Nov 1 08:43:56 2018] fragmentation_index+0x76/0x90 > [Thu Nov 1 08:43:56 2018] compaction_suitable+0x4f/0xf0 > [Thu Nov 1 08:43:56 2018] shrink_node+0x295/0x310 > [Thu Nov 1 08:43:56 2018] node_reclaim+0x205/0x250 > [Thu Nov 1 08:43:56 2018] get_page_from_freelist+0x649/0xad0 > [Thu Nov 1 08:43:56 2018] ? get_page_from_freelist+0x2d4/0xad0 > [Thu Nov 1 08:43:56 2018] ? release_sock+0x19/0x90 > [Thu Nov 1 08:43:56 2018] ? do_ipv6_setsockopt.isra.5+0x10da/0x1290 > [Thu Nov 1 08:43:56 2018] __alloc_pages_nodemask+0x12a/0x2a0 > [Thu Nov 1 08:43:56 2018] kmalloc_large_node+0x47/0x90 > [Thu Nov 1 08:43:56 2018] __kmalloc_node+0x22b/0x2e0 > [Thu Nov 1 08:43:56 2018] kvmalloc_node+0x3e/0x70 > [Thu Nov 1 08:43:56 2018] xt_alloc_table_info+0x3a/0x80 [x_tables] > [Thu Nov 1 08:43:56 2018] do_ip6t_set_ctl+0xcd/0x1c0 [ip6_tables] > [Thu Nov 1 08:43:56 2018] nf_setsockopt+0x44/0x60 > [Thu Nov 1 08:43:56 2018] SyS_setsockopt+0x6f/0xc0 > [Thu Nov 1 08:43:56 2018] do_syscall_64+0x67/0x120 > [Thu Nov 1 08:43:56 2018] entry_SYSCALL_64_after_hwframe+0x3d/0xa2 > > ... > > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -4364,6 +4353,15 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid, > gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */ > struct alloc_context ac = { }; > > + /* > + * There are several places where we assume that the order value is sane > + * so bail out early if the request is out of bound. > + */ > + if (unlikely(order >= MAX_ORDER)) { > + WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN)); > + return NULL; > + } > + I know "everybody enables CONFIG_DEBUG_VM", but given this is fastpath, we could help those who choose not to enable it by using #ifdef CONFIG_DEBUG_VM if (WARN_ON_ONCE(order >= MAX_ORDER && !(gfp_mask & __GFP_NOWARN))) return NULL; #endif (Again curses 91241681c62 ("include/linux/mmdebug.h: make VM_WARN* non-rvals")) ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: UBSAN: Undefined behaviour in mm/page_alloc.c 2018-11-13 23:15 ` Andrew Morton @ 2018-11-13 23:23 ` Vlastimil Babka 2018-11-13 23:32 ` Andrew Morton 0 siblings, 1 reply; 22+ messages in thread From: Vlastimil Babka @ 2018-11-13 23:23 UTC (permalink / raw) To: Andrew Morton, Michal Hocko Cc: Kyungtae Kim, pavel.tatashin, osalvador, rppt, aaron.lu, iamjoonsoo.kim, alexander.h.duyck, mgorman, lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm, Konstantin Khlebnikov On 11/14/18 12:15 AM, Andrew Morton wrote: > On Tue, 13 Nov 2018 10:43:05 +0100 Michal Hocko <mhocko@kernel.org> wrote: > >> --- a/mm/page_alloc.c >> +++ b/mm/page_alloc.c >> @@ -4364,6 +4353,15 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid, >> gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */ >> struct alloc_context ac = { }; >> >> + /* >> + * There are several places where we assume that the order value is sane >> + * so bail out early if the request is out of bound. >> + */ >> + if (unlikely(order >= MAX_ORDER)) { >> + WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN)); >> + return NULL; >> + } >> + > > I know "everybody enables CONFIG_DEBUG_VM", but given this is fastpath, > we could help those who choose not to enable it by using > > #ifdef CONFIG_DEBUG_VM > if (WARN_ON_ONCE(order >= MAX_ORDER && !(gfp_mask & __GFP_NOWARN))) > return NULL; > #endif Hmm, but that would mean there's still potential undefined behavior for !CONFIG_DEBUG_VM, so I would prefer not to do it like that. > > (Again curses 91241681c62 ("include/linux/mmdebug.h: make VM_WARN* non-rvals")) > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: UBSAN: Undefined behaviour in mm/page_alloc.c 2018-11-13 23:23 ` Vlastimil Babka @ 2018-11-13 23:32 ` Andrew Morton 2018-11-14 0:05 ` Vlastimil Babka 0 siblings, 1 reply; 22+ messages in thread From: Andrew Morton @ 2018-11-13 23:32 UTC (permalink / raw) To: Vlastimil Babka Cc: Michal Hocko, Kyungtae Kim, pavel.tatashin, osalvador, rppt, aaron.lu, iamjoonsoo.kim, alexander.h.duyck, mgorman, lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm, Konstantin Khlebnikov On Wed, 14 Nov 2018 00:23:28 +0100 Vlastimil Babka <vbabka@suse.cz> wrote: > On 11/14/18 12:15 AM, Andrew Morton wrote: > > On Tue, 13 Nov 2018 10:43:05 +0100 Michal Hocko <mhocko@kernel.org> wrote: > > > >> --- a/mm/page_alloc.c > >> +++ b/mm/page_alloc.c > >> @@ -4364,6 +4353,15 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid, > >> gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */ > >> struct alloc_context ac = { }; > >> > >> + /* > >> + * There are several places where we assume that the order value is sane > >> + * so bail out early if the request is out of bound. > >> + */ > >> + if (unlikely(order >= MAX_ORDER)) { > >> + WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN)); > >> + return NULL; > >> + } > >> + > > > > I know "everybody enables CONFIG_DEBUG_VM", but given this is fastpath, > > we could help those who choose not to enable it by using > > > > #ifdef CONFIG_DEBUG_VM > > if (WARN_ON_ONCE(order >= MAX_ORDER && !(gfp_mask & __GFP_NOWARN))) > > return NULL; > > #endif > > Hmm, but that would mean there's still potential undefined behavior for > !CONFIG_DEBUG_VM, so I would prefer not to do it like that. > What does "potential undefined behavior" mean here? ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: UBSAN: Undefined behaviour in mm/page_alloc.c 2018-11-13 23:32 ` Andrew Morton @ 2018-11-14 0:05 ` Vlastimil Babka 2018-11-19 23:57 ` Pavel Machek 0 siblings, 1 reply; 22+ messages in thread From: Vlastimil Babka @ 2018-11-14 0:05 UTC (permalink / raw) To: Andrew Morton Cc: Michal Hocko, Kyungtae Kim, pavel.tatashin, osalvador, rppt, aaron.lu, iamjoonsoo.kim, alexander.h.duyck, mgorman, lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm, Konstantin Khlebnikov On 11/14/18 12:32 AM, Andrew Morton wrote: > On Wed, 14 Nov 2018 00:23:28 +0100 Vlastimil Babka <vbabka@suse.cz> wrote: > >> On 11/14/18 12:15 AM, Andrew Morton wrote: >>> On Tue, 13 Nov 2018 10:43:05 +0100 Michal Hocko <mhocko@kernel.org> wrote: >>> >>>> --- a/mm/page_alloc.c >>>> +++ b/mm/page_alloc.c >>>> @@ -4364,6 +4353,15 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid, >>>> gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */ >>>> struct alloc_context ac = { }; >>>> >>>> + /* >>>> + * There are several places where we assume that the order value is sane >>>> + * so bail out early if the request is out of bound. >>>> + */ >>>> + if (unlikely(order >= MAX_ORDER)) { >>>> + WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN)); >>>> + return NULL; >>>> + } >>>> + >>> >>> I know "everybody enables CONFIG_DEBUG_VM", but given this is fastpath, >>> we could help those who choose not to enable it by using >>> >>> #ifdef CONFIG_DEBUG_VM >>> if (WARN_ON_ONCE(order >= MAX_ORDER && !(gfp_mask & __GFP_NOWARN))) >>> return NULL; >>> #endif >> >> Hmm, but that would mean there's still potential undefined behavior for >> !CONFIG_DEBUG_VM, so I would prefer not to do it like that. >> > > What does "potential undefined behavior" mean here? I mean that it becomes undefined once a caller with order >= MAX_ORDER appears. Worse if it's directly due to a userspace action, like in this case. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: UBSAN: Undefined behaviour in mm/page_alloc.c 2018-11-14 0:05 ` Vlastimil Babka @ 2018-11-19 23:57 ` Pavel Machek 0 siblings, 0 replies; 22+ messages in thread From: Pavel Machek @ 2018-11-19 23:57 UTC (permalink / raw) To: Vlastimil Babka Cc: Andrew Morton, Michal Hocko, Kyungtae Kim, pavel.tatashin, osalvador, rppt, aaron.lu, iamjoonsoo.kim, alexander.h.duyck, mgorman, lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm, Konstantin Khlebnikov [-- Attachment #1: Type: text/plain, Size: 1647 bytes --] Hi1 > >>>> --- a/mm/page_alloc.c > >>>> +++ b/mm/page_alloc.c > >>>> @@ -4364,6 +4353,15 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid, > >>>> gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */ > >>>> struct alloc_context ac = { }; > >>>> > >>>> + /* > >>>> + * There are several places where we assume that the order value is sane > >>>> + * so bail out early if the request is out of bound. > >>>> + */ > >>>> + if (unlikely(order >= MAX_ORDER)) { > >>>> + WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN)); > >>>> + return NULL; > >>>> + } > >>>> + > >>> > >>> I know "everybody enables CONFIG_DEBUG_VM", but given this is fastpath, > >>> we could help those who choose not to enable it by using > >>> > >>> #ifdef CONFIG_DEBUG_VM > >>> if (WARN_ON_ONCE(order >= MAX_ORDER && !(gfp_mask & __GFP_NOWARN))) > >>> return NULL; > >>> #endif > >> > >> Hmm, but that would mean there's still potential undefined behavior for > >> !CONFIG_DEBUG_VM, so I would prefer not to do it like that. > >> > > > > What does "potential undefined behavior" mean here? > > I mean that it becomes undefined once a caller with order >= MAX_ORDER > appears. Worse if it's directly due to a userspace action, like in this > case. We should really check if value from userspace is sane _before_ passing it to alloc_pages(). Anything else is too fragile. Maybe alloc_pages should do the second check, but... Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: UBSAN: Undefined behaviour in mm/page_alloc.c 2018-11-13 9:43 ` Michal Hocko 2018-11-13 23:15 ` Andrew Morton @ 2018-11-13 23:29 ` Andrew Morton 2018-11-14 7:10 ` Michal Hocko 2018-11-16 23:43 ` Dmitry Vyukov 1 sibling, 2 replies; 22+ messages in thread From: Andrew Morton @ 2018-11-13 23:29 UTC (permalink / raw) To: Michal Hocko Cc: Kyungtae Kim, pavel.tatashin, vbabka, osalvador, rppt, aaron.lu, iamjoonsoo.kim, alexander.h.duyck, mgorman, lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm, Konstantin Khlebnikov On Tue, 13 Nov 2018 10:43:05 +0100 Michal Hocko <mhocko@kernel.org> wrote: > From: Michal Hocko <mhocko@suse.com> > Date: Fri, 9 Nov 2018 09:35:29 +0100 > Subject: [PATCH] mm, page_alloc: check for max order in hot path > > Konstantin has noticed that kvmalloc might trigger the following warning > [Thu Nov 1 08:43:56 2018] WARNING: CPU: 0 PID: 6676 at mm/vmstat.c:986 __fragmentation_index+0x54/0x60 um, wait... > [...] > [Thu Nov 1 08:43:56 2018] Call Trace: > [Thu Nov 1 08:43:56 2018] fragmentation_index+0x76/0x90 > [Thu Nov 1 08:43:56 2018] compaction_suitable+0x4f/0xf0 > [Thu Nov 1 08:43:56 2018] shrink_node+0x295/0x310 > [Thu Nov 1 08:43:56 2018] node_reclaim+0x205/0x250 > [Thu Nov 1 08:43:56 2018] get_page_from_freelist+0x649/0xad0 > [Thu Nov 1 08:43:56 2018] ? get_page_from_freelist+0x2d4/0xad0 > [Thu Nov 1 08:43:56 2018] ? release_sock+0x19/0x90 > [Thu Nov 1 08:43:56 2018] ? do_ipv6_setsockopt.isra.5+0x10da/0x1290 > [Thu Nov 1 08:43:56 2018] __alloc_pages_nodemask+0x12a/0x2a0 > [Thu Nov 1 08:43:56 2018] kmalloc_large_node+0x47/0x90 > [Thu Nov 1 08:43:56 2018] __kmalloc_node+0x22b/0x2e0 > [Thu Nov 1 08:43:56 2018] kvmalloc_node+0x3e/0x70 > [Thu Nov 1 08:43:56 2018] xt_alloc_table_info+0x3a/0x80 [x_tables] > [Thu Nov 1 08:43:56 2018] do_ip6t_set_ctl+0xcd/0x1c0 [ip6_tables] > [Thu Nov 1 08:43:56 2018] nf_setsockopt+0x44/0x60 > [Thu Nov 1 08:43:56 2018] SyS_setsockopt+0x6f/0xc0 > [Thu Nov 1 08:43:56 2018] do_syscall_64+0x67/0x120 > [Thu Nov 1 08:43:56 2018] entry_SYSCALL_64_after_hwframe+0x3d/0xa2 If kvalloc_node() is going to call kmalloc() without checking for a huge allocation request then surely it should set __GFP_NOWARN. And it shouldn't bother at all if size > KMALLOC_MAX_SIZE, surely? So something like --- a/mm/util.c~a +++ a/mm/util.c @@ -393,11 +393,16 @@ void *kvmalloc_node(size_t size, gfp_t f void *ret; /* - * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables) - * so the given set of flags has to be compatible. + * vmalloc uses GFP_KERNEL for some internal allocations (e.g page + * tables) so the given set of flags has to be compatible. */ - if ((flags & GFP_KERNEL) != GFP_KERNEL) + if ((flags & GFP_KERNEL) != GFP_KERNEL) { + if (size > KMALLOC_MAX_SIZE) + return NULL; + if (size > PAGE_SIZE) + flags |= __GFP_NOWARN; return kmalloc_node(size, flags, node); + } /* * We want to attempt a large physically contiguous block first because > the problem is that we only check for an out of bound order in the slow > path and the node reclaim might happen from the fast path already. This > is fixable by making sure that kvmalloc doesn't ever use kmalloc for > requests that are larger than KMALLOC_MAX_SIZE but this also shows that > the code is rather fragile. A recent UBSAN report just underlines that > by the following report > > UBSAN: Undefined behaviour in mm/page_alloc.c:3117:19 > shift exponent 51 is too large for 32-bit type 'int' > CPU: 0 PID: 6520 Comm: syz-executor1 Not tainted 4.19.0-rc2 #1 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 > Call Trace: > __dump_stack lib/dump_stack.c:77 [inline] > dump_stack+0xd2/0x148 lib/dump_stack.c:113 > ubsan_epilogue+0x12/0x94 lib/ubsan.c:159 > __ubsan_handle_shift_out_of_bounds+0x2b6/0x30b lib/ubsan.c:425 > __zone_watermark_ok+0x2c7/0x400 mm/page_alloc.c:3117 > zone_watermark_fast mm/page_alloc.c:3216 [inline] > get_page_from_freelist+0xc49/0x44c0 mm/page_alloc.c:3300 > __alloc_pages_nodemask+0x21e/0x640 mm/page_alloc.c:4370 > alloc_pages_current+0xcc/0x210 mm/mempolicy.c:2093 > alloc_pages include/linux/gfp.h:509 [inline] > __get_free_pages+0x12/0x60 mm/page_alloc.c:4414 > dma_mem_alloc+0x36/0x50 arch/x86/include/asm/floppy.h:156 > raw_cmd_copyin drivers/block/floppy.c:3159 [inline] > raw_cmd_ioctl drivers/block/floppy.c:3206 [inline] > fd_locked_ioctl+0xa00/0x2c10 drivers/block/floppy.c:3544 > fd_ioctl+0x40/0x60 drivers/block/floppy.c:3571 > __blkdev_driver_ioctl block/ioctl.c:303 [inline] > blkdev_ioctl+0xb3c/0x1a30 block/ioctl.c:601 > block_ioctl+0x105/0x150 fs/block_dev.c:1883 > vfs_ioctl fs/ioctl.c:46 [inline] > do_vfs_ioctl+0x1c0/0x1150 fs/ioctl.c:687 > ksys_ioctl+0x9e/0xb0 fs/ioctl.c:702 > __do_sys_ioctl fs/ioctl.c:709 [inline] > __se_sys_ioctl fs/ioctl.c:707 [inline] > __x64_sys_ioctl+0x7e/0xc0 fs/ioctl.c:707 > do_syscall_64+0xc4/0x510 arch/x86/entry/common.c:290 > entry_SYSCALL_64_after_hwframe+0x49/0xbe And we could fix this in the floppy driver. > Note that this is not a kvmalloc path. It is just that the fast path > really depends on having sanitzed order as well. Therefore move the > order check to the fast path. But do we really need to do this? Are there any other known potential callsites? ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: UBSAN: Undefined behaviour in mm/page_alloc.c 2018-11-13 23:29 ` Andrew Morton @ 2018-11-14 7:10 ` Michal Hocko 2018-11-16 23:43 ` Dmitry Vyukov 1 sibling, 0 replies; 22+ messages in thread From: Michal Hocko @ 2018-11-14 7:10 UTC (permalink / raw) To: Andrew Morton Cc: Kyungtae Kim, pavel.tatashin, vbabka, osalvador, rppt, aaron.lu, iamjoonsoo.kim, alexander.h.duyck, mgorman, lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm, Konstantin Khlebnikov On Tue 13-11-18 15:29:41, Andrew Morton wrote: [...] > But do we really need to do this? Are there any other known potential > callsites? The main point is that the code as it stands is quite fragile, isn't it? Fixing up all the callers is possible but can you actually think of a reason why this would cause any measurable effect in the fast path? The order argument is usually in a register and comparing it to a number with unlikely branch should be hardly something visible. Besides that we are talking few cycles at best compared to a fragile code that got broken by accident without anybody noticing for quite some time. I vote for the maintainability over few cycles here. Should anybody find this measurable we can rework the code by other means. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: UBSAN: Undefined behaviour in mm/page_alloc.c 2018-11-13 23:29 ` Andrew Morton 2018-11-14 7:10 ` Michal Hocko @ 2018-11-16 23:43 ` Dmitry Vyukov 1 sibling, 0 replies; 22+ messages in thread From: Dmitry Vyukov @ 2018-11-16 23:43 UTC (permalink / raw) To: Andrew Morton Cc: Michal Hocko, Kyungtae Kim, pavel.tatashin, Vlastimil Babka, osalvador, Mike Rapoport, aaron.lu, Joonsoo Kim, alexander.h.duyck, Mel Gorman, lifeasageek, Dae R. Jeong, syzkaller, LKML, Linux-MM, Konstantin Khlebnikov On Tue, Nov 13, 2018 at 3:29 PM, Andrew Morton <akpm@linux-foundation.org> wrote: > On Tue, 13 Nov 2018 10:43:05 +0100 Michal Hocko <mhocko@kernel.org> wrote: > >> From: Michal Hocko <mhocko@suse.com> >> Date: Fri, 9 Nov 2018 09:35:29 +0100 >> Subject: [PATCH] mm, page_alloc: check for max order in hot path >> >> Konstantin has noticed that kvmalloc might trigger the following warning >> [Thu Nov 1 08:43:56 2018] WARNING: CPU: 0 PID: 6676 at mm/vmstat.c:986 __fragmentation_index+0x54/0x60 > > um, wait... > >> [...] >> [Thu Nov 1 08:43:56 2018] Call Trace: >> [Thu Nov 1 08:43:56 2018] fragmentation_index+0x76/0x90 >> [Thu Nov 1 08:43:56 2018] compaction_suitable+0x4f/0xf0 >> [Thu Nov 1 08:43:56 2018] shrink_node+0x295/0x310 >> [Thu Nov 1 08:43:56 2018] node_reclaim+0x205/0x250 >> [Thu Nov 1 08:43:56 2018] get_page_from_freelist+0x649/0xad0 >> [Thu Nov 1 08:43:56 2018] ? get_page_from_freelist+0x2d4/0xad0 >> [Thu Nov 1 08:43:56 2018] ? release_sock+0x19/0x90 >> [Thu Nov 1 08:43:56 2018] ? do_ipv6_setsockopt.isra.5+0x10da/0x1290 >> [Thu Nov 1 08:43:56 2018] __alloc_pages_nodemask+0x12a/0x2a0 >> [Thu Nov 1 08:43:56 2018] kmalloc_large_node+0x47/0x90 >> [Thu Nov 1 08:43:56 2018] __kmalloc_node+0x22b/0x2e0 >> [Thu Nov 1 08:43:56 2018] kvmalloc_node+0x3e/0x70 >> [Thu Nov 1 08:43:56 2018] xt_alloc_table_info+0x3a/0x80 [x_tables] >> [Thu Nov 1 08:43:56 2018] do_ip6t_set_ctl+0xcd/0x1c0 [ip6_tables] >> [Thu Nov 1 08:43:56 2018] nf_setsockopt+0x44/0x60 >> [Thu Nov 1 08:43:56 2018] SyS_setsockopt+0x6f/0xc0 >> [Thu Nov 1 08:43:56 2018] do_syscall_64+0x67/0x120 >> [Thu Nov 1 08:43:56 2018] entry_SYSCALL_64_after_hwframe+0x3d/0xa2 > > If kvalloc_node() is going to call kmalloc() without checking for a > huge allocation request then surely it should set __GFP_NOWARN. kmalloc won't warn about large allocations after "mm: don't warn about large allocations for slab": https://lkml.org/lkml/2018/9/27/1156 It will just return NULL. That was already the case for slub. > And it > shouldn't bother at all if size > KMALLOC_MAX_SIZE, surely? So > something like > > --- a/mm/util.c~a > +++ a/mm/util.c > @@ -393,11 +393,16 @@ void *kvmalloc_node(size_t size, gfp_t f > void *ret; > > /* > - * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables) > - * so the given set of flags has to be compatible. > + * vmalloc uses GFP_KERNEL for some internal allocations (e.g page > + * tables) so the given set of flags has to be compatible. > */ > - if ((flags & GFP_KERNEL) != GFP_KERNEL) > + if ((flags & GFP_KERNEL) != GFP_KERNEL) { > + if (size > KMALLOC_MAX_SIZE) > + return NULL; > + if (size > PAGE_SIZE) > + flags |= __GFP_NOWARN; > return kmalloc_node(size, flags, node); > + } > > /* > * We want to attempt a large physically contiguous block first because > > >> the problem is that we only check for an out of bound order in the slow >> path and the node reclaim might happen from the fast path already. This >> is fixable by making sure that kvmalloc doesn't ever use kmalloc for >> requests that are larger than KMALLOC_MAX_SIZE but this also shows that >> the code is rather fragile. A recent UBSAN report just underlines that >> by the following report >> >> UBSAN: Undefined behaviour in mm/page_alloc.c:3117:19 >> shift exponent 51 is too large for 32-bit type 'int' >> CPU: 0 PID: 6520 Comm: syz-executor1 Not tainted 4.19.0-rc2 #1 >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 >> Call Trace: >> __dump_stack lib/dump_stack.c:77 [inline] >> dump_stack+0xd2/0x148 lib/dump_stack.c:113 >> ubsan_epilogue+0x12/0x94 lib/ubsan.c:159 >> __ubsan_handle_shift_out_of_bounds+0x2b6/0x30b lib/ubsan.c:425 >> __zone_watermark_ok+0x2c7/0x400 mm/page_alloc.c:3117 >> zone_watermark_fast mm/page_alloc.c:3216 [inline] >> get_page_from_freelist+0xc49/0x44c0 mm/page_alloc.c:3300 >> __alloc_pages_nodemask+0x21e/0x640 mm/page_alloc.c:4370 >> alloc_pages_current+0xcc/0x210 mm/mempolicy.c:2093 >> alloc_pages include/linux/gfp.h:509 [inline] >> __get_free_pages+0x12/0x60 mm/page_alloc.c:4414 >> dma_mem_alloc+0x36/0x50 arch/x86/include/asm/floppy.h:156 >> raw_cmd_copyin drivers/block/floppy.c:3159 [inline] >> raw_cmd_ioctl drivers/block/floppy.c:3206 [inline] >> fd_locked_ioctl+0xa00/0x2c10 drivers/block/floppy.c:3544 >> fd_ioctl+0x40/0x60 drivers/block/floppy.c:3571 >> __blkdev_driver_ioctl block/ioctl.c:303 [inline] >> blkdev_ioctl+0xb3c/0x1a30 block/ioctl.c:601 >> block_ioctl+0x105/0x150 fs/block_dev.c:1883 >> vfs_ioctl fs/ioctl.c:46 [inline] >> do_vfs_ioctl+0x1c0/0x1150 fs/ioctl.c:687 >> ksys_ioctl+0x9e/0xb0 fs/ioctl.c:702 >> __do_sys_ioctl fs/ioctl.c:709 [inline] >> __se_sys_ioctl fs/ioctl.c:707 [inline] >> __x64_sys_ioctl+0x7e/0xc0 fs/ioctl.c:707 >> do_syscall_64+0xc4/0x510 arch/x86/entry/common.c:290 >> entry_SYSCALL_64_after_hwframe+0x49/0xbe > > And we could fix this in the floppy driver. > >> Note that this is not a kvmalloc path. It is just that the fast path >> really depends on having sanitzed order as well. Therefore move the >> order check to the fast path. > > But do we really need to do this? Are there any other known potential > callsites? > > -- > You received this message because you are subscribed to the Google Groups "syzkaller" group. > To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller+unsubscribe@googlegroups.com. > For more options, visit https://groups.google.com/d/optout. ^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2018-11-19 23:57 UTC | newest] Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-11-09 4:09 UBSAN: Undefined behaviour in mm/page_alloc.c Kyungtae Kim 2018-11-09 8:42 ` Vlastimil Babka 2018-11-09 8:43 ` Michal Hocko 2018-11-09 9:41 ` Tetsuo Handa 2018-11-09 9:56 ` Michal Hocko 2018-11-09 10:07 ` Tetsuo Handa 2018-11-09 10:25 ` Michal Hocko 2018-11-09 10:10 ` Vlastimil Babka 2018-11-09 10:22 ` Michal Hocko 2018-11-09 10:24 ` Tetsuo Handa 2018-11-09 10:28 ` Michal Hocko 2018-11-09 10:52 ` Balbir Singh 2018-11-09 9:56 ` Mel Gorman 2018-11-13 9:43 ` Michal Hocko 2018-11-13 23:15 ` Andrew Morton 2018-11-13 23:23 ` Vlastimil Babka 2018-11-13 23:32 ` Andrew Morton 2018-11-14 0:05 ` Vlastimil Babka 2018-11-19 23:57 ` Pavel Machek 2018-11-13 23:29 ` Andrew Morton 2018-11-14 7:10 ` Michal Hocko 2018-11-16 23:43 ` Dmitry Vyukov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).