* UBSAN: Undefined behaviour in mm/page_alloc.c
@ 2018-11-09 4:09 Kyungtae Kim
2018-11-09 8:42 ` Vlastimil Babka
2018-11-09 8:43 ` Michal Hocko
0 siblings, 2 replies; 22+ messages in thread
From: Kyungtae Kim @ 2018-11-09 4:09 UTC (permalink / raw)
To: akpm, mhocko, pavel.tatashin, vbabka, osalvador, rppt, aaron.lu,
iamjoonsoo.kim, alexander.h.duyck, mgorman
Cc: lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm
We report a bug in v4.19-rc2 (4.20-rc1 as well, I guess):
kernel config: https://kt0755.github.io/etc/config_v2-4.19
repro: https://kt0755.github.io/etc/repro.c4074.c
In the middle of page request, this arose because order is too large to handle
(mm/page_alloc.c:3119). It actually comes from that order is
controllable by user input
via raw_cmd_ioctl without its sanity check, thereby causing memory problem.
To stop it, we can use like MAX_ORDER for bounds check before using it.
=========================================
UBSAN: Undefined behaviour in mm/page_alloc.c:3117:19
shift exponent 51 is too large for 32-bit type 'int'
CPU: 0 PID: 6520 Comm: syz-executor1 Not tainted 4.19.0-rc2 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0xd2/0x148 lib/dump_stack.c:113
ubsan_epilogue+0x12/0x94 lib/ubsan.c:159
__ubsan_handle_shift_out_of_bounds+0x2b6/0x30b lib/ubsan.c:425
__zone_watermark_ok+0x2c7/0x400 mm/page_alloc.c:3117
zone_watermark_fast mm/page_alloc.c:3216 [inline]
get_page_from_freelist+0xc49/0x44c0 mm/page_alloc.c:3300
__alloc_pages_nodemask+0x21e/0x640 mm/page_alloc.c:4370
alloc_pages_current+0xcc/0x210 mm/mempolicy.c:2093
alloc_pages include/linux/gfp.h:509 [inline]
__get_free_pages+0x12/0x60 mm/page_alloc.c:4414
dma_mem_alloc+0x36/0x50 arch/x86/include/asm/floppy.h:156
raw_cmd_copyin drivers/block/floppy.c:3159 [inline]
raw_cmd_ioctl drivers/block/floppy.c:3206 [inline]
fd_locked_ioctl+0xa00/0x2c10 drivers/block/floppy.c:3544
fd_ioctl+0x40/0x60 drivers/block/floppy.c:3571
__blkdev_driver_ioctl block/ioctl.c:303 [inline]
blkdev_ioctl+0xb3c/0x1a30 block/ioctl.c:601
block_ioctl+0x105/0x150 fs/block_dev.c:1883
vfs_ioctl fs/ioctl.c:46 [inline]
do_vfs_ioctl+0x1c0/0x1150 fs/ioctl.c:687
ksys_ioctl+0x9e/0xb0 fs/ioctl.c:702
__do_sys_ioctl fs/ioctl.c:709 [inline]
__se_sys_ioctl fs/ioctl.c:707 [inline]
__x64_sys_ioctl+0x7e/0xc0 fs/ioctl.c:707
do_syscall_64+0xc4/0x510 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4497b9
Code: e8 8c 9f 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48
89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
01 f0 ff ff 0f 83 9b 6b fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fb5ef0e2c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007fb5ef0e36cc RCX: 00000000004497b9
RDX: 0000000020000040 RSI: 0000000000000258 RDI: 0000000000000014
RBP: 000000000071bea0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 0000000000005490 R14: 00000000006ed530 R15: 00007fb5ef0e3700
=========================================================
Thanks,
Kyungtae Kim
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
2018-11-09 4:09 UBSAN: Undefined behaviour in mm/page_alloc.c Kyungtae Kim
@ 2018-11-09 8:42 ` Vlastimil Babka
2018-11-09 8:43 ` Michal Hocko
1 sibling, 0 replies; 22+ messages in thread
From: Vlastimil Babka @ 2018-11-09 8:42 UTC (permalink / raw)
To: Kyungtae Kim, akpm, mhocko, pavel.tatashin, osalvador, rppt,
aaron.lu, iamjoonsoo.kim, alexander.h.duyck, mgorman
Cc: lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm,
Konstantin Khlebnikov
On 11/9/18 5:09 AM, Kyungtae Kim wrote:
> We report a bug in v4.19-rc2 (4.20-rc1 as well, I guess):
>
> kernel config: https://kt0755.github.io/etc/config_v2-4.19
> repro: https://kt0755.github.io/etc/repro.c4074.c
>
> In the middle of page request, this arose because order is too large to handle
> (mm/page_alloc.c:3119). It actually comes from that order is
> controllable by user input
> via raw_cmd_ioctl without its sanity check, thereby causing memory problem.
> To stop it, we can use like MAX_ORDER for bounds check before using it.
This together with [1] makes me rather convinced that we should really
move the check back from __alloc_pages_slowpath to
__alloc_pages_nodemask. It should be a single predictable branch with an
unlikely()?
[1]
https://lore.kernel.org/lkml/154109387197.925352.10499549042420271600.stgit@buzz/T/#u
> =========================================
> UBSAN: Undefined behaviour in mm/page_alloc.c:3117:19
> shift exponent 51 is too large for 32-bit type 'int'
> CPU: 0 PID: 6520 Comm: syz-executor1 Not tainted 4.19.0-rc2 #1
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
> __dump_stack lib/dump_stack.c:77 [inline]
> dump_stack+0xd2/0x148 lib/dump_stack.c:113
> ubsan_epilogue+0x12/0x94 lib/ubsan.c:159
> __ubsan_handle_shift_out_of_bounds+0x2b6/0x30b lib/ubsan.c:425
> __zone_watermark_ok+0x2c7/0x400 mm/page_alloc.c:3117
> zone_watermark_fast mm/page_alloc.c:3216 [inline]
> get_page_from_freelist+0xc49/0x44c0 mm/page_alloc.c:3300
> __alloc_pages_nodemask+0x21e/0x640 mm/page_alloc.c:4370
> alloc_pages_current+0xcc/0x210 mm/mempolicy.c:2093
> alloc_pages include/linux/gfp.h:509 [inline]
> __get_free_pages+0x12/0x60 mm/page_alloc.c:4414
> dma_mem_alloc+0x36/0x50 arch/x86/include/asm/floppy.h:156
> raw_cmd_copyin drivers/block/floppy.c:3159 [inline]
> raw_cmd_ioctl drivers/block/floppy.c:3206 [inline]
> fd_locked_ioctl+0xa00/0x2c10 drivers/block/floppy.c:3544
> fd_ioctl+0x40/0x60 drivers/block/floppy.c:3571
> __blkdev_driver_ioctl block/ioctl.c:303 [inline]
> blkdev_ioctl+0xb3c/0x1a30 block/ioctl.c:601
> block_ioctl+0x105/0x150 fs/block_dev.c:1883
> vfs_ioctl fs/ioctl.c:46 [inline]
> do_vfs_ioctl+0x1c0/0x1150 fs/ioctl.c:687
> ksys_ioctl+0x9e/0xb0 fs/ioctl.c:702
> __do_sys_ioctl fs/ioctl.c:709 [inline]
> __se_sys_ioctl fs/ioctl.c:707 [inline]
> __x64_sys_ioctl+0x7e/0xc0 fs/ioctl.c:707
> do_syscall_64+0xc4/0x510 arch/x86/entry/common.c:290
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x4497b9
> Code: e8 8c 9f 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48
> 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
> 01 f0 ff ff 0f 83 9b 6b fc ff c3 66 2e 0f 1f 84 00 00 00 00
> RSP: 002b:00007fb5ef0e2c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> RAX: ffffffffffffffda RBX: 00007fb5ef0e36cc RCX: 00000000004497b9
> RDX: 0000000020000040 RSI: 0000000000000258 RDI: 0000000000000014
> RBP: 000000000071bea0 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
> R13: 0000000000005490 R14: 00000000006ed530 R15: 00007fb5ef0e3700
> =========================================================
>
>
> Thanks,
> Kyungtae Kim
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
2018-11-09 4:09 UBSAN: Undefined behaviour in mm/page_alloc.c Kyungtae Kim
2018-11-09 8:42 ` Vlastimil Babka
@ 2018-11-09 8:43 ` Michal Hocko
2018-11-09 9:41 ` Tetsuo Handa
` (2 more replies)
1 sibling, 3 replies; 22+ messages in thread
From: Michal Hocko @ 2018-11-09 8:43 UTC (permalink / raw)
To: Kyungtae Kim
Cc: akpm, pavel.tatashin, vbabka, osalvador, rppt, aaron.lu,
iamjoonsoo.kim, alexander.h.duyck, mgorman, lifeasageek,
threeearcat, syzkaller, linux-kernel, linux-mm,
Konstantin Khlebnikov
On Thu 08-11-18 23:09:23, Kyungtae Kim wrote:
> We report a bug in v4.19-rc2 (4.20-rc1 as well, I guess):
>
> kernel config: https://kt0755.github.io/etc/config_v2-4.19
> repro: https://kt0755.github.io/etc/repro.c4074.c
>
> In the middle of page request, this arose because order is too large to handle
> (mm/page_alloc.c:3119). It actually comes from that order is
> controllable by user input
> via raw_cmd_ioctl without its sanity check, thereby causing memory problem.
> To stop it, we can use like MAX_ORDER for bounds check before using it.
Yes, we do only check the max order in the slow path. We have already
discussed something similar with Konstantin [1][2]. Basically kvmalloc
for a large size might get to the page allocator with an out of bound
order and warn during direct reclaim.
I am wondering whether really want to check for the order in the fast
path instead. I have hard time to imagine this could cause a measurable
impact.
The full patch is below
[1] http://lkml.kernel.org/r/154109387197.925352.10499549042420271600.stgit@buzz
[2] http://lkml.kernel.org/r/154106356066.887821.4649178319705436373.stgit@buzz
>
> =========================================
> UBSAN: Undefined behaviour in mm/page_alloc.c:3117:19
> shift exponent 51 is too large for 32-bit type 'int'
> CPU: 0 PID: 6520 Comm: syz-executor1 Not tainted 4.19.0-rc2 #1
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
> __dump_stack lib/dump_stack.c:77 [inline]
> dump_stack+0xd2/0x148 lib/dump_stack.c:113
> ubsan_epilogue+0x12/0x94 lib/ubsan.c:159
> __ubsan_handle_shift_out_of_bounds+0x2b6/0x30b lib/ubsan.c:425
> __zone_watermark_ok+0x2c7/0x400 mm/page_alloc.c:3117
> zone_watermark_fast mm/page_alloc.c:3216 [inline]
> get_page_from_freelist+0xc49/0x44c0 mm/page_alloc.c:3300
> __alloc_pages_nodemask+0x21e/0x640 mm/page_alloc.c:4370
> alloc_pages_current+0xcc/0x210 mm/mempolicy.c:2093
> alloc_pages include/linux/gfp.h:509 [inline]
> __get_free_pages+0x12/0x60 mm/page_alloc.c:4414
> dma_mem_alloc+0x36/0x50 arch/x86/include/asm/floppy.h:156
> raw_cmd_copyin drivers/block/floppy.c:3159 [inline]
> raw_cmd_ioctl drivers/block/floppy.c:3206 [inline]
> fd_locked_ioctl+0xa00/0x2c10 drivers/block/floppy.c:3544
> fd_ioctl+0x40/0x60 drivers/block/floppy.c:3571
> __blkdev_driver_ioctl block/ioctl.c:303 [inline]
> blkdev_ioctl+0xb3c/0x1a30 block/ioctl.c:601
> block_ioctl+0x105/0x150 fs/block_dev.c:1883
> vfs_ioctl fs/ioctl.c:46 [inline]
> do_vfs_ioctl+0x1c0/0x1150 fs/ioctl.c:687
> ksys_ioctl+0x9e/0xb0 fs/ioctl.c:702
> __do_sys_ioctl fs/ioctl.c:709 [inline]
> __se_sys_ioctl fs/ioctl.c:707 [inline]
> __x64_sys_ioctl+0x7e/0xc0 fs/ioctl.c:707
> do_syscall_64+0xc4/0x510 arch/x86/entry/common.c:290
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x4497b9
> Code: e8 8c 9f 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48
> 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
> 01 f0 ff ff 0f 83 9b 6b fc ff c3 66 2e 0f 1f 84 00 00 00 00
> RSP: 002b:00007fb5ef0e2c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> RAX: ffffffffffffffda RBX: 00007fb5ef0e36cc RCX: 00000000004497b9
> RDX: 0000000020000040 RSI: 0000000000000258 RDI: 0000000000000014
> RBP: 000000000071bea0 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
> R13: 0000000000005490 R14: 00000000006ed530 R15: 00007fb5ef0e3700
> =========================================================
>
>
> Thanks,
> Kyungtae Kim
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
2018-11-09 8:43 ` Michal Hocko
@ 2018-11-09 9:41 ` Tetsuo Handa
2018-11-09 9:56 ` Michal Hocko
2018-11-09 9:56 ` Mel Gorman
2018-11-13 9:43 ` Michal Hocko
2 siblings, 1 reply; 22+ messages in thread
From: Tetsuo Handa @ 2018-11-09 9:41 UTC (permalink / raw)
To: Michal Hocko, Kyungtae Kim
Cc: akpm, pavel.tatashin, vbabka, osalvador, rppt, aaron.lu,
iamjoonsoo.kim, alexander.h.duyck, mgorman, lifeasageek,
threeearcat, syzkaller, linux-kernel, linux-mm,
Konstantin Khlebnikov
On 2018/11/09 17:43, Michal Hocko wrote:
> @@ -4364,6 +4353,17 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
> struct alloc_context ac = { };
>
> + /*
> + * In the slowpath, we sanity check order to avoid ever trying to
Please keep the comment up to dated.
I don't like that comments in OOM code is outdated.
> + * reclaim >= MAX_ORDER areas which will never succeed. Callers may
> + * be using allocators in order of preference for an area that is
> + * too large.
> + */
> + if (order >= MAX_ORDER) {
Also, why not to add BUG_ON(gfp_mask & __GFP_NOFAIL); here?
> + WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
> + return NULL;
> + }
> +
> gfp_mask &= gfp_allowed_mask;
> alloc_mask = gfp_mask;
> if (!prepare_alloc_pages(gfp_mask, order, preferred_nid, nodemask, &ac, &alloc_mask, &alloc_flags))
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
2018-11-09 9:41 ` Tetsuo Handa
@ 2018-11-09 9:56 ` Michal Hocko
2018-11-09 10:07 ` Tetsuo Handa
` (2 more replies)
0 siblings, 3 replies; 22+ messages in thread
From: Michal Hocko @ 2018-11-09 9:56 UTC (permalink / raw)
To: Tetsuo Handa
Cc: Kyungtae Kim, akpm, pavel.tatashin, vbabka, osalvador, rppt,
aaron.lu, iamjoonsoo.kim, alexander.h.duyck, mgorman,
lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm,
Konstantin Khlebnikov
On Fri 09-11-18 18:41:53, Tetsuo Handa wrote:
> On 2018/11/09 17:43, Michal Hocko wrote:
> > @@ -4364,6 +4353,17 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> > gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
> > struct alloc_context ac = { };
> >
> > + /*
> > + * In the slowpath, we sanity check order to avoid ever trying to
>
> Please keep the comment up to dated.
Does this following look better?
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9fc10a1029cf..bf9aecba4222 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4354,10 +4354,8 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
struct alloc_context ac = { };
/*
- * In the slowpath, we sanity check order to avoid ever trying to
- * reclaim >= MAX_ORDER areas which will never succeed. Callers may
- * be using allocators in order of preference for an area that is
- * too large.
+ * There are several places where we assume that the order value is sane
+ * so bail out early if the request is out of bound.
*/
if (order >= MAX_ORDER) {
WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
> I don't like that comments in OOM code is outdated.
>
> > + * reclaim >= MAX_ORDER areas which will never succeed. Callers may
> > + * be using allocators in order of preference for an area that is
> > + * too large.
> > + */
> > + if (order >= MAX_ORDER) {
>
> Also, why not to add BUG_ON(gfp_mask & __GFP_NOFAIL); here?
Because we do not want to blow up the kernel just because of a stupid
usage of the allocator. Can you think of an example where it would
actually make any sense?
I would argue that such a theoretical abuse would blow up on an
unchecked NULL ptr access. Isn't that enough?
--
Michal Hocko
SUSE Labs
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
2018-11-09 8:43 ` Michal Hocko
2018-11-09 9:41 ` Tetsuo Handa
@ 2018-11-09 9:56 ` Mel Gorman
2018-11-13 9:43 ` Michal Hocko
2 siblings, 0 replies; 22+ messages in thread
From: Mel Gorman @ 2018-11-09 9:56 UTC (permalink / raw)
To: Michal Hocko
Cc: Kyungtae Kim, akpm, pavel.tatashin, vbabka, osalvador, rppt,
aaron.lu, iamjoonsoo.kim, alexander.h.duyck, lifeasageek,
threeearcat, syzkaller, linux-kernel, linux-mm,
Konstantin Khlebnikov
On Fri, Nov 09, 2018 at 09:43:53AM +0100, Michal Hocko wrote:
> On Thu 08-11-18 23:09:23, Kyungtae Kim wrote:
> > We report a bug in v4.19-rc2 (4.20-rc1 as well, I guess):
> >
> > kernel config: https://kt0755.github.io/etc/config_v2-4.19
> > repro: https://kt0755.github.io/etc/repro.c4074.c
> >
> > In the middle of page request, this arose because order is too large to handle
> > (mm/page_alloc.c:3119). It actually comes from that order is
> > controllable by user input
> > via raw_cmd_ioctl without its sanity check, thereby causing memory problem.
> > To stop it, we can use like MAX_ORDER for bounds check before using it.
>
> Yes, we do only check the max order in the slow path. We have already
> discussed something similar with Konstantin [1][2]. Basically kvmalloc
> for a large size might get to the page allocator with an out of bound
> order and warn during direct reclaim.
>
> I am wondering whether really want to check for the order in the fast
> path instead. I have hard time to imagine this could cause a measurable
> impact.
>
> The full patch is below
>
> [1] http://lkml.kernel.org/r/154109387197.925352.10499549042420271600.stgit@buzz
> [2] http://lkml.kernel.org/r/154106356066.887821.4649178319705436373.stgit@buzz
>
I'm ok with such changes under the policy "there is no point being fast if
we're broken". It's unfortunate and I know the original microoptimisation
was mine but if the fast-path check ends up being a problem then I/we go
back to finding ways of making the page allocator faster from a fundamental
algorithmic point of view and not a microoptimisation approach. There is
potential fruit there, just none that is low-hanging.
--
Mel Gorman
SUSE Labs
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
2018-11-09 9:56 ` Michal Hocko
@ 2018-11-09 10:07 ` Tetsuo Handa
2018-11-09 10:25 ` Michal Hocko
2018-11-09 10:10 ` Vlastimil Babka
2018-11-09 10:52 ` Balbir Singh
2 siblings, 1 reply; 22+ messages in thread
From: Tetsuo Handa @ 2018-11-09 10:07 UTC (permalink / raw)
To: Michal Hocko
Cc: Kyungtae Kim, akpm, pavel.tatashin, vbabka, osalvador, rppt,
aaron.lu, iamjoonsoo.kim, alexander.h.duyck, mgorman,
lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm,
Konstantin Khlebnikov
On 2018/11/09 18:56, Michal Hocko wrote:
> Does this following look better?
Yes.
>> Also, why not to add BUG_ON(gfp_mask & __GFP_NOFAIL); here?
>
> Because we do not want to blow up the kernel just because of a stupid
> usage of the allocator. Can you think of an example where it would
> actually make any sense?
>
> I would argue that such a theoretical abuse would blow up on an
> unchecked NULL ptr access. Isn't that enough?
We after all can't avoid blowing up the kernel even if we don't add BUG_ON().
Stopping with BUG_ON() is saner than NULL pointer dereference messages.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
2018-11-09 9:56 ` Michal Hocko
2018-11-09 10:07 ` Tetsuo Handa
@ 2018-11-09 10:10 ` Vlastimil Babka
2018-11-09 10:22 ` Michal Hocko
2018-11-09 10:24 ` Tetsuo Handa
2018-11-09 10:52 ` Balbir Singh
2 siblings, 2 replies; 22+ messages in thread
From: Vlastimil Babka @ 2018-11-09 10:10 UTC (permalink / raw)
To: Michal Hocko, Tetsuo Handa
Cc: Kyungtae Kim, akpm, pavel.tatashin, osalvador, rppt, aaron.lu,
iamjoonsoo.kim, alexander.h.duyck, mgorman, lifeasageek,
threeearcat, syzkaller, linux-kernel, linux-mm,
Konstantin Khlebnikov
On 11/9/18 10:56 AM, Michal Hocko wrote:
> On Fri 09-11-18 18:41:53, Tetsuo Handa wrote:
>> On 2018/11/09 17:43, Michal Hocko wrote:
>>> @@ -4364,6 +4353,17 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
>>> gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
>>> struct alloc_context ac = { };
>>>
>>> + /*
>>> + * In the slowpath, we sanity check order to avoid ever trying to
>>
>> Please keep the comment up to dated.
>
> Does this following look better?
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 9fc10a1029cf..bf9aecba4222 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4354,10 +4354,8 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> struct alloc_context ac = { };
>
> /*
> - * In the slowpath, we sanity check order to avoid ever trying to
> - * reclaim >= MAX_ORDER areas which will never succeed. Callers may
> - * be using allocators in order of preference for an area that is
> - * too large.
> + * There are several places where we assume that the order value is sane
> + * so bail out early if the request is out of bound.
> */
> if (order >= MAX_ORDER) {
> WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
Looks ok, but I'd add unlikely(), although it doesn't currently seem to
make any difference.
You can add Acked-by: Vlastimil Babka <vbabka@suse.cz>
>> I don't like that comments in OOM code is outdated.
>>
>>> + * reclaim >= MAX_ORDER areas which will never succeed. Callers may
>>> + * be using allocators in order of preference for an area that is
>>> + * too large.
>>> + */
>>> + if (order >= MAX_ORDER) {
>>
>> Also, why not to add BUG_ON(gfp_mask & __GFP_NOFAIL); here?
>
> Because we do not want to blow up the kernel just because of a stupid
> usage of the allocator. Can you think of an example where it would
> actually make any sense?
>
> I would argue that such a theoretical abuse would blow up on an
> unchecked NULL ptr access. Isn't that enough?
Agreed.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
2018-11-09 10:10 ` Vlastimil Babka
@ 2018-11-09 10:22 ` Michal Hocko
2018-11-09 10:24 ` Tetsuo Handa
1 sibling, 0 replies; 22+ messages in thread
From: Michal Hocko @ 2018-11-09 10:22 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Tetsuo Handa, Kyungtae Kim, akpm, pavel.tatashin, osalvador,
rppt, aaron.lu, iamjoonsoo.kim, alexander.h.duyck, mgorman,
lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm,
Konstantin Khlebnikov
On Fri 09-11-18 11:10:00, Vlastimil Babka wrote:
> On 11/9/18 10:56 AM, Michal Hocko wrote:
> > On Fri 09-11-18 18:41:53, Tetsuo Handa wrote:
> >> On 2018/11/09 17:43, Michal Hocko wrote:
> >>> @@ -4364,6 +4353,17 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> >>> gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
> >>> struct alloc_context ac = { };
> >>>
> >>> + /*
> >>> + * In the slowpath, we sanity check order to avoid ever trying to
> >>
> >> Please keep the comment up to dated.
> >
> > Does this following look better?
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 9fc10a1029cf..bf9aecba4222 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -4354,10 +4354,8 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> > struct alloc_context ac = { };
> >
> > /*
> > - * In the slowpath, we sanity check order to avoid ever trying to
> > - * reclaim >= MAX_ORDER areas which will never succeed. Callers may
> > - * be using allocators in order of preference for an area that is
> > - * too large.
> > + * There are several places where we assume that the order value is sane
> > + * so bail out early if the request is out of bound.
> > */
> > if (order >= MAX_ORDER) {
> > WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
>
> Looks ok, but I'd add unlikely(), although it doesn't currently seem to
> make any difference.
>
> You can add Acked-by: Vlastimil Babka <vbabka@suse.cz>
OK, I have added both. Thanks!
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
2018-11-09 10:10 ` Vlastimil Babka
2018-11-09 10:22 ` Michal Hocko
@ 2018-11-09 10:24 ` Tetsuo Handa
2018-11-09 10:28 ` Michal Hocko
1 sibling, 1 reply; 22+ messages in thread
From: Tetsuo Handa @ 2018-11-09 10:24 UTC (permalink / raw)
To: Vlastimil Babka, Michal Hocko
Cc: Kyungtae Kim, akpm, pavel.tatashin, osalvador, rppt, aaron.lu,
iamjoonsoo.kim, alexander.h.duyck, mgorman, lifeasageek,
threeearcat, syzkaller, linux-kernel, linux-mm,
Konstantin Khlebnikov
On 2018/11/09 19:10, Vlastimil Babka wrote:>>>> + * reclaim >= MAX_ORDER areas which will never succeed. Callers may
>>>> + * be using allocators in order of preference for an area that is
>>>> + * too large.
>>>> + */
>>>> + if (order >= MAX_ORDER) {
>>>
>>> Also, why not to add BUG_ON(gfp_mask & __GFP_NOFAIL); here?
>>
>> Because we do not want to blow up the kernel just because of a stupid
>> usage of the allocator. Can you think of an example where it would
>> actually make any sense?
>>
>> I would argue that such a theoretical abuse would blow up on an
>> unchecked NULL ptr access. Isn't that enough?
>
> Agreed.
>
If someone has written a module with __GFP_NOFAIL for an architecture
where PAGE_SIZE == 2048KB, and someone else tried to use that module on
another architecture where PAGE_SIZE == 4KB. You are saying that
triggering NULL pointer dereference is a fault of that user's ignorance
about MM. You are saying that everyone knows internal of MM. Sad...
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
2018-11-09 10:07 ` Tetsuo Handa
@ 2018-11-09 10:25 ` Michal Hocko
0 siblings, 0 replies; 22+ messages in thread
From: Michal Hocko @ 2018-11-09 10:25 UTC (permalink / raw)
To: Tetsuo Handa
Cc: Kyungtae Kim, akpm, pavel.tatashin, vbabka, osalvador, rppt,
aaron.lu, iamjoonsoo.kim, alexander.h.duyck, mgorman,
lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm,
Konstantin Khlebnikov
On Fri 09-11-18 19:07:49, Tetsuo Handa wrote:
> On 2018/11/09 18:56, Michal Hocko wrote:
> > Does this following look better?
>
> Yes.
>
> >> Also, why not to add BUG_ON(gfp_mask & __GFP_NOFAIL); here?
> >
> > Because we do not want to blow up the kernel just because of a stupid
> > usage of the allocator. Can you think of an example where it would
> > actually make any sense?
> >
> > I would argue that such a theoretical abuse would blow up on an
> > unchecked NULL ptr access. Isn't that enough?
>
> We after all can't avoid blowing up the kernel even if we don't add BUG_ON().
> Stopping with BUG_ON() is saner than NULL pointer dereference messages.
I disagree (strongly to be more explicit). You never know the context
the allocator is called from. We do not want to oops with a random state
(locks heled etc). If the access blows up in the user then be it, the
bug will be clear and to be fixed but BUG_ON on an invalid core kernel
function is just a bad idea. I believe Linus was quite explicit about it
and I fully agree with him.
Besides that this is really off-topic to the issue at hands. Don't you
think?
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
2018-11-09 10:24 ` Tetsuo Handa
@ 2018-11-09 10:28 ` Michal Hocko
0 siblings, 0 replies; 22+ messages in thread
From: Michal Hocko @ 2018-11-09 10:28 UTC (permalink / raw)
To: Tetsuo Handa
Cc: Vlastimil Babka, Kyungtae Kim, akpm, pavel.tatashin, osalvador,
rppt, aaron.lu, iamjoonsoo.kim, alexander.h.duyck, mgorman,
lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm,
Konstantin Khlebnikov
On Fri 09-11-18 19:24:48, Tetsuo Handa wrote:
> On 2018/11/09 19:10, Vlastimil Babka wrote:>>>> + * reclaim >= MAX_ORDER areas which will never succeed. Callers may
> >>>> + * be using allocators in order of preference for an area that is
> >>>> + * too large.
> >>>> + */
> >>>> + if (order >= MAX_ORDER) {
> >>>
> >>> Also, why not to add BUG_ON(gfp_mask & __GFP_NOFAIL); here?
> >>
> >> Because we do not want to blow up the kernel just because of a stupid
> >> usage of the allocator. Can you think of an example where it would
> >> actually make any sense?
> >>
> >> I would argue that such a theoretical abuse would blow up on an
> >> unchecked NULL ptr access. Isn't that enough?
> >
> > Agreed.
> >
>
> If someone has written a module with __GFP_NOFAIL for an architecture
> where PAGE_SIZE == 2048KB, and someone else tried to use that module on
> another architecture where PAGE_SIZE == 4KB. You are saying that
> triggering NULL pointer dereference is a fault of that user's ignorance
> about MM. You are saying that everyone knows internal of MM. Sad...
What kind of argument is this? Seriously! We do consider GFP_NOFAIL
problematic even for !order-0 requests and warn appropriately. Talking
about anything getting close to MAX_ORDER is just a crazy talk. In any
case this is largely tangential to the issue reported here.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
2018-11-09 9:56 ` Michal Hocko
2018-11-09 10:07 ` Tetsuo Handa
2018-11-09 10:10 ` Vlastimil Babka
@ 2018-11-09 10:52 ` Balbir Singh
2 siblings, 0 replies; 22+ messages in thread
From: Balbir Singh @ 2018-11-09 10:52 UTC (permalink / raw)
To: Michal Hocko
Cc: Tetsuo Handa, Kyungtae Kim, akpm, pavel.tatashin, vbabka,
osalvador, rppt, aaron.lu, iamjoonsoo.kim, alexander.h.duyck,
mgorman, lifeasageek, threeearcat, syzkaller, linux-kernel,
linux-mm, Konstantin Khlebnikov
On Fri, Nov 09, 2018 at 10:56:04AM +0100, Michal Hocko wrote:
> On Fri 09-11-18 18:41:53, Tetsuo Handa wrote:
> > On 2018/11/09 17:43, Michal Hocko wrote:
> > > @@ -4364,6 +4353,17 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> > > gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
> > > struct alloc_context ac = { };
> > >
> > > + /*
> > > + * In the slowpath, we sanity check order to avoid ever trying to
> >
> > Please keep the comment up to dated.
>
> Does this following look better?
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 9fc10a1029cf..bf9aecba4222 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4354,10 +4354,8 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> struct alloc_context ac = { };
>
> /*
> - * In the slowpath, we sanity check order to avoid ever trying to
> - * reclaim >= MAX_ORDER areas which will never succeed. Callers may
> - * be using allocators in order of preference for an area that is
> - * too large.
> + * There are several places where we assume that the order value is sane
> + * so bail out early if the request is out of bound.
> */
> if (order >= MAX_ORDER) {
> WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
if (unlikely()) might help
>
> > I don't like that comments in OOM code is outdated.
> >
> > > + * reclaim >= MAX_ORDER areas which will never succeed. Callers may
> > > + * be using allocators in order of preference for an area that is
> > > + * too large.
> > > + */
> > > + if (order >= MAX_ORDER) {
> >
> > Also, why not to add BUG_ON(gfp_mask & __GFP_NOFAIL); here?
>
> Because we do not want to blow up the kernel just because of a stupid
> usage of the allocator. Can you think of an example where it would
> actually make any sense?
>
> I would argue that such a theoretical abuse would blow up on an
> unchecked NULL ptr access. Isn't that enough?
> --
Balbir Singh.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
2018-11-09 8:43 ` Michal Hocko
2018-11-09 9:41 ` Tetsuo Handa
2018-11-09 9:56 ` Mel Gorman
@ 2018-11-13 9:43 ` Michal Hocko
2018-11-13 23:15 ` Andrew Morton
2018-11-13 23:29 ` Andrew Morton
2 siblings, 2 replies; 22+ messages in thread
From: Michal Hocko @ 2018-11-13 9:43 UTC (permalink / raw)
To: akpm
Cc: Kyungtae Kim, pavel.tatashin, vbabka, osalvador, rppt, aaron.lu,
iamjoonsoo.kim, alexander.h.duyck, mgorman, lifeasageek,
threeearcat, syzkaller, linux-kernel, linux-mm,
Konstantin Khlebnikov
Andrew, could you take the patch please? This patch contains a comment
update as suggested by Tetsuo. I do not think there were any other
unresolved concerns.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
2018-11-13 9:43 ` Michal Hocko
@ 2018-11-13 23:15 ` Andrew Morton
2018-11-13 23:23 ` Vlastimil Babka
2018-11-13 23:29 ` Andrew Morton
1 sibling, 1 reply; 22+ messages in thread
From: Andrew Morton @ 2018-11-13 23:15 UTC (permalink / raw)
To: Michal Hocko
Cc: Kyungtae Kim, pavel.tatashin, vbabka, osalvador, rppt, aaron.lu,
iamjoonsoo.kim, alexander.h.duyck, mgorman, lifeasageek,
threeearcat, syzkaller, linux-kernel, linux-mm,
Konstantin Khlebnikov
On Tue, 13 Nov 2018 10:43:05 +0100 Michal Hocko <mhocko@kernel.org> wrote:
> Konstantin has noticed that kvmalloc might trigger the following warning
> [Thu Nov 1 08:43:56 2018] WARNING: CPU: 0 PID: 6676 at mm/vmstat.c:986 __fragmentation_index+0x54/0x60
> [...]
> [Thu Nov 1 08:43:56 2018] Call Trace:
> [Thu Nov 1 08:43:56 2018] fragmentation_index+0x76/0x90
> [Thu Nov 1 08:43:56 2018] compaction_suitable+0x4f/0xf0
> [Thu Nov 1 08:43:56 2018] shrink_node+0x295/0x310
> [Thu Nov 1 08:43:56 2018] node_reclaim+0x205/0x250
> [Thu Nov 1 08:43:56 2018] get_page_from_freelist+0x649/0xad0
> [Thu Nov 1 08:43:56 2018] ? get_page_from_freelist+0x2d4/0xad0
> [Thu Nov 1 08:43:56 2018] ? release_sock+0x19/0x90
> [Thu Nov 1 08:43:56 2018] ? do_ipv6_setsockopt.isra.5+0x10da/0x1290
> [Thu Nov 1 08:43:56 2018] __alloc_pages_nodemask+0x12a/0x2a0
> [Thu Nov 1 08:43:56 2018] kmalloc_large_node+0x47/0x90
> [Thu Nov 1 08:43:56 2018] __kmalloc_node+0x22b/0x2e0
> [Thu Nov 1 08:43:56 2018] kvmalloc_node+0x3e/0x70
> [Thu Nov 1 08:43:56 2018] xt_alloc_table_info+0x3a/0x80 [x_tables]
> [Thu Nov 1 08:43:56 2018] do_ip6t_set_ctl+0xcd/0x1c0 [ip6_tables]
> [Thu Nov 1 08:43:56 2018] nf_setsockopt+0x44/0x60
> [Thu Nov 1 08:43:56 2018] SyS_setsockopt+0x6f/0xc0
> [Thu Nov 1 08:43:56 2018] do_syscall_64+0x67/0x120
> [Thu Nov 1 08:43:56 2018] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
>
> ...
>
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4364,6 +4353,15 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
> struct alloc_context ac = { };
>
> + /*
> + * There are several places where we assume that the order value is sane
> + * so bail out early if the request is out of bound.
> + */
> + if (unlikely(order >= MAX_ORDER)) {
> + WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
> + return NULL;
> + }
> +
I know "everybody enables CONFIG_DEBUG_VM", but given this is fastpath,
we could help those who choose not to enable it by using
#ifdef CONFIG_DEBUG_VM
if (WARN_ON_ONCE(order >= MAX_ORDER && !(gfp_mask & __GFP_NOWARN)))
return NULL;
#endif
(Again curses 91241681c62 ("include/linux/mmdebug.h: make VM_WARN* non-rvals"))
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
2018-11-13 23:15 ` Andrew Morton
@ 2018-11-13 23:23 ` Vlastimil Babka
2018-11-13 23:32 ` Andrew Morton
0 siblings, 1 reply; 22+ messages in thread
From: Vlastimil Babka @ 2018-11-13 23:23 UTC (permalink / raw)
To: Andrew Morton, Michal Hocko
Cc: Kyungtae Kim, pavel.tatashin, osalvador, rppt, aaron.lu,
iamjoonsoo.kim, alexander.h.duyck, mgorman, lifeasageek,
threeearcat, syzkaller, linux-kernel, linux-mm,
Konstantin Khlebnikov
On 11/14/18 12:15 AM, Andrew Morton wrote:
> On Tue, 13 Nov 2018 10:43:05 +0100 Michal Hocko <mhocko@kernel.org> wrote:
>
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -4364,6 +4353,15 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
>> gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
>> struct alloc_context ac = { };
>>
>> + /*
>> + * There are several places where we assume that the order value is sane
>> + * so bail out early if the request is out of bound.
>> + */
>> + if (unlikely(order >= MAX_ORDER)) {
>> + WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
>> + return NULL;
>> + }
>> +
>
> I know "everybody enables CONFIG_DEBUG_VM", but given this is fastpath,
> we could help those who choose not to enable it by using
>
> #ifdef CONFIG_DEBUG_VM
> if (WARN_ON_ONCE(order >= MAX_ORDER && !(gfp_mask & __GFP_NOWARN)))
> return NULL;
> #endif
Hmm, but that would mean there's still potential undefined behavior for
!CONFIG_DEBUG_VM, so I would prefer not to do it like that.
>
> (Again curses 91241681c62 ("include/linux/mmdebug.h: make VM_WARN* non-rvals"))
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
2018-11-13 9:43 ` Michal Hocko
2018-11-13 23:15 ` Andrew Morton
@ 2018-11-13 23:29 ` Andrew Morton
2018-11-14 7:10 ` Michal Hocko
2018-11-16 23:43 ` Dmitry Vyukov
1 sibling, 2 replies; 22+ messages in thread
From: Andrew Morton @ 2018-11-13 23:29 UTC (permalink / raw)
To: Michal Hocko
Cc: Kyungtae Kim, pavel.tatashin, vbabka, osalvador, rppt, aaron.lu,
iamjoonsoo.kim, alexander.h.duyck, mgorman, lifeasageek,
threeearcat, syzkaller, linux-kernel, linux-mm,
Konstantin Khlebnikov
On Tue, 13 Nov 2018 10:43:05 +0100 Michal Hocko <mhocko@kernel.org> wrote:
> From: Michal Hocko <mhocko@suse.com>
> Date: Fri, 9 Nov 2018 09:35:29 +0100
> Subject: [PATCH] mm, page_alloc: check for max order in hot path
>
> Konstantin has noticed that kvmalloc might trigger the following warning
> [Thu Nov 1 08:43:56 2018] WARNING: CPU: 0 PID: 6676 at mm/vmstat.c:986 __fragmentation_index+0x54/0x60
um, wait...
> [...]
> [Thu Nov 1 08:43:56 2018] Call Trace:
> [Thu Nov 1 08:43:56 2018] fragmentation_index+0x76/0x90
> [Thu Nov 1 08:43:56 2018] compaction_suitable+0x4f/0xf0
> [Thu Nov 1 08:43:56 2018] shrink_node+0x295/0x310
> [Thu Nov 1 08:43:56 2018] node_reclaim+0x205/0x250
> [Thu Nov 1 08:43:56 2018] get_page_from_freelist+0x649/0xad0
> [Thu Nov 1 08:43:56 2018] ? get_page_from_freelist+0x2d4/0xad0
> [Thu Nov 1 08:43:56 2018] ? release_sock+0x19/0x90
> [Thu Nov 1 08:43:56 2018] ? do_ipv6_setsockopt.isra.5+0x10da/0x1290
> [Thu Nov 1 08:43:56 2018] __alloc_pages_nodemask+0x12a/0x2a0
> [Thu Nov 1 08:43:56 2018] kmalloc_large_node+0x47/0x90
> [Thu Nov 1 08:43:56 2018] __kmalloc_node+0x22b/0x2e0
> [Thu Nov 1 08:43:56 2018] kvmalloc_node+0x3e/0x70
> [Thu Nov 1 08:43:56 2018] xt_alloc_table_info+0x3a/0x80 [x_tables]
> [Thu Nov 1 08:43:56 2018] do_ip6t_set_ctl+0xcd/0x1c0 [ip6_tables]
> [Thu Nov 1 08:43:56 2018] nf_setsockopt+0x44/0x60
> [Thu Nov 1 08:43:56 2018] SyS_setsockopt+0x6f/0xc0
> [Thu Nov 1 08:43:56 2018] do_syscall_64+0x67/0x120
> [Thu Nov 1 08:43:56 2018] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
If kvalloc_node() is going to call kmalloc() without checking for a
huge allocation request then surely it should set __GFP_NOWARN. And it
shouldn't bother at all if size > KMALLOC_MAX_SIZE, surely? So
something like
--- a/mm/util.c~a
+++ a/mm/util.c
@@ -393,11 +393,16 @@ void *kvmalloc_node(size_t size, gfp_t f
void *ret;
/*
- * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
- * so the given set of flags has to be compatible.
+ * vmalloc uses GFP_KERNEL for some internal allocations (e.g page
+ * tables) so the given set of flags has to be compatible.
*/
- if ((flags & GFP_KERNEL) != GFP_KERNEL)
+ if ((flags & GFP_KERNEL) != GFP_KERNEL) {
+ if (size > KMALLOC_MAX_SIZE)
+ return NULL;
+ if (size > PAGE_SIZE)
+ flags |= __GFP_NOWARN;
return kmalloc_node(size, flags, node);
+ }
/*
* We want to attempt a large physically contiguous block first because
> the problem is that we only check for an out of bound order in the slow
> path and the node reclaim might happen from the fast path already. This
> is fixable by making sure that kvmalloc doesn't ever use kmalloc for
> requests that are larger than KMALLOC_MAX_SIZE but this also shows that
> the code is rather fragile. A recent UBSAN report just underlines that
> by the following report
>
> UBSAN: Undefined behaviour in mm/page_alloc.c:3117:19
> shift exponent 51 is too large for 32-bit type 'int'
> CPU: 0 PID: 6520 Comm: syz-executor1 Not tainted 4.19.0-rc2 #1
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
> __dump_stack lib/dump_stack.c:77 [inline]
> dump_stack+0xd2/0x148 lib/dump_stack.c:113
> ubsan_epilogue+0x12/0x94 lib/ubsan.c:159
> __ubsan_handle_shift_out_of_bounds+0x2b6/0x30b lib/ubsan.c:425
> __zone_watermark_ok+0x2c7/0x400 mm/page_alloc.c:3117
> zone_watermark_fast mm/page_alloc.c:3216 [inline]
> get_page_from_freelist+0xc49/0x44c0 mm/page_alloc.c:3300
> __alloc_pages_nodemask+0x21e/0x640 mm/page_alloc.c:4370
> alloc_pages_current+0xcc/0x210 mm/mempolicy.c:2093
> alloc_pages include/linux/gfp.h:509 [inline]
> __get_free_pages+0x12/0x60 mm/page_alloc.c:4414
> dma_mem_alloc+0x36/0x50 arch/x86/include/asm/floppy.h:156
> raw_cmd_copyin drivers/block/floppy.c:3159 [inline]
> raw_cmd_ioctl drivers/block/floppy.c:3206 [inline]
> fd_locked_ioctl+0xa00/0x2c10 drivers/block/floppy.c:3544
> fd_ioctl+0x40/0x60 drivers/block/floppy.c:3571
> __blkdev_driver_ioctl block/ioctl.c:303 [inline]
> blkdev_ioctl+0xb3c/0x1a30 block/ioctl.c:601
> block_ioctl+0x105/0x150 fs/block_dev.c:1883
> vfs_ioctl fs/ioctl.c:46 [inline]
> do_vfs_ioctl+0x1c0/0x1150 fs/ioctl.c:687
> ksys_ioctl+0x9e/0xb0 fs/ioctl.c:702
> __do_sys_ioctl fs/ioctl.c:709 [inline]
> __se_sys_ioctl fs/ioctl.c:707 [inline]
> __x64_sys_ioctl+0x7e/0xc0 fs/ioctl.c:707
> do_syscall_64+0xc4/0x510 arch/x86/entry/common.c:290
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
And we could fix this in the floppy driver.
> Note that this is not a kvmalloc path. It is just that the fast path
> really depends on having sanitzed order as well. Therefore move the
> order check to the fast path.
But do we really need to do this? Are there any other known potential
callsites?
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
2018-11-13 23:23 ` Vlastimil Babka
@ 2018-11-13 23:32 ` Andrew Morton
2018-11-14 0:05 ` Vlastimil Babka
0 siblings, 1 reply; 22+ messages in thread
From: Andrew Morton @ 2018-11-13 23:32 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Michal Hocko, Kyungtae Kim, pavel.tatashin, osalvador, rppt,
aaron.lu, iamjoonsoo.kim, alexander.h.duyck, mgorman,
lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm,
Konstantin Khlebnikov
On Wed, 14 Nov 2018 00:23:28 +0100 Vlastimil Babka <vbabka@suse.cz> wrote:
> On 11/14/18 12:15 AM, Andrew Morton wrote:
> > On Tue, 13 Nov 2018 10:43:05 +0100 Michal Hocko <mhocko@kernel.org> wrote:
> >
> >> --- a/mm/page_alloc.c
> >> +++ b/mm/page_alloc.c
> >> @@ -4364,6 +4353,15 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> >> gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
> >> struct alloc_context ac = { };
> >>
> >> + /*
> >> + * There are several places where we assume that the order value is sane
> >> + * so bail out early if the request is out of bound.
> >> + */
> >> + if (unlikely(order >= MAX_ORDER)) {
> >> + WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
> >> + return NULL;
> >> + }
> >> +
> >
> > I know "everybody enables CONFIG_DEBUG_VM", but given this is fastpath,
> > we could help those who choose not to enable it by using
> >
> > #ifdef CONFIG_DEBUG_VM
> > if (WARN_ON_ONCE(order >= MAX_ORDER && !(gfp_mask & __GFP_NOWARN)))
> > return NULL;
> > #endif
>
> Hmm, but that would mean there's still potential undefined behavior for
> !CONFIG_DEBUG_VM, so I would prefer not to do it like that.
>
What does "potential undefined behavior" mean here?
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
2018-11-13 23:32 ` Andrew Morton
@ 2018-11-14 0:05 ` Vlastimil Babka
2018-11-19 23:57 ` Pavel Machek
0 siblings, 1 reply; 22+ messages in thread
From: Vlastimil Babka @ 2018-11-14 0:05 UTC (permalink / raw)
To: Andrew Morton
Cc: Michal Hocko, Kyungtae Kim, pavel.tatashin, osalvador, rppt,
aaron.lu, iamjoonsoo.kim, alexander.h.duyck, mgorman,
lifeasageek, threeearcat, syzkaller, linux-kernel, linux-mm,
Konstantin Khlebnikov
On 11/14/18 12:32 AM, Andrew Morton wrote:
> On Wed, 14 Nov 2018 00:23:28 +0100 Vlastimil Babka <vbabka@suse.cz> wrote:
>
>> On 11/14/18 12:15 AM, Andrew Morton wrote:
>>> On Tue, 13 Nov 2018 10:43:05 +0100 Michal Hocko <mhocko@kernel.org> wrote:
>>>
>>>> --- a/mm/page_alloc.c
>>>> +++ b/mm/page_alloc.c
>>>> @@ -4364,6 +4353,15 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
>>>> gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
>>>> struct alloc_context ac = { };
>>>>
>>>> + /*
>>>> + * There are several places where we assume that the order value is sane
>>>> + * so bail out early if the request is out of bound.
>>>> + */
>>>> + if (unlikely(order >= MAX_ORDER)) {
>>>> + WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
>>>> + return NULL;
>>>> + }
>>>> +
>>>
>>> I know "everybody enables CONFIG_DEBUG_VM", but given this is fastpath,
>>> we could help those who choose not to enable it by using
>>>
>>> #ifdef CONFIG_DEBUG_VM
>>> if (WARN_ON_ONCE(order >= MAX_ORDER && !(gfp_mask & __GFP_NOWARN)))
>>> return NULL;
>>> #endif
>>
>> Hmm, but that would mean there's still potential undefined behavior for
>> !CONFIG_DEBUG_VM, so I would prefer not to do it like that.
>>
>
> What does "potential undefined behavior" mean here?
I mean that it becomes undefined once a caller with order >= MAX_ORDER
appears. Worse if it's directly due to a userspace action, like in this
case.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
2018-11-13 23:29 ` Andrew Morton
@ 2018-11-14 7:10 ` Michal Hocko
2018-11-16 23:43 ` Dmitry Vyukov
1 sibling, 0 replies; 22+ messages in thread
From: Michal Hocko @ 2018-11-14 7:10 UTC (permalink / raw)
To: Andrew Morton
Cc: Kyungtae Kim, pavel.tatashin, vbabka, osalvador, rppt, aaron.lu,
iamjoonsoo.kim, alexander.h.duyck, mgorman, lifeasageek,
threeearcat, syzkaller, linux-kernel, linux-mm,
Konstantin Khlebnikov
On Tue 13-11-18 15:29:41, Andrew Morton wrote:
[...]
> But do we really need to do this? Are there any other known potential
> callsites?
The main point is that the code as it stands is quite fragile, isn't it?
Fixing up all the callers is possible but can you actually think of a
reason why this would cause any measurable effect in the fast path?
The order argument is usually in a register and comparing it to a number
with unlikely branch should be hardly something visible.
Besides that we are talking few cycles at best compared to a fragile
code that got broken by accident without anybody noticing for quite some
time.
I vote for the maintainability over few cycles here. Should anybody find
this measurable we can rework the code by other means.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
2018-11-13 23:29 ` Andrew Morton
2018-11-14 7:10 ` Michal Hocko
@ 2018-11-16 23:43 ` Dmitry Vyukov
1 sibling, 0 replies; 22+ messages in thread
From: Dmitry Vyukov @ 2018-11-16 23:43 UTC (permalink / raw)
To: Andrew Morton
Cc: Michal Hocko, Kyungtae Kim, pavel.tatashin, Vlastimil Babka,
osalvador, Mike Rapoport, aaron.lu, Joonsoo Kim,
alexander.h.duyck, Mel Gorman, lifeasageek, Dae R. Jeong,
syzkaller, LKML, Linux-MM, Konstantin Khlebnikov
On Tue, Nov 13, 2018 at 3:29 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Tue, 13 Nov 2018 10:43:05 +0100 Michal Hocko <mhocko@kernel.org> wrote:
>
>> From: Michal Hocko <mhocko@suse.com>
>> Date: Fri, 9 Nov 2018 09:35:29 +0100
>> Subject: [PATCH] mm, page_alloc: check for max order in hot path
>>
>> Konstantin has noticed that kvmalloc might trigger the following warning
>> [Thu Nov 1 08:43:56 2018] WARNING: CPU: 0 PID: 6676 at mm/vmstat.c:986 __fragmentation_index+0x54/0x60
>
> um, wait...
>
>> [...]
>> [Thu Nov 1 08:43:56 2018] Call Trace:
>> [Thu Nov 1 08:43:56 2018] fragmentation_index+0x76/0x90
>> [Thu Nov 1 08:43:56 2018] compaction_suitable+0x4f/0xf0
>> [Thu Nov 1 08:43:56 2018] shrink_node+0x295/0x310
>> [Thu Nov 1 08:43:56 2018] node_reclaim+0x205/0x250
>> [Thu Nov 1 08:43:56 2018] get_page_from_freelist+0x649/0xad0
>> [Thu Nov 1 08:43:56 2018] ? get_page_from_freelist+0x2d4/0xad0
>> [Thu Nov 1 08:43:56 2018] ? release_sock+0x19/0x90
>> [Thu Nov 1 08:43:56 2018] ? do_ipv6_setsockopt.isra.5+0x10da/0x1290
>> [Thu Nov 1 08:43:56 2018] __alloc_pages_nodemask+0x12a/0x2a0
>> [Thu Nov 1 08:43:56 2018] kmalloc_large_node+0x47/0x90
>> [Thu Nov 1 08:43:56 2018] __kmalloc_node+0x22b/0x2e0
>> [Thu Nov 1 08:43:56 2018] kvmalloc_node+0x3e/0x70
>> [Thu Nov 1 08:43:56 2018] xt_alloc_table_info+0x3a/0x80 [x_tables]
>> [Thu Nov 1 08:43:56 2018] do_ip6t_set_ctl+0xcd/0x1c0 [ip6_tables]
>> [Thu Nov 1 08:43:56 2018] nf_setsockopt+0x44/0x60
>> [Thu Nov 1 08:43:56 2018] SyS_setsockopt+0x6f/0xc0
>> [Thu Nov 1 08:43:56 2018] do_syscall_64+0x67/0x120
>> [Thu Nov 1 08:43:56 2018] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
>
> If kvalloc_node() is going to call kmalloc() without checking for a
> huge allocation request then surely it should set __GFP_NOWARN.
kmalloc won't warn about large allocations after "mm: don't warn about
large allocations for slab":
https://lkml.org/lkml/2018/9/27/1156
It will just return NULL. That was already the case for slub.
> And it
> shouldn't bother at all if size > KMALLOC_MAX_SIZE, surely? So
> something like
>
> --- a/mm/util.c~a
> +++ a/mm/util.c
> @@ -393,11 +393,16 @@ void *kvmalloc_node(size_t size, gfp_t f
> void *ret;
>
> /*
> - * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
> - * so the given set of flags has to be compatible.
> + * vmalloc uses GFP_KERNEL for some internal allocations (e.g page
> + * tables) so the given set of flags has to be compatible.
> */
> - if ((flags & GFP_KERNEL) != GFP_KERNEL)
> + if ((flags & GFP_KERNEL) != GFP_KERNEL) {
> + if (size > KMALLOC_MAX_SIZE)
> + return NULL;
> + if (size > PAGE_SIZE)
> + flags |= __GFP_NOWARN;
> return kmalloc_node(size, flags, node);
> + }
>
> /*
> * We want to attempt a large physically contiguous block first because
>
>
>> the problem is that we only check for an out of bound order in the slow
>> path and the node reclaim might happen from the fast path already. This
>> is fixable by making sure that kvmalloc doesn't ever use kmalloc for
>> requests that are larger than KMALLOC_MAX_SIZE but this also shows that
>> the code is rather fragile. A recent UBSAN report just underlines that
>> by the following report
>>
>> UBSAN: Undefined behaviour in mm/page_alloc.c:3117:19
>> shift exponent 51 is too large for 32-bit type 'int'
>> CPU: 0 PID: 6520 Comm: syz-executor1 Not tainted 4.19.0-rc2 #1
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>> Call Trace:
>> __dump_stack lib/dump_stack.c:77 [inline]
>> dump_stack+0xd2/0x148 lib/dump_stack.c:113
>> ubsan_epilogue+0x12/0x94 lib/ubsan.c:159
>> __ubsan_handle_shift_out_of_bounds+0x2b6/0x30b lib/ubsan.c:425
>> __zone_watermark_ok+0x2c7/0x400 mm/page_alloc.c:3117
>> zone_watermark_fast mm/page_alloc.c:3216 [inline]
>> get_page_from_freelist+0xc49/0x44c0 mm/page_alloc.c:3300
>> __alloc_pages_nodemask+0x21e/0x640 mm/page_alloc.c:4370
>> alloc_pages_current+0xcc/0x210 mm/mempolicy.c:2093
>> alloc_pages include/linux/gfp.h:509 [inline]
>> __get_free_pages+0x12/0x60 mm/page_alloc.c:4414
>> dma_mem_alloc+0x36/0x50 arch/x86/include/asm/floppy.h:156
>> raw_cmd_copyin drivers/block/floppy.c:3159 [inline]
>> raw_cmd_ioctl drivers/block/floppy.c:3206 [inline]
>> fd_locked_ioctl+0xa00/0x2c10 drivers/block/floppy.c:3544
>> fd_ioctl+0x40/0x60 drivers/block/floppy.c:3571
>> __blkdev_driver_ioctl block/ioctl.c:303 [inline]
>> blkdev_ioctl+0xb3c/0x1a30 block/ioctl.c:601
>> block_ioctl+0x105/0x150 fs/block_dev.c:1883
>> vfs_ioctl fs/ioctl.c:46 [inline]
>> do_vfs_ioctl+0x1c0/0x1150 fs/ioctl.c:687
>> ksys_ioctl+0x9e/0xb0 fs/ioctl.c:702
>> __do_sys_ioctl fs/ioctl.c:709 [inline]
>> __se_sys_ioctl fs/ioctl.c:707 [inline]
>> __x64_sys_ioctl+0x7e/0xc0 fs/ioctl.c:707
>> do_syscall_64+0xc4/0x510 arch/x86/entry/common.c:290
>> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> And we could fix this in the floppy driver.
>
>> Note that this is not a kvmalloc path. It is just that the fast path
>> really depends on having sanitzed order as well. Therefore move the
>> order check to the fast path.
>
> But do we really need to do this? Are there any other known potential
> callsites?
>
> --
> You received this message because you are subscribed to the Google Groups "syzkaller" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller+unsubscribe@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: UBSAN: Undefined behaviour in mm/page_alloc.c
2018-11-14 0:05 ` Vlastimil Babka
@ 2018-11-19 23:57 ` Pavel Machek
0 siblings, 0 replies; 22+ messages in thread
From: Pavel Machek @ 2018-11-19 23:57 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Andrew Morton, Michal Hocko, Kyungtae Kim, pavel.tatashin,
osalvador, rppt, aaron.lu, iamjoonsoo.kim, alexander.h.duyck,
mgorman, lifeasageek, threeearcat, syzkaller, linux-kernel,
linux-mm, Konstantin Khlebnikov
[-- Attachment #1: Type: text/plain, Size: 1647 bytes --]
Hi1
> >>>> --- a/mm/page_alloc.c
> >>>> +++ b/mm/page_alloc.c
> >>>> @@ -4364,6 +4353,15 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> >>>> gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
> >>>> struct alloc_context ac = { };
> >>>>
> >>>> + /*
> >>>> + * There are several places where we assume that the order value is sane
> >>>> + * so bail out early if the request is out of bound.
> >>>> + */
> >>>> + if (unlikely(order >= MAX_ORDER)) {
> >>>> + WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
> >>>> + return NULL;
> >>>> + }
> >>>> +
> >>>
> >>> I know "everybody enables CONFIG_DEBUG_VM", but given this is fastpath,
> >>> we could help those who choose not to enable it by using
> >>>
> >>> #ifdef CONFIG_DEBUG_VM
> >>> if (WARN_ON_ONCE(order >= MAX_ORDER && !(gfp_mask & __GFP_NOWARN)))
> >>> return NULL;
> >>> #endif
> >>
> >> Hmm, but that would mean there's still potential undefined behavior for
> >> !CONFIG_DEBUG_VM, so I would prefer not to do it like that.
> >>
> >
> > What does "potential undefined behavior" mean here?
>
> I mean that it becomes undefined once a caller with order >= MAX_ORDER
> appears. Worse if it's directly due to a userspace action, like in this
> case.
We should really check if value from userspace is sane _before_
passing it to alloc_pages(). Anything else is too fragile. Maybe
alloc_pages should do the second check, but...
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2018-11-19 23:57 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-09 4:09 UBSAN: Undefined behaviour in mm/page_alloc.c Kyungtae Kim
2018-11-09 8:42 ` Vlastimil Babka
2018-11-09 8:43 ` Michal Hocko
2018-11-09 9:41 ` Tetsuo Handa
2018-11-09 9:56 ` Michal Hocko
2018-11-09 10:07 ` Tetsuo Handa
2018-11-09 10:25 ` Michal Hocko
2018-11-09 10:10 ` Vlastimil Babka
2018-11-09 10:22 ` Michal Hocko
2018-11-09 10:24 ` Tetsuo Handa
2018-11-09 10:28 ` Michal Hocko
2018-11-09 10:52 ` Balbir Singh
2018-11-09 9:56 ` Mel Gorman
2018-11-13 9:43 ` Michal Hocko
2018-11-13 23:15 ` Andrew Morton
2018-11-13 23:23 ` Vlastimil Babka
2018-11-13 23:32 ` Andrew Morton
2018-11-14 0:05 ` Vlastimil Babka
2018-11-19 23:57 ` Pavel Machek
2018-11-13 23:29 ` Andrew Morton
2018-11-14 7:10 ` Michal Hocko
2018-11-16 23:43 ` Dmitry Vyukov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).