* next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
0 siblings, 1 reply; 21+ messages in thread
From: Luis R. Rodriguez @ 2017-05-15 22:06 UTC
To: Stephen Smalley, Ingo Molnar
Cc: Andy Lutomirski, Michal Hocko, Andrew Morton, Kees Cook, Eric W. Biederman, Mateusz Guzik, mcgrof, linux-kernel

For a few kernel releases now I have managed to trigger the warning added via
commit e1a58320a38dfa ("x86/mm: Warn on W^X mappings", merged upstream since
v4.4) on my KVM qemu x86_64 system. Since I just booted into the shiny new
linux-next tag next-20170515 (based on v4.12-rc1) and this is still triggering,
I figured it's time to tackle this.

Let me know if this is already known or what can be done to try to fix this.

Using QEMU emulator version 2.7.94 (v2.8.0-rc4-dirty)

I will try updating my distro package for qemu and see if perhaps that is the
cause of this and of the other odd fork issue I reported [0].

[0] https://lkml.kernel.org/r/CAB=NE6VZXq3y-3pfouYTBUco2Cq2xqoLZrgDFdVx+_=_=SwG_Q@mail.gmail.com

My config:

http://drvbp1.linux-foundation.org/~mcgrof/2017/05/15/configs/piggy-x86_64_qemu_fork_kmemleak.config

The splat:

[ 0.911209] x86/mm: Found insecure W+X mapping at address ffffffffc0288000/0xffffffffc0288000
[ 0.912066] ------------[ cut here ]------------
[ 0.912544] WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
[ 0.913381] Modules linked in:
[ 0.913672] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc1-next-20170515+ #144
[ 0.914434] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.1-0-g8891697-prebuilt.qemu-project.org 04/01/2014
[ 0.915595] task: ffff98d43a5eac80 task.stack: ffffad22c0630000
[ 0.916174] RIP: 0010:note_page+0x630/0x7e0
[ 0.916595] RSP: 0018:ffffad22c0633df0 EFLAGS: 00010286
[ 0.917101] RAX: 0000000000000051 RBX: ffffad22c0633e88 RCX: ffffffff91256708
[ 0.917805] RDX: 0000000000000000 RSI: 0000000000000096 RDI: 0000000000000246
[ 0.918511] RBP: ffffad22c0633e28 R08: 6666666666666678 R09: 0000000000000160
[ 0.919214] R10: ffffad22c0633dd8 R11: 3030303838323063 R12: 0000000000000000
[ 0.919917] R13: 0000000000000004 R14: 0000000000000000 R15: 0000000000000000
[ 0.920615] FS: 0000000000000000(0000) GS:ffff98d43fc00000(0000) knlGS:0000000000000000
[ 0.921384] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.921943] CR2: 0000000000000000 CR3: 00000000a3a09000 CR4: 00000000000006f0
[ 0.922657] Call Trace:
[ 0.922901] ptdump_walk_pgd_level_core+0x3e7/0x490
[ 0.923354] ? 0xffffffff90600000
[ 0.923662] ptdump_walk_pgd_level_checkwx+0x17/0x20
[ 0.924145] mark_rodata_ro+0xf4/0x100
[ 0.924536] ? rest_init+0x80/0x80
[ 0.924862] kernel_init+0x2f/0x100
[ 0.925197] ret_from_fork+0x2c/0x40
[ 0.925552] Code: 48 c7 43 28 00 00 00 00 48 89 43 20 e9 05 fd ff ff 48 8b 73 10 48 c7 c7 c8 34 fe 90 c6 05 c8 eb bc 00 01 48 89 f2 e8 8d fc 11 00 <0f> ff e9 1f fa ff ff 48 8b 70 20 48 c7 c7 05 b1 fe 90 e8 76 fc
[ 0.927368] ---[ end trace 97137ae213b9cb25 ]---
[ 0.927830] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found.

Luis

* Re: next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
From: Luis R. Rodriguez @ 2017-05-15 22:15 UTC
To: Luis R. Rodriguez
Cc: Stephen Smalley, Ingo Molnar, Andy Lutomirski, Michal Hocko, Andrew Morton, Kees Cook, Eric W. Biederman, Mateusz Guzik, linux-kernel

On Tue, May 16, 2017 at 12:06:50AM +0200, Luis R. Rodriguez wrote:
> Using QEMU emulator version 2.7.94 (v2.8.0-rc4-dirty)
>
> I will try updating my distro package for qemu and see if perhaps its this
> and for the other odd fork issue I reported [0].
>
> [0] https://lkml.kernel.org/r/CAB=NE6VZXq3y-3pfouYTBUco2Cq2xqoLZrgDFdVx+_=_=SwG_Q@mail.gmail.com

Yeah nope, using my distribution's latest:

QEMU emulator version 2.8.0(openSUSE Tumbleweed)

And still both issues are present.

Luis

* Re: next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
From: Kees Cook @ 2017-05-15 22:57 UTC
To: Luis R. Rodriguez
Cc: Stephen Smalley, Ingo Molnar, Andy Lutomirski, Michal Hocko, Andrew Morton, Eric W. Biederman, Mateusz Guzik, LKML

On Mon, May 15, 2017 at 3:15 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> On Tue, May 16, 2017 at 12:06:50AM +0200, Luis R. Rodriguez wrote:
>> Using QEMU emulator version 2.7.94 (v2.8.0-rc4-dirty)
>>
>> I will try updating my distro package for qemu and see if perhaps its this
>> and for the other odd fork issue I reported [0].
>>
>> [0] https://lkml.kernel.org/r/CAB=NE6VZXq3y-3pfouYTBUco2Cq2xqoLZrgDFdVx+_=_=SwG_Q@mail.gmail.com
>
> Yeah nope, using my distribution latest:
>
> QEMU emulator version 2.8.0(openSUSE Tumbleweed)
>
> And still both issues are present.
>
> Luis

Can you enable CONFIG_X86_PTDUMP=y and then find out what is located
at ffffffffc0288000 via /sys/kernel/debug/kernel_page_tables?

-Kees

--
Kees Cook
Pixel Security
* Re: next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0 2017-05-15 22:57 ` Kees Cook @ 2017-05-15 23:45 ` Luis R. Rodriguez 2017-05-16 0:12 ` Kees Cook 0 siblings, 1 reply; 21+ messages in thread From: Luis R. Rodriguez @ 2017-05-15 23:45 UTC (permalink / raw) To: Kees Cook Cc: Stephen Smalley, Ingo Molnar, Andy Lutomirski, Michal Hocko, Andrew Morton, Eric W. Biederman, Mateusz Guzik, LKML On Mon, May 15, 2017 at 3:57 PM, Kees Cook <keescook@chromium.org> wrote: > On Mon, May 15, 2017 at 3:15 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote: >> On Tue, May 16, 2017 at 12:06:50AM +0200, Luis R. Rodriguez wrote: >>> Using QEMU emulator version 2.7.94 (v2.8.0-rc4-dirty) >>> >>> I will try updating my distro package for qemu and see if perhaps its this >>> and for the other odd fork issue I reported [0]. >>> >>> [0] https://lkml.kernel.org/r/CAB=NE6VZXq3y-3pfouYTBUco2Cq2xqoLZrgDFdVx+_=_=SwG_Q@mail.gmail.com >> >> Yeah nope, using my distribution latest: >> >> QEMU emulator version 2.8.0(openSUSE Tumbleweed) >> >> And still both issues are present. >> >> Luis > > Can you enable CONFIG_X86_PTDUMP=y and then find out what is located > at ffffffffc0288000 via /sys/kernel/debug/kernel_page_tables ? Sure thing. 
Recompiled with this enabled, new warning: [ 0.891559] x86/mm: Found insecure W+X mapping at address ffffffffc00e4000/0xffffffffc00e4000 [ 0.892394] ------------[ cut here ]------------ [ 0.892834] WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0 [ 0.893674] Modules linked in: [ 0.893972] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc1-next-20170515+ #145 [ 0.894687] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014 [ 0.895828] task: ffff8ed7fa5ccc80 task.stack: ffffae3900630000 [ 0.896403] RIP: 0010:note_page+0x630/0x7e0 [ 0.896780] RSP: 0018:ffffae3900633df0 EFLAGS: 00010286 [ 0.897271] RAX: 0000000000000051 RBX: ffffae3900633e88 RCX: ffffffff9b456708 [ 0.897940] RDX: 0000000000000000 RSI: 0000000000000096 RDI: 0000000000000246 [ 0.898624] RBP: ffffae3900633e28 R08: 203a6d6d2f363878 R09: 0000000000000165 [ 0.899314] R10: ffffae3900633dd8 R11: 736e6920646e756f R12: 0000000000000000 [ 0.899987] R13: 0000000000000004 R14: 0000000000000000 R15: 0000000000000000 [ 0.900629] FS: 0000000000000000(0000) GS:ffff8ed7ffc00000(0000) knlGS:0000000000000000 [ 0.901398] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 0.901908] CR2: 0000000000000000 CR3: 0000000118009000 CR4: 00000000000006f0 [ 0.902590] Call Trace: [ 0.902827] ptdump_walk_pgd_level_core+0x3e7/0x490 [ 0.903274] ? 0xffffffff9a800000 [ 0.903595] ptdump_walk_pgd_level_checkwx+0x17/0x20 [ 0.904064] mark_rodata_ro+0xf4/0x100 [ 0.904423] ? rest_init+0x80/0x80 [ 0.904744] kernel_init+0x2f/0x100 [ 0.905068] ret_from_fork+0x2c/0x40 [ 0.905393] Code: 48 c7 43 28 00 00 00 00 48 89 43 20 e9 05 fd ff ff 48 8b 73 10 48 c7 c7 28 36 1e 9b c6 05 c8 eb bc 00 01 48 89 f2 e8 cd fc 11 00 <0f> ff e9 1f fa ff ff 48 8b 70 20 48 c7 c7 65 b2 1e 9b e8 b6 fc [ 0.907173] ---[ end trace 878b39cb0c248e66 ]--- [ 0.907655] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found. 

And ffffffffc00e4000 is:

---[ Modules ]---
0xffffffffc0000000-0xffffffffc00e4000   912K               pte
0xffffffffc00e4000-0xffffffffc00e5000     4K  RW  GLB  x   pte

In case someone needs the full /sys/kernel/debug/kernel_page_tables file:

http://drvbp1.linux-foundation.org/~mcgrof/2017/05/15/kernel_page_tables/piggy-4.12.0-rc1-next-20170515-page-tables.txt

Luis
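
The lookup above was done by hand. As a rough illustration only, the following sketch does the same thing over lines in the layout quoted in this thread: it finds the range containing an address and lists ranges that are both writable and executable. It assumes each entry is "start-end size flags..." with flag tokens such as RW, ro, x, NX and GLB as they appear above; the helper names (parse_line, find_wx, lookup) are hypothetical, and the real debugfs file may contain columns this does not handle.

```python
import re

# Matches entries such as:
#   0xffffffffc00e4000-0xffffffffc00e5000 4K RW GLB x pte
# Section headers like "---[ Modules ]---" do not match and are skipped.
RANGE_RE = re.compile(r"^0x([0-9a-fA-F]+)-0x([0-9a-fA-F]+)\s+(\S+)\s*(.*)$")


def parse_line(line):
    """Return (start, end, flags) for a range line, or None otherwise."""
    m = RANGE_RE.match(line.strip())
    if m is None:
        return None
    start = int(m.group(1), 16)
    end = int(m.group(2), 16)
    flags = m.group(4).split()  # e.g. ["RW", "GLB", "x", "pte"]
    return start, end, flags


def find_wx(lines):
    """List (start, end) of ranges flagged both writable (RW) and executable (x)."""
    hits = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        start, end, flags = parsed
        # Exact token match, so "NX" is never mistaken for "x".
        if "RW" in flags and "x" in flags:
            hits.append((start, end))
    return hits


def lookup(lines, addr):
    """Return the parsed range containing addr (end is exclusive), or None."""
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None and parsed[0] <= addr < parsed[1]:
            return parsed
    return None


# Sample copied from the dump quoted in this thread.
SAMPLE = """\
---[ Modules ]---
0xffffffffc0000000-0xffffffffc00e4000 912K pte
0xffffffffc00e4000-0xffffffffc00e5000 4K RW GLB x pte
""".splitlines()

if __name__ == "__main__":
    for start, end in find_wx(SAMPLE):
        print("W+X range: %#x-%#x" % (start, end))
    print(lookup(SAMPLE, 0xffffffffc00e4000))
```

To run this against a live system one would feed it the lines of /sys/kernel/debug/kernel_page_tables instead of SAMPLE; that path is the one named in this thread and needs CONFIG_X86_PTDUMP=y and root.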
* Re: next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0 2017-05-15 23:45 ` Luis R. Rodriguez @ 2017-05-16 0:12 ` Kees Cook 2017-05-17 16:40 ` Luis R. Rodriguez 0 siblings, 1 reply; 21+ messages in thread From: Kees Cook @ 2017-05-16 0:12 UTC (permalink / raw) To: Luis R. Rodriguez Cc: Stephen Smalley, Ingo Molnar, Andy Lutomirski, Michal Hocko, Andrew Morton, Eric W. Biederman, Mateusz Guzik, LKML On Mon, May 15, 2017 at 4:45 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote: > On Mon, May 15, 2017 at 3:57 PM, Kees Cook <keescook@chromium.org> wrote: >> On Mon, May 15, 2017 at 3:15 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote: >>> On Tue, May 16, 2017 at 12:06:50AM +0200, Luis R. Rodriguez wrote: >>>> Using QEMU emulator version 2.7.94 (v2.8.0-rc4-dirty) >>>> >>>> I will try updating my distro package for qemu and see if perhaps its this >>>> and for the other odd fork issue I reported [0]. >>>> >>>> [0] https://lkml.kernel.org/r/CAB=NE6VZXq3y-3pfouYTBUco2Cq2xqoLZrgDFdVx+_=_=SwG_Q@mail.gmail.com >>> >>> Yeah nope, using my distribution latest: >>> >>> QEMU emulator version 2.8.0(openSUSE Tumbleweed) >>> >>> And still both issues are present. >>> >>> Luis >> >> Can you enable CONFIG_X86_PTDUMP=y and then find out what is located >> at ffffffffc0288000 via /sys/kernel/debug/kernel_page_tables ? > > Sure thing. 
> > Recompiled with this enabled, new warning: > > [ 0.891559] x86/mm: Found insecure W+X mapping at address > ffffffffc00e4000/0xffffffffc00e4000 > [ 0.892394] ------------[ cut here ]------------ > [ 0.892834] WARNING: CPU: 0 PID: 1 at > arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0 > [ 0.893674] Modules linked in: > [ 0.893972] CPU: 0 PID: 1 Comm: swapper/0 Not tainted > 4.12.0-rc1-next-20170515+ #145 > [ 0.894687] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), > BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014 > [ 0.895828] task: ffff8ed7fa5ccc80 task.stack: ffffae3900630000 > [ 0.896403] RIP: 0010:note_page+0x630/0x7e0 > [ 0.896780] RSP: 0018:ffffae3900633df0 EFLAGS: 00010286 > [ 0.897271] RAX: 0000000000000051 RBX: ffffae3900633e88 RCX: ffffffff9b456708 > [ 0.897940] RDX: 0000000000000000 RSI: 0000000000000096 RDI: 0000000000000246 > [ 0.898624] RBP: ffffae3900633e28 R08: 203a6d6d2f363878 R09: 0000000000000165 > [ 0.899314] R10: ffffae3900633dd8 R11: 736e6920646e756f R12: 0000000000000000 > [ 0.899987] R13: 0000000000000004 R14: 0000000000000000 R15: 0000000000000000 > [ 0.900629] FS: 0000000000000000(0000) GS:ffff8ed7ffc00000(0000) > knlGS:0000000000000000 > [ 0.901398] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 0.901908] CR2: 0000000000000000 CR3: 0000000118009000 CR4: 00000000000006f0 > [ 0.902590] Call Trace: > [ 0.902827] ptdump_walk_pgd_level_core+0x3e7/0x490 > [ 0.903274] ? 0xffffffff9a800000 > [ 0.903595] ptdump_walk_pgd_level_checkwx+0x17/0x20 > [ 0.904064] mark_rodata_ro+0xf4/0x100 > [ 0.904423] ? 
rest_init+0x80/0x80
> [ 0.904744] kernel_init+0x2f/0x100
> [ 0.905068] ret_from_fork+0x2c/0x40
> [ 0.905393] Code: 48 c7 43 28 00 00 00 00 48 89 43 20 e9 05 fd ff ff 48 8b 73 10 48 c7 c7 28 36 1e 9b c6 05 c8 eb bc 00 01 48 89 f2 e8 cd fc 11 00 <0f> ff e9 1f fa ff ff 48 8b 70 20 48 c7 c7 65 b2 1e 9b e8 b6 fc
> [ 0.907173] ---[ end trace 878b39cb0c248e66 ]---
> [ 0.907655] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found.
>
> And ffffffffc00e4000 is:
>
> ---[ Modules ]---
> 0xffffffffc0000000-0xffffffffc00e4000 912K pte
> 0xffffffffc00e4000-0xffffffffc00e5000 4K RW GLB x pte
>
> In case someone needs the full /sys/kernel/debug/kernel_page_tables file:
>
> http://drvbp1.linux-foundation.org/~mcgrof/2017/05/15/kernel_page_tables/piggy-4.12.0-rc1-next-20170515-page-tables.txt

---[ Modules ]---
0xffffffffc0000000-0xffffffffc00e4000   912K               pte

This should be the modules ASLR gap

0xffffffffc00e4000-0xffffffffc00e5000     4K  RW  GLB  x   pte

This is part of the same gap, but it's RW+x strangely?

0xffffffffc00e5000-0xffffffffc00e6000     4K               pte

This is more of the gap?

0xffffffffc00e6000-0xffffffffc00fa000    80K  ro  GLB  x   pte
0xffffffffc00fa000-0xffffffffc010c000    72K  ro  GLB  NX  pte
0xffffffffc010c000-0xffffffffc011b000    60K  RW  GLB  NX  pte

This should be the first loaded module. Can you check that
0xffffffffc00e6000 matches the first module in /proc/modules?

Something touched the module gap and left it RW+x...

Are you able to bisect this?

-Kees

--
Kees Cook
Pixel Security
* Re: next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0 2017-05-16 0:12 ` Kees Cook @ 2017-05-17 16:40 ` Luis R. Rodriguez 2017-05-17 17:53 ` Kees Cook 0 siblings, 1 reply; 21+ messages in thread From: Luis R. Rodriguez @ 2017-05-17 16:40 UTC (permalink / raw) To: Kees Cook Cc: Luis R. Rodriguez, Stephen Smalley, Ingo Molnar, Andy Lutomirski, Michal Hocko, Andrew Morton, Eric W. Biederman, Mateusz Guzik, LKML On Mon, May 15, 2017 at 05:12:18PM -0700, Kees Cook wrote: > On Mon, May 15, 2017 at 4:45 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote: > > On Mon, May 15, 2017 at 3:57 PM, Kees Cook <keescook@chromium.org> wrote: > >> On Mon, May 15, 2017 at 3:15 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote: > >>> On Tue, May 16, 2017 at 12:06:50AM +0200, Luis R. Rodriguez wrote: > >>>> Using QEMU emulator version 2.7.94 (v2.8.0-rc4-dirty) > >>>> > >>>> I will try updating my distro package for qemu and see if perhaps its this > >>>> and for the other odd fork issue I reported [0]. > >>>> > >>>> [0] https://lkml.kernel.org/r/CAB=NE6VZXq3y-3pfouYTBUco2Cq2xqoLZrgDFdVx+_=_=SwG_Q@mail.gmail.com > >>> > >>> Yeah nope, using my distribution latest: > >>> > >>> QEMU emulator version 2.8.0(openSUSE Tumbleweed) > >>> > >>> And still both issues are present. > >>> > >>> Luis > >> > >> Can you enable CONFIG_X86_PTDUMP=y and then find out what is located > >> at ffffffffc0288000 via /sys/kernel/debug/kernel_page_tables ? > > > > Sure thing. 
> > > > Recompiled with this enabled, new warning: > > > > [ 0.891559] x86/mm: Found insecure W+X mapping at address > > ffffffffc00e4000/0xffffffffc00e4000 > > [ 0.892394] ------------[ cut here ]------------ > > [ 0.892834] WARNING: CPU: 0 PID: 1 at > > arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0 > > [ 0.893674] Modules linked in: > > [ 0.893972] CPU: 0 PID: 1 Comm: swapper/0 Not tainted > > 4.12.0-rc1-next-20170515+ #145 > > [ 0.894687] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), > > BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014 > > [ 0.895828] task: ffff8ed7fa5ccc80 task.stack: ffffae3900630000 > > [ 0.896403] RIP: 0010:note_page+0x630/0x7e0 > > [ 0.896780] RSP: 0018:ffffae3900633df0 EFLAGS: 00010286 > > [ 0.897271] RAX: 0000000000000051 RBX: ffffae3900633e88 RCX: ffffffff9b456708 > > [ 0.897940] RDX: 0000000000000000 RSI: 0000000000000096 RDI: 0000000000000246 > > [ 0.898624] RBP: ffffae3900633e28 R08: 203a6d6d2f363878 R09: 0000000000000165 > > [ 0.899314] R10: ffffae3900633dd8 R11: 736e6920646e756f R12: 0000000000000000 > > [ 0.899987] R13: 0000000000000004 R14: 0000000000000000 R15: 0000000000000000 > > [ 0.900629] FS: 0000000000000000(0000) GS:ffff8ed7ffc00000(0000) > > knlGS:0000000000000000 > > [ 0.901398] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ 0.901908] CR2: 0000000000000000 CR3: 0000000118009000 CR4: 00000000000006f0 > > [ 0.902590] Call Trace: > > [ 0.902827] ptdump_walk_pgd_level_core+0x3e7/0x490 > > [ 0.903274] ? 0xffffffff9a800000 > > [ 0.903595] ptdump_walk_pgd_level_checkwx+0x17/0x20 > > [ 0.904064] mark_rodata_ro+0xf4/0x100 > > [ 0.904423] ? 
rest_init+0x80/0x80 > > [ 0.904744] kernel_init+0x2f/0x100 > > [ 0.905068] ret_from_fork+0x2c/0x40 > > [ 0.905393] Code: 48 c7 43 28 00 00 00 00 48 89 43 20 e9 05 fd ff > > ff 48 8b 73 10 48 c7 c7 28 36 1e 9b c6 05 c8 eb bc 00 01 48 89 f2 e8 > > cd fc 11 00 <0f> ff e9 1f fa ff ff 48 8b 70 20 48 c7 c7 65 b2 1e 9b e8 > > b6 fc > > [ 0.907173] ---[ end trace 878b39cb0c248e66 ]--- > > [ 0.907655] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found. > > > > And ffffffffc00e4000 is: > > > > ---[ Modules ]--- > > 0xffffffffc0000000-0xffffffffc00e4000 912K > > pte > > 0xffffffffc00e4000-0xffffffffc00e5000 4K RW > > GLB x pte > > > > In case someone needs the full /sys/kernel/debug/kernel_page_tables file: > > > > http://drvbp1.linux-foundation.org/~mcgrof/2017/05/15/kernel_page_tables/piggy-4.12.0-rc1-next-20170515-page-tables.txt > > ---[ Modules ]--- > 0xffffffffc0000000-0xffffffffc00e4000 912K > pte > > This should be the modules ASLR gap > > 0xffffffffc00e4000-0xffffffffc00e5000 4K RW > GLB x pte > > This is part of the same gap, but it's RW+x strangely? > > 0xffffffffc00e5000-0xffffffffc00e6000 4K > pte > > This is more of the gap? > > 0xffffffffc00e6000-0xffffffffc00fa000 80K ro > GLB x pte > 0xffffffffc00fa000-0xffffffffc010c000 72K ro > GLB NX pte > 0xffffffffc010c000-0xffffffffc011b000 60K RW > GLB NX pte > > This should be the first loaded module. Can you check that > 0xffffffffc00e6000 matches the first module in /proc/modules? 

Yes, but I had already killed that boot session, so on my next boot I had a
different layout; the ASLR gap was much larger:

---[ Modules ]---
0xffffffffc0000000-0xffffffffc01b0000  1728K               pte
0xffffffffc01b0000-0xffffffffc01b1000     4K  RW  GLB  x   pte
0xffffffffc01b1000-0xffffffffc01b2000     4K               pte
0xffffffffc01b2000-0xffffffffc01c6000    80K  ro  GLB  x   pte
0xffffffffc01c6000-0xffffffffc01cc000    24K  ro  GLB  NX  pte
0xffffffffc01cc000-0xffffffffc01d5000    36K  RW  GLB  NX  pte

As you can guess, if we follow a similar pattern the RW hole is the one this
boot warned about:

[ 1.450483] x86/mm: Found insecure W+X mapping at address ffffffffc01b0000/0xffffffffc01b0000
[ 1.451280] ------------[ cut here ]------------
[ 1.451721] WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
[ 1.452499] Modules linked in:
[ 1.452791] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc1-next-20170515+ #145

I checked and indeed 0xffffffffc01b2000 is part of a module. It was not the
first one on the /proc/modules list, but then again /proc/modules does not seem
to have a specific order other than perhaps being pegged into a linked list of
modules once they go live, and it seems it's typically output backwards from
when that happened. Sorting that by address we get:

root@piggy:~# cat /proc/modules | sort -k 6 | head -3
e1000 143360 0 - Live 0xffffffffc01b2000 (E)
mbcache 16384 1 ext4, Live 0xffffffffc01d6000 (E)
scsi_mod 217088 4 sg,sr_mod,sd_mod,libata, Live 0xffffffffc01df000 (E)

And this then seems to be the first module loaded:

e1000 143360 0 - Live 0xffffffffc01b2000 (E)

The output of dmesg seems to confirm this; it matches the list of modules
sorted as above.

> Something touched the module gap and left is RW+x...

Lemme try booting with e1000 renamed to e1000.ko.ignore and see how that goes.

> Are you able to bisect this?

This issue has been present for a while, so since I recall this I might be
able to reduce the number of needed target kernels to bisect. Lemme tinker
a bit and if no clear culprit comes up then I will try to bisect.

Luis
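
The `sort -k 6` step above can also be written as a small script. This is a minimal sketch, assuming the /proc/modules field layout seen in the output quoted in this thread (name, size, refcount, dependencies, state, load address, optional taint flags) and treating the size field as the length of one contiguous range starting at the load address, which is an approximation. The helper names parse_modules and module_at are hypothetical. Unlike the lexical `sort -k 6`, it compares addresses numerically.

```python
def parse_modules(text):
    """Parse /proc/modules-style text into (address, name, size), sorted by address."""
    mods = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        name = fields[0]
        size = int(fields[1])
        addr = int(fields[5], 16)  # sixth field is the load address
        mods.append((addr, name, size))
    return sorted(mods)


def module_at(mods, addr):
    """Return the name of the module whose [start, start + size) holds addr."""
    for start, name, size in mods:
        if start <= addr < start + size:
            return name
    return None


# Sample lines copied from this thread, deliberately out of address order.
SAMPLE = """\
scsi_mod 217088 4 sg,sr_mod,sd_mod,libata, Live 0xffffffffc01df000 (E)
e1000 143360 0 - Live 0xffffffffc01b2000 (E)
mbcache 16384 1 ext4, Live 0xffffffffc01d6000 (E)
"""

if __name__ == "__main__":
    mods = parse_modules(SAMPLE)
    for addr, name, size in mods:
        print("%#x %-10s %d" % (addr, name, size))
    # The page the kernel warned about in this boot:
    print(module_at(mods, 0xffffffffc01b0000))
```

With the sample above, the warned address 0xffffffffc01b0000 falls below the lowest module, which matches the observation that the W+X page sits in the gap just before e1000 rather than inside it. On a live system the addresses are only meaningful when read as root.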
* Re: next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0 2017-05-17 16:40 ` Luis R. Rodriguez @ 2017-05-17 17:53 ` Kees Cook 2017-05-19 0:44 ` Luis R. Rodriguez 0 siblings, 1 reply; 21+ messages in thread From: Kees Cook @ 2017-05-17 17:53 UTC (permalink / raw) To: Luis R. Rodriguez Cc: Stephen Smalley, Ingo Molnar, Andy Lutomirski, Michal Hocko, Andrew Morton, Eric W. Biederman, Mateusz Guzik, LKML On Wed, May 17, 2017 at 9:40 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote: > Yes, but I had killed that boot session again, so upon my next boot > I had a different layout, the ASLR gap was much larger: > > ---[ Modules ]--- > 0xffffffffc0000000-0xffffffffc01b0000 1728K pte > 0xffffffffc01b0000-0xffffffffc01b1000 4K RW GLB x pte > 0xffffffffc01b1000-0xffffffffc01b2000 4K pte > 0xffffffffc01b2000-0xffffffffc01c6000 80K ro GLB x pte > 0xffffffffc01c6000-0xffffffffc01cc000 24K ro GLB NX pte > 0xffffffffc01cc000-0xffffffffc01d5000 36K RW GLB NX pte > > As you can guess if we follow similar pattern the RW hole is the one this boot > warned about: > > [ 1.450483] x86/mm: Found insecure W+X mapping at address ffffffffc01b0000/0xffffffffc01b0000 > [ 1.451280] ------------[ cut here ]------------ > [ 1.451721] WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0 > [ 1.452499] Modules linked in: > [ 1.452791] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc1-next-20170515+ #145 > > I checked and indeed 0xffffffffc01b2000 is part of a module, it was not the first one > on the /proc/modules list but then again /proc/modules does not seem to have a specific > order other than perhaps being pegged into a linked list of modules once they go live, > and it seems its typically output backwards from when that happened, sorting that > by address we get: Right, sorry, I'd expect it at the bottom of the list in /proc/modules, but that's fine, it's there. 
> > root@piggy:~# cat /proc/modules | sort -k 6 | head -3 > e1000 143360 0 - Live 0xffffffffc01b2000 (E) > mbcache 16384 1 ext4, Live 0xffffffffc01d6000 (E) > scsi_mod 217088 4 sg,sr_mod,sd_mod,libata, Live 0xffffffffc01df000 (E) > > And this then seems to be the first module loaded: > > e1000 143360 0 - Live 0xffffffffc01b2000 (E) > > The output of dmesg seems to confirm this as per the list of modules sorted > as per above. > >> Something touched the module gap and left is RW+x... > > Lemme try booting with e1000 renamed to e1000.ko.ignore and see how that goes. Is it possible a module got loaded before e1000 and then unloaded? That seems odd, but maybe unload isn't cleaning up? >> Are you able to bisect this? > > This issue has been present for a while so since I recall this I might be > able to reduce the number of needed target kernels to bisect. Lemme tinker > a bit and if no clear culprit comes up then will try bisect. Okay, thanks! -Kees -- Kees Cook Pixel Security ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0 2017-05-17 17:53 ` Kees Cook @ 2017-05-19 0:44 ` Luis R. Rodriguez 2017-05-19 3:08 ` Luis R. Rodriguez 0 siblings, 1 reply; 21+ messages in thread From: Luis R. Rodriguez @ 2017-05-19 0:44 UTC (permalink / raw) To: Kees Cook Cc: Luis R. Rodriguez, Stephen Smalley, Ingo Molnar, Andy Lutomirski, Michal Hocko, Vlastimil Babka, Andrew Morton, Eric W. Biederman, Mateusz Guzik, LKML On Wed, May 17, 2017 at 10:53:06AM -0700, Kees Cook wrote: > On Wed, May 17, 2017 at 9:40 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote: > > Yes, but I had killed that boot session again, so upon my next boot > > I had a different layout, the ASLR gap was much larger: > > > > ---[ Modules ]--- > > 0xffffffffc0000000-0xffffffffc01b0000 1728K pte > > 0xffffffffc01b0000-0xffffffffc01b1000 4K RW GLB x pte > > 0xffffffffc01b1000-0xffffffffc01b2000 4K pte > > 0xffffffffc01b2000-0xffffffffc01c6000 80K ro GLB x pte > > 0xffffffffc01c6000-0xffffffffc01cc000 24K ro GLB NX pte > > 0xffffffffc01cc000-0xffffffffc01d5000 36K RW GLB NX pte > > > > As you can guess if we follow similar pattern the RW hole is the one this boot > > warned about: > > > > [ 1.450483] x86/mm: Found insecure W+X mapping at address ffffffffc01b0000/0xffffffffc01b0000 > > [ 1.451280] ------------[ cut here ]------------ > > [ 1.451721] WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0 > > [ 1.452499] Modules linked in: > > [ 1.452791] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc1-next-20170515+ #145 > > > > I checked and indeed 0xffffffffc01b2000 is part of a module, it was not the first one > > on the /proc/modules list but then again /proc/modules does not seem to have a specific > > order other than perhaps being pegged into a linked list of modules once they go live, > > and it seems its typically output backwards from when that happened, sorting that > > by address we get: > > Right, 
sorry, I'd expect it at the bottom of the list in
> /proc/modules, but that's fine, it's there.
> >
> > root@piggy:~# cat /proc/modules | sort -k 6 | head -3
> > e1000 143360 0 - Live 0xffffffffc01b2000 (E)
> > mbcache 16384 1 ext4, Live 0xffffffffc01d6000 (E)
> > scsi_mod 217088 4 sg,sr_mod,sd_mod,libata, Live 0xffffffffc01df000 (E)
> >
> > And this then seems to be the first module loaded:
> >
> > e1000 143360 0 - Live 0xffffffffc01b2000 (E)
> >
> > The output of dmesg seems to confirm this as per the list of modules sorted
> > as per above.
> >
> >> Something touched the module gap and left is RW+x...
> >
> > Lemme try booting with e1000 renamed to e1000.ko.ignore and see how that goes.
>
> Is it possible a module got loaded before e1000 and then unloaded?
> That seems odd, but maybe unload isn't cleaning up?
>
> >> Are you able to bisect this?
> >
> > This issue has been present for a while so since I recall this I might be
> > able to reduce the number of needed target kernels to bisect. Lemme tinker
> > a bit and if no clear culprit comes up then will try bisect.
>
> Okay, thanks!

Sorry to report that this issue has been present since the feature was added,
and it is still present today. *But* it may also be a configuration issue,
given that I have booted this guest *without* this issue ...

So:

git checkout -b WX e1a58320a38dfa72be48a0f1a3a92273663ba6db

That boots with the warning. To help debug further I've minimized my modules
to only a few: scsi_mod, e1000, libata.

I suspect at this point this is not the fault of a particular module but
instead just an accounting semantic (>= or <= on an edge), but let's see.

I now boot 4.3.0-rc3 at commit e1a58320a38df ("x86/mm: Warn on W^X
mappings") and I get:

[ 0.949435] ------------[ cut here ]------------
[ 0.949992] WARNING: CPU: 2 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page+0x635/0x7e0()
[ 0.950996] x86/mm: Found insecure W+X mapping at address ffffffffc0000000/0xffffffffc0000000
[ 0.951814] Modules linked in:
[ 0.952123] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.3.0-rc3-FINAL-TEST-WITH-WX-NOFLOPPY+ #365
[ 0.952929] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
[ 0.954033] 0000000000000000 000000001f722925 ffff88013a5d7d40 ffffffff812ff335
[ 0.954742] ffff88013a5d7d88 ffff88013a5d7d78 ffffffff81079be2 ffff88013a5d7e90
[ 0.955522] 0000000000000000 0000000000000004 0000000000000000 0000000000000000
[ 0.956256] Call Trace:
[ 0.956496] [<ffffffff812ff335>] dump_stack+0x44/0x5f
[ 0.956953] [<ffffffff81079be2>] warn_slowpath_common+0x82/0xc0
[ 0.957519] [<ffffffff81079c7c>] warn_slowpath_fmt+0x5c/0x80
[ 0.958066] [<ffffffff8106c155>] note_page+0x635/0x7e0
[ 0.958595] [<ffffffff8106c5eb>] ptdump_walk_pgd_level_core+0x2eb/0x410
[ 0.959219] [<ffffffff8106c7b7>] ptdump_walk_pgd_level_checkwx+0x17/0x20
[ 0.959856] [<ffffffff8106260d>] mark_rodata_ro+0xed/0x100
[ 0.960372] [<ffffffff815aa7d0>] ? rest_init+0x80/0x80
[ 0.960869] [<ffffffff815aa7ed>] kernel_init+0x1d/0xe0
[ 0.961358] [<ffffffff815b798f>] ret_from_fork+0x3f/0x70
[ 0.961900] [<ffffffff815aa7d0>] ? rest_init+0x80/0x80
[ 0.962389] ---[ end trace 6125ebcb24c9e3d0 ]---
[ 0.962822] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found.

---[ High Kernel Mapping ]---
0xffffffff80000000-0xffffffff81000000    16M                    pmd
0xffffffff81000000-0xffffffff81600000     6M  ro  PSE  GLB  x   pmd
0xffffffff81600000-0xffffffff81a00000     4M  ro  PSE  GLB  NX  pmd
0xffffffff81a00000-0xffffffff81c00000     2M  RW       GLB  NX  pte
0xffffffff81c00000-0xffffffff82200000     6M  RW  PSE  GLB  NX  pmd
0xffffffff82200000-0xffffffff82400000     2M  RW       GLB  NX  pte
0xffffffff82400000-0xffffffffc0000000   988M                    pmd
---[ Modules ]---
0xffffffffc0000000-0xffffffffc0001000     4K  RW       GLB  x   pte
0xffffffffc0001000-0xffffffffc0002000     4K                    pte
0xffffffffc0002000-0xffffffffc0039000   220K  RW       GLB  x   pte

root@piggy:~# cat /proc/modules | sort -k 6 | head -3
scsi_mod 221979 4 sg,sd_mod,sr_mod,libata, Live 0xffffffffc0002000 (E)
e1000 127757 0 - Live 0xffffffffc004d000 (E)
libata 229931 2 ata_generic,ata_piix, Live 0xffffffffc0076000 (E)

So that 4K RW page looks suspect: it seems to get used for an allocation right
on the edge for some particular reason, and it also happens to sit on the edge
of the high kernel mapping. Could this be a boundary semantic issue? For
instance, could it be that since the allocator gives 0xffffffffc0002000 to the
first module, scsi_mod, and since that address is *technically* part of two
boundaries, we get a splat?

0xffffffffc0001000-0xffffffffc0002000     4K                    pte
0xffffffffc0002000-0xffffffffc0039000   220K  RW       GLB  x   pte

Luis
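
To make the "two boundaries" question concrete, here is a purely illustrative sketch; it is not the kernel's note_page() logic and does not confirm or refute the hypothesis. It takes the three Modules ranges from the dump above and shows how an address that is both the end of one range and the start of the next has exactly one owner under half-open comparison (start <= addr < end) but two owners if the end were compared inclusively (start <= addr <= end). The names RANGES, owners_half_open and owners_closed are hypothetical.

```python
# Ranges copied from the ---[ Modules ]--- dump in this message:
# (start, end, flags as printed)
RANGES = [
    (0xffffffffc0000000, 0xffffffffc0001000, "RW GLB x"),
    (0xffffffffc0001000, 0xffffffffc0002000, ""),
    (0xffffffffc0002000, 0xffffffffc0039000, "RW GLB x"),
]


def owners_half_open(addr):
    """Ranges containing addr when the end address is exclusive."""
    return [r for r in RANGES if r[0] <= addr < r[1]]


def owners_closed(addr):
    """Ranges containing addr when the end address is (wrongly) inclusive."""
    return [r for r in RANGES if r[0] <= addr <= r[1]]


if __name__ == "__main__":
    edge = 0xffffffffc0002000  # load address of scsi_mod in this boot
    print(len(owners_half_open(edge)), "owner(s) with an exclusive end")
    print(len(owners_closed(edge)), "owner(s) with an inclusive end")
```

Note that the address the kernel actually warned about in this boot is 0xffffffffc0000000, the first page of the Modules area, so any off-by-one of this kind would have to be shown in the real page table walker before drawing conclusions.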
* Re: next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0 2017-05-19 0:44 ` Luis R. Rodriguez @ 2017-05-19 3:08 ` Luis R. Rodriguez 2017-05-19 15:40 ` Luis R. Rodriguez 0 siblings, 1 reply; 21+ messages in thread From: Luis R. Rodriguez @ 2017-05-19 3:08 UTC (permalink / raw) To: Luis R. Rodriguez Cc: Kees Cook, Stephen Smalley, Ingo Molnar, Andy Lutomirski, Michal Hocko, Vlastimil Babka, Andrew Morton, Eric W. Biederman, Mateusz Guzik, LKML On Fri, May 19, 2017 at 02:44:14AM +0200, Luis R. Rodriguez wrote: > On Wed, May 17, 2017 at 10:53:06AM -0700, Kees Cook wrote: > > On Wed, May 17, 2017 at 9:40 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote: > > > Yes, but I had killed that boot session again, so upon my next boot > > > I had a different layout, the ASLR gap was much larger: > > > > > > ---[ Modules ]--- > > > 0xffffffffc0000000-0xffffffffc01b0000 1728K pte > > > 0xffffffffc01b0000-0xffffffffc01b1000 4K RW GLB x pte > > > 0xffffffffc01b1000-0xffffffffc01b2000 4K pte > > > 0xffffffffc01b2000-0xffffffffc01c6000 80K ro GLB x pte > > > 0xffffffffc01c6000-0xffffffffc01cc000 24K ro GLB NX pte > > > 0xffffffffc01cc000-0xffffffffc01d5000 36K RW GLB NX pte > > > > > > As you can guess if we follow similar pattern the RW hole is the one this boot > > > warned about: > > > > > > [ 1.450483] x86/mm: Found insecure W+X mapping at address ffffffffc01b0000/0xffffffffc01b0000 > > > [ 1.451280] ------------[ cut here ]------------ > > > [ 1.451721] WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0 > > > [ 1.452499] Modules linked in: > > > [ 1.452791] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc1-next-20170515+ #145 > > > > > > I checked and indeed 0xffffffffc01b2000 is part of a module, it was not the first one > > > on the /proc/modules list but then again /proc/modules does not seem to have a specific > > > order other than perhaps being pegged into a linked list of modules once they go 
live, > > > and it seems its typically output backwards from when that happened, sorting that > > > by address we get: > > > > Right, sorry, I'd expect it at the bottom of the list in > > /proc/modules, but that's fine, it's there. > > > > > > > > root@piggy:~# cat /proc/modules | sort -k 6 | head -3 > > > e1000 143360 0 - Live 0xffffffffc01b2000 (E) > > > mbcache 16384 1 ext4, Live 0xffffffffc01d6000 (E) > > > scsi_mod 217088 4 sg,sr_mod,sd_mod,libata, Live 0xffffffffc01df000 (E) > > > > > > And this then seems to be the first module loaded: > > > > > > e1000 143360 0 - Live 0xffffffffc01b2000 (E) > > > > > > The output of dmesg seems to confirm this as per the list of modules sorted > > > as per above. > > > > > >> Something touched the module gap and left is RW+x... > > > > > > Lemme try booting with e1000 renamed to e1000.ko.ignore and see how that goes. > > > > Is it possible a module got loaded before e1000 and then unloaded? > > That seems odd, but maybe unload isn't cleaning up? > > > > >> Are you able to bisect this? > > > > > > This issue has been present for a while so since I recall this I might be > > > able to reduce the number of needed target kernels to bisect. Lemme tinker > > > a bit and if no clear culprit comes up then will try bisect. > > > > Okay, thanks! > > Sorry to report that this issue is present since the feature's addition. So > the issue is there since its addition and is still present today. *But* it > may also be a configuration issue, given I have booted this guest *without* > this issue ... > > So: > > git checkout -b WX e1a58320a38dfa72be48a0f1a3a92273663ba6db > > That boots with the warning. To help debug further I've minimized my modules > to only a few: scsi_mod, e1000, libata. > > I suspect at this point this is not the fault of a particular module but > instead just an accounting semantic (>= or <= on an edge) but let's see. 
> > I now boot on 4.3.0-rc3 on commit (e1a58320a38df ("x86/mm: Warn on W^X > mappings") and I with: > > [ 0.949435] ------------[ cut here ]------------ > [ 0.949992] WARNING: CPU: 2 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page+0x635/0x7e0() > [ 0.950996] x86/mm: Found insecure W+X mapping at address ffffffffc0000000/0xffffffffc0000000 > [ 0.951814] Modules linked in: > [ 0.952123] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.3.0-rc3-FINAL-TEST-WITH-WX-NOFLOPPY+ #365 > [ 0.952929] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014 > [ 0.954033] 0000000000000000 000000001f722925 ffff88013a5d7d40 ffffffff812ff335 > [ 0.954742] ffff88013a5d7d88 ffff88013a5d7d78 ffffffff81079be2 ffff88013a5d7e90 > [ 0.955522] 0000000000000000 0000000000000004 0000000000000000 0000000000000000 > [ 0.956256] Call Trace: > [ 0.956496] [<ffffffff812ff335>] dump_stack+0x44/0x5f > [ 0.956953] [<ffffffff81079be2>] warn_slowpath_common+0x82/0xc0 > [ 0.957519] [<ffffffff81079c7c>] warn_slowpath_fmt+0x5c/0x80 > [ 0.958066] [<ffffffff8106c155>] note_page+0x635/0x7e0 > [ 0.958595] [<ffffffff8106c5eb>] ptdump_walk_pgd_level_core+0x2eb/0x410 > [ 0.959219] [<ffffffff8106c7b7>] ptdump_walk_pgd_level_checkwx+0x17/0x20 > [ 0.959856] [<ffffffff8106260d>] mark_rodata_ro+0xed/0x100 > [ 0.960372] [<ffffffff815aa7d0>] ? rest_init+0x80/0x80 > [ 0.960869] [<ffffffff815aa7ed>] kernel_init+0x1d/0xe0 > [ 0.961358] [<ffffffff815b798f>] ret_from_fork+0x3f/0x70 > [ 0.961900] [<ffffffff815aa7d0>] ? rest_init+0x80/0x80 > [ 0.962389] ---[ end trace 6125ebcb24c9e3d0 ]--- > [ 0.962822] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found. 
>
>
> ---[ High Kernel Mapping ]---
> 0xffffffff80000000-0xffffffff81000000 16M pmd
> 0xffffffff81000000-0xffffffff81600000 6M ro PSE GLB x pmd
> 0xffffffff81600000-0xffffffff81a00000 4M ro PSE GLB NX pmd
> 0xffffffff81a00000-0xffffffff81c00000 2M RW GLB NX pte
> 0xffffffff81c00000-0xffffffff82200000 6M RW PSE GLB NX pmd
> 0xffffffff82200000-0xffffffff82400000 2M RW GLB NX pte
> 0xffffffff82400000-0xffffffffc0000000 988M pmd
> ---[ Modules ]---
> 0xffffffffc0000000-0xffffffffc0001000 4K RW GLB x pte
> 0xffffffffc0001000-0xffffffffc0002000 4K pte
> 0xffffffffc0002000-0xffffffffc0039000 220K RW GLB x pte
>
> root@piggy:~# cat /proc/modules | sort -k 6 | head -3
> scsi_mod 221979 4 sg,sd_mod,sr_mod,libata, Live 0xffffffffc0002000 (E)
> e1000 127757 0 - Live 0xffffffffc004d000 (E)
> libata 229931 2 ata_generic,ata_piix, Live 0xffffffffc0076000 (E)
>
> So that 4K RW seems suspect of getting used for allocation purpose on edge
> for a particular reason and it also happens to be on the edge of the high
> kernel mapping. Could it be the boundary semantic issue ?
>
> For instance can it be that since 0xffffffffc0002000 is given to the first
> module by the allocator, scsi_mod, and since that address is *technically*
> part of two boundaries we get a splat ?
>
> 0xffffffffc0001000-0xffffffffc0002000 4K pte
> 0xffffffffc0002000-0xffffffffc0039000 220K RW GLB x pte

Note that on both the latest linux-next and the commit that introduced this
check, this config and kernel yield only *one* page:

x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found.

I believe this is a further indication that my suspicion might be right.

  Luis

^ permalink raw reply [flat|nested] 21+ messages in thread
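The "does this address belong to a module" question can also be checked against the /proc/modules listing quoted above. A rough sketch follows, in Python for illustration only. The helper names `parse_modules` and `owner_of` are hypothetical, and it assumes the size column approximates the extent of the mapping that starts at the listed address, which may not hold exactly on every kernel.

```python
def parse_modules(text):
    """Parse /proc/modules-style lines into (name, base, size) tuples."""
    mods = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # not a module line
        name, size, base = fields[0], int(fields[1]), int(fields[5], 16)
        mods.append((name, base, size))
    # Sort numerically by load address rather than relying on string order.
    return sorted(mods, key=lambda m: m[1])

def owner_of(addr, mods):
    """Return the module whose half-open range [base, base + size) holds addr."""
    for name, base, size in mods:
        if base <= addr < base + size:
            return name
    return None

# Lines copied from the listing quoted above.
LISTING = """\
scsi_mod 221979 4 sg,sd_mod,sr_mod,libata, Live 0xffffffffc0002000 (E)
e1000 127757 0 - Live 0xffffffffc004d000 (E)
libata 229931 2 ata_generic,ata_piix, Live 0xffffffffc0076000 (E)
"""

MODS = parse_modules(LISTING)
for addr in (0xffffffffc0000000, 0xffffffffc0002000):
    print(f"{addr:#x} -> {owner_of(addr, MODS)}")
```

With these three lines, 0xffffffffc0000000 (the address in the 4.3.0-rc3 splat) matches no listed module, while 0xffffffffc0002000 resolves to scsi_mod. The half-open interval makes the edge handling explicit: an address equal to base + size is treated as outside the module.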
* Re: next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0 2017-05-19 3:08 ` Luis R. Rodriguez @ 2017-05-19 15:40 ` Luis R. Rodriguez 2017-05-19 17:28 ` Luis R. Rodriguez 2017-05-19 17:35 ` Catalin Marinas 0 siblings, 2 replies; 21+ messages in thread From: Luis R. Rodriguez @ 2017-05-19 15:40 UTC (permalink / raw) To: Catalin Marinas, Steven Rostedt Cc: Luis R. Rodriguez, Kees Cook, Stephen Smalley, Ingo Molnar, Andy Lutomirski, Michal Hocko, Vlastimil Babka, Andrew Morton, Eric W. Biederman, Mateusz Guzik, LKML On Fri, May 19, 2017 at 05:08:02AM +0200, Luis R. Rodriguez wrote: > On Fri, May 19, 2017 at 02:44:14AM +0200, Luis R. Rodriguez wrote: > > On Wed, May 17, 2017 at 10:53:06AM -0700, Kees Cook wrote: > > > On Wed, May 17, 2017 at 9:40 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote: > > > > Yes, but I had killed that boot session again, so upon my next boot > > > > I had a different layout, the ASLR gap was much larger: > > > > > > > > ---[ Modules ]--- > > > > 0xffffffffc0000000-0xffffffffc01b0000 1728K pte > > > > 0xffffffffc01b0000-0xffffffffc01b1000 4K RW GLB x pte > > > > 0xffffffffc01b1000-0xffffffffc01b2000 4K pte > > > > 0xffffffffc01b2000-0xffffffffc01c6000 80K ro GLB x pte > > > > 0xffffffffc01c6000-0xffffffffc01cc000 24K ro GLB NX pte > > > > 0xffffffffc01cc000-0xffffffffc01d5000 36K RW GLB NX pte > > > > > > > > As you can guess if we follow similar pattern the RW hole is the one this boot > > > > warned about: > > > > > > > > [ 1.450483] x86/mm: Found insecure W+X mapping at address ffffffffc01b0000/0xffffffffc01b0000 > > > > [ 1.451280] ------------[ cut here ]------------ > > > > [ 1.451721] WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0 > > > > [ 1.452499] Modules linked in: > > > > [ 1.452791] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc1-next-20170515+ #145 > > > > > > > > I checked and indeed 0xffffffffc01b2000 is part of a module, it was not the first 
one > > > > on the /proc/modules list but then again /proc/modules does not seem to have a specific > > > > order other than perhaps being pegged into a linked list of modules once they go live, > > > > and it seems its typically output backwards from when that happened, sorting that > > > > by address we get: > > > > > > Right, sorry, I'd expect it at the bottom of the list in > > > /proc/modules, but that's fine, it's there. > > > > > > > > > > > root@piggy:~# cat /proc/modules | sort -k 6 | head -3 > > > > e1000 143360 0 - Live 0xffffffffc01b2000 (E) > > > > mbcache 16384 1 ext4, Live 0xffffffffc01d6000 (E) > > > > scsi_mod 217088 4 sg,sr_mod,sd_mod,libata, Live 0xffffffffc01df000 (E) > > > > > > > > And this then seems to be the first module loaded: > > > > > > > > e1000 143360 0 - Live 0xffffffffc01b2000 (E) > > > > > > > > The output of dmesg seems to confirm this as per the list of modules sorted > > > > as per above. > > > > > > > >> Something touched the module gap and left is RW+x... > > > > > > > > Lemme try booting with e1000 renamed to e1000.ko.ignore and see how that goes. > > > > > > Is it possible a module got loaded before e1000 and then unloaded? > > > That seems odd, but maybe unload isn't cleaning up? > > > > > > >> Are you able to bisect this? > > > > > > > > This issue has been present for a while so since I recall this I might be > > > > able to reduce the number of needed target kernels to bisect. Lemme tinker > > > > a bit and if no clear culprit comes up then will try bisect. > > > > > > Okay, thanks! > > > > Sorry to report that this issue is present since the feature's addition. So > > the issue is there since its addition and is still present today. *But* it > > may also be a configuration issue, given I have booted this guest *without* > > this issue ... > > > > So: > > > > git checkout -b WX e1a58320a38dfa72be48a0f1a3a92273663ba6db > > > > That boots with the warning. 
To help debug further I've minimized my modules > > to only a few: scsi_mod, e1000, libata. > > > > I suspect at this point this is not the fault of a particular module but > > instead just an accounting semantic (>= or <= on an edge) but let's see. > > > > I now boot on 4.3.0-rc3 on commit (e1a58320a38df ("x86/mm: Warn on W^X > > mappings") and I with: > > > > [ 0.949435] ------------[ cut here ]------------ > > [ 0.949992] WARNING: CPU: 2 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page+0x635/0x7e0() > > [ 0.950996] x86/mm: Found insecure W+X mapping at address ffffffffc0000000/0xffffffffc0000000 > > [ 0.951814] Modules linked in: > > [ 0.952123] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.3.0-rc3-FINAL-TEST-WITH-WX-NOFLOPPY+ #365 > > [ 0.952929] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014 > > [ 0.954033] 0000000000000000 000000001f722925 ffff88013a5d7d40 ffffffff812ff335 > > [ 0.954742] ffff88013a5d7d88 ffff88013a5d7d78 ffffffff81079be2 ffff88013a5d7e90 > > [ 0.955522] 0000000000000000 0000000000000004 0000000000000000 0000000000000000 > > [ 0.956256] Call Trace: > > [ 0.956496] [<ffffffff812ff335>] dump_stack+0x44/0x5f > > [ 0.956953] [<ffffffff81079be2>] warn_slowpath_common+0x82/0xc0 > > [ 0.957519] [<ffffffff81079c7c>] warn_slowpath_fmt+0x5c/0x80 > > [ 0.958066] [<ffffffff8106c155>] note_page+0x635/0x7e0 > > [ 0.958595] [<ffffffff8106c5eb>] ptdump_walk_pgd_level_core+0x2eb/0x410 > > [ 0.959219] [<ffffffff8106c7b7>] ptdump_walk_pgd_level_checkwx+0x17/0x20 > > [ 0.959856] [<ffffffff8106260d>] mark_rodata_ro+0xed/0x100 > > [ 0.960372] [<ffffffff815aa7d0>] ? rest_init+0x80/0x80 > > [ 0.960869] [<ffffffff815aa7ed>] kernel_init+0x1d/0xe0 > > [ 0.961358] [<ffffffff815b798f>] ret_from_fork+0x3f/0x70 > > [ 0.961900] [<ffffffff815aa7d0>] ? rest_init+0x80/0x80 > > [ 0.962389] ---[ end trace 6125ebcb24c9e3d0 ]--- > > [ 0.962822] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found. 
> > > > > > ---[ High Kernel Mapping ]--- > > 0xffffffff80000000-0xffffffff81000000 16M pmd > > 0xffffffff81000000-0xffffffff81600000 6M ro PSE GLB x pmd > > 0xffffffff81600000-0xffffffff81a00000 4M ro PSE GLB NX pmd > > 0xffffffff81a00000-0xffffffff81c00000 2M RW GLB NX pte > > 0xffffffff81c00000-0xffffffff82200000 6M RW PSE GLB NX pmd > > 0xffffffff82200000-0xffffffff82400000 2M RW GLB NX pte > > 0xffffffff82400000-0xffffffffc0000000 988M pmd > > ---[ Modules ]--- > > 0xffffffffc0000000-0xffffffffc0001000 4K RW GLB x pte > > 0xffffffffc0001000-0xffffffffc0002000 4K pte > > 0xffffffffc0002000-0xffffffffc0039000 220K RW GLB x pte > > > > root@piggy:~# cat /proc/modules | sort -k 6 | head -3 > > scsi_mod 221979 4 sg,sd_mod,sr_mod,libata, Live 0xffffffffc0002000 (E) > > e1000 127757 0 - Live 0xffffffffc004d000 (E) > > libata 229931 2 ata_generic,ata_piix, Live 0xffffffffc0076000 (E) > > > > So that 4K RW seems suspect of getting used for allocation purpose on edge > > for a particular reason and it also happens to be on the edge of the high > > kernel mapping. Could it be the boundary semantic issue ? > > > > For instance can it be that since 0xffffffffc0002000 is given to the first > > module by the allocator, scsi_mod, and since that address is *technically* > > part of two boundaries we get a splat ? > > > > 0xffffffffc0001000-0xffffffffc0002000 4K pte > > 0xffffffffc0002000-0xffffffffc0039000 220K RW GLB x pte > > Note on the latest linux-next and on the commit that introduced this the config > and kernel yields only *one* page: > > x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found. > > I believe this is more indications my suspicion might be right. If the following is a legit forced way to get query the kernel to ask it who owns a page then perhaps this technique can be used in the future to figure out who the hell caused this. Catalin, can you confirm? 
In this case this is perhaps not a leaked page but I am trying to abuse the
kmemleak debugfs API to query who allocated the page. Is that fine?

[ 0.916771] WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:235 note_page+0x63c/0x7e0
[ 0.917636] x86/mm: Found insecure W+X mapping at address ffffffffc03d5000/0xffffffffc03d5000
[ 0.918502] Modules linked in:
[ 0.918819] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.11.0-mcgrof-force-config #340
[ 0.919631] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
[ 0.920011] Call Trace:
[ 0.920011] dump_stack+0x63/0x81
[ 0.920011] __warn+0xcb/0xf0
[ 0.920011] warn_slowpath_fmt+0x5a/0x80
[ 0.920011] note_page+0x63c/0x7e0
[ 0.920011] ptdump_walk_pgd_level_core+0x3b1/0x460
[ 0.920011] ? 0xffffffff86c00000
[ 0.920011] ptdump_walk_pgd_level_checkwx+0x17/0x20
[ 0.920011] mark_rodata_ro+0xf4/0x100
[ 0.920011] ? rest_init+0x80/0x80
[ 0.920011] kernel_init+0x2a/0x100
[ 0.920011] ret_from_fork+0x2c/0x40
[ 0.925474] ---[ end trace dca00cd779490a2b ]---
[ 0.925959] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found.

echo dump=0xffffffffc03d5000 > /sys/kernel/debug/kmemleak
dmesg | tail

[ 49.209565] kmemleak: Object 0xffffffffc03d5000 (size 335):
[ 49.210814] kmemleak: comm "swapper/0", pid 1, jiffies 4294892440
[ 49.212148] kmemleak: min_count = 2
[ 49.212852] kmemleak: count = 0
[ 49.213363] kmemleak: flags = 0x1
[ 49.213363] kmemleak: checksum = 0
[ 49.213363] kmemleak: backtrace:
[ 49.213363] kmemleak_alloc+0x4a/0xa0
[ 49.213363] __vmalloc_node_range+0x20a/0x2b0
[ 49.213363] module_alloc+0x67/0xc0
[ 49.213363] arch_ftrace_update_trampoline+0xba/0x260
[ 49.213363] ftrace_startup+0x90/0x210
[ 49.213363] register_ftrace_function+0x4b/0x60
[ 49.213363] arm_kprobe+0x84/0xe0
[ 49.213363] register_kprobe+0x56e/0x5b0
[ 49.213363] init_test_probes+0x61/0x560
[ 49.213363] init_kprobes+0x1e3/0x206
[ 49.213363] do_one_initcall+0x52/0x1a0
[ 49.213363] kernel_init_freeable+0x178/0x200
[ 49.213363] kernel_init+0xe/0x100
[ 49.213363] ret_from_fork+0x2c/0x40
[ 49.213363] 0xffffffffffffffff

  Luis

^ permalink raw reply [flat|nested] 21+ messages in thread
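The manual step above (copy the address out of the W+X warning, then feed it to kmemleak) could be scripted along these lines. This is only a sketch in Python: the function name `kmemleak_dump_requests` is hypothetical, and it just builds the `dump=<addr>` strings from log text. Actually writing them to /sys/kernel/debug/kmemleak is left out, since that needs root and a kernel with kmemleak enabled.

```python
import re

# Matches the warning text as it appears in the log above, e.g.
#   x86/mm: Found insecure W+X mapping at address ffffffffc03d5000/0xffffffffc03d5000
WX_RE = re.compile(
    r"Found insecure W\+X mapping at address ([0-9a-f]+)/0x[0-9a-f]+"
)

def kmemleak_dump_requests(dmesg_text):
    """Build one kmemleak 'dump=<addr>' request per W+X warning found."""
    return [f"dump=0x{m.group(1)}" for m in WX_RE.finditer(dmesg_text)]

# Two lines copied from the splat above.
LOG = """\
[ 0.916771] WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:235 note_page+0x63c/0x7e0
[ 0.917636] x86/mm: Found insecure W+X mapping at address ffffffffc03d5000/0xffffffffc03d5000
"""

for request in kmemleak_dump_requests(LOG):
    print(request)
```

For the log above this prints dump=0xffffffffc03d5000, the same string that was echoed by hand.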
* Re: next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0 2017-05-19 15:40 ` Luis R. Rodriguez @ 2017-05-19 17:28 ` Luis R. Rodriguez 2017-05-20 2:38 ` Masami Hiramatsu 2017-05-19 17:35 ` Catalin Marinas 1 sibling, 1 reply; 21+ messages in thread From: Luis R. Rodriguez @ 2017-05-19 17:28 UTC (permalink / raw) To: Masami Hiramatsu, Jim Keniston, davem, sagar.abhishek Cc: Catalin Marinas, mcgrof, Steven Rostedt, Kees Cook, Stephen Smalley, Ingo Molnar, Andy Lutomirski, Michal Hocko, Vlastimil Babka, Andrew Morton, Eric W. Biederman, Mateusz Guzik, LKML On Fri, May 19, 2017 at 05:40:16PM +0200, Luis R. Rodriguez wrote: > On Fri, May 19, 2017 at 05:08:02AM +0200, Luis R. Rodriguez wrote: > > On Fri, May 19, 2017 at 02:44:14AM +0200, Luis R. Rodriguez wrote: > > > On Wed, May 17, 2017 at 10:53:06AM -0700, Kees Cook wrote: > > > > On Wed, May 17, 2017 at 9:40 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote: > > > > > Yes, but I had killed that boot session again, so upon my next boot > > > > > I had a different layout, the ASLR gap was much larger: > > > > > > > > > > ---[ Modules ]--- > > > > > 0xffffffffc0000000-0xffffffffc01b0000 1728K pte > > > > > 0xffffffffc01b0000-0xffffffffc01b1000 4K RW GLB x pte > > > > > 0xffffffffc01b1000-0xffffffffc01b2000 4K pte > > > > > 0xffffffffc01b2000-0xffffffffc01c6000 80K ro GLB x pte > > > > > 0xffffffffc01c6000-0xffffffffc01cc000 24K ro GLB NX pte > > > > > 0xffffffffc01cc000-0xffffffffc01d5000 36K RW GLB NX pte > > > > > > > > > > As you can guess if we follow similar pattern the RW hole is the one this boot > > > > > warned about: > > > > > > > > > > [ 1.450483] x86/mm: Found insecure W+X mapping at address ffffffffc01b0000/0xffffffffc01b0000 > > > > > [ 1.451280] ------------[ cut here ]------------ > > > > > [ 1.451721] WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0 > > > > > [ 1.452499] Modules linked in: > > > > > [ 1.452791] CPU: 1 PID: 1 Comm: 
swapper/0 Not tainted 4.12.0-rc1-next-20170515+ #145 > > > > > > > > > > I checked and indeed 0xffffffffc01b2000 is part of a module, it was not the first one > > > > > on the /proc/modules list but then again /proc/modules does not seem to have a specific > > > > > order other than perhaps being pegged into a linked list of modules once they go live, > > > > > and it seems its typically output backwards from when that happened, sorting that > > > > > by address we get: > > > > > > > > Right, sorry, I'd expect it at the bottom of the list in > > > > /proc/modules, but that's fine, it's there. > > > > > > > > > > > > > > root@piggy:~# cat /proc/modules | sort -k 6 | head -3 > > > > > e1000 143360 0 - Live 0xffffffffc01b2000 (E) > > > > > mbcache 16384 1 ext4, Live 0xffffffffc01d6000 (E) > > > > > scsi_mod 217088 4 sg,sr_mod,sd_mod,libata, Live 0xffffffffc01df000 (E) > > > > > > > > > > And this then seems to be the first module loaded: > > > > > > > > > > e1000 143360 0 - Live 0xffffffffc01b2000 (E) > > > > > > > > > > The output of dmesg seems to confirm this as per the list of modules sorted > > > > > as per above. > > > > > > > > > >> Something touched the module gap and left is RW+x... > > > > > > > > > > Lemme try booting with e1000 renamed to e1000.ko.ignore and see how that goes. > > > > > > > > Is it possible a module got loaded before e1000 and then unloaded? > > > > That seems odd, but maybe unload isn't cleaning up? > > > > > > > > >> Are you able to bisect this? > > > > > > > > > > This issue has been present for a while so since I recall this I might be > > > > > able to reduce the number of needed target kernels to bisect. Lemme tinker > > > > > a bit and if no clear culprit comes up then will try bisect. > > > > > > > > Okay, thanks! > > > > > > Sorry to report that this issue is present since the feature's addition. So > > > the issue is there since its addition and is still present today. 
*But* it > > > may also be a configuration issue, given I have booted this guest *without* > > > this issue ... > > > > > > So: > > > > > > git checkout -b WX e1a58320a38dfa72be48a0f1a3a92273663ba6db > > > > > > That boots with the warning. To help debug further I've minimized my modules > > > to only a few: scsi_mod, e1000, libata. > > > > > > I suspect at this point this is not the fault of a particular module but > > > instead just an accounting semantic (>= or <= on an edge) but let's see. > > > > > > I now boot on 4.3.0-rc3 on commit (e1a58320a38df ("x86/mm: Warn on W^X > > > mappings") and I with: > > > > > > [ 0.949435] ------------[ cut here ]------------ > > > [ 0.949992] WARNING: CPU: 2 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page+0x635/0x7e0() > > > [ 0.950996] x86/mm: Found insecure W+X mapping at address ffffffffc0000000/0xffffffffc0000000 > > > [ 0.951814] Modules linked in: > > > [ 0.952123] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.3.0-rc3-FINAL-TEST-WITH-WX-NOFLOPPY+ #365 > > > [ 0.952929] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014 > > > [ 0.954033] 0000000000000000 000000001f722925 ffff88013a5d7d40 ffffffff812ff335 > > > [ 0.954742] ffff88013a5d7d88 ffff88013a5d7d78 ffffffff81079be2 ffff88013a5d7e90 > > > [ 0.955522] 0000000000000000 0000000000000004 0000000000000000 0000000000000000 > > > [ 0.956256] Call Trace: > > > [ 0.956496] [<ffffffff812ff335>] dump_stack+0x44/0x5f > > > [ 0.956953] [<ffffffff81079be2>] warn_slowpath_common+0x82/0xc0 > > > [ 0.957519] [<ffffffff81079c7c>] warn_slowpath_fmt+0x5c/0x80 > > > [ 0.958066] [<ffffffff8106c155>] note_page+0x635/0x7e0 > > > [ 0.958595] [<ffffffff8106c5eb>] ptdump_walk_pgd_level_core+0x2eb/0x410 > > > [ 0.959219] [<ffffffff8106c7b7>] ptdump_walk_pgd_level_checkwx+0x17/0x20 > > > [ 0.959856] [<ffffffff8106260d>] mark_rodata_ro+0xed/0x100 > > > [ 0.960372] [<ffffffff815aa7d0>] ? 
rest_init+0x80/0x80 > > > [ 0.960869] [<ffffffff815aa7ed>] kernel_init+0x1d/0xe0 > > > [ 0.961358] [<ffffffff815b798f>] ret_from_fork+0x3f/0x70 > > > [ 0.961900] [<ffffffff815aa7d0>] ? rest_init+0x80/0x80 > > > [ 0.962389] ---[ end trace 6125ebcb24c9e3d0 ]--- > > > [ 0.962822] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found. > > > > > > > > > ---[ High Kernel Mapping ]--- > > > 0xffffffff80000000-0xffffffff81000000 16M pmd > > > 0xffffffff81000000-0xffffffff81600000 6M ro PSE GLB x pmd > > > 0xffffffff81600000-0xffffffff81a00000 4M ro PSE GLB NX pmd > > > 0xffffffff81a00000-0xffffffff81c00000 2M RW GLB NX pte > > > 0xffffffff81c00000-0xffffffff82200000 6M RW PSE GLB NX pmd > > > 0xffffffff82200000-0xffffffff82400000 2M RW GLB NX pte > > > 0xffffffff82400000-0xffffffffc0000000 988M pmd > > > ---[ Modules ]--- > > > 0xffffffffc0000000-0xffffffffc0001000 4K RW GLB x pte > > > 0xffffffffc0001000-0xffffffffc0002000 4K pte > > > 0xffffffffc0002000-0xffffffffc0039000 220K RW GLB x pte > > > > > > root@piggy:~# cat /proc/modules | sort -k 6 | head -3 > > > scsi_mod 221979 4 sg,sd_mod,sr_mod,libata, Live 0xffffffffc0002000 (E) > > > e1000 127757 0 - Live 0xffffffffc004d000 (E) > > > libata 229931 2 ata_generic,ata_piix, Live 0xffffffffc0076000 (E) > > > > > > So that 4K RW seems suspect of getting used for allocation purpose on edge > > > for a particular reason and it also happens to be on the edge of the high > > > kernel mapping. Could it be the boundary semantic issue ? > > > > > > For instance can it be that since 0xffffffffc0002000 is given to the first > > > module by the allocator, scsi_mod, and since that address is *technically* > > > part of two boundaries we get a splat ? 
> > > > > > 0xffffffffc0001000-0xffffffffc0002000 4K pte > > > 0xffffffffc0002000-0xffffffffc0039000 220K RW GLB x pte > > > > Note on the latest linux-next and on the commit that introduced this the config > > and kernel yields only *one* page: > > > > x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found. > > > > I believe this is more indications my suspicion might be right. > > If the following is a legit forced way to get query the kernel to ask it > who owns a page then perhaps this technique can be used in the future to > figure out who the hell caused this. Catalin, can you confirm? In this > case this is perhaps not a leaked page but I am trying to abuse the > kmemleak debugfs API to query who allocated the page. Is that fine? > > [ 0.916771] WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:235 note_page+0x63c/0x7e0 > [ 0.917636] x86/mm: Found insecure W+X mapping at address ffffffffc03d5000/0xffffffffc03d5000 > [ 0.918502] Modules linked in: > [ 0.918819] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.11.0-mcgrof-force-config #340 > [ 0.919631] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014 > [ 0.920011] Call Trace: > [ 0.920011] dump_stack+0x63/0x81 > [ 0.920011] __warn+0xcb/0xf0 > [ 0.920011] warn_slowpath_fmt+0x5a/0x80 > [ 0.920011] note_page+0x63c/0x7e0 > [ 0.920011] ptdump_walk_pgd_level_core+0x3b1/0x460 > [ 0.920011] ? 0xffffffff86c00000 > [ 0.920011] ptdump_walk_pgd_level_checkwx+0x17/0x20 > [ 0.920011] mark_rodata_ro+0xf4/0x100 > [ 0.920011] ? rest_init+0x80/0x80 > [ 0.920011] kernel_init+0x2a/0x100 > [ 0.920011] ret_from_fork+0x2c/0x40 > [ 0.925474] ---[ end trace dca00cd779490a2b ]--- > [ 0.925959] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found. 
>
> echo dump=0xffffffffc03d5000 > /sys/kernel/debug/kmemleak
> dmesg | tail
>
> [ 49.209565] kmemleak: Object 0xffffffffc03d5000 (size 335):
> [ 49.210814] kmemleak: comm "swapper/0", pid 1, jiffies 4294892440
> [ 49.212148] kmemleak: min_count = 2
> [ 49.212852] kmemleak: count = 0
> [ 49.213363] kmemleak: flags = 0x1
> [ 49.213363] kmemleak: checksum = 0
> [ 49.213363] kmemleak: backtrace:
> [ 49.213363] kmemleak_alloc+0x4a/0xa0
> [ 49.213363] __vmalloc_node_range+0x20a/0x2b0
> [ 49.213363] module_alloc+0x67/0xc0
> [ 49.213363] arch_ftrace_update_trampoline+0xba/0x260
> [ 49.213363] ftrace_startup+0x90/0x210
> [ 49.213363] register_ftrace_function+0x4b/0x60
> [ 49.213363] arm_kprobe+0x84/0xe0
> [ 49.213363] register_kprobe+0x56e/0x5b0
> [ 49.213363] init_test_probes+0x61/0x560
> [ 49.213363] init_kprobes+0x1e3/0x206
> [ 49.213363] do_one_initcall+0x52/0x1a0
> [ 49.213363] kernel_init_freeable+0x178/0x200
> [ 49.213363] kernel_init+0xe/0x100
> [ 49.213363] ret_from_fork+0x2c/0x40
> [ 49.213363] 0xffffffffffffffff

Aha! And the winner is:

CONFIG_KPROBES_SANITY_TEST

I can confirm that disabling it on 4.3.0-rc3 and on linux-next next-20170519
avoids the WARN. I can also confirm that using
'echo dump=mem-area > /sys/kernel/debug/kmemleak' yields the same trace on
both of these kernels.

So the above kmemleak hack does seem to work for finding out who owns that
page. Now to figure out how the hell kernel/test_kprobes.c screws around
with things.

  Luis

^ permalink raw reply [flat|nested] 21+ messages in thread
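Since the warning turned out to depend on a config option, a quick way to check whether a given .config enables it might look like the sketch below (Python, for illustration only). The function name `config_enabled` is hypothetical, and the sample text is made up for the example; it is not the config linked earlier in this thread.

```python
def config_enabled(config_text, option):
    """Return True if `option` is set to y or m in kernel .config text."""
    for line in config_text.splitlines():
        line = line.strip()
        # Skip comments, including '# CONFIG_FOO is not set' lines.
        if line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name == option:
            return value in ("y", "m")
    return False

# Made-up fragment for illustration; CONFIG_FOO is a placeholder name.
SAMPLE = """\
CONFIG_KPROBES=y
# CONFIG_FOO is not set
CONFIG_KPROBES_SANITY_TEST=y
"""

print(config_enabled(SAMPLE, "CONFIG_KPROBES_SANITY_TEST"))
```

Running the same function over the config of a kernel that boots without the warning and one that boots with it would show whether this option is the difference between them.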
* Re: next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0 2017-05-19 17:28 ` Luis R. Rodriguez @ 2017-05-20 2:38 ` Masami Hiramatsu 2017-05-23 14:48 ` Luis R. Rodriguez 0 siblings, 1 reply; 21+ messages in thread From: Masami Hiramatsu @ 2017-05-20 2:38 UTC (permalink / raw) To: Luis R. Rodriguez Cc: Jim Keniston, davem, sagar.abhishek, Catalin Marinas, Steven Rostedt, Kees Cook, Stephen Smalley, Ingo Molnar, Andy Lutomirski, Michal Hocko, Vlastimil Babka, Andrew Morton, Eric W. Biederman, Mateusz Guzik, LKML Hi Luis, On Fri, 19 May 2017 19:28:54 +0200 "Luis R. Rodriguez" <mcgrof@kernel.org> wrote: > On Fri, May 19, 2017 at 05:40:16PM +0200, Luis R. Rodriguez wrote: > > On Fri, May 19, 2017 at 05:08:02AM +0200, Luis R. Rodriguez wrote: > > > On Fri, May 19, 2017 at 02:44:14AM +0200, Luis R. Rodriguez wrote: > > > > On Wed, May 17, 2017 at 10:53:06AM -0700, Kees Cook wrote: > > > > > On Wed, May 17, 2017 at 9:40 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote: > > > > > > Yes, but I had killed that boot session again, so upon my next boot > > > > > > I had a different layout, the ASLR gap was much larger: > > > > > > > > > > > > ---[ Modules ]--- > > > > > > 0xffffffffc0000000-0xffffffffc01b0000 1728K pte > > > > > > 0xffffffffc01b0000-0xffffffffc01b1000 4K RW GLB x pte > > > > > > 0xffffffffc01b1000-0xffffffffc01b2000 4K pte > > > > > > 0xffffffffc01b2000-0xffffffffc01c6000 80K ro GLB x pte > > > > > > 0xffffffffc01c6000-0xffffffffc01cc000 24K ro GLB NX pte > > > > > > 0xffffffffc01cc000-0xffffffffc01d5000 36K RW GLB NX pte > > > > > > > > > > > > As you can guess if we follow similar pattern the RW hole is the one this boot > > > > > > warned about: > > > > > > > > > > > > [ 1.450483] x86/mm: Found insecure W+X mapping at address ffffffffc01b0000/0xffffffffc01b0000 > > > > > > [ 1.451280] ------------[ cut here ]------------ > > > > > > [ 1.451721] WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:236 
note_page+0x630/0x7e0 > > > > > > [ 1.452499] Modules linked in: > > > > > > [ 1.452791] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc1-next-20170515+ #145 > > > > > > > > > > > > I checked and indeed 0xffffffffc01b2000 is part of a module, it was not the first one > > > > > > on the /proc/modules list but then again /proc/modules does not seem to have a specific > > > > > > order other than perhaps being pegged into a linked list of modules once they go live, > > > > > > and it seems its typically output backwards from when that happened, sorting that > > > > > > by address we get: > > > > > > > > > > Right, sorry, I'd expect it at the bottom of the list in > > > > > /proc/modules, but that's fine, it's there. > > > > > > > > > > > > > > > > > root@piggy:~# cat /proc/modules | sort -k 6 | head -3 > > > > > > e1000 143360 0 - Live 0xffffffffc01b2000 (E) > > > > > > mbcache 16384 1 ext4, Live 0xffffffffc01d6000 (E) > > > > > > scsi_mod 217088 4 sg,sr_mod,sd_mod,libata, Live 0xffffffffc01df000 (E) > > > > > > > > > > > > And this then seems to be the first module loaded: > > > > > > > > > > > > e1000 143360 0 - Live 0xffffffffc01b2000 (E) > > > > > > > > > > > > The output of dmesg seems to confirm this as per the list of modules sorted > > > > > > as per above. > > > > > > > > > > > >> Something touched the module gap and left is RW+x... > > > > > > > > > > > > Lemme try booting with e1000 renamed to e1000.ko.ignore and see how that goes. > > > > > > > > > > Is it possible a module got loaded before e1000 and then unloaded? > > > > > That seems odd, but maybe unload isn't cleaning up? > > > > > > > > > > >> Are you able to bisect this? > > > > > > > > > > > > This issue has been present for a while so since I recall this I might be > > > > > > able to reduce the number of needed target kernels to bisect. Lemme tinker > > > > > > a bit and if no clear culprit comes up then will try bisect. > > > > > > > > > > Okay, thanks! 
> > > > > > > > Sorry to report that this issue is present since the feature's addition. So > > > > the issue is there since its addition and is still present today. *But* it > > > > may also be a configuration issue, given I have booted this guest *without* > > > > this issue ... > > > > > > > > So: > > > > > > > > git checkout -b WX e1a58320a38dfa72be48a0f1a3a92273663ba6db > > > > > > > > That boots with the warning. To help debug further I've minimized my modules > > > > to only a few: scsi_mod, e1000, libata. > > > > > > > > I suspect at this point this is not the fault of a particular module but > > > > instead just an accounting semantic (>= or <= on an edge) but let's see. > > > > > > > > I now boot on 4.3.0-rc3 on commit (e1a58320a38df ("x86/mm: Warn on W^X > > > > mappings") and I with: > > > > > > > > [ 0.949435] ------------[ cut here ]------------ > > > > [ 0.949992] WARNING: CPU: 2 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page+0x635/0x7e0() > > > > [ 0.950996] x86/mm: Found insecure W+X mapping at address ffffffffc0000000/0xffffffffc0000000 > > > > [ 0.951814] Modules linked in: > > > > [ 0.952123] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.3.0-rc3-FINAL-TEST-WITH-WX-NOFLOPPY+ #365 > > > > [ 0.952929] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014 > > > > [ 0.954033] 0000000000000000 000000001f722925 ffff88013a5d7d40 ffffffff812ff335 > > > > [ 0.954742] ffff88013a5d7d88 ffff88013a5d7d78 ffffffff81079be2 ffff88013a5d7e90 > > > > [ 0.955522] 0000000000000000 0000000000000004 0000000000000000 0000000000000000 > > > > [ 0.956256] Call Trace: > > > > [ 0.956496] [<ffffffff812ff335>] dump_stack+0x44/0x5f > > > > [ 0.956953] [<ffffffff81079be2>] warn_slowpath_common+0x82/0xc0 > > > > [ 0.957519] [<ffffffff81079c7c>] warn_slowpath_fmt+0x5c/0x80 > > > > [ 0.958066] [<ffffffff8106c155>] note_page+0x635/0x7e0 > > > > [ 0.958595] [<ffffffff8106c5eb>] 
ptdump_walk_pgd_level_core+0x2eb/0x410 > > > > [ 0.959219] [<ffffffff8106c7b7>] ptdump_walk_pgd_level_checkwx+0x17/0x20 > > > > [ 0.959856] [<ffffffff8106260d>] mark_rodata_ro+0xed/0x100 > > > > [ 0.960372] [<ffffffff815aa7d0>] ? rest_init+0x80/0x80 > > > > [ 0.960869] [<ffffffff815aa7ed>] kernel_init+0x1d/0xe0 > > > > [ 0.961358] [<ffffffff815b798f>] ret_from_fork+0x3f/0x70 > > > > [ 0.961900] [<ffffffff815aa7d0>] ? rest_init+0x80/0x80 > > > > [ 0.962389] ---[ end trace 6125ebcb24c9e3d0 ]--- > > > > [ 0.962822] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found. > > > > > > > > > > > > ---[ High Kernel Mapping ]--- > > > > 0xffffffff80000000-0xffffffff81000000 16M pmd > > > > 0xffffffff81000000-0xffffffff81600000 6M ro PSE GLB x pmd > > > > 0xffffffff81600000-0xffffffff81a00000 4M ro PSE GLB NX pmd > > > > 0xffffffff81a00000-0xffffffff81c00000 2M RW GLB NX pte > > > > 0xffffffff81c00000-0xffffffff82200000 6M RW PSE GLB NX pmd > > > > 0xffffffff82200000-0xffffffff82400000 2M RW GLB NX pte > > > > 0xffffffff82400000-0xffffffffc0000000 988M pmd > > > > ---[ Modules ]--- > > > > 0xffffffffc0000000-0xffffffffc0001000 4K RW GLB x pte > > > > 0xffffffffc0001000-0xffffffffc0002000 4K pte > > > > 0xffffffffc0002000-0xffffffffc0039000 220K RW GLB x pte > > > > > > > > root@piggy:~# cat /proc/modules | sort -k 6 | head -3 > > > > scsi_mod 221979 4 sg,sd_mod,sr_mod,libata, Live 0xffffffffc0002000 (E) > > > > e1000 127757 0 - Live 0xffffffffc004d000 (E) > > > > libata 229931 2 ata_generic,ata_piix, Live 0xffffffffc0076000 (E) > > > > > > > > So that 4K RW seems suspect of getting used for allocation purpose on edge > > > > for a particular reason and it also happens to be on the edge of the high > > > > kernel mapping. Could it be the boundary semantic issue ? 
> > > > > > > > For instance can it be that since 0xffffffffc0002000 is given to the first > > > > module by the allocator, scsi_mod, and since that address is *technically* > > > > part of two boundaries we get a splat ? > > > > > > > > 0xffffffffc0001000-0xffffffffc0002000 4K pte > > > > 0xffffffffc0002000-0xffffffffc0039000 220K RW GLB x pte > > > > > > Note on the latest linux-next and on the commit that introduced this the config > > > and kernel yields only *one* page: > > > > > > x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found. > > > > > > I believe this is more indications my suspicion might be right. > > > > If the following is a legit forced way to get query the kernel to ask it > > who owns a page then perhaps this technique can be used in the future to > > figure out who the hell caused this. Catalin, can you confirm? In this > > case this is perhaps not a leaked page but I am trying to abuse the > > kmemleak debugfs API to query who allocated the page. Is that fine? > > > > [ 0.916771] WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:235 note_page+0x63c/0x7e0 > > [ 0.917636] x86/mm: Found insecure W+X mapping at address ffffffffc03d5000/0xffffffffc03d5000 > > [ 0.918502] Modules linked in: > > [ 0.918819] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.11.0-mcgrof-force-config #340 > > [ 0.919631] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014 > > [ 0.920011] Call Trace: > > [ 0.920011] dump_stack+0x63/0x81 > > [ 0.920011] __warn+0xcb/0xf0 > > [ 0.920011] warn_slowpath_fmt+0x5a/0x80 > > [ 0.920011] note_page+0x63c/0x7e0 > > [ 0.920011] ptdump_walk_pgd_level_core+0x3b1/0x460 > > [ 0.920011] ? 0xffffffff86c00000 > > [ 0.920011] ptdump_walk_pgd_level_checkwx+0x17/0x20 > > [ 0.920011] mark_rodata_ro+0xf4/0x100 > > [ 0.920011] ? 
rest_init+0x80/0x80 > > [ 0.920011] kernel_init+0x2a/0x100 > > [ 0.920011] ret_from_fork+0x2c/0x40 > > [ 0.925474] ---[ end trace dca00cd779490a2b ]--- > > [ 0.925959] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found. > > > > echo dump=0xffffffffc03d5000 > /sys/kernel/debug/kmemleak > > dmesg | tail > > > > [ 49.209565] kmemleak: Object 0xffffffffc03d5000 (size 335): > > [ 49.210814] kmemleak: comm "swapper/0", pid 1, jiffies 4294892440 > > [ 49.212148] kmemleak: min_count = 2 > > [ 49.212852] kmemleak: count = 0 > > [ 49.213363] kmemleak: flags = 0x1 > > [ 49.213363] kmemleak: checksum = 0 > > [ 49.213363] kmemleak: backtrace: > > [ 49.213363] kmemleak_alloc+0x4a/0xa0 > > [ 49.213363] __vmalloc_node_range+0x20a/0x2b0 > > [ 49.213363] module_alloc+0x67/0xc0 > > [ 49.213363] arch_ftrace_update_trampoline+0xba/0x260 > > [ 49.213363] ftrace_startup+0x90/0x210 > > [ 49.213363] register_ftrace_function+0x4b/0x60 > > [ 49.213363] arm_kprobe+0x84/0xe0 > > [ 49.213363] register_kprobe+0x56e/0x5b0 > > [ 49.213363] init_test_probes+0x61/0x560 > > [ 49.213363] init_kprobes+0x1e3/0x206 > > [ 49.213363] do_one_initcall+0x52/0x1a0 > > [ 49.213363] kernel_init_freeable+0x178/0x200 > > [ 49.213363] kernel_init+0xe/0x100 > > [ 49.213363] ret_from_fork+0x2c/0x40 > > [ 49.213363] 0xffffffffffffffff > > Aha! And the winner is: > > CONFIG_KPROBES_SANITY_TEST > > I confirm disabling it on 4.3.0-rc3 and on linux-next next-20170519 avoids the WARN. > I also can confirm using the 'echo dump=mem-area > /sys/kernel/debug/kmemleak' yields > the same trace for both of these kernels. > > So -- the above kmemleak hack seems to actually work to seek who owns that page. > > Now to figure out how the hell kernel/test_kprobes.c screws around with things. 
Ah, that was fixed recently:

https://marc.info/?l=linux-kernel&m=149076389011850

Note that this patch depends on another patch in the series:

https://marc.info/?l=linux-kernel&m=149076370111812&w=2

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply [flat|nested] 21+ messages in thread
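The quoted `cat /proc/modules | sort -k 6` step checks whether the flagged address belongs to a loaded module. A minimal sketch of that lookup follows. The helper names parse_modules() and owner_of() are hypothetical, and treating the size column as the extent of the mapping that starts at the listed address is an assumption made for illustration only.

```python
# Sketch only: parse_modules() and owner_of() are hypothetical helper names.
# Input is text in the /proc/modules format quoted in this thread:
#   name size refcount deps state address [taint]

SAMPLE = """\
e1000 143360 0 - Live 0xffffffffc01b2000 (E)
mbcache 16384 1 ext4, Live 0xffffffffc01d6000 (E)
scsi_mod 217088 4 sg,sr_mod,sd_mod,libata, Live 0xffffffffc01df000 (E)
"""


def parse_modules(text):
    """Return (name, base, size) tuples sorted by base address."""
    modules = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        name = fields[0]
        size = int(fields[1])
        base = int(fields[5], 16)  # int() accepts the 0x prefix with base 16
        modules.append((name, base, size))
    return sorted(modules, key=lambda m: m[1])


def owner_of(addr, modules):
    """Return the module whose [base, base + size) range holds addr, else None.

    Assumption: the size column approximates the mapped extent from base.
    """
    for name, base, size in modules:
        if base <= addr < base + size:
            return name
    return None


if __name__ == "__main__":
    mods = parse_modules(SAMPLE)
    for addr in (0xffffffffc01b0000, 0xffffffffc01b2000):
        print(hex(addr), owner_of(addr, mods))
```

On a live system the input would come from reading /proc/modules as root, since the address column is hidden from unprivileged readers. With the sample above, the flagged page at 0xffffffffc01b0000 falls outside every listed module, which matches the observation in the thread that the W+X page sits just below the first module.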
* Re: next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0 2017-05-20 2:38 ` Masami Hiramatsu @ 2017-05-23 14:48 ` Luis R. Rodriguez 2017-05-24 17:55 ` Luis R. Rodriguez 0 siblings, 1 reply; 21+ messages in thread From: Luis R. Rodriguez @ 2017-05-23 14:48 UTC (permalink / raw) To: Masami Hiramatsu Cc: Luis R. Rodriguez, Jim Keniston, davem, sagar.abhishek, Catalin Marinas, Steven Rostedt, Kees Cook, Stephen Smalley, Ingo Molnar, Andy Lutomirski, Michal Hocko, Vlastimil Babka, Andrew Morton, Eric W. Biederman, Mateusz Guzik, LKML On Sat, May 20, 2017 at 11:38:50AM +0900, Masami Hiramatsu wrote: > Hi Luis, > > On Fri, 19 May 2017 19:28:54 +0200 > "Luis R. Rodriguez" <mcgrof@kernel.org> wrote: > > > On Fri, May 19, 2017 at 05:40:16PM +0200, Luis R. Rodriguez wrote: > > > On Fri, May 19, 2017 at 05:08:02AM +0200, Luis R. Rodriguez wrote: > > > > On Fri, May 19, 2017 at 02:44:14AM +0200, Luis R. Rodriguez wrote: > > > > > On Wed, May 17, 2017 at 10:53:06AM -0700, Kees Cook wrote: > > > > > > On Wed, May 17, 2017 at 9:40 AM, Luis R. 
Rodriguez <mcgrof@kernel.org> wrote: > > > > > > > Yes, but I had killed that boot session again, so upon my next boot > > > > > > > I had a different layout, the ASLR gap was much larger: > > > > > > > > > > > > > > ---[ Modules ]--- > > > > > > > 0xffffffffc0000000-0xffffffffc01b0000 1728K pte > > > > > > > 0xffffffffc01b0000-0xffffffffc01b1000 4K RW GLB x pte > > > > > > > 0xffffffffc01b1000-0xffffffffc01b2000 4K pte > > > > > > > 0xffffffffc01b2000-0xffffffffc01c6000 80K ro GLB x pte > > > > > > > 0xffffffffc01c6000-0xffffffffc01cc000 24K ro GLB NX pte > > > > > > > 0xffffffffc01cc000-0xffffffffc01d5000 36K RW GLB NX pte > > > > > > > > > > > > > > As you can guess if we follow similar pattern the RW hole is the one this boot > > > > > > > warned about: > > > > > > > > > > > > > > [ 1.450483] x86/mm: Found insecure W+X mapping at address ffffffffc01b0000/0xffffffffc01b0000 > > > > > > > [ 1.451280] ------------[ cut here ]------------ > > > > > > > [ 1.451721] WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0 > > > > > > > [ 1.452499] Modules linked in: > > > > > > > [ 1.452791] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc1-next-20170515+ #145 > > > > > > > > > > > > > > I checked and indeed 0xffffffffc01b2000 is part of a module, it was not the first one > > > > > > > on the /proc/modules list but then again /proc/modules does not seem to have a specific > > > > > > > order other than perhaps being pegged into a linked list of modules once they go live, > > > > > > > and it seems its typically output backwards from when that happened, sorting that > > > > > > > by address we get: > > > > > > > > > > > > Right, sorry, I'd expect it at the bottom of the list in > > > > > > /proc/modules, but that's fine, it's there. 
> > > > > > > > > > > > > > > > > > > > root@piggy:~# cat /proc/modules | sort -k 6 | head -3 > > > > > > > e1000 143360 0 - Live 0xffffffffc01b2000 (E) > > > > > > > mbcache 16384 1 ext4, Live 0xffffffffc01d6000 (E) > > > > > > > scsi_mod 217088 4 sg,sr_mod,sd_mod,libata, Live 0xffffffffc01df000 (E) > > > > > > > > > > > > > > And this then seems to be the first module loaded: > > > > > > > > > > > > > > e1000 143360 0 - Live 0xffffffffc01b2000 (E) > > > > > > > > > > > > > > The output of dmesg seems to confirm this as per the list of modules sorted > > > > > > > as per above. > > > > > > > > > > > > > >> Something touched the module gap and left is RW+x... > > > > > > > > > > > > > > Lemme try booting with e1000 renamed to e1000.ko.ignore and see how that goes. > > > > > > > > > > > > Is it possible a module got loaded before e1000 and then unloaded? > > > > > > That seems odd, but maybe unload isn't cleaning up? > > > > > > > > > > > > >> Are you able to bisect this? > > > > > > > > > > > > > > This issue has been present for a while so since I recall this I might be > > > > > > > able to reduce the number of needed target kernels to bisect. Lemme tinker > > > > > > > a bit and if no clear culprit comes up then will try bisect. > > > > > > > > > > > > Okay, thanks! > > > > > > > > > > Sorry to report that this issue is present since the feature's addition. So > > > > > the issue is there since its addition and is still present today. *But* it > > > > > may also be a configuration issue, given I have booted this guest *without* > > > > > this issue ... > > > > > > > > > > So: > > > > > > > > > > git checkout -b WX e1a58320a38dfa72be48a0f1a3a92273663ba6db > > > > > > > > > > That boots with the warning. To help debug further I've minimized my modules > > > > > to only a few: scsi_mod, e1000, libata. 
> > > > > > > > > > I suspect at this point this is not the fault of a particular module but > > > > > instead just an accounting semantic (>= or <= on an edge) but let's see. > > > > > > > > > > I now boot on 4.3.0-rc3 on commit (e1a58320a38df ("x86/mm: Warn on W^X > > > > > mappings") and I with: > > > > > > > > > > [ 0.949435] ------------[ cut here ]------------ > > > > > [ 0.949992] WARNING: CPU: 2 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page+0x635/0x7e0() > > > > > [ 0.950996] x86/mm: Found insecure W+X mapping at address ffffffffc0000000/0xffffffffc0000000 > > > > > [ 0.951814] Modules linked in: > > > > > [ 0.952123] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.3.0-rc3-FINAL-TEST-WITH-WX-NOFLOPPY+ #365 > > > > > [ 0.952929] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014 > > > > > [ 0.954033] 0000000000000000 000000001f722925 ffff88013a5d7d40 ffffffff812ff335 > > > > > [ 0.954742] ffff88013a5d7d88 ffff88013a5d7d78 ffffffff81079be2 ffff88013a5d7e90 > > > > > [ 0.955522] 0000000000000000 0000000000000004 0000000000000000 0000000000000000 > > > > > [ 0.956256] Call Trace: > > > > > [ 0.956496] [<ffffffff812ff335>] dump_stack+0x44/0x5f > > > > > [ 0.956953] [<ffffffff81079be2>] warn_slowpath_common+0x82/0xc0 > > > > > [ 0.957519] [<ffffffff81079c7c>] warn_slowpath_fmt+0x5c/0x80 > > > > > [ 0.958066] [<ffffffff8106c155>] note_page+0x635/0x7e0 > > > > > [ 0.958595] [<ffffffff8106c5eb>] ptdump_walk_pgd_level_core+0x2eb/0x410 > > > > > [ 0.959219] [<ffffffff8106c7b7>] ptdump_walk_pgd_level_checkwx+0x17/0x20 > > > > > [ 0.959856] [<ffffffff8106260d>] mark_rodata_ro+0xed/0x100 > > > > > [ 0.960372] [<ffffffff815aa7d0>] ? rest_init+0x80/0x80 > > > > > [ 0.960869] [<ffffffff815aa7ed>] kernel_init+0x1d/0xe0 > > > > > [ 0.961358] [<ffffffff815b798f>] ret_from_fork+0x3f/0x70 > > > > > [ 0.961900] [<ffffffff815aa7d0>] ? 
rest_init+0x80/0x80 > > > > > [ 0.962389] ---[ end trace 6125ebcb24c9e3d0 ]--- > > > > > [ 0.962822] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found. > > > > > > > > > > > > > > > ---[ High Kernel Mapping ]--- > > > > > 0xffffffff80000000-0xffffffff81000000 16M pmd > > > > > 0xffffffff81000000-0xffffffff81600000 6M ro PSE GLB x pmd > > > > > 0xffffffff81600000-0xffffffff81a00000 4M ro PSE GLB NX pmd > > > > > 0xffffffff81a00000-0xffffffff81c00000 2M RW GLB NX pte > > > > > 0xffffffff81c00000-0xffffffff82200000 6M RW PSE GLB NX pmd > > > > > 0xffffffff82200000-0xffffffff82400000 2M RW GLB NX pte > > > > > 0xffffffff82400000-0xffffffffc0000000 988M pmd > > > > > ---[ Modules ]--- > > > > > 0xffffffffc0000000-0xffffffffc0001000 4K RW GLB x pte > > > > > 0xffffffffc0001000-0xffffffffc0002000 4K pte > > > > > 0xffffffffc0002000-0xffffffffc0039000 220K RW GLB x pte > > > > > > > > > > root@piggy:~# cat /proc/modules | sort -k 6 | head -3 > > > > > scsi_mod 221979 4 sg,sd_mod,sr_mod,libata, Live 0xffffffffc0002000 (E) > > > > > e1000 127757 0 - Live 0xffffffffc004d000 (E) > > > > > libata 229931 2 ata_generic,ata_piix, Live 0xffffffffc0076000 (E) > > > > > > > > > > So that 4K RW seems suspect of getting used for allocation purpose on edge > > > > > for a particular reason and it also happens to be on the edge of the high > > > > > kernel mapping. Could it be the boundary semantic issue ? > > > > > > > > > > For instance can it be that since 0xffffffffc0002000 is given to the first > > > > > module by the allocator, scsi_mod, and since that address is *technically* > > > > > part of two boundaries we get a splat ? > > > > > > > > > > 0xffffffffc0001000-0xffffffffc0002000 4K pte > > > > > 0xffffffffc0002000-0xffffffffc0039000 220K RW GLB x pte > > > > > > > > Note on the latest linux-next and on the commit that introduced this the config > > > > and kernel yields only *one* page: > > > > > > > > x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found. 
> > > > > > > > I believe this is more indications my suspicion might be right. > > > > > > If the following is a legit forced way to get query the kernel to ask it > > > who owns a page then perhaps this technique can be used in the future to > > > figure out who the hell caused this. Catalin, can you confirm? In this > > > case this is perhaps not a leaked page but I am trying to abuse the > > > kmemleak debugfs API to query who allocated the page. Is that fine? > > > > > > [ 0.916771] WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:235 note_page+0x63c/0x7e0 > > > [ 0.917636] x86/mm: Found insecure W+X mapping at address ffffffffc03d5000/0xffffffffc03d5000 > > > [ 0.918502] Modules linked in: > > > [ 0.918819] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.11.0-mcgrof-force-config #340 > > > [ 0.919631] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014 > > > [ 0.920011] Call Trace: > > > [ 0.920011] dump_stack+0x63/0x81 > > > [ 0.920011] __warn+0xcb/0xf0 > > > [ 0.920011] warn_slowpath_fmt+0x5a/0x80 > > > [ 0.920011] note_page+0x63c/0x7e0 > > > [ 0.920011] ptdump_walk_pgd_level_core+0x3b1/0x460 > > > [ 0.920011] ? 0xffffffff86c00000 > > > [ 0.920011] ptdump_walk_pgd_level_checkwx+0x17/0x20 > > > [ 0.920011] mark_rodata_ro+0xf4/0x100 > > > [ 0.920011] ? rest_init+0x80/0x80 > > > [ 0.920011] kernel_init+0x2a/0x100 > > > [ 0.920011] ret_from_fork+0x2c/0x40 > > > [ 0.925474] ---[ end trace dca00cd779490a2b ]--- > > > [ 0.925959] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found. 
> > > > > > echo dump=0xffffffffc03d5000 > /sys/kernel/debug/kmemleak > > > dmesg | tail > > > > > > [ 49.209565] kmemleak: Object 0xffffffffc03d5000 (size 335): > > > [ 49.210814] kmemleak: comm "swapper/0", pid 1, jiffies 4294892440 > > > [ 49.212148] kmemleak: min_count = 2 > > > [ 49.212852] kmemleak: count = 0 > > > [ 49.213363] kmemleak: flags = 0x1 > > > [ 49.213363] kmemleak: checksum = 0 > > > [ 49.213363] kmemleak: backtrace: > > > [ 49.213363] kmemleak_alloc+0x4a/0xa0 > > > [ 49.213363] __vmalloc_node_range+0x20a/0x2b0 > > > [ 49.213363] module_alloc+0x67/0xc0 > > > [ 49.213363] arch_ftrace_update_trampoline+0xba/0x260 > > > [ 49.213363] ftrace_startup+0x90/0x210 > > > [ 49.213363] register_ftrace_function+0x4b/0x60 > > > [ 49.213363] arm_kprobe+0x84/0xe0 > > > [ 49.213363] register_kprobe+0x56e/0x5b0 > > > [ 49.213363] init_test_probes+0x61/0x560 > > > [ 49.213363] init_kprobes+0x1e3/0x206 > > > [ 49.213363] do_one_initcall+0x52/0x1a0 > > > [ 49.213363] kernel_init_freeable+0x178/0x200 > > > [ 49.213363] kernel_init+0xe/0x100 > > > [ 49.213363] ret_from_fork+0x2c/0x40 > > > [ 49.213363] 0xffffffffffffffff > > > > Aha! And the winner is: > > > > CONFIG_KPROBES_SANITY_TEST > > > > I confirm disabling it on 4.3.0-rc3 and on linux-next next-20170519 avoids the WARN. > > I also can confirm using the 'echo dump=mem-area > /sys/kernel/debug/kmemleak' yields > > the same trace for both of these kernels. > > > > So -- the above kmemleak hack seems to actually work to seek who owns that page. > > > > Now to figure out how the hell kernel/test_kprobes.c screws around with things. > > Ah, that was fixed recently; > > https://marc.info/?l=linux-kernel&m=149076389011850 > > Note that this patch depends another patch in the series; > > https://marc.info/?l=linux-kernel&m=149076370111812&w=2 I actually boot tested linux-next tag next-20170519 which carries these patches and the WARNING still is there. Please note the issue is with CONFIG_KPROBES_SANITY_TEST enabled. 
[ 1.025601] x86/mm: Found insecure W+X mapping at address ffffffffc01e7000/0xffffffffc01e7000 [ 1.026429] ------------[ cut here ]------------ [ 1.026885] WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0 [ 1.027711] Modules linked in: [ 1.028032] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc1-next-20170519 #151 [ 1.028788] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014 [ 1.029928] task: ffff9fd47a5ccc80 task.stack: ffffb6bcc0630000 [ 1.030509] RIP: 0010:note_page+0x630/0x7e0 [ 1.030917] RSP: 0000:ffffb6bcc0633df0 EFLAGS: 00010286 [ 1.031425] RAX: 0000000000000051 RBX: ffffb6bcc0633e88 RCX: ffffffffbb656708 [ 1.032132] RDX: 0000000000000000 RSI: 0000000000000096 RDI: 0000000000000246 [ 1.032834] RBP: ffffb6bcc0633e28 R08: 203a6d6d2f363878 R09: 0000000000000161 [ 1.033539] R10: ffffb6bcc0633dd8 R11: 736e6920646e756f R12: 0000000000000000 [ 1.034235] R13: 0000000000000004 R14: 0000000000000000 R15: 0000000000000000 [ 1.034927] FS: 0000000000000000(0000) GS:ffff9fd47fc80000(0000) knlGS:0000000000000000 [ 1.035722] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1.036290] CR2: ffffb6bcc073c000 CR3: 0000000053209000 CR4: 00000000000006e0 [ 1.036839] Call Trace: [ 1.037034] ptdump_walk_pgd_level_core+0x3e7/0x490 [ 1.037367] ? 0xffffffffbaa00000 [ 1.037705] ptdump_walk_pgd_level_checkwx+0x17/0x20 [ 1.038187] mark_rodata_ro+0xf4/0x100 [ 1.038559] ? rest_init+0x80/0x80 [ 1.038890] kernel_init+0x2f/0x100 [ 1.039235] ret_from_fork+0x2c/0x40 [ 1.039582] Code: 48 c7 43 28 00 00 00 00 48 89 43 20 e9 05 fd ff ff 48 8b 73 10 48 c7 c7 f0 3d 3e bb c6 05 f8 eb bc 00 01 48 89 f2 e8 1d 02 12 00 <0f> ff e9 1f fa ff ff 48 8b 70 20 48 c7 c7 3c ba 3e bb e8 06 02 [ 1.041416] ---[ end trace e726c1b63e5a81a9 ]--- [ 1.041872] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found. 
root@piggy:~# echo dump=0xffffffffc01e7000 > /sys/kernel/debug/kmemleak On dmesg: May 23 07:44:51 piggy kernel: kmemleak: Object 0xffffffffc01e7000 (size 335): May 23 07:44:51 piggy kernel: kmemleak: comm "swapper/0", pid 1, jiffies 4294892451 May 23 07:44:51 piggy kernel: kmemleak: min_count = 2 May 23 07:44:51 piggy kernel: kmemleak: count = 2 May 23 07:44:51 piggy kernel: kmemleak: flags = 0x1 May 23 07:44:51 piggy kernel: kmemleak: checksum = 0 May 23 07:44:51 piggy kernel: kmemleak: backtrace: May 23 07:44:51 piggy kernel: kmemleak_alloc+0x4a/0xa0 May 23 07:44:51 piggy kernel: __vmalloc_node_range+0x20c/0x2b0 May 23 07:44:51 piggy kernel: module_alloc+0x67/0xc0 May 23 07:44:51 piggy kernel: arch_ftrace_update_trampoline+0xc1/0x240 May 23 07:44:51 piggy kernel: ftrace_startup+0x92/0x210 May 23 07:44:51 piggy kernel: register_ftrace_function+0x4b/0x60 May 23 07:44:51 piggy kernel: arm_kprobe+0x84/0xc0 May 23 07:44:51 piggy kernel: register_kprobe+0x59c/0x5e0 May 23 07:44:51 piggy kernel: init_test_probes+0x61/0x560 May 23 07:44:51 piggy kernel: init_kprobes+0x1ea/0x20d May 23 07:44:51 piggy kernel: do_one_initcall+0x52/0x1a0 May 23 07:44:51 piggy kernel: kernel_init_freeable+0x17d/0x205 May 23 07:44:51 piggy kernel: kernel_init+0xe/0x100 May 23 07:44:51 piggy kernel: ret_from_fork+0x2c/0x40 May 23 07:44:51 piggy kernel: 0xffffffffffffffff Luis ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0 2017-05-23 14:48 ` Luis R. Rodriguez @ 2017-05-24 17:55 ` Luis R. Rodriguez 0 siblings, 0 replies; 21+ messages in thread From: Luis R. Rodriguez @ 2017-05-24 17:55 UTC (permalink / raw) To: Luis R. Rodriguez, Thomas Gleixner Cc: Masami Hiramatsu, Jim Keniston, davem, sagar.abhishek, Catalin Marinas, Steven Rostedt, Kees Cook, Stephen Smalley, Ingo Molnar, Andy Lutomirski, Michal Hocko, Vlastimil Babka, Andrew Morton, Eric W. Biederman, Mateusz Guzik, LKML On Tue, May 23, 2017 at 04:48:50PM +0200, Luis R. Rodriguez wrote: > On Sat, May 20, 2017 at 11:38:50AM +0900, Masami Hiramatsu wrote: > > Hi Luis, > > > > On Fri, 19 May 2017 19:28:54 +0200 > > "Luis R. Rodriguez" <mcgrof@kernel.org> wrote: > > > > > > Aha! And the winner is: > > > > > > CONFIG_KPROBES_SANITY_TEST > > > > > > I confirm disabling it on 4.3.0-rc3 and on linux-next next-20170519 avoids the WARN. > > > I also can confirm using the 'echo dump=mem-area > /sys/kernel/debug/kmemleak' yields > > > the same trace for both of these kernels. > > > > > > So -- the above kmemleak hack seems to actually work to seek who owns that page. > > > > > > Now to figure out how the hell kernel/test_kprobes.c screws around with things. > > > > Ah, that was fixed recently; > > > > https://marc.info/?l=linux-kernel&m=149076389011850 > > > > Note that this patch depends another patch in the series; > > > > https://marc.info/?l=linux-kernel&m=149076370111812&w=2 > > I actually boot tested linux-next tag next-20170519 which carries these > patches and the WARNING still is there. Please note the issue is with > CONFIG_KPROBES_SANITY_TEST enabled. 
> > [ 1.025601] x86/mm: Found insecure W+X mapping at address ffffffffc01e7000/0xffffffffc01e7000 > [ 1.026429] ------------[ cut here ]------------ > [ 1.026885] WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0 > [ 1.027711] Modules linked in: > [ 1.028032] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc1-next-20170519 #151 > [ 1.028788] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014 > [ 1.029928] task: ffff9fd47a5ccc80 task.stack: ffffb6bcc0630000 > [ 1.030509] RIP: 0010:note_page+0x630/0x7e0 > [ 1.030917] RSP: 0000:ffffb6bcc0633df0 EFLAGS: 00010286 > [ 1.031425] RAX: 0000000000000051 RBX: ffffb6bcc0633e88 RCX: ffffffffbb656708 > [ 1.032132] RDX: 0000000000000000 RSI: 0000000000000096 RDI: 0000000000000246 > [ 1.032834] RBP: ffffb6bcc0633e28 R08: 203a6d6d2f363878 R09: 0000000000000161 > [ 1.033539] R10: ffffb6bcc0633dd8 R11: 736e6920646e756f R12: 0000000000000000 > [ 1.034235] R13: 0000000000000004 R14: 0000000000000000 R15: 0000000000000000 > [ 1.034927] FS: 0000000000000000(0000) GS:ffff9fd47fc80000(0000) knlGS:0000000000000000 > [ 1.035722] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 1.036290] CR2: ffffb6bcc073c000 CR3: 0000000053209000 CR4: 00000000000006e0 > [ 1.036839] Call Trace: > [ 1.037034] ptdump_walk_pgd_level_core+0x3e7/0x490 > [ 1.037367] ? 0xffffffffbaa00000 > [ 1.037705] ptdump_walk_pgd_level_checkwx+0x17/0x20 > [ 1.038187] mark_rodata_ro+0xf4/0x100 > [ 1.038559] ? rest_init+0x80/0x80 > [ 1.038890] kernel_init+0x2f/0x100 > [ 1.039235] ret_from_fork+0x2c/0x40 > [ 1.039582] Code: 48 c7 43 28 00 00 00 00 48 89 43 20 e9 05 fd ff ff 48 8b 73 10 48 c7 c7 f0 3d 3e bb c6 05 f8 eb bc 00 01 48 89 f2 e8 1d 02 12 00 <0f> ff e9 1f fa ff ff 48 8b 70 20 48 c7 c7 3c ba 3e bb e8 06 02 > [ 1.041416] ---[ end trace e726c1b63e5a81a9 ]--- > [ 1.041872] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found. 
>
> root@piggy:~# echo dump=0xffffffffc01e7000 > /sys/kernel/debug/kmemleak
>
> On dmesg:
>
> May 23 07:44:51 piggy kernel: kmemleak: Object 0xffffffffc01e7000 (size 335):
> May 23 07:44:51 piggy kernel: kmemleak: comm "swapper/0", pid 1, jiffies 4294892451
> May 23 07:44:51 piggy kernel: kmemleak: min_count = 2
> May 23 07:44:51 piggy kernel: kmemleak: count = 2
> May 23 07:44:51 piggy kernel: kmemleak: flags = 0x1
> May 23 07:44:51 piggy kernel: kmemleak: checksum = 0
> May 23 07:44:51 piggy kernel: kmemleak: backtrace:
> May 23 07:44:51 piggy kernel: kmemleak_alloc+0x4a/0xa0
> May 23 07:44:51 piggy kernel: __vmalloc_node_range+0x20c/0x2b0
> May 23 07:44:51 piggy kernel: module_alloc+0x67/0xc0
> May 23 07:44:51 piggy kernel: arch_ftrace_update_trampoline+0xc1/0x240
> May 23 07:44:51 piggy kernel: ftrace_startup+0x92/0x210
> May 23 07:44:51 piggy kernel: register_ftrace_function+0x4b/0x60
> May 23 07:44:51 piggy kernel: arm_kprobe+0x84/0xc0
> May 23 07:44:51 piggy kernel: register_kprobe+0x59c/0x5e0
> May 23 07:44:51 piggy kernel: init_test_probes+0x61/0x560
> May 23 07:44:51 piggy kernel: init_kprobes+0x1ea/0x20d
> May 23 07:44:51 piggy kernel: do_one_initcall+0x52/0x1a0
> May 23 07:44:51 piggy kernel: kernel_init_freeable+0x17d/0x205
> May 23 07:44:51 piggy kernel: kernel_init+0xe/0x100
> May 23 07:44:51 piggy kernel: ret_from_fork+0x2c/0x40
> May 23 07:44:51 piggy kernel: 0xffffffffffffffff

It turns out that Thomas Gleixner's patch from today [0] fixes this, as the
same module_alloc() path was the culprit. Steven Rostedt, however, just
reported [1] that this patch crashes on his ftracetests, so it seems we just
need to address that last kink to fix this properly. We can take this further
in that other thread.

[0] https://lkml.kernel.org/r/alpine.DEB.2.20.1705241459480.2201@nanos
[1] https://lkml.kernel.org/r/20170524134728.61a896c9@vmware.local.home

  Luis

^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0 2017-05-19 15:40 ` Luis R. Rodriguez 2017-05-19 17:28 ` Luis R. Rodriguez @ 2017-05-19 17:35 ` Catalin Marinas 2017-05-19 18:27 ` Andy Lutomirski 2017-05-26 22:13 ` Luis R. Rodriguez 1 sibling, 2 replies; 21+ messages in thread From: Catalin Marinas @ 2017-05-19 17:35 UTC (permalink / raw) To: Luis R. Rodriguez Cc: Steven Rostedt, Kees Cook, Stephen Smalley, Ingo Molnar, Andy Lutomirski, Michal Hocko, Vlastimil Babka, Andrew Morton, Eric W. Biederman, Mateusz Guzik, LKML On Fri, May 19, 2017 at 05:40:16PM +0200, Luis R. Rodriguez wrote: > If the following is a legit forced way to get query the kernel to ask it > who owns a page then perhaps this technique can be used in the future to > figure out who the hell caused this. Catalin, can you confirm? In this > case this is perhaps not a leaked page but I am trying to abuse the > kmemleak debugfs API to query who allocated the page. Is that fine? > > [ 0.916771] WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:235 note_page+0x63c/0x7e0 > [ 0.917636] x86/mm: Found insecure W+X mapping at address ffffffffc03d5000/0xffffffffc03d5000 > [ 0.918502] Modules linked in: > [ 0.918819] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.11.0-mcgrof-force-config #340 > [ 0.919631] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014 > [ 0.920011] Call Trace: > [ 0.920011] dump_stack+0x63/0x81 > [ 0.920011] __warn+0xcb/0xf0 > [ 0.920011] warn_slowpath_fmt+0x5a/0x80 > [ 0.920011] note_page+0x63c/0x7e0 > [ 0.920011] ptdump_walk_pgd_level_core+0x3b1/0x460 > [ 0.920011] ? 0xffffffff86c00000 > [ 0.920011] ptdump_walk_pgd_level_checkwx+0x17/0x20 > [ 0.920011] mark_rodata_ro+0xf4/0x100 > [ 0.920011] ? 
rest_init+0x80/0x80 > [ 0.920011] kernel_init+0x2a/0x100 > [ 0.920011] ret_from_fork+0x2c/0x40 > [ 0.925474] ---[ end trace dca00cd779490a2b ]--- > [ 0.925959] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found. > > echo dump=0xffffffffc03d5000 > /sys/kernel/debug/kmemleak > dmesg | tail > > [ 49.209565] kmemleak: Object 0xffffffffc03d5000 (size 335): > [ 49.210814] kmemleak: comm "swapper/0", pid 1, jiffies 4294892440 > [ 49.212148] kmemleak: min_count = 2 > [ 49.212852] kmemleak: count = 0 > [ 49.213363] kmemleak: flags = 0x1 > [ 49.213363] kmemleak: checksum = 0 > [ 49.213363] kmemleak: backtrace: > [ 49.213363] kmemleak_alloc+0x4a/0xa0 > [ 49.213363] __vmalloc_node_range+0x20a/0x2b0 > [ 49.213363] module_alloc+0x67/0xc0 > [ 49.213363] arch_ftrace_update_trampoline+0xba/0x260 > [ 49.213363] ftrace_startup+0x90/0x210 > [ 49.213363] register_ftrace_function+0x4b/0x60 > [ 49.213363] arm_kprobe+0x84/0xe0 > [ 49.213363] register_kprobe+0x56e/0x5b0 > [ 49.213363] init_test_probes+0x61/0x560 > [ 49.213363] init_kprobes+0x1e3/0x206 > [ 49.213363] do_one_initcall+0x52/0x1a0 > [ 49.213363] kernel_init_freeable+0x178/0x200 > [ 49.213363] kernel_init+0xe/0x100 > [ 49.213363] ret_from_fork+0x2c/0x40 > [ 49.213363] 0xffffffffffffffff You could as well use kmemleak this way since it tracks the memory allocations. However, it doesn't track alloc_pages and also doesn't track mapping existing pages (vmap etc.) -- Catalin ^ permalink raw reply [flat|nested] 21+ messages in thread
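The kmemleak query discussed above (write `dump=<addr>` to the debugfs file, then read the object report from the kernel log) can be sketched as follows. The function names are hypothetical, and the log parsing assumes the report layout shown in this thread. As Catalin notes, this only helps for allocations kmemleak tracks, so pages obtained through alloc_pages or mapped with vmap will not show up.

```python
import re

# Path used in the thread; writing to it requires root and CONFIG_DEBUG_KMEMLEAK.
KMEMLEAK_PATH = "/sys/kernel/debug/kmemleak"

# Matches a "symbol+0xoff/0xlen" stack frame as printed in the kernel log.
FRAME = re.compile(r"([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")

SAMPLE_LOG = """\
[   49.209565] kmemleak: Object 0xffffffffc03d5000 (size 335):
[   49.210814] kmemleak:   comm "swapper/0", pid 1, jiffies 4294892440
[   49.213363] kmemleak:   backtrace:
[   49.213363]      kmemleak_alloc+0x4a/0xa0
[   49.213363]      __vmalloc_node_range+0x20a/0x2b0
[   49.213363]      module_alloc+0x67/0xc0
[   49.213363]      arch_ftrace_update_trampoline+0xba/0x260
[   49.213363]      ftrace_startup+0x90/0x210
[   49.213363]      0xffffffffffffffff
"""


def dump_command(addr):
    """Build the string written to the kmemleak debugfs file."""
    return "dump=0x%x" % addr


def request_dump(addr, path=KMEMLEAK_PATH):
    """Ask kmemleak to log the object found at addr (hypothetical wrapper)."""
    with open(path, "w") as f:
        f.write(dump_command(addr))


def backtrace_symbols(log_text):
    """Extract the allocation backtrace symbols from a kmemleak report."""
    symbols = []
    in_backtrace = False
    for line in log_text.splitlines():
        if re.search(r"kmemleak:\s+backtrace:", line):
            in_backtrace = True
            continue
        if in_backtrace:
            match = FRAME.search(line)
            if match is None:
                break  # first non-frame line ends the backtrace
            symbols.append(match.group(1))
    return symbols


if __name__ == "__main__":
    print(dump_command(0xffffffffc03d5000))
    print(backtrace_symbols(SAMPLE_LOG))
```

In the trace from this thread, the frame right after module_alloc is arch_ftrace_update_trampoline, which is what pointed at the ftrace trampoline allocation triggered by the kprobes sanity test.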
* Re: next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
  2017-05-19 17:35 ` Catalin Marinas
@ 2017-05-19 18:27 ` Andy Lutomirski
  2017-05-19 19:16 ` Kees Cook
  2017-05-26 22:13 ` Luis R. Rodriguez
  1 sibling, 1 reply; 21+ messages in thread
From: Andy Lutomirski @ 2017-05-19 18:27 UTC
To: Catalin Marinas
Cc: Luis R. Rodriguez, Steven Rostedt, Kees Cook, Stephen Smalley,
    Ingo Molnar, Michal Hocko, Vlastimil Babka, Andrew Morton,
    Eric W. Biederman, Mateusz Guzik, LKML

On Fri, May 19, 2017 at 10:35 AM, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Fri, May 19, 2017 at 05:40:16PM +0200, Luis R. Rodriguez wrote:
>> If the following is a legit forced way to get query the kernel to ask it
>> who owns a page then perhaps this technique can be used in the future to
>> figure out who the hell caused this. Catalin, can you confirm? In this
>> case this is perhaps not a leaked page but I am trying to abuse the
>> kmemleak debugfs API to query who allocated the page. Is that fine?
>>
>> [ 0.916771] WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:235 note_page+0x63c/0x7e0
>> [ 0.917636] x86/mm: Found insecure W+X mapping at address ffffffffc03d5000/0xffffffffc03d5000
>> [ 0.918502] Modules linked in:
>> [ 0.918819] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.11.0-mcgrof-force-config #340
>> [ 0.919631] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
>> [ 0.920011] Call Trace:
>> [ 0.920011] dump_stack+0x63/0x81
>> [ 0.920011] __warn+0xcb/0xf0
>> [ 0.920011] warn_slowpath_fmt+0x5a/0x80
>> [ 0.920011] note_page+0x63c/0x7e0
>> [ 0.920011] ptdump_walk_pgd_level_core+0x3b1/0x460
>> [ 0.920011] ? 0xffffffff86c00000
>> [ 0.920011] ptdump_walk_pgd_level_checkwx+0x17/0x20
>> [ 0.920011] mark_rodata_ro+0xf4/0x100
>> [ 0.920011] ? rest_init+0x80/0x80
>> [ 0.920011] kernel_init+0x2a/0x100
>> [ 0.920011] ret_from_fork+0x2c/0x40
>> [ 0.925474] ---[ end trace dca00cd779490a2b ]---
>> [ 0.925959] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found.
>>
>> echo dump=0xffffffffc03d5000 > /sys/kernel/debug/kmemleak
>> dmesg | tail
>>
>> [ 49.209565] kmemleak: Object 0xffffffffc03d5000 (size 335):
>> [ 49.210814] kmemleak: comm "swapper/0", pid 1, jiffies 4294892440
>> [ 49.212148] kmemleak: min_count = 2
>> [ 49.212852] kmemleak: count = 0
>> [ 49.213363] kmemleak: flags = 0x1
>> [ 49.213363] kmemleak: checksum = 0
>> [ 49.213363] kmemleak: backtrace:
>> [ 49.213363] kmemleak_alloc+0x4a/0xa0
>> [ 49.213363] __vmalloc_node_range+0x20a/0x2b0
>> [ 49.213363] module_alloc+0x67/0xc0
>> [ 49.213363] arch_ftrace_update_trampoline+0xba/0x260
>> [ 49.213363] ftrace_startup+0x90/0x210
>> [ 49.213363] register_ftrace_function+0x4b/0x60
>> [ 49.213363] arm_kprobe+0x84/0xe0
>> [ 49.213363] register_kprobe+0x56e/0x5b0
>> [ 49.213363] init_test_probes+0x61/0x560
>> [ 49.213363] init_kprobes+0x1e3/0x206
>> [ 49.213363] do_one_initcall+0x52/0x1a0
>> [ 49.213363] kernel_init_freeable+0x178/0x200
>> [ 49.213363] kernel_init+0xe/0x100
>> [ 49.213363] ret_from_fork+0x2c/0x40
>> [ 49.213363] 0xffffffffffffffff
>
> You could as well use kmemleak this way since it tracks the memory
> allocations. However, it doesn't track alloc_pages and also doesn't
> track mapping existing pages (vmap etc.)

One thing I've pondered: can we make some debugging mode (kmemleak,
perhaps?) check that freed memory is RW at the time it's freed? I
once wrote some buggy code that freed an R page and caused an OOPS
much later, and this bug here seems likely to be some code that frees
RWX memory.

--Andy
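[Editorial sketch] The free-time check Andy describes could be modelled roughly as below. This is a userspace illustration only, not kernel code and not an existing kmemleak feature: the permission bits, the `free_verdict` enum and `check_perms_at_free()` are all hypothetical names invented for this example, and a real implementation would have to read the protections from the page tables for the range being freed.

```c
/* Hypothetical permission bits for this model; not the x86 PTE layout. */
enum { PERM_R = 1, PERM_W = 2, PERM_X = 4 };

/* Possible outcomes of the hypothetical free-time check. */
enum free_verdict {
	FREE_OK,           /* plain RW: safe to hand back to the allocator */
	FREE_NOT_WRITABLE, /* e.g. R or RX: the next user would fault on write */
	FREE_WX,           /* W+X: the situation the W+X warning reports */
	FREE_UNEXPECTED    /* anything else, e.g. W without R */
};

/* Classify the permissions a region has at the moment it is freed. */
static inline enum free_verdict check_perms_at_free(unsigned int perms)
{
	if ((perms & PERM_W) && (perms & PERM_X))
		return FREE_WX;
	if (!(perms & PERM_W))
		return FREE_NOT_WRITABLE;
	if (perms == (PERM_R | PERM_W))
		return FREE_OK;
	return FREE_UNEXPECTED;
}
```

Under this model, the "freed an R page" bug would be reported as FREE_NOT_WRITABLE at the free site instead of as an oops much later, and memory freed while still RWX would be reported as FREE_WX.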
* Re: next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
  2017-05-19 18:27 ` Andy Lutomirski
@ 2017-05-19 19:16 ` Kees Cook
  2017-05-19 19:18 ` Andy Lutomirski
  0 siblings, 1 reply; 21+ messages in thread
From: Kees Cook @ 2017-05-19 19:16 UTC
To: Andy Lutomirski
Cc: Catalin Marinas, Luis R. Rodriguez, Steven Rostedt, Stephen Smalley,
    Ingo Molnar, Michal Hocko, Vlastimil Babka, Andrew Morton,
    Eric W. Biederman, Mateusz Guzik, LKML

On Fri, May 19, 2017 at 11:27 AM, Andy Lutomirski <luto@kernel.org> wrote:
> One thing I've pondered: can we make some debugging mode (kmemleak,
> perhaps?) check that freed memory is RW at the time it's freed? I
> once wrote some buggy code that freed an R page and caused an OOPS
> much later, and this bug here seems likely to be some code that frees
> RWX memory.

Which begs for even more checks: nothing should ever make a page RWX.
Either R, RW, or RX only... (or X too I guess, in the future).

-Kees

--
Kees Cook
Pixel Security
* Re: next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
  2017-05-19 19:16 ` Kees Cook
@ 2017-05-19 19:18 ` Andy Lutomirski
  2017-05-19 19:29 ` Kees Cook
  0 siblings, 1 reply; 21+ messages in thread
From: Andy Lutomirski @ 2017-05-19 19:18 UTC
To: Kees Cook
Cc: Andy Lutomirski, Catalin Marinas, Luis R. Rodriguez, Steven Rostedt,
    Stephen Smalley, Ingo Molnar, Michal Hocko, Vlastimil Babka,
    Andrew Morton, Eric W. Biederman, Mateusz Guzik, LKML

On Fri, May 19, 2017 at 12:16 PM, Kees Cook <keescook@chromium.org> wrote:
> On Fri, May 19, 2017 at 11:27 AM, Andy Lutomirski <luto@kernel.org> wrote:
>> One thing I've pondered: can we make some debugging mode (kmemleak,
>> perhaps?) check that freed memory is RW at the time it's freed? I
>> once wrote some buggy code that freed an R page and caused an OOPS
>> much later, and this bug here seems likely to be some code that frees
>> RWX memory.
>
> Which begs for even more checks: nothing should ever make a page RWX.
> Either R, RW, or RX only... (or X too I guess, in the future).

I could see pages being RWX temporarily during boot. OTOH if we ban
RWX outright (after very early boot, anyway), then catching code that
messes up and leaves pages RWX gets much easier.

--Andy
* Re: next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
  2017-05-19 19:18 ` Andy Lutomirski
@ 2017-05-19 19:29 ` Kees Cook
  0 siblings, 0 replies; 21+ messages in thread
From: Kees Cook @ 2017-05-19 19:29 UTC
To: Andy Lutomirski, Laura Abbott
Cc: Catalin Marinas, Luis R. Rodriguez, Steven Rostedt, Stephen Smalley,
    Ingo Molnar, Michal Hocko, Vlastimil Babka, Andrew Morton,
    Eric W. Biederman, Mateusz Guzik, LKML

On Fri, May 19, 2017 at 12:18 PM, Andy Lutomirski <luto@kernel.org> wrote:
> On Fri, May 19, 2017 at 12:16 PM, Kees Cook <keescook@chromium.org> wrote:
>> On Fri, May 19, 2017 at 11:27 AM, Andy Lutomirski <luto@kernel.org> wrote:
>>> One thing I've pondered: can we make some debugging mode (kmemleak,
>>> perhaps?) check that freed memory is RW at the time it's freed? I
>>> once wrote some buggy code that freed an R page and caused an OOPS
>>> much later, and this bug here seems likely to be some code that frees
>>> RWX memory.
>>
>> Which begs for even more checks: nothing should ever make a page RWX.
>> Either R, RW, or RX only... (or X too I guess, in the future).
>
> I could see pages being RWX temporarily during boot. OTOH if we ban
> RWX outright (after very early boot, anyway), then catching code that
> messes up and leaves pages RWX gets much easier.

Right, early boot is kind of special. It'd be nice to have there, but
I meant during normal runtime.

We'd probably need to adjust set_memory_rw/ro/nx/x around to have the
correct side-effects, instead of just controlling specific bits:

set_memory_rw() (RW_)
set_memory_ro() (R__)
set_memory_rx() (R_X)
set_memory_x()  (__X)

That kind of refactoring might be not _too_ bad:

- add set_memory_rx()
- s/\bset_memory_x\b/set_memory_rx/g
- fix what breaks from expecting writable-executable memory
- adjust set_memory_rw() to drop x
- fix what breaks from expecting writable-executable memory
- adjust set_memory_ro() to drop x
- fix what breaks from expecting executable memory
- add set_memory_x() some day...

-Kees

--
Kees Cook
Pixel Security
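[Editorial sketch] The difference between "controlling specific bits" and the whole-state helpers proposed above can be illustrated with a small userspace model. Everything here is an assumption made for illustration: the permission bits and the `toggle_*` / `state_*` function names are hypothetical and are not the kernel's set_memory_*() implementations, and the model ignores address ranges entirely. The `toggle_*` helpers mimic flipping a single bit and leaving the others alone; the `state_*` helpers assign one of the complete states from the list (RW_, R__, R_X).

```c
#include <stdbool.h>

/* Hypothetical permission bits for this model; not the x86 PTE layout. */
enum { PERM_R = 1, PERM_W = 2, PERM_X = 4 };

/* Bit-toggling style: each helper changes one bit and keeps the rest. */
static inline unsigned int toggle_set_rw(unsigned int p) { return p | PERM_W; }
static inline unsigned int toggle_set_ro(unsigned int p) { return p & ~(unsigned int)PERM_W; }
static inline unsigned int toggle_set_x(unsigned int p)  { return p | PERM_X; }

/* Whole-state style: each helper replaces the permissions outright. */
static inline unsigned int state_set_rw(unsigned int p) { (void)p; return PERM_R | PERM_W; }
static inline unsigned int state_set_ro(unsigned int p) { (void)p; return PERM_R; }
static inline unsigned int state_set_rx(unsigned int p) { (void)p; return PERM_R | PERM_X; }

/* True when a mapping is both writable and executable. */
static inline bool perm_is_wx(unsigned int p)
{
	return (p & PERM_W) && (p & PERM_X);
}
```

In this model, making a region writable and then executable with the toggling helpers leaves it W+X unless the caller remembers an extra call to drop W, while no sequence of the whole-state helpers can produce W+X at all. That property is what would make stray RWX mappings easy to catch.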
* Re: next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
  2017-05-19 17:35 ` Catalin Marinas
  2017-05-19 18:27 ` Andy Lutomirski
@ 2017-05-26 22:13 ` Luis R. Rodriguez
  1 sibling, 0 replies; 21+ messages in thread
From: Luis R. Rodriguez @ 2017-05-26 22:13 UTC
To: Catalin Marinas
Cc: Steven Rostedt, Kees Cook, Stephen Smalley, Ingo Molnar,
    Andy Lutomirski, Michal Hocko, Vlastimil Babka, Andrew Morton,
    Eric W. Biederman, Mateusz Guzik, LKML, Luis R. Rodriguez

On Fri, May 19, 2017 at 10:35 AM, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Fri, May 19, 2017 at 05:40:16PM +0200, Luis R. Rodriguez wrote:
>> If the following is a legit forced way to get query the kernel to ask it
>> who owns a page then perhaps this technique can be used in the future to
>> figure out who the hell caused this. Catalin, can you confirm? In this
>> case this is perhaps not a leaked page but I am trying to abuse the
>> kmemleak debugfs API to query who allocated the page. Is that fine?
>>
>> [ 0.916771] WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:235 note_page+0x63c/0x7e0
>> [ 0.917636] x86/mm: Found insecure W+X mapping at address ffffffffc03d5000/0xffffffffc03d5000
>> [ 0.918502] Modules linked in:
>> [ 0.918819] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.11.0-mcgrof-force-config #340
>> [ 0.919631] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
>> [ 0.920011] Call Trace:
>> [ 0.920011] dump_stack+0x63/0x81
>> [ 0.920011] __warn+0xcb/0xf0
>> [ 0.920011] warn_slowpath_fmt+0x5a/0x80
>> [ 0.920011] note_page+0x63c/0x7e0
>> [ 0.920011] ptdump_walk_pgd_level_core+0x3b1/0x460
>> [ 0.920011] ? 0xffffffff86c00000
>> [ 0.920011] ptdump_walk_pgd_level_checkwx+0x17/0x20
>> [ 0.920011] mark_rodata_ro+0xf4/0x100
>> [ 0.920011] ? rest_init+0x80/0x80
>> [ 0.920011] kernel_init+0x2a/0x100
>> [ 0.920011] ret_from_fork+0x2c/0x40
>> [ 0.925474] ---[ end trace dca00cd779490a2b ]---
>> [ 0.925959] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found.
>>
>> echo dump=0xffffffffc03d5000 > /sys/kernel/debug/kmemleak
>> dmesg | tail
>>
>> [ 49.209565] kmemleak: Object 0xffffffffc03d5000 (size 335):
>> [ 49.210814] kmemleak: comm "swapper/0", pid 1, jiffies 4294892440
>> [ 49.212148] kmemleak: min_count = 2
>> [ 49.212852] kmemleak: count = 0
>> [ 49.213363] kmemleak: flags = 0x1
>> [ 49.213363] kmemleak: checksum = 0
>> [ 49.213363] kmemleak: backtrace:
>> [ 49.213363] kmemleak_alloc+0x4a/0xa0
>> [ 49.213363] __vmalloc_node_range+0x20a/0x2b0
>> [ 49.213363] module_alloc+0x67/0xc0
>> [ 49.213363] arch_ftrace_update_trampoline+0xba/0x260
>> [ 49.213363] ftrace_startup+0x90/0x210
>> [ 49.213363] register_ftrace_function+0x4b/0x60
>> [ 49.213363] arm_kprobe+0x84/0xe0
>> [ 49.213363] register_kprobe+0x56e/0x5b0
>> [ 49.213363] init_test_probes+0x61/0x560
>> [ 49.213363] init_kprobes+0x1e3/0x206
>> [ 49.213363] do_one_initcall+0x52/0x1a0
>> [ 49.213363] kernel_init_freeable+0x178/0x200
>> [ 49.213363] kernel_init+0xe/0x100
>> [ 49.213363] ret_from_fork+0x2c/0x40
>> [ 49.213363] 0xffffffffffffffff
>
> You could as well use kmemleak this way since it tracks the memory
> allocations.

Great!

> However, it doesn't track alloc_pages and also doesn't
> track mapping existing pages (vmap etc.)

Can we verify that? If so then the splat from the above complaint could
include a follow up dump of the trace, no ? That's *much* more useful.

  Luis
* Re: next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
  2017-05-15 22:15 ` Luis R. Rodriguez
  2017-05-15 22:57 ` Kees Cook
@ 2017-05-15 23:30 ` Luis R. Rodriguez
  1 sibling, 0 replies; 21+ messages in thread
From: Luis R. Rodriguez @ 2017-05-15 23:30 UTC
To: Luis R. Rodriguez
Cc: Stephen Smalley, Ingo Molnar, Andy Lutomirski, Michal Hocko,
    Andrew Morton, Kees Cook, Eric W. Biederman, Mateusz Guzik,
    linux-kernel

On Mon, May 15, 2017 at 3:15 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> On Tue, May 16, 2017 at 12:06:50AM +0200, Luis R. Rodriguez wrote:
>> Using QEMU emulator version 2.7.94 (v2.8.0-rc4-dirty)
>>
>> I will try updating my distro package for qemu and see if perhaps its this
>> and for the other odd fork issue I reported [0].
>>
>> [0] https://lkml.kernel.org/r/CAB=NE6VZXq3y-3pfouYTBUco2Cq2xqoLZrgDFdVx+_=_=SwG_Q@mail.gmail.com
>
> Yeah nope, using my distribution latest:
>
> QEMU emulator version 2.8.0(openSUSE Tumbleweed)
>
> And still both issues are present.

FWIW also compiled and tried to boot with the latest qemu, v2.9.0-rc5
and it also has both issues, so I don't think this is because of the
version of qemu.

  Luis
end of thread, other threads:[~2017-05-27 1:19 UTC | newest]

Thread overview: 21+ messages
2017-05-15 22:06 next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0 Luis R. Rodriguez
2017-05-15 22:15 ` Luis R. Rodriguez
2017-05-15 22:57 ` Kees Cook
2017-05-15 23:45 ` Luis R. Rodriguez
2017-05-16 0:12 ` Kees Cook
2017-05-17 16:40 ` Luis R. Rodriguez
2017-05-17 17:53 ` Kees Cook
2017-05-19 0:44 ` Luis R. Rodriguez
2017-05-19 3:08 ` Luis R. Rodriguez
2017-05-19 15:40 ` Luis R. Rodriguez
2017-05-19 17:28 ` Luis R. Rodriguez
2017-05-20 2:38 ` Masami Hiramatsu
2017-05-23 14:48 ` Luis R. Rodriguez
2017-05-24 17:55 ` Luis R. Rodriguez
2017-05-19 17:35 ` Catalin Marinas
2017-05-19 18:27 ` Andy Lutomirski
2017-05-19 19:16 ` Kees Cook
2017-05-19 19:18 ` Andy Lutomirski
2017-05-19 19:29 ` Kees Cook
2017-05-26 22:13 ` Luis R. Rodriguez
2017-05-15 23:30 ` Luis R. Rodriguez