From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1030748AbdEWOsz (ORCPT ); Tue, 23 May 2017 10:48:55 -0400 Received: from mx2.suse.de ([195.135.220.15]:45838 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S966845AbdEWOsx (ORCPT ); Tue, 23 May 2017 10:48:53 -0400 Date: Tue, 23 May 2017 16:48:50 +0200 From: "Luis R. Rodriguez" To: Masami Hiramatsu Cc: "Luis R. Rodriguez" , Jim Keniston , davem@davemloft.net, sagar.abhishek@gmail.com, Catalin Marinas , Steven Rostedt , Kees Cook , Stephen Smalley , Ingo Molnar , Andy Lutomirski , Michal Hocko , Vlastimil Babka , Andrew Morton , "Eric W. Biederman" , Mateusz Guzik , LKML Subject: Re: next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0 Message-ID: <20170523144850.GP8951@wotan.suse.de> References: <20170517164017.GP17314@wotan.suse.de> <20170519004414.GD8951@wotan.suse.de> <20170519030802.GE8951@wotan.suse.de> <20170519154016.GH8951@wotan.suse.de> <20170519172854.GK8951@wotan.suse.de> <20170520113850.933fff9e9d7f140258544cad@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170520113850.933fff9e9d7f140258544cad@kernel.org> User-Agent: Mutt/1.6.0 (2016-04-01) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sat, May 20, 2017 at 11:38:50AM +0900, Masami Hiramatsu wrote: > Hi Luis, > > On Fri, 19 May 2017 19:28:54 +0200 > "Luis R. Rodriguez" wrote: > > > On Fri, May 19, 2017 at 05:40:16PM +0200, Luis R. Rodriguez wrote: > > > On Fri, May 19, 2017 at 05:08:02AM +0200, Luis R. Rodriguez wrote: > > > > On Fri, May 19, 2017 at 02:44:14AM +0200, Luis R. Rodriguez wrote: > > > > > On Wed, May 17, 2017 at 10:53:06AM -0700, Kees Cook wrote: > > > > > > On Wed, May 17, 2017 at 9:40 AM, Luis R. Rodriguez wrote: > > > > > > > Yes, but I had killed that boot session again, so upon my next boot > > > > > > > I had a different layout, the ASLR gap was much larger: > > > > > > > > > > > > > > ---[ Modules ]--- > > > > > > > 0xffffffffc0000000-0xffffffffc01b0000 1728K pte > > > > > > > 0xffffffffc01b0000-0xffffffffc01b1000 4K RW GLB x pte > > > > > > > 0xffffffffc01b1000-0xffffffffc01b2000 4K pte > > > > > > > 0xffffffffc01b2000-0xffffffffc01c6000 80K ro GLB x pte > > > > > > > 0xffffffffc01c6000-0xffffffffc01cc000 24K ro GLB NX pte > > > > > > > 0xffffffffc01cc000-0xffffffffc01d5000 36K RW GLB NX pte > > > > > > > > > > > > > > As you can guess if we follow similar pattern the RW hole is the one this boot > > > > > > > warned about: > > > > > > > > > > > > > > [ 1.450483] x86/mm: Found insecure W+X mapping at address ffffffffc01b0000/0xffffffffc01b0000 > > > > > > > [ 1.451280] ------------[ cut here ]------------ > > > > > > > [ 1.451721] WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0 > > > > > > > [ 1.452499] Modules linked in: > > > > > > > [ 1.452791] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc1-next-20170515+ #145 > > > > > > > > > > > > > > I checked and indeed 0xffffffffc01b2000 is part of a module, it was not the first one > > > > > > > on the /proc/modules list but then again /proc/modules does not seem to have a specific > > > > > > > order other than perhaps being pegged into a linked list of modules once they go live, > > > > > > > and it seems its typically output backwards from when that happened, sorting that > > > > > > > by address we get: > > > > > > > > > > > > Right, sorry, I'd expect it at the bottom of the list in > > > > > > /proc/modules, but that's fine, it's there. > > > > > > > > > > > > > > > > > > > > root@piggy:~# cat /proc/modules | sort -k 6 | head -3 > > > > > > > e1000 143360 0 - Live 0xffffffffc01b2000 (E) > > > > > > > mbcache 16384 1 ext4, Live 0xffffffffc01d6000 (E) > > > > > > > scsi_mod 217088 4 sg,sr_mod,sd_mod,libata, Live 0xffffffffc01df000 (E) > > > > > > > > > > > > > > And this then seems to be the first module loaded: > > > > > > > > > > > > > > e1000 143360 0 - Live 0xffffffffc01b2000 (E) > > > > > > > > > > > > > > The output of dmesg seems to confirm this as per the list of modules sorted > > > > > > > as per above. > > > > > > > > > > > > > >> Something touched the module gap and left is RW+x... > > > > > > > > > > > > > > Lemme try booting with e1000 renamed to e1000.ko.ignore and see how that goes. > > > > > > > > > > > > Is it possible a module got loaded before e1000 and then unloaded? > > > > > > That seems odd, but maybe unload isn't cleaning up? > > > > > > > > > > > > >> Are you able to bisect this? > > > > > > > > > > > > > > This issue has been present for a while so since I recall this I might be > > > > > > > able to reduce the number of needed target kernels to bisect. Lemme tinker > > > > > > > a bit and if no clear culprit comes up then will try bisect. > > > > > > > > > > > > Okay, thanks! > > > > > > > > > > Sorry to report that this issue is present since the feature's addition. So > > > > > the issue is there since its addition and is still present today. *But* it > > > > > may also be a configuration issue, given I have booted this guest *without* > > > > > this issue ... > > > > > > > > > > So: > > > > > > > > > > git checkout -b WX e1a58320a38dfa72be48a0f1a3a92273663ba6db > > > > > > > > > > That boots with the warning. To help debug further I've minimized my modules > > > > > to only a few: scsi_mod, e1000, libata. > > > > > > > > > > I suspect at this point this is not the fault of a particular module but > > > > > instead just an accounting semantic (>= or <= on an edge) but let's see. > > > > > > > > > > I now boot on 4.3.0-rc3 on commit (e1a58320a38df ("x86/mm: Warn on W^X > > > > > mappings") and I with: > > > > > > > > > > [ 0.949435] ------------[ cut here ]------------ > > > > > [ 0.949992] WARNING: CPU: 2 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page+0x635/0x7e0() > > > > > [ 0.950996] x86/mm: Found insecure W+X mapping at address ffffffffc0000000/0xffffffffc0000000 > > > > > [ 0.951814] Modules linked in: > > > > > [ 0.952123] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.3.0-rc3-FINAL-TEST-WITH-WX-NOFLOPPY+ #365 > > > > > [ 0.952929] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014 > > > > > [ 0.954033] 0000000000000000 000000001f722925 ffff88013a5d7d40 ffffffff812ff335 > > > > > [ 0.954742] ffff88013a5d7d88 ffff88013a5d7d78 ffffffff81079be2 ffff88013a5d7e90 > > > > > [ 0.955522] 0000000000000000 0000000000000004 0000000000000000 0000000000000000 > > > > > [ 0.956256] Call Trace: > > > > > [ 0.956496] [] dump_stack+0x44/0x5f > > > > > [ 0.956953] [] warn_slowpath_common+0x82/0xc0 > > > > > [ 0.957519] [] warn_slowpath_fmt+0x5c/0x80 > > > > > [ 0.958066] [] note_page+0x635/0x7e0 > > > > > [ 0.958595] [] ptdump_walk_pgd_level_core+0x2eb/0x410 > > > > > [ 0.959219] [] ptdump_walk_pgd_level_checkwx+0x17/0x20 > > > > > [ 0.959856] [] mark_rodata_ro+0xed/0x100 > > > > > [ 0.960372] [] ? rest_init+0x80/0x80 > > > > > [ 0.960869] [] kernel_init+0x1d/0xe0 > > > > > [ 0.961358] [] ret_from_fork+0x3f/0x70 > > > > > [ 0.961900] [] ? rest_init+0x80/0x80 > > > > > [ 0.962389] ---[ end trace 6125ebcb24c9e3d0 ]--- > > > > > [ 0.962822] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found. > > > > > > > > > > > > > > > ---[ High Kernel Mapping ]--- > > > > > 0xffffffff80000000-0xffffffff81000000 16M pmd > > > > > 0xffffffff81000000-0xffffffff81600000 6M ro PSE GLB x pmd > > > > > 0xffffffff81600000-0xffffffff81a00000 4M ro PSE GLB NX pmd > > > > > 0xffffffff81a00000-0xffffffff81c00000 2M RW GLB NX pte > > > > > 0xffffffff81c00000-0xffffffff82200000 6M RW PSE GLB NX pmd > > > > > 0xffffffff82200000-0xffffffff82400000 2M RW GLB NX pte > > > > > 0xffffffff82400000-0xffffffffc0000000 988M pmd > > > > > ---[ Modules ]--- > > > > > 0xffffffffc0000000-0xffffffffc0001000 4K RW GLB x pte > > > > > 0xffffffffc0001000-0xffffffffc0002000 4K pte > > > > > 0xffffffffc0002000-0xffffffffc0039000 220K RW GLB x pte > > > > > > > > > > root@piggy:~# cat /proc/modules | sort -k 6 | head -3 > > > > > scsi_mod 221979 4 sg,sd_mod,sr_mod,libata, Live 0xffffffffc0002000 (E) > > > > > e1000 127757 0 - Live 0xffffffffc004d000 (E) > > > > > libata 229931 2 ata_generic,ata_piix, Live 0xffffffffc0076000 (E) > > > > > > > > > > So that 4K RW seems suspect of getting used for allocation purpose on edge > > > > > for a particular reason and it also happens to be on the edge of the high > > > > > kernel mapping. Could it be the boundary semantic issue ? > > > > > > > > > > For instance can it be that since 0xffffffffc0002000 is given to the first > > > > > module by the allocator, scsi_mod, and since that address is *technically* > > > > > part of two boundaries we get a splat ? > > > > > > > > > > 0xffffffffc0001000-0xffffffffc0002000 4K pte > > > > > 0xffffffffc0002000-0xffffffffc0039000 220K RW GLB x pte > > > > > > > > Note on the latest linux-next and on the commit that introduced this the config > > > > and kernel yields only *one* page: > > > > > > > > x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found. > > > > > > > > I believe this is more indications my suspicion might be right. > > > > > > If the following is a legit forced way to get query the kernel to ask it > > > who owns a page then perhaps this technique can be used in the future to > > > figure out who the hell caused this. Catalin, can you confirm? In this > > > case this is perhaps not a leaked page but I am trying to abuse the > > > kmemleak debugfs API to query who allocated the page. Is that fine? > > > > > > [ 0.916771] WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:235 note_page+0x63c/0x7e0 > > > [ 0.917636] x86/mm: Found insecure W+X mapping at address ffffffffc03d5000/0xffffffffc03d5000 > > > [ 0.918502] Modules linked in: > > > [ 0.918819] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.11.0-mcgrof-force-config #340 > > > [ 0.919631] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014 > > > [ 0.920011] Call Trace: > > > [ 0.920011] dump_stack+0x63/0x81 > > > [ 0.920011] __warn+0xcb/0xf0 > > > [ 0.920011] warn_slowpath_fmt+0x5a/0x80 > > > [ 0.920011] note_page+0x63c/0x7e0 > > > [ 0.920011] ptdump_walk_pgd_level_core+0x3b1/0x460 > > > [ 0.920011] ? 0xffffffff86c00000 > > > [ 0.920011] ptdump_walk_pgd_level_checkwx+0x17/0x20 > > > [ 0.920011] mark_rodata_ro+0xf4/0x100 > > > [ 0.920011] ? rest_init+0x80/0x80 > > > [ 0.920011] kernel_init+0x2a/0x100 > > > [ 0.920011] ret_from_fork+0x2c/0x40 > > > [ 0.925474] ---[ end trace dca00cd779490a2b ]--- > > > [ 0.925959] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found. > > > > > > echo dump=0xffffffffc03d5000 > /sys/kernel/debug/kmemleak > > > dmesg | tail > > > > > > [ 49.209565] kmemleak: Object 0xffffffffc03d5000 (size 335): > > > [ 49.210814] kmemleak: comm "swapper/0", pid 1, jiffies 4294892440 > > > [ 49.212148] kmemleak: min_count = 2 > > > [ 49.212852] kmemleak: count = 0 > > > [ 49.213363] kmemleak: flags = 0x1 > > > [ 49.213363] kmemleak: checksum = 0 > > > [ 49.213363] kmemleak: backtrace: > > > [ 49.213363] kmemleak_alloc+0x4a/0xa0 > > > [ 49.213363] __vmalloc_node_range+0x20a/0x2b0 > > > [ 49.213363] module_alloc+0x67/0xc0 > > > [ 49.213363] arch_ftrace_update_trampoline+0xba/0x260 > > > [ 49.213363] ftrace_startup+0x90/0x210 > > > [ 49.213363] register_ftrace_function+0x4b/0x60 > > > [ 49.213363] arm_kprobe+0x84/0xe0 > > > [ 49.213363] register_kprobe+0x56e/0x5b0 > > > [ 49.213363] init_test_probes+0x61/0x560 > > > [ 49.213363] init_kprobes+0x1e3/0x206 > > > [ 49.213363] do_one_initcall+0x52/0x1a0 > > > [ 49.213363] kernel_init_freeable+0x178/0x200 > > > [ 49.213363] kernel_init+0xe/0x100 > > > [ 49.213363] ret_from_fork+0x2c/0x40 > > > [ 49.213363] 0xffffffffffffffff > > > > Aha! And the winner is: > > > > CONFIG_KPROBES_SANITY_TEST > > > > I confirm disabling it on 4.3.0-rc3 and on linux-next next-20170519 avoids the WARN. > > I also can confirm using the 'echo dump=mem-area > /sys/kernel/debug/kmemleak' yields > > the same trace for both of these kernels. > > > > So -- the above kmemleak hack seems to actually work to seek who owns that page. > > > > Now to figure out how the hell kernel/test_kprobes.c screws around with things. > > Ah, that was fixed recently; > > https://marc.info/?l=linux-kernel&m=149076389011850 > > Note that this patch depends another patch in the series; > > https://marc.info/?l=linux-kernel&m=149076370111812&w=2 I actually boot tested linux-next tag next-20170519 which carries these patches and the WARNING still is there. Please note the issue is with CONFIG_KPROBES_SANITY_TEST enabled. [ 1.025601] x86/mm: Found insecure W+X mapping at address ffffffffc01e7000/0xffffffffc01e7000 [ 1.026429] ------------[ cut here ]------------ [ 1.026885] WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0 [ 1.027711] Modules linked in: [ 1.028032] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc1-next-20170519 #151 [ 1.028788] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014 [ 1.029928] task: ffff9fd47a5ccc80 task.stack: ffffb6bcc0630000 [ 1.030509] RIP: 0010:note_page+0x630/0x7e0 [ 1.030917] RSP: 0000:ffffb6bcc0633df0 EFLAGS: 00010286 [ 1.031425] RAX: 0000000000000051 RBX: ffffb6bcc0633e88 RCX: ffffffffbb656708 [ 1.032132] RDX: 0000000000000000 RSI: 0000000000000096 RDI: 0000000000000246 [ 1.032834] RBP: ffffb6bcc0633e28 R08: 203a6d6d2f363878 R09: 0000000000000161 [ 1.033539] R10: ffffb6bcc0633dd8 R11: 736e6920646e756f R12: 0000000000000000 [ 1.034235] R13: 0000000000000004 R14: 0000000000000000 R15: 0000000000000000 [ 1.034927] FS: 0000000000000000(0000) GS:ffff9fd47fc80000(0000) knlGS:0000000000000000 [ 1.035722] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1.036290] CR2: ffffb6bcc073c000 CR3: 0000000053209000 CR4: 00000000000006e0 [ 1.036839] Call Trace: [ 1.037034] ptdump_walk_pgd_level_core+0x3e7/0x490 [ 1.037367] ? 0xffffffffbaa00000 [ 1.037705] ptdump_walk_pgd_level_checkwx+0x17/0x20 [ 1.038187] mark_rodata_ro+0xf4/0x100 [ 1.038559] ? rest_init+0x80/0x80 [ 1.038890] kernel_init+0x2f/0x100 [ 1.039235] ret_from_fork+0x2c/0x40 [ 1.039582] Code: 48 c7 43 28 00 00 00 00 48 89 43 20 e9 05 fd ff ff 48 8b 73 10 48 c7 c7 f0 3d 3e bb c6 05 f8 eb bc 00 01 48 89 f2 e8 1d 02 12 00 <0f> ff e9 1f fa ff ff 48 8b 70 20 48 c7 c7 3c ba 3e bb e8 06 02 [ 1.041416] ---[ end trace e726c1b63e5a81a9 ]--- [ 1.041872] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found. root@piggy:~# echo dump=0xffffffffc01e7000 > /sys/kernel/debug/kmemleak On dmesg: May 23 07:44:51 piggy kernel: kmemleak: Object 0xffffffffc01e7000 (size 335): May 23 07:44:51 piggy kernel: kmemleak: comm "swapper/0", pid 1, jiffies 4294892451 May 23 07:44:51 piggy kernel: kmemleak: min_count = 2 May 23 07:44:51 piggy kernel: kmemleak: count = 2 May 23 07:44:51 piggy kernel: kmemleak: flags = 0x1 May 23 07:44:51 piggy kernel: kmemleak: checksum = 0 May 23 07:44:51 piggy kernel: kmemleak: backtrace: May 23 07:44:51 piggy kernel: kmemleak_alloc+0x4a/0xa0 May 23 07:44:51 piggy kernel: __vmalloc_node_range+0x20c/0x2b0 May 23 07:44:51 piggy kernel: module_alloc+0x67/0xc0 May 23 07:44:51 piggy kernel: arch_ftrace_update_trampoline+0xc1/0x240 May 23 07:44:51 piggy kernel: ftrace_startup+0x92/0x210 May 23 07:44:51 piggy kernel: register_ftrace_function+0x4b/0x60 May 23 07:44:51 piggy kernel: arm_kprobe+0x84/0xc0 May 23 07:44:51 piggy kernel: register_kprobe+0x59c/0x5e0 May 23 07:44:51 piggy kernel: init_test_probes+0x61/0x560 May 23 07:44:51 piggy kernel: init_kprobes+0x1ea/0x20d May 23 07:44:51 piggy kernel: do_one_initcall+0x52/0x1a0 May 23 07:44:51 piggy kernel: kernel_init_freeable+0x17d/0x205 May 23 07:44:51 piggy kernel: kernel_init+0xe/0x100 May 23 07:44:51 piggy kernel: ret_from_fork+0x2c/0x40 May 23 07:44:51 piggy kernel: 0xffffffffffffffff Luis