From: "Luis R. Rodriguez" <mcgrof@kernel.org>
To: Masami Hiramatsu <mhiramat@kernel.org>,
Jim Keniston <jkenisto@us.ibm.com>,
davem@davemloft.net, sagar.abhishek@gmail.com
Cc: Catalin Marinas <catalin.marinas@arm.com>,
mcgrof@kernel.org, Steven Rostedt <srostedt@redhat.com>,
Kees Cook <keescook@chromium.org>,
Stephen Smalley <sds@tycho.nsa.gov>,
Ingo Molnar <mingo@kernel.org>,
Andy Lutomirski <luto@amacapital.net>,
Michal Hocko <mhocko@kernel.org>,
Vlastimil Babka <vbabka@suse.cz>,
Andrew Morton <akpm@linux-foundation.org>,
"Eric W. Biederman" <ebiederm@xmission.com>,
Mateusz Guzik <mguzik@redhat.com>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
Date: Fri, 19 May 2017 19:28:54 +0200 [thread overview]
Message-ID: <20170519172854.GK8951@wotan.suse.de> (raw)
In-Reply-To: <20170519154016.GH8951@wotan.suse.de>
On Fri, May 19, 2017 at 05:40:16PM +0200, Luis R. Rodriguez wrote:
> On Fri, May 19, 2017 at 05:08:02AM +0200, Luis R. Rodriguez wrote:
> > On Fri, May 19, 2017 at 02:44:14AM +0200, Luis R. Rodriguez wrote:
> > > On Wed, May 17, 2017 at 10:53:06AM -0700, Kees Cook wrote:
> > > > On Wed, May 17, 2017 at 9:40 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > > > > Yes, but I had killed that boot session again, so upon my next boot
> > > > > I had a different layout, the ASLR gap was much larger:
> > > > >
> > > > > ---[ Modules ]---
> > > > > 0xffffffffc0000000-0xffffffffc01b0000 1728K pte
> > > > > 0xffffffffc01b0000-0xffffffffc01b1000 4K RW GLB x pte
> > > > > 0xffffffffc01b1000-0xffffffffc01b2000 4K pte
> > > > > 0xffffffffc01b2000-0xffffffffc01c6000 80K ro GLB x pte
> > > > > 0xffffffffc01c6000-0xffffffffc01cc000 24K ro GLB NX pte
> > > > > 0xffffffffc01cc000-0xffffffffc01d5000 36K RW GLB NX pte
> > > > >
> > > > > As you can guess if we follow similar pattern the RW hole is the one this boot
> > > > > warned about:
> > > > >
> > > > > [ 1.450483] x86/mm: Found insecure W+X mapping at address ffffffffc01b0000/0xffffffffc01b0000
> > > > > [ 1.451280] ------------[ cut here ]------------
> > > > > [ 1.451721] WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
> > > > > [ 1.452499] Modules linked in:
> > > > > [ 1.452791] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc1-next-20170515+ #145
> > > > >
> > > > > I checked and indeed 0xffffffffc01b2000 is part of a module, it was not the first one
> > > > > on the /proc/modules list but then again /proc/modules does not seem to have a specific
> > > > > order other than perhaps being pegged into a linked list of modules once they go live,
> > > > > and it seems its typically output backwards from when that happened, sorting that
> > > > > by address we get:
> > > >
> > > > Right, sorry, I'd expect it at the bottom of the list in
> > > > /proc/modules, but that's fine, it's there.
> > > >
> > > > >
> > > > > root@piggy:~# cat /proc/modules | sort -k 6 | head -3
> > > > > e1000 143360 0 - Live 0xffffffffc01b2000 (E)
> > > > > mbcache 16384 1 ext4, Live 0xffffffffc01d6000 (E)
> > > > > scsi_mod 217088 4 sg,sr_mod,sd_mod,libata, Live 0xffffffffc01df000 (E)
> > > > >
> > > > > And this then seems to be the first module loaded:
> > > > >
> > > > > e1000 143360 0 - Live 0xffffffffc01b2000 (E)
> > > > >
> > > > > The output of dmesg seems to confirm this as per the list of modules sorted
> > > > > as per above.
> > > > >
> > > > >> Something touched the module gap and left is RW+x...
> > > > >
> > > > > Lemme try booting with e1000 renamed to e1000.ko.ignore and see how that goes.
> > > >
> > > > Is it possible a module got loaded before e1000 and then unloaded?
> > > > That seems odd, but maybe unload isn't cleaning up?
> > > >
> > > > >> Are you able to bisect this?
> > > > >
> > > > > This issue has been present for a while so since I recall this I might be
> > > > > able to reduce the number of needed target kernels to bisect. Lemme tinker
> > > > > a bit and if no clear culprit comes up then will try bisect.
> > > >
> > > > Okay, thanks!
> > >
> > > Sorry to report that this issue is present since the feature's addition. So
> > > the issue is there since its addition and is still present today. *But* it
> > > may also be a configuration issue, given I have booted this guest *without*
> > > this issue ...
> > >
> > > So:
> > >
> > > git checkout -b WX e1a58320a38dfa72be48a0f1a3a92273663ba6db
> > >
> > > That boots with the warning. To help debug further I've minimized my modules
> > > to only a few: scsi_mod, e1000, libata.
> > >
> > > I suspect at this point this is not the fault of a particular module but
> > > instead just an accounting semantic (>= or <= on an edge) but let's see.
> > >
> > > I now boot on 4.3.0-rc3 on commit (e1a58320a38df ("x86/mm: Warn on W^X
> > > mappings") and I with:
> > >
> > > [ 0.949435] ------------[ cut here ]------------
> > > [ 0.949992] WARNING: CPU: 2 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page+0x635/0x7e0()
> > > [ 0.950996] x86/mm: Found insecure W+X mapping at address ffffffffc0000000/0xffffffffc0000000
> > > [ 0.951814] Modules linked in:
> > > [ 0.952123] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.3.0-rc3-FINAL-TEST-WITH-WX-NOFLOPPY+ #365
> > > [ 0.952929] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
> > > [ 0.954033] 0000000000000000 000000001f722925 ffff88013a5d7d40 ffffffff812ff335
> > > [ 0.954742] ffff88013a5d7d88 ffff88013a5d7d78 ffffffff81079be2 ffff88013a5d7e90
> > > [ 0.955522] 0000000000000000 0000000000000004 0000000000000000 0000000000000000
> > > [ 0.956256] Call Trace:
> > > [ 0.956496] [<ffffffff812ff335>] dump_stack+0x44/0x5f
> > > [ 0.956953] [<ffffffff81079be2>] warn_slowpath_common+0x82/0xc0
> > > [ 0.957519] [<ffffffff81079c7c>] warn_slowpath_fmt+0x5c/0x80
> > > [ 0.958066] [<ffffffff8106c155>] note_page+0x635/0x7e0
> > > [ 0.958595] [<ffffffff8106c5eb>] ptdump_walk_pgd_level_core+0x2eb/0x410
> > > [ 0.959219] [<ffffffff8106c7b7>] ptdump_walk_pgd_level_checkwx+0x17/0x20
> > > [ 0.959856] [<ffffffff8106260d>] mark_rodata_ro+0xed/0x100
> > > [ 0.960372] [<ffffffff815aa7d0>] ? rest_init+0x80/0x80
> > > [ 0.960869] [<ffffffff815aa7ed>] kernel_init+0x1d/0xe0
> > > [ 0.961358] [<ffffffff815b798f>] ret_from_fork+0x3f/0x70
> > > [ 0.961900] [<ffffffff815aa7d0>] ? rest_init+0x80/0x80
> > > [ 0.962389] ---[ end trace 6125ebcb24c9e3d0 ]---
> > > [ 0.962822] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found.
> > >
> > >
> > > ---[ High Kernel Mapping ]---
> > > 0xffffffff80000000-0xffffffff81000000 16M pmd
> > > 0xffffffff81000000-0xffffffff81600000 6M ro PSE GLB x pmd
> > > 0xffffffff81600000-0xffffffff81a00000 4M ro PSE GLB NX pmd
> > > 0xffffffff81a00000-0xffffffff81c00000 2M RW GLB NX pte
> > > 0xffffffff81c00000-0xffffffff82200000 6M RW PSE GLB NX pmd
> > > 0xffffffff82200000-0xffffffff82400000 2M RW GLB NX pte
> > > 0xffffffff82400000-0xffffffffc0000000 988M pmd
> > > ---[ Modules ]---
> > > 0xffffffffc0000000-0xffffffffc0001000 4K RW GLB x pte
> > > 0xffffffffc0001000-0xffffffffc0002000 4K pte
> > > 0xffffffffc0002000-0xffffffffc0039000 220K RW GLB x pte
> > >
> > > root@piggy:~# cat /proc/modules | sort -k 6 | head -3
> > > scsi_mod 221979 4 sg,sd_mod,sr_mod,libata, Live 0xffffffffc0002000 (E)
> > > e1000 127757 0 - Live 0xffffffffc004d000 (E)
> > > libata 229931 2 ata_generic,ata_piix, Live 0xffffffffc0076000 (E)
> > >
> > > So that 4K RW seems suspect of getting used for allocation purpose on edge
> > > for a particular reason and it also happens to be on the edge of the high
> > > kernel mapping. Could it be the boundary semantic issue ?
> > >
> > > For instance can it be that since 0xffffffffc0002000 is given to the first
> > > module by the allocator, scsi_mod, and since that address is *technically*
> > > part of two boundaries we get a splat ?
> > >
> > > 0xffffffffc0001000-0xffffffffc0002000 4K pte
> > > 0xffffffffc0002000-0xffffffffc0039000 220K RW GLB x pte
> >
> > Note on the latest linux-next and on the commit that introduced this the config
> > and kernel yields only *one* page:
> >
> > x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found.
> >
> > I believe this is more indications my suspicion might be right.
>
> If the following is a legit forced way to get query the kernel to ask it
> who owns a page then perhaps this technique can be used in the future to
> figure out who the hell caused this. Catalin, can you confirm? In this
> case this is perhaps not a leaked page but I am trying to abuse the
> kmemleak debugfs API to query who allocated the page. Is that fine?
>
> [ 0.916771] WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:235 note_page+0x63c/0x7e0
> [ 0.917636] x86/mm: Found insecure W+X mapping at address ffffffffc03d5000/0xffffffffc03d5000
> [ 0.918502] Modules linked in:
> [ 0.918819] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.11.0-mcgrof-force-config #340
> [ 0.919631] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
> [ 0.920011] Call Trace:
> [ 0.920011] dump_stack+0x63/0x81
> [ 0.920011] __warn+0xcb/0xf0
> [ 0.920011] warn_slowpath_fmt+0x5a/0x80
> [ 0.920011] note_page+0x63c/0x7e0
> [ 0.920011] ptdump_walk_pgd_level_core+0x3b1/0x460
> [ 0.920011] ? 0xffffffff86c00000
> [ 0.920011] ptdump_walk_pgd_level_checkwx+0x17/0x20
> [ 0.920011] mark_rodata_ro+0xf4/0x100
> [ 0.920011] ? rest_init+0x80/0x80
> [ 0.920011] kernel_init+0x2a/0x100
> [ 0.920011] ret_from_fork+0x2c/0x40
> [ 0.925474] ---[ end trace dca00cd779490a2b ]---
> [ 0.925959] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found.
>
> echo dump=0xffffffffc03d5000 > /sys/kernel/debug/kmemleak
> dmesg | tail
>
> [ 49.209565] kmemleak: Object 0xffffffffc03d5000 (size 335):
> [ 49.210814] kmemleak: comm "swapper/0", pid 1, jiffies 4294892440
> [ 49.212148] kmemleak: min_count = 2
> [ 49.212852] kmemleak: count = 0
> [ 49.213363] kmemleak: flags = 0x1
> [ 49.213363] kmemleak: checksum = 0
> [ 49.213363] kmemleak: backtrace:
> [ 49.213363] kmemleak_alloc+0x4a/0xa0
> [ 49.213363] __vmalloc_node_range+0x20a/0x2b0
> [ 49.213363] module_alloc+0x67/0xc0
> [ 49.213363] arch_ftrace_update_trampoline+0xba/0x260
> [ 49.213363] ftrace_startup+0x90/0x210
> [ 49.213363] register_ftrace_function+0x4b/0x60
> [ 49.213363] arm_kprobe+0x84/0xe0
> [ 49.213363] register_kprobe+0x56e/0x5b0
> [ 49.213363] init_test_probes+0x61/0x560
> [ 49.213363] init_kprobes+0x1e3/0x206
> [ 49.213363] do_one_initcall+0x52/0x1a0
> [ 49.213363] kernel_init_freeable+0x178/0x200
> [ 49.213363] kernel_init+0xe/0x100
> [ 49.213363] ret_from_fork+0x2c/0x40
> [ 49.213363] 0xffffffffffffffff
Aha! And the winner is:
CONFIG_KPROBES_SANITY_TEST
I confirm disabling it on 4.3.0-rc3 and on linux-next next-20170519 avoids the WARN.
I also can confirm using the 'echo dump=mem-area > /sys/kernel/debug/kmemleak' yields
the same trace for both of these kernels.
So -- the above kmemleak hack seems to actually work to seek who owns that page.
Now to figure out how the hell kernel/test_kprobes.c screws around with things.
Luis
next prev parent reply other threads:[~2017-05-19 17:29 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-05-15 22:06 next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0 Luis R. Rodriguez
2017-05-15 22:15 ` Luis R. Rodriguez
2017-05-15 22:57 ` Kees Cook
2017-05-15 23:45 ` Luis R. Rodriguez
2017-05-16 0:12 ` Kees Cook
2017-05-17 16:40 ` Luis R. Rodriguez
2017-05-17 17:53 ` Kees Cook
2017-05-19 0:44 ` Luis R. Rodriguez
2017-05-19 3:08 ` Luis R. Rodriguez
2017-05-19 15:40 ` Luis R. Rodriguez
2017-05-19 17:28 ` Luis R. Rodriguez [this message]
2017-05-20 2:38 ` Masami Hiramatsu
2017-05-23 14:48 ` Luis R. Rodriguez
2017-05-24 17:55 ` Luis R. Rodriguez
2017-05-19 17:35 ` Catalin Marinas
2017-05-19 18:27 ` Andy Lutomirski
2017-05-19 19:16 ` Kees Cook
2017-05-19 19:18 ` Andy Lutomirski
2017-05-19 19:29 ` Kees Cook
2017-05-26 22:13 ` Luis R. Rodriguez
2017-05-15 23:30 ` Luis R. Rodriguez
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170519172854.GK8951@wotan.suse.de \
--to=mcgrof@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=catalin.marinas@arm.com \
--cc=davem@davemloft.net \
--cc=ebiederm@xmission.com \
--cc=jkenisto@us.ibm.com \
--cc=keescook@chromium.org \
--cc=linux-kernel@vger.kernel.org \
--cc=luto@amacapital.net \
--cc=mguzik@redhat.com \
--cc=mhiramat@kernel.org \
--cc=mhocko@kernel.org \
--cc=mingo@kernel.org \
--cc=sagar.abhishek@gmail.com \
--cc=sds@tycho.nsa.gov \
--cc=srostedt@redhat.com \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.