From: "Luis R. Rodriguez" <mcgrof@kernel.org>
To: Masami Hiramatsu <mhiramat@kernel.org>,
	Jim Keniston <jkenisto@us.ibm.com>,
	davem@davemloft.net, sagar.abhishek@gmail.com
Cc: Catalin Marinas <catalin.marinas@arm.com>,
	mcgrof@kernel.org, Steven Rostedt <srostedt@redhat.com>,
	Kees Cook <keescook@chromium.org>,
	Stephen Smalley <sds@tycho.nsa.gov>,
	Ingo Molnar <mingo@kernel.org>,
	Andy Lutomirski <luto@amacapital.net>,
	Michal Hocko <mhocko@kernel.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	Mateusz Guzik <mguzik@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
Date: Fri, 19 May 2017 19:28:54 +0200
Message-ID: <20170519172854.GK8951@wotan.suse.de>
In-Reply-To: <20170519154016.GH8951@wotan.suse.de>

On Fri, May 19, 2017 at 05:40:16PM +0200, Luis R. Rodriguez wrote:
> On Fri, May 19, 2017 at 05:08:02AM +0200, Luis R. Rodriguez wrote:
> > On Fri, May 19, 2017 at 02:44:14AM +0200, Luis R. Rodriguez wrote:
> > > On Wed, May 17, 2017 at 10:53:06AM -0700, Kees Cook wrote:
> > > > On Wed, May 17, 2017 at 9:40 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > > > > Yes, but I had killed that boot session again, so upon my next boot
> > > > > I had a different layout, the ASLR gap was much larger:
> > > > >
> > > > > ---[ Modules ]---
> > > > > 0xffffffffc0000000-0xffffffffc01b0000        1728K                               pte
> > > > > 0xffffffffc01b0000-0xffffffffc01b1000           4K     RW                 GLB x  pte
> > > > > 0xffffffffc01b1000-0xffffffffc01b2000           4K                               pte
> > > > > 0xffffffffc01b2000-0xffffffffc01c6000          80K     ro                 GLB x  pte
> > > > > 0xffffffffc01c6000-0xffffffffc01cc000          24K     ro                 GLB NX pte
> > > > > 0xffffffffc01cc000-0xffffffffc01d5000          36K     RW                 GLB NX pte
> > > > >
> > > > > As you can guess, if we follow a similar pattern, the RW hole is the one this boot
> > > > > warned about:
> > > > >
> > > > > [    1.450483] x86/mm: Found insecure W+X mapping at address ffffffffc01b0000/0xffffffffc01b0000
> > > > > [    1.451280] ------------[ cut here ]------------
> > > > > [    1.451721] WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
> > > > > [    1.452499] Modules linked in:
> > > > > [    1.452791] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc1-next-20170515+ #145
> > > > >
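A minimal sketch of the same check done by hand on the dump above. It assumes the ptdump text format shown here (an "RW" or "ro" column, then "x" or "NX") and 4K base pages, and flags ranges that are both writable and executable, which is roughly what the kernel's RW-and-not-NX test looks for. The name wx_ranges is illustrative, not a kernel or tool API.

```python
import re

# "0xSTART-0xEND   SIZE   [flags ...]   level", as printed by the page table dump
LINE_RE = re.compile(r"^(0x[0-9a-f]+)-(0x[0-9a-f]+)\s+(\d+[KMG])\s*(.*)$")
PAGE_SIZE = 4096  # assumption: 4K base pages, as on the x86-64 guest above


def wx_ranges(dump):
    """Yield (start, end, pages) for each range that is both writable and executable."""
    for raw in dump.splitlines():
        m = LINE_RE.match(raw.strip())
        if m is None:
            continue  # section headers such as "---[ Modules ]---"
        start, end = int(m.group(1), 16), int(m.group(2), 16)
        flags = m.group(4).split()
        # "RW" (not "ro") means writable; "x" (not "NX") means executable.
        if "RW" in flags and "x" in flags:
            yield start, end, (end - start) // PAGE_SIZE
```

Fed the Modules section above, it reports a single one-page range starting at 0xffffffffc01b0000, the address in the warning.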
> > > > > I checked and indeed 0xffffffffc01b2000 is part of a module. It was not the first one
> > > > > on the /proc/modules list, but then again /proc/modules does not seem to have a specific
> > > > > order, other than perhaps being pegged into a linked list of modules once they go live,
> > > > > and it seems it's typically output in reverse of that order. Sorting it
> > > > > by address we get:
> > > > 
> > > > Right, sorry, I'd expect it at the bottom of the list in
> > > > /proc/modules, but that's fine, it's there.
> > > > 
> > > > >
> > > > > root@piggy:~# cat /proc/modules | sort -k 6 | head -3
> > > > > e1000 143360 0 - Live 0xffffffffc01b2000 (E)
> > > > > mbcache 16384 1 ext4, Live 0xffffffffc01d6000 (E)
> > > > > scsi_mod 217088 4 sg,sr_mod,sd_mod,libata, Live 0xffffffffc01df000 (E)
> > > > >
> > > > > And this then seems to be the first module loaded:
> > > > >
> > > > > e1000 143360 0 - Live 0xffffffffc01b2000 (E)
> > > > >
> > > > > The output of dmesg seems to confirm this as per the list of modules sorted
> > > > > as per above.
> > > > >
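A small sketch of the /proc/modules step, assuming the usual column order (name, size in bytes, refcount, dependencies, state, load address). It sorts numerically rather than relying on sort -k 6 comparing fixed-width hex strings, and looks up which module's [addr, addr + size) range contains a given address. parse_proc_modules and owner_of are hypothetical helper names.

```python
from typing import List, NamedTuple, Optional


class Module(NamedTuple):
    name: str
    size: int  # column 2: size in bytes
    addr: int  # column 6: load address


def parse_proc_modules(text: str) -> List[Module]:
    """Parse /proc/modules text and return the modules sorted by load address."""
    mods = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or truncated lines
        mods.append(Module(fields[0], int(fields[1]), int(fields[5], 16)))
    return sorted(mods, key=lambda m: m.addr)


def owner_of(addr: int, mods: List[Module]) -> Optional[str]:
    """Return the module whose [addr, addr + size) range contains addr, else None."""
    for m in mods:
        if m.addr <= addr < m.addr + m.size:
            return m.name
    return None
```

With the three lines above, e1000's range ends at 0xffffffffc01d5000, exactly where the RW NX range in the dump ends, while the flagged page at 0xffffffffc01b0000 falls inside none of the listed modules.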
> > > > >> Something touched the module gap and left it RW+x...
> > > > >
> > > > > Lemme try booting with e1000 renamed to e1000.ko.ignore and see how that goes.
> > > > 
> > > > Is it possible a module got loaded before e1000 and then unloaded?
> > > > That seems odd, but maybe unload isn't cleaning up?
> > > > 
> > > > >> Are you able to bisect this?
> > > > >
> > > > > This issue has been present for a while, so since I recall this I might be
> > > > > able to reduce the number of target kernels needed to bisect. Lemme tinker
> > > > > a bit, and if no clear culprit comes up I will try a bisect.
> > > > 
> > > > Okay, thanks!
> > > 
> > > Sorry to report that this issue has been present since the feature was added,
> > > and it is still present today. *But* it may also be a configuration issue,
> > > given I have booted this guest *without* this issue ...
> > > 
> > > So:
> > > 
> > > git checkout -b WX e1a58320a38dfa72be48a0f1a3a92273663ba6db
> > > 
> > > That boots with the warning. To help debug further I've minimized my modules
> > > to only a few: scsi_mod, e1000, libata.
> > > 
> > > At this point I suspect this is not the fault of a particular module but
> > > instead just an accounting semantics issue (>= or <= on an edge), but let's see.
> > > 
> > > I now boot 4.3.0-rc3 at commit e1a58320a38df ("x86/mm: Warn on W^X
> > > mappings") and I get:
> > > 
> > > [    0.949435] ------------[ cut here ]------------                             
> > > [    0.949992] WARNING: CPU: 2 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page+0x635/0x7e0()
> > > [    0.950996] x86/mm: Found insecure W+X mapping at address ffffffffc0000000/0xffffffffc0000000
> > > [    0.951814] Modules linked in:                                               
> > > [    0.952123] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.3.0-rc3-FINAL-TEST-WITH-WX-NOFLOPPY+ #365
> > > [    0.952929] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
> > > [    0.954033]  0000000000000000 000000001f722925 ffff88013a5d7d40 ffffffff812ff335
> > > [    0.954742]  ffff88013a5d7d88 ffff88013a5d7d78 ffffffff81079be2 ffff88013a5d7e90
> > > [    0.955522]  0000000000000000 0000000000000004 0000000000000000 0000000000000000
> > > [    0.956256] Call Trace:                                                      
> > > [    0.956496]  [<ffffffff812ff335>] dump_stack+0x44/0x5f                       
> > > [    0.956953]  [<ffffffff81079be2>] warn_slowpath_common+0x82/0xc0             
> > > [    0.957519]  [<ffffffff81079c7c>] warn_slowpath_fmt+0x5c/0x80                
> > > [    0.958066]  [<ffffffff8106c155>] note_page+0x635/0x7e0                      
> > > [    0.958595]  [<ffffffff8106c5eb>] ptdump_walk_pgd_level_core+0x2eb/0x410     
> > > [    0.959219]  [<ffffffff8106c7b7>] ptdump_walk_pgd_level_checkwx+0x17/0x20    
> > > [    0.959856]  [<ffffffff8106260d>] mark_rodata_ro+0xed/0x100                  
> > > [    0.960372]  [<ffffffff815aa7d0>] ? rest_init+0x80/0x80                      
> > > [    0.960869]  [<ffffffff815aa7ed>] kernel_init+0x1d/0xe0                      
> > > [    0.961358]  [<ffffffff815b798f>] ret_from_fork+0x3f/0x70                    
> > > [    0.961900]  [<ffffffff815aa7d0>] ? rest_init+0x80/0x80                      
> > > [    0.962389] ---[ end trace 6125ebcb24c9e3d0 ]---                             
> > > [    0.962822] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found.         
> > >
> > > ---[ High Kernel Mapping ]---                                                   
> > > 0xffffffff80000000-0xffffffff81000000          16M                               pmd
> > > 0xffffffff81000000-0xffffffff81600000           6M     ro         PSE     GLB x  pmd
> > > 0xffffffff81600000-0xffffffff81a00000           4M     ro         PSE     GLB NX pmd
> > > 0xffffffff81a00000-0xffffffff81c00000           2M     RW                 GLB NX pte
> > > 0xffffffff81c00000-0xffffffff82200000           6M     RW         PSE     GLB NX pmd
> > > 0xffffffff82200000-0xffffffff82400000           2M     RW                 GLB NX pte
> > > 0xffffffff82400000-0xffffffffc0000000         988M                               pmd
> > > ---[ Modules ]---                                                               
> > > 0xffffffffc0000000-0xffffffffc0001000           4K     RW                 GLB x  pte
> > > 0xffffffffc0001000-0xffffffffc0002000           4K                               pte
> > > 0xffffffffc0002000-0xffffffffc0039000         220K     RW                 GLB x  pte
> > > 
> > > root@piggy:~# cat /proc/modules | sort -k 6 | head -3                           
> > > scsi_mod 221979 4 sg,sd_mod,sr_mod,libata, Live 0xffffffffc0002000 (E)          
> > > e1000 127757 0 - Live 0xffffffffc004d000 (E)                                    
> > > libata 229931 2 ata_generic,ata_piix, Live 0xffffffffc0076000 (E) 
> > > 
> > > So that 4K RW page seems suspect: perhaps it gets used for allocation on the
> > > edge for some particular reason, and it also happens to sit on the edge of the
> > > high kernel mapping. Could it be a boundary semantics issue?
> > > 
> > > For instance, could it be that since 0xffffffffc0002000 is given by the
> > > allocator to the first module, scsi_mod, and since that address *technically*
> > > sits on the boundary of two ranges, we get a splat?
> > > 
> > > 0xffffffffc0001000-0xffffffffc0002000           4K                               pte
> > > 0xffffffffc0002000-0xffffffffc0039000         220K     RW                 GLB x  pte
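One way to test the boundary theory, sketched under the assumption that ptdump lines look like the two above: if end - start equals the printed size, the end address is exclusive and each line is the half-open interval [start, end), so 0xffffffffc0002000 belongs only to the second range. This only describes how the dump prints ranges; it says nothing about whether the W+X checker itself has an off-by-one. The helper names are illustrative.

```python
import re

RANGE_RE = re.compile(r"(0x[0-9a-f]+)-(0x[0-9a-f]+)\s+(\d+)([KMG])")
UNIT = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_range(line):
    """Return (start, end, printed size in bytes) for one ptdump line."""
    m = RANGE_RE.search(line)
    if m is None:
        raise ValueError("not a ptdump range: %r" % line)
    return int(m.group(1), 16), int(m.group(2), 16), int(m.group(3)) * UNIT[m.group(4)]


def is_half_open(line):
    """True if the printed size equals end - start, i.e. the end address is exclusive."""
    start, end, size = parse_range(line)
    return end - start == size


def contains(line, addr):
    """Membership test treating the range as the half-open interval [start, end)."""
    start, end, _ = parse_range(line)
    return start <= addr < end
```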
> > 
> > Note that on both the latest linux-next and the commit that introduced this,
> > the config and kernel yield only *one* page:
> > 
> > x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found.
> > 
> > I believe this is further indication that my suspicion might be right.
> 
> If the following is a legitimate, if forced, way to query the kernel about
> who owns a page, then perhaps this technique can be used in the future to
> figure out who the hell caused this. Catalin, can you confirm? In this
> case this is perhaps not a leaked page, but I am trying to abuse the
> kmemleak debugfs API to query who allocated the page. Is that fine?
> 
> [    0.916771] WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:235 note_page+0x63c/0x7e0
> [    0.917636] x86/mm: Found insecure W+X mapping at address ffffffffc03d5000/0xffffffffc03d5000
> [    0.918502] Modules linked in:
> [    0.918819] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.11.0-mcgrof-force-config #340
> [    0.919631] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
> [    0.920011] Call Trace:
> [    0.920011]  dump_stack+0x63/0x81
> [    0.920011]  __warn+0xcb/0xf0
> [    0.920011]  warn_slowpath_fmt+0x5a/0x80
> [    0.920011]  note_page+0x63c/0x7e0
> [    0.920011]  ptdump_walk_pgd_level_core+0x3b1/0x460
> [    0.920011]  ? 0xffffffff86c00000
> [    0.920011]  ptdump_walk_pgd_level_checkwx+0x17/0x20
> [    0.920011]  mark_rodata_ro+0xf4/0x100
> [    0.920011]  ? rest_init+0x80/0x80
> [    0.920011]  kernel_init+0x2a/0x100
> [    0.920011]  ret_from_fork+0x2c/0x40
> [    0.925474] ---[ end trace dca00cd779490a2b ]---
> [    0.925959] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found.
> 
> echo dump=0xffffffffc03d5000 > /sys/kernel/debug/kmemleak
> dmesg | tail
> 
> [   49.209565] kmemleak: Object 0xffffffffc03d5000 (size 335):
> [   49.210814] kmemleak:   comm "swapper/0", pid 1, jiffies 4294892440
> [   49.212148] kmemleak:   min_count = 2
> [   49.212852] kmemleak:   count = 0
> [   49.213363] kmemleak:   flags = 0x1
> [   49.213363] kmemleak:   checksum = 0
> [   49.213363] kmemleak:   backtrace:
> [   49.213363]      kmemleak_alloc+0x4a/0xa0
> [   49.213363]      __vmalloc_node_range+0x20a/0x2b0
> [   49.213363]      module_alloc+0x67/0xc0
> [   49.213363]      arch_ftrace_update_trampoline+0xba/0x260
> [   49.213363]      ftrace_startup+0x90/0x210
> [   49.213363]      register_ftrace_function+0x4b/0x60
> [   49.213363]      arm_kprobe+0x84/0xe0
> [   49.213363]      register_kprobe+0x56e/0x5b0
> [   49.213363]      init_test_probes+0x61/0x560
> [   49.213363]      init_kprobes+0x1e3/0x206
> [   49.213363]      do_one_initcall+0x52/0x1a0
> [   49.213363]      kernel_init_freeable+0x178/0x200
> [   49.213363]      kernel_init+0xe/0x100
> [   49.213363]      ret_from_fork+0x2c/0x40
> [   49.213363]      0xffffffffffffffff

Aha! And the winner is:

CONFIG_KPROBES_SANITY_TEST

I can confirm that disabling it on both 4.3.0-rc3 and linux-next next-20170519
avoids the WARN. I can also confirm that 'echo dump=mem-area > /sys/kernel/debug/kmemleak'
yields the same trace on both of these kernels.

So the above kmemleak hack does seem to work for finding out who owns that page.
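The lookup can be scripted. A sketch, assuming the warning text matches the "Found insecure W+X mapping at address ..." format quoted above, that debugfs is mounted at /sys/kernel/debug, and that the kernel is built with kmemleak. It can only name owners of allocations kmemleak tracks, such as the module_alloc() area in the trace above, which went through __vmalloc_node_range() and kmemleak_alloc(). The function names are hypothetical.

```python
import re
from typing import List

# Matches the warning format quoted above; other kernel versions may word it differently.
WX_RE = re.compile(r"Found insecure W\+X mapping at address ([0-9a-f]+)/")
KMEMLEAK = "/sys/kernel/debug/kmemleak"  # assumes debugfs is mounted here


def wx_addresses(dmesg: str) -> List[int]:
    """Collect every address reported by the W+X check in a dmesg dump."""
    return [int(m.group(1), 16) for m in WX_RE.finditer(dmesg)]


def kmemleak_dump_request(addr: int) -> str:
    """The text that 'echo dump=<addr> > /sys/kernel/debug/kmemleak' writes."""
    return "dump=%#x\n" % addr


def request_dump(addr: int, path: str = KMEMLEAK) -> None:
    """Ask kmemleak to print the object at addr (with its backtrace) to the kernel log."""
    # Needs root and a kmemleak-enabled kernel; read the result with dmesg.
    with open(path, "w") as f:
        f.write(kmemleak_dump_request(addr))
```

request_dump() has to run as root; the object details then appear in dmesg, as in the output above.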

Now to figure out how the hell kernel/test_kprobes.c screws around with things.

  Luis
