next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note

All of lore.kernel.org
 help / color / mirror / Atom feed

* next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
@ 2017-05-15 22:06 Luis R. Rodriguez
  2017-05-15 22:15 ` Luis R. Rodriguez
  0 siblings, 1 reply; 21+ messages in thread
From: Luis R. Rodriguez @ 2017-05-15 22:06 UTC (permalink / raw)
  To: Stephen Smalley, Ingo Molnar
  Cc: Andy Lutomirski, Michal Hocko, Andrew Morton, Kees Cook,
	Eric W. Biederman, Mateusz Guzik, mcgrof, linux-kernel

For a few kernel releases now I have managed to trigger the warning added via
commit e1a58320a38dfa ("x86/mm: Warn on W^X mappings", merged upstream since
v4.4) on my KVM qemu x86_64 system. Since I just booted into the shiny new
linux-next tag next-20170515 (based on v4.12-rc1) and this is still triggering
I figured its time to tackle this.

Let me know if this is already known or what can be done to try to fix this.

Using QEMU emulator version 2.7.94 (v2.8.0-rc4-dirty)

I will try updating my distro package for qemu and see if perhaps its this
and for the other odd fork issue I reported [0].

[0] https://lkml.kernel.org/r/CAB=NE6VZXq3y-3pfouYTBUco2Cq2xqoLZrgDFdVx+_=_=SwG_Q@mail.gmail.com

My config:

http://drvbp1.linux-foundation.org/~mcgrof/2017/05/15/configs/piggy-x86_64_qemu_fork_kmemleak.config

The splat:

[    0.911209] x86/mm: Found insecure W+X mapping at address ffffffffc0288000/0xffffffffc0288000
[    0.912066] ------------[ cut here ]------------
[    0.912544] WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
[    0.913381] Modules linked in:
[    0.913672] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc1-next-20170515+ #144
[    0.914434] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.1-0-g8891697-prebuilt.qemu-project.org 04/01/2014
[    0.915595] task: ffff98d43a5eac80 task.stack: ffffad22c0630000
[    0.916174] RIP: 0010:note_page+0x630/0x7e0
[    0.916595] RSP: 0018:ffffad22c0633df0 EFLAGS: 00010286
[    0.917101] RAX: 0000000000000051 RBX: ffffad22c0633e88 RCX: ffffffff91256708
[    0.917805] RDX: 0000000000000000 RSI: 0000000000000096 RDI: 0000000000000246
[    0.918511] RBP: ffffad22c0633e28 R08: 6666666666666678 R09: 0000000000000160
[    0.919214] R10: ffffad22c0633dd8 R11: 3030303838323063 R12: 0000000000000000
[    0.919917] R13: 0000000000000004 R14: 0000000000000000 R15: 0000000000000000
[    0.920615] FS:  0000000000000000(0000) GS:ffff98d43fc00000(0000) knlGS:0000000000000000
[    0.921384] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.921943] CR2: 0000000000000000 CR3: 00000000a3a09000 CR4: 00000000000006f0
[    0.922657] Call Trace:
[    0.922901]  ptdump_walk_pgd_level_core+0x3e7/0x490
[    0.923354]  ? 0xffffffff90600000
[    0.923662]  ptdump_walk_pgd_level_checkwx+0x17/0x20
[    0.924145]  mark_rodata_ro+0xf4/0x100
[    0.924536]  ? rest_init+0x80/0x80
[    0.924862]  kernel_init+0x2f/0x100
[    0.925197]  ret_from_fork+0x2c/0x40
[    0.925552] Code: 48 c7 43 28 00 00 00 00 48 89 43 20 e9 05 fd ff ff 48 8b 73 10 48 c7 c7 c8 34 fe 90 c6 05 c8 eb bc 00 01 48 89 f2 e8 8d fc 11 00 <0f> ff e9 1f fa ff ff 48 8b 70 20 48 c7 c7 05 b1 fe 90 e8 76 fc
[    0.927368] ---[ end trace 97137ae213b9cb25 ]---
[    0.927830] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found.

  Luis

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
  2017-05-15 22:06 next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0 Luis R. Rodriguez
@ 2017-05-15 22:15 ` Luis R. Rodriguez
  2017-05-15 22:57   ` Kees Cook
  2017-05-15 23:30   ` Luis R. Rodriguez
  0 siblings, 2 replies; 21+ messages in thread
From: Luis R. Rodriguez @ 2017-05-15 22:15 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Stephen Smalley, Ingo Molnar, Andy Lutomirski, Michal Hocko,
	Andrew Morton, Kees Cook, Eric W. Biederman, Mateusz Guzik,
	linux-kernel

On Tue, May 16, 2017 at 12:06:50AM +0200, Luis R. Rodriguez wrote:
> Using QEMU emulator version 2.7.94 (v2.8.0-rc4-dirty)
> 
> I will try updating my distro package for qemu and see if perhaps its this
> and for the other odd fork issue I reported [0].
> 
> [0] https://lkml.kernel.org/r/CAB=NE6VZXq3y-3pfouYTBUco2Cq2xqoLZrgDFdVx+_=_=SwG_Q@mail.gmail.com

Yeah nope, using my distribution latest:

QEMU emulator version 2.8.0(openSUSE Tumbleweed)

And still both issues are present.

  Luis

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
  2017-05-15 22:15 ` Luis R. Rodriguez
@ 2017-05-15 22:57   ` Kees Cook
  2017-05-15 23:45     ` Luis R. Rodriguez
  2017-05-15 23:30   ` Luis R. Rodriguez
  1 sibling, 1 reply; 21+ messages in thread
From: Kees Cook @ 2017-05-15 22:57 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Stephen Smalley, Ingo Molnar, Andy Lutomirski, Michal Hocko,
	Andrew Morton, Eric W. Biederman, Mateusz Guzik, LKML

On Mon, May 15, 2017 at 3:15 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> On Tue, May 16, 2017 at 12:06:50AM +0200, Luis R. Rodriguez wrote:
>> Using QEMU emulator version 2.7.94 (v2.8.0-rc4-dirty)
>>
>> I will try updating my distro package for qemu and see if perhaps its this
>> and for the other odd fork issue I reported [0].
>>
>> [0] https://lkml.kernel.org/r/CAB=NE6VZXq3y-3pfouYTBUco2Cq2xqoLZrgDFdVx+_=_=SwG_Q@mail.gmail.com
>
> Yeah nope, using my distribution latest:
>
> QEMU emulator version 2.8.0(openSUSE Tumbleweed)
>
> And still both issues are present.
>
>   Luis

Can you enable CONFIG_X86_PTDUMP=y and then find out what is located
at ffffffffc0288000 via /sys/kernel/debug/kernel_page_tables ?

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
  2017-05-15 22:57   ` Kees Cook
@ 2017-05-15 23:45     ` Luis R. Rodriguez
  2017-05-16  0:12       ` Kees Cook
  0 siblings, 1 reply; 21+ messages in thread
From: Luis R. Rodriguez @ 2017-05-15 23:45 UTC (permalink / raw)
  To: Kees Cook
  Cc: Stephen Smalley, Ingo Molnar, Andy Lutomirski, Michal Hocko,
	Andrew Morton, Eric W. Biederman, Mateusz Guzik, LKML

On Mon, May 15, 2017 at 3:57 PM, Kees Cook <keescook@chromium.org> wrote:
> On Mon, May 15, 2017 at 3:15 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
>> On Tue, May 16, 2017 at 12:06:50AM +0200, Luis R. Rodriguez wrote:
>>> Using QEMU emulator version 2.7.94 (v2.8.0-rc4-dirty)
>>>
>>> I will try updating my distro package for qemu and see if perhaps its this
>>> and for the other odd fork issue I reported [0].
>>>
>>> [0] https://lkml.kernel.org/r/CAB=NE6VZXq3y-3pfouYTBUco2Cq2xqoLZrgDFdVx+_=_=SwG_Q@mail.gmail.com
>>
>> Yeah nope, using my distribution latest:
>>
>> QEMU emulator version 2.8.0(openSUSE Tumbleweed)
>>
>> And still both issues are present.
>>
>>   Luis
>
> Can you enable CONFIG_X86_PTDUMP=y and then find out what is located
> at ffffffffc0288000 via /sys/kernel/debug/kernel_page_tables ?

Sure thing.

Recompiled with this enabled, new warning:

[    0.891559] x86/mm: Found insecure W+X mapping at address
ffffffffc00e4000/0xffffffffc00e4000
[    0.892394] ------------[ cut here ]------------
[    0.892834] WARNING: CPU: 0 PID: 1 at
arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
[    0.893674] Modules linked in:
[    0.893972] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
4.12.0-rc1-next-20170515+ #145
[    0.894687] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
[    0.895828] task: ffff8ed7fa5ccc80 task.stack: ffffae3900630000
[    0.896403] RIP: 0010:note_page+0x630/0x7e0
[    0.896780] RSP: 0018:ffffae3900633df0 EFLAGS: 00010286
[    0.897271] RAX: 0000000000000051 RBX: ffffae3900633e88 RCX: ffffffff9b456708
[    0.897940] RDX: 0000000000000000 RSI: 0000000000000096 RDI: 0000000000000246
[    0.898624] RBP: ffffae3900633e28 R08: 203a6d6d2f363878 R09: 0000000000000165
[    0.899314] R10: ffffae3900633dd8 R11: 736e6920646e756f R12: 0000000000000000
[    0.899987] R13: 0000000000000004 R14: 0000000000000000 R15: 0000000000000000
[    0.900629] FS:  0000000000000000(0000) GS:ffff8ed7ffc00000(0000)
knlGS:0000000000000000
[    0.901398] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.901908] CR2: 0000000000000000 CR3: 0000000118009000 CR4: 00000000000006f0
[    0.902590] Call Trace:
[    0.902827]  ptdump_walk_pgd_level_core+0x3e7/0x490
[    0.903274]  ? 0xffffffff9a800000
[    0.903595]  ptdump_walk_pgd_level_checkwx+0x17/0x20
[    0.904064]  mark_rodata_ro+0xf4/0x100
[    0.904423]  ? rest_init+0x80/0x80
[    0.904744]  kernel_init+0x2f/0x100
[    0.905068]  ret_from_fork+0x2c/0x40
[    0.905393] Code: 48 c7 43 28 00 00 00 00 48 89 43 20 e9 05 fd ff
ff 48 8b 73 10 48 c7 c7 28 36 1e 9b c6 05 c8 eb bc 00 01 48 89 f2 e8
cd fc 11 00 <0f> ff e9 1f fa ff ff 48 8b 70 20 48 c7 c7 65 b2 1e 9b e8
b6 fc
[    0.907173] ---[ end trace 878b39cb0c248e66 ]---
[    0.907655] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found.

And ffffffffc00e4000 is:

---[ Modules ]---
0xffffffffc0000000-0xffffffffc00e4000         912K
          pte
0xffffffffc00e4000-0xffffffffc00e5000           4K     RW
   GLB x  pte

In case someone needs the full /sys/kernel/debug/kernel_page_tables file:

http://drvbp1.linux-foundation.org/~mcgrof/2017/05/15/kernel_page_tables/piggy-4.12.0-rc1-next-20170515-page-tables.txt

 Luis

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
  2017-05-15 23:45     ` Luis R. Rodriguez
@ 2017-05-16  0:12       ` Kees Cook
  2017-05-17 16:40         ` Luis R. Rodriguez
  0 siblings, 1 reply; 21+ messages in thread
From: Kees Cook @ 2017-05-16  0:12 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Stephen Smalley, Ingo Molnar, Andy Lutomirski, Michal Hocko,
	Andrew Morton, Eric W. Biederman, Mateusz Guzik, LKML

On Mon, May 15, 2017 at 4:45 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> On Mon, May 15, 2017 at 3:57 PM, Kees Cook <keescook@chromium.org> wrote:
>> On Mon, May 15, 2017 at 3:15 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
>>> On Tue, May 16, 2017 at 12:06:50AM +0200, Luis R. Rodriguez wrote:
>>>> Using QEMU emulator version 2.7.94 (v2.8.0-rc4-dirty)
>>>>
>>>> I will try updating my distro package for qemu and see if perhaps its this
>>>> and for the other odd fork issue I reported [0].
>>>>
>>>> [0] https://lkml.kernel.org/r/CAB=NE6VZXq3y-3pfouYTBUco2Cq2xqoLZrgDFdVx+_=_=SwG_Q@mail.gmail.com
>>>
>>> Yeah nope, using my distribution latest:
>>>
>>> QEMU emulator version 2.8.0(openSUSE Tumbleweed)
>>>
>>> And still both issues are present.
>>>
>>>   Luis
>>
>> Can you enable CONFIG_X86_PTDUMP=y and then find out what is located
>> at ffffffffc0288000 via /sys/kernel/debug/kernel_page_tables ?
>
> Sure thing.
>
> Recompiled with this enabled, new warning:
>
> [    0.891559] x86/mm: Found insecure W+X mapping at address
> ffffffffc00e4000/0xffffffffc00e4000
> [    0.892394] ------------[ cut here ]------------
> [    0.892834] WARNING: CPU: 0 PID: 1 at
> arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
> [    0.893674] Modules linked in:
> [    0.893972] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
> 4.12.0-rc1-next-20170515+ #145
> [    0.894687] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
> [    0.895828] task: ffff8ed7fa5ccc80 task.stack: ffffae3900630000
> [    0.896403] RIP: 0010:note_page+0x630/0x7e0
> [    0.896780] RSP: 0018:ffffae3900633df0 EFLAGS: 00010286
> [    0.897271] RAX: 0000000000000051 RBX: ffffae3900633e88 RCX: ffffffff9b456708
> [    0.897940] RDX: 0000000000000000 RSI: 0000000000000096 RDI: 0000000000000246
> [    0.898624] RBP: ffffae3900633e28 R08: 203a6d6d2f363878 R09: 0000000000000165
> [    0.899314] R10: ffffae3900633dd8 R11: 736e6920646e756f R12: 0000000000000000
> [    0.899987] R13: 0000000000000004 R14: 0000000000000000 R15: 0000000000000000
> [    0.900629] FS:  0000000000000000(0000) GS:ffff8ed7ffc00000(0000)
> knlGS:0000000000000000
> [    0.901398] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    0.901908] CR2: 0000000000000000 CR3: 0000000118009000 CR4: 00000000000006f0
> [    0.902590] Call Trace:
> [    0.902827]  ptdump_walk_pgd_level_core+0x3e7/0x490
> [    0.903274]  ? 0xffffffff9a800000
> [    0.903595]  ptdump_walk_pgd_level_checkwx+0x17/0x20
> [    0.904064]  mark_rodata_ro+0xf4/0x100
> [    0.904423]  ? rest_init+0x80/0x80
> [    0.904744]  kernel_init+0x2f/0x100
> [    0.905068]  ret_from_fork+0x2c/0x40
> [    0.905393] Code: 48 c7 43 28 00 00 00 00 48 89 43 20 e9 05 fd ff
> ff 48 8b 73 10 48 c7 c7 28 36 1e 9b c6 05 c8 eb bc 00 01 48 89 f2 e8
> cd fc 11 00 <0f> ff e9 1f fa ff ff 48 8b 70 20 48 c7 c7 65 b2 1e 9b e8
> b6 fc
> [    0.907173] ---[ end trace 878b39cb0c248e66 ]---
> [    0.907655] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found.
>
> And ffffffffc00e4000 is:
>
> ---[ Modules ]---
> 0xffffffffc0000000-0xffffffffc00e4000         912K
>           pte
> 0xffffffffc00e4000-0xffffffffc00e5000           4K     RW
>    GLB x  pte
>
> In case someone needs the full /sys/kernel/debug/kernel_page_tables file:
>
> http://drvbp1.linux-foundation.org/~mcgrof/2017/05/15/kernel_page_tables/piggy-4.12.0-rc1-next-20170515-page-tables.txt

---[ Modules ]---
0xffffffffc0000000-0xffffffffc00e4000         912K
          pte

This should be the modules ASLR gap

0xffffffffc00e4000-0xffffffffc00e5000           4K     RW
   GLB x  pte

This is part of the same gap, but it's RW+x strangely?

0xffffffffc00e5000-0xffffffffc00e6000           4K
          pte

This is more of the gap?

0xffffffffc00e6000-0xffffffffc00fa000          80K     ro
   GLB x  pte
0xffffffffc00fa000-0xffffffffc010c000          72K     ro
   GLB NX pte
0xffffffffc010c000-0xffffffffc011b000          60K     RW
   GLB NX pte

This should be the first loaded module. Can you check that
0xffffffffc00e6000 matches the first module in /proc/modules?

Something touched the module gap and left is RW+x...

Are you able to bisect this?

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
  2017-05-16  0:12       ` Kees Cook
@ 2017-05-17 16:40         ` Luis R. Rodriguez
  2017-05-17 17:53           ` Kees Cook
  0 siblings, 1 reply; 21+ messages in thread
From: Luis R. Rodriguez @ 2017-05-17 16:40 UTC (permalink / raw)
  To: Kees Cook
  Cc: Luis R. Rodriguez, Stephen Smalley, Ingo Molnar, Andy Lutomirski,
	Michal Hocko, Andrew Morton, Eric W. Biederman, Mateusz Guzik,
	LKML

On Mon, May 15, 2017 at 05:12:18PM -0700, Kees Cook wrote:
> On Mon, May 15, 2017 at 4:45 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > On Mon, May 15, 2017 at 3:57 PM, Kees Cook <keescook@chromium.org> wrote:
> >> On Mon, May 15, 2017 at 3:15 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> >>> On Tue, May 16, 2017 at 12:06:50AM +0200, Luis R. Rodriguez wrote:
> >>>> Using QEMU emulator version 2.7.94 (v2.8.0-rc4-dirty)
> >>>>
> >>>> I will try updating my distro package for qemu and see if perhaps its this
> >>>> and for the other odd fork issue I reported [0].
> >>>>
> >>>> [0] https://lkml.kernel.org/r/CAB=NE6VZXq3y-3pfouYTBUco2Cq2xqoLZrgDFdVx+_=_=SwG_Q@mail.gmail.com
> >>>
> >>> Yeah nope, using my distribution latest:
> >>>
> >>> QEMU emulator version 2.8.0(openSUSE Tumbleweed)
> >>>
> >>> And still both issues are present.
> >>>
> >>>   Luis
> >>
> >> Can you enable CONFIG_X86_PTDUMP=y and then find out what is located
> >> at ffffffffc0288000 via /sys/kernel/debug/kernel_page_tables ?
> >
> > Sure thing.
> >
> > Recompiled with this enabled, new warning:
> >
> > [    0.891559] x86/mm: Found insecure W+X mapping at address
> > ffffffffc00e4000/0xffffffffc00e4000
> > [    0.892394] ------------[ cut here ]------------
> > [    0.892834] WARNING: CPU: 0 PID: 1 at
> > arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
> > [    0.893674] Modules linked in:
> > [    0.893972] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
> > 4.12.0-rc1-next-20170515+ #145
> > [    0.894687] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> > BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
> > [    0.895828] task: ffff8ed7fa5ccc80 task.stack: ffffae3900630000
> > [    0.896403] RIP: 0010:note_page+0x630/0x7e0
> > [    0.896780] RSP: 0018:ffffae3900633df0 EFLAGS: 00010286
> > [    0.897271] RAX: 0000000000000051 RBX: ffffae3900633e88 RCX: ffffffff9b456708
> > [    0.897940] RDX: 0000000000000000 RSI: 0000000000000096 RDI: 0000000000000246
> > [    0.898624] RBP: ffffae3900633e28 R08: 203a6d6d2f363878 R09: 0000000000000165
> > [    0.899314] R10: ffffae3900633dd8 R11: 736e6920646e756f R12: 0000000000000000
> > [    0.899987] R13: 0000000000000004 R14: 0000000000000000 R15: 0000000000000000
> > [    0.900629] FS:  0000000000000000(0000) GS:ffff8ed7ffc00000(0000)
> > knlGS:0000000000000000
> > [    0.901398] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [    0.901908] CR2: 0000000000000000 CR3: 0000000118009000 CR4: 00000000000006f0
> > [    0.902590] Call Trace:
> > [    0.902827]  ptdump_walk_pgd_level_core+0x3e7/0x490
> > [    0.903274]  ? 0xffffffff9a800000
> > [    0.903595]  ptdump_walk_pgd_level_checkwx+0x17/0x20
> > [    0.904064]  mark_rodata_ro+0xf4/0x100
> > [    0.904423]  ? rest_init+0x80/0x80
> > [    0.904744]  kernel_init+0x2f/0x100
> > [    0.905068]  ret_from_fork+0x2c/0x40
> > [    0.905393] Code: 48 c7 43 28 00 00 00 00 48 89 43 20 e9 05 fd ff
> > ff 48 8b 73 10 48 c7 c7 28 36 1e 9b c6 05 c8 eb bc 00 01 48 89 f2 e8
> > cd fc 11 00 <0f> ff e9 1f fa ff ff 48 8b 70 20 48 c7 c7 65 b2 1e 9b e8
> > b6 fc
> > [    0.907173] ---[ end trace 878b39cb0c248e66 ]---
> > [    0.907655] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found.
> >
> > And ffffffffc00e4000 is:
> >
> > ---[ Modules ]---
> > 0xffffffffc0000000-0xffffffffc00e4000         912K
> >           pte
> > 0xffffffffc00e4000-0xffffffffc00e5000           4K     RW
> >    GLB x  pte
> >
> > In case someone needs the full /sys/kernel/debug/kernel_page_tables file:
> >
> > http://drvbp1.linux-foundation.org/~mcgrof/2017/05/15/kernel_page_tables/piggy-4.12.0-rc1-next-20170515-page-tables.txt
> 
> ---[ Modules ]---
> 0xffffffffc0000000-0xffffffffc00e4000         912K
>           pte
> 
> This should be the modules ASLR gap
> 
> 0xffffffffc00e4000-0xffffffffc00e5000           4K     RW
>    GLB x  pte
> 
> This is part of the same gap, but it's RW+x strangely?
> 
> 0xffffffffc00e5000-0xffffffffc00e6000           4K
>           pte
> 
> This is more of the gap?
> 
> 0xffffffffc00e6000-0xffffffffc00fa000          80K     ro
>    GLB x  pte
> 0xffffffffc00fa000-0xffffffffc010c000          72K     ro
>    GLB NX pte
> 0xffffffffc010c000-0xffffffffc011b000          60K     RW
>    GLB NX pte
> 
> This should be the first loaded module. Can you check that
> 0xffffffffc00e6000 matches the first module in /proc/modules?

Yes, but I had killed that boot session again, so upon my next boot
I had a different layout, the ASLR gap was much larger:

---[ Modules ]---
0xffffffffc0000000-0xffffffffc01b0000        1728K                               pte
0xffffffffc01b0000-0xffffffffc01b1000           4K     RW                 GLB x  pte
0xffffffffc01b1000-0xffffffffc01b2000           4K                               pte
0xffffffffc01b2000-0xffffffffc01c6000          80K     ro                 GLB x  pte
0xffffffffc01c6000-0xffffffffc01cc000          24K     ro                 GLB NX pte
0xffffffffc01cc000-0xffffffffc01d5000          36K     RW                 GLB NX pte

As you can guess if we follow similar pattern the RW hole is the one this boot
warned about:

[    1.450483] x86/mm: Found insecure W+X mapping at address ffffffffc01b0000/0xffffffffc01b0000
[    1.451280] ------------[ cut here ]------------
[    1.451721] WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
[    1.452499] Modules linked in:
[    1.452791] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc1-next-20170515+ #145

I checked and indeed 0xffffffffc01b2000 is part of a module, it was not the first one
on the /proc/modules list but then again /proc/modules does not seem to have a specific
order other than perhaps being pegged into a linked list of modules once they go live,
and it seems its typically output backwards from when that happened, sorting that
by address we get:

root@piggy:~# cat /proc/modules | sort -k 6 | head -3
e1000 143360 0 - Live 0xffffffffc01b2000 (E)
mbcache 16384 1 ext4, Live 0xffffffffc01d6000 (E)
scsi_mod 217088 4 sg,sr_mod,sd_mod,libata, Live 0xffffffffc01df000 (E)

And this then seems to be the first module loaded:

e1000 143360 0 - Live 0xffffffffc01b2000 (E)

The output of dmesg seems to confirm this as per the list of modules sorted
as per above.

> Something touched the module gap and left is RW+x...

Lemme try booting with e1000 renamed to e1000.ko.ignore and see how that goes.

> Are you able to bisect this?

This issue has been present for a while so since I recall this I might be
able to reduce the number of needed target kernels to bisect. Lemme tinker
a bit and if no clear culprit comes up then will try bisect.

  Luis

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
  2017-05-17 16:40         ` Luis R. Rodriguez
@ 2017-05-17 17:53           ` Kees Cook
  2017-05-19  0:44             ` Luis R. Rodriguez
  0 siblings, 1 reply; 21+ messages in thread
From: Kees Cook @ 2017-05-17 17:53 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Stephen Smalley, Ingo Molnar, Andy Lutomirski, Michal Hocko,
	Andrew Morton, Eric W. Biederman, Mateusz Guzik, LKML

On Wed, May 17, 2017 at 9:40 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> Yes, but I had killed that boot session again, so upon my next boot
> I had a different layout, the ASLR gap was much larger:
>
> ---[ Modules ]---
> 0xffffffffc0000000-0xffffffffc01b0000        1728K                               pte
> 0xffffffffc01b0000-0xffffffffc01b1000           4K     RW                 GLB x  pte
> 0xffffffffc01b1000-0xffffffffc01b2000           4K                               pte
> 0xffffffffc01b2000-0xffffffffc01c6000          80K     ro                 GLB x  pte
> 0xffffffffc01c6000-0xffffffffc01cc000          24K     ro                 GLB NX pte
> 0xffffffffc01cc000-0xffffffffc01d5000          36K     RW                 GLB NX pte
>
> As you can guess if we follow similar pattern the RW hole is the one this boot
> warned about:
>
> [    1.450483] x86/mm: Found insecure W+X mapping at address ffffffffc01b0000/0xffffffffc01b0000
> [    1.451280] ------------[ cut here ]------------
> [    1.451721] WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
> [    1.452499] Modules linked in:
> [    1.452791] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc1-next-20170515+ #145
>
> I checked and indeed 0xffffffffc01b2000 is part of a module, it was not the first one
> on the /proc/modules list but then again /proc/modules does not seem to have a specific
> order other than perhaps being pegged into a linked list of modules once they go live,
> and it seems its typically output backwards from when that happened, sorting that
> by address we get:

Right, sorry, I'd expect it at the bottom of the list in
/proc/modules, but that's fine, it's there.

>
> root@piggy:~# cat /proc/modules | sort -k 6 | head -3
> e1000 143360 0 - Live 0xffffffffc01b2000 (E)
> mbcache 16384 1 ext4, Live 0xffffffffc01d6000 (E)
> scsi_mod 217088 4 sg,sr_mod,sd_mod,libata, Live 0xffffffffc01df000 (E)
>
> And this then seems to be the first module loaded:
>
> e1000 143360 0 - Live 0xffffffffc01b2000 (E)
>
> The output of dmesg seems to confirm this as per the list of modules sorted
> as per above.
>
>> Something touched the module gap and left is RW+x...
>
> Lemme try booting with e1000 renamed to e1000.ko.ignore and see how that goes.

Is it possible a module got loaded before e1000 and then unloaded?
That seems odd, but maybe unload isn't cleaning up?

>> Are you able to bisect this?
>
> This issue has been present for a while so since I recall this I might be
> able to reduce the number of needed target kernels to bisect. Lemme tinker
> a bit and if no clear culprit comes up then will try bisect.

Okay, thanks!

-Kees


-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
  2017-05-17 17:53           ` Kees Cook
@ 2017-05-19  0:44             ` Luis R. Rodriguez
  2017-05-19  3:08               ` Luis R. Rodriguez
  0 siblings, 1 reply; 21+ messages in thread
From: Luis R. Rodriguez @ 2017-05-19  0:44 UTC (permalink / raw)
  To: Kees Cook
  Cc: Luis R. Rodriguez, Stephen Smalley, Ingo Molnar, Andy Lutomirski,
	Michal Hocko, Vlastimil Babka, Andrew Morton, Eric W. Biederman,
	Mateusz Guzik, LKML

On Wed, May 17, 2017 at 10:53:06AM -0700, Kees Cook wrote:
> On Wed, May 17, 2017 at 9:40 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > Yes, but I had killed that boot session again, so upon my next boot
> > I had a different layout, the ASLR gap was much larger:
> >
> > ---[ Modules ]---
> > 0xffffffffc0000000-0xffffffffc01b0000        1728K                               pte
> > 0xffffffffc01b0000-0xffffffffc01b1000           4K     RW                 GLB x  pte
> > 0xffffffffc01b1000-0xffffffffc01b2000           4K                               pte
> > 0xffffffffc01b2000-0xffffffffc01c6000          80K     ro                 GLB x  pte
> > 0xffffffffc01c6000-0xffffffffc01cc000          24K     ro                 GLB NX pte
> > 0xffffffffc01cc000-0xffffffffc01d5000          36K     RW                 GLB NX pte
> >
> > As you can guess if we follow similar pattern the RW hole is the one this boot
> > warned about:
> >
> > [    1.450483] x86/mm: Found insecure W+X mapping at address ffffffffc01b0000/0xffffffffc01b0000
> > [    1.451280] ------------[ cut here ]------------
> > [    1.451721] WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
> > [    1.452499] Modules linked in:
> > [    1.452791] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc1-next-20170515+ #145
> >
> > I checked and indeed 0xffffffffc01b2000 is part of a module, it was not the first one
> > on the /proc/modules list but then again /proc/modules does not seem to have a specific
> > order other than perhaps being pegged into a linked list of modules once they go live,
> > and it seems its typically output backwards from when that happened, sorting that
> > by address we get:
> 
> Right, sorry, I'd expect it at the bottom of the list in
> /proc/modules, but that's fine, it's there.
> 
> >
> > root@piggy:~# cat /proc/modules | sort -k 6 | head -3
> > e1000 143360 0 - Live 0xffffffffc01b2000 (E)
> > mbcache 16384 1 ext4, Live 0xffffffffc01d6000 (E)
> > scsi_mod 217088 4 sg,sr_mod,sd_mod,libata, Live 0xffffffffc01df000 (E)
> >
> > And this then seems to be the first module loaded:
> >
> > e1000 143360 0 - Live 0xffffffffc01b2000 (E)
> >
> > The output of dmesg seems to confirm this as per the list of modules sorted
> > as per above.
> >
> >> Something touched the module gap and left is RW+x...
> >
> > Lemme try booting with e1000 renamed to e1000.ko.ignore and see how that goes.
> 
> Is it possible a module got loaded before e1000 and then unloaded?
> That seems odd, but maybe unload isn't cleaning up?
> 
> >> Are you able to bisect this?
> >
> > This issue has been present for a while so since I recall this I might be
> > able to reduce the number of needed target kernels to bisect. Lemme tinker
> > a bit and if no clear culprit comes up then will try bisect.
> 
> Okay, thanks!

Sorry to report that this issue is present since the feature's addition. So
the issue is there since its addition and is still present today. *But* it
may also be a configuration issue, given I have booted this guest *without*
this issue ...

So:

git checkout -b WX e1a58320a38dfa72be48a0f1a3a92273663ba6db

That boots with the warning. To help debug further I've minimized my modules
to only a few: scsi_mod, e1000, libata.

I suspect at this point this is not the fault of a particular module but
instead just an accounting semantic (>= or <= on an edge) but let's see.

I now boot on 4.3.0-rc3 on commit (e1a58320a38df ("x86/mm: Warn on W^X
mappings") and I with:

[    0.949435] ------------[ cut here ]------------                             
[    0.949992] WARNING: CPU: 2 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page+0x635/0x7e0()
[    0.950996] x86/mm: Found insecure W+X mapping at address ffffffffc0000000/0xffffffffc0000000
[    0.951814] Modules linked in:                                               
[    0.952123] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.3.0-rc3-FINAL-TEST-WITH-WX-NOFLOPPY+ #365
[    0.952929] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
[    0.954033]  0000000000000000 000000001f722925 ffff88013a5d7d40 ffffffff812ff335
[    0.954742]  ffff88013a5d7d88 ffff88013a5d7d78 ffffffff81079be2 ffff88013a5d7e90
[    0.955522]  0000000000000000 0000000000000004 0000000000000000 0000000000000000
[    0.956256] Call Trace:                                                      
[    0.956496]  [<ffffffff812ff335>] dump_stack+0x44/0x5f                       
[    0.956953]  [<ffffffff81079be2>] warn_slowpath_common+0x82/0xc0             
[    0.957519]  [<ffffffff81079c7c>] warn_slowpath_fmt+0x5c/0x80                
[    0.958066]  [<ffffffff8106c155>] note_page+0x635/0x7e0                      
[    0.958595]  [<ffffffff8106c5eb>] ptdump_walk_pgd_level_core+0x2eb/0x410     
[    0.959219]  [<ffffffff8106c7b7>] ptdump_walk_pgd_level_checkwx+0x17/0x20    
[    0.959856]  [<ffffffff8106260d>] mark_rodata_ro+0xed/0x100                  
[    0.960372]  [<ffffffff815aa7d0>] ? rest_init+0x80/0x80                      
[    0.960869]  [<ffffffff815aa7ed>] kernel_init+0x1d/0xe0                      
[    0.961358]  [<ffffffff815b798f>] ret_from_fork+0x3f/0x70                    
[    0.961900]  [<ffffffff815aa7d0>] ? rest_init+0x80/0x80                      
[    0.962389] ---[ end trace 6125ebcb24c9e3d0 ]---                             
[    0.962822] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found.         
                                                                                
                                                                                
---[ High Kernel Mapping ]---                                                   
0xffffffff80000000-0xffffffff81000000          16M                               pmd
0xffffffff81000000-0xffffffff81600000           6M     ro         PSE     GLB x  pmd
0xffffffff81600000-0xffffffff81a00000           4M     ro         PSE     GLB NX pmd
0xffffffff81a00000-0xffffffff81c00000           2M     RW                 GLB NX pte
0xffffffff81c00000-0xffffffff82200000           6M     RW         PSE     GLB NX pmd
0xffffffff82200000-0xffffffff82400000           2M     RW                 GLB NX pte
0xffffffff82400000-0xffffffffc0000000         988M                               pmd
---[ Modules ]---                                                               
0xffffffffc0000000-0xffffffffc0001000           4K     RW                 GLB x  pte
0xffffffffc0001000-0xffffffffc0002000           4K                               pte
0xffffffffc0002000-0xffffffffc0039000         220K     RW                 GLB x  pte

root@piggy:~# cat /proc/modules | sort -k 6 | head -3                           
scsi_mod 221979 4 sg,sd_mod,sr_mod,libata, Live 0xffffffffc0002000 (E)          
e1000 127757 0 - Live 0xffffffffc004d000 (E)                                    
libata 229931 2 ata_generic,ata_piix, Live 0xffffffffc0076000 (E) 

So that 4K RW seems suspect of getting used for allocation purpose on edge
for a particular reason and it also happens to be on the edge of the high
kernel mapping. Could it be the boundary semantic issue ?

For instance can it be that since 0xffffffffc0002000 is given to the first
module by the allocator, scsi_mod, and since that address is *technically*
part of two boundaries we get a splat ?

0xffffffffc0001000-0xffffffffc0002000           4K                               pte
0xffffffffc0002000-0xffffffffc0039000         220K     RW                 GLB x  pte

  Luis

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
  2017-05-19  0:44             ` Luis R. Rodriguez
@ 2017-05-19  3:08               ` Luis R. Rodriguez
  2017-05-19 15:40                 ` Luis R. Rodriguez
  0 siblings, 1 reply; 21+ messages in thread
From: Luis R. Rodriguez @ 2017-05-19  3:08 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Kees Cook, Stephen Smalley, Ingo Molnar, Andy Lutomirski,
	Michal Hocko, Vlastimil Babka, Andrew Morton, Eric W. Biederman,
	Mateusz Guzik, LKML

On Fri, May 19, 2017 at 02:44:14AM +0200, Luis R. Rodriguez wrote:
> On Wed, May 17, 2017 at 10:53:06AM -0700, Kees Cook wrote:
> > On Wed, May 17, 2017 at 9:40 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > > Yes, but I had killed that boot session again, so upon my next boot
> > > I had a different layout, the ASLR gap was much larger:
> > >
> > > ---[ Modules ]---
> > > 0xffffffffc0000000-0xffffffffc01b0000        1728K                               pte
> > > 0xffffffffc01b0000-0xffffffffc01b1000           4K     RW                 GLB x  pte
> > > 0xffffffffc01b1000-0xffffffffc01b2000           4K                               pte
> > > 0xffffffffc01b2000-0xffffffffc01c6000          80K     ro                 GLB x  pte
> > > 0xffffffffc01c6000-0xffffffffc01cc000          24K     ro                 GLB NX pte
> > > 0xffffffffc01cc000-0xffffffffc01d5000          36K     RW                 GLB NX pte
> > >
> > > As you can guess if we follow similar pattern the RW hole is the one this boot
> > > warned about:
> > >
> > > [    1.450483] x86/mm: Found insecure W+X mapping at address ffffffffc01b0000/0xffffffffc01b0000
> > > [    1.451280] ------------[ cut here ]------------
> > > [    1.451721] WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
> > > [    1.452499] Modules linked in:
> > > [    1.452791] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc1-next-20170515+ #145
> > >
> > > I checked and indeed 0xffffffffc01b2000 is part of a module, it was not the first one
> > > on the /proc/modules list but then again /proc/modules does not seem to have a specific
> > > order other than perhaps being pegged into a linked list of modules once they go live,
> > > and it seems its typically output backwards from when that happened, sorting that
> > > by address we get:
> > 
> > Right, sorry, I'd expect it at the bottom of the list in
> > /proc/modules, but that's fine, it's there.
> > 
> > >
> > > root@piggy:~# cat /proc/modules | sort -k 6 | head -3
> > > e1000 143360 0 - Live 0xffffffffc01b2000 (E)
> > > mbcache 16384 1 ext4, Live 0xffffffffc01d6000 (E)
> > > scsi_mod 217088 4 sg,sr_mod,sd_mod,libata, Live 0xffffffffc01df000 (E)
> > >
> > > And this then seems to be the first module loaded:
> > >
> > > e1000 143360 0 - Live 0xffffffffc01b2000 (E)
> > >
> > > The output of dmesg seems to confirm this as per the list of modules sorted
> > > as per above.
> > >
> > >> Something touched the module gap and left is RW+x...
> > >
> > > Lemme try booting with e1000 renamed to e1000.ko.ignore and see how that goes.
> > 
> > Is it possible a module got loaded before e1000 and then unloaded?
> > That seems odd, but maybe unload isn't cleaning up?
> > 
> > >> Are you able to bisect this?
> > >
> > > This issue has been present for a while so since I recall this I might be
> > > able to reduce the number of needed target kernels to bisect. Lemme tinker
> > > a bit and if no clear culprit comes up then will try bisect.
> > 
> > Okay, thanks!
> 
> Sorry to report that this issue is present since the feature's addition. So
> the issue is there since its addition and is still present today. *But* it
> may also be a configuration issue, given I have booted this guest *without*
> this issue ...
> 
> So:
> 
> git checkout -b WX e1a58320a38dfa72be48a0f1a3a92273663ba6db
> 
> That boots with the warning. To help debug further I've minimized my modules
> to only a few: scsi_mod, e1000, libata.
> 
> I suspect at this point this is not the fault of a particular module but
> instead just an accounting semantic (>= or <= on an edge) but let's see.
> 
> I now boot on 4.3.0-rc3 on commit (e1a58320a38df ("x86/mm: Warn on W^X
> mappings") and I with:
> 
> [    0.949435] ------------[ cut here ]------------                             
> [    0.949992] WARNING: CPU: 2 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page+0x635/0x7e0()
> [    0.950996] x86/mm: Found insecure W+X mapping at address ffffffffc0000000/0xffffffffc0000000
> [    0.951814] Modules linked in:                                               
> [    0.952123] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.3.0-rc3-FINAL-TEST-WITH-WX-NOFLOPPY+ #365
> [    0.952929] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
> [    0.954033]  0000000000000000 000000001f722925 ffff88013a5d7d40 ffffffff812ff335
> [    0.954742]  ffff88013a5d7d88 ffff88013a5d7d78 ffffffff81079be2 ffff88013a5d7e90
> [    0.955522]  0000000000000000 0000000000000004 0000000000000000 0000000000000000
> [    0.956256] Call Trace:                                                      
> [    0.956496]  [<ffffffff812ff335>] dump_stack+0x44/0x5f                       
> [    0.956953]  [<ffffffff81079be2>] warn_slowpath_common+0x82/0xc0             
> [    0.957519]  [<ffffffff81079c7c>] warn_slowpath_fmt+0x5c/0x80                
> [    0.958066]  [<ffffffff8106c155>] note_page+0x635/0x7e0                      
> [    0.958595]  [<ffffffff8106c5eb>] ptdump_walk_pgd_level_core+0x2eb/0x410     
> [    0.959219]  [<ffffffff8106c7b7>] ptdump_walk_pgd_level_checkwx+0x17/0x20    
> [    0.959856]  [<ffffffff8106260d>] mark_rodata_ro+0xed/0x100                  
> [    0.960372]  [<ffffffff815aa7d0>] ? rest_init+0x80/0x80                      
> [    0.960869]  [<ffffffff815aa7ed>] kernel_init+0x1d/0xe0                      
> [    0.961358]  [<ffffffff815b798f>] ret_from_fork+0x3f/0x70                    
> [    0.961900]  [<ffffffff815aa7d0>] ? rest_init+0x80/0x80                      
> [    0.962389] ---[ end trace 6125ebcb24c9e3d0 ]---                             
> [    0.962822] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found.         
>                                                                                 
>                                                                                 
> ---[ High Kernel Mapping ]---                                                   
> 0xffffffff80000000-0xffffffff81000000          16M                               pmd
> 0xffffffff81000000-0xffffffff81600000           6M     ro         PSE     GLB x  pmd
> 0xffffffff81600000-0xffffffff81a00000           4M     ro         PSE     GLB NX pmd
> 0xffffffff81a00000-0xffffffff81c00000           2M     RW                 GLB NX pte
> 0xffffffff81c00000-0xffffffff82200000           6M     RW         PSE     GLB NX pmd
> 0xffffffff82200000-0xffffffff82400000           2M     RW                 GLB NX pte
> 0xffffffff82400000-0xffffffffc0000000         988M                               pmd
> ---[ Modules ]---                                                               
> 0xffffffffc0000000-0xffffffffc0001000           4K     RW                 GLB x  pte
> 0xffffffffc0001000-0xffffffffc0002000           4K                               pte
> 0xffffffffc0002000-0xffffffffc0039000         220K     RW                 GLB x  pte
> 
> root@piggy:~# cat /proc/modules | sort -k 6 | head -3                           
> scsi_mod 221979 4 sg,sd_mod,sr_mod,libata, Live 0xffffffffc0002000 (E)          
> e1000 127757 0 - Live 0xffffffffc004d000 (E)                                    
> libata 229931 2 ata_generic,ata_piix, Live 0xffffffffc0076000 (E) 
> 
> So that 4K RW seems suspect of getting used for allocation purpose on edge
> for a particular reason and it also happens to be on the edge of the high
> kernel mapping. Could it be the boundary semantic issue ?
> 
> For instance can it be that since 0xffffffffc0002000 is given to the first
> module by the allocator, scsi_mod, and since that address is *technically*
> part of two boundaries we get a splat ?
> 
> 0xffffffffc0001000-0xffffffffc0002000           4K                               pte
> 0xffffffffc0002000-0xffffffffc0039000         220K     RW                 GLB x  pte

Note on the latest linux-next and on the commit that introduced this the config
and kernel yields only *one* page:

x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found.

I believe this is more indications my suspicion might be right.

  Luis

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
  2017-05-19  3:08               ` Luis R. Rodriguez
@ 2017-05-19 15:40                 ` Luis R. Rodriguez
  2017-05-19 17:28                   ` Luis R. Rodriguez
  2017-05-19 17:35                   ` Catalin Marinas
  0 siblings, 2 replies; 21+ messages in thread
From: Luis R. Rodriguez @ 2017-05-19 15:40 UTC (permalink / raw)
  To: Catalin Marinas, Steven Rostedt
  Cc: Luis R. Rodriguez, Kees Cook, Stephen Smalley, Ingo Molnar,
	Andy Lutomirski, Michal Hocko, Vlastimil Babka, Andrew Morton,
	Eric W. Biederman, Mateusz Guzik, LKML

On Fri, May 19, 2017 at 05:08:02AM +0200, Luis R. Rodriguez wrote:
> On Fri, May 19, 2017 at 02:44:14AM +0200, Luis R. Rodriguez wrote:
> > On Wed, May 17, 2017 at 10:53:06AM -0700, Kees Cook wrote:
> > > On Wed, May 17, 2017 at 9:40 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > > > Yes, but I had killed that boot session again, so upon my next boot
> > > > I had a different layout, the ASLR gap was much larger:
> > > >
> > > > ---[ Modules ]---
> > > > 0xffffffffc0000000-0xffffffffc01b0000        1728K                               pte
> > > > 0xffffffffc01b0000-0xffffffffc01b1000           4K     RW                 GLB x  pte
> > > > 0xffffffffc01b1000-0xffffffffc01b2000           4K                               pte
> > > > 0xffffffffc01b2000-0xffffffffc01c6000          80K     ro                 GLB x  pte
> > > > 0xffffffffc01c6000-0xffffffffc01cc000          24K     ro                 GLB NX pte
> > > > 0xffffffffc01cc000-0xffffffffc01d5000          36K     RW                 GLB NX pte
> > > >
> > > > As you can guess if we follow similar pattern the RW hole is the one this boot
> > > > warned about:
> > > >
> > > > [    1.450483] x86/mm: Found insecure W+X mapping at address ffffffffc01b0000/0xffffffffc01b0000
> > > > [    1.451280] ------------[ cut here ]------------
> > > > [    1.451721] WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
> > > > [    1.452499] Modules linked in:
> > > > [    1.452791] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc1-next-20170515+ #145
> > > >
> > > > I checked and indeed 0xffffffffc01b2000 is part of a module, it was not the first one
> > > > on the /proc/modules list but then again /proc/modules does not seem to have a specific
> > > > order other than perhaps being pegged into a linked list of modules once they go live,
> > > > and it seems its typically output backwards from when that happened, sorting that
> > > > by address we get:
> > > 
> > > Right, sorry, I'd expect it at the bottom of the list in
> > > /proc/modules, but that's fine, it's there.
> > > 
> > > >
> > > > root@piggy:~# cat /proc/modules | sort -k 6 | head -3
> > > > e1000 143360 0 - Live 0xffffffffc01b2000 (E)
> > > > mbcache 16384 1 ext4, Live 0xffffffffc01d6000 (E)
> > > > scsi_mod 217088 4 sg,sr_mod,sd_mod,libata, Live 0xffffffffc01df000 (E)
> > > >
> > > > And this then seems to be the first module loaded:
> > > >
> > > > e1000 143360 0 - Live 0xffffffffc01b2000 (E)
> > > >
> > > > The output of dmesg seems to confirm this as per the list of modules sorted
> > > > as per above.
> > > >
> > > >> Something touched the module gap and left is RW+x...
> > > >
> > > > Lemme try booting with e1000 renamed to e1000.ko.ignore and see how that goes.
> > > 
> > > Is it possible a module got loaded before e1000 and then unloaded?
> > > That seems odd, but maybe unload isn't cleaning up?
> > > 
> > > >> Are you able to bisect this?
> > > >
> > > > This issue has been present for a while so since I recall this I might be
> > > > able to reduce the number of needed target kernels to bisect. Lemme tinker
> > > > a bit and if no clear culprit comes up then will try bisect.
> > > 
> > > Okay, thanks!
> > 
> > Sorry to report that this issue is present since the feature's addition. So
> > the issue is there since its addition and is still present today. *But* it
> > may also be a configuration issue, given I have booted this guest *without*
> > this issue ...
> > 
> > So:
> > 
> > git checkout -b WX e1a58320a38dfa72be48a0f1a3a92273663ba6db
> > 
> > That boots with the warning. To help debug further I've minimized my modules
> > to only a few: scsi_mod, e1000, libata.
> > 
> > I suspect at this point this is not the fault of a particular module but
> > instead just an accounting semantic (>= or <= on an edge) but let's see.
> > 
> > I now boot on 4.3.0-rc3 on commit (e1a58320a38df ("x86/mm: Warn on W^X
> > mappings") and I with:
> > 
> > [    0.949435] ------------[ cut here ]------------                             
> > [    0.949992] WARNING: CPU: 2 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page+0x635/0x7e0()
> > [    0.950996] x86/mm: Found insecure W+X mapping at address ffffffffc0000000/0xffffffffc0000000
> > [    0.951814] Modules linked in:                                               
> > [    0.952123] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.3.0-rc3-FINAL-TEST-WITH-WX-NOFLOPPY+ #365
> > [    0.952929] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
> > [    0.954033]  0000000000000000 000000001f722925 ffff88013a5d7d40 ffffffff812ff335
> > [    0.954742]  ffff88013a5d7d88 ffff88013a5d7d78 ffffffff81079be2 ffff88013a5d7e90
> > [    0.955522]  0000000000000000 0000000000000004 0000000000000000 0000000000000000
> > [    0.956256] Call Trace:                                                      
> > [    0.956496]  [<ffffffff812ff335>] dump_stack+0x44/0x5f                       
> > [    0.956953]  [<ffffffff81079be2>] warn_slowpath_common+0x82/0xc0             
> > [    0.957519]  [<ffffffff81079c7c>] warn_slowpath_fmt+0x5c/0x80                
> > [    0.958066]  [<ffffffff8106c155>] note_page+0x635/0x7e0                      
> > [    0.958595]  [<ffffffff8106c5eb>] ptdump_walk_pgd_level_core+0x2eb/0x410     
> > [    0.959219]  [<ffffffff8106c7b7>] ptdump_walk_pgd_level_checkwx+0x17/0x20    
> > [    0.959856]  [<ffffffff8106260d>] mark_rodata_ro+0xed/0x100                  
> > [    0.960372]  [<ffffffff815aa7d0>] ? rest_init+0x80/0x80                      
> > [    0.960869]  [<ffffffff815aa7ed>] kernel_init+0x1d/0xe0                      
> > [    0.961358]  [<ffffffff815b798f>] ret_from_fork+0x3f/0x70                    
> > [    0.961900]  [<ffffffff815aa7d0>] ? rest_init+0x80/0x80                      
> > [    0.962389] ---[ end trace 6125ebcb24c9e3d0 ]---                             
> > [    0.962822] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found.         
> >                                                                                 
> >                                                                                 
> > ---[ High Kernel Mapping ]---                                                   
> > 0xffffffff80000000-0xffffffff81000000          16M                               pmd
> > 0xffffffff81000000-0xffffffff81600000           6M     ro         PSE     GLB x  pmd
> > 0xffffffff81600000-0xffffffff81a00000           4M     ro         PSE     GLB NX pmd
> > 0xffffffff81a00000-0xffffffff81c00000           2M     RW                 GLB NX pte
> > 0xffffffff81c00000-0xffffffff82200000           6M     RW         PSE     GLB NX pmd
> > 0xffffffff82200000-0xffffffff82400000           2M     RW                 GLB NX pte
> > 0xffffffff82400000-0xffffffffc0000000         988M                               pmd
> > ---[ Modules ]---                                                               
> > 0xffffffffc0000000-0xffffffffc0001000           4K     RW                 GLB x  pte
> > 0xffffffffc0001000-0xffffffffc0002000           4K                               pte
> > 0xffffffffc0002000-0xffffffffc0039000         220K     RW                 GLB x  pte
> > 
> > root@piggy:~# cat /proc/modules | sort -k 6 | head -3                           
> > scsi_mod 221979 4 sg,sd_mod,sr_mod,libata, Live 0xffffffffc0002000 (E)          
> > e1000 127757 0 - Live 0xffffffffc004d000 (E)                                    
> > libata 229931 2 ata_generic,ata_piix, Live 0xffffffffc0076000 (E) 
> > 
> > So that 4K RW seems suspect of getting used for allocation purpose on edge
> > for a particular reason and it also happens to be on the edge of the high
> > kernel mapping. Could it be the boundary semantic issue ?
> > 
> > For instance can it be that since 0xffffffffc0002000 is given to the first
> > module by the allocator, scsi_mod, and since that address is *technically*
> > part of two boundaries we get a splat ?
> > 
> > 0xffffffffc0001000-0xffffffffc0002000           4K                               pte
> > 0xffffffffc0002000-0xffffffffc0039000         220K     RW                 GLB x  pte
> 
> Note on the latest linux-next and on the commit that introduced this the config
> and kernel yields only *one* page:
> 
> x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found.
> 
> I believe this is more indications my suspicion might be right.

If the following is a legit forced way to get query the kernel to ask it 
who owns a page then perhaps this technique can be used in the future to
figure out who the hell caused this. Catalin, can you confirm? In this
case this is perhaps not a leaked page but I am trying to abuse the
kmemleak debugfs API to query who allocated the page. Is that fine?

[    0.916771] WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:235 note_page+0x63c/0x7e0
[    0.917636] x86/mm: Found insecure W+X mapping at address ffffffffc03d5000/0xffffffffc03d5000
[    0.918502] Modules linked in:
[    0.918819] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.11.0-mcgrof-force-config #340
[    0.919631] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
[    0.920011] Call Trace:
[    0.920011]  dump_stack+0x63/0x81
[    0.920011]  __warn+0xcb/0xf0
[    0.920011]  warn_slowpath_fmt+0x5a/0x80
[    0.920011]  note_page+0x63c/0x7e0
[    0.920011]  ptdump_walk_pgd_level_core+0x3b1/0x460
[    0.920011]  ? 0xffffffff86c00000
[    0.920011]  ptdump_walk_pgd_level_checkwx+0x17/0x20
[    0.920011]  mark_rodata_ro+0xf4/0x100
[    0.920011]  ? rest_init+0x80/0x80
[    0.920011]  kernel_init+0x2a/0x100
[    0.920011]  ret_from_fork+0x2c/0x40
[    0.925474] ---[ end trace dca00cd779490a2b ]---
[    0.925959] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found.

echo dump=0xffffffffc03d5000 > /sys/kernel/debug/kmemleak
dmesg | tail

[   49.209565] kmemleak: Object 0xffffffffc03d5000 (size 335):
[   49.210814] kmemleak:   comm "swapper/0", pid 1, jiffies 4294892440
[   49.212148] kmemleak:   min_count = 2
[   49.212852] kmemleak:   count = 0
[   49.213363] kmemleak:   flags = 0x1
[   49.213363] kmemleak:   checksum = 0
[   49.213363] kmemleak:   backtrace:
[   49.213363]      kmemleak_alloc+0x4a/0xa0
[   49.213363]      __vmalloc_node_range+0x20a/0x2b0
[   49.213363]      module_alloc+0x67/0xc0
[   49.213363]      arch_ftrace_update_trampoline+0xba/0x260
[   49.213363]      ftrace_startup+0x90/0x210
[   49.213363]      register_ftrace_function+0x4b/0x60
[   49.213363]      arm_kprobe+0x84/0xe0
[   49.213363]      register_kprobe+0x56e/0x5b0
[   49.213363]      init_test_probes+0x61/0x560
[   49.213363]      init_kprobes+0x1e3/0x206
[   49.213363]      do_one_initcall+0x52/0x1a0
[   49.213363]      kernel_init_freeable+0x178/0x200
[   49.213363]      kernel_init+0xe/0x100
[   49.213363]      ret_from_fork+0x2c/0x40
[   49.213363]      0xffffffffffffffff

  Luis

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
  2017-05-19 15:40                 ` Luis R. Rodriguez
@ 2017-05-19 17:28                   ` Luis R. Rodriguez
  2017-05-20  2:38                     ` Masami Hiramatsu
  2017-05-19 17:35                   ` Catalin Marinas
  1 sibling, 1 reply; 21+ messages in thread
From: Luis R. Rodriguez @ 2017-05-19 17:28 UTC (permalink / raw)
  To: Masami Hiramatsu, Jim Keniston, davem, sagar.abhishek
  Cc: Catalin Marinas, mcgrof, Steven Rostedt, Kees Cook,
	Stephen Smalley, Ingo Molnar, Andy Lutomirski, Michal Hocko,
	Vlastimil Babka, Andrew Morton, Eric W. Biederman, Mateusz Guzik,
	LKML

On Fri, May 19, 2017 at 05:40:16PM +0200, Luis R. Rodriguez wrote:
> On Fri, May 19, 2017 at 05:08:02AM +0200, Luis R. Rodriguez wrote:
> > On Fri, May 19, 2017 at 02:44:14AM +0200, Luis R. Rodriguez wrote:
> > > On Wed, May 17, 2017 at 10:53:06AM -0700, Kees Cook wrote:
> > > > On Wed, May 17, 2017 at 9:40 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > > > > Yes, but I had killed that boot session again, so upon my next boot
> > > > > I had a different layout, the ASLR gap was much larger:
> > > > >
> > > > > ---[ Modules ]---
> > > > > 0xffffffffc0000000-0xffffffffc01b0000        1728K                               pte
> > > > > 0xffffffffc01b0000-0xffffffffc01b1000           4K     RW                 GLB x  pte
> > > > > 0xffffffffc01b1000-0xffffffffc01b2000           4K                               pte
> > > > > 0xffffffffc01b2000-0xffffffffc01c6000          80K     ro                 GLB x  pte
> > > > > 0xffffffffc01c6000-0xffffffffc01cc000          24K     ro                 GLB NX pte
> > > > > 0xffffffffc01cc000-0xffffffffc01d5000          36K     RW                 GLB NX pte
> > > > >
> > > > > As you can guess if we follow similar pattern the RW hole is the one this boot
> > > > > warned about:
> > > > >
> > > > > [    1.450483] x86/mm: Found insecure W+X mapping at address ffffffffc01b0000/0xffffffffc01b0000
> > > > > [    1.451280] ------------[ cut here ]------------
> > > > > [    1.451721] WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
> > > > > [    1.452499] Modules linked in:
> > > > > [    1.452791] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc1-next-20170515+ #145
> > > > >
> > > > > I checked and indeed 0xffffffffc01b2000 is part of a module, it was not the first one
> > > > > on the /proc/modules list but then again /proc/modules does not seem to have a specific
> > > > > order other than perhaps being pegged into a linked list of modules once they go live,
> > > > > and it seems its typically output backwards from when that happened, sorting that
> > > > > by address we get:
> > > > 
> > > > Right, sorry, I'd expect it at the bottom of the list in
> > > > /proc/modules, but that's fine, it's there.
> > > > 
> > > > >
> > > > > root@piggy:~# cat /proc/modules | sort -k 6 | head -3
> > > > > e1000 143360 0 - Live 0xffffffffc01b2000 (E)
> > > > > mbcache 16384 1 ext4, Live 0xffffffffc01d6000 (E)
> > > > > scsi_mod 217088 4 sg,sr_mod,sd_mod,libata, Live 0xffffffffc01df000 (E)
> > > > >
> > > > > And this then seems to be the first module loaded:
> > > > >
> > > > > e1000 143360 0 - Live 0xffffffffc01b2000 (E)
> > > > >
> > > > > The output of dmesg seems to confirm this as per the list of modules sorted
> > > > > as per above.
> > > > >
> > > > >> Something touched the module gap and left is RW+x...
> > > > >
> > > > > Lemme try booting with e1000 renamed to e1000.ko.ignore and see how that goes.
> > > > 
> > > > Is it possible a module got loaded before e1000 and then unloaded?
> > > > That seems odd, but maybe unload isn't cleaning up?
> > > > 
> > > > >> Are you able to bisect this?
> > > > >
> > > > > This issue has been present for a while so since I recall this I might be
> > > > > able to reduce the number of needed target kernels to bisect. Lemme tinker
> > > > > a bit and if no clear culprit comes up then will try bisect.
> > > > 
> > > > Okay, thanks!
> > > 
> > > Sorry to report that this issue is present since the feature's addition. So
> > > the issue is there since its addition and is still present today. *But* it
> > > may also be a configuration issue, given I have booted this guest *without*
> > > this issue ...
> > > 
> > > So:
> > > 
> > > git checkout -b WX e1a58320a38dfa72be48a0f1a3a92273663ba6db
> > > 
> > > That boots with the warning. To help debug further I've minimized my modules
> > > to only a few: scsi_mod, e1000, libata.
> > > 
> > > I suspect at this point this is not the fault of a particular module but
> > > instead just an accounting semantic (>= or <= on an edge) but let's see.
> > > 
> > > I now boot on 4.3.0-rc3 on commit (e1a58320a38df ("x86/mm: Warn on W^X
> > > mappings") and I with:
> > > 
> > > [    0.949435] ------------[ cut here ]------------                             
> > > [    0.949992] WARNING: CPU: 2 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page+0x635/0x7e0()
> > > [    0.950996] x86/mm: Found insecure W+X mapping at address ffffffffc0000000/0xffffffffc0000000
> > > [    0.951814] Modules linked in:                                               
> > > [    0.952123] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.3.0-rc3-FINAL-TEST-WITH-WX-NOFLOPPY+ #365
> > > [    0.952929] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
> > > [    0.954033]  0000000000000000 000000001f722925 ffff88013a5d7d40 ffffffff812ff335
> > > [    0.954742]  ffff88013a5d7d88 ffff88013a5d7d78 ffffffff81079be2 ffff88013a5d7e90
> > > [    0.955522]  0000000000000000 0000000000000004 0000000000000000 0000000000000000
> > > [    0.956256] Call Trace:                                                      
> > > [    0.956496]  [<ffffffff812ff335>] dump_stack+0x44/0x5f                       
> > > [    0.956953]  [<ffffffff81079be2>] warn_slowpath_common+0x82/0xc0             
> > > [    0.957519]  [<ffffffff81079c7c>] warn_slowpath_fmt+0x5c/0x80                
> > > [    0.958066]  [<ffffffff8106c155>] note_page+0x635/0x7e0                      
> > > [    0.958595]  [<ffffffff8106c5eb>] ptdump_walk_pgd_level_core+0x2eb/0x410     
> > > [    0.959219]  [<ffffffff8106c7b7>] ptdump_walk_pgd_level_checkwx+0x17/0x20    
> > > [    0.959856]  [<ffffffff8106260d>] mark_rodata_ro+0xed/0x100                  
> > > [    0.960372]  [<ffffffff815aa7d0>] ? rest_init+0x80/0x80                      
> > > [    0.960869]  [<ffffffff815aa7ed>] kernel_init+0x1d/0xe0                      
> > > [    0.961358]  [<ffffffff815b798f>] ret_from_fork+0x3f/0x70                    
> > > [    0.961900]  [<ffffffff815aa7d0>] ? rest_init+0x80/0x80                      
> > > [    0.962389] ---[ end trace 6125ebcb24c9e3d0 ]---                             
> > > [    0.962822] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found.         
> > >                                                                                 
> > >                                                                                 
> > > ---[ High Kernel Mapping ]---                                                   
> > > 0xffffffff80000000-0xffffffff81000000          16M                               pmd
> > > 0xffffffff81000000-0xffffffff81600000           6M     ro         PSE     GLB x  pmd
> > > 0xffffffff81600000-0xffffffff81a00000           4M     ro         PSE     GLB NX pmd
> > > 0xffffffff81a00000-0xffffffff81c00000           2M     RW                 GLB NX pte
> > > 0xffffffff81c00000-0xffffffff82200000           6M     RW         PSE     GLB NX pmd
> > > 0xffffffff82200000-0xffffffff82400000           2M     RW                 GLB NX pte
> > > 0xffffffff82400000-0xffffffffc0000000         988M                               pmd
> > > ---[ Modules ]---                                                               
> > > 0xffffffffc0000000-0xffffffffc0001000           4K     RW                 GLB x  pte
> > > 0xffffffffc0001000-0xffffffffc0002000           4K                               pte
> > > 0xffffffffc0002000-0xffffffffc0039000         220K     RW                 GLB x  pte
> > > 
> > > root@piggy:~# cat /proc/modules | sort -k 6 | head -3                           
> > > scsi_mod 221979 4 sg,sd_mod,sr_mod,libata, Live 0xffffffffc0002000 (E)          
> > > e1000 127757 0 - Live 0xffffffffc004d000 (E)                                    
> > > libata 229931 2 ata_generic,ata_piix, Live 0xffffffffc0076000 (E) 
> > > 
> > > So that 4K RW seems suspect of getting used for allocation purpose on edge
> > > for a particular reason and it also happens to be on the edge of the high
> > > kernel mapping. Could it be the boundary semantic issue ?
> > > 
> > > For instance can it be that since 0xffffffffc0002000 is given to the first
> > > module by the allocator, scsi_mod, and since that address is *technically*
> > > part of two boundaries we get a splat ?
> > > 
> > > 0xffffffffc0001000-0xffffffffc0002000           4K                               pte
> > > 0xffffffffc0002000-0xffffffffc0039000         220K     RW                 GLB x  pte
> > 
> > Note on the latest linux-next and on the commit that introduced this the config
> > and kernel yields only *one* page:
> > 
> > x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found.
> > 
> > I believe this is more indications my suspicion might be right.
> 
> If the following is a legit forced way to get query the kernel to ask it 
> who owns a page then perhaps this technique can be used in the future to
> figure out who the hell caused this. Catalin, can you confirm? In this
> case this is perhaps not a leaked page but I am trying to abuse the
> kmemleak debugfs API to query who allocated the page. Is that fine?
> 
> [    0.916771] WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:235 note_page+0x63c/0x7e0
> [    0.917636] x86/mm: Found insecure W+X mapping at address ffffffffc03d5000/0xffffffffc03d5000
> [    0.918502] Modules linked in:
> [    0.918819] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.11.0-mcgrof-force-config #340
> [    0.919631] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
> [    0.920011] Call Trace:
> [    0.920011]  dump_stack+0x63/0x81
> [    0.920011]  __warn+0xcb/0xf0
> [    0.920011]  warn_slowpath_fmt+0x5a/0x80
> [    0.920011]  note_page+0x63c/0x7e0
> [    0.920011]  ptdump_walk_pgd_level_core+0x3b1/0x460
> [    0.920011]  ? 0xffffffff86c00000
> [    0.920011]  ptdump_walk_pgd_level_checkwx+0x17/0x20
> [    0.920011]  mark_rodata_ro+0xf4/0x100
> [    0.920011]  ? rest_init+0x80/0x80
> [    0.920011]  kernel_init+0x2a/0x100
> [    0.920011]  ret_from_fork+0x2c/0x40
> [    0.925474] ---[ end trace dca00cd779490a2b ]---
> [    0.925959] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found.
> 
> echo dump=0xffffffffc03d5000 > /sys/kernel/debug/kmemleak
> dmesg | tail
> 
> [   49.209565] kmemleak: Object 0xffffffffc03d5000 (size 335):
> [   49.210814] kmemleak:   comm "swapper/0", pid 1, jiffies 4294892440
> [   49.212148] kmemleak:   min_count = 2
> [   49.212852] kmemleak:   count = 0
> [   49.213363] kmemleak:   flags = 0x1
> [   49.213363] kmemleak:   checksum = 0
> [   49.213363] kmemleak:   backtrace:
> [   49.213363]      kmemleak_alloc+0x4a/0xa0
> [   49.213363]      __vmalloc_node_range+0x20a/0x2b0
> [   49.213363]      module_alloc+0x67/0xc0
> [   49.213363]      arch_ftrace_update_trampoline+0xba/0x260
> [   49.213363]      ftrace_startup+0x90/0x210
> [   49.213363]      register_ftrace_function+0x4b/0x60
> [   49.213363]      arm_kprobe+0x84/0xe0
> [   49.213363]      register_kprobe+0x56e/0x5b0
> [   49.213363]      init_test_probes+0x61/0x560
> [   49.213363]      init_kprobes+0x1e3/0x206
> [   49.213363]      do_one_initcall+0x52/0x1a0
> [   49.213363]      kernel_init_freeable+0x178/0x200
> [   49.213363]      kernel_init+0xe/0x100
> [   49.213363]      ret_from_fork+0x2c/0x40
> [   49.213363]      0xffffffffffffffff

Aha! And the winner is:

CONFIG_KPROBES_SANITY_TEST

I confirm disabling it on 4.3.0-rc3 and on linux-next next-20170519 avoids the WARN.
I also can confirm using the 'echo dump=mem-area > /sys/kernel/debug/kmemleak' yields
the same trace for both of these kernels.

So -- the above kmemleak hack seems to actually work to seek who owns that page.

Now to figure out how the hell kernel/test_kprobes.c screws around with things.

  Luis

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
  2017-05-19 17:28                   ` Luis R. Rodriguez
@ 2017-05-20  2:38                     ` Masami Hiramatsu
  2017-05-23 14:48                       ` Luis R. Rodriguez
  0 siblings, 1 reply; 21+ messages in thread
From: Masami Hiramatsu @ 2017-05-20  2:38 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Jim Keniston, davem, sagar.abhishek, Catalin Marinas,
	Steven Rostedt, Kees Cook, Stephen Smalley, Ingo Molnar,
	Andy Lutomirski, Michal Hocko, Vlastimil Babka, Andrew Morton,
	Eric W. Biederman, Mateusz Guzik, LKML

Hi Luis,

On Fri, 19 May 2017 19:28:54 +0200
"Luis R. Rodriguez" <mcgrof@kernel.org> wrote:

> On Fri, May 19, 2017 at 05:40:16PM +0200, Luis R. Rodriguez wrote:
> > On Fri, May 19, 2017 at 05:08:02AM +0200, Luis R. Rodriguez wrote:
> > > On Fri, May 19, 2017 at 02:44:14AM +0200, Luis R. Rodriguez wrote:
> > > > On Wed, May 17, 2017 at 10:53:06AM -0700, Kees Cook wrote:
> > > > > On Wed, May 17, 2017 at 9:40 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > > > > > Yes, but I had killed that boot session again, so upon my next boot
> > > > > > I had a different layout, the ASLR gap was much larger:
> > > > > >
> > > > > > ---[ Modules ]---
> > > > > > 0xffffffffc0000000-0xffffffffc01b0000        1728K                               pte
> > > > > > 0xffffffffc01b0000-0xffffffffc01b1000           4K     RW                 GLB x  pte
> > > > > > 0xffffffffc01b1000-0xffffffffc01b2000           4K                               pte
> > > > > > 0xffffffffc01b2000-0xffffffffc01c6000          80K     ro                 GLB x  pte
> > > > > > 0xffffffffc01c6000-0xffffffffc01cc000          24K     ro                 GLB NX pte
> > > > > > 0xffffffffc01cc000-0xffffffffc01d5000          36K     RW                 GLB NX pte
> > > > > >
> > > > > > As you can guess if we follow similar pattern the RW hole is the one this boot
> > > > > > warned about:
> > > > > >
> > > > > > [    1.450483] x86/mm: Found insecure W+X mapping at address ffffffffc01b0000/0xffffffffc01b0000
> > > > > > [    1.451280] ------------[ cut here ]------------
> > > > > > [    1.451721] WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
> > > > > > [    1.452499] Modules linked in:
> > > > > > [    1.452791] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc1-next-20170515+ #145
> > > > > >
> > > > > > I checked and indeed 0xffffffffc01b2000 is part of a module, it was not the first one
> > > > > > on the /proc/modules list but then again /proc/modules does not seem to have a specific
> > > > > > order other than perhaps being pegged into a linked list of modules once they go live,
> > > > > > and it seems its typically output backwards from when that happened, sorting that
> > > > > > by address we get:
> > > > > 
> > > > > Right, sorry, I'd expect it at the bottom of the list in
> > > > > /proc/modules, but that's fine, it's there.
> > > > > 
> > > > > >
> > > > > > root@piggy:~# cat /proc/modules | sort -k 6 | head -3
> > > > > > e1000 143360 0 - Live 0xffffffffc01b2000 (E)
> > > > > > mbcache 16384 1 ext4, Live 0xffffffffc01d6000 (E)
> > > > > > scsi_mod 217088 4 sg,sr_mod,sd_mod,libata, Live 0xffffffffc01df000 (E)
> > > > > >
> > > > > > And this then seems to be the first module loaded:
> > > > > >
> > > > > > e1000 143360 0 - Live 0xffffffffc01b2000 (E)
> > > > > >
> > > > > > The output of dmesg seems to confirm this as per the list of modules sorted
> > > > > > as per above.
> > > > > >
> > > > > >> Something touched the module gap and left is RW+x...
> > > > > >
> > > > > > Lemme try booting with e1000 renamed to e1000.ko.ignore and see how that goes.
> > > > > 
> > > > > Is it possible a module got loaded before e1000 and then unloaded?
> > > > > That seems odd, but maybe unload isn't cleaning up?
> > > > > 
> > > > > >> Are you able to bisect this?
> > > > > >
> > > > > > This issue has been present for a while so since I recall this I might be
> > > > > > able to reduce the number of needed target kernels to bisect. Lemme tinker
> > > > > > a bit and if no clear culprit comes up then will try bisect.
> > > > > 
> > > > > Okay, thanks!
> > > > 
> > > > Sorry to report that this issue is present since the feature's addition. So
> > > > the issue is there since its addition and is still present today. *But* it
> > > > may also be a configuration issue, given I have booted this guest *without*
> > > > this issue ...
> > > > 
> > > > So:
> > > > 
> > > > git checkout -b WX e1a58320a38dfa72be48a0f1a3a92273663ba6db
> > > > 
> > > > That boots with the warning. To help debug further I've minimized my modules
> > > > to only a few: scsi_mod, e1000, libata.
> > > > 
> > > > I suspect at this point this is not the fault of a particular module but
> > > > instead just an accounting semantic (>= or <= on an edge) but let's see.
> > > > 
> > > > I now boot on 4.3.0-rc3 on commit (e1a58320a38df ("x86/mm: Warn on W^X
> > > > mappings") and I with:
> > > > 
> > > > [    0.949435] ------------[ cut here ]------------                             
> > > > [    0.949992] WARNING: CPU: 2 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page+0x635/0x7e0()
> > > > [    0.950996] x86/mm: Found insecure W+X mapping at address ffffffffc0000000/0xffffffffc0000000
> > > > [    0.951814] Modules linked in:                                               
> > > > [    0.952123] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.3.0-rc3-FINAL-TEST-WITH-WX-NOFLOPPY+ #365
> > > > [    0.952929] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
> > > > [    0.954033]  0000000000000000 000000001f722925 ffff88013a5d7d40 ffffffff812ff335
> > > > [    0.954742]  ffff88013a5d7d88 ffff88013a5d7d78 ffffffff81079be2 ffff88013a5d7e90
> > > > [    0.955522]  0000000000000000 0000000000000004 0000000000000000 0000000000000000
> > > > [    0.956256] Call Trace:                                                      
> > > > [    0.956496]  [<ffffffff812ff335>] dump_stack+0x44/0x5f                       
> > > > [    0.956953]  [<ffffffff81079be2>] warn_slowpath_common+0x82/0xc0             
> > > > [    0.957519]  [<ffffffff81079c7c>] warn_slowpath_fmt+0x5c/0x80                
> > > > [    0.958066]  [<ffffffff8106c155>] note_page+0x635/0x7e0                      
> > > > [    0.958595]  [<ffffffff8106c5eb>] ptdump_walk_pgd_level_core+0x2eb/0x410     
> > > > [    0.959219]  [<ffffffff8106c7b7>] ptdump_walk_pgd_level_checkwx+0x17/0x20    
> > > > [    0.959856]  [<ffffffff8106260d>] mark_rodata_ro+0xed/0x100                  
> > > > [    0.960372]  [<ffffffff815aa7d0>] ? rest_init+0x80/0x80                      
> > > > [    0.960869]  [<ffffffff815aa7ed>] kernel_init+0x1d/0xe0                      
> > > > [    0.961358]  [<ffffffff815b798f>] ret_from_fork+0x3f/0x70                    
> > > > [    0.961900]  [<ffffffff815aa7d0>] ? rest_init+0x80/0x80                      
> > > > [    0.962389] ---[ end trace 6125ebcb24c9e3d0 ]---                             
> > > > [    0.962822] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found.         
> > > >                                                                                 
> > > >                                                                                 
> > > > ---[ High Kernel Mapping ]---                                                   
> > > > 0xffffffff80000000-0xffffffff81000000          16M                               pmd
> > > > 0xffffffff81000000-0xffffffff81600000           6M     ro         PSE     GLB x  pmd
> > > > 0xffffffff81600000-0xffffffff81a00000           4M     ro         PSE     GLB NX pmd
> > > > 0xffffffff81a00000-0xffffffff81c00000           2M     RW                 GLB NX pte
> > > > 0xffffffff81c00000-0xffffffff82200000           6M     RW         PSE     GLB NX pmd
> > > > 0xffffffff82200000-0xffffffff82400000           2M     RW                 GLB NX pte
> > > > 0xffffffff82400000-0xffffffffc0000000         988M                               pmd
> > > > ---[ Modules ]---                                                               
> > > > 0xffffffffc0000000-0xffffffffc0001000           4K     RW                 GLB x  pte
> > > > 0xffffffffc0001000-0xffffffffc0002000           4K                               pte
> > > > 0xffffffffc0002000-0xffffffffc0039000         220K     RW                 GLB x  pte
> > > > 
> > > > root@piggy:~# cat /proc/modules | sort -k 6 | head -3                           
> > > > scsi_mod 221979 4 sg,sd_mod,sr_mod,libata, Live 0xffffffffc0002000 (E)          
> > > > e1000 127757 0 - Live 0xffffffffc004d000 (E)                                    
> > > > libata 229931 2 ata_generic,ata_piix, Live 0xffffffffc0076000 (E) 
> > > > 
> > > > So that 4K RW seems suspect of getting used for allocation purpose on edge
> > > > for a particular reason and it also happens to be on the edge of the high
> > > > kernel mapping. Could it be the boundary semantic issue ?
> > > > 
> > > > For instance can it be that since 0xffffffffc0002000 is given to the first
> > > > module by the allocator, scsi_mod, and since that address is *technically*
> > > > part of two boundaries we get a splat ?
> > > > 
> > > > 0xffffffffc0001000-0xffffffffc0002000           4K                               pte
> > > > 0xffffffffc0002000-0xffffffffc0039000         220K     RW                 GLB x  pte
> > > 
> > > Note on the latest linux-next and on the commit that introduced this the config
> > > and kernel yields only *one* page:
> > > 
> > > x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found.
> > > 
> > > I believe this is more indications my suspicion might be right.
> > 
> > If the following is a legit forced way to get query the kernel to ask it 
> > who owns a page then perhaps this technique can be used in the future to
> > figure out who the hell caused this. Catalin, can you confirm? In this
> > case this is perhaps not a leaked page but I am trying to abuse the
> > kmemleak debugfs API to query who allocated the page. Is that fine?
> > 
> > [    0.916771] WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:235 note_page+0x63c/0x7e0
> > [    0.917636] x86/mm: Found insecure W+X mapping at address ffffffffc03d5000/0xffffffffc03d5000
> > [    0.918502] Modules linked in:
> > [    0.918819] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.11.0-mcgrof-force-config #340
> > [    0.919631] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
> > [    0.920011] Call Trace:
> > [    0.920011]  dump_stack+0x63/0x81
> > [    0.920011]  __warn+0xcb/0xf0
> > [    0.920011]  warn_slowpath_fmt+0x5a/0x80
> > [    0.920011]  note_page+0x63c/0x7e0
> > [    0.920011]  ptdump_walk_pgd_level_core+0x3b1/0x460
> > [    0.920011]  ? 0xffffffff86c00000
> > [    0.920011]  ptdump_walk_pgd_level_checkwx+0x17/0x20
> > [    0.920011]  mark_rodata_ro+0xf4/0x100
> > [    0.920011]  ? rest_init+0x80/0x80
> > [    0.920011]  kernel_init+0x2a/0x100
> > [    0.920011]  ret_from_fork+0x2c/0x40
> > [    0.925474] ---[ end trace dca00cd779490a2b ]---
> > [    0.925959] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found.
> > 
> > echo dump=0xffffffffc03d5000 > /sys/kernel/debug/kmemleak
> > dmesg | tail
> > 
> > [   49.209565] kmemleak: Object 0xffffffffc03d5000 (size 335):
> > [   49.210814] kmemleak:   comm "swapper/0", pid 1, jiffies 4294892440
> > [   49.212148] kmemleak:   min_count = 2
> > [   49.212852] kmemleak:   count = 0
> > [   49.213363] kmemleak:   flags = 0x1
> > [   49.213363] kmemleak:   checksum = 0
> > [   49.213363] kmemleak:   backtrace:
> > [   49.213363]      kmemleak_alloc+0x4a/0xa0
> > [   49.213363]      __vmalloc_node_range+0x20a/0x2b0
> > [   49.213363]      module_alloc+0x67/0xc0
> > [   49.213363]      arch_ftrace_update_trampoline+0xba/0x260
> > [   49.213363]      ftrace_startup+0x90/0x210
> > [   49.213363]      register_ftrace_function+0x4b/0x60
> > [   49.213363]      arm_kprobe+0x84/0xe0
> > [   49.213363]      register_kprobe+0x56e/0x5b0
> > [   49.213363]      init_test_probes+0x61/0x560
> > [   49.213363]      init_kprobes+0x1e3/0x206
> > [   49.213363]      do_one_initcall+0x52/0x1a0
> > [   49.213363]      kernel_init_freeable+0x178/0x200
> > [   49.213363]      kernel_init+0xe/0x100
> > [   49.213363]      ret_from_fork+0x2c/0x40
> > [   49.213363]      0xffffffffffffffff
> 
> Aha! And the winner is:
> 
> CONFIG_KPROBES_SANITY_TEST
> 
> I confirm disabling it on 4.3.0-rc3 and on linux-next next-20170519 avoids the WARN.
> I also can confirm using the 'echo dump=mem-area > /sys/kernel/debug/kmemleak' yields
> the same trace for both of these kernels.
> 
> So -- the above kmemleak hack seems to actually work to seek who owns that page.
> 
> Now to figure out how the hell kernel/test_kprobes.c screws around with things.

Ah, that was fixed recently;

https://marc.info/?l=linux-kernel&m=149076389011850

Note that this patch depends another patch in the series;

https://marc.info/?l=linux-kernel&m=149076370111812&w=2

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
  2017-05-20  2:38                     ` Masami Hiramatsu
@ 2017-05-23 14:48                       ` Luis R. Rodriguez
  2017-05-24 17:55                         ` Luis R. Rodriguez
  0 siblings, 1 reply; 21+ messages in thread
From: Luis R. Rodriguez @ 2017-05-23 14:48 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Luis R. Rodriguez, Jim Keniston, davem, sagar.abhishek,
	Catalin Marinas, Steven Rostedt, Kees Cook, Stephen Smalley,
	Ingo Molnar, Andy Lutomirski, Michal Hocko, Vlastimil Babka,
	Andrew Morton, Eric W. Biederman, Mateusz Guzik, LKML

On Sat, May 20, 2017 at 11:38:50AM +0900, Masami Hiramatsu wrote:
> Hi Luis,
> 
> On Fri, 19 May 2017 19:28:54 +0200
> "Luis R. Rodriguez" <mcgrof@kernel.org> wrote:
> 
> > On Fri, May 19, 2017 at 05:40:16PM +0200, Luis R. Rodriguez wrote:
> > > On Fri, May 19, 2017 at 05:08:02AM +0200, Luis R. Rodriguez wrote:
> > > > On Fri, May 19, 2017 at 02:44:14AM +0200, Luis R. Rodriguez wrote:
> > > > > On Wed, May 17, 2017 at 10:53:06AM -0700, Kees Cook wrote:
> > > > > > On Wed, May 17, 2017 at 9:40 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > > > > > > Yes, but I had killed that boot session again, so upon my next boot
> > > > > > > I had a different layout, the ASLR gap was much larger:
> > > > > > >
> > > > > > > ---[ Modules ]---
> > > > > > > 0xffffffffc0000000-0xffffffffc01b0000        1728K                               pte
> > > > > > > 0xffffffffc01b0000-0xffffffffc01b1000           4K     RW                 GLB x  pte
> > > > > > > 0xffffffffc01b1000-0xffffffffc01b2000           4K                               pte
> > > > > > > 0xffffffffc01b2000-0xffffffffc01c6000          80K     ro                 GLB x  pte
> > > > > > > 0xffffffffc01c6000-0xffffffffc01cc000          24K     ro                 GLB NX pte
> > > > > > > 0xffffffffc01cc000-0xffffffffc01d5000          36K     RW                 GLB NX pte
> > > > > > >
> > > > > > > As you can guess if we follow similar pattern the RW hole is the one this boot
> > > > > > > warned about:
> > > > > > >
> > > > > > > [    1.450483] x86/mm: Found insecure W+X mapping at address ffffffffc01b0000/0xffffffffc01b0000
> > > > > > > [    1.451280] ------------[ cut here ]------------
> > > > > > > [    1.451721] WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
> > > > > > > [    1.452499] Modules linked in:
> > > > > > > [    1.452791] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc1-next-20170515+ #145
> > > > > > >
> > > > > > > I checked and indeed 0xffffffffc01b2000 is part of a module, it was not the first one
> > > > > > > on the /proc/modules list but then again /proc/modules does not seem to have a specific
> > > > > > > order other than perhaps being pegged into a linked list of modules once they go live,
> > > > > > > and it seems its typically output backwards from when that happened, sorting that
> > > > > > > by address we get:
> > > > > > 
> > > > > > Right, sorry, I'd expect it at the bottom of the list in
> > > > > > /proc/modules, but that's fine, it's there.
> > > > > > 
> > > > > > >
> > > > > > > root@piggy:~# cat /proc/modules | sort -k 6 | head -3
> > > > > > > e1000 143360 0 - Live 0xffffffffc01b2000 (E)
> > > > > > > mbcache 16384 1 ext4, Live 0xffffffffc01d6000 (E)
> > > > > > > scsi_mod 217088 4 sg,sr_mod,sd_mod,libata, Live 0xffffffffc01df000 (E)
> > > > > > >
> > > > > > > And this then seems to be the first module loaded:
> > > > > > >
> > > > > > > e1000 143360 0 - Live 0xffffffffc01b2000 (E)
> > > > > > >
> > > > > > > The output of dmesg seems to confirm this as per the list of modules sorted
> > > > > > > as per above.
> > > > > > >
> > > > > > >> Something touched the module gap and left is RW+x...
> > > > > > >
> > > > > > > Lemme try booting with e1000 renamed to e1000.ko.ignore and see how that goes.
> > > > > > 
> > > > > > Is it possible a module got loaded before e1000 and then unloaded?
> > > > > > That seems odd, but maybe unload isn't cleaning up?
> > > > > > 
> > > > > > >> Are you able to bisect this?
> > > > > > >
> > > > > > > This issue has been present for a while so since I recall this I might be
> > > > > > > able to reduce the number of needed target kernels to bisect. Lemme tinker
> > > > > > > a bit and if no clear culprit comes up then will try bisect.
> > > > > > 
> > > > > > Okay, thanks!
> > > > > 
> > > > > Sorry to report that this issue is present since the feature's addition. So
> > > > > the issue is there since its addition and is still present today. *But* it
> > > > > may also be a configuration issue, given I have booted this guest *without*
> > > > > this issue ...
> > > > > 
> > > > > So:
> > > > > 
> > > > > git checkout -b WX e1a58320a38dfa72be48a0f1a3a92273663ba6db
> > > > > 
> > > > > That boots with the warning. To help debug further I've minimized my modules
> > > > > to only a few: scsi_mod, e1000, libata.
> > > > > 
> > > > > I suspect at this point this is not the fault of a particular module but
> > > > > instead just an accounting semantic (>= or <= on an edge) but let's see.
> > > > > 
> > > > > I now boot on 4.3.0-rc3 on commit (e1a58320a38df ("x86/mm: Warn on W^X
> > > > > mappings") and I with:
> > > > > 
> > > > > [    0.949435] ------------[ cut here ]------------                             
> > > > > [    0.949992] WARNING: CPU: 2 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page+0x635/0x7e0()
> > > > > [    0.950996] x86/mm: Found insecure W+X mapping at address ffffffffc0000000/0xffffffffc0000000
> > > > > [    0.951814] Modules linked in:                                               
> > > > > [    0.952123] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.3.0-rc3-FINAL-TEST-WITH-WX-NOFLOPPY+ #365
> > > > > [    0.952929] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
> > > > > [    0.954033]  0000000000000000 000000001f722925 ffff88013a5d7d40 ffffffff812ff335
> > > > > [    0.954742]  ffff88013a5d7d88 ffff88013a5d7d78 ffffffff81079be2 ffff88013a5d7e90
> > > > > [    0.955522]  0000000000000000 0000000000000004 0000000000000000 0000000000000000
> > > > > [    0.956256] Call Trace:                                                      
> > > > > [    0.956496]  [<ffffffff812ff335>] dump_stack+0x44/0x5f                       
> > > > > [    0.956953]  [<ffffffff81079be2>] warn_slowpath_common+0x82/0xc0             
> > > > > [    0.957519]  [<ffffffff81079c7c>] warn_slowpath_fmt+0x5c/0x80                
> > > > > [    0.958066]  [<ffffffff8106c155>] note_page+0x635/0x7e0                      
> > > > > [    0.958595]  [<ffffffff8106c5eb>] ptdump_walk_pgd_level_core+0x2eb/0x410     
> > > > > [    0.959219]  [<ffffffff8106c7b7>] ptdump_walk_pgd_level_checkwx+0x17/0x20    
> > > > > [    0.959856]  [<ffffffff8106260d>] mark_rodata_ro+0xed/0x100                  
> > > > > [    0.960372]  [<ffffffff815aa7d0>] ? rest_init+0x80/0x80                      
> > > > > [    0.960869]  [<ffffffff815aa7ed>] kernel_init+0x1d/0xe0                      
> > > > > [    0.961358]  [<ffffffff815b798f>] ret_from_fork+0x3f/0x70                    
> > > > > [    0.961900]  [<ffffffff815aa7d0>] ? rest_init+0x80/0x80                      
> > > > > [    0.962389] ---[ end trace 6125ebcb24c9e3d0 ]---                             
> > > > > [    0.962822] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found.         
> > > > >                                                                                 
> > > > >                                                                                 
> > > > > ---[ High Kernel Mapping ]---                                                   
> > > > > 0xffffffff80000000-0xffffffff81000000          16M                               pmd
> > > > > 0xffffffff81000000-0xffffffff81600000           6M     ro         PSE     GLB x  pmd
> > > > > 0xffffffff81600000-0xffffffff81a00000           4M     ro         PSE     GLB NX pmd
> > > > > 0xffffffff81a00000-0xffffffff81c00000           2M     RW                 GLB NX pte
> > > > > 0xffffffff81c00000-0xffffffff82200000           6M     RW         PSE     GLB NX pmd
> > > > > 0xffffffff82200000-0xffffffff82400000           2M     RW                 GLB NX pte
> > > > > 0xffffffff82400000-0xffffffffc0000000         988M                               pmd
> > > > > ---[ Modules ]---                                                               
> > > > > 0xffffffffc0000000-0xffffffffc0001000           4K     RW                 GLB x  pte
> > > > > 0xffffffffc0001000-0xffffffffc0002000           4K                               pte
> > > > > 0xffffffffc0002000-0xffffffffc0039000         220K     RW                 GLB x  pte
> > > > > 
> > > > > root@piggy:~# cat /proc/modules | sort -k 6 | head -3                           
> > > > > scsi_mod 221979 4 sg,sd_mod,sr_mod,libata, Live 0xffffffffc0002000 (E)          
> > > > > e1000 127757 0 - Live 0xffffffffc004d000 (E)                                    
> > > > > libata 229931 2 ata_generic,ata_piix, Live 0xffffffffc0076000 (E) 
> > > > > 
> > > > > So that 4K RW seems suspect of getting used for allocation purpose on edge
> > > > > for a particular reason and it also happens to be on the edge of the high
> > > > > kernel mapping. Could it be the boundary semantic issue ?
> > > > > 
> > > > > For instance can it be that since 0xffffffffc0002000 is given to the first
> > > > > module by the allocator, scsi_mod, and since that address is *technically*
> > > > > part of two boundaries we get a splat ?
> > > > > 
> > > > > 0xffffffffc0001000-0xffffffffc0002000           4K                               pte
> > > > > 0xffffffffc0002000-0xffffffffc0039000         220K     RW                 GLB x  pte
> > > > 
> > > > Note on the latest linux-next and on the commit that introduced this the config
> > > > and kernel yields only *one* page:
> > > > 
> > > > x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found.
> > > > 
> > > > I believe this is more indications my suspicion might be right.
> > > 
> > > If the following is a legit forced way to get query the kernel to ask it 
> > > who owns a page then perhaps this technique can be used in the future to
> > > figure out who the hell caused this. Catalin, can you confirm? In this
> > > case this is perhaps not a leaked page but I am trying to abuse the
> > > kmemleak debugfs API to query who allocated the page. Is that fine?
> > > 
> > > [    0.916771] WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:235 note_page+0x63c/0x7e0
> > > [    0.917636] x86/mm: Found insecure W+X mapping at address ffffffffc03d5000/0xffffffffc03d5000
> > > [    0.918502] Modules linked in:
> > > [    0.918819] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.11.0-mcgrof-force-config #340
> > > [    0.919631] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
> > > [    0.920011] Call Trace:
> > > [    0.920011]  dump_stack+0x63/0x81
> > > [    0.920011]  __warn+0xcb/0xf0
> > > [    0.920011]  warn_slowpath_fmt+0x5a/0x80
> > > [    0.920011]  note_page+0x63c/0x7e0
> > > [    0.920011]  ptdump_walk_pgd_level_core+0x3b1/0x460
> > > [    0.920011]  ? 0xffffffff86c00000
> > > [    0.920011]  ptdump_walk_pgd_level_checkwx+0x17/0x20
> > > [    0.920011]  mark_rodata_ro+0xf4/0x100
> > > [    0.920011]  ? rest_init+0x80/0x80
> > > [    0.920011]  kernel_init+0x2a/0x100
> > > [    0.920011]  ret_from_fork+0x2c/0x40
> > > [    0.925474] ---[ end trace dca00cd779490a2b ]---
> > > [    0.925959] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found.
> > > 
> > > echo dump=0xffffffffc03d5000 > /sys/kernel/debug/kmemleak
> > > dmesg | tail
> > > 
> > > [   49.209565] kmemleak: Object 0xffffffffc03d5000 (size 335):
> > > [   49.210814] kmemleak:   comm "swapper/0", pid 1, jiffies 4294892440
> > > [   49.212148] kmemleak:   min_count = 2
> > > [   49.212852] kmemleak:   count = 0
> > > [   49.213363] kmemleak:   flags = 0x1
> > > [   49.213363] kmemleak:   checksum = 0
> > > [   49.213363] kmemleak:   backtrace:
> > > [   49.213363]      kmemleak_alloc+0x4a/0xa0
> > > [   49.213363]      __vmalloc_node_range+0x20a/0x2b0
> > > [   49.213363]      module_alloc+0x67/0xc0
> > > [   49.213363]      arch_ftrace_update_trampoline+0xba/0x260
> > > [   49.213363]      ftrace_startup+0x90/0x210
> > > [   49.213363]      register_ftrace_function+0x4b/0x60
> > > [   49.213363]      arm_kprobe+0x84/0xe0
> > > [   49.213363]      register_kprobe+0x56e/0x5b0
> > > [   49.213363]      init_test_probes+0x61/0x560
> > > [   49.213363]      init_kprobes+0x1e3/0x206
> > > [   49.213363]      do_one_initcall+0x52/0x1a0
> > > [   49.213363]      kernel_init_freeable+0x178/0x200
> > > [   49.213363]      kernel_init+0xe/0x100
> > > [   49.213363]      ret_from_fork+0x2c/0x40
> > > [   49.213363]      0xffffffffffffffff
> > 
> > Aha! And the winner is:
> > 
> > CONFIG_KPROBES_SANITY_TEST
> > 
> > I confirm disabling it on 4.3.0-rc3 and on linux-next next-20170519 avoids the WARN.
> > I also can confirm using the 'echo dump=mem-area > /sys/kernel/debug/kmemleak' yields
> > the same trace for both of these kernels.
> > 
> > So -- the above kmemleak hack seems to actually work to seek who owns that page.
> > 
> > Now to figure out how the hell kernel/test_kprobes.c screws around with things.
> 
> Ah, that was fixed recently;
> 
> https://marc.info/?l=linux-kernel&m=149076389011850
> 
> Note that this patch depends another patch in the series;
> 
> https://marc.info/?l=linux-kernel&m=149076370111812&w=2

I actually boot tested linux-next tag next-20170519 which carries these
patches and the WARNING still is there. Please note the issue is with
CONFIG_KPROBES_SANITY_TEST enabled.

[    1.025601] x86/mm: Found insecure W+X mapping at address ffffffffc01e7000/0xffffffffc01e7000
[    1.026429] ------------[ cut here ]------------
[    1.026885] WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
[    1.027711] Modules linked in:
[    1.028032] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc1-next-20170519 #151
[    1.028788] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
[    1.029928] task: ffff9fd47a5ccc80 task.stack: ffffb6bcc0630000
[    1.030509] RIP: 0010:note_page+0x630/0x7e0
[    1.030917] RSP: 0000:ffffb6bcc0633df0 EFLAGS: 00010286
[    1.031425] RAX: 0000000000000051 RBX: ffffb6bcc0633e88 RCX: ffffffffbb656708
[    1.032132] RDX: 0000000000000000 RSI: 0000000000000096 RDI: 0000000000000246
[    1.032834] RBP: ffffb6bcc0633e28 R08: 203a6d6d2f363878 R09: 0000000000000161
[    1.033539] R10: ffffb6bcc0633dd8 R11: 736e6920646e756f R12: 0000000000000000
[    1.034235] R13: 0000000000000004 R14: 0000000000000000 R15: 0000000000000000
[    1.034927] FS:  0000000000000000(0000) GS:ffff9fd47fc80000(0000) knlGS:0000000000000000
[    1.035722] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.036290] CR2: ffffb6bcc073c000 CR3: 0000000053209000 CR4: 00000000000006e0
[    1.036839] Call Trace:
[    1.037034]  ptdump_walk_pgd_level_core+0x3e7/0x490
[    1.037367]  ? 0xffffffffbaa00000
[    1.037705]  ptdump_walk_pgd_level_checkwx+0x17/0x20
[    1.038187]  mark_rodata_ro+0xf4/0x100
[    1.038559]  ? rest_init+0x80/0x80
[    1.038890]  kernel_init+0x2f/0x100
[    1.039235]  ret_from_fork+0x2c/0x40
[    1.039582] Code: 48 c7 43 28 00 00 00 00 48 89 43 20 e9 05 fd ff ff 48 8b 73 10 48 c7 c7 f0 3d 3e bb c6 05 f8 eb bc 00 01 48 89 f2 e8 1d 02 12 00 <0f> ff e9 1f fa ff ff 48 8b 70 20 48 c7 c7 3c ba 3e bb e8 06 02
[    1.041416] ---[ end trace e726c1b63e5a81a9 ]---
[    1.041872] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found.

root@piggy:~# echo dump=0xffffffffc01e7000 > /sys/kernel/debug/kmemleak

On dmesg:

May 23 07:44:51 piggy kernel: kmemleak: Object 0xffffffffc01e7000 (size 335):
May 23 07:44:51 piggy kernel: kmemleak:   comm "swapper/0", pid 1, jiffies 4294892451
May 23 07:44:51 piggy kernel: kmemleak:   min_count = 2
May 23 07:44:51 piggy kernel: kmemleak:   count = 2
May 23 07:44:51 piggy kernel: kmemleak:   flags = 0x1
May 23 07:44:51 piggy kernel: kmemleak:   checksum = 0
May 23 07:44:51 piggy kernel: kmemleak:   backtrace:
May 23 07:44:51 piggy kernel:      kmemleak_alloc+0x4a/0xa0
May 23 07:44:51 piggy kernel:      __vmalloc_node_range+0x20c/0x2b0
May 23 07:44:51 piggy kernel:      module_alloc+0x67/0xc0
May 23 07:44:51 piggy kernel:      arch_ftrace_update_trampoline+0xc1/0x240
May 23 07:44:51 piggy kernel:      ftrace_startup+0x92/0x210
May 23 07:44:51 piggy kernel:      register_ftrace_function+0x4b/0x60
May 23 07:44:51 piggy kernel:      arm_kprobe+0x84/0xc0
May 23 07:44:51 piggy kernel:      register_kprobe+0x59c/0x5e0
May 23 07:44:51 piggy kernel:      init_test_probes+0x61/0x560
May 23 07:44:51 piggy kernel:      init_kprobes+0x1ea/0x20d
May 23 07:44:51 piggy kernel:      do_one_initcall+0x52/0x1a0
May 23 07:44:51 piggy kernel:      kernel_init_freeable+0x17d/0x205
May 23 07:44:51 piggy kernel:      kernel_init+0xe/0x100
May 23 07:44:51 piggy kernel:      ret_from_fork+0x2c/0x40
May 23 07:44:51 piggy kernel:      0xffffffffffffffff

  Luis

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
  2017-05-23 14:48                       ` Luis R. Rodriguez
@ 2017-05-24 17:55                         ` Luis R. Rodriguez
  0 siblings, 0 replies; 21+ messages in thread
From: Luis R. Rodriguez @ 2017-05-24 17:55 UTC (permalink / raw)
  To: Luis R. Rodriguez, Thomas Gleixner
  Cc: Masami Hiramatsu, Jim Keniston, davem, sagar.abhishek,
	Catalin Marinas, Steven Rostedt, Kees Cook, Stephen Smalley,
	Ingo Molnar, Andy Lutomirski, Michal Hocko, Vlastimil Babka,
	Andrew Morton, Eric W. Biederman, Mateusz Guzik, LKML

On Tue, May 23, 2017 at 04:48:50PM +0200, Luis R. Rodriguez wrote:
> On Sat, May 20, 2017 at 11:38:50AM +0900, Masami Hiramatsu wrote:
> > Hi Luis,
> > 
> > On Fri, 19 May 2017 19:28:54 +0200
> > "Luis R. Rodriguez" <mcgrof@kernel.org> wrote:
> > > 
> > > Aha! And the winner is:
> > > 
> > > CONFIG_KPROBES_SANITY_TEST
> > > 
> > > I confirm disabling it on 4.3.0-rc3 and on linux-next next-20170519 avoids the WARN.
> > > I also can confirm using the 'echo dump=mem-area > /sys/kernel/debug/kmemleak' yields
> > > the same trace for both of these kernels.
> > > 
> > > So -- the above kmemleak hack seems to actually work to seek who owns that page.
> > > 
> > > Now to figure out how the hell kernel/test_kprobes.c screws around with things.
> > 
> > Ah, that was fixed recently;
> > 
> > https://marc.info/?l=linux-kernel&m=149076389011850
> > 
> > Note that this patch depends another patch in the series;
> > 
> > https://marc.info/?l=linux-kernel&m=149076370111812&w=2
> 
> I actually boot tested linux-next tag next-20170519 which carries these
> patches and the WARNING still is there. Please note the issue is with
> CONFIG_KPROBES_SANITY_TEST enabled.
> 
> [    1.025601] x86/mm: Found insecure W+X mapping at address ffffffffc01e7000/0xffffffffc01e7000
> [    1.026429] ------------[ cut here ]------------
> [    1.026885] WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
> [    1.027711] Modules linked in:
> [    1.028032] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc1-next-20170519 #151
> [    1.028788] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
> [    1.029928] task: ffff9fd47a5ccc80 task.stack: ffffb6bcc0630000
> [    1.030509] RIP: 0010:note_page+0x630/0x7e0
> [    1.030917] RSP: 0000:ffffb6bcc0633df0 EFLAGS: 00010286
> [    1.031425] RAX: 0000000000000051 RBX: ffffb6bcc0633e88 RCX: ffffffffbb656708
> [    1.032132] RDX: 0000000000000000 RSI: 0000000000000096 RDI: 0000000000000246
> [    1.032834] RBP: ffffb6bcc0633e28 R08: 203a6d6d2f363878 R09: 0000000000000161
> [    1.033539] R10: ffffb6bcc0633dd8 R11: 736e6920646e756f R12: 0000000000000000
> [    1.034235] R13: 0000000000000004 R14: 0000000000000000 R15: 0000000000000000
> [    1.034927] FS:  0000000000000000(0000) GS:ffff9fd47fc80000(0000) knlGS:0000000000000000
> [    1.035722] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    1.036290] CR2: ffffb6bcc073c000 CR3: 0000000053209000 CR4: 00000000000006e0
> [    1.036839] Call Trace:
> [    1.037034]  ptdump_walk_pgd_level_core+0x3e7/0x490
> [    1.037367]  ? 0xffffffffbaa00000
> [    1.037705]  ptdump_walk_pgd_level_checkwx+0x17/0x20
> [    1.038187]  mark_rodata_ro+0xf4/0x100
> [    1.038559]  ? rest_init+0x80/0x80
> [    1.038890]  kernel_init+0x2f/0x100
> [    1.039235]  ret_from_fork+0x2c/0x40
> [    1.039582] Code: 48 c7 43 28 00 00 00 00 48 89 43 20 e9 05 fd ff ff 48 8b 73 10 48 c7 c7 f0 3d 3e bb c6 05 f8 eb bc 00 01 48 89 f2 e8 1d 02 12 00 <0f> ff e9 1f fa ff ff 48 8b 70 20 48 c7 c7 3c ba 3e bb e8 06 02
> [    1.041416] ---[ end trace e726c1b63e5a81a9 ]---
> [    1.041872] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found.
> 
> root@piggy:~# echo dump=0xffffffffc01e7000 > /sys/kernel/debug/kmemleak
> 
> On dmesg:
> 
> May 23 07:44:51 piggy kernel: kmemleak: Object 0xffffffffc01e7000 (size 335):
> May 23 07:44:51 piggy kernel: kmemleak:   comm "swapper/0", pid 1, jiffies 4294892451
> May 23 07:44:51 piggy kernel: kmemleak:   min_count = 2
> May 23 07:44:51 piggy kernel: kmemleak:   count = 2
> May 23 07:44:51 piggy kernel: kmemleak:   flags = 0x1
> May 23 07:44:51 piggy kernel: kmemleak:   checksum = 0
> May 23 07:44:51 piggy kernel: kmemleak:   backtrace:
> May 23 07:44:51 piggy kernel:      kmemleak_alloc+0x4a/0xa0
> May 23 07:44:51 piggy kernel:      __vmalloc_node_range+0x20c/0x2b0
> May 23 07:44:51 piggy kernel:      module_alloc+0x67/0xc0
> May 23 07:44:51 piggy kernel:      arch_ftrace_update_trampoline+0xc1/0x240
> May 23 07:44:51 piggy kernel:      ftrace_startup+0x92/0x210
> May 23 07:44:51 piggy kernel:      register_ftrace_function+0x4b/0x60
> May 23 07:44:51 piggy kernel:      arm_kprobe+0x84/0xc0
> May 23 07:44:51 piggy kernel:      register_kprobe+0x59c/0x5e0
> May 23 07:44:51 piggy kernel:      init_test_probes+0x61/0x560
> May 23 07:44:51 piggy kernel:      init_kprobes+0x1ea/0x20d
> May 23 07:44:51 piggy kernel:      do_one_initcall+0x52/0x1a0
> May 23 07:44:51 piggy kernel:      kernel_init_freeable+0x17d/0x205
> May 23 07:44:51 piggy kernel:      kernel_init+0xe/0x100
> May 23 07:44:51 piggy kernel:      ret_from_fork+0x2c/0x40
> May 23 07:44:51 piggy kernel:      0xffffffffffffffff

Turns out that Thomas Gleixner's patch from today [0] fixes this as the same
module_alloc() path was the culprit of the issue. Steven Rostedt however just
reported that this patch crashes on his ftracetests, so it would seem we just
need to address that last kink to fix this properly.

We can take this further on, in that other thread.

[0] https://lkml.kernel.org/r/alpine.DEB.2.20.1705241459480.2201@nanos
[1] https://lkml.kernel.org/r/20170524134728.61a896c9@vmware.local.home

  Luis

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
  2017-05-19 15:40                 ` Luis R. Rodriguez
  2017-05-19 17:28                   ` Luis R. Rodriguez
@ 2017-05-19 17:35                   ` Catalin Marinas
  2017-05-19 18:27                     ` Andy Lutomirski
  2017-05-26 22:13                     ` Luis R. Rodriguez
  1 sibling, 2 replies; 21+ messages in thread
From: Catalin Marinas @ 2017-05-19 17:35 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Steven Rostedt, Kees Cook, Stephen Smalley, Ingo Molnar,
	Andy Lutomirski, Michal Hocko, Vlastimil Babka, Andrew Morton,
	Eric W. Biederman, Mateusz Guzik, LKML

On Fri, May 19, 2017 at 05:40:16PM +0200, Luis R. Rodriguez wrote:
> If the following is a legit forced way to get query the kernel to ask it 
> who owns a page then perhaps this technique can be used in the future to
> figure out who the hell caused this. Catalin, can you confirm? In this
> case this is perhaps not a leaked page but I am trying to abuse the
> kmemleak debugfs API to query who allocated the page. Is that fine?
> 
> [    0.916771] WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:235 note_page+0x63c/0x7e0
> [    0.917636] x86/mm: Found insecure W+X mapping at address ffffffffc03d5000/0xffffffffc03d5000
> [    0.918502] Modules linked in:
> [    0.918819] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.11.0-mcgrof-force-config #340
> [    0.919631] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
> [    0.920011] Call Trace:
> [    0.920011]  dump_stack+0x63/0x81
> [    0.920011]  __warn+0xcb/0xf0
> [    0.920011]  warn_slowpath_fmt+0x5a/0x80
> [    0.920011]  note_page+0x63c/0x7e0
> [    0.920011]  ptdump_walk_pgd_level_core+0x3b1/0x460
> [    0.920011]  ? 0xffffffff86c00000
> [    0.920011]  ptdump_walk_pgd_level_checkwx+0x17/0x20
> [    0.920011]  mark_rodata_ro+0xf4/0x100
> [    0.920011]  ? rest_init+0x80/0x80
> [    0.920011]  kernel_init+0x2a/0x100
> [    0.920011]  ret_from_fork+0x2c/0x40
> [    0.925474] ---[ end trace dca00cd779490a2b ]---
> [    0.925959] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found.
> 
> echo dump=0xffffffffc03d5000 > /sys/kernel/debug/kmemleak
> dmesg | tail
> 
> [   49.209565] kmemleak: Object 0xffffffffc03d5000 (size 335):
> [   49.210814] kmemleak:   comm "swapper/0", pid 1, jiffies 4294892440
> [   49.212148] kmemleak:   min_count = 2
> [   49.212852] kmemleak:   count = 0
> [   49.213363] kmemleak:   flags = 0x1
> [   49.213363] kmemleak:   checksum = 0
> [   49.213363] kmemleak:   backtrace:
> [   49.213363]      kmemleak_alloc+0x4a/0xa0
> [   49.213363]      __vmalloc_node_range+0x20a/0x2b0
> [   49.213363]      module_alloc+0x67/0xc0
> [   49.213363]      arch_ftrace_update_trampoline+0xba/0x260
> [   49.213363]      ftrace_startup+0x90/0x210
> [   49.213363]      register_ftrace_function+0x4b/0x60
> [   49.213363]      arm_kprobe+0x84/0xe0
> [   49.213363]      register_kprobe+0x56e/0x5b0
> [   49.213363]      init_test_probes+0x61/0x560
> [   49.213363]      init_kprobes+0x1e3/0x206
> [   49.213363]      do_one_initcall+0x52/0x1a0
> [   49.213363]      kernel_init_freeable+0x178/0x200
> [   49.213363]      kernel_init+0xe/0x100
> [   49.213363]      ret_from_fork+0x2c/0x40
> [   49.213363]      0xffffffffffffffff

You could as well use kmemleak this way since it tracks the memory
allocations. However, it doesn't track alloc_pages and also doesn't
track mapping existing pages (vmap etc.)

-- 
Catalin

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
  2017-05-19 17:35                   ` Catalin Marinas
@ 2017-05-19 18:27                     ` Andy Lutomirski
  2017-05-19 19:16                       ` Kees Cook
  2017-05-26 22:13                     ` Luis R. Rodriguez
  1 sibling, 1 reply; 21+ messages in thread
From: Andy Lutomirski @ 2017-05-19 18:27 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Luis R. Rodriguez, Steven Rostedt, Kees Cook, Stephen Smalley,
	Ingo Molnar, Michal Hocko, Vlastimil Babka, Andrew Morton,
	Eric W. Biederman, Mateusz Guzik, LKML

On Fri, May 19, 2017 at 10:35 AM, Catalin Marinas
<catalin.marinas@arm.com> wrote:
> On Fri, May 19, 2017 at 05:40:16PM +0200, Luis R. Rodriguez wrote:
>> If the following is a legit forced way to get query the kernel to ask it
>> who owns a page then perhaps this technique can be used in the future to
>> figure out who the hell caused this. Catalin, can you confirm? In this
>> case this is perhaps not a leaked page but I am trying to abuse the
>> kmemleak debugfs API to query who allocated the page. Is that fine?
>>
>> [    0.916771] WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:235 note_page+0x63c/0x7e0
>> [    0.917636] x86/mm: Found insecure W+X mapping at address ffffffffc03d5000/0xffffffffc03d5000
>> [    0.918502] Modules linked in:
>> [    0.918819] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.11.0-mcgrof-force-config #340
>> [    0.919631] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
>> [    0.920011] Call Trace:
>> [    0.920011]  dump_stack+0x63/0x81
>> [    0.920011]  __warn+0xcb/0xf0
>> [    0.920011]  warn_slowpath_fmt+0x5a/0x80
>> [    0.920011]  note_page+0x63c/0x7e0
>> [    0.920011]  ptdump_walk_pgd_level_core+0x3b1/0x460
>> [    0.920011]  ? 0xffffffff86c00000
>> [    0.920011]  ptdump_walk_pgd_level_checkwx+0x17/0x20
>> [    0.920011]  mark_rodata_ro+0xf4/0x100
>> [    0.920011]  ? rest_init+0x80/0x80
>> [    0.920011]  kernel_init+0x2a/0x100
>> [    0.920011]  ret_from_fork+0x2c/0x40
>> [    0.925474] ---[ end trace dca00cd779490a2b ]---
>> [    0.925959] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found.
>>
>> echo dump=0xffffffffc03d5000 > /sys/kernel/debug/kmemleak
>> dmesg | tail
>>
>> [   49.209565] kmemleak: Object 0xffffffffc03d5000 (size 335):
>> [   49.210814] kmemleak:   comm "swapper/0", pid 1, jiffies 4294892440
>> [   49.212148] kmemleak:   min_count = 2
>> [   49.212852] kmemleak:   count = 0
>> [   49.213363] kmemleak:   flags = 0x1
>> [   49.213363] kmemleak:   checksum = 0
>> [   49.213363] kmemleak:   backtrace:
>> [   49.213363]      kmemleak_alloc+0x4a/0xa0
>> [   49.213363]      __vmalloc_node_range+0x20a/0x2b0
>> [   49.213363]      module_alloc+0x67/0xc0
>> [   49.213363]      arch_ftrace_update_trampoline+0xba/0x260
>> [   49.213363]      ftrace_startup+0x90/0x210
>> [   49.213363]      register_ftrace_function+0x4b/0x60
>> [   49.213363]      arm_kprobe+0x84/0xe0
>> [   49.213363]      register_kprobe+0x56e/0x5b0
>> [   49.213363]      init_test_probes+0x61/0x560
>> [   49.213363]      init_kprobes+0x1e3/0x206
>> [   49.213363]      do_one_initcall+0x52/0x1a0
>> [   49.213363]      kernel_init_freeable+0x178/0x200
>> [   49.213363]      kernel_init+0xe/0x100
>> [   49.213363]      ret_from_fork+0x2c/0x40
>> [   49.213363]      0xffffffffffffffff
>
> You could as well use kmemleak this way since it tracks the memory
> allocations. However, it doesn't track alloc_pages and also doesn't
> track mapping existing pages (vmap etc.)

One thing I've pondered: can we make some debugging mode (kmemleak,
perhaps?) check that freed memory is RW at the time it's freed?  I
once wrote some buggy code that freed an R page and caused an OOPS
much later, and this bug here seems likely to be some code that frees
RWX memory.

--Andy

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
  2017-05-19 18:27                     ` Andy Lutomirski
@ 2017-05-19 19:16                       ` Kees Cook
  2017-05-19 19:18                         ` Andy Lutomirski
  0 siblings, 1 reply; 21+ messages in thread
From: Kees Cook @ 2017-05-19 19:16 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Catalin Marinas, Luis R. Rodriguez, Steven Rostedt,
	Stephen Smalley, Ingo Molnar, Michal Hocko, Vlastimil Babka,
	Andrew Morton, Eric W. Biederman, Mateusz Guzik, LKML

On Fri, May 19, 2017 at 11:27 AM, Andy Lutomirski <luto@kernel.org> wrote:
> One thing I've pondered: can we make some debugging mode (kmemleak,
> perhaps?) check that freed memory is RW at the time it's freed?  I
> once wrote some buggy code that freed an R page and caused an OOPS
> much later, and this bug here seems likely to be some code that frees
> RWX memory.

Which begs for even more checks: nothing should ever make a page RWX.
Either R, RW, or RX only... (or X too I guess, in the future).

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
  2017-05-19 19:16                       ` Kees Cook
@ 2017-05-19 19:18                         ` Andy Lutomirski
  2017-05-19 19:29                           ` Kees Cook
  0 siblings, 1 reply; 21+ messages in thread
From: Andy Lutomirski @ 2017-05-19 19:18 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andy Lutomirski, Catalin Marinas, Luis R. Rodriguez,
	Steven Rostedt, Stephen Smalley, Ingo Molnar, Michal Hocko,
	Vlastimil Babka, Andrew Morton, Eric W. Biederman, Mateusz Guzik,
	LKML

On Fri, May 19, 2017 at 12:16 PM, Kees Cook <keescook@chromium.org> wrote:
> On Fri, May 19, 2017 at 11:27 AM, Andy Lutomirski <luto@kernel.org> wrote:
>> One thing I've pondered: can we make some debugging mode (kmemleak,
>> perhaps?) check that freed memory is RW at the time it's freed?  I
>> once wrote some buggy code that freed an R page and caused an OOPS
>> much later, and this bug here seems likely to be some code that frees
>> RWX memory.
>
> Which begs for even more checks: nothing should ever make a page RWX.
> Either R, RW, or RX only... (or X too I guess, in the future).

I could see pages being RWX temporarily during boot.  OTOH if we ban
RWX outright (after very early boot, anyway), then catching code that
messes up and leaves pages RWX gets much easier.

--Andy

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
  2017-05-19 19:18                         ` Andy Lutomirski
@ 2017-05-19 19:29                           ` Kees Cook
  0 siblings, 0 replies; 21+ messages in thread
From: Kees Cook @ 2017-05-19 19:29 UTC (permalink / raw)
  To: Andy Lutomirski, Laura Abbott
  Cc: Catalin Marinas, Luis R. Rodriguez, Steven Rostedt,
	Stephen Smalley, Ingo Molnar, Michal Hocko, Vlastimil Babka,
	Andrew Morton, Eric W. Biederman, Mateusz Guzik, LKML

On Fri, May 19, 2017 at 12:18 PM, Andy Lutomirski <luto@kernel.org> wrote:
> On Fri, May 19, 2017 at 12:16 PM, Kees Cook <keescook@chromium.org> wrote:
>> On Fri, May 19, 2017 at 11:27 AM, Andy Lutomirski <luto@kernel.org> wrote:
>>> One thing I've pondered: can we make some debugging mode (kmemleak,
>>> perhaps?) check that freed memory is RW at the time it's freed?  I
>>> once wrote some buggy code that freed an R page and caused an OOPS
>>> much later, and this bug here seems likely to be some code that frees
>>> RWX memory.
>>
>> Which begs for even more checks: nothing should ever make a page RWX.
>> Either R, RW, or RX only... (or X too I guess, in the future).
>
> I could see pages being RWX temporarily during boot.  OTOH if we ban
> RWX outright (after very early boot, anyway), then catching code that
> messes up and leaves pages RWX gets much easier.

Right, early boot is kind of special. It'd be nice to have there, but
I meant during normal runtime. We'd probably need to adjust
set_memory_rw/ro/nx/x around to have the correct side-effects, instead
of just controlling specific bits:

set_memory_rw() (RW_)
set_memory_ro() (R__)
set_memory_rx() (R_X)
set_memory_x() (__X)

That kind of refactoring might be not _too_ bad:

- add set_memory_rx()
- s/\bset_memory_x\b/set_memory_rx/g
- fix what breaks from expecting writable-executable memory
- adjust set_memory_rw() to drop x
- fix what breaks from expecting writable-executable memory
- adjust set_memory_ro() to drop x
- fix what breaks from expecting executable memory
- add set_memory_x() some day...

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
  2017-05-19 17:35                   ` Catalin Marinas
  2017-05-19 18:27                     ` Andy Lutomirski
@ 2017-05-26 22:13                     ` Luis R. Rodriguez
  1 sibling, 0 replies; 21+ messages in thread
From: Luis R. Rodriguez @ 2017-05-26 22:13 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Steven Rostedt, Kees Cook, Stephen Smalley, Ingo Molnar,
	Andy Lutomirski, Michal Hocko, Vlastimil Babka, Andrew Morton,
	Eric W. Biederman, Mateusz Guzik, LKML, Luis R. Rodriguez

On Fri, May 19, 2017 at 10:35 AM, Catalin Marinas
<catalin.marinas@arm.com> wrote:
> On Fri, May 19, 2017 at 05:40:16PM +0200, Luis R. Rodriguez wrote:
>> If the following is a legit forced way to get query the kernel to ask it
>> who owns a page then perhaps this technique can be used in the future to
>> figure out who the hell caused this. Catalin, can you confirm? In this
>> case this is perhaps not a leaked page but I am trying to abuse the
>> kmemleak debugfs API to query who allocated the page. Is that fine?
>>
>> [    0.916771] WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:235 note_page+0x63c/0x7e0
>> [    0.917636] x86/mm: Found insecure W+X mapping at address ffffffffc03d5000/0xffffffffc03d5000
>> [    0.918502] Modules linked in:
>> [    0.918819] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.11.0-mcgrof-force-config #340
>> [    0.919631] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
>> [    0.920011] Call Trace:
>> [    0.920011]  dump_stack+0x63/0x81
>> [    0.920011]  __warn+0xcb/0xf0
>> [    0.920011]  warn_slowpath_fmt+0x5a/0x80
>> [    0.920011]  note_page+0x63c/0x7e0
>> [    0.920011]  ptdump_walk_pgd_level_core+0x3b1/0x460
>> [    0.920011]  ? 0xffffffff86c00000
>> [    0.920011]  ptdump_walk_pgd_level_checkwx+0x17/0x20
>> [    0.920011]  mark_rodata_ro+0xf4/0x100
>> [    0.920011]  ? rest_init+0x80/0x80
>> [    0.920011]  kernel_init+0x2a/0x100
>> [    0.920011]  ret_from_fork+0x2c/0x40
>> [    0.925474] ---[ end trace dca00cd779490a2b ]---
>> [    0.925959] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found.
>>
>> echo dump=0xffffffffc03d5000 > /sys/kernel/debug/kmemleak
>> dmesg | tail
>>
>> [   49.209565] kmemleak: Object 0xffffffffc03d5000 (size 335):
>> [   49.210814] kmemleak:   comm "swapper/0", pid 1, jiffies 4294892440
>> [   49.212148] kmemleak:   min_count = 2
>> [   49.212852] kmemleak:   count = 0
>> [   49.213363] kmemleak:   flags = 0x1
>> [   49.213363] kmemleak:   checksum = 0
>> [   49.213363] kmemleak:   backtrace:
>> [   49.213363]      kmemleak_alloc+0x4a/0xa0
>> [   49.213363]      __vmalloc_node_range+0x20a/0x2b0
>> [   49.213363]      module_alloc+0x67/0xc0
>> [   49.213363]      arch_ftrace_update_trampoline+0xba/0x260
>> [   49.213363]      ftrace_startup+0x90/0x210
>> [   49.213363]      register_ftrace_function+0x4b/0x60
>> [   49.213363]      arm_kprobe+0x84/0xe0
>> [   49.213363]      register_kprobe+0x56e/0x5b0
>> [   49.213363]      init_test_probes+0x61/0x560
>> [   49.213363]      init_kprobes+0x1e3/0x206
>> [   49.213363]      do_one_initcall+0x52/0x1a0
>> [   49.213363]      kernel_init_freeable+0x178/0x200
>> [   49.213363]      kernel_init+0xe/0x100
>> [   49.213363]      ret_from_fork+0x2c/0x40
>> [   49.213363]      0xffffffffffffffff
>
> You could as well use kmemleak this way since it tracks the memory
> allocations.

Great!

> However, it doesn't track alloc_pages and also doesn't
> track mapping existing pages (vmap etc.)

Can we verify that? If so then the splat from the above complaint
could include a follow up dump of the trace, no ? That's *much* more
useful.

 Luis

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
  2017-05-15 22:15 ` Luis R. Rodriguez
  2017-05-15 22:57   ` Kees Cook
@ 2017-05-15 23:30   ` Luis R. Rodriguez
  1 sibling, 0 replies; 21+ messages in thread
From: Luis R. Rodriguez @ 2017-05-15 23:30 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Stephen Smalley, Ingo Molnar, Andy Lutomirski, Michal Hocko,
	Andrew Morton, Kees Cook, Eric W. Biederman, Mateusz Guzik,
	linux-kernel

On Mon, May 15, 2017 at 3:15 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> On Tue, May 16, 2017 at 12:06:50AM +0200, Luis R. Rodriguez wrote:
>> Using QEMU emulator version 2.7.94 (v2.8.0-rc4-dirty)
>>
>> I will try updating my distro package for qemu and see if perhaps its this
>> and for the other odd fork issue I reported [0].
>>
>> [0] https://lkml.kernel.org/r/CAB=NE6VZXq3y-3pfouYTBUco2Cq2xqoLZrgDFdVx+_=_=SwG_Q@mail.gmail.com
>
> Yeah nope, using my distribution latest:
>
> QEMU emulator version 2.8.0(openSUSE Tumbleweed)
>
> And still both issues are present.

FWIW also compiled and tried to boot with the latest qemu, v2.9.0-rc5
and it also has both issues, so I don't think this is because of the
version of qemu.

 Luis

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2017-05-27  1:19 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-15 22:06 next-20170515: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0 Luis R. Rodriguez
2017-05-15 22:15 ` Luis R. Rodriguez
2017-05-15 22:57   ` Kees Cook
2017-05-15 23:45     ` Luis R. Rodriguez
2017-05-16  0:12       ` Kees Cook
2017-05-17 16:40         ` Luis R. Rodriguez
2017-05-17 17:53           ` Kees Cook
2017-05-19  0:44             ` Luis R. Rodriguez
2017-05-19  3:08               ` Luis R. Rodriguez
2017-05-19 15:40                 ` Luis R. Rodriguez
2017-05-19 17:28                   ` Luis R. Rodriguez
2017-05-20  2:38                     ` Masami Hiramatsu
2017-05-23 14:48                       ` Luis R. Rodriguez
2017-05-24 17:55                         ` Luis R. Rodriguez
2017-05-19 17:35                   ` Catalin Marinas
2017-05-19 18:27                     ` Andy Lutomirski
2017-05-19 19:16                       ` Kees Cook
2017-05-19 19:18                         ` Andy Lutomirski
2017-05-19 19:29                           ` Kees Cook
2017-05-26 22:13                     ` Luis R. Rodriguez
2017-05-15 23:30   ` Luis R. Rodriguez

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.