linux-kernel.vger.kernel.org archive mirror
* KASAN & the vmalloc area
@ 2016-11-08 19:03 Mark Rutland
  2016-11-08 22:09 ` Dmitry Vyukov
  2016-11-09 16:53 ` Andrey Ryabinin
  0 siblings, 2 replies; 8+ messages in thread
From: Mark Rutland @ 2016-11-08 19:03 UTC (permalink / raw)
  To: Dmitry Vyukov, Andy Lutomirski
  Cc: Andrey Ryabinin, Laura Abbott, Ard Biesheuvel, linux-kernel,
	linux-arm-kernel

Hi,

I see a while back [1] there was a discussion of what to do about KASAN
and vmapped stacks, but it doesn't look like that was solved, judging by
the vmapped stacks pull [2] for v4.9.

I wondered whether anyone had looked at that since?

I have an additional reason to want to dynamically allocate the vmalloc
area shadow: it turns out that KASAN currently interacts rather poorly
with the arm64 ptdump code.

When KASAN is selected, we allocate shadow for the whole vmalloc area,
using common zero pte, pmd, pud tables. Walking over these in the ptdump
code takes a *very* long time (I've seen up to 15 minutes with
KASAN_OUTLINE enabled). For DEBUG_WX [3], this means boot hangs for that
long, too.
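
For reference, a minimal sketch of the address-to-shadow translation this
relies on (the real kasan_mem_to_shadow() lives in include/linux/kasan.h;
the offset below is a placeholder, since the real value is arch- and
config-specific):

	#define KASAN_SHADOW_SCALE_SHIFT  3	/* 1 shadow byte per 8 bytes */
	#define KASAN_SHADOW_OFFSET	  0xdfffe92000000000UL	/* placeholder */

	static inline void *kasan_mem_to_shadow(const void *addr)
	{
		return (void *)(((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT)
				+ KASAN_SHADOW_OFFSET);
	}

Every byte of vmalloc space thus has a shadow byte, and today that shadow
is backed read-only by a single zero page via the shared zero tables
mentioned above, which is exactly what ptdump ends up descending entry by
entry.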

If I don't allocate vmalloc shadow (and remove the apparently pointless
shadow of the shadow area), and only allocate shadow for the image,
fixmap, vmemmap and so on, that delay gets cut to a few seconds, which
is tolerable for a debug configuration...

... however, things blow up when the kernel touches vmalloc'd memory for
the first time, as we don't install shadow for that dynamically.

Thanks,
Mark.

[1] https://lkml.kernel.org/r/CALCETrWucrYp+yq8RHSDqf93xtg793duByirurzJbLRhrz=tcA@mail.gmail.com
[2] https://lkml.kernel.org/r/20161003092940.GA691@gmail.com
[3] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-October/464191.html

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: KASAN & the vmalloc area
  2016-11-08 19:03 KASAN & the vmalloc area Mark Rutland
@ 2016-11-08 22:09 ` Dmitry Vyukov
  2016-11-09 10:56   ` Mark Rutland
  2016-11-09 16:53 ` Andrey Ryabinin
  1 sibling, 1 reply; 8+ messages in thread
From: Dmitry Vyukov @ 2016-11-08 22:09 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Andy Lutomirski, Andrey Ryabinin, Laura Abbott, Ard Biesheuvel,
	LKML, linux-arm-kernel, kasan-dev

On Tue, Nov 8, 2016 at 11:03 AM, Mark Rutland <mark.rutland@arm.com> wrote:
> Hi,
>
> I see a while back [1] there was a discussion of what to do about KASAN
> and vmapped stacks, but it doesn't look like that was solved, judging by
> the vmapped stacks pull [2] for v4.9.
>
> I wondered whether anyone had looked at that since?
>
> I have an additional reason to want to dynamically allocate the vmalloc
> area shadow: it turns out that KASAN currently interacts rather poorly
> with the arm64 ptdump code.
>
> When KASAN is selected, we allocate shadow for the whole vmalloc area,
> using common zero pte, pmd, pud tables. Walking over these in the ptdump
> code takes a *very* long time (I've seen up to 15 minutes with
> KASAN_OUTLINE enabled). For DEBUG_WX [3], this means boot hangs for that
> long, too.
>
> If I don't allocate vmalloc shadow (and remove the apparently pointless
> shadow of the shadow area), and only allocate shadow for the image,
> fixmap, vmemmap and so on, that delay gets cut to a few seconds, which
> is tolerable for a debug configuration...
>
> ... however, things blow up when the kernel touches vmalloc'd memory for
> the first time, as we don't install shadow for that dynamically.


I've seen the same iteration slowness problem on x86 with
CONFIG_DEBUG_RODATA, which walks all pages. The walk takes about 1
minute, but that is enough to trigger an RCU stall warning.

The zero pud and vmalloc-ed stacks look like different problems.
To overcome the slowness we could map the zero shadow for the vmalloc area lazily.
However, for vmalloc-ed stacks we need to map actual memory, because
stack instrumentation will read/write into the shadow. One downside
here is that the vmalloc shadow can be as large as 1:1 (if we allocate 1
page in the vmalloc area we need to allocate 1 page for its shadow).
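
To make that worst case concrete, a throwaway user-space sketch (assuming
4 KiB pages and the usual 1/8 shadow scale; nothing arch-specific here):

	#include <stdio.h>

	int main(void)
	{
		unsigned long page   = 4096;		/* one 4 KiB vmalloc'd page  */
		unsigned long shadow = page >> 3;	/* 1 shadow byte per 8 bytes */

		/* 512 bytes of shadow still need a whole writable shadow page,
		 * so a lone vmalloc'd page costs roughly 1:1 in shadow memory. */
		printf("shadow needed: %lu bytes, shadow mapped: %lu bytes\n",
		       shadow, page);
		return 0;
	}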

Re slowness: could we just skip the KASAN zero puds (the top level)
while walking? Can they be interesting for anybody? We can just
pretend that they are not there. Looks like a trivial solution for the
problem at hand.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: KASAN & the vmalloc area
  2016-11-08 22:09 ` Dmitry Vyukov
@ 2016-11-09 10:56   ` Mark Rutland
  2016-11-09 18:16     ` Dmitry Vyukov
  0 siblings, 1 reply; 8+ messages in thread
From: Mark Rutland @ 2016-11-09 10:56 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Andy Lutomirski, Andrey Ryabinin, Laura Abbott, Ard Biesheuvel,
	LKML, linux-arm-kernel, kasan-dev

On Tue, Nov 08, 2016 at 02:09:27PM -0800, Dmitry Vyukov wrote:
> On Tue, Nov 8, 2016 at 11:03 AM, Mark Rutland <mark.rutland@arm.com> wrote:
> > When KASAN is selected, we allocate shadow for the whole vmalloc area,
> > using common zero pte, pmd, pud tables. Walking over these in the ptdump
> > code takes a *very* long time (I've seen up to 15 minutes with
> > KASAN_OUTLINE enabled). For DEBUG_WX [3], this means boot hangs for that
> > long, too.

[...]
 
> I've seen the same iteration slowness problem on x86 with
> CONFIG_DEBUG_RODATA, which walks all pages. The walk takes about 1
> minute, but that is enough to trigger an RCU stall warning.

Interesting; do you know where that happens? I can't spot any obvious
case where we'd have to walk all the page tables for DEBUG_RODATA.

> The zero pud and vmalloc-ed stacks look like different problems.
> To overcome the slowness we could map the zero shadow for the vmalloc area lazily.
> However, for vmalloc-ed stacks we need to map actual memory, because
> stack instrumentation will read/write into the shadow.

Sure. The point I was trying to make is that there'd be fewer page tables
to walk (unless the vmalloc area was exhausted), assuming we also lazily
mapped the common zero shadow for the vmalloc area.
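
Conceptually something like the below on the vmalloc path (pure
pseudocode: both helpers are made-up names, not existing APIs):

	/* Hypothetical hook on a fresh vmalloc allocation: cover the shadow
	 * of [addr, addr + size) lazily, re-using the shared read-only zero
	 * shadow unless the caller (e.g. a vmalloc'd stack) needs writable
	 * shadow of its own. */
	static int kasan_vmalloc_populate(const void *addr, size_t size,
					  bool writable)
	{
		if (!writable)
			return kasan_cover_with_zero_shadow(addr, size);

		return kasan_alloc_real_shadow(addr, size);
	}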

> One downside here is that the vmalloc shadow can be as large as 1:1 (if we
> allocate 1 page in the vmalloc area we need to allocate 1 page for its
> shadow).

I thought per prior discussion we'd only need to allocate new pages for
the stacks in the vmalloc region, and we could re-use the zero pages?

... or are you trying to quantify the cost of the page tables?

> Re slowness: could we just skip the KASAN zero puds (the top level)
> while walking? Can they be interesting for anybody?

They're interesting for the ptdump case (which allows privileged users
to dump the tables via /sys/kernel/debug/kernel_page_tables). I've seen
25+ minute hangs there.

> We can just pretend that they are not there. Looks like a trivial
> solution for the problem at hand.

For the boot-time hang that's an option, though I'd prefer that the
sanity checks apply to all of the tables, shadow regions included.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: KASAN & the vmalloc area
  2016-11-08 19:03 KASAN & the vmalloc area Mark Rutland
  2016-11-08 22:09 ` Dmitry Vyukov
@ 2016-11-09 16:53 ` Andrey Ryabinin
  2016-11-09 18:19   ` Dmitry Vyukov
  1 sibling, 1 reply; 8+ messages in thread
From: Andrey Ryabinin @ 2016-11-09 16:53 UTC (permalink / raw)
  To: Mark Rutland, Dmitry Vyukov, Andy Lutomirski
  Cc: Laura Abbott, Ard Biesheuvel, linux-kernel, linux-arm-kernel

On 11/08/2016 10:03 PM, Mark Rutland wrote:
> Hi,
> 
> I see a while back [1] there was a discussion of what to do about KASAN
> and vmapped stacks, but it doesn't look like that was solved, judging by
> the vmapped stacks pull [2] for v4.9.
> 
> I wondered whether anyone had looked at that since?
> 

Sadly, I haven't had much time for this yet, so it's still in an initial state.

> I have an additional reason to want to dynamically allocate the vmalloc
> area shadow: it turns out that KASAN currently interacts rather poorly
> with the arm64 ptdump code.
> 
> When KASAN is selected, we allocate shadow for the whole vmalloc area,
> using common zero pte, pmd, pud tables. Walking over these in the ptdump
> code takes a *very* long time (I've seen up to 15 minutes with
> KASAN_OUTLINE enabled). For DEBUG_WX [3], this means boot hangs for that
> long, too.
> 

I didn't look at any code, but we could probably remember the last visited pgd
and skip the next pgd if it's the same as the previous one.
Alternatively, just skip kasan_zero_p*d in the ptdump walker.
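
The first of those could look roughly like the below in the walker's pgd
loop (untested sketch; walk_pud() stands in for the existing descent, and
holes are just skipped for brevity):

	static void walk_pgd_dedup(unsigned long addr, unsigned long end)
	{
		pgd_t *pgd = pgd_offset_k(addr);
		pgdval_t last_pgd = ~0UL;

		for (; addr < end; addr += PGDIR_SIZE, pgd++) {
			if (pgd_none(*pgd))
				continue;
			/* The KASAN zero shadow repeats the same tables, so
			 * don't descend into a pgd identical to the last one
			 * we walked. */
			if (pgd_val(*pgd) == last_pgd)
				continue;
			last_pgd = pgd_val(*pgd);
			walk_pud(pgd, addr);
		}
	}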

> If I don't allocate vmalloc shadow (and remove the apparently pointless
> shadow of the shadow area), and only allocate shadow for the image,
> fixmap, vmemmap and so on, that delay gets cut to a few seconds, which
> is tolerable for a debug configuration...
> 
> ... however, things blow up when the kernel touches vmalloc'd memory for
> the first time, as we don't install shadow for that dynamically.
> 
> Thanks,
> Mark.
> 
> [1] https://lkml.kernel.org/r/CALCETrWucrYp+yq8RHSDqf93xtg793duByirurzJbLRhrz=tcA@mail.gmail.com
> [2] https://lkml.kernel.org/r/20161003092940.GA691@gmail.com
> [3] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-October/464191.html
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: KASAN & the vmalloc area
  2016-11-09 10:56   ` Mark Rutland
@ 2016-11-09 18:16     ` Dmitry Vyukov
  2016-11-09 18:30       ` Mark Rutland
  0 siblings, 1 reply; 8+ messages in thread
From: Dmitry Vyukov @ 2016-11-09 18:16 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Andy Lutomirski, Andrey Ryabinin, Laura Abbott, Ard Biesheuvel,
	LKML, linux-arm-kernel, kasan-dev

On Wed, Nov 9, 2016 at 2:56 AM, Mark Rutland <mark.rutland@arm.com> wrote:
> On Tue, Nov 08, 2016 at 02:09:27PM -0800, Dmitry Vyukov wrote:
>> On Tue, Nov 8, 2016 at 11:03 AM, Mark Rutland <mark.rutland@arm.com> wrote:
>> > When KASAN is selected, we allocate shadow for the whole vmalloc area,
>> > using common zero pte, pmd, pud tables. Walking over these in the ptdump
>> > code takes a *very* long time (I've seen up to 15 minutes with
>> > KASAN_OUTLINE enabled). For DEBUG_WX [3], this means boot hangs for that
>> > long, too.
>
> [...]
>
>> I've seen the same iteration slowness problem on x86 with
>> CONFIG_DEBUG_RODATA, which walks all pages. The walk takes about 1
>> minute, but that is enough to trigger an RCU stall warning.
>
> Interesting; do you know where that happens? I can't spot any obvious
> case where we'd have to walk all the page tables for DEBUG_RODATA.

As far as I remember it was this path:

mark_readonly in main.c -> mark_rodata_ro -> debug_checkwx ->
ptdump_walk_pgd_level_checkwx -> ptdump_walk_pgd_level_core.


>> The zero pud and vmalloc-ed stacks look like different problems.
>> To overcome the slowness we could map the zero shadow for the vmalloc area lazily.
>> However, for vmalloc-ed stacks we need to map actual memory, because
>> stack instrumentation will read/write into the shadow.
>
> Sure. The point I was trying to make is that there'd be fewer page tables
> to walk (unless the vmalloc area was exhausted), assuming we also lazily
> mapped the common zero shadow for the vmalloc area.
>
>> One downside here is that the vmalloc shadow can be as large as 1:1 (if we
>> allocate 1 page in the vmalloc area we need to allocate 1 page for its
>> shadow).
>
> I thought per prior discussion we'd only need to allocate new pages for
> the stacks in the vmalloc region, and we could re-use the zero pages?

We can't reuse the zero ro pages for stacks, because stack instrumentation
writes to the stack shadow.
When we have a large contiguous range of memory, its shadow is 1/8th of
its size. However, if we have a lone page, we will need to map a whole
page of shadow for it, i.e. 1:1 shadow overhead.


> ... or are you trying to quantify the cost of the page tables?
>
>> Re slowness: could we just skip the KASAN zero puds (the top level)
>> while walking? Can they be interesting for anybody?
>
> They're interesting for the ptdump case (which allows privileged users
> to dump the tables via /sys/kernel/debug/kernel_page_tables). I've seen
> 25+ minute hangs there.
>
>> We can just pretend that they are not there. Looks like a trivial
>> solution for the problem at hand.
>
> For the boot-time hang that's an option, though I'd prefer that the
> sanity checks apply to all of the tables, shadow regions included.
>
> Thanks,
> Mark.
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: KASAN & the vmalloc area
  2016-11-09 16:53 ` Andrey Ryabinin
@ 2016-11-09 18:19   ` Dmitry Vyukov
  0 siblings, 0 replies; 8+ messages in thread
From: Dmitry Vyukov @ 2016-11-09 18:19 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: Mark Rutland, Andy Lutomirski, Laura Abbott, Ard Biesheuvel,
	LKML, linux-arm-kernel

On Wed, Nov 9, 2016 at 8:53 AM, Andrey Ryabinin <aryabinin@virtuozzo.com> wrote:
> On 11/08/2016 10:03 PM, Mark Rutland wrote:
>> Hi,
>>
>> I see a while back [1] there was a discussion of what to do about KASAN
>> and vmapped stacks, but it doesn't look like that was solved, judging by
>> the vmapped stacks pull [2] for v4.9.
>>
>> I wondered whether anyone had looked at that since?
>>
>
> Sadly, I haven't had much time for this yet, so it's still in an initial state.
>
>> I have an additional reason to want to dynamically allocate the vmalloc
>> area shadow: it turns out that KASAN currently interacts rather poorly
>> with the arm64 ptdump code.
>>
>> When KASAN is selected, we allocate shadow for the whole vmalloc area,
>> using common zero pte, pmd, pud tables. Walking over these in the ptdump
>> code takes a *very* long time (I've seen up to 15 minutes with
>> KASAN_OUTLINE enabled). For DEBUG_WX [3], this means boot hangs for that
>> long, too.
>>
>
> I didn't look at any code, but we could probably remember the last visited pgd
> and skip the next pgd if it's the same as the previous one.

Good idea.

> Alternatively, just skip kasan_zero_p*d in the ptdump walker.

Mark has a concern about hiding the mapping from the walker this way.
But your idea above of deduping pgds during the walk is both fast and
doesn't hide anything from the walker.



>> If I don't allocate vmalloc shadow (and remove the apparently pointless
>> shadow of the shadow area), and only allocate shadow for the image,
>> fixmap, vmemmap and so on, that delay gets cut to a few seconds, which
>> is tolerable for a debug configuration...
>>
>> ... however, things blow up when the kernel touches vmalloc'd memory for
>> the first time, as we don't install shadow for that dynamically.
>>
>> Thanks,
>> Mark.
>>
>> [1] https://lkml.kernel.org/r/CALCETrWucrYp+yq8RHSDqf93xtg793duByirurzJbLRhrz=tcA@mail.gmail.com
>> [2] https://lkml.kernel.org/r/20161003092940.GA691@gmail.com
>> [3] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-October/464191.html
>>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: KASAN & the vmalloc area
  2016-11-09 18:16     ` Dmitry Vyukov
@ 2016-11-09 18:30       ` Mark Rutland
  2016-11-09 18:42         ` Dmitry Vyukov
  0 siblings, 1 reply; 8+ messages in thread
From: Mark Rutland @ 2016-11-09 18:30 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Andy Lutomirski, Andrey Ryabinin, Laura Abbott, Ard Biesheuvel,
	LKML, linux-arm-kernel, kasan-dev

On Wed, Nov 09, 2016 at 10:16:03AM -0800, Dmitry Vyukov wrote:
> On Wed, Nov 9, 2016 at 2:56 AM, Mark Rutland <mark.rutland@arm.com> wrote:
> > On Tue, Nov 08, 2016 at 02:09:27PM -0800, Dmitry Vyukov wrote:
> >> On Tue, Nov 8, 2016 at 11:03 AM, Mark Rutland <mark.rutland@arm.com> wrote:
> >> I've seen the same iteration slowness problem on x86 with
> >> CONFIG_DEBUG_RODATA, which walks all pages. The walk takes about 1
> >> minute, but that is enough to trigger an RCU stall warning.
> >
> > Interesting; do you know where that happens? I can't spot any obvious
> > case where we'd have to walk all the page tables for DEBUG_RODATA.
> 
> As far as I remember it was this path:
> 
> mark_readonly in main.c -> mark_rodata_ro -> debug_checkwx ->
> ptdump_walk_pgd_level_checkwx -> ptdump_walk_pgd_level_core.

Ah, that's x86's equivalent of the DEBUG_WX checks.

> >> The zero pud and vmalloc-ed stacks look like different problems.
> >> To overcome the slowness we could map the zero shadow for the vmalloc area lazily.
> >> However, for vmalloc-ed stacks we need to map actual memory, because
> >> stack instrumentation will read/write into the shadow.
> >
> > Sure. The point I was trying to make is that there'd be fewer page tables
> > to walk (unless the vmalloc area was exhausted), assuming we also lazily
> > mapped the common zero shadow for the vmalloc area.
> >
> >> One downside here is that the vmalloc shadow can be as large as 1:1 (if we
> >> allocate 1 page in the vmalloc area we need to allocate 1 page for its
> >> shadow).
> >
> > I thought per prior discussion we'd only need to allocate new pages for
> > the stacks in the vmalloc region, and we could re-use the zero pages?
> 
> We can't reuse the zero ro pages for stacks, because stack instrumentation
> writes to the stack shadow.

Sorry, I meant we'd use the zero pages for everything but the stacks.
I understand we'd have to allocate real shadow for the stacks.

> When we have a large contiguous range of memory, its shadow is 1/8th of
> its size. However, if we have a lone page, we will need to map a whole
> page of shadow for it, i.e. 1:1 shadow overhead.

Sure, but for everything but stacks we can re-use the same zero pages,
no?

For everything else, the cost would be dominated by the page tables for
the shadow.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: KASAN & the vmalloc area
  2016-11-09 18:30       ` Mark Rutland
@ 2016-11-09 18:42         ` Dmitry Vyukov
  0 siblings, 0 replies; 8+ messages in thread
From: Dmitry Vyukov @ 2016-11-09 18:42 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Andy Lutomirski, Andrey Ryabinin, Laura Abbott, Ard Biesheuvel,
	LKML, linux-arm-kernel, kasan-dev

On Wed, Nov 9, 2016 at 10:30 AM, Mark Rutland <mark.rutland@arm.com> wrote:
>> >> I've seen the same iteration slowness problem on x86 with
>> >> CONFIG_DEBUG_RODATA, which walks all pages. The walk takes about 1
>> >> minute, but that is enough to trigger an RCU stall warning.
>> >
>> > Interesting; do you know where that happens? I can't spot any obvious
>> > case where we'd have to walk all the page tables for DEBUG_RODATA.
>>
>> As far as I remember it was this path:
>>
>> mark_readonly in main.c -> mark_rodata_ro -> debug_checkwx ->
>> ptdump_walk_pgd_level_checkwx -> ptdump_walk_pgd_level_core.
>
> Ah, that's x86's equivalent of the DEBUG_WX checks.
>
>> >> The zero pud and vmalloc-ed stacks look like different problems.
>> >> To overcome the slowness we could map the zero shadow for the vmalloc area lazily.
>> >> However, for vmalloc-ed stacks we need to map actual memory, because
>> >> stack instrumentation will read/write into the shadow.
>> >
>> > Sure. The point I was trying to make is that there'd be fewer page tables
>> > to walk (unless the vmalloc area was exhausted), assuming we also lazily
>> > mapped the common zero shadow for the vmalloc area.
>> >
>> >> One downside here is that the vmalloc shadow can be as large as 1:1 (if we
>> >> allocate 1 page in the vmalloc area we need to allocate 1 page for its
>> >> shadow).
>> >
>> > I thought per prior discussion we'd only need to allocate new pages for
>> > the stacks in the vmalloc region, and we could re-use the zero pages?
>>
>> We can't reuse the zero ro pages for stacks, because stack instrumentation
>> writes to the stack shadow.
>
> Sorry, I meant we'd use the zero pages for everything but the stacks.
> I understand we'd have to allocate real shadow for the stacks.
>
>> When we have a large contiguous range of memory, its shadow is 1/8th of
>> its size. However, if we have a lone page, we will need to map a whole
>> page of shadow for it, i.e. 1:1 shadow overhead.
>
> Sure, but for everything but stacks we can re-use the same zero pages,
> no?
>
> For everything else, the cost would be dominated by the page tables for
> the shadow.


Can we estimate the memory overhead?

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2016-11-09 18:43 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-08 19:03 KASAN & the vmalloc area Mark Rutland
2016-11-08 22:09 ` Dmitry Vyukov
2016-11-09 10:56   ` Mark Rutland
2016-11-09 18:16     ` Dmitry Vyukov
2016-11-09 18:30       ` Mark Rutland
2016-11-09 18:42         ` Dmitry Vyukov
2016-11-09 16:53 ` Andrey Ryabinin
2016-11-09 18:19   ` Dmitry Vyukov
