* vmalloced stacks on x86_64?
@ 2014-10-25  0:22 Andy Lutomirski
  2014-10-25  2:38 ` H. Peter Anvin
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Andy Lutomirski @ 2014-10-25  0:22 UTC (permalink / raw)
  To: H. Peter Anvin, X86 ML, linux-kernel, Linus Torvalds

Is there any good reason not to use vmalloc for x86_64 stacks?

The tricky bits I've thought of are:

 - On any context switch, we probably need to probe the new stack
before switching to it.  That way, if it's going to fault due to an
out-of-sync pgd, we still have a stack available to handle the fault.

 - Any time we change cr3, we may need to check that the pgd
corresponding to rsp is there.  If not, we need to sync it over.

 - For simplicity, we probably want all stack ptes to be present all
the time.  This is fine; vmalloc already works that way.

 - If we overrun the stack, we double-fault.  This should be easy to
detect: any double-fault where rsp is less than 20 bytes from the
bottom of the stack is a failure to deliver a non-IST exception due to
 a stack overflow.  The question is: what do we do if this happens?
We could just panic (guaranteed to work).  We could also try to
recover by killing the offending task, but that might be a bit
challenging, since we're in IST context.  We could do something truly
awful: increment RSP by a few hundred bytes, point RIP at do_exit, and
return from the double fault.

Thoughts?  This shouldn't be all that much code.

--Andy
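
As an illustration of the double-fault heuristic in the last point, here is a
minimal sketch (assumptions: current and task_stack_page() are usable from the
#DF handler, and the 20-byte window is the one suggested above; this is not a
real patch):

#include <linux/sched.h>	/* current, task_stack_page() */
#include <asm/ptrace.h>		/* struct pt_regs */

/*
 * Sketch only: decide whether a #DF looks like a failed delivery of a
 * non-IST exception caused by running off the bottom of the task stack.
 * "Bottom" here is the lowest address of the stack allocation.
 */
static bool double_fault_is_stack_overflow(struct pt_regs *regs)
{
	unsigned long bottom = (unsigned long)task_stack_page(current);

	/* RSP within the suggested 20-byte window above the bottom. */
	return regs->sp >= bottom && regs->sp - bottom < 20;
}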

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: vmalloced stacks on x86_64?
  2014-10-25  0:22 vmalloced stacks on x86_64? Andy Lutomirski
@ 2014-10-25  2:38 ` H. Peter Anvin
  2014-10-25  4:42   ` Andy Lutomirski
  2014-10-26 16:46   ` Eric Dumazet
  2014-10-25  9:15 ` Ingo Molnar
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 14+ messages in thread
From: H. Peter Anvin @ 2014-10-25  2:38 UTC (permalink / raw)
  To: Andy Lutomirski, X86 ML, linux-kernel, Linus Torvalds

On 10/24/2014 05:22 PM, Andy Lutomirski wrote:
> Is there any good reason not to use vmalloc for x86_64 stacks?

Additional TLB pressure, if nothing else.

Now, on the flipside: what is the *benefit*?

	-hpa


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: vmalloced stacks on x86_64?
  2014-10-25  2:38 ` H. Peter Anvin
@ 2014-10-25  4:42   ` Andy Lutomirski
  2014-10-26 16:46   ` Eric Dumazet
  1 sibling, 0 replies; 14+ messages in thread
From: Andy Lutomirski @ 2014-10-25  4:42 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Linus Torvalds, linux-kernel, X86 ML

On Oct 24, 2014 7:38 PM, "H. Peter Anvin" <hpa@zytor.com> wrote:
>
> On 10/24/2014 05:22 PM, Andy Lutomirski wrote:
> > Is there any good reason not to use vmalloc for x86_64 stacks?
>
> Additional TLB pressure, if nothing else.

I wonder how much this matters.  It certainly helps on context
switches if the new stack is in the same TLB entry.  But, for entries
that use less than one page of stack, I can imagine this making almost
no difference.

>
> Now, on the flipside: what is the *benefit*?

Immediate exception on overflow, and no high order allocation issues.
The former is a nice mitigation against exploits based on overflowing
the stack.

--Andy

>
>         -hpa
>
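
To make the "no high order allocation issues" point concrete, a sketch of the
two allocation shapes being compared (illustrative only; teardown and
accounting omitted):

#include <linux/gfp.h>
#include <linux/vmalloc.h>
#include <asm/thread_info.h>	/* THREAD_SIZE, THREAD_SIZE_ORDER */

/* The traditional stack: one physically contiguous block of order
 * THREAD_SIZE_ORDER, which can fail or stall when memory is fragmented. */
static void *alloc_stack_contiguous(void)
{
	return (void *)__get_free_pages(GFP_KERNEL, THREAD_SIZE_ORDER);
}

/* A vmalloc'ed stack: only ever needs order-0 pages, at the cost of the
 * extra TLB pressure discussed above. */
static void *alloc_stack_vmalloc(void)
{
	return vmalloc(THREAD_SIZE);
}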

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: vmalloced stacks on x86_64?
  2014-10-25  0:22 vmalloced stacks on x86_64? Andy Lutomirski
  2014-10-25  2:38 ` H. Peter Anvin
@ 2014-10-25  9:15 ` Ingo Molnar
  2014-10-25 16:05   ` Andy Lutomirski
  2014-10-25 22:26 ` Richard Weinberger
  2014-10-26  4:11 ` Frederic Weisbecker
  3 siblings, 1 reply; 14+ messages in thread
From: Ingo Molnar @ 2014-10-25  9:15 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: H. Peter Anvin, X86 ML, linux-kernel, Linus Torvalds


* Andy Lutomirski <luto@amacapital.net> wrote:

> Is there any good reason not to use vmalloc for x86_64 stacks?

In addition to what hpa mentioned, __pa()/__va() on-kstack DMA 
gets tricky, for legacy drivers. (Not sure how many of these are 
left though.)

Thanks,

	Ingo
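
To show why on-kstack DMA in legacy drivers is the sticking point:
__pa()/virt_to_phys() is only valid for direct-mapped addresses, so a stack
buffer in vmalloc space would need a bounce. A hedged sketch follows;
bounce_to_safe_buffer() is a made-up stand-in, not a real API:

#include <linux/io.h>	/* virt_to_phys() */
#include <linux/mm.h>	/* is_vmalloc_addr() */

static phys_addr_t bounce_to_safe_buffer(void *buf, size_t len); /* hypothetical */

/*
 * Sketch only: a legacy driver that used to call virt_to_phys() on a
 * stack buffer would silently get a bogus address once stacks live in
 * vmalloc space, so it would need an explicit check and a bounce path.
 */
static phys_addr_t legacy_buf_to_phys(void *buf, size_t len)
{
	if (is_vmalloc_addr(buf))
		return bounce_to_safe_buffer(buf, len);

	return virt_to_phys(buf);	/* valid only for the direct map */
}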

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: vmalloced stacks on x86_64?
  2014-10-25  9:15 ` Ingo Molnar
@ 2014-10-25 16:05   ` Andy Lutomirski
  0 siblings, 0 replies; 14+ messages in thread
From: Andy Lutomirski @ 2014-10-25 16:05 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Linus Torvalds, linux-kernel, H. Peter Anvin, X86 ML

On Oct 25, 2014 2:15 AM, "Ingo Molnar" <mingo@kernel.org> wrote:
>
>
> * Andy Lutomirski <luto@amacapital.net> wrote:
>
> > Is there any good reason not to use vmalloc for x86_64 stacks?
>
> In addition to what hpa mentioned, __pa()/__va() on-kstack DMA
> gets tricky, for legacy drivers. (Not sure how many of these are
> left though.)

Hopefully very few.  DMA debugging warns if the driver uses the DMA
API, and if the driver doesn't, then IOMMUs will break it.

virtio-net is an oddball offender.  I have a patch.

--Andy

>
> Thanks,
>
>         Ingo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: vmalloced stacks on x86_64?
  2014-10-25  0:22 vmalloced stacks on x86_64? Andy Lutomirski
  2014-10-25  2:38 ` H. Peter Anvin
  2014-10-25  9:15 ` Ingo Molnar
@ 2014-10-25 22:26 ` Richard Weinberger
  2014-10-25 23:16   ` Andy Lutomirski
  2014-10-26  4:11 ` Frederic Weisbecker
  3 siblings, 1 reply; 14+ messages in thread
From: Richard Weinberger @ 2014-10-25 22:26 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: H. Peter Anvin, X86 ML, linux-kernel, Linus Torvalds

On Sat, Oct 25, 2014 at 2:22 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> Is there any good reason not to use vmalloc for x86_64 stacks?
>
> The tricky bits I've thought of are:
>
>  - On any context switch, we probably need to probe the new stack
> before switching to it.  That way, if it's going to fault due to an
> out-of-sync pgd, we still have a stack available to handle the fault.
>
>  - Any time we change cr3, we may need to check that the pgd
> corresponding to rsp is there.  If not, we need to sync it over.
>
>  - For simplicity, we probably want all stack ptes to be present all
> the time.  This is fine; vmalloc already works that way.
>
>  - If we overrun the stack, we double-fault.  This should be easy to
> detect: any double-fault where rsp is less than 20 bytes from the
> bottom of the stack is a failure to deliver a non-IST exception due to
>  a stack overflow.  The question is: what do we do if this happens?
> We could just panic (guaranteed to work).  We could also try to
> recover by killing the offending task, but that might be a bit
> challenging, since we're in IST context.  We could do something truly
> awful: increment RSP by a few hundred bytes, point RIP at do_exit, and
> return from the double fault.
>
> Thoughts?  This shouldn't be all that much code.

FWIW, grsecurity has this already.
Maybe we can reuse their GRKERNSEC_KSTACKOVERFLOW feature.
It allocates the kernel stack using vmalloc() and installs guard pages.

-- 
Thanks,
//richard

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: vmalloced stacks on x86_64?
  2014-10-25 22:26 ` Richard Weinberger
@ 2014-10-25 23:16   ` Andy Lutomirski
  2014-10-25 23:31     ` Richard Weinberger
  2014-10-26 18:16     ` Linus Torvalds
  0 siblings, 2 replies; 14+ messages in thread
From: Andy Lutomirski @ 2014-10-25 23:16 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: H. Peter Anvin, X86 ML, linux-kernel, Linus Torvalds

On Sat, Oct 25, 2014 at 3:26 PM, Richard Weinberger
<richard.weinberger@gmail.com> wrote:
> On Sat, Oct 25, 2014 at 2:22 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>> Is there any good reason not to use vmalloc for x86_64 stacks?
>>
>> The tricky bits I've thought of are:
>>
>>  - On any context switch, we probably need to probe the new stack
>> before switching to it.  That way, if it's going to fault due to an
>> out-of-sync pgd, we still have a stack available to handle the fault.
>>
>>  - Any time we change cr3, we may need to check that the pgd
>> corresponding to rsp is there.  If not, we need to sync it over.
>>
>>  - For simplicity, we probably want all stack ptes to be present all
>> the time.  This is fine; vmalloc already works that way.
>>
>>  - If we overrun the stack, we double-fault.  This should be easy to
>> detect: any double-fault where rsp is less than 20 bytes from the
>> bottom of the stack is a failure to deliver a non-IST exception due to
>>  a stack overflow.  The question is: what do we do if this happens?
>> We could just panic (guaranteed to work).  We could also try to
>> recover by killing the offending task, but that might be a bit
>> challenging, since we're in IST context.  We could do something truly
>> awful: increment RSP by a few hundred bytes, point RIP at do_exit, and
>> return from the double fault.
>>
>> Thoughts?  This shouldn't be all that much code.
>
> FWIW, grsecurity has this already.
> Maybe we can reuse their GRKERNSEC_KSTACKOVERFLOW feature.
> It allocates the kernel stack using vmalloc() and installs guard pages.
>

On brief inspection, grsecurity isn't actually vmallocing the stack.
It seems to be allocating it the normal way and then vmapping it.
That allows it to modify sg_set_buf to work on stack addresses (sigh).

After each switch_mm, it probes the whole kernel stack.  (This seems
dangerous to me -- if the live stack isn't mapped in the new mm, won't
that double-fault?)  I also see no evidence that it probes the new
stack when switching stacks.  I suspect that it only works because it
gets lucky.

If we're worried about on-stack DMA, we could (by config option or
otherwise) allow DMA on a vmalloced stack, at least through the sg
interfaces.  And we could WARN and fix it :)

--Andy

P.S.  I see what appears to be some of my code in grsec.  I feel
entirely justified in taking good bits of grsec and sticking them in
the upstream kernel.
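
For reference, a rough sketch of the vmap-the-pages approach described above
(a guess at the general shape, not the grsecurity code; in real code the pages
array would have to be kept around for teardown):

#include <linux/gfp.h>
#include <linux/kernel.h>	/* ARRAY_SIZE */
#include <linux/mm.h>
#include <linux/vmalloc.h>
#include <asm/thread_info.h>	/* THREAD_SIZE */

/*
 * Allocate ordinary order-0 pages and map them contiguously in vmalloc
 * space.  The vmap area gets a guard page after it, so running off the
 * end of the stack faults instead of silently corrupting memory.
 */
static void *alloc_vmapped_stack(void)
{
	struct page *pages[THREAD_SIZE / PAGE_SIZE];
	void *stack;
	int i;

	for (i = 0; i < ARRAY_SIZE(pages); i++) {
		pages[i] = alloc_page(GFP_KERNEL);
		if (!pages[i])
			goto free_pages;
	}

	stack = vmap(pages, ARRAY_SIZE(pages), VM_MAP, PAGE_KERNEL);
	if (stack)
		return stack;

free_pages:
	while (i--)
		__free_page(pages[i]);
	return NULL;
}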

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: vmalloced stacks on x86_64?
  2014-10-25 23:16   ` Andy Lutomirski
@ 2014-10-25 23:31     ` Richard Weinberger
  2014-10-26 18:16     ` Linus Torvalds
  1 sibling, 0 replies; 14+ messages in thread
From: Richard Weinberger @ 2014-10-25 23:31 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: H. Peter Anvin, X86 ML, linux-kernel, Linus Torvalds

Am 26.10.2014 um 01:16 schrieb Andy Lutomirski:
> On Sat, Oct 25, 2014 at 3:26 PM, Richard Weinberger
> <richard.weinberger@gmail.com> wrote:
>> On Sat, Oct 25, 2014 at 2:22 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>>> Is there any good reason not to use vmalloc for x86_64 stacks?
>>>
>>> The tricky bits I've thought of are:
>>>
>>>  - On any context switch, we probably need to probe the new stack
>>> before switching to it.  That way, if it's going to fault due to an
>>> out-of-sync pgd, we still have a stack available to handle the fault.
>>>
>>>  - Any time we change cr3, we may need to check that the pgd
>>> corresponding to rsp is there.  If not, we need to sync it over.
>>>
>>>  - For simplicity, we probably want all stack ptes to be present all
>>> the time.  This is fine; vmalloc already works that way.
>>>
>>>  - If we overrun the stack, we double-fault.  This should be easy to
>>> detect: any double-fault where rsp is less than 20 bytes from the
>>> bottom of the stack is a failure to deliver a non-IST exception due to
>>>  a stack overflow.  The question is: what do we do if this happens?
>>> We could just panic (guaranteed to work).  We could also try to
>>> recover by killing the offending task, but that might be a bit
>>> challenging, since we're in IST context.  We could do something truly
>>> awful: increment RSP by a few hundred bytes, point RIP at do_exit, and
>>> return from the double fault.
>>>
>>> Thoughts?  This shouldn't be all that much code.
>>
>> FWIW, grsecurity has this already.
>> Maybe we can reuse their GRKERNSEC_KSTACKOVERFLOW feature.
>> It allocates the kernel stack using vmalloc() and installs guard pages.
>>
> 
> On brief inspection, grsecurity isn't actually vmallocing the stack.
> It seems to be allocating it the normal way and then vmapping it.
> That allows it to modify sg_set_buf to work on stack addresses (sigh).

Oh, you're right, they have changed it (but not the Kconfig help text, of course).
Last time I looked they did a vmalloc().
I'm not sure which version of the patch it was, but I think it was code like this one:
http://www.grsecurity.net/~spender/kstackovf32.diff

Thanks,
//richard

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: vmalloced stacks on x86_64?
  2014-10-25  0:22 vmalloced stacks on x86_64? Andy Lutomirski
                   ` (2 preceding siblings ...)
  2014-10-25 22:26 ` Richard Weinberger
@ 2014-10-26  4:11 ` Frederic Weisbecker
  2014-10-26  5:49   ` Andy Lutomirski
  3 siblings, 1 reply; 14+ messages in thread
From: Frederic Weisbecker @ 2014-10-26  4:11 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: H. Peter Anvin, X86 ML, linux-kernel, Linus Torvalds,
	Ingo Molnar, Richard Weinberger

2014-10-25 2:22 GMT+02:00 Andy Lutomirski <luto@amacapital.net>:
> Is there any good reason not to use vmalloc for x86_64 stacks?
>
> The tricky bits I've thought of are:
>
>  - On any context switch, we probably need to probe the new stack
> before switching to it.  That way, if it's going to fault due to an
> out-of-sync pgd, we still have a stack available to handle the fault.

Would that prevent any further faults on a vmalloc'ed kernel
stack? We would need to ensure that pre-faulting, say, the first byte
is enough to sync the whole new stack; otherwise we risk another
fault later, and some places really can't take a fault safely.

>
>  - Any time we change cr3, we may need to check that the pgd
> corresponding to rsp is there.  If not, we need to sync it over.
>
>  - For simplicity, we probably want all stack ptes to be present all
> the time.  This is fine; vmalloc already works that way.
>
>  - If we overrun the stack, we double-fault.  This should be easy to
> detect: any double-fault where rsp is less than 20 bytes from the
> bottom of the stack is a failure to deliver a non-IST exception due to
>  a stack overflow.  The question is: what do we do if this happens?
> We could just panic (guaranteed to work).  We could also try to
> recover by killing the offending task, but that might be a bit
> challenging, since we're in IST context.  We could do something truly
> awful: increment RSP by a few hundred bytes, point RIP at do_exit, and
> return from the double fault.
>
> Thoughts?  This shouldn't be all that much code.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: vmalloced stacks on x86_64?
  2014-10-26  4:11 ` Frederic Weisbecker
@ 2014-10-26  5:49   ` Andy Lutomirski
  2014-10-26 20:29     ` Frederic Weisbecker
  0 siblings, 1 reply; 14+ messages in thread
From: Andy Lutomirski @ 2014-10-26  5:49 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, H. Peter Anvin, X86 ML, Linus Torvalds,
	Richard Weinberger, Ingo Molnar

On Oct 25, 2014 9:11 PM, "Frederic Weisbecker" <fweisbec@gmail.com> wrote:
>
> 2014-10-25 2:22 GMT+02:00 Andy Lutomirski <luto@amacapital.net>:
> > Is there any good reason not to use vmalloc for x86_64 stacks?
> >
> > The tricky bits I've thought of are:
> >
> >  - On any context switch, we probably need to probe the new stack
> > before switching to it.  That way, if it's going to fault due to an
> > out-of-sync pgd, we still have a stack available to handle the fault.
>
> Would that prevent any further faults on a vmalloc'ed kernel
> stack? We would need to ensure that pre-faulting, say, the first byte
> is enough to sync the whole new stack; otherwise we risk another
> fault later, and some places really can't take a fault safely.
>

I think so.  The vmalloc faults only happen when the entire top-level
page table entry is missing, and those cover giant swaths of address
space.

I don't know whether the vmalloc code guarantees not to span a pmd
(pud? why couldn't these be called pte0, pte1, pte2, etc.?) boundary.

--Andy

> >
> >  - Any time we change cr3, we may need to check that the pgd
> > corresponding to rsp is there.  If not, we need to sync it over.
> >
> >  - For simplicity, we probably want all stack ptes to be present all
> > the time.  This is fine; vmalloc already works that way.
> >
> >  - If we overrun the stack, we double-fault.  This should be easy to
> > detect: any double-fault where rsp is less than 20 bytes from the
> > bottom of the stack is a failure to deliver a non-IST exception due to
> >  a stack overflow.  The question is: what do we do if this happens?
> > We could just panic (guaranteed to work).  We could also try to
> > recover by killing the offending task, but that might be a bit
> > challenging, since we're in IST context.  We could do something truly
> > awful: increment RSP by a few hundred bytes, point RIP at do_exit, and
> > return from the double fault.
> >
> > Thoughts?  This shouldn't be all that much code.
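
A minimal sketch of the probe being discussed, under the assumption stated
above that the whole stack is covered by one top-level entry, so touching a
single byte is enough (illustrative only; the call would sit just before the
stack switch):

#include <linux/compiler.h>
#include <linux/sched.h>	/* task_stack_page() */

/*
 * Touch the incoming task's stack while still running on a known-good
 * stack, so that a missing vmalloc pgd entry is synced by the resulting
 * (recoverable) vmalloc fault now, not after RSP already points there.
 */
static __always_inline void probe_new_stack(struct task_struct *next)
{
	/* Volatile read: cannot be optimised away, value is discarded. */
	(void)*(volatile unsigned char *)task_stack_page(next);
}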

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: vmalloced stacks on x86_64?
  2014-10-25  2:38 ` H. Peter Anvin
  2014-10-25  4:42   ` Andy Lutomirski
@ 2014-10-26 16:46   ` Eric Dumazet
  1 sibling, 0 replies; 14+ messages in thread
From: Eric Dumazet @ 2014-10-26 16:46 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Andy Lutomirski, X86 ML, linux-kernel, Linus Torvalds

On Fri, 2014-10-24 at 19:38 -0700, H. Peter Anvin wrote:
> On 10/24/2014 05:22 PM, Andy Lutomirski wrote:
> > Is there any good reason not to use vmalloc for x86_64 stacks?
> 
> Additional TLB pressure, if nothing else.

It seems TLB pressure attracts less and less interest these days...

Is it still worth trying to reduce it?

I was wondering, for example, why 'hashdist' is not cleared when the
current host runs a NUMA-enabled kernel but has only a single node.

Something like following maybe ?

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 7dbe5ec9d9cd08afac13797e2adac291fb703eec..0846ef054b0620a7be0c6f69b1a2f21c78d57d3b 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1181,7 +1181,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 	hashdist=	[KNL,NUMA] Large hashes allocated during boot
 			are distributed across NUMA nodes.  Defaults on
 			for 64-bit NUMA, off otherwise.
-			Format: 0 | 1 (for off | on)
+			Format: 0 | 1 | 2 (for off | on if NUMA host | on)
 
 	hcl=		[IA-64] SGI's Hardware Graph compatibility layer
 
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 1a883705a12a8a12410914be93b2ee65807cc423..8aded4c11c8c1cc5778e9ae2b9cd5146070b5b03 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -668,7 +668,8 @@ static int __init dummy_numa_init(void)
 
 	node_set(0, numa_nodes_parsed);
 	numa_add_memblk(0, 0, PFN_PHYS(max_pfn));
-
+	if (hashdist == HASHDIST_DEFAULT)
+		hashdist = 0;
 	return 0;
 }
 



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: vmalloced stacks on x86_64?
  2014-10-25 23:16   ` Andy Lutomirski
  2014-10-25 23:31     ` Richard Weinberger
@ 2014-10-26 18:16     ` Linus Torvalds
  1 sibling, 0 replies; 14+ messages in thread
From: Linus Torvalds @ 2014-10-26 18:16 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Richard Weinberger, H. Peter Anvin, X86 ML, linux-kernel

On Sat, Oct 25, 2014 at 4:16 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> On brief inspection, grsecurity isn't actually vmallocing the stack.
> It seems to be allocating it the normal way and then vmapping it.
> That allows it to modify sg_set_buf to work on stack addresses (sigh).

Perhaps more importantly, the vmalloc space is a limited resource (at
least on 32-bit), and using vmap probably results in less
fragmentation.

I don't think either is really even an option on 32-bit due to the
limited address space. On 64-bit, maybe a virtually remapped stack
would be ok.

                   Linus

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: vmalloced stacks on x86_64?
  2014-10-26  5:49   ` Andy Lutomirski
@ 2014-10-26 20:29     ` Frederic Weisbecker
  2014-10-27  1:12       ` Andy Lutomirski
  0 siblings, 1 reply; 14+ messages in thread
From: Frederic Weisbecker @ 2014-10-26 20:29 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-kernel, H. Peter Anvin, X86 ML, Linus Torvalds,
	Richard Weinberger, Ingo Molnar

On Sat, Oct 25, 2014 at 10:49:25PM -0700, Andy Lutomirski wrote:
> On Oct 25, 2014 9:11 PM, "Frederic Weisbecker" <fweisbec@gmail.com> wrote:
> >
> > 2014-10-25 2:22 GMT+02:00 Andy Lutomirski <luto@amacapital.net>:
> > > Is there any good reason not to use vmalloc for x86_64 stacks?
> > >
> > > The tricky bits I've thought of are:
> > >
> > >  - On any context switch, we probably need to probe the new stack
> > > before switching to it.  That way, if it's going to fault due to an
> > > out-of-sync pgd, we still have a stack available to handle the fault.
> >
> > Would that prevent any further faults on a vmalloc'ed kernel
> > stack? We would need to ensure that pre-faulting, say, the first byte
> > is enough to sync the whole new stack; otherwise we risk another
> > fault later, and some places really can't take a fault safely.
> >
> 
> I think so.  The vmalloc faults only happen when the entire top-level
> page table entry is missing, and those cover giant swaths of address
> space.
> 
> I don't know whether the vmalloc code guarantees not to span a pmd
> (pud? why couldn't these be called pte0, pte1, pte2, etc.?) boundary.

So dereferencing stack[0] is probably enough for 8KB worth of stack. I think
we have vmalloc_sync_all(), but I heard this only works on x86-64.

Too bad we don't have a universal solution; I have that problem with per-CPU allocated
memory faulting at random places. I've hit at least two places where it got harmful:
context tracking and perf callchains. We fixed the latter using open-coded per-CPU
allocation. I still haven't found a solution for context tracking.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: vmalloced stacks on x86_64?
  2014-10-26 20:29     ` Frederic Weisbecker
@ 2014-10-27  1:12       ` Andy Lutomirski
  0 siblings, 0 replies; 14+ messages in thread
From: Andy Lutomirski @ 2014-10-27  1:12 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, H. Peter Anvin, X86 ML, Linus Torvalds,
	Richard Weinberger, Ingo Molnar

On Sun, Oct 26, 2014 at 1:29 PM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> On Sat, Oct 25, 2014 at 10:49:25PM -0700, Andy Lutomirski wrote:
>> On Oct 25, 2014 9:11 PM, "Frederic Weisbecker" <fweisbec@gmail.com> wrote:
>> >
>> > 2014-10-25 2:22 GMT+02:00 Andy Lutomirski <luto@amacapital.net>:
>> > > Is there any good reason not to use vmalloc for x86_64 stacks?
>> > >
>> > > The tricky bits I've thought of are:
>> > >
>> > >  - On any context switch, we probably need to probe the new stack
>> > > before switching to it.  That way, if it's going to fault due to an
>> > > out-of-sync pgd, we still have a stack available to handle the fault.
>> >
>> > Would that prevent any further faults on a vmalloc'ed kernel
>> > stack? We would need to ensure that pre-faulting, say, the first byte
>> > is enough to sync the whole new stack; otherwise we risk another
>> > fault later, and some places really can't take a fault safely.
>> >
>>
>> I think so.  The vmalloc faults only happen when the entire top-level
>> page table entry is missing, and those cover giant swaths of address
>> space.
>>
>> I don't know whether the vmalloc code guarantees not to span a pmd
>> (pud? why couldn't these be called pte0, pte1, pte2, etc.?) boundary.
>
> So dereferencing stack[0] is probably enough for 8KB worth of stack. I think
> we have vmalloc_sync_all() but I heard this only work on x86-64.
>

I have no desire to do this for 32-bit.  But we don't need
vmalloc_sync_all() -- we just need to sync the only required entry.

> Too bad we don't have a universal solution, I have that problem with per cpu allocated
> memory faulting at random places. I hit at least two places where it got harmful:
> context tracking and perf callchains. We fixed the latter using open-coded per cpu
> allocation. I still haven't found a solution for context tracking.

In principle, we could pre-populate all top-level pgd entries at boot,
but that would cost up to 256 pages of memory, I think.

--Andy
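
A hedged sketch of what syncing that one entry could look like, modeled
loosely on the spirit of the x86-64 vmalloc fault path (assumption: a
pgd-level copy is sufficient, as argued above; this is not a real patch):

#include <linux/mm_types.h>	/* struct mm_struct */
#include <asm/pgtable.h>

/*
 * Copy the kernel's pgd entry covering 'address' (e.g. the new task's
 * stack) into the mm we are about to switch to, so the vmalloc'ed stack
 * is reachable immediately after CR3 changes, without taking a fault.
 */
static void sync_stack_pgd(struct mm_struct *next_mm, unsigned long address)
{
	pgd_t *pgd = pgd_offset(next_mm, address);
	pgd_t *pgd_ref = pgd_offset_k(address);

	if (pgd_none(*pgd_ref))
		return;			/* nothing to copy yet */

	if (pgd_none(*pgd))
		set_pgd(pgd, *pgd_ref);
}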

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread

Thread overview: 14+ messages
2014-10-25  0:22 vmalloced stacks on x86_64? Andy Lutomirski
2014-10-25  2:38 ` H. Peter Anvin
2014-10-25  4:42   ` Andy Lutomirski
2014-10-26 16:46   ` Eric Dumazet
2014-10-25  9:15 ` Ingo Molnar
2014-10-25 16:05   ` Andy Lutomirski
2014-10-25 22:26 ` Richard Weinberger
2014-10-25 23:16   ` Andy Lutomirski
2014-10-25 23:31     ` Richard Weinberger
2014-10-26 18:16     ` Linus Torvalds
2014-10-26  4:11 ` Frederic Weisbecker
2014-10-26  5:49   ` Andy Lutomirski
2014-10-26 20:29     ` Frederic Weisbecker
2014-10-27  1:12       ` Andy Lutomirski
