All of lore.kernel.org
 help / color / mirror / Atom feed
* revisit arm64 per-task stack canaries
@ 2018-02-13 12:36 Ard Biesheuvel
  2018-02-13 12:52 ` Mark Rutland
  0 siblings, 1 reply; 13+ messages in thread
From: Ard Biesheuvel @ 2018-02-13 12:36 UTC (permalink / raw)
  To: linux-arm-kernel

Hello all,

I guess my most recent emails kind of got lost in the avalanche of
Meltdown/Spectre related activities.

Now that the dust seems to have settled a bit, may I kindly bring this
topic to your attention again?

In summary, the default aarch64 way of using a single value for the
stack canary for all threads sharing an address space severely limits
the kernel's ability to implement stack canaries in a meaningful way.

Originally, we assumed that the only way to overcome this was to
implement per-CPU stack canaries, where each CPU loads the stack
canary of the task it executes at context switch. This is racy and
cumbersome in the presence of kernel support of VHE, which means the
per-CPU thread ID register is not fixed at compile time.

Instead, I have proposed a proof of concept [0] where the compiler
emits an instruction sequence that loads the canary directly from the
task struct, which is the per-thread data structure maintained by the
kernel. Accessing that can be done safely without any of the
limitations per-CPU variables have. The task struct pointer is kept in
system register sp_el0 while running in the kernel.

The purpose of this email to reach agreement between the various
stakeholders (mainly the arm64 linux maintainers and the ARM GCC
maintainers) on a way to proceed with implementing this in GCC.

So please, do share your opinions and questions on this matter, so
that we can settle this matter asap.

Kind regards,
Ard.


[0] http://www.workofard.com/2018/01/per-task-stack-canaries-for-arm64/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* revisit arm64 per-task stack canaries
  2018-02-13 12:36 revisit arm64 per-task stack canaries Ard Biesheuvel
@ 2018-02-13 12:52 ` Mark Rutland
  2018-02-13 13:56   ` Ard Biesheuvel
  0 siblings, 1 reply; 13+ messages in thread
From: Mark Rutland @ 2018-02-13 12:52 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Feb 13, 2018 at 12:36:02PM +0000, Ard Biesheuvel wrote:
> Hello all,

Hi,

> In summary, the default aarch64 way of using a single value for the
> stack canary for all threads sharing an address space severely limits
> the kernel's ability to implement stack canaries in a meaningful way.
> 
> Originally, we assumed that the only way to overcome this was to
> implement per-CPU stack canaries, where each CPU loads the stack
> canary of the task it executes at context switch. This is racy and
> cumbersome in the presence of kernel support of VHE, which means the
> per-CPU thread ID register is not fixed at compile time.
> 
> Instead, I have proposed a proof of concept [0] where the compiler
> emits an instruction sequence that loads the canary directly from the
> task struct, which is the per-thread data structure maintained by the
> kernel. Accessing that can be done safely without any of the
> limitations per-CPU variables have. The task struct pointer is kept in
> system register sp_el0 while running in the kernel.

My major concern here is being tied to using sysregs in a particular
way. We might want to fiddle with that in future (e.g. using the
platform register as an optimization, or switching to a different sysreg
due to architectural extensions).

> The purpose of this email to reach agreement between the various
> stakeholders (mainly the arm64 linux maintainers and the ARM GCC
> maintainers) on a way to proceed with implementing this in GCC.

Would it be possible to have an always inline function to get the
canary, which GCC would implicitly fold into functions as necessary?

... then we could have something in a header, like:

static __always_inline unsigned long get_task_canary(void)
{
	return current->canary;
}

... which we could change in future as needs change.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* revisit arm64 per-task stack canaries
  2018-02-13 12:52 ` Mark Rutland
@ 2018-02-13 13:56   ` Ard Biesheuvel
  2018-02-13 18:04     ` Will Deacon
  0 siblings, 1 reply; 13+ messages in thread
From: Ard Biesheuvel @ 2018-02-13 13:56 UTC (permalink / raw)
  To: linux-arm-kernel

On 13 February 2018 at 12:52, Mark Rutland <mark.rutland@arm.com> wrote:
> On Tue, Feb 13, 2018 at 12:36:02PM +0000, Ard Biesheuvel wrote:
>> Hello all,
>
> Hi,
>
>> In summary, the default aarch64 way of using a single value for the
>> stack canary for all threads sharing an address space severely limits
>> the kernel's ability to implement stack canaries in a meaningful way.
>>
>> Originally, we assumed that the only way to overcome this was to
>> implement per-CPU stack canaries, where each CPU loads the stack
>> canary of the task it executes at context switch. This is racy and
>> cumbersome in the presence of kernel support of VHE, which means the
>> per-CPU thread ID register is not fixed at compile time.
>>
>> Instead, I have proposed a proof of concept [0] where the compiler
>> emits an instruction sequence that loads the canary directly from the
>> task struct, which is the per-thread data structure maintained by the
>> kernel. Accessing that can be done safely without any of the
>> limitations per-CPU variables have. The task struct pointer is kept in
>> system register sp_el0 while running in the kernel.
>
> My major concern here is being tied to using sysregs in a particular
> way. We might want to fiddle with that in future (e.g. using the
> platform register as an optimization, or switching to a different sysreg
> due to architectural extensions).
>

Yes. Also, there is a disparity between userland (using tpidr_el0) and
kernel (using sp_el0), and perhaps it would make more sense to switch
to tpidr_el0 in the kernel as well. But the general objection remains.

>> The purpose of this email to reach agreement between the various
>> stakeholders (mainly the arm64 linux maintainers and the ARM GCC
>> maintainers) on a way to proceed with implementing this in GCC.
>
> Would it be possible to have an always inline function to get the
> canary, which GCC would implicitly fold into functions as necessary?
>
> ... then we could have something in a header, like:
>
> static __always_inline unsigned long get_task_canary(void)
> {
>         return current->canary;
> }
>
> ... which we could change in future as needs change.
>

This is a question to the compiler folks, I suppose, but I'd venture a
guess that this is rather hard. Perhaps a true function call would be
better if it is done in a way that can be optimized by LTO (this is of
course assuming that by GCC 9, this is something we are likely to use
in the kernel)

An alternative could be to decide to rely on a GCC plugin instead
(although this would not be my preference). My poc implementation is a
bit clunky, but I did not spend a lot of time on it. If we can refine
it to replace the high/lo ref to __stack_chk_guard with something more
robust, then we remain in control of which register and/or symbol ref
we use and we don't paint ourselves into a corner.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* revisit arm64 per-task stack canaries
  2018-02-13 13:56   ` Ard Biesheuvel
@ 2018-02-13 18:04     ` Will Deacon
  2018-10-12 10:37       ` Ard Biesheuvel
  0 siblings, 1 reply; 13+ messages in thread
From: Will Deacon @ 2018-02-13 18:04 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Ard,

On Tue, Feb 13, 2018 at 01:56:49PM +0000, Ard Biesheuvel wrote:
> On 13 February 2018 at 12:52, Mark Rutland <mark.rutland@arm.com> wrote:
> > On Tue, Feb 13, 2018 at 12:36:02PM +0000, Ard Biesheuvel wrote:
> >> Instead, I have proposed a proof of concept [0] where the compiler
> >> emits an instruction sequence that loads the canary directly from the
> >> task struct, which is the per-thread data structure maintained by the
> >> kernel. Accessing that can be done safely without any of the
> >> limitations per-CPU variables have. The task struct pointer is kept in
> >> system register sp_el0 while running in the kernel.
> >
> > My major concern here is being tied to using sysregs in a particular
> > way. We might want to fiddle with that in future (e.g. using the
> > platform register as an optimization, or switching to a different sysreg
> > due to architectural extensions).
> >
> 
> Yes. Also, there is a disparity between userland (using tpidr_el0) and
> kernel (using sp_el0), and perhaps it would make more sense to switch
> to tpidr_el0 in the kernel as well. But the general objection remains.

Indeed, and I share Mark's view that we don't want to commit to a specific
sequence here. Ideally, we'd have a way to pass whatever thunk we need to
the compiler and have the freedom to implement it as we see fit (and to
change that implementation at a whim).

> >> The purpose of this email to reach agreement between the various
> >> stakeholders (mainly the arm64 linux maintainers and the ARM GCC
> >> maintainers) on a way to proceed with implementing this in GCC.
> >
> > Would it be possible to have an always inline function to get the
> > canary, which GCC would implicitly fold into functions as necessary?
> >
> > ... then we could have something in a header, like:
> >
> > static __always_inline unsigned long get_task_canary(void)
> > {
> >         return current->canary;
> > }
> >
> > ... which we could change in future as needs change.
> >
> 
> This is a question to the compiler folks, I suppose, but I'd venture a
> guess that this is rather hard. Perhaps a true function call would be
> better if it is done in a way that can be optimized by LTO (this is of
> course assuming that by GCC 9, this is something we are likely to use
> in the kernel)
> 
> An alternative could be to decide to rely on a GCC plugin instead
> (although this would not be my preference). My poc implementation is a
> bit clunky, but I did not spend a lot of time on it. If we can refine
> it to replace the high/lo ref to __stack_chk_guard with something more
> robust, then we remain in control of which register and/or symbol ref
> we use and we don't paint ourselves into a corner.

I'm wary of our ability to maintain a GCC plugin in the kernel source tree.
I would *much* prefer to have proper support in the compiler.

Will

^ permalink raw reply	[flat|nested] 13+ messages in thread

* revisit arm64 per-task stack canaries
  2018-02-13 18:04     ` Will Deacon
@ 2018-10-12 10:37       ` Ard Biesheuvel
  2018-10-15 21:47         ` Kees Cook
  2018-10-30 10:35         ` Will Deacon
  0 siblings, 2 replies; 13+ messages in thread
From: Ard Biesheuvel @ 2018-10-12 10:37 UTC (permalink / raw)
  To: linux-arm-kernel

(+ Arnd)

On 13 February 2018 at 19:04, Will Deacon <will.deacon@arm.com> wrote:
> Hi Ard,
>
> On Tue, Feb 13, 2018 at 01:56:49PM +0000, Ard Biesheuvel wrote:
>> On 13 February 2018 at 12:52, Mark Rutland <mark.rutland@arm.com> wrote:
>> > On Tue, Feb 13, 2018 at 12:36:02PM +0000, Ard Biesheuvel wrote:
>> >> Instead, I have proposed a proof of concept [0] where the compiler
>> >> emits an instruction sequence that loads the canary directly from the
>> >> task struct, which is the per-thread data structure maintained by the
>> >> kernel. Accessing that can be done safely without any of the
>> >> limitations per-CPU variables have. The task struct pointer is kept in
>> >> system register sp_el0 while running in the kernel.
>> >
>> > My major concern here is being tied to using sysregs in a particular
>> > way. We might want to fiddle with that in future (e.g. using the
>> > platform register as an optimization, or switching to a different sysreg
>> > due to architectural extensions).
>> >
>>
>> Yes. Also, there is a disparity between userland (using tpidr_el0) and
>> kernel (using sp_el0), and perhaps it would make more sense to switch
>> to tpidr_el0 in the kernel as well. But the general objection remains.
>
> Indeed, and I share Mark's view that we don't want to commit to a specific
> sequence here. Ideally, we'd have a way to pass whatever thunk we need to
> the compiler and have the freedom to implement it as we see fit (and to
> change that implementation at a whim).
>
>> >> The purpose of this email to reach agreement between the various
>> >> stakeholders (mainly the arm64 linux maintainers and the ARM GCC
>> >> maintainers) on a way to proceed with implementing this in GCC.
>> >
>> > Would it be possible to have an always inline function to get the
>> > canary, which GCC would implicitly fold into functions as necessary?
>> >
>> > ... then we could have something in a header, like:
>> >
>> > static __always_inline unsigned long get_task_canary(void)
>> > {
>> >         return current->canary;
>> > }
>> >
>> > ... which we could change in future as needs change.
>> >
>>
>> This is a question to the compiler folks, I suppose, but I'd venture a
>> guess that this is rather hard. Perhaps a true function call would be
>> better if it is done in a way that can be optimized by LTO (this is of
>> course assuming that by GCC 9, this is something we are likely to use
>> in the kernel)
>>
>> An alternative could be to decide to rely on a GCC plugin instead
>> (although this would not be my preference). My poc implementation is a
>> bit clunky, but I did not spend a lot of time on it. If we can refine
>> it to replace the high/lo ref to __stack_chk_guard with something more
>> robust, then we remain in control of which register and/or symbol ref
>> we use and we don't paint ourselves into a corner.
>
> I'm wary of our ability to maintain a GCC plugin in the kernel source tree.
> I would *much* prefer to have proper support in the compiler.
>

Hello all,

Ramana, Arnd, Mark and I spoke about this again at Linaro Connect in
Vancouver, and Ramana is still volunteering to work on the GCC side
changes if we can agree on a way to parameterize this.

To recap, my proof of concept was based on using the task struct
pointer directly (which we keep in sp_el0 in the kernel). That way, we
don't care about per-CPU vars and preemption etc, since the stack
canary value is tied to the task anyway, regardless on which CPU it
executes.

So my suggestion was to turn this (the current sequence emitted by the compiler)

adrp    x0, __stack__chk_guard
ldr     x0, [x0, :lo12:__stack_chk_guard]

into this

mrs     x0, <sysreg>
ldr     x0, [x0, :lo12:<offset>]

where 'sysreg' and 'offset' are the names of 1) a system register and
2) an absolute ELF symbol, respectively.

At Ramana's request (and with Arnd's help) I looked into whether we
can provide offsetof(struct task_struct, stack_canary) as a bare
number on the GCC command line instead, but this is turning out to be
rather cumbersome, given that we need to run the compiler to find the
offset in the first place. (The most promising approach would be to
generate a file and include it with @filename so its contents are
parsed as command line options but even that is rather fiddly to add
into our build system)

My approach of exposing the offset via the linker script works fine,
and we wouldn't even need a separate GCC command line option in that
case: we could just settle on __stack_chk_guard_sysreg_offset and
stipulate that it is exposed as an absolute ELF symbol in the range
[0..1023] when -mstack-protector-guard=<sysreg> is passed to GCC.

Ramana, could you elaborate on your preference for a bare number on
the command line? Do you see any alternatives?

Thanks,
Ard.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* revisit arm64 per-task stack canaries
  2018-10-12 10:37       ` Ard Biesheuvel
@ 2018-10-15 21:47         ` Kees Cook
  2018-10-30 10:35         ` Will Deacon
  1 sibling, 0 replies; 13+ messages in thread
From: Kees Cook @ 2018-10-15 21:47 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Oct 12, 2018 at 3:37 AM, Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> (+ Arnd)
>
> On 13 February 2018 at 19:04, Will Deacon <will.deacon@arm.com> wrote:
>> Hi Ard,
>>
>> On Tue, Feb 13, 2018 at 01:56:49PM +0000, Ard Biesheuvel wrote:
>>> On 13 February 2018 at 12:52, Mark Rutland <mark.rutland@arm.com> wrote:
>>> > On Tue, Feb 13, 2018 at 12:36:02PM +0000, Ard Biesheuvel wrote:
>>> >> Instead, I have proposed a proof of concept [0] where the compiler
>>> >> emits an instruction sequence that loads the canary directly from the
>>> >> task struct, which is the per-thread data structure maintained by the
>>> >> kernel. Accessing that can be done safely without any of the
>>> >> limitations per-CPU variables have. The task struct pointer is kept in
>>> >> system register sp_el0 while running in the kernel.
>>> >
>>> > My major concern here is being tied to using sysregs in a particular
>>> > way. We might want to fiddle with that in future (e.g. using the
>>> > platform register as an optimization, or switching to a different sysreg
>>> > due to architectural extensions).
>>> >
>>>
>>> Yes. Also, there is a disparity between userland (using tpidr_el0) and
>>> kernel (using sp_el0), and perhaps it would make more sense to switch
>>> to tpidr_el0 in the kernel as well. But the general objection remains.
>>
>> Indeed, and I share Mark's view that we don't want to commit to a specific
>> sequence here. Ideally, we'd have a way to pass whatever thunk we need to
>> the compiler and have the freedom to implement it as we see fit (and to
>> change that implementation at a whim).
>>
>>> >> The purpose of this email to reach agreement between the various
>>> >> stakeholders (mainly the arm64 linux maintainers and the ARM GCC
>>> >> maintainers) on a way to proceed with implementing this in GCC.
>>> >
>>> > Would it be possible to have an always inline function to get the
>>> > canary, which GCC would implicitly fold into functions as necessary?
>>> >
>>> > ... then we could have something in a header, like:
>>> >
>>> > static __always_inline unsigned long get_task_canary(void)
>>> > {
>>> >         return current->canary;
>>> > }
>>> >
>>> > ... which we could change in future as needs change.
>>> >
>>>
>>> This is a question to the compiler folks, I suppose, but I'd venture a
>>> guess that this is rather hard. Perhaps a true function call would be
>>> better if it is done in a way that can be optimized by LTO (this is of
>>> course assuming that by GCC 9, this is something we are likely to use
>>> in the kernel)
>>>
>>> An alternative could be to decide to rely on a GCC plugin instead
>>> (although this would not be my preference). My poc implementation is a
>>> bit clunky, but I did not spend a lot of time on it. If we can refine
>>> it to replace the high/lo ref to __stack_chk_guard with something more
>>> robust, then we remain in control of which register and/or symbol ref
>>> we use and we don't paint ourselves into a corner.
>>
>> I'm wary of our ability to maintain a GCC plugin in the kernel source tree.
>> I would *much* prefer to have proper support in the compiler.
>>
>
> Hello all,
>
> Ramana, Arnd, Mark and I spoke about this again at Linaro Connect in
> Vancouver, and Ramana is still volunteering to work on the GCC side
> changes if we can agree on a way to parameterize this.
>
> To recap, my proof of concept was based on using the task struct
> pointer directly (which we keep in sp_el0 in the kernel). That way, we
> don't care about per-CPU vars and preemption etc, since the stack
> canary value is tied to the task anyway, regardless on which CPU it
> executes.
>
> So my suggestion was to turn this (the current sequence emitted by the compiler)
>
> adrp    x0, __stack__chk_guard
> ldr     x0, [x0, :lo12:__stack_chk_guard]
>
> into this
>
> mrs     x0, <sysreg>
> ldr     x0, [x0, :lo12:<offset>]
>
> where 'sysreg' and 'offset' are the names of 1) a system register and
> 2) an absolute ELF symbol, respectively.
>
> At Ramana's request (and with Arnd's help) I looked into whether we
> can provide offsetof(struct task_struct, stack_canary) as a bare
> number on the GCC command line instead, but this is turning out to be
> rather cumbersome, given that we need to run the compiler to find the
> offset in the first place. (The most promising approach would be to
> generate a file and include it with @filename so its contents are
> parsed as command line options but even that is rather fiddly to add
> into our build system)
>
> My approach of exposing the offset via the linker script works fine,
> and we wouldn't even need a separate GCC command line option in that
> case: we could just settle on __stack_chk_guard_sysreg_offset and
> stipulate that it is exposed as an absolute ELF symbol in the range
> [0..1023] when -mstack-protector-guard=<sysreg> is passed to GCC.
>
> Ramana, could you elaborate on your preference for a bare number on
> the command line? Do you see any alternatives?

I would continue to _love_ to have this all working. :) Getting a
per-thread canary would make me so happy! Please let me know if I can
help in some way.

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 13+ messages in thread

* revisit arm64 per-task stack canaries
  2018-10-12 10:37       ` Ard Biesheuvel
  2018-10-15 21:47         ` Kees Cook
@ 2018-10-30 10:35         ` Will Deacon
  2018-10-30 12:45           ` Ard Biesheuvel
  1 sibling, 1 reply; 13+ messages in thread
From: Will Deacon @ 2018-10-30 10:35 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Oct 12, 2018 at 12:37:46PM +0200, Ard Biesheuvel wrote:
> To recap, my proof of concept was based on using the task struct
> pointer directly (which we keep in sp_el0 in the kernel). That way, we
> don't care about per-CPU vars and preemption etc, since the stack
> canary value is tied to the task anyway, regardless on which CPU it
> executes.
> 
> So my suggestion was to turn this (the current sequence emitted by the compiler)
> 
> adrp    x0, __stack__chk_guard
> ldr     x0, [x0, :lo12:__stack_chk_guard]
> 
> into this
> 
> mrs     x0, <sysreg>
> ldr     x0, [x0, :lo12:<offset>]
> 
> where 'sysreg' and 'offset' are the names of 1) a system register and
> 2) an absolute ELF symbol, respectively.
> 
> At Ramana's request (and with Arnd's help) I looked into whether we
> can provide offsetof(struct task_struct, stack_canary) as a bare
> number on the GCC command line instead, but this is turning out to be
> rather cumbersome, given that we need to run the compiler to find the
> offset in the first place. (The most promising approach would be to
> generate a file and include it with @filename so its contents are
> parsed as command line options but even that is rather fiddly to add
> into our build system)
> 
> My approach of exposing the offset via the linker script works fine,
> and we wouldn't even need a separate GCC command line option in that
> case: we could just settle on __stack_chk_guard_sysreg_offset and
> stipulate that it is exposed as an absolute ELF symbol in the range
> [0..1023] when -mstack-protector-guard=<sysreg> is passed to GCC.

That sounds like a sensible proposal to me. I'm assuming that the sysreg
is specified on the compiler command line as well?

> Ramana, could you elaborate on your preference for a bare number on
> the command line? Do you see any alternatives?

With CONFIG_THREAD_INFO_IN_TASK I suppose we could bodge things so that
this constant is also zero, but it seems to me that all we gain is
fragility when compared to exposing a named symbol.

Ramana -- was this just a cosmetic preference, or are there good technical
reasons to avoid the new symbol (which isn't an awful lot different to what
we currently use).

Will

^ permalink raw reply	[flat|nested] 13+ messages in thread

* revisit arm64 per-task stack canaries
  2018-10-30 10:35         ` Will Deacon
@ 2018-10-30 12:45           ` Ard Biesheuvel
  2018-11-19 16:50             ` Ramana Radhakrishnan
  0 siblings, 1 reply; 13+ messages in thread
From: Ard Biesheuvel @ 2018-10-30 12:45 UTC (permalink / raw)
  To: linux-arm-kernel

On 30 October 2018 at 07:35, Will Deacon <will.deacon@arm.com> wrote:
> On Fri, Oct 12, 2018 at 12:37:46PM +0200, Ard Biesheuvel wrote:
>> To recap, my proof of concept was based on using the task struct
>> pointer directly (which we keep in sp_el0 in the kernel). That way, we
>> don't care about per-CPU vars and preemption etc, since the stack
>> canary value is tied to the task anyway, regardless on which CPU it
>> executes.
>>
>> So my suggestion was to turn this (the current sequence emitted by the compiler)
>>
>> adrp    x0, __stack__chk_guard
>> ldr     x0, [x0, :lo12:__stack_chk_guard]
>>
>> into this
>>
>> mrs     x0, <sysreg>
>> ldr     x0, [x0, :lo12:<offset>]
>>
>> where 'sysreg' and 'offset' are the names of 1) a system register and
>> 2) an absolute ELF symbol, respectively.
>>
>> At Ramana's request (and with Arnd's help) I looked into whether we
>> can provide offsetof(struct task_struct, stack_canary) as a bare
>> number on the GCC command line instead, but this is turning out to be
>> rather cumbersome, given that we need to run the compiler to find the
>> offset in the first place. (The most promising approach would be to
>> generate a file and include it with @filename so its contents are
>> parsed as command line options but even that is rather fiddly to add
>> into our build system)
>>
>> My approach of exposing the offset via the linker script works fine,
>> and we wouldn't even need a separate GCC command line option in that
>> case: we could just settle on __stack_chk_guard_sysreg_offset and
>> stipulate that it is exposed as an absolute ELF symbol in the range
>> [0..1023] when -mstack-protector-guard=<sysreg> is passed to GCC.
>
> That sounds like a sensible proposal to me. I'm assuming that the sysreg
> is specified on the compiler command line as well?
>

Yes, hence the -mstack-protector-guard=<sysreg>  :-)

>> Ramana, could you elaborate on your preference for a bare number on
>> the command line? Do you see any alternatives?
>
> With CONFIG_THREAD_INFO_IN_TASK I suppose we could bodge things so that
> this constant is also zero, but it seems to me that all we gain is
> fragility when compared to exposing a named symbol.
>

Yeah, I looked into that as well: we could copy the stack_canary
member into our thread_info struct, but it is indeed a bodge.

> Ramana -- was this just a cosmetic preference, or are there good technical
> reasons to avoid the new symbol (which isn't an awful lot different to what
> we currently use).
>

I suppose absolute ELF symbols are somewhat unusual, but they are
clearly defined by the ELF spec and we already use them in other
places as well.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* revisit arm64 per-task stack canaries
  2018-10-30 12:45           ` Ard Biesheuvel
@ 2018-11-19 16:50             ` Ramana Radhakrishnan
  2018-11-19 17:04               ` Ard Biesheuvel
  0 siblings, 1 reply; 13+ messages in thread
From: Ramana Radhakrishnan @ 2018-11-19 16:50 UTC (permalink / raw)
  To: linux-arm-kernel




> Ramana -- was this just a cosmetic preference, or are there good 
> technical reasons to avoid the new symbol (which isn't an awful lot 
> different to what we currently use).	
>

I couldn't immediately  remember why I preferred the offset on the 
command line in the conversations those months ago about this. But then ...

After talking a bit with Ard at LPC and then thinking about it while 
hacking up a prototype implementation, I suspect the main reason would 
have been to avoid the standalone relocation in the first place and just 
deal with it as an immediate.

I don't work enough in the kernel but a simple git grep 
mstack-protector-guard-offset in the kernel tree I have checked out did 
find this.

arch/powerpc/Makefile:  $(eval KBUILD_CFLAGS += 
-mstack-protector-guard-offset=$(shell awk '{if ($$2 == "PACA_CANARY") 
print $$3;}' include/generated/asm-offsets.h))
arch/powerpc/Makefile:  $(eval KBUILD_CFLAGS += 
-mstack-protector-guard-offset=$(shell awk '{if ($$2 == "TASK_CANARY") 
print $$3;}' include/generated/asm-offsets.h)

Is there any reason why the same trick wouldn't work for aarch64 ?

I've been hacking up a prototype that seems to work, need to make it 
clean and ready to go upstream on the GCC lists.


regards
Ramana

^ permalink raw reply	[flat|nested] 13+ messages in thread

* revisit arm64 per-task stack canaries
  2018-11-19 16:50             ` Ramana Radhakrishnan
@ 2018-11-19 17:04               ` Ard Biesheuvel
  2018-11-19 19:47                 ` Ard Biesheuvel
  0 siblings, 1 reply; 13+ messages in thread
From: Ard Biesheuvel @ 2018-11-19 17:04 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 19 Nov 2018 at 08:50, Ramana Radhakrishnan
<Ramana.Radhakrishnan@arm.com> wrote:
>
>
>
>
> > Ramana -- was this just a cosmetic preference, or are there good
> > technical reasons to avoid the new symbol (which isn't an awful lot
> > different to what we currently use).
> >
>
> I couldn't immediately  remember why I preferred the offset on the
> command line in the conversations those months ago about this. But then ...
>
> After talking a bit with Ard at LPC and then thinking about it while
> hacking up a prototype implementation, I suspect the main reason would
> have been to avoid the standalone relocation in the first place and just
> deal with it as an immediate.
>
> I don't work enough in the kernel but a simple git grep
> mstack-protector-guard-offset in the kernel tree I have checked out did
> find this.
>
> arch/powerpc/Makefile:  $(eval KBUILD_CFLAGS +=
> -mstack-protector-guard-offset=$(shell awk '{if ($$2 == "PACA_CANARY")
> print $$3;}' include/generated/asm-offsets.h))
> arch/powerpc/Makefile:  $(eval KBUILD_CFLAGS +=
> -mstack-protector-guard-offset=$(shell awk '{if ($$2 == "TASK_CANARY")
> print $$3;}' include/generated/asm-offsets.h)
>
> Is there any reason why the same trick wouldn't work for aarch64 ?
>

What a wonderful hack :-) From a quick test, it looks like this should
work on AArch64 as well, yes. I think it makes sense not to deviate
from other architectures, both on the kernel and on the GCC side. And
as I pointed out the other day, the relocation presents some
complications of its own (due to the fact that we cannot easily export
absolute symbols to kernel modules now that we started using relative
references to refer to them internally in the core kernel)

> I've been hacking up a prototype that seems to work, need to make it
> clean and ready to go upstream on the GCC lists.
>

That's great - thanks. If you send me a source link, I can give it a
spin with my kernel changes.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* revisit arm64 per-task stack canaries
  2018-11-19 17:04               ` Ard Biesheuvel
@ 2018-11-19 19:47                 ` Ard Biesheuvel
  2018-12-03 10:16                   ` Ramana Radhakrishnan
  0 siblings, 1 reply; 13+ messages in thread
From: Ard Biesheuvel @ 2018-11-19 19:47 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 19 Nov 2018 at 09:04, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>
> On Mon, 19 Nov 2018 at 08:50, Ramana Radhakrishnan
> <Ramana.Radhakrishnan@arm.com> wrote:
> >
> >
> >
> >
> > > Ramana -- was this just a cosmetic preference, or are there good
> > > technical reasons to avoid the new symbol (which isn't an awful lot
> > > different to what we currently use).
> > >
> >
> > I couldn't immediately  remember why I preferred the offset on the
> > command line in the conversations those months ago about this. But then ...
> >
> > After talking a bit with Ard at LPC and then thinking about it while
> > hacking up a prototype implementation, I suspect the main reason would
> > have been to avoid the standalone relocation in the first place and just
> > deal with it as an immediate.
> >
> > I don't work enough in the kernel but a simple git grep
> > mstack-protector-guard-offset in the kernel tree I have checked out did
> > find this.
> >
> > arch/powerpc/Makefile:  $(eval KBUILD_CFLAGS +=
> > -mstack-protector-guard-offset=$(shell awk '{if ($$2 == "PACA_CANARY")
> > print $$3;}' include/generated/asm-offsets.h))
> > arch/powerpc/Makefile:  $(eval KBUILD_CFLAGS +=
> > -mstack-protector-guard-offset=$(shell awk '{if ($$2 == "TASK_CANARY")
> > print $$3;}' include/generated/asm-offsets.h)
> >
> > Is there any reason why the same trick wouldn't work for aarch64 ?
> >
>
> What a wonderful hack :-) From a quick test, it looks like this should
> work on AArch64 as well, yes.

Turns out this code is brand new (introduced in v4.20-rc1) which is
why we didn't spot it before.

> I think it makes sense not to deviate
> from other architectures, both on the kernel and on the GCC side. And
> as I pointed out the other day, the relocation presents some
> complications of its own (due to the fact that we cannot easily export
> absolute symbols to kernel modules now that we started using relative
> references to refer to them internally in the core kernel)
>
> > I've been hacking up a prototype that seems to work, need to make it
> > clean and ready to go upstream on the GCC lists.
> >
>
> That's great - thanks. If you send me a source link, I can give it a
> spin with my kernel changes.

I've pushed a branch here that uses the compiler based support

https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=rfc-arm64-per-task-ssp

(assuming the options will be called
-mstack-protector-guard-[reg|offset] like on power)

Just building the defconfig with a compiler that implements this
should be sufficient, like below:

export ARCH=arm64
export CROSS_COMPILE=[/path/to/]aarch64-linux-gnu-
make defconfig
make Image modules

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: revisit arm64 per-task stack canaries
  2018-11-19 19:47                 ` Ard Biesheuvel
@ 2018-12-03 10:16                   ` Ramana Radhakrishnan
  2018-12-03 16:46                     ` Ard Biesheuvel
  0 siblings, 1 reply; 13+ messages in thread
From: Ramana Radhakrishnan @ 2018-12-03 10:16 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Mark Rutland, linux-hardened, nd, Kees Cook, Arnd Bergmann,
	Will Deacon, uros, Laura Abbott, linux-arm-kernel


> Just building the defconfig with a compiler that implements this
> should be sufficient, like below:
> 
> export ARCH=arm64
> export CROSS_COMPILE=[/path/to/]aarch64-linux-gnu-
> make defconfig
> make Image modules
> 

You probably need this patch on top of my patchset to GCC this morning.


regards
Ramana


This is a tiny patch to add the -mstack-protector-guard=sysreg to
indicate the type of the stack protector guard to the compiler.

Signed-off-by:  Ramana Radhakrishnan  <ramana.radhakrishnan@arm.com>
---
  arch/arm64/Kconfig  | 2 +-
  arch/arm64/Makefile | 3 ++-
  2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index faab97226dbb..6151327cfa16 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1250,7 +1250,7 @@ config RANDOMIZE_MODULE_REGION_FULL
  config STACKPROTECTOR_PER_TASK
  	def_bool y
  	depends on STACKPROTECTOR
-	depends on $(cc-option,-mstack-protector-guard-reg=sp_el0)
+	depends on 
$(cc-option,-mstack-protector-guard=sysreg,-mstack-protector-guard-reg=sp_el0)

  endmenu

diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index e1494ba5d0e7..be60d64afae6 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -63,7 +63,8 @@ stack_protector_prepare: prepare0
  			awk '{if ($$2 == "TSK_STACK_CANARY") print $$3;}' \
  				include/generated/asm-offsets.h))

-KBUILD_CFLAGS	+= -mstack-protector-guard-reg=sp_el0
+KBUILD_CFLAGS	+= -mstack-protector-guard-reg=sp_el0 \
+	-mstack-protector-guard=sysreg
  endif

  ifeq ($(CONFIG_CPU_BIG_ENDIAN), y)
-- 
2.17.1
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: revisit arm64 per-task stack canaries
  2018-12-03 10:16                   ` Ramana Radhakrishnan
@ 2018-12-03 16:46                     ` Ard Biesheuvel
  0 siblings, 0 replies; 13+ messages in thread
From: Ard Biesheuvel @ 2018-12-03 16:46 UTC (permalink / raw)
  To: Ramana Radhakrishnan
  Cc: Mark Rutland, linux-hardened, nd, Kees Cook, Arnd Bergmann,
	Will Deacon, uros, Laura Abbott, linux-arm-kernel

On Mon, 3 Dec 2018 at 11:16, Ramana Radhakrishnan
<Ramana.Radhakrishnan@arm.com> wrote:
>
>
> > Just building the defconfig with a compiler that implements this
> > should be sufficient, like below:
> >
> > export ARCH=arm64
> > export CROSS_COMPILE=[/path/to/]aarch64-linux-gnu-
> > make defconfig
> > make Image modules
> >
>
> You probably need this patch on top of my patchset to GCC this morning.
>
>
> regards
> Ramana
>
>
> This is a tiny patch to add the -mstack-protector-guard=sysreg to
> indicate the type of the stack protector guard to the compiler.
>
> Signed-off-by:  Ramana Radhakrishnan  <ramana.radhakrishnan@arm.com>
> ---
>   arch/arm64/Kconfig  | 2 +-
>   arch/arm64/Makefile | 3 ++-
>   2 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index faab97226dbb..6151327cfa16 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1250,7 +1250,7 @@ config RANDOMIZE_MODULE_REGION_FULL
>   config STACKPROTECTOR_PER_TASK
>         def_bool y
>         depends on STACKPROTECTOR
> -       depends on $(cc-option,-mstack-protector-guard-reg=sp_el0)
> +       depends on
> $(cc-option,-mstack-protector-guard=sysreg,-mstack-protector-guard-reg=sp_el0)
>
>   endmenu
>
> diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
> index e1494ba5d0e7..be60d64afae6 100644
> --- a/arch/arm64/Makefile
> +++ b/arch/arm64/Makefile
> @@ -63,7 +63,8 @@ stack_protector_prepare: prepare0
>                         awk '{if ($$2 == "TSK_STACK_CANARY") print $$3;}' \
>                                 include/generated/asm-offsets.h))
>
> -KBUILD_CFLAGS  += -mstack-protector-guard-reg=sp_el0
> +KBUILD_CFLAGS  += -mstack-protector-guard-reg=sp_el0 \
> +       -mstack-protector-guard=sysreg
>   endif
>
>   ifeq ($(CONFIG_CPU_BIG_ENDIAN), y)
> --
> 2.17.1

This isn't quite right. I will send out an update, but note that it
depends on the logic as to which arguments may/must appear in which
combination.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2018-12-03 16:46 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-13 12:36 revisit arm64 per-task stack canaries Ard Biesheuvel
2018-02-13 12:52 ` Mark Rutland
2018-02-13 13:56   ` Ard Biesheuvel
2018-02-13 18:04     ` Will Deacon
2018-10-12 10:37       ` Ard Biesheuvel
2018-10-15 21:47         ` Kees Cook
2018-10-30 10:35         ` Will Deacon
2018-10-30 12:45           ` Ard Biesheuvel
2018-11-19 16:50             ` Ramana Radhakrishnan
2018-11-19 17:04               ` Ard Biesheuvel
2018-11-19 19:47                 ` Ard Biesheuvel
2018-12-03 10:16                   ` Ramana Radhakrishnan
2018-12-03 16:46                     ` Ard Biesheuvel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.