All of lore.kernel.org
 help / color / mirror / Atom feed
* per-task stack canaries for arm64
@ 2018-01-17 18:24 Ard Biesheuvel
  2018-01-17 19:10 ` Kees Cook
  0 siblings, 1 reply; 10+ messages in thread
From: Ard Biesheuvel @ 2018-01-17 18:24 UTC (permalink / raw)
  To: linux-arm-kernel

Hi all,

This is a followup to a discussion I had with Ramana in San Francisco
5 months ago. Apologies for the tardiness.

The topic of the discussion was compiler support for per-task stack
cookies in the arm64 kernel. From the compiler side, this would simply
entail offsetting the address of __stack_chk_guard with value held in
tpidr_el1, so we can make it a per-CPU variable. On the kernel side,
we would need fairly straight-forward plumbing to detect the compiler
support, and switching to a per-CPU variable when supported. Beyond
that, we need to update the per-CPU value at context switch time, and
perhaps some handling of the initial state when per-CPU variables are
initialized.

Ramana indicated at the time that he would be up for adding, e.g.,
-fstack-protector-linux-kernel as a command line option, and add the
contents of tpidr_el1 to every reference of __stack_chk_guard when
set.

Would this be sufficient to implement this from the kernel side? Am I
missing anything here? I am missing the cross-arch context entirely,
so are there things we should take into account and/or learn from?

Comments welcome.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* per-task stack canaries for arm64
  2018-01-17 18:24 per-task stack canaries for arm64 Ard Biesheuvel
@ 2018-01-17 19:10 ` Kees Cook
  2018-01-17 20:32   ` Ard Biesheuvel
  0 siblings, 1 reply; 10+ messages in thread
From: Kees Cook @ 2018-01-17 19:10 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jan 17, 2018 at 10:24 AM, Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> Hi all,
>
> This is a followup to a discussion I had with Ramana in San Francisco
> 5 months ago. Apologies for the tardiness.

Link to prior discussion, just for anyone following along:
https://lkml.org/lkml/2017/6/27/227

> The topic of the discussion was compiler support for per-task stack
> cookies in the arm64 kernel. From the compiler side, this would simply
> entail offsetting the address of __stack_chk_guard with value held in
> tpidr_el1, so we can make it a per-CPU variable.

AIUI, some progress was made on this recently, and is somewhat discussed here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81490
and spawned this:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81708
which was done for x86 only, and provides both:
  -mstack-protector-guard-symbol=...
  -mstack-protector-guard-reg=...

If this could be extended to arm64, I think we'd be in good shape (and
it could be trivially detected at build time).

> On the kernel side,
> we would need fairly straight-forward plumbing to detect the compiler
> support, and switching to a per-CPU variable when supported. Beyond
> that, we need to update the per-CPU value at context switch time, and
> perhaps some handling of the initial state when per-CPU variables are
> initialized.

Right. My mental list has been:

We'll need to adjust how __stack_chk_guard is defined (and tweak
boot_init_stack_canary()).

x86 makes the canary updates in arch/x86/entry/entry_*.S as part of
__switch_to's __switch_to_asm. Looks like arm64 would do it in
arch/arm64/kernel/entry.S cpu_switch_to?

x86 does percpu init in arch/x86/kernel/setup_percpu.c. I'm not sure
where this needs to happen in arm64.

> Ramana indicated at the time that he would be up for adding, e.g.,
> -fstack-protector-linux-kernel as a command line option, and add the
> contents of tpidr_el1 to every reference of __stack_chk_guard when
> set.

I think we want to reuse the command-line names from the x86 options
above, unless there's a good reason not to?

> Would this be sufficient to implement this from the kernel side? Am I
> missing anything here? I am missing the cross-arch context entirely,
> so are there things we should take into account and/or learn from?
>
> Comments welcome.

I think it's very close, yes. Thanks for poking at this!

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 10+ messages in thread

* per-task stack canaries for arm64
  2018-01-17 19:10 ` Kees Cook
@ 2018-01-17 20:32   ` Ard Biesheuvel
  2018-01-17 20:45     ` Kees Cook
  0 siblings, 1 reply; 10+ messages in thread
From: Ard Biesheuvel @ 2018-01-17 20:32 UTC (permalink / raw)
  To: linux-arm-kernel

On 17 January 2018 at 19:10, Kees Cook <keescook@chromium.org> wrote:
> On Wed, Jan 17, 2018 at 10:24 AM, Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
>> Hi all,
>>
>> This is a followup to a discussion I had with Ramana in San Francisco
>> 5 months ago. Apologies for the tardiness.
>
> Link to prior discussion, just for anyone following along:
> https://lkml.org/lkml/2017/6/27/227
>

Ah yes, I remember now.

We implemented virtually mapped stacks without rearranging the system
registers, and so we still use tpidr_el1 for per-cpu offsets. It does
seem wise to allow for some flexibility in which register to use as
the per-CPU offset, but it doesn't seem likely to me that we'd switch
to a GPR to keep the CPU offset


>> The topic of the discussion was compiler support for per-task stack
>> cookies in the arm64 kernel. From the compiler side, this would simply
>> entail offsetting the address of __stack_chk_guard with value held in
>> tpidr_el1, so we can make it a per-CPU variable.
>
> AIUI, some progress was made on this recently, and is somewhat discussed here:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81490
> and spawned this:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81708
> which was done for x86 only, and provides both:
>   -mstack-protector-guard-symbol=...
>   -mstack-protector-guard-reg=...
>
> If this could be extended to arm64, I think we'd be in good shape (and
> it could be trivially detected at build time).
>

I'm not entirely sure what the point is of specifying the name of the
symbol on the command line. It is ultimately up to the GCC developers
to decide how much point there is to maintaining parity with x86 here.

>> On the kernel side,
>> we would need fairly straight-forward plumbing to detect the compiler
>> support, and switching to a per-CPU variable when supported. Beyond
>> that, we need to update the per-CPU value at context switch time, and
>> perhaps some handling of the initial state when per-CPU variables are
>> initialized.
>
> Right. My mental list has been:
>
> We'll need to adjust how __stack_chk_guard is defined (and tweak
> boot_init_stack_canary()).
>
> x86 makes the canary updates in arch/x86/entry/entry_*.S as part of
> __switch_to's __switch_to_asm. Looks like arm64 would do it in
> arch/arm64/kernel/entry.S cpu_switch_to?
>

Yes. SP and the value of the per-cpu var should be updated at the same
time, so that is the only place that makes sense.

> x86 does percpu init in arch/x86/kernel/setup_percpu.c. I'm not sure
> where this needs to happen in arm64.
>

I'm not sure if there's more to it than ensuring that the value in the
percpu template section matches the value for the boot CPU, which
should be the case already unless we update it with a random value at
early boot.

>> Ramana indicated at the time that he would be up for adding, e.g.,
>> -fstack-protector-linux-kernel as a command line option, and add the
>> contents of tpidr_el1 to every reference of __stack_chk_guard when
>> set.
>
> I think we want to reuse the command-line names from the x86 options
> above, unless there's a good reason not to?
>

I'm perfectly happy to settle for whatever the GCC developers manage
to agree on, as long as it gives us the ability to use tpidr_el1 as
the offset.

>> Would this be sufficient to implement this from the kernel side? Am I
>> missing anything here? I am missing the cross-arch context entirely,
>> so are there things we should take into account and/or learn from?
>>
>> Comments welcome.
>
> I think it's very close, yes. Thanks for poking at this!
>
> -Kees
>
> --
> Kees Cook
> Pixel Security

^ permalink raw reply	[flat|nested] 10+ messages in thread

* per-task stack canaries for arm64
  2018-01-17 20:32   ` Ard Biesheuvel
@ 2018-01-17 20:45     ` Kees Cook
  2018-01-18 10:26       ` Ramana Radhakrishnan
  0 siblings, 1 reply; 10+ messages in thread
From: Kees Cook @ 2018-01-17 20:45 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jan 17, 2018 at 12:32 PM, Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> On 17 January 2018 at 19:10, Kees Cook <keescook@chromium.org> wrote:
>> On Wed, Jan 17, 2018 at 10:24 AM, Ard Biesheuvel
>> <ard.biesheuvel@linaro.org> wrote:
> [...]
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81708
>> which was done for x86 only, and provides both:
>>   -mstack-protector-guard-symbol=...
>>   -mstack-protector-guard-reg=...
>>
>> If this could be extended to arm64, I think we'd be in good shape (and
>> it could be trivially detected at build time).
>>
>
> I'm not entirely sure what the point is of specifying the name of the
> symbol on the command line. It is ultimately up to the GCC developers
> to decide how much point there is to maintaining parity with x86 here.
>
> [...]
>>> Ramana indicated at the time that he would be up for adding, e.g.,
>>> -fstack-protector-linux-kernel as a command line option, and add the
>>> contents of tpidr_el1 to every reference of __stack_chk_guard when
>>> set.
>>
>> I think we want to reuse the command-line names from the x86 options
>> above, unless there's a good reason not to?
>
> I'm perfectly happy to settle for whatever the GCC developers manage
> to agree on, as long as it gives us the ability to use tpidr_el1 as
> the offset.

Ramana, Uro?, what's the best next step? Should we open a GCC bug
specifically for arm64 here?

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 10+ messages in thread

* per-task stack canaries for arm64
  2018-01-17 20:45     ` Kees Cook
@ 2018-01-18 10:26       ` Ramana Radhakrishnan
  2018-01-18 13:19         ` Ard Biesheuvel
  0 siblings, 1 reply; 10+ messages in thread
From: Ramana Radhakrishnan @ 2018-01-18 10:26 UTC (permalink / raw)
  To: linux-arm-kernel

On 1/17/18 8:45 PM, Kees Cook wrote:
> On Wed, Jan 17, 2018 at 12:32 PM, Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
>> On 17 January 2018 at 19:10, Kees Cook <keescook@chromium.org> wrote:
>>> On Wed, Jan 17, 2018 at 10:24 AM, Ard Biesheuvel
>>> <ard.biesheuvel@linaro.org> wrote:
>> [...]
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81708
>>> which was done for x86 only, and provides both:
>>>   -mstack-protector-guard-symbol=...
>>>   -mstack-protector-guard-reg=...
>>>
>>> If this could be extended to arm64, I think we'd be in good shape (and
>>> it could be trivially detected at build time).
>>>
>>
>> I'm not entirely sure what the point is of specifying the name of the
>> symbol on the command line. It is ultimately up to the GCC developers
>> to decide how much point there is to maintaining parity with x86 here.
>>
>> [...]
>>>> Ramana indicated at the time that he would be up for adding, e.g.,
>>>> -fstack-protector-linux-kernel as a command line option, and add the
>>>> contents of tpidr_el1 to every reference of __stack_chk_guard when
>>>> set.

Wow, that was a hall-way conversation eons ago. It took me a while to
page that in.


>>>
>>> I think we want to reuse the command-line names from the x86 options
>>> above, unless there's a good reason not to?
>>
>> I'm perfectly happy to settle for whatever the GCC developers manage
>> to agree on, as long as it gives us the ability to use tpidr_el1 as
>> the offset.
> 
> Ramana, Uro?, what's the best next step? Should we open a GCC bug
> specifically for arm64 here?

The next best step is someone opening a GCC feature request with some
more details - CC'ing me on ramana at gcc.gnu.org should work. What I would
like to see is a feature request on GCC bugzilla along with some
comments / buyin from the AArch64 kernel maintainers whether they would
like to see such a feature and what the behaviour should be and get some
feedback from the AArch64 GCC maintainers upstream before starting the
work.

I think the following (unoptimized) or some derivative of this should work.

        adrp    x19, __stack_chk_guard
        add     x19, x19, :lo12:__stack_chk_guard
	mrs 	x2, tpidr_el1
	add	x2, x2, x19
	ldr 	x2, [x2]


It's also not likely that this will be done in time for GCC 8 as we are
now in stage 4 and we're probably looking at GCC 9 for this assuming
this goes ahead.

Hope this helps.

regards
Ramana

> 
> -Kees
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* per-task stack canaries for arm64
  2018-01-18 10:26       ` Ramana Radhakrishnan
@ 2018-01-18 13:19         ` Ard Biesheuvel
  2018-01-22 10:59           ` Ard Biesheuvel
  0 siblings, 1 reply; 10+ messages in thread
From: Ard Biesheuvel @ 2018-01-18 13:19 UTC (permalink / raw)
  To: linux-arm-kernel

On 18 January 2018 at 10:26, Ramana Radhakrishnan
<ramana.radhakrishnan@arm.com> wrote:
> On 1/17/18 8:45 PM, Kees Cook wrote:
>> On Wed, Jan 17, 2018 at 12:32 PM, Ard Biesheuvel
>> <ard.biesheuvel@linaro.org> wrote:
>>> On 17 January 2018 at 19:10, Kees Cook <keescook@chromium.org> wrote:
>>>> On Wed, Jan 17, 2018 at 10:24 AM, Ard Biesheuvel
>>>> <ard.biesheuvel@linaro.org> wrote:
>>> [...]
>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81708
>>>> which was done for x86 only, and provides both:
>>>>   -mstack-protector-guard-symbol=...
>>>>   -mstack-protector-guard-reg=...
>>>>
>>>> If this could be extended to arm64, I think we'd be in good shape (and
>>>> it could be trivially detected at build time).
>>>>
>>>
>>> I'm not entirely sure what the point is of specifying the name of the
>>> symbol on the command line. It is ultimately up to the GCC developers
>>> to decide how much point there is to maintaining parity with x86 here.
>>>
>>> [...]
>>>>> Ramana indicated at the time that he would be up for adding, e.g.,
>>>>> -fstack-protector-linux-kernel as a command line option, and add the
>>>>> contents of tpidr_el1 to every reference of __stack_chk_guard when
>>>>> set.
>
> Wow, that was a hall-way conversation eons ago. It took me a while to
> page that in.
>
>
>>>>
>>>> I think we want to reuse the command-line names from the x86 options
>>>> above, unless there's a good reason not to?
>>>
>>> I'm perfectly happy to settle for whatever the GCC developers manage
>>> to agree on, as long as it gives us the ability to use tpidr_el1 as
>>> the offset.
>>
>> Ramana, Uro?, what's the best next step? Should we open a GCC bug
>> specifically for arm64 here?
>
> The next best step is someone opening a GCC feature request with some
> more details - CC'ing me on ramana at gcc.gnu.org should work. What I would
> like to see is a feature request on GCC bugzilla along with some
> comments / buyin from the AArch64 kernel maintainers whether they would
> like to see such a feature and what the behaviour should be and get some
> feedback from the AArch64 GCC maintainers upstream before starting the
> work.
>
> I think the following (unoptimized) or some derivative of this should work.
>
>         adrp    x19, __stack_chk_guard
>         add     x19, x19, :lo12:__stack_chk_guard
>         mrs     x2, tpidr_el1
>         add     x2, x2, x19
>         ldr     x2, [x2]
>
>
> It's also not likely that this will be done in time for GCC 8 as we are
> now in stage 4 and we're probably looking at GCC 9 for this assuming
> this goes ahead.
>
> Hope this helps.
>

Thanks Ramana.

I guess there is one snag here: if we get scheduled to another CPU
between the mrs and the ldr, we will load the wrong value. I guess
this is what Mark alluded to in his reference to atomics in the quoted
email thread.

Also, we select between tpidr_el2 and tpidr_el1 at runtime depending
on whether VHE is available.

So I guess it isn't as simple as I thought, unfortunately ...

^ permalink raw reply	[flat|nested] 10+ messages in thread

* per-task stack canaries for arm64
  2018-01-18 13:19         ` Ard Biesheuvel
@ 2018-01-22 10:59           ` Ard Biesheuvel
  2018-01-22 12:26             ` Kees Cook
  0 siblings, 1 reply; 10+ messages in thread
From: Ard Biesheuvel @ 2018-01-22 10:59 UTC (permalink / raw)
  To: linux-arm-kernel

On 18 January 2018 at 13:19, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 18 January 2018 at 10:26, Ramana Radhakrishnan
> <ramana.radhakrishnan@arm.com> wrote:
>> On 1/17/18 8:45 PM, Kees Cook wrote:
>>> On Wed, Jan 17, 2018 at 12:32 PM, Ard Biesheuvel
>>> <ard.biesheuvel@linaro.org> wrote:
>>>> On 17 January 2018 at 19:10, Kees Cook <keescook@chromium.org> wrote:
>>>>> On Wed, Jan 17, 2018 at 10:24 AM, Ard Biesheuvel
>>>>> <ard.biesheuvel@linaro.org> wrote:
>>>> [...]
>>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81708
>>>>> which was done for x86 only, and provides both:
>>>>>   -mstack-protector-guard-symbol=...
>>>>>   -mstack-protector-guard-reg=...
>>>>>
>>>>> If this could be extended to arm64, I think we'd be in good shape (and
>>>>> it could be trivially detected at build time).
>>>>>
>>>>
>>>> I'm not entirely sure what the point is of specifying the name of the
>>>> symbol on the command line. It is ultimately up to the GCC developers
>>>> to decide how much point there is to maintaining parity with x86 here.
>>>>
>>>> [...]
>>>>>> Ramana indicated at the time that he would be up for adding, e.g.,
>>>>>> -fstack-protector-linux-kernel as a command line option, and add the
>>>>>> contents of tpidr_el1 to every reference of __stack_chk_guard when
>>>>>> set.
>>
>> Wow, that was a hall-way conversation eons ago. It took me a while to
>> page that in.
>>
>>
>>>>>
>>>>> I think we want to reuse the command-line names from the x86 options
>>>>> above, unless there's a good reason not to?
>>>>
>>>> I'm perfectly happy to settle for whatever the GCC developers manage
>>>> to agree on, as long as it gives us the ability to use tpidr_el1 as
>>>> the offset.
>>>
>>> Ramana, Uro?, what's the best next step? Should we open a GCC bug
>>> specifically for arm64 here?
>>
>> The next best step is someone opening a GCC feature request with some
>> more details - CC'ing me on ramana at gcc.gnu.org should work. What I would
>> like to see is a feature request on GCC bugzilla along with some
>> comments / buyin from the AArch64 kernel maintainers whether they would
>> like to see such a feature and what the behaviour should be and get some
>> feedback from the AArch64 GCC maintainers upstream before starting the
>> work.
>>
>> I think the following (unoptimized) or some derivative of this should work.
>>
>>         adrp    x19, __stack_chk_guard
>>         add     x19, x19, :lo12:__stack_chk_guard
>>         mrs     x2, tpidr_el1
>>         add     x2, x2, x19
>>         ldr     x2, [x2]
>>
>>
>> It's also not likely that this will be done in time for GCC 8 as we are
>> now in stage 4 and we're probably looking at GCC 9 for this assuming
>> this goes ahead.
>>
>> Hope this helps.
>>
>
> Thanks Ramana.
>
> I guess there is one snag here: if we get scheduled to another CPU
> between the mrs and the ldr, we will load the wrong value. I guess
> this is what Mark alluded to in his reference to atomics in the quoted
> email thread.
>
> Also, we select between tpidr_el2 and tpidr_el1 at runtime depending
> on whether VHE is available.
>
> So I guess it isn't as simple as I thought, unfortunately ...

OK, so I have done a bit of homework, and I think this shouldn't be
too difficult after all.

I realized that we don't really need per-CPU references to
__stack_chk_guard, given that we already keep the task struct pointer
in sp_el0. This means we should be able to do

mrs   x0, sp_el0
movz  x1, :abs_g0:__stack_chk_guard_tsk_offset
add   x0, x0, x1
ldr   x2, [x0]

where __stack_chk_guard_tsk_offset is exposed via an ELF symbol.
(Analogous to x86, this could be implemented at the GCC level using
arbitrary sysreg and symbol names to remain flexible)

So my next question (mainly to Ramana): would it be possible to emit
the mrs/movz/add sequence above from targetm.stack_protect_guard() in
a way that will allow expand_normal() to expand it in a useful manner
(e.g., as an INDIRECT_REF that can be matched by the memory_operand in
Aarch64's stack_protect_set)? If that is the case, we could implement
this in a GCC plugin in the kernel rather than relying on upstream GCC
changes (and having to commit to this ABI long term)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* per-task stack canaries for arm64
  2018-01-22 10:59           ` Ard Biesheuvel
@ 2018-01-22 12:26             ` Kees Cook
  2018-01-22 12:42               ` Ard Biesheuvel
  0 siblings, 1 reply; 10+ messages in thread
From: Kees Cook @ 2018-01-22 12:26 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jan 22, 2018 at 9:59 PM, Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> OK, so I have done a bit of homework, and I think this shouldn't be
> too difficult after all.
>
> I realized that we don't really need per-CPU references to
> __stack_chk_guard, given that we already keep the task struct pointer
> in sp_el0. This means we should be able to do
>
> mrs   x0, sp_el0
> movz  x1, :abs_g0:__stack_chk_guard_tsk_offset
> add   x0, x0, x1
> ldr   x2, [x0]
>
> where __stack_chk_guard_tsk_offset is exposed via an ELF symbol.
> (Analogous to x86, this could be implemented at the GCC level using
> arbitrary sysreg and symbol names to remain flexible)
>
> So my next question (mainly to Ramana): would it be possible to emit
> the mrs/movz/add sequence above from targetm.stack_protect_guard() in
> a way that will allow expand_normal() to expand it in a useful manner
> (e.g., as an INDIRECT_REF that can be matched by the memory_operand in
> Aarch64's stack_protect_set)? If that is the case, we could implement
> this in a GCC plugin in the kernel rather than relying on upstream GCC
> changes (and having to commit to this ABI long term)

Though this would ultimately be implemented in core GCC, yes? (I don't
want to have GCC plugins be a dependency for stack protector...)

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 10+ messages in thread

* per-task stack canaries for arm64
  2018-01-22 12:26             ` Kees Cook
@ 2018-01-22 12:42               ` Ard Biesheuvel
  2018-01-22 16:28                 ` Ard Biesheuvel
  0 siblings, 1 reply; 10+ messages in thread
From: Ard Biesheuvel @ 2018-01-22 12:42 UTC (permalink / raw)
  To: linux-arm-kernel

On 22 January 2018 at 12:26, Kees Cook <keescook@chromium.org> wrote:
> On Mon, Jan 22, 2018 at 9:59 PM, Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
>> OK, so I have done a bit of homework, and I think this shouldn't be
>> too difficult after all.
>>
>> I realized that we don't really need per-CPU references to
>> __stack_chk_guard, given that we already keep the task struct pointer
>> in sp_el0. This means we should be able to do
>>
>> mrs   x0, sp_el0
>> movz  x1, :abs_g0:__stack_chk_guard_tsk_offset
>> add   x0, x0, x1
>> ldr   x2, [x0]
>>
>> where __stack_chk_guard_tsk_offset is exposed via an ELF symbol.
>> (Analogous to x86, this could be implemented at the GCC level using
>> arbitrary sysreg and symbol names to remain flexible)
>>
>> So my next question (mainly to Ramana): would it be possible to emit
>> the mrs/movz/add sequence above from targetm.stack_protect_guard() in
>> a way that will allow expand_normal() to expand it in a useful manner
>> (e.g., as an INDIRECT_REF that can be matched by the memory_operand in
>> Aarch64's stack_protect_set)? If that is the case, we could implement
>> this in a GCC plugin in the kernel rather than relying on upstream GCC
>> changes (and having to commit to this ABI long term)
>
> Though this would ultimately be implemented in core GCC, yes? (I don't
> want to have GCC plugins be a dependency for stack protector...)
>

Oh absolutely. But being able to expose this more widely early on,
while GCC 9 is still under development, would help prevent cockups
where we end up with a flawed ABI and it's too late to do something
about it.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* per-task stack canaries for arm64
  2018-01-22 12:42               ` Ard Biesheuvel
@ 2018-01-22 16:28                 ` Ard Biesheuvel
  0 siblings, 0 replies; 10+ messages in thread
From: Ard Biesheuvel @ 2018-01-22 16:28 UTC (permalink / raw)
  To: linux-arm-kernel

On 22 January 2018 at 12:42, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 22 January 2018 at 12:26, Kees Cook <keescook@chromium.org> wrote:
>> On Mon, Jan 22, 2018 at 9:59 PM, Ard Biesheuvel
>> <ard.biesheuvel@linaro.org> wrote:
>>> OK, so I have done a bit of homework, and I think this shouldn't be
>>> too difficult after all.
>>>
>>> I realized that we don't really need per-CPU references to
>>> __stack_chk_guard, given that we already keep the task struct pointer
>>> in sp_el0. This means we should be able to do
>>>
>>> mrs   x0, sp_el0
>>> movz  x1, :abs_g0:__stack_chk_guard_tsk_offset
>>> add   x0, x0, x1
>>> ldr   x2, [x0]
>>>
>>> where __stack_chk_guard_tsk_offset is exposed via an ELF symbol.
>>> (Analogous to x86, this could be implemented at the GCC level using
>>> arbitrary sysreg and symbol names to remain flexible)
>>>
>>> So my next question (mainly to Ramana): would it be possible to emit
>>> the mrs/movz/add sequence above from targetm.stack_protect_guard() in
>>> a way that will allow expand_normal() to expand it in a useful manner
>>> (e.g., as an INDIRECT_REF that can be matched by the memory_operand in
>>> Aarch64's stack_protect_set)? If that is the case, we could implement
>>> this in a GCC plugin in the kernel rather than relying on upstream GCC
>>> changes (and having to commit to this ABI long term)
>>
>> Though this would ultimately be implemented in core GCC, yes? (I don't
>> want to have GCC plugins be a dependency for stack protector...)
>>
>
> Oh absolutely. But being able to expose this more widely early on,
> while GCC 9 is still under development, would help prevent cockups
> where we end up with a flawed ABI and it's too late to do something
> about it.

OK, so it appears the only feasible way to implement this is via a
backend function, which is what i386 and rs6000 do as well.

Taking that code as an example, I managed to piece together a GCC
patch that uses what is basically the above sequence, and
interestingly, I could just compile the mainline kernel with virtually
no changes (see below), and it just works, using different values for
the stack canary for each task. This would mean we would not need any
build time compiler feature test to enable the functionality, which is
an advantage.

So all we need is agree on how to parameterize this so that we don't
paint ourselves into a corner. Using a system register name and offset
symbol name like x86 does sounds flexible enough, although an absolute
offset value such as __stack_chk_guard_tsk_offset is rather different
from an ordinary symbol reference, given that we cannot use an
adrp/add pair to load it. Also, the range of the offset decides which
sequence to emit: I am using add with a :lo12: relocation in my patch,
but I think we probably want to avoid limiting ourselves to 4 KB here.

Comments?


Kernel patch
---------------------8<--------------------------
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 1303e04110cd..4b639a3c15bb 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -164,5 +164,6 @@ int main(void)
   DEFINE(SDEI_EVENT_INTREGS,   offsetof(struct sdei_registered_event,
interrupted_regs));
   DEFINE(SDEI_EVENT_PRIORITY,  offsetof(struct sdei_registered_event,
priority));
 #endif
+  DEFINE(TSK_STACK_CANARY,     offsetof(struct task_struct, stack_canary));
   return 0;
 }
diff --git a/arch/arm64/kernel/image.h b/arch/arm64/kernel/image.h
index c7fcb232fe47..a4dd7350111a 100644
--- a/arch/arm64/kernel/image.h
+++ b/arch/arm64/kernel/image.h
@@ -18,6 +18,8 @@
 #ifndef __ASM_IMAGE_H
 #define __ASM_IMAGE_H

+#include <asm/asm-offsets.h>
+
 #ifndef LINKER_SCRIPT
 #error This file should only be included in vmlinux.lds.S
 #endif
@@ -118,4 +120,6 @@ __efistub_screen_info               =
KALLSYMS_HIDE(screen_info);

 #endif

+PROVIDE(__stack_chk_guard_tsk_offset = ABSOLUTE(TSK_STACK_CANARY));
+
 #endif /* __ASM_IMAGE_H */
---------------------8<--------------------------


GCC patch
---------------------8<--------------------------
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 174310c9f1bc..dc4928f1f14c 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -17208,6 +17208,12 @@ aarch64_select_early_remat_modes (sbitmap modes)
     }
 }

+static tree
+aarch64_init_stack_protect_guard (void)
+{
+  return NULL_TREE;
+}
+
 /* Target-specific selftests.  */

 #if CHECKING_P
@@ -17682,6 +17688,9 @@ aarch64_libgcc_floating_mode_supported_p
 #define TARGET_RUN_TARGET_SELFTESTS selftest::aarch64_run_selftests
 #endif /* #if CHECKING_P */

+#undef TARGET_STACK_PROTECT_GUARD
+#define TARGET_STACK_PROTECT_GUARD aarch64_init_stack_protect_guard
+
 struct gcc_target targetm = TARGET_INITIALIZER;

 #include "gt-aarch64.h"
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index a6ecb3913094..cd591b65d8e6 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -168,6 +168,7 @@
     UNSPEC_INSR
     UNSPEC_CLASTB
     UNSPEC_FADDA
+    UNSPEC_TSK_STACK_CANARY
 ])

 (define_c_enum "unspecv" [
@@ -5834,6 +5835,15 @@
   DONE;
 })

+(define_insn "aarch64_load_current_stack_canary"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+       (unspec:DI [(const_int 0)] UNSPEC_TSK_STACK_CANARY))]
+  ""
+  "mrs\\t%0, sp_el0\;add\\t%0, %0, :lo12:__stack_chk_guard_tsk_offset"
+  [(set_attr "type" "multiple")
+   (set_attr "length" "8")]
+)
+
 ;; Named patterns for stack smashing protection.
 (define_expand "stack_protect_set"
   [(match_operand 0 "memory_operand")
@@ -5842,6 +5852,11 @@
 {
   machine_mode mode = GET_MODE (operands[0]);

+  rtx reg = gen_reg_rtx (Pmode);
+
+  operands[1] = gen_rtx_MEM (Pmode, reg);
+  emit_insn (gen_aarch64_load_current_stack_canary (reg));
+
   emit_insn ((mode == DImode
              ? gen_stack_protect_set_di
              : gen_stack_protect_set_si) (operands[0], operands[1]));
@@ -5867,6 +5882,11 @@
   rtx result;
   machine_mode mode = GET_MODE (operands[0]);

+  rtx reg = gen_reg_rtx (Pmode);
+
+  operands[1] = gen_rtx_MEM (Pmode, reg);
+  emit_insn (gen_aarch64_load_current_stack_canary (reg));
+
   result = gen_reg_rtx(mode);

   emit_insn ((mode == DImode

^ permalink raw reply related	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2018-01-22 16:28 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-17 18:24 per-task stack canaries for arm64 Ard Biesheuvel
2018-01-17 19:10 ` Kees Cook
2018-01-17 20:32   ` Ard Biesheuvel
2018-01-17 20:45     ` Kees Cook
2018-01-18 10:26       ` Ramana Radhakrishnan
2018-01-18 13:19         ` Ard Biesheuvel
2018-01-22 10:59           ` Ard Biesheuvel
2018-01-22 12:26             ` Kees Cook
2018-01-22 12:42               ` Ard Biesheuvel
2018-01-22 16:28                 ` Ard Biesheuvel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.