* per-task stack canaries for arm64 @ 2018-01-17 18:24 Ard Biesheuvel 2018-01-17 19:10 ` Kees Cook 0 siblings, 1 reply; 10+ messages in thread From: Ard Biesheuvel @ 2018-01-17 18:24 UTC (permalink / raw) To: linux-arm-kernel Hi all, This is a followup to a discussion I had with Ramana in San Francisco 5 months ago. Apologies for the tardiness. The topic of the discussion was compiler support for per-task stack cookies in the arm64 kernel. From the compiler side, this would simply entail offsetting the address of __stack_chk_guard with value held in tpidr_el1, so we can make it a per-CPU variable. On the kernel side, we would need fairly straight-forward plumbing to detect the compiler support, and switching to a per-CPU variable when supported. Beyond that, we need to update the per-CPU value at context switch time, and perhaps some handling of the initial state when per-CPU variables are initialized. Ramana indicated at the time that he would be up for adding, e.g., -fstack-protector-linux-kernel as a command line option, and add the contents of tpidr_el1 to every reference of __stack_chk_guard when set. Would this be sufficient to implement this from the kernel side? Am I missing anything here? I am missing the cross-arch context entirely, so are there things we should take into account and/or learn from? Comments welcome. ^ permalink raw reply [flat|nested] 10+ messages in thread
* per-task stack canaries for arm64 2018-01-17 18:24 per-task stack canaries for arm64 Ard Biesheuvel @ 2018-01-17 19:10 ` Kees Cook 2018-01-17 20:32 ` Ard Biesheuvel 0 siblings, 1 reply; 10+ messages in thread From: Kees Cook @ 2018-01-17 19:10 UTC (permalink / raw) To: linux-arm-kernel On Wed, Jan 17, 2018 at 10:24 AM, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: > Hi all, > > This is a followup to a discussion I had with Ramana in San Francisco > 5 months ago. Apologies for the tardiness. Link to prior discussion, just for anyone following along: https://lkml.org/lkml/2017/6/27/227 > The topic of the discussion was compiler support for per-task stack > cookies in the arm64 kernel. From the compiler side, this would simply > entail offsetting the address of __stack_chk_guard with value held in > tpidr_el1, so we can make it a per-CPU variable. AIUI, some progress was made on this recently, and is somewhat discussed here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81490 and spawned this: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81708 which was done for x86 only, and provides both: -mstack-protector-guard-symbol=... -mstack-protector-guard-reg=... If this could be extended to arm64, I think we'd be in good shape (and it could be trivially detected at build time). > On the kernel side, > we would need fairly straight-forward plumbing to detect the compiler > support, and switching to a per-CPU variable when supported. Beyond > that, we need to update the per-CPU value at context switch time, and > perhaps some handling of the initial state when per-CPU variables are > initialized. Right. My mental list has been: We'll need to adjust how __stack_chk_guard is defined (and tweak boot_init_stack_canary()). x86 makes the canary updates in arch/x86/entry/entry_*.S as part of __switch_to's __switch_to_asm. Looks like arm64 would do it in arch/arm64/kernel/entry.S cpu_switch_to? x86 does percpu init in arch/x86/kernel/setup_percpu.c. I'm not sure where this needs to happen in arm64. > Ramana indicated at the time that he would be up for adding, e.g., > -fstack-protector-linux-kernel as a command line option, and add the > contents of tpidr_el1 to every reference of __stack_chk_guard when > set. I think we want to reuse the command-line names from the x86 options above, unless there's a good reason not to? > Would this be sufficient to implement this from the kernel side? Am I > missing anything here? I am missing the cross-arch context entirely, > so are there things we should take into account and/or learn from? > > Comments welcome. I think it's very close, yes. Thanks for poking at this! -Kees -- Kees Cook Pixel Security ^ permalink raw reply [flat|nested] 10+ messages in thread
* per-task stack canaries for arm64 2018-01-17 19:10 ` Kees Cook @ 2018-01-17 20:32 ` Ard Biesheuvel 2018-01-17 20:45 ` Kees Cook 0 siblings, 1 reply; 10+ messages in thread From: Ard Biesheuvel @ 2018-01-17 20:32 UTC (permalink / raw) To: linux-arm-kernel On 17 January 2018 at 19:10, Kees Cook <keescook@chromium.org> wrote: > On Wed, Jan 17, 2018 at 10:24 AM, Ard Biesheuvel > <ard.biesheuvel@linaro.org> wrote: >> Hi all, >> >> This is a followup to a discussion I had with Ramana in San Francisco >> 5 months ago. Apologies for the tardiness. > > Link to prior discussion, just for anyone following along: > https://lkml.org/lkml/2017/6/27/227 > Ah yes, I remember now. We implemented virtually mapped stacks without rearranging the system registers, and so we still use tpidr_el1 for per-cpu offsets. It does seem wise to allow for some flexibility in which register to use as the per-CPU offset, but it doesn't seem likely to me that we'd switch to a GPR to keep the CPU offset >> The topic of the discussion was compiler support for per-task stack >> cookies in the arm64 kernel. From the compiler side, this would simply >> entail offsetting the address of __stack_chk_guard with value held in >> tpidr_el1, so we can make it a per-CPU variable. > > AIUI, some progress was made on this recently, and is somewhat discussed here: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81490 > and spawned this: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81708 > which was done for x86 only, and provides both: > -mstack-protector-guard-symbol=... > -mstack-protector-guard-reg=... > > If this could be extended to arm64, I think we'd be in good shape (and > it could be trivially detected at build time). > I'm not entirely sure what the point is of specifying the name of the symbol on the command line. It is ultimately up to the GCC developers to decide how much point there is to maintaining parity with x86 here. >> On the kernel side, >> we would need fairly straight-forward plumbing to detect the compiler >> support, and switching to a per-CPU variable when supported. Beyond >> that, we need to update the per-CPU value at context switch time, and >> perhaps some handling of the initial state when per-CPU variables are >> initialized. > > Right. My mental list has been: > > We'll need to adjust how __stack_chk_guard is defined (and tweak > boot_init_stack_canary()). > > x86 makes the canary updates in arch/x86/entry/entry_*.S as part of > __switch_to's __switch_to_asm. Looks like arm64 would do it in > arch/arm64/kernel/entry.S cpu_switch_to? > Yes. SP and the value of the per-cpu var should be updated at the same time, so that is the only place that makes sense. > x86 does percpu init in arch/x86/kernel/setup_percpu.c. I'm not sure > where this needs to happen in arm64. > I'm not sure if there's more to it than ensuring that the value in the percpu template section matches the value for the boot CPU, which should be the case already unless we update it with a random value at early boot. >> Ramana indicated at the time that he would be up for adding, e.g., >> -fstack-protector-linux-kernel as a command line option, and add the >> contents of tpidr_el1 to every reference of __stack_chk_guard when >> set. > > I think we want to reuse the command-line names from the x86 options > above, unless there's a good reason not to? > I'm perfectly happy to settle for whatever the GCC developers manage to agree on, as long as it gives us the ability to use tpidr_el1 as the offset. >> Would this be sufficient to implement this from the kernel side? Am I >> missing anything here? I am missing the cross-arch context entirely, >> so are there things we should take into account and/or learn from? >> >> Comments welcome. > > I think it's very close, yes. Thanks for poking at this! > > -Kees > > -- > Kees Cook > Pixel Security ^ permalink raw reply [flat|nested] 10+ messages in thread
* per-task stack canaries for arm64 2018-01-17 20:32 ` Ard Biesheuvel @ 2018-01-17 20:45 ` Kees Cook 2018-01-18 10:26 ` Ramana Radhakrishnan 0 siblings, 1 reply; 10+ messages in thread From: Kees Cook @ 2018-01-17 20:45 UTC (permalink / raw) To: linux-arm-kernel On Wed, Jan 17, 2018 at 12:32 PM, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: > On 17 January 2018 at 19:10, Kees Cook <keescook@chromium.org> wrote: >> On Wed, Jan 17, 2018 at 10:24 AM, Ard Biesheuvel >> <ard.biesheuvel@linaro.org> wrote: > [...] >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81708 >> which was done for x86 only, and provides both: >> -mstack-protector-guard-symbol=... >> -mstack-protector-guard-reg=... >> >> If this could be extended to arm64, I think we'd be in good shape (and >> it could be trivially detected at build time). >> > > I'm not entirely sure what the point is of specifying the name of the > symbol on the command line. It is ultimately up to the GCC developers > to decide how much point there is to maintaining parity with x86 here. > > [...] >>> Ramana indicated at the time that he would be up for adding, e.g., >>> -fstack-protector-linux-kernel as a command line option, and add the >>> contents of tpidr_el1 to every reference of __stack_chk_guard when >>> set. >> >> I think we want to reuse the command-line names from the x86 options >> above, unless there's a good reason not to? > > I'm perfectly happy to settle for whatever the GCC developers manage > to agree on, as long as it gives us the ability to use tpidr_el1 as > the offset. Ramana, Uro?, what's the best next step? Should we open a GCC bug specifically for arm64 here? -Kees -- Kees Cook Pixel Security ^ permalink raw reply [flat|nested] 10+ messages in thread
* per-task stack canaries for arm64 2018-01-17 20:45 ` Kees Cook @ 2018-01-18 10:26 ` Ramana Radhakrishnan 2018-01-18 13:19 ` Ard Biesheuvel 0 siblings, 1 reply; 10+ messages in thread From: Ramana Radhakrishnan @ 2018-01-18 10:26 UTC (permalink / raw) To: linux-arm-kernel On 1/17/18 8:45 PM, Kees Cook wrote: > On Wed, Jan 17, 2018 at 12:32 PM, Ard Biesheuvel > <ard.biesheuvel@linaro.org> wrote: >> On 17 January 2018 at 19:10, Kees Cook <keescook@chromium.org> wrote: >>> On Wed, Jan 17, 2018 at 10:24 AM, Ard Biesheuvel >>> <ard.biesheuvel@linaro.org> wrote: >> [...] >>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81708 >>> which was done for x86 only, and provides both: >>> -mstack-protector-guard-symbol=... >>> -mstack-protector-guard-reg=... >>> >>> If this could be extended to arm64, I think we'd be in good shape (and >>> it could be trivially detected at build time). >>> >> >> I'm not entirely sure what the point is of specifying the name of the >> symbol on the command line. It is ultimately up to the GCC developers >> to decide how much point there is to maintaining parity with x86 here. >> >> [...] >>>> Ramana indicated at the time that he would be up for adding, e.g., >>>> -fstack-protector-linux-kernel as a command line option, and add the >>>> contents of tpidr_el1 to every reference of __stack_chk_guard when >>>> set. Wow, that was a hall-way conversation eons ago. It took me a while to page that in. >>> >>> I think we want to reuse the command-line names from the x86 options >>> above, unless there's a good reason not to? >> >> I'm perfectly happy to settle for whatever the GCC developers manage >> to agree on, as long as it gives us the ability to use tpidr_el1 as >> the offset. > > Ramana, Uro?, what's the best next step? Should we open a GCC bug > specifically for arm64 here? The next best step is someone opening a GCC feature request with some more details - CC'ing me on ramana at gcc.gnu.org should work. What I would like to see is a feature request on GCC bugzilla along with some comments / buyin from the AArch64 kernel maintainers whether they would like to see such a feature and what the behaviour should be and get some feedback from the AArch64 GCC maintainers upstream before starting the work. I think the following (unoptimized) or some derivative of this should work. adrp x19, __stack_chk_guard add x19, x19, :lo12:__stack_chk_guard mrs x2, tpidr_el1 add x2, x2, x19 ldr x2, [x2] It's also not likely that this will be done in time for GCC 8 as we are now in stage 4 and we're probably looking at GCC 9 for this assuming this goes ahead. Hope this helps. regards Ramana > > -Kees > ^ permalink raw reply [flat|nested] 10+ messages in thread
* per-task stack canaries for arm64 2018-01-18 10:26 ` Ramana Radhakrishnan @ 2018-01-18 13:19 ` Ard Biesheuvel 2018-01-22 10:59 ` Ard Biesheuvel 0 siblings, 1 reply; 10+ messages in thread From: Ard Biesheuvel @ 2018-01-18 13:19 UTC (permalink / raw) To: linux-arm-kernel On 18 January 2018 at 10:26, Ramana Radhakrishnan <ramana.radhakrishnan@arm.com> wrote: > On 1/17/18 8:45 PM, Kees Cook wrote: >> On Wed, Jan 17, 2018 at 12:32 PM, Ard Biesheuvel >> <ard.biesheuvel@linaro.org> wrote: >>> On 17 January 2018 at 19:10, Kees Cook <keescook@chromium.org> wrote: >>>> On Wed, Jan 17, 2018 at 10:24 AM, Ard Biesheuvel >>>> <ard.biesheuvel@linaro.org> wrote: >>> [...] >>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81708 >>>> which was done for x86 only, and provides both: >>>> -mstack-protector-guard-symbol=... >>>> -mstack-protector-guard-reg=... >>>> >>>> If this could be extended to arm64, I think we'd be in good shape (and >>>> it could be trivially detected at build time). >>>> >>> >>> I'm not entirely sure what the point is of specifying the name of the >>> symbol on the command line. It is ultimately up to the GCC developers >>> to decide how much point there is to maintaining parity with x86 here. >>> >>> [...] >>>>> Ramana indicated at the time that he would be up for adding, e.g., >>>>> -fstack-protector-linux-kernel as a command line option, and add the >>>>> contents of tpidr_el1 to every reference of __stack_chk_guard when >>>>> set. > > Wow, that was a hall-way conversation eons ago. It took me a while to > page that in. > > >>>> >>>> I think we want to reuse the command-line names from the x86 options >>>> above, unless there's a good reason not to? >>> >>> I'm perfectly happy to settle for whatever the GCC developers manage >>> to agree on, as long as it gives us the ability to use tpidr_el1 as >>> the offset. >> >> Ramana, Uro?, what's the best next step? Should we open a GCC bug >> specifically for arm64 here? > > The next best step is someone opening a GCC feature request with some > more details - CC'ing me on ramana at gcc.gnu.org should work. What I would > like to see is a feature request on GCC bugzilla along with some > comments / buyin from the AArch64 kernel maintainers whether they would > like to see such a feature and what the behaviour should be and get some > feedback from the AArch64 GCC maintainers upstream before starting the > work. > > I think the following (unoptimized) or some derivative of this should work. > > adrp x19, __stack_chk_guard > add x19, x19, :lo12:__stack_chk_guard > mrs x2, tpidr_el1 > add x2, x2, x19 > ldr x2, [x2] > > > It's also not likely that this will be done in time for GCC 8 as we are > now in stage 4 and we're probably looking at GCC 9 for this assuming > this goes ahead. > > Hope this helps. > Thanks Ramana. I guess there is one snag here: if we get scheduled to another CPU between the mrs and the ldr, we will load the wrong value. I guess this is what Mark alluded to in his reference to atomics in the quoted email thread. Also, we select between tpidr_el2 and tpidr_el1 at runtime depending on whether VHE is available. So I guess it isn't as simple as I thought, unfortunately ... ^ permalink raw reply [flat|nested] 10+ messages in thread
* per-task stack canaries for arm64 2018-01-18 13:19 ` Ard Biesheuvel @ 2018-01-22 10:59 ` Ard Biesheuvel 2018-01-22 12:26 ` Kees Cook 0 siblings, 1 reply; 10+ messages in thread From: Ard Biesheuvel @ 2018-01-22 10:59 UTC (permalink / raw) To: linux-arm-kernel On 18 January 2018 at 13:19, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: > On 18 January 2018 at 10:26, Ramana Radhakrishnan > <ramana.radhakrishnan@arm.com> wrote: >> On 1/17/18 8:45 PM, Kees Cook wrote: >>> On Wed, Jan 17, 2018 at 12:32 PM, Ard Biesheuvel >>> <ard.biesheuvel@linaro.org> wrote: >>>> On 17 January 2018 at 19:10, Kees Cook <keescook@chromium.org> wrote: >>>>> On Wed, Jan 17, 2018 at 10:24 AM, Ard Biesheuvel >>>>> <ard.biesheuvel@linaro.org> wrote: >>>> [...] >>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81708 >>>>> which was done for x86 only, and provides both: >>>>> -mstack-protector-guard-symbol=... >>>>> -mstack-protector-guard-reg=... >>>>> >>>>> If this could be extended to arm64, I think we'd be in good shape (and >>>>> it could be trivially detected at build time). >>>>> >>>> >>>> I'm not entirely sure what the point is of specifying the name of the >>>> symbol on the command line. It is ultimately up to the GCC developers >>>> to decide how much point there is to maintaining parity with x86 here. >>>> >>>> [...] >>>>>> Ramana indicated at the time that he would be up for adding, e.g., >>>>>> -fstack-protector-linux-kernel as a command line option, and add the >>>>>> contents of tpidr_el1 to every reference of __stack_chk_guard when >>>>>> set. >> >> Wow, that was a hall-way conversation eons ago. It took me a while to >> page that in. >> >> >>>>> >>>>> I think we want to reuse the command-line names from the x86 options >>>>> above, unless there's a good reason not to? >>>> >>>> I'm perfectly happy to settle for whatever the GCC developers manage >>>> to agree on, as long as it gives us the ability to use tpidr_el1 as >>>> the offset. >>> >>> Ramana, Uro?, what's the best next step? Should we open a GCC bug >>> specifically for arm64 here? >> >> The next best step is someone opening a GCC feature request with some >> more details - CC'ing me on ramana at gcc.gnu.org should work. What I would >> like to see is a feature request on GCC bugzilla along with some >> comments / buyin from the AArch64 kernel maintainers whether they would >> like to see such a feature and what the behaviour should be and get some >> feedback from the AArch64 GCC maintainers upstream before starting the >> work. >> >> I think the following (unoptimized) or some derivative of this should work. >> >> adrp x19, __stack_chk_guard >> add x19, x19, :lo12:__stack_chk_guard >> mrs x2, tpidr_el1 >> add x2, x2, x19 >> ldr x2, [x2] >> >> >> It's also not likely that this will be done in time for GCC 8 as we are >> now in stage 4 and we're probably looking at GCC 9 for this assuming >> this goes ahead. >> >> Hope this helps. >> > > Thanks Ramana. > > I guess there is one snag here: if we get scheduled to another CPU > between the mrs and the ldr, we will load the wrong value. I guess > this is what Mark alluded to in his reference to atomics in the quoted > email thread. > > Also, we select between tpidr_el2 and tpidr_el1 at runtime depending > on whether VHE is available. > > So I guess it isn't as simple as I thought, unfortunately ... OK, so I have done a bit of homework, and I think this shouldn't be too difficult after all. I realized that we don't really need per-CPU references to __stack_chk_guard, given that we already keep the task struct pointer in sp_el0. This means we should be able to do mrs x0, sp_el0 movz x1, :abs_g0:__stack_chk_guard_tsk_offset add x0, x0, x1 ldr x2, [x0] where __stack_chk_guard_tsk_offset is exposed via an ELF symbol. (Analogous to x86, this could be implemented at the GCC level using arbitrary sysreg and symbol names to remain flexible) So my next question (mainly to Ramana): would it be possible to emit the mrs/movz/add sequence above from targetm.stack_protect_guard() in a way that will allow expand_normal() to expand it in a useful manner (e.g., as an INDIRECT_REF that can be matched by the memory_operand in Aarch64's stack_protect_set)? If that is the case, we could implement this in a GCC plugin in the kernel rather than relying on upstream GCC changes (and having to commit to this ABI long term) ^ permalink raw reply [flat|nested] 10+ messages in thread
* per-task stack canaries for arm64 2018-01-22 10:59 ` Ard Biesheuvel @ 2018-01-22 12:26 ` Kees Cook 2018-01-22 12:42 ` Ard Biesheuvel 0 siblings, 1 reply; 10+ messages in thread From: Kees Cook @ 2018-01-22 12:26 UTC (permalink / raw) To: linux-arm-kernel On Mon, Jan 22, 2018 at 9:59 PM, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: > OK, so I have done a bit of homework, and I think this shouldn't be > too difficult after all. > > I realized that we don't really need per-CPU references to > __stack_chk_guard, given that we already keep the task struct pointer > in sp_el0. This means we should be able to do > > mrs x0, sp_el0 > movz x1, :abs_g0:__stack_chk_guard_tsk_offset > add x0, x0, x1 > ldr x2, [x0] > > where __stack_chk_guard_tsk_offset is exposed via an ELF symbol. > (Analogous to x86, this could be implemented at the GCC level using > arbitrary sysreg and symbol names to remain flexible) > > So my next question (mainly to Ramana): would it be possible to emit > the mrs/movz/add sequence above from targetm.stack_protect_guard() in > a way that will allow expand_normal() to expand it in a useful manner > (e.g., as an INDIRECT_REF that can be matched by the memory_operand in > Aarch64's stack_protect_set)? If that is the case, we could implement > this in a GCC plugin in the kernel rather than relying on upstream GCC > changes (and having to commit to this ABI long term) Though this would ultimately be implemented in core GCC, yes? (I don't want to have GCC plugins be a dependency for stack protector...) -Kees -- Kees Cook Pixel Security ^ permalink raw reply [flat|nested] 10+ messages in thread
* per-task stack canaries for arm64 2018-01-22 12:26 ` Kees Cook @ 2018-01-22 12:42 ` Ard Biesheuvel 2018-01-22 16:28 ` Ard Biesheuvel 0 siblings, 1 reply; 10+ messages in thread From: Ard Biesheuvel @ 2018-01-22 12:42 UTC (permalink / raw) To: linux-arm-kernel On 22 January 2018 at 12:26, Kees Cook <keescook@chromium.org> wrote: > On Mon, Jan 22, 2018 at 9:59 PM, Ard Biesheuvel > <ard.biesheuvel@linaro.org> wrote: >> OK, so I have done a bit of homework, and I think this shouldn't be >> too difficult after all. >> >> I realized that we don't really need per-CPU references to >> __stack_chk_guard, given that we already keep the task struct pointer >> in sp_el0. This means we should be able to do >> >> mrs x0, sp_el0 >> movz x1, :abs_g0:__stack_chk_guard_tsk_offset >> add x0, x0, x1 >> ldr x2, [x0] >> >> where __stack_chk_guard_tsk_offset is exposed via an ELF symbol. >> (Analogous to x86, this could be implemented at the GCC level using >> arbitrary sysreg and symbol names to remain flexible) >> >> So my next question (mainly to Ramana): would it be possible to emit >> the mrs/movz/add sequence above from targetm.stack_protect_guard() in >> a way that will allow expand_normal() to expand it in a useful manner >> (e.g., as an INDIRECT_REF that can be matched by the memory_operand in >> Aarch64's stack_protect_set)? If that is the case, we could implement >> this in a GCC plugin in the kernel rather than relying on upstream GCC >> changes (and having to commit to this ABI long term) > > Though this would ultimately be implemented in core GCC, yes? (I don't > want to have GCC plugins be a dependency for stack protector...) > Oh absolutely. But being able to expose this more widely early on, while GCC 9 is still under development, would help prevent cockups where we end up with a flawed ABI and it's too late to do something about it. ^ permalink raw reply [flat|nested] 10+ messages in thread
* per-task stack canaries for arm64 2018-01-22 12:42 ` Ard Biesheuvel @ 2018-01-22 16:28 ` Ard Biesheuvel 0 siblings, 0 replies; 10+ messages in thread From: Ard Biesheuvel @ 2018-01-22 16:28 UTC (permalink / raw) To: linux-arm-kernel On 22 January 2018 at 12:42, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: > On 22 January 2018 at 12:26, Kees Cook <keescook@chromium.org> wrote: >> On Mon, Jan 22, 2018 at 9:59 PM, Ard Biesheuvel >> <ard.biesheuvel@linaro.org> wrote: >>> OK, so I have done a bit of homework, and I think this shouldn't be >>> too difficult after all. >>> >>> I realized that we don't really need per-CPU references to >>> __stack_chk_guard, given that we already keep the task struct pointer >>> in sp_el0. This means we should be able to do >>> >>> mrs x0, sp_el0 >>> movz x1, :abs_g0:__stack_chk_guard_tsk_offset >>> add x0, x0, x1 >>> ldr x2, [x0] >>> >>> where __stack_chk_guard_tsk_offset is exposed via an ELF symbol. >>> (Analogous to x86, this could be implemented at the GCC level using >>> arbitrary sysreg and symbol names to remain flexible) >>> >>> So my next question (mainly to Ramana): would it be possible to emit >>> the mrs/movz/add sequence above from targetm.stack_protect_guard() in >>> a way that will allow expand_normal() to expand it in a useful manner >>> (e.g., as an INDIRECT_REF that can be matched by the memory_operand in >>> Aarch64's stack_protect_set)? If that is the case, we could implement >>> this in a GCC plugin in the kernel rather than relying on upstream GCC >>> changes (and having to commit to this ABI long term) >> >> Though this would ultimately be implemented in core GCC, yes? (I don't >> want to have GCC plugins be a dependency for stack protector...) >> > > Oh absolutely. But being able to expose this more widely early on, > while GCC 9 is still under development, would help prevent cockups > where we end up with a flawed ABI and it's too late to do something > about it. OK, so it appears the only feasible way to implement this is via a backend function, which is what i386 and rs6000 do as well. Taking that code as an example, I managed to piece together a GCC patch that uses what is basically the above sequence, and interestingly, I could just compile the mainline kernel with virtually no changes (see below), and it just works, using different values for the stack canary for each task. This would mean we would not need any build time compiler feature test to enable the functionality, which is an advantage. So all we need is agree on how to parameterize this so that we don't paint ourselves into a corner. Using a system register name and offset symbol name like x86 does sounds flexible enough, although an absolute offset value such as __stack_chk_guard_tsk_offset is rather different from an ordinary symbol reference, given that we cannot use an adrp/add pair to load it. Also, the range of the offset decides which sequence to emit: I am using add with a :lo12: relocation in my patch, but I think we probably want to avoid limiting ourselves to 4 KB here. Comments? Kernel patch ---------------------8<-------------------------- diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c index 1303e04110cd..4b639a3c15bb 100644 --- a/arch/arm64/kernel/asm-offsets.c +++ b/arch/arm64/kernel/asm-offsets.c @@ -164,5 +164,6 @@ int main(void) DEFINE(SDEI_EVENT_INTREGS, offsetof(struct sdei_registered_event, interrupted_regs)); DEFINE(SDEI_EVENT_PRIORITY, offsetof(struct sdei_registered_event, priority)); #endif + DEFINE(TSK_STACK_CANARY, offsetof(struct task_struct, stack_canary)); return 0; } diff --git a/arch/arm64/kernel/image.h b/arch/arm64/kernel/image.h index c7fcb232fe47..a4dd7350111a 100644 --- a/arch/arm64/kernel/image.h +++ b/arch/arm64/kernel/image.h @@ -18,6 +18,8 @@ #ifndef __ASM_IMAGE_H #define __ASM_IMAGE_H +#include <asm/asm-offsets.h> + #ifndef LINKER_SCRIPT #error This file should only be included in vmlinux.lds.S #endif @@ -118,4 +120,6 @@ __efistub_screen_info = KALLSYMS_HIDE(screen_info); #endif +PROVIDE(__stack_chk_guard_tsk_offset = ABSOLUTE(TSK_STACK_CANARY)); + #endif /* __ASM_IMAGE_H */ ---------------------8<-------------------------- GCC patch ---------------------8<-------------------------- diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 174310c9f1bc..dc4928f1f14c 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -17208,6 +17208,12 @@ aarch64_select_early_remat_modes (sbitmap modes) } } +static tree +aarch64_init_stack_protect_guard (void) +{ + return NULL_TREE; +} + /* Target-specific selftests. */ #if CHECKING_P @@ -17682,6 +17688,9 @@ aarch64_libgcc_floating_mode_supported_p #define TARGET_RUN_TARGET_SELFTESTS selftest::aarch64_run_selftests #endif /* #if CHECKING_P */ +#undef TARGET_STACK_PROTECT_GUARD +#define TARGET_STACK_PROTECT_GUARD aarch64_init_stack_protect_guard + struct gcc_target targetm = TARGET_INITIALIZER; #include "gt-aarch64.h" diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index a6ecb3913094..cd591b65d8e6 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -168,6 +168,7 @@ UNSPEC_INSR UNSPEC_CLASTB UNSPEC_FADDA + UNSPEC_TSK_STACK_CANARY ]) (define_c_enum "unspecv" [ @@ -5834,6 +5835,15 @@ DONE; }) +(define_insn "aarch64_load_current_stack_canary" + [(set (match_operand:DI 0 "register_operand" "=r") + (unspec:DI [(const_int 0)] UNSPEC_TSK_STACK_CANARY))] + "" + "mrs\\t%0, sp_el0\;add\\t%0, %0, :lo12:__stack_chk_guard_tsk_offset" + [(set_attr "type" "multiple") + (set_attr "length" "8")] +) + ;; Named patterns for stack smashing protection. (define_expand "stack_protect_set" [(match_operand 0 "memory_operand") @@ -5842,6 +5852,11 @@ { machine_mode mode = GET_MODE (operands[0]); + rtx reg = gen_reg_rtx (Pmode); + + operands[1] = gen_rtx_MEM (Pmode, reg); + emit_insn (gen_aarch64_load_current_stack_canary (reg)); + emit_insn ((mode == DImode ? gen_stack_protect_set_di : gen_stack_protect_set_si) (operands[0], operands[1])); @@ -5867,6 +5882,11 @@ rtx result; machine_mode mode = GET_MODE (operands[0]); + rtx reg = gen_reg_rtx (Pmode); + + operands[1] = gen_rtx_MEM (Pmode, reg); + emit_insn (gen_aarch64_load_current_stack_canary (reg)); + result = gen_reg_rtx(mode); emit_insn ((mode == DImode ^ permalink raw reply related [flat|nested] 10+ messages in thread
end of thread, other threads:[~2018-01-22 16:28 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-01-17 18:24 per-task stack canaries for arm64 Ard Biesheuvel 2018-01-17 19:10 ` Kees Cook 2018-01-17 20:32 ` Ard Biesheuvel 2018-01-17 20:45 ` Kees Cook 2018-01-18 10:26 ` Ramana Radhakrishnan 2018-01-18 13:19 ` Ard Biesheuvel 2018-01-22 10:59 ` Ard Biesheuvel 2018-01-22 12:26 ` Kees Cook 2018-01-22 12:42 ` Ard Biesheuvel 2018-01-22 16:28 ` Ard Biesheuvel
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.