From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754620AbdGNPPl (ORCPT ); Fri, 14 Jul 2017 11:15:41 -0400
Received: from mail-it0-f46.google.com ([209.85.214.46]:36235
	"EHLO mail-it0-f46.google.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1754269AbdGNPPj (ORCPT );
	Fri, 14 Jul 2017 11:15:39 -0400
MIME-Version: 1.0
In-Reply-To: <8f805a19-19d1-3c97-c85b-510664d22dad@arm.com>
References: <1499898783-25732-7-git-send-email-mark.rutland@arm.com>
	<20170713104950.GB26194@leverpostej>
	<20170713161050.GG26194@leverpostej>
	<20170713175543.GA32528@leverpostej>
	<20170714103258.GA16128@leverpostej>
	<20170714140605.GB16687@leverpostej>
	<188731af-269c-4197-1c55-78e485e7af46@arm.com>
	<8f805a19-19d1-3c97-c85b-510664d22dad@arm.com>
From: Ard Biesheuvel
Date: Fri, 14 Jul 2017 16:15:37 +0100
Message-ID:
Subject: Re: [kernel-hardening] Re: [RFC PATCH 6/6] arm64: add VMAP_STACK and detect out-of-bounds SP
To: Robin Murphy
Cc: Mark Rutland , Kees Cook , Kernel Hardening , Catalin Marinas ,
	Will Deacon , "linux-kernel@vger.kernel.org" , James Morse ,
	Takahiro Akashi , Dave Martin ,
	"linux-arm-kernel@lists.infradead.org" , Laura Abbott
Content-Type: text/plain; charset="UTF-8"
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 14 July 2017 at 16:03, Robin Murphy wrote:
> On 14/07/17 15:39, Robin Murphy wrote:
>> On 14/07/17 15:06, Mark Rutland wrote:
>>> On Fri, Jul 14, 2017 at 01:27:14PM +0100, Ard Biesheuvel wrote:
>>>> On 14 July 2017 at 11:48, Ard Biesheuvel wrote:
>>>>> On 14 July 2017 at 11:32, Mark Rutland wrote:
>>>>>> On Thu, Jul 13, 2017 at 07:28:48PM +0100, Ard Biesheuvel wrote:
>>>
>>>>>>> OK, so here's a crazy idea: what if we
>>>>>>> a) carve out a dedicated range in the VMALLOC area for stacks
>>>>>>> b) for each stack, allocate a naturally aligned window of 2x the stack
>>>>>>> size, and map the stack inside it, leaving the remaining space
>>>>>>> unmapped
>>>
>>>>>> The logical ops (TST) and conditional branches (TB(N)Z, CB(N)Z) operate
>>>>>> on XZR rather than SP, so to do this we need to get the SP value into a
>>>>>> GPR.
>>>>>>
>>>>>> Previously, I assumed this meant we needed to corrupt a GPR (and hence
>>>>>> stash that GPR in a sysreg), so I started writing code to free sysregs.
>>>>>>
>>>>>> However, I now realise I was being thick, since we can stash the GPR
>>>>>> in the SP:
>>>>>>
>>>>>> 	sub	sp, sp, x0	// sp = orig_sp - x0
>>>>>> 	add	x0, sp, x0	// x0 = x0 - (orig_sp - x0) == orig_sp
>>>
>>> That comment is off, and should say x0 = x0 + (orig_sp - x0) == orig_sp
>>>
>>>>>> 	sub	x0, x0, #S_FRAME_SIZE
>>>>>> 	tb(nz)	x0, #THREAD_SHIFT, overflow
>>>>>> 	add	x0, x0, #S_FRAME_SIZE
>>>>>> 	sub	x0, sp, x0
>>>>
>>>> You need a neg x0, x0 here I think
>>>
>>> Oh, whoops. I'd mis-simplified things.
>>>
>>> We can avoid that by storing orig_sp + orig_x0 in sp:
>>>
>>> 	add	sp, sp, x0	// sp = orig_sp + orig_x0
>>> 	sub	x0, sp, x0	// x0 = orig_sp
>>> 	< check >
>>> 	sub	x0, sp, x0	// x0 = orig_x0
>>
>> Haven't you now forcibly cleared the top bit of x0 thanks to overflow?
>
> ...or maybe not. I still can't quite see it, but I suppose it must
> cancel out somewhere, since Mr. Helpful C Program[1] has apparently
> proven me mistaken :(
>
> I guess that means I approve!
>
> Robin.
>
> [1]:
> #include <assert.h>
> #include <stdint.h>
>
> int main(void) {
> 	for (int i = 0; i < 256; i++) {
> 		for (int j = 0; j < 256; j++) {
> 			uint8_t x = i;
> 			uint8_t y = j;
> 			y = y + x;
> 			x = y - x;
> 			x = y - x;
> 			y = y - x;
> 			assert(x == i && y == j);
> 		}
> 	}
> }
>

Yeah, I think the carry out in the first instruction can be ignored,
given that we don't care about the magnitude of the result, only about
the lower 64 bits.
The subtraction that inverts it will be off by exactly 2^64.
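
For anyone who, like Robin, can't quite see where the carry goes: the
following is a minimal sketch in C of the same sequence at full 64-bit
width, rather than the 8-bit exhaustive version quoted above. It only
models the arithmetic. struct regs, stash_check_restore(), roundtrip_ok()
and selftest() are names made up for illustration and do not come from
the patch; sp and x0 are ordinary uint64_t values standing in for the
registers, and the final "restore sp" step follows the fourth statement
of Robin's program rather than anything shown in the asm.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: sp and x0 are plain 64-bit values, not real registers. */
struct regs {
	uint64_t sp;
	uint64_t x0;
};

/*
 * Mirror the add/sub/sub sequence from the thread, plus the final
 * "sp = sp - x0" step taken from Robin's test program. Returns the SP
 * value that the <check> step would see in x0.
 */
static uint64_t stash_check_restore(struct regs *r)
{
	uint64_t seen_sp;

	r->sp = r->sp + r->x0;	/* add sp, sp, x0: may wrap modulo 2^64 */
	r->x0 = r->sp - r->x0;	/* sub x0, sp, x0: x0 = orig_sp */
	seen_sp = r->x0;	/* <check> would test x0 here */
	r->x0 = r->sp - r->x0;	/* sub x0, sp, x0: x0 = orig_x0 */
	r->sp = r->sp - r->x0;	/* sp = orig_sp again */
	return seen_sp;
}

/* Returns 1 if both values survive the round trip and the check saw orig_sp. */
static int roundtrip_ok(uint64_t sp, uint64_t x0)
{
	struct regs r = { .sp = sp, .x0 = x0 };
	uint64_t seen = stash_check_restore(&r);

	return seen == sp && r.sp == sp && r.x0 == x0;
}

/* A few cases, most of which carry out of bit 63 in the first add. */
static void selftest(void)
{
	assert(roundtrip_ok(0xffff000008003f00ULL, 0xffffffffffffffffULL));
	assert(roundtrip_ok(UINT64_MAX, UINT64_MAX));
	assert(roundtrip_ok(0x8000000000000000ULL, 0x8000000000000000ULL));
	assert(roundtrip_ok(0, 0));
}
```

Unsigned arithmetic in C is defined to wrap modulo 2^64 for uint64_t, so
the first addition drops the carry and each subtraction is computed
modulo 2^64 as well, which is why no bit of x0 is lost. Calling
selftest() from a main() is enough to exercise the carrying cases.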