From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S934831AbcKVWh6 (ORCPT ); Tue, 22 Nov 2016 17:37:58 -0500 Received: from mail-io0-f170.google.com ([209.85.223.170]:36239 "EHLO mail-io0-f170.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754825AbcKVWh4 (ORCPT ); Tue, 22 Nov 2016 17:37:56 -0500 MIME-Version: 1.0 In-Reply-To: References: <147977413859.13657.2181994710415174471.stgit@djiang5-desk3.ch.intel.com> <20161122084754.GA25596@gmail.com> From: Kees Cook Date: Tue, 22 Nov 2016 14:37:51 -0800 X-Google-Sender-Auth: ZkR2LEnW-qg54HQ_q4wKb54MtNc Message-ID: Subject: Re: [PATCH] x86: fix kaslr and memmap collision To: Dan Williams Cc: Ingo Molnar , Dave Jiang , Thomas Gleixner , Ingo Molnar , "H. Peter Anvin" , X86 ML , david , "linux-kernel@vger.kernel.org" , "linux-nvdimm@lists.01.org" Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Nov 22, 2016 at 11:01 AM, Dan Williams wrote: > On Tue, Nov 22, 2016 at 10:54 AM, Kees Cook wrote: >> On Tue, Nov 22, 2016 at 9:26 AM, Dan Williams wrote: >>> [ replying for Dave since he's offline today and tomorrow ] >>> >>> On Tue, Nov 22, 2016 at 12:47 AM, Ingo Molnar wrote: >>>> >>>> * Dave Jiang wrote: >>>> >>>>> CONFIG_RANDOMIZE_BASE relocates the kernel to a random base address. >>>>> However it does not take into account the memmap= parameter passed in from >>>>> the kernel commandline. >>>> >>>> memmap= parameters are often used as a list. >>>> >>>>> [...] This results in the kernel sometimes being put in the middle of the user >>>>> memmap. [...] >>>> >>>> What does this mean? If memmap= is used to re-define the memory map then the >>>> kernel getting in the middle of a RAM area is what we want, isn't it? What we >>>> don't want is for the kernel to get into reserved areas, right? >>> >>> Right, this is about teaching kaslr to not land the kernel in newly >>> defined reserved regions that were not marked reserved in the initial >>> e820 map from platform firmware. >>> >>>>> [...] Check has been added in the kaslr in order to avoid the region marked by >>>>> memmap. >>>> >>>> What does this mean? >>> >>> Is this clearer? "Update the set of 'mem_avoid' entries to exclude >>> 'memmap=' defined reserved regions from the set of valid address range >>> to land the kernel image." >>> >>>> >>>>> Signed-off-by: Dave Jiang >>>>> --- >>>>> arch/x86/boot/boot.h | 2 ++ >>>>> arch/x86/boot/compressed/kaslr.c | 45 ++++++++++++++++++++++++++++++++++++++ >>>>> arch/x86/boot/string.c | 25 +++++++++++++++++++++ >>>>> 3 files changed, 72 insertions(+) >>>>> >>>>> diff --git a/arch/x86/boot/boot.h b/arch/x86/boot/boot.h >>>>> index e5612f3..0d5fe5b 100644 >>>>> --- a/arch/x86/boot/boot.h >>>>> +++ b/arch/x86/boot/boot.h >>>>> @@ -332,6 +332,8 @@ int strncmp(const char *cs, const char *ct, size_t count); >>>>> size_t strnlen(const char *s, size_t maxlen); >>>>> unsigned int atou(const char *s); >>>>> unsigned long long simple_strtoull(const char *cp, char **endp, unsigned int base); >>>>> +unsigned long simple_strtoul(const char *cp, char **endp, unsigned int base); >>>>> +long simple_strtol(const char *cp, char **endp, unsigned int base); >>>>> size_t strlen(const char *s); >>>>> >>>>> /* tty.c */ >>>>> diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c >>>>> index a66854d..6fb8f1ec 100644 >>>>> --- a/arch/x86/boot/compressed/kaslr.c >>>>> +++ b/arch/x86/boot/compressed/kaslr.c >>>>> @@ -11,6 +11,7 @@ >>>>> */ >>>>> #include "misc.h" >>>>> #include "error.h" >>>>> +#include "../boot.h" >>>>> >>>>> #include >>>>> #include >>>>> @@ -61,6 +62,7 @@ enum mem_avoid_index { >>>>> MEM_AVOID_INITRD, >>>>> MEM_AVOID_CMDLINE, >>>>> MEM_AVOID_BOOTPARAMS, >>>>> + MEM_AVOID_MEMMAP, >>>>> MEM_AVOID_MAX, >>>>> }; >>>>> >>>>> @@ -77,6 +79,37 @@ static bool mem_overlaps(struct mem_vector *one, struct mem_vector *two) >>>>> return true; >>>>> } >>>>> >>>>> +#include "../../../../lib/cmdline.c" >>>>> + >>>>> +static int >>>>> +parse_memmap(char *p, unsigned long long *start, unsigned long long *size) >>>>> +{ >>>>> + char *oldp; >>>>> + >>>>> + if (!p) >>>>> + return -EINVAL; >>>>> + >>>>> + /* we don't care about this option here */ >>>>> + if (!strncmp(p, "exactmap", 8)) >>>>> + return -EINVAL; >>>>> + >>>>> + oldp = p; >>>>> + *size = memparse(p, &p); >>>>> + if (p == oldp) >>>>> + return -EINVAL; >>>>> + >>>>> + switch (*p) { >>>>> + case '@': >>>>> + case '#': >>>>> + case '$': >>>>> + case '!': >>>>> + *start = memparse(p+1, &p); >>>>> + return 0; >>>>> + } >>>>> + >>>>> + return -EINVAL; >>>>> +} >>>>> + >>>>> /* >>>>> * In theory, KASLR can put the kernel anywhere in the range of [16M, 64T). >>>>> * The mem_avoid array is used to store the ranges that need to be avoided >>>>> @@ -158,6 +191,8 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size, >>>>> u64 initrd_start, initrd_size; >>>>> u64 cmd_line, cmd_line_size; >>>>> char *ptr; >>>>> + char arg[38]; >>>> >>>> Where does the magic '38' come from? >>>> >>>>> + unsigned long long memmap_start, memmap_size; >>>>> >>>>> /* >>>>> * Avoid the region that is unsafe to overlap during >>>>> @@ -195,6 +230,16 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size, >>>>> add_identity_map(mem_avoid[MEM_AVOID_BOOTPARAMS].start, >>>>> mem_avoid[MEM_AVOID_BOOTPARAMS].size); >>>>> >>>>> + /* see if we have any memmap areas */ >>>>> + if (cmdline_find_option("memmap", arg, sizeof(arg)) > 0) { >>>>> + int rc = parse_memmap(arg, &memmap_start, &memmap_size); >>>>> + >>>>> + if (!rc) { >>>>> + mem_avoid[MEM_AVOID_MEMMAP].start = memmap_start; >>>>> + mem_avoid[MEM_AVOID_MEMMAP].size = memmap_size; >>>>> + } >>>>> + } >>>>> + >>>> >>>> This only handles a single (first) memmap argument, is that sufficient? >>> >>> No, you're right, we need to handle multiple ranges. Since the >>> mem_avoid array is statically allocated perhaps we can handle up to 4 >>> memmap= entries, but past that point disable kaslr for that boot? >> >> Yeah, that seems fine to me. I assume it's rare to have 4? >> > > It should be rare to have *one* since ACPI 6.0 added support for > communicating persistent memory ranges. However there are legacy > nvdimm users that I know are doing at least 2, but I have hard time > imagining they would ever do more than 4. Cool. As long as it announces KASLR being disabled (as in some of the other conditions) that should be fine. -Kees -- Kees Cook Nexus Security