From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-0.8 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_PASS, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id EF7A0C43141 for ; Thu, 21 Jun 2018 17:27:06 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 911D721884 for ; Thu, 21 Jun 2018 17:27:06 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=linaro.org header.i=@linaro.org header.b="jUqPU68R" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 911D721884 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linaro.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932978AbeFUR1E (ORCPT ); Thu, 21 Jun 2018 13:27:04 -0400 Received: from mail-it0-f68.google.com ([209.85.214.68]:40878 "EHLO mail-it0-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932508AbeFUR1D (ORCPT ); Thu, 21 Jun 2018 13:27:03 -0400 Received: by mail-it0-f68.google.com with SMTP id 188-v6so5857286ita.5 for ; Thu, 21 Jun 2018 10:27:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=M/sCkxHonakF/UgrZBysvMaz1vTxvoog09hsyyFMiSQ=; b=jUqPU68R7/BS+/LfDE/CqjhHHnUSzKeVvdCHOXnTkFbJsR5LAlrU/xx7s3fOLTSXT1 SUQFjBy/LKq2DC6AJzX2sdjBLeMtHUph2n/jnhNhctMV/OHqM6djFrptHNUKM19ph3Uk E12j4sdknUQf1gxsKuckO5B+duQY2Tcgjo5zA= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=M/sCkxHonakF/UgrZBysvMaz1vTxvoog09hsyyFMiSQ=; b=L9j22GtU7dlpBKaxGVHt7dIx1TwFrLTN05DPz1R+TVQsExiqbEkL+l/1W1MZmkypMO H5uhJC4mwLe/XX98jWzv6T51QVv4hIeOnXAAvEH0RpbHdijNchq9n/cbj9dHqA0nOwfY OnTsfqbndu9IcDIsl11WpmIO1Jzwswo+jYhJ8Z764FEalf1cWmKr4e/DotUDH9iRyUYo 9XTVgNz8YfCfuklVE7oCjwu/Om8kczqb8Z18fWbRTqjz0QHbYDu+FxbA/csgrAR6y1jJ xoTf5zkAtqPUSr/dFQP6WDrCEnzcfK8ydG95BxshSSxNFA5IeLqHp1BjZu+NjHa1Mm8E WiwA== X-Gm-Message-State: APt69E2sNJbSfC/U4zjZvZhr+Rcmt+9IVhrnuwFqZAorOZ1GKYWaMFO4 BcHAaqLpHswFyoPDTOYYuNn+C3dPgMfJg0fPCy0a0jgf X-Google-Smtp-Source: ADUXVKJ64NijROyB/hqe1NAfabRK+BPN2ysPJ9KfLlkJh5ShlHuxGVbsCmLYwyF5t/qMVxaUrej+tpF8mX3Btpvhxq4= X-Received: by 2002:a24:e105:: with SMTP id n5-v6mr6013588ith.68.1529602022335; Thu, 21 Jun 2018 10:27:02 -0700 (PDT) MIME-Version: 1.0 Received: by 2002:a6b:bbc7:0:0:0:0:0 with HTTP; Thu, 21 Jun 2018 10:27:01 -0700 (PDT) In-Reply-To: <67a238a8-e4db-6bf1-6da5-62ca9536191f@arm.com> References: <20180620085755.20045-1-yaojun8558363@gmail.com> <20180620085755.20045-2-yaojun8558363@gmail.com> <20180621025141.GB11276@toy> <70ebe6d7-0745-5606-ae89-8a7b2fb62008@arm.com> <67a238a8-e4db-6bf1-6da5-62ca9536191f@arm.com> From: Ard Biesheuvel Date: Thu, 21 Jun 2018 19:27:01 +0200 Message-ID: Subject: Re: [PATCH 1/1] arm64/mm: move {idmap_pg_dir,tramp_pg_dir,swapper_pg_dir} to .rodata section To: James Morse Cc: Jun Yao , linux-arm-kernel , Catalin Marinas , Will Deacon , Linux Kernel Mailing List , Kernel Hardening Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 21 June 2018 at 19:04, James Morse wrote: > Hi Ard, > > On 21/06/18 10:29, Ard Biesheuvel wrote: >> On 21 June 2018 at 10:59, James Morse wrote: >>> On 21/06/18 07:39, Ard Biesheuvel wrote: >>>> On 21 June 2018 at 04:51, Jun Yao wrote: >>>>> On Wed, Jun 20, 2018 at 12:09:49PM +0200, Ard Biesheuvel wrote: >>>>>> On 20 June 2018 at 10:57, Jun Yao wrote: >>>>>>> Move {idmap_pg_dir,tramp_pg_dir,swapper_pg_dir} to .rodata >>>>>>> section. And update the swapper_pg_dir by fixmap. >>>>>> >>>>>> I think we may be able to get away with not mapping idmap_pg_dir and >>>>>> tramp_pg_dir at all. >>>>> >>>>> I think we need to move tramp_pg_dir to .rodata. The attacker can write >>>>> a block-mapping(AP=01) to tramp_pg_dir and then he can access kernel >>>>> memory. >>> >>>> Why does it need to be mapped at all? When do we ever access it from the code? >>> >>> (We would want to make its fixmap entry read-only too) >> >> It already is. > > Sorry, I missed that, > > >>>>>> As for swapper_pg_dir, it would indeed be nice if we could keep those >>>>>> mappings read-only most of the time, but I'm not sure how useful this >>>>>> is if we apply it to the root level only. >>>>> >>>>> The purpose of it is to make 'KSMA' harder, where an single arbitrary >>>>> write is used to add a block mapping to the page-tables, giving the >>>>> attacker full access to kernel memory. That's why we just apply it to >>>>> the root level only. If the attacker can arbitrary write multiple times, >>>>> I think it's hard to defend. >>>> >>>> So the assumption is that the root level is more easy to find? >>>> Otherwise, I'm not sure I understand why being able to write a level 0 >>>> entry is so harmful, given that we don't have block mappings at that >>>> level. >>> >>> I think this thing assumes 3-level page tables with 39bit VA. > >> The attack, you mean? Because this code is unlikely to build with that >> configuration, given that __pgd_populate() BUILD_BUG()s in that case. > > Yes, the attack. (I struggle to think of it as an 'attack' because you already > have arbitrary write...) > OK, so in that case, you can abuse your single arbitrary write to map an entire 1 GB block of memory with arbitrary permissions, allowing userland to take control of the contents, right? And if you know the virtual and physical addresses of swapper_pg_dir, you can make sure this block covers the entire kernel, allowing the attacker to manipulate all core kernel code and statically allocated data structures. What I don't understand about this patch is how it is sufficient to only remap swapper_pg_dir r/w for updates on kernels that use 4 level paging. > >>>>>> @@ -417,12 +421,22 @@ static void __init __map_memblock(pgd_t *pgdp, phys_addr_t start, >>>>>>> >>>>>>> void __init mark_linear_text_alias_ro(void) >>>>>>> { >>> >>>>>>> + size = (unsigned long)__init_begin - (unsigned long)swapper_pg_end; >>>>>>> + update_mapping_prot(__pa_symbol(swapper_pg_end), >>>>>>> + (unsigned long)lm_alias(swapper_pg_end), >>>>>>> + size, PAGE_KERNEL_RO); >>>>>> >>>>>> I don't think this is necessary. Even if some pages are freed, it >>>>>> doesn't harm to keep a read-only alias of them here since the new >>>>>> owner won't access them via this mapping anyway. So we can keep >>>>>> .rodata as a single region. >>>>> >>>>> To be honest, I didn't think of this issue at first. I later found a >>>>> problem when testing the code on qemu: >>>> >>>> OK, you're right. I missed the fact that this operates on the linear >>>> alias, not the kernel mapping itself. >>>> >>>> What I don't like is that we lose the ability to use block mappings >>>> for the entire .rodata section this way. Isn't it possible to move >>>> these pgdirs to the end of the .rodata segment, perhaps by using a >>>> separate input section name and placing that explicitly? We could even >>>> simply forget about freeing those pages, given that [on 4k pages] the >>>> benefit of freeing 12 KB of space is likely to get lost in the >>>> rounding noise anyway [segments are rounded up to 64 KB in size] >>> >>> I assumed that to move swapper_pg_dir into the .rodata section we would need to >>> break it up. Today its ~3 levels, which we setup in head.S, then do a dance in >>> paging_init() so that swapper_pg_dir is always the top level. >>> >>> We could generate all leves of the 'init_pg_dir' in the __initdata section, then >>> copy only the top level into swapper_pg_dir into the rodata section during >>> paging_init(). > >> Is that complexity truly justified for a security sensitive piece of >> code? > > Wouldn't this be less complex? (I've probably explained it badly.) > > Today head.S builds the initial page tables in ~3 levels of swapper_pg_dir, then > during paging_init() build new tables with a temporary top level. > We switch to the temporary top level, then copy over the first level of > swapper_pg_dir, then switch back to swapper_pg_dir. Finally we free the > no-longer-used levels of swapper_pg_dir. > > This looks like re-inventing __initdata for the bits of page table we eventually > free. > > What I tried to describe is building the head.S/initial-page-tables in a > reserved area of the the __initdata section. We no longer need a temporary > top-level, we can build the final page tables directly in swapper_pg_dir, which > means one fewer rounds of cpu_replace_ttbr1(). > Ah fair enough. So either the initial page tables are never referred to via swapper_pg_dir in the first place, or we copy the first level over from __initdata after setting it up (which is probably easier than teaching the asm code about non-consecutive page ranges). So indeed, that would be an improvement in its own right. > >> Can't we just drop the memblock_free() and be done with it? > > That works, I assumed it would be at least frowned on! > I think I prefer your suggestion above. But we do need to teach this code to deal with folded page table levels.