From: Kees Cook
Date: Tue, 30 Oct 2018 14:07:45 -0700
Subject: Re: [PATCH 10/17] prmem: documentation
To: Andy Lutomirski
Cc: Igor Stoppa, Matthew Wilcox, Tycho Andersen, Peter Zijlstra, Mimi Zohar, Dave Chinner, James Morris, Michal Hocko, Kernel Hardening, linux-integrity, linux-security-module, Igor Stoppa, Dave Hansen, Jonathan Corbet, Laura Abbott, Randy Dunlap, Mike Rapoport, "open list:DOCUMENTATION", LKML, Thomas Gleixner
X-Mailing-List: linux-integrity@vger.kernel.org

On Tue, Oct 30, 2018 at 2:02 PM, Andy Lutomirski wrote:
>
>
>> On Oct 30, 2018, at 1:43 PM, Igor Stoppa wrote:
>>
>>> On 30/10/2018 21:20, Matthew Wilcox wrote:
>>>> On Tue, Oct 30, 2018 at 12:28:41PM -0600, Tycho Andersen wrote:
>>>>> On Tue, Oct 30, 2018 at 10:58:14AM -0700, Matthew Wilcox wrote:
>>>>> On Tue, Oct 30, 2018 at 10:06:51AM -0700, Andy Lutomirski wrote:
>>>>>>> On Oct 30, 2018, at 9:37 AM, Kees Cook wrote:
>>>>>> I support the addition of a rare-write mechanism to the upstream kernel.
>>>>>> And I think that there is only one sane way to implement it: using an
>>>>>> mm_struct. That mm_struct, just like any sane mm_struct, should only
>>>>>> differ from init_mm in that it has extra mappings in the *user* region.
>>>>>
>>>>> I'd like to understand this approach a little better. In a syscall path,
>>>>> we run with the user task's mm. What you're proposing is that when we
>>>>> want to modify rare data, we switch to rare_mm which contains a
>>>>> writable mapping to all the kernel data which is rare-write.
>>>>>
>>>>> So the API might look something like this:
>>>>>
>>>>>     void *p = rare_alloc(...); /* writable pointer */
>>>>>     p->a = x;
>>>>>     q = rare_protect(p); /* read-only pointer */
>>
>> With pools and memory allocated from vmap_areas, I was able to say
>>
>>     protect(pool)
>>
>> and that would do a swipe on all the pages currently in use.
>> In the SELinux policyDB, for example, one doesn't really want to individually protect each allocation.
>>
>> The loading phase happens usually at boot, when the system can be assumed to be sane (one might even preload a bare-bone set of rules from initramfs and then replace it later on, with the full blown set).
>>
>> There is no need to process each of these tens of thousands allocations and initialization as write-rare.
>>
>> Would it be possible to do the same here?
>
> I don’t see why not, although getting the API right will be a tad complicated.
>
>>
>>>>>
>>>>> To subsequently modify q,
>>>>>
>>>>>     p = rare_modify(q);
>>>>>     q->a = y;
>>>>
>>>> Do you mean
>>>>
>>>>     p->a = y;
>>>>
>>>> here?
>>>> I assume the intent is that q isn't writable ever, but that's
>>>> the one we have in the structure at rest.
>>> Yes, that was my intent, thanks.
>>> To handle the list case that Igor has pointed out, you might want to
>>> do something like this:
>>>     list_for_each_entry(x, &xs, entry) {
>>>         struct foo *writable = rare_modify(entry);
>>
>> Would this mapping be impossible to spoof by other cores?
>>
>
> Indeed. Only the core with the special mm loaded could see it.
>
> But I dislike allowing regular writes in the protected region. We really only need four write primitives:
>
> 1. Just write one value. Call at any time (except NMI).
>
> 2. Just copy some bytes. Same as (1) but any number of bytes.
>
> 3,4: Same as 1 and 2 but must be called inside a special rare write region. This is purely an optimization.
>
> Actually getting a modifiable pointer should be disallowed for two reasons:
>
> 1. Some architectures may want to use a special write-different-address-space operation. Heck, x86 could, too: make the actual offset be a secret and shove the offset into FSBASE or similar. Then %fs-prefixed writes would do the rare writes.
>
> 2. Alternatively, x86 could set the U bit. Then the actual writes would use the uaccess helpers, giving extra protection via SMAP.
>
> We don’t really want a situation where an unchecked pointer in the rare write region completely defeats the mechanism.

We still have to deal with certain structures under the write-rare
window. For example, see:
https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?h=kspp/write-rarely&id=60430b4d3b113aae4adab66f8339074986276474
They are wrappers to non-inline functions that have the same sanity-checking.

-- 
Kees Cook
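
For readers following the thread, the "write primitives instead of a modifiable pointer" idea can be sketched in userspace. This is not code from the thread or from the linked commit. All names here (wr_pool, wr_alloc, wr_memcpy, wr_write_int, wr_demo) are hypothetical, and mprotect() is only a stand-in for the kernel mechanisms discussed above (an alternate mm, FSBASE offsets, or the U bit). The sketch shows two things: callers only ever hold a const pointer, and every write goes through a non-inline-style helper that checks the destination lies inside the protected pool.

```c
/* Userspace analogy only: mprotect() stands in for the write-rare window. */
#define _DEFAULT_SOURCE
#include <assert.h>   /* for callers that want to assert on wr_demo() */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical pool: one page-aligned mapping that is normally read-only. */
struct wr_pool {
    unsigned char *base;
    size_t size;   /* bytes mapped */
    size_t used;   /* bytes handed out by wr_alloc() */
};

static int wr_pool_init(struct wr_pool *pool, size_t pages)
{
    long page = sysconf(_SC_PAGESIZE);

    if (page <= 0 || pages == 0)
        return -1;
    pool->size = pages * (size_t)page;
    pool->base = mmap(NULL, pool->size, PROT_READ,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (pool->base == MAP_FAILED)
        return -1;
    pool->used = 0;
    return 0;
}

/* Bump allocator: only a read-only (const) pointer is ever returned. */
static const void *wr_alloc(struct wr_pool *pool, size_t n)
{
    size_t aligned = (n + 15u) & ~(size_t)15u;

    if (aligned < n || aligned > pool->size - pool->used)
        return NULL;
    const void *p = pool->base + pool->used;
    pool->used += aligned;
    return p;
}

/* Primitive 2: copy some bytes, refusing any target outside the pool. */
static int wr_memcpy(struct wr_pool *pool, const void *dst,
                     const void *src, size_t n)
{
    uintptr_t d = (uintptr_t)dst, lo = (uintptr_t)pool->base;

    if (d < lo || n > pool->used || d - lo > pool->used - n)
        return -1;                      /* sanity check failed */
    if (mprotect(pool->base, pool->size, PROT_READ | PROT_WRITE) != 0)
        return -1;
    memcpy((void *)d, src, n);          /* the only writable access */
    return mprotect(pool->base, pool->size, PROT_READ);
}

/* Primitive 1: write one value; same check, fixed size. */
static int wr_write_int(struct wr_pool *pool, const int *dst, int val)
{
    return wr_memcpy(pool, dst, &val, sizeof(val));
}

struct foo { int a; int b; };

/* Returns 0 if in-pool writes succeed and an out-of-pool write is refused. */
int wr_demo(void)
{
    struct wr_pool pool;
    struct foo init = { .a = 1, .b = 2 };
    int outside = 0;

    if (wr_pool_init(&pool, 1) != 0)
        return -1;
    const struct foo *q = wr_alloc(&pool, sizeof(*q));
    if (!q || wr_memcpy(&pool, q, &init, sizeof(init)) != 0)
        return -1;
    if (wr_write_int(&pool, &q->a, 42) != 0)
        return -1;
    if (wr_write_int(&pool, &outside, 7) == 0)
        return -1;                      /* stack address must be rejected */
    int ok = (q->a == 42 && q->b == 2 && outside == 0);
    munmap(pool.base, pool.size);
    return ok ? 0 : -1;
}
```

Calling wr_demo() from a main() exercises both primitives. Toggling the whole pool with mprotect() is visible to every thread, which is exactly the weakness the per-core mm approach in the thread is meant to avoid, so treat this purely as an illustration of the API shape.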