From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.4 required=3.0 tests=DKIMWL_WL_MED,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0FCA3C4332E for ; Wed, 18 Mar 2020 23:34:14 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id C596620768 for ; Wed, 18 Mar 2020 23:34:13 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="DrrXM1Tf" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727281AbgCRXeM (ORCPT ); Wed, 18 Mar 2020 19:34:12 -0400 Received: from mail-oi1-f194.google.com ([209.85.167.194]:39472 "EHLO mail-oi1-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727020AbgCRXeM (ORCPT ); Wed, 18 Mar 2020 19:34:12 -0400 Received: by mail-oi1-f194.google.com with SMTP id d63so738112oig.6 for ; Wed, 18 Mar 2020 16:34:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=6IKlL+I/c+Hjy4y3Xx8YKRHHyf6Y2ndtzTeKefGZ0X0=; b=DrrXM1TfGXM7JpdPjJELh2Jet/0olmv7sgsJVcrfO7NNAm/C/uyNtme7E3RQnm7e+i 2YJureu5k61DYblkRP+iPdXjAj6EmoxPiSARalPXqDlODHBYpQUrTo+7+LWZQEaGtq4z PWxPkuzUsv3k04l7Z0iBS0WqR/llQh3FVVrQOOulX84B2lD338s2/H/o5WrE5DPjv/Ky ROl0qCPSz4ymdkxl27Z6MsJBAKiP17fV4389dy1Dn5iuD3LMNBb4AGc09RyXd4B7XEHE ySE3qNl/9EKEnQS5sd1SsSKs/oXsgIHSjA6vuGQmMLZGV+j7AlmeM0VBLcPchf3vk+8Q fJag== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=6IKlL+I/c+Hjy4y3Xx8YKRHHyf6Y2ndtzTeKefGZ0X0=; b=Ft0HloJgvCcVyLu3/6x/2x1Wt8l9J16uSkNSMrXtr1r0iVz2Mbi9sIj/AsaVD/yrjN U9g3R5Ui/xP35v6/0NjfJtjYWi4h0DQwGbdF1d0JpTyRTNXwftRzpV2QBnzjmgSuiy3U NGyhI9rDVqeWlg3iAb5/pXaqCZ4NXtxYXrNhLfIsAk/gRSwFFjC38hG8YrSueGKgY/9q /33ROvlR3TP2dHpgSWZL4M6kK3UDlC/moBoq3Yi7+d628bwJAY62DassZWDQmve6DRVv AS6dh3/QybBn0fE50KBI29eWbUANKkbZ/IxlIH/vhLHiySAc9nBpzAF2WT+0R/4DeEVz 0ZtQ== X-Gm-Message-State: ANhLgQ1DoeclDiqN7gOFU6TeLOmd66qCya09RZpAuAzC0ofiEUB7ONVp v36CY+cciBJUrD1DEXrMts/nFGpADfNs/7hD7qXCuQ== X-Google-Smtp-Source: ADFU+vv8F3Y9wGStW1nBrbFs3yrWKsAXE0IC+0piatRYJ3uczWe5YH3FoAj96UZ0SHVWroel259FmrrZ8VibD0XoKz8= X-Received: by 2002:aca:5e88:: with SMTP id s130mr335083oib.47.1584574450560; Wed, 18 Mar 2020 16:34:10 -0700 (PDT) MIME-Version: 1.0 References: <20200224160215.4136-1-mic@digikod.net> <873d7419-bdd9-8a52-0a9b-dddbe31df4f9@digikod.net> <688dda0f-0907-34eb-c19e-3e9e5f613a74@digikod.net> In-Reply-To: From: Jann Horn Date: Thu, 19 Mar 2020 00:33:44 +0100 Message-ID: Subject: Re: [RFC PATCH v14 00/10] Landlock LSM To: =?UTF-8?B?TWlja2HDq2wgU2FsYcO8bg==?= Cc: kernel list , Al Viro , Andy Lutomirski , Arnd Bergmann , Casey Schaufler , Greg Kroah-Hartman , James Morris , Jann Horn , Jonathan Corbet , Kees Cook , Michael Kerrisk , =?UTF-8?B?TWlja2HDq2wgU2FsYcO8bg==?= , "Serge E . Hallyn" , Shuah Khan , Vincent Dagonneau , Kernel Hardening , Linux API , linux-arch , linux-doc@vger.kernel.org, linux-fsdevel , "open list:KERNEL SELFTEST FRAMEWORK" , linux-security-module , "the arch/x86 maintainers" Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Mar 18, 2020 at 1:06 PM Micka=C3=ABl Sala=C3=BCn = wrote: > On 17/03/2020 20:45, Jann Horn wrote: > > On Tue, Mar 17, 2020 at 6:50 PM Micka=C3=ABl Sala=C3=BCn wrote: > >> On 17/03/2020 17:19, Jann Horn wrote: > >>> On Thu, Mar 12, 2020 at 12:38 AM Micka=C3=ABl Sala=C3=BCn wrote: > >>>> On 10/03/2020 00:44, Jann Horn wrote: > >>>>> On Mon, Feb 24, 2020 at 5:03 PM Micka=C3=ABl Sala=C3=BCn wrote: > >> > >> [...] > >> > >>>>> Aside from those things, there is also a major correctness issue wh= ere > >>>>> I'm not sure how to solve it properly: > >>>>> > >>>>> Let's say a process installs a filter on itself like this: > >>>>> > >>>>> struct landlock_attr_ruleset ruleset =3D { .handled_access_fs =3D > >>>>> ACCESS_FS_ROUGHLY_WRITE}; > >>>>> int ruleset_fd =3D landlock(LANDLOCK_CMD_CREATE_RULESET, > >>>>> LANDLOCK_OPT_CREATE_RULESET, sizeof(ruleset), &ruleset); > >>>>> struct landlock_attr_path_beneath path_beneath =3D { > >>>>> .ruleset_fd =3D ruleset_fd, > >>>>> .allowed_access =3D ACCESS_FS_ROUGHLY_WRITE, > >>>>> .parent_fd =3D open("/tmp/foobar", O_PATH), > >>>>> }; > >>>>> landlock(LANDLOCK_CMD_ADD_RULE, LANDLOCK_OPT_ADD_RULE_PATH_BENEATH, > >>>>> sizeof(path_beneath), &path_beneath); > >>>>> prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0); > >>>>> struct landlock_attr_enforce attr_enforce =3D { .ruleset_fd =3D rul= eset_fd }; > >>>>> landlock(LANDLOCK_CMD_ENFORCE_RULESET, LANDLOCK_OPT_ENFORCE_RULESET= , > >>>>> sizeof(attr_enforce), &attr_enforce); > >>>>> > >>>>> At this point, the process is not supposed to be able to write to > >>>>> anything outside /tmp/foobar, right? But what happens if the proces= s > >>>>> does the following next? > >>>>> > >>>>> struct landlock_attr_ruleset ruleset =3D { .handled_access_fs =3D > >>>>> ACCESS_FS_ROUGHLY_WRITE}; > >>>>> int ruleset_fd =3D landlock(LANDLOCK_CMD_CREATE_RULESET, > >>>>> LANDLOCK_OPT_CREATE_RULESET, sizeof(ruleset), &ruleset); > >>>>> struct landlock_attr_path_beneath path_beneath =3D { > >>>>> .ruleset_fd =3D ruleset_fd, > >>>>> .allowed_access =3D ACCESS_FS_ROUGHLY_WRITE, > >>>>> .parent_fd =3D open("/", O_PATH), > >>>>> }; > >>>>> landlock(LANDLOCK_CMD_ADD_RULE, LANDLOCK_OPT_ADD_RULE_PATH_BENEATH, > >>>>> sizeof(path_beneath), &path_beneath); > >>>>> prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0); > >>>>> struct landlock_attr_enforce attr_enforce =3D { .ruleset_fd =3D rul= eset_fd }; > >>>>> landlock(LANDLOCK_CMD_ENFORCE_RULESET, LANDLOCK_OPT_ENFORCE_RULESET= , > >>>>> sizeof(attr_enforce), &attr_enforce); > >>>>> > >>>>> As far as I can tell from looking at the source, after this, you wi= ll > >>>>> have write access to the entire filesystem again. I think the idea = is > >>>>> that LANDLOCK_CMD_ENFORCE_RULESET should only let you drop privileg= es, > >>>>> not increase them, right? > >>>> > >>>> There is an additionnal check in syscall.c:get_path_from_fd(): it is > >>>> forbidden to add a rule with a path which is not accessible (accordi= ng > >>>> to LANDLOCK_ACCESS_FS_OPEN) thanks to a call to security_file_open()= , > >>>> but this is definitely not perfect. > >>> > >>> Ah, I missed that. > >>> > >>>>> I think the easy way to fix this would be to add a bitmask to each > >>>>> rule that says from which ruleset it originally comes, and then let > >>>>> check_access_path() collect these bitmasks from each rule with OR, = and > >>>>> check at the end whether the resulting bitmask is full - if not, at > >>>>> least one of the rulesets did not permit the access, and it should = be > >>>>> denied. > >>>>> > >>>>> But maybe it would make more sense to change how the API works > >>>>> instead, and get rid of the concept of "merging" two rulesets > >>>>> together? Instead, we could make the API work like this: > >>>>> > >>>>> - LANDLOCK_CMD_CREATE_RULESET gives you a file descriptor whose > >>>>> ->private_data contains a pointer to the old ruleset of the process= , > >>>>> as well as a pointer to a new empty ruleset. > >>>>> - LANDLOCK_CMD_ADD_RULE fails if the specified rule would not be > >>>>> permitted by the old ruleset, then adds the rule to the new ruleset > >>>>> - LANDLOCK_CMD_ENFORCE_RULESET fails if the old ruleset pointer in > >>>>> ->private_data doesn't match the current ruleset of the process, th= en > >>>>> replaces the old ruleset with the new ruleset. > >>>>> > >>>>> With this, the new ruleset is guaranteed to be a subset of the old > >>>>> ruleset because each of the new ruleset's rules is permitted by the > >>>>> old ruleset. (Unless the directory hierarchy rotates, but in that c= ase > >>>>> the inaccuracy isn't much worse than what would've been possible > >>>>> through RCU path walk anyway AFAIK.) > >>>>> > >>>>> What do you think? > >>>>> > >>>> > >>>> I would prefer to add the same checks you described at first (with > >>>> check_access_path), but only when creating a new ruleset with > >>>> merge_ruleset() (which should probably be renamed). This enables not= to > >>>> rely on a parent ruleset/domain until the enforcement, which is the = case > >>>> anyway. > >>>> Unfortunately this doesn't work for some cases with bind mounts. Bec= ause > >>>> check_access_path() goes through one path, another (bind mounted) pa= th > >>>> could be illegitimately allowed. > >>> > >>> Hmm... I'm not sure what you mean. At the moment, landlock doesn't > >>> allow any sandboxed process to change the mount hierarchy, right? Can > >>> you give an example where this would go wrong? > >> > >> Indeed, a Landlocked process must no be able to change its mount > >> namespace layout. However, bind mounts may already exist. > >> Let's say a process sandbox itself to only access /a in a read-write > >> way. > > > > So, first policy: > > > > /a RW > > > >> Then, this process (or one of its children) add a new restriction > >> on /a/b to only be able to read this hierarchy. > > > > You mean with the second policy looking like this? > > Right. > > > > > /a RW > > /a/b R > > > > Then the resulting policy would be: > > > > /a RW policy_bitmask=3D0x00000003 (bits 0 and 1 set) > > /a/b R policy_bitmask=3D0x00000002 (bit 1 set) > > required_bits=3D0x00000003 (bits 0 and 1 set) > > > >> The check at insertion > >> time would allow this because this access right is a subset of the > >> access right allowed with the parent directory. However, If /a/b is bi= nd > >> mounted somewhere else, let's say in /private/b, then the second > >> enforcement just gave new access rights to this hierarchy too. > > > > But with the solution I proposed, landlock's path walk would see > > something like this when accessing a file at /private/b/foo: > > /private/b/foo > > policies seen until now: 0x00000000 > > /private/b > > policies seen until now: 0x00000002 > > /private > > policies seen until now: 0x00000002 > > / > > policies seen until now: 0x00000002 > > > > It wouldn't encounter any rule from the first policy, so the OR of the > > seen policy bitmasks would be 0x00000002, which is not the required > > value 0x00000003, and so the access would be denied. > As I understand your proposition, we need to build the required_bits > when adding a rule or enforcing/merging a ruleset with a domain. The > issue is that a rule only refers to a struct inode, not a struct path. > For your proposition to work, we would need to walk through the file > path when adding a rule to a ruleset, which means that we need to depend > of the current view of the process (i.e. its mount namespace), and its > Landlock domain. I don't see why that is necessary. Why would we have to walk the file path when adding a rule? > If the required_bits field is set when the ruleset is > merged with the domain, it is not possible anymore to walk through the > corresponding initial file path, which makes the enforcement step too > late to check for such consistency. The important point is that a > ruleset/domain doesn't have a notion of file hierarchy, a ruleset is > only a set of tagged inodes. > > I'm not sure I got your proposition right, though. When and how would > you generate the required_bits? Using your terminology: A domain is a collection of N layers, which are assigned indices 0..N-1. For each possible access type, a domain has a bitmask containing N bits that stores which layers control that access type. (Basically a per-layer version of fs_access_mask.) To validate an access, you start by ORing together the bitmasks for the requested access types; that gives you the required_bits mask, which lists all layers that want to control the access. Then you set seen_policy_bits=3D0, then do the check_access_path_continue() loop while keeping track of which layers you've seen with "seen_policy_bits |=3D access->contributing_policies", or something like that. And in the end, you check that seen_policy_bits is a superset of required_bits - something like `(~seen_policy_bits) & required_bits =3D=3D 0`. AFAICS to create a new domain from a bunch of layers, you wouldn't have to do any path walking. > Here is my updated proposition: add a layer level and a depth to each > rule (once enforced/merged with a domain), and a top layer level for a > domain. When enforcing a ruleset (i.e. merging a ruleset into the > current domain), the layer level of a new rule would be the incremented > top layer level. > If there is no rule (from this domain) tied to the same > inode, then the depth of the new rule is 1. However, if there is already > a rule tied to the same inode and if this rule's layer level is the > previous top layer level, then the depth and the layer level are both > incremented and the rule is updated with the new access rights (boolean > AND). > > The policy looks like this: > domain top_layer=3D2 > /a RW policy_bitmask=3D0x00000003 layer=3D1 depth=3D1 > /a/b R policy_bitmask=3D0x00000002 layer=3D2 depth=3D1 > > The path walk access check walks through all inodes and start with a > layer counter equal to the top layer of the current domain. For each > encountered inode tied to a rule, the access rights are checked and a > new check ensures that the layer of the matching rule is the same as the > counter (this may be a merged ruleset containing rules pertaining to the > same hierarchy, which is fine) or equal to the decremented counter (i.e. > the path walk just reached the underlying layer). If the path walk > encounter a rule with a layer strictly less than the counter minus one, > there is a whole in the layers which means that the ruleset > hierarchy/subset does not match, and the access must be denied. > > When accessing a file at /private/b/foo for a read access: > /private/b/foo > allowed_access=3Dunknown layer_counter=3D2 > /private/b > allowed_access=3Dallowed layer_counter=3D2 > /private > allowed_access=3Dallowed layer_counter=3D2 > / > allowed_access=3Dallowed layer_counter=3D2 > > Because the layer_counter didn't reach 1, the access request is then deni= ed. > > This proposition enables not to rely on a parent ruleset at first, only > when enforcing/merging a ruleset with a domain. This also solves the > issue with multiple inherited/nested rules on the same inode (in which > case the depth just grows). Moreover, this enables to safely stop the > path walk as soon as we reach the layer 1. (FWIW, you could do the same optimization with the seen_policy_bits approac= h.) I guess the difference between your proposal and mine is that in my proposal, the following would work, in effect permitting W access to /foo/bar/baz (and nothing else)? first ruleset: /foo W second ruleset: /foo/bar/baz W third ruleset: /foo/bar W whereas in your proposal, IIUC it wouldn't be valid for a new ruleset to whitelist a superset of what was whitelisted in a previous ruleset?