archive mirror
 help / color / mirror / Atom feed
From: "Mickaël Salaün" <>
To: Jann Horn <>
Cc: "kernel list" <>,
	"Al Viro" <>,
	"Andy Lutomirski" <>,
	"Arnd Bergmann" <>,
	"Casey Schaufler" <>,
	"Greg Kroah-Hartman" <>,
	"James Morris" <>, "Jann Horn" <>,
	"Jonathan Corbet" <>,
	"Kees Cook" <>,
	"Michael Kerrisk" <>,
	"Mickaël Salaün" <>,
	"Serge E . Hallyn" <>,
	"Shuah Khan" <>,
	"Vincent Dagonneau" <>,
	"Kernel Hardening" <>,
	"Linux API" <>,
	linux-arch <>,,
	linux-fsdevel <>,
	linux-security-module <>,
	"the arch/x86 maintainers" <>
Subject: Re: [RFC PATCH v14 00/10] Landlock LSM
Date: Mon, 30 Mar 2020 20:26:03 +0200	[thread overview]
Message-ID: <> (raw)
In-Reply-To: <>

On 19/03/2020 22:17, Jann Horn wrote:
> On Thu, Mar 19, 2020 at 5:58 PM Mickaël Salaün <> wrote:
>> On 19/03/2020 00:33, Jann Horn wrote:
>>> On Wed, Mar 18, 2020 at 1:06 PM Mickaël Salaün <> wrote:
> [...]
>>>> As I understand your proposition, we need to build the required_bits
>>>> when adding a rule or enforcing/merging a ruleset with a domain. The
>>>> issue is that a rule only refers to a struct inode, not a struct path.
>>>> For your proposition to work, we would need to walk through the file
>>>> path when adding a rule to a ruleset, which means that we need to depend
>>>> of the current view of the process (i.e. its mount namespace), and its
>>>> Landlock domain.
>>> I don't see why that is necessary. Why would we have to walk the file
>>> path when adding a rule?
>>>> If the required_bits field is set when the ruleset is
>>>> merged with the domain, it is not possible anymore to walk through the
>>>> corresponding initial file path, which makes the enforcement step too
>>>> late to check for such consistency. The important point is that a
>>>> ruleset/domain doesn't have a notion of file hierarchy, a ruleset is
>>>> only a set of tagged inodes.
>>>> I'm not sure I got your proposition right, though. When and how would
>>>> you generate the required_bits?
>>> Using your terminology:
>>> A domain is a collection of N layers, which are assigned indices 0..N-1.
>>> For each possible access type, a domain has a bitmask containing N
>>> bits that stores which layers control that access type. (Basically a
>>> per-layer version of fs_access_mask.)
>> OK, so there is a bit for each domain, which means that you get a limit
>> of, let's say 64 layers? Knowing that each layer can be created by a
>> standalone application, potentially nested in a bunch of layers, this
>> seems artificially limiting.
> Yes, that is a downside of my approach.
>>> To validate an access, you start by ORing together the bitmasks for
>>> the requested access types; that gives you the required_bits mask,
>>> which lists all layers that want to control the access.
>>> Then you set seen_policy_bits=0, then do the
>>> check_access_path_continue() loop while keeping track of which layers
>>> you've seen with "seen_policy_bits |= access->contributing_policies",
>>> or something like that.
>>> And in the end, you check that seen_policy_bits is a superset of
>>> required_bits - something like `(~seen_policy_bits) & required_bits ==
>>> 0`.
>>> AFAICS to create a new domain from a bunch of layers, you wouldn't
>>> have to do any path walking.
>> Right, I misunderstood your previous email.
>>>> Here is my updated proposition: add a layer level and a depth to each
>>>> rule (once enforced/merged with a domain), and a top layer level for a
>>>> domain. When enforcing a ruleset (i.e. merging a ruleset into the
>>>> current domain), the layer level of a new rule would be the incremented
>>>> top layer level.
>>>> If there is no rule (from this domain) tied to the same
>>>> inode, then the depth of the new rule is 1. However, if there is already
>>>> a rule tied to the same inode and if this rule's layer level is the
>>>> previous top layer level, then the depth and the layer level are both
>>>> incremented and the rule is updated with the new access rights (boolean
>>>> AND).
>>>> The policy looks like this:
>>>> domain top_layer=2
>>>> /a RW policy_bitmask=0x00000003 layer=1 depth=1
>>>> /a/b R policy_bitmask=0x00000002 layer=2 depth=1
>>>> The path walk access check walks through all inodes and start with a
>>>> layer counter equal to the top layer of the current domain. For each
>>>> encountered inode tied to a rule, the access rights are checked and a
>>>> new check ensures that the layer of the matching rule is the same as the
>>>> counter (this may be a merged ruleset containing rules pertaining to the
>>>> same hierarchy, which is fine) or equal to the decremented counter (i.e.
>>>> the path walk just reached the underlying layer). If the path walk
>>>> encounter a rule with a layer strictly less than the counter minus one,
>>>> there is a whole in the layers which means that the ruleset
>>>> hierarchy/subset does not match, and the access must be denied.
>>>> When accessing a file at /private/b/foo for a read access:
>>>> /private/b/foo <no rules>
>>>>   allowed_access=unknown layer_counter=2
>>>> /private/b <access: R, policy_bitmask=0x00000002, layer=2, depth=1>
>>>>   allowed_access=allowed layer_counter=2
>>>> /private <no rules>
>>>>   allowed_access=allowed layer_counter=2
>>>> / <no rules>
>>>>   allowed_access=allowed layer_counter=2
>>>> Because the layer_counter didn't reach 1, the access request is then denied.
>>>> This proposition enables not to rely on a parent ruleset at first, only
>>>> when enforcing/merging a ruleset with a domain. This also solves the
>>>> issue with multiple inherited/nested rules on the same inode (in which
>>>> case the depth just grows). Moreover, this enables to safely stop the
>>>> path walk as soon as we reach the layer 1.
>>> (FWIW, you could do the same optimization with the seen_policy_bits approach.)
>>> I guess the difference between your proposal and mine is that in my
>>> proposal, the following would work, in effect permitting W access to
>>> /foo/bar/baz (and nothing else)?
>>> first ruleset:
>>>   /foo W
>>> second ruleset:
>>>   /foo/bar/baz W
>>> third ruleset:
>>>   /foo/bar W
>>> whereas in your proposal, IIUC it wouldn't be valid for a new ruleset
>>> to whitelist a superset of what was whitelisted in a previous ruleset?
>> This behavior seems dangerous because a process which sandbox itself to
>> only access /foo/bar W can bypass the restrictions from one of its
>> parent domains (i.e. only access /foo/bar/baz W). Indeed, each layer is
>> (most of the time) a different and standalone security policy.
> It isn't actually bypassing the restriction: You still can't actually
> access files like /foo/bar/blah, because a path walk from there
> doesn't encounter any rules from the second ruleset.

Right, this use case is legitimate, e.g. first giving access to
~/Downloads and then another layer giving access to ~/ (because it
doesn't know about the current restriction).

I think that neither my initial approach nor yours fit well, but I found
a new one inspired from both approaches. The first solution I gave, and
since implemented in the v15 [1], can manage 2^31-1 layers but it only
works when refining a security policy *knowing the parent one* (i.e.
refining an access tied to an inode, not a full file hierarchy). Instead
of having a layer level and a layer depth, my new implementation (for
v16) use a layer bitfield for each rule (and ruleset). We still AND
access rights when merging rulesets, but instead of storing the last
layer lever and depth, we set the corresponding bit in the layer
bitfield of the rule. This way we don't consume more memory than the v15
implementation (for 64 layers top, which would be 64 bits * number of
access types with your approach, i.e. between 1KB and 2KB) and Landlock
can properly manage supersets of access rights in nested hierarchies,
whatever their stacking order. However, I don't see another solution to
better handle more than 64 layers than a VLA, but that could come later.


>> To sum up, the bitmask approach doesn't have the notion of layers
>> ordering. It is then not possible to check that a rule comes from a
>> domain which is the direct ancestor of a child's domain. I want each
>> policy/layer to be really nested in the sense that a process sandboxing
>> itself can only add more restriction to itself with regard to its parent
>> domain (and the whole hierarchy). This is a similar approach to
>> seccomp-bpf (with chained filters), except there is almost no overhead
>> to nest several policies/layers together because they are flattened.
>> Using the layer level and depth approach enables to implement this.

      reply	other threads:[~2020-03-30 18:34 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-24 16:02 [RFC PATCH v14 00/10] Landlock LSM Mickaël Salaün
2020-02-24 16:02 ` [RFC PATCH v14 01/10] landlock: Add object and rule management Mickaël Salaün
2020-02-25 20:49   ` Jann Horn
2020-02-26 15:31     ` Mickaël Salaün
2020-02-26 20:24       ` Jann Horn
2020-02-27 16:46         ` Mickaël Salaün
2020-02-24 16:02 ` [RFC PATCH v14 02/10] landlock: Add ruleset and domain management Mickaël Salaün
2020-02-24 16:02 ` [RFC PATCH v14 03/10] landlock: Set up the security framework and manage credentials Mickaël Salaün
2020-02-24 16:02 ` [RFC PATCH v14 04/10] landlock: Add ptrace restrictions Mickaël Salaün
2020-02-24 16:02 ` [RFC PATCH v14 05/10] fs,landlock: Support filesystem access-control Mickaël Salaün
2020-02-26 20:29   ` Jann Horn
2020-02-27 16:50     ` Mickaël Salaün
2020-02-27 16:51       ` Jann Horn
2020-02-24 16:02 ` [RFC PATCH v14 06/10] landlock: Add syscall implementation Mickaël Salaün
2020-03-17 16:47   ` Al Viro
2020-03-17 17:51     ` Mickaël Salaün
2020-02-24 16:02 ` [RFC PATCH v14 07/10] arch: Wire up landlock() syscall Mickaël Salaün
2020-02-24 16:02 ` [RFC PATCH v14 08/10] selftests/landlock: Add initial tests Mickaël Salaün
2020-02-24 16:02 ` [RFC PATCH v14 09/10] samples/landlock: Add a sandbox manager example Mickaël Salaün
2020-02-24 16:02 ` [RFC PATCH v14 10/10] landlock: Add user and kernel documentation Mickaël Salaün
2020-02-29 17:23   ` Randy Dunlap
2020-03-02 10:03     ` Mickaël Salaün
2020-02-25 18:49 ` [RFC PATCH v14 00/10] Landlock LSM J Freyensee
2020-02-26 15:34   ` Mickaël Salaün
     [not found] ` <>
2020-02-27 17:01   ` [RFC PATCH v14 01/10] landlock: Add object and rule management Mickaël Salaün
2020-03-09 23:44 ` [RFC PATCH v14 00/10] Landlock LSM Jann Horn
2020-03-11 23:38   ` Mickaël Salaün
2020-03-17 16:19     ` Jann Horn
2020-03-17 17:50       ` Mickaël Salaün
2020-03-17 19:45         ` Jann Horn
2020-03-18 12:06           ` Mickaël Salaün
2020-03-18 23:33             ` Jann Horn
2020-03-19 16:58               ` Mickaël Salaün
2020-03-19 21:17                 ` Jann Horn
2020-03-30 18:26                   ` Mickaël Salaün [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).