Linux-Security-Module Archive on lore.kernel.org
 help / Atom feed
From: Igor Stoppa <igor.stoppa@gmail.com>
To: Andy Lutomirski <luto@amacapital.net>,
	Matthew Wilcox <willy@infradead.org>,
	Paul Moore <paul@paul-moore.com>,
	Stephen Smalley <sds@tycho.nsa.gov>
Cc: Tycho Andersen <tycho@tycho.ws>,
	Kees Cook <keescook@chromium.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Mimi Zohar <zohar@linux.vnet.ibm.com>,
	Dave Chinner <david@fromorbit.com>,
	James Morris <jmorris@namei.org>,
	Michal Hocko <mhocko@kernel.org>,
	Kernel Hardening <kernel-hardening@lists.openwall.com>,
	linux-integrity <linux-integrity@vger.kernel.org>,
	LSM List <linux-security-module@vger.kernel.org>,
	Igor Stoppa <igor.stoppa@huawei.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Laura Abbott <labbott@redhat.com>,
	Randy Dunlap <rdunlap@infradead.org>,
	Mike Rapoport <rppt@linux.vnet.ibm.com>,
	"open list:DOCUMENTATION" <linux-doc@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	selinux@vger.kernel.org
Subject: Re: [PATCH 10/17] prmem: documentation
Date: Wed, 31 Oct 2018 21:38:21 +0200
Message-ID: <9f91adeb-0517-3955-6602-49bf6a892593@gmail.com> (raw)
In-Reply-To: <1d8e2d20-ba18-763e-03ff-d061e98d86ff@gmail.com>



On 31/10/2018 11:08, Igor Stoppa wrote:
> Adding SELinux folks and the SELinux ml
> 
> I think it's better if they participate in this discussion.
> 
> On 31/10/2018 06:41, Andy Lutomirski wrote:
>> On Tue, Oct 30, 2018 at 2:36 PM Matthew Wilcox <willy@infradead.org> 
>> wrote:
>>>
>>> On Tue, Oct 30, 2018 at 10:43:14PM +0200, Igor Stoppa wrote:
>>>> On 30/10/2018 21:20, Matthew Wilcox wrote:
>>>>>>> So the API might look something like this:
>>>>>>>
>>>>>>>          void *p = rare_alloc(...);      /* writable pointer */
>>>>>>>          p->a = x;
>>>>>>>          q = rare_protect(p);            /* read-only pointer */
>>>>
>>>> With pools and memory allocated from vmap_areas, I was able to say
>>>>
>>>> protect(pool)
>>>>
>>>> and that would do a swipe on all the pages currently in use.
>>>> In the SELinux policyDB, for example, one doesn't really want to
>>>> individually protect each allocation.
>>>>
>>>> The loading phase happens usually at boot, when the system can be 
>>>> assumed to
>>>> be sane (one might even preload a bare-bone set of rules from 
>>>> initramfs and
>>>> then replace it later on, with the full blown set).
>>>>
>>>> There is no need to process each of these tens of thousands 
>>>> allocations and
>>>> initialization as write-rare.
>>>>
>>>> Would it be possible to do the same here?
>>>
>>> What Andy is proposing effectively puts all rare allocations into
>>> one pool.  Although I suppose it could be generalised to multiple pools
>>> ... one mm_struct per pool.  Andy, what do you think to doing that?
>>
>> Hmm.  Let's see.
>>
>> To clarify some of this thread, I think that the fact that rare_write
>> uses an mm_struct and alias mappings under the hood should be
>> completely invisible to users of the API. 
> 
> I agree.
> 
>> No one should ever be
>> handed a writable pointer to rare_write memory (except perhaps during
>> bootup or when initializing a large complex data structure that will
>> be rare_write but isn't yet, e.g. the policy db).
> 
> The policy db doesn't need to be write rare.
> Actually, it really shouldn't be write rare.
> 
> Maybe it's just a matter of wording, but effectively the policyDB can be 
> trated with this sequence:
> 
> 1) allocate various data structures in writable form
> 
> 2) initialize them
> 
> 3) go back to 1 as needed
> 
> 4) lock down everything that has been allocated, as Read-Only
> The reason why I stress ReadOnly is that differentiating what is really 
> ReadOnly from what is WriteRare provides an extra edge against attacks, 
> because attempts to alter ReadOnly data through a WriteRare API could be 
> detected
> 
> 5) read any part of the policyDB during regular operations
> 
> 6) in case of update, create a temporary new version, using steps 1..3
> 
> 7) if update successful, use the new one and destroy the old one
> 
> 8) if the update failed, destroy the new one
> 
> The destruction at points 7 and 8 is not so much a write operation, as 
> it is a release of the memory.
> 
> So, we might have a bit different interpretation of what write-rare 
> means wrt destroying the memory and its content.
> 
> To clarify: I've been using write-rare to indicate primarily small 
> operations that one would otherwise achieve with "=", memcpy or memset 
> or more complex variants, like atomic ops, rcu pointer assignment, etc.
> 
> Tearing down an entire set of allocations like the policyDB doesn't fit 
> very well with that model.
> 
> The only part which _needs_ to be write rare, in the policyDB, is the 
> set of pointers which are used to access all the dynamically allocated 
> data set.
> 
> These pointers must be updated when a new policyDB is allocated.
> 
>> For example, there could easily be architectures where having a
>> writable alias is problematic.  On such architectures, an entirely
>> different mechanism might work better.  And, if a tool like KNOX ever
>> becomes a *part* of the Linux kernel (hint hint!)
> 
> Something related, albeit not identical is going on here [1]
> Eventually, it could be expanded to deal also with write rare.
> 
>> If you have multiple pools and one mm_struct per pool, you'll need a
>> way to find the mm_struct from a given allocation. 
> 
> Indeed. In my patchset, based on vmas, I do the following:
> * a private field from the page struct points to the vma using that page
> * inside the vma there is alist_head used only during deletion
>    - one pointer is used to chain vmas fro mthe same pool
>    - one pointer points to the pool struct
> * the pool struct has the property to use for all the associated 
> allocations: is it write-rare, read-only, does it auto protect, etc.
> 
>> Regardless of how
>> the mm_structs are set up, changing rare_write memory to normal memory
>> or vice versa will require a global TLB flush (all ASIDs and global
>> pages) on all CPUs, so having extra mm_structs doesn't seem to buy
>> much.
> 
> 1) it supports differnt levels of protection:
>    temporarily unprotected vs read-only vs write-rare
> 
> 2) the change of write permission should be possible only toward more 
> restrictive rules (writable -> write-rare -> read-only) and only to the 
> point that was specified while creating the pool, to avoid DOS attacks, 
> where a write-rare is flipped into read-only and further updates fail
> (ex: prevent IMA from registering modifications to a file, by not 
> letting it store new information - I'm not 100% sure this would work, 
> but it gives the idea, I think)
> 
> 3) being able to track all the allocations related to a pool would allow 
> to perform mass operations, like reducing the writabilty or destroying 
> all the allocations.
> 
>> (It's just possible that changing rare_write back to normal might be
>> able to avoid the flush if the spurious faults can be handled
>> reliably.)
> 
> I do not see the need for such a case of degrading the write permissions 
> of an allocation, unless it refers to the release of a pool of 
> allocations (see updating the SELinux policy DB)

A few more thoughts about pools. Not sure if they are all correct.
Note: I stick to "pool", instead of mm_struct, because what I'll say is 
mostly independent from the implementation.

- As Peter Zijlstra wrote earlier, protecting a target moves the focus 
of the attack to something else. In this case, probably, the "something 
else" would be the metadata of the pool(s).

- The amount of pools needed should be known at compilation time, so the 
metadata used for pools could be statically allocated and any change to 
it could be treated as write-rare.

- only certain fields of a pool structure would be writable, even as 
write-rare, after the pool is initialized.
In case the pool is a mm_struct or a superset (containing also 
additional properties, like the type of writability: RO or WR), the field

struct vm_area_struct *mmap;

is an example of what could be protected. It should be alterable only 
when creating/destroying the pool and making the first initialization.

- to speed up and also improve the validation of the target of a 
write-rare operation, it would be really desirable if the target had 
some intrinsic property which clearly differentiates it from 
non-write-rare memory. Its address, for example. The amount of write 
rare data needed by a full kernel should not exceed a few tens of 
megabytes. On a 64 bit system it shouldn't be so bad to reserve an 
address range maybe one order of magnitude lager than that.
It could even become a parameter for the creation of a pool.
SELinux, for example, should fit within 10-20MB. Or it could be a 
command line parameter.

- even if an hypervisor was present, it would be preferable to use it 
exclusively as extra protection, which triggers an exception only when 
something abnormal happens. The hypervisor should not become aware of 
the actual meaning of kernel (meta)data. Ideally, it would be mostly 
used for trapping unexpected writes to pages which are not supposed to 
be modified.

- one more reason for using pools is that, if each pool was also acting 
as memory cache for its users, attacks relying on use-after-free would 
not have access to possible vulnerability, because the memory and 
addresses associated with a pool would stay with it.

--
igor

  reply index

Thread overview: 140+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-23 21:34 [RFC v1 PATCH 00/17] prmem: protected memory Igor Stoppa
2018-10-23 21:34 ` [PATCH 01/17] prmem: linker section for static write rare Igor Stoppa
2018-10-23 21:34 ` [PATCH 02/17] prmem: write rare for static allocation Igor Stoppa
2018-10-25  0:24   ` Dave Hansen
2018-10-29 18:03     ` Igor Stoppa
2018-10-26  9:41   ` Peter Zijlstra
2018-10-29 20:01     ` Igor Stoppa
2018-10-23 21:34 ` [PATCH 03/17] prmem: vmalloc support for dynamic allocation Igor Stoppa
2018-10-25  0:26   ` Dave Hansen
2018-10-29 18:07     ` Igor Stoppa
2018-10-23 21:34 ` [PATCH 04/17] prmem: " Igor Stoppa
2018-10-23 21:34 ` [PATCH 05/17] prmem: shorthands for write rare on common types Igor Stoppa
2018-10-25  0:28   ` Dave Hansen
2018-10-29 18:12     ` Igor Stoppa
2018-10-23 21:34 ` [PATCH 06/17] prmem: test cases for memory protection Igor Stoppa
2018-10-24  3:27   ` Randy Dunlap
2018-10-24 14:24     ` Igor Stoppa
2018-10-25 16:43   ` Dave Hansen
2018-10-29 18:16     ` Igor Stoppa
2018-10-23 21:34 ` [PATCH 07/17] prmem: lkdtm tests " Igor Stoppa
2018-10-23 21:34 ` [PATCH 08/17] prmem: struct page: track vmap_area Igor Stoppa
2018-10-24  3:12   ` Matthew Wilcox
2018-10-24 23:01     ` Igor Stoppa
2018-10-25  2:13       ` Matthew Wilcox
2018-10-29 18:21         ` Igor Stoppa
2018-10-23 21:34 ` [PATCH 09/17] prmem: hardened usercopy Igor Stoppa
2018-10-29 11:45   ` Chris von Recklinghausen
2018-10-29 18:24     ` Igor Stoppa
2018-10-23 21:34 ` [PATCH 10/17] prmem: documentation Igor Stoppa
2018-10-24  3:48   ` Randy Dunlap
2018-10-24 14:30     ` Igor Stoppa
2018-10-24 23:04   ` Mike Rapoport
2018-10-29 19:05     ` Igor Stoppa
2018-10-26  9:26   ` Peter Zijlstra
2018-10-26 10:20     ` Matthew Wilcox
2018-10-29 19:28       ` Igor Stoppa
2018-10-26 10:46     ` Kees Cook
2018-10-28 18:31       ` Peter Zijlstra
2018-10-29 21:04         ` Igor Stoppa
2018-10-30 15:26           ` Peter Zijlstra
2018-10-30 16:37             ` Kees Cook
2018-10-30 17:06               ` Andy Lutomirski
2018-10-30 17:58                 ` Matthew Wilcox
2018-10-30 18:03                   ` Dave Hansen
2018-10-31  9:18                     ` Peter Zijlstra
2018-10-30 18:28                   ` Tycho Andersen
2018-10-30 19:20                     ` Matthew Wilcox
2018-10-30 20:43                       ` Igor Stoppa
2018-10-30 21:02                         ` Andy Lutomirski
2018-10-30 21:07                           ` Kees Cook
2018-10-30 21:25                             ` Igor Stoppa
2018-10-30 22:15                           ` Igor Stoppa
2018-10-31 10:11                             ` Peter Zijlstra
2018-10-31 20:38                               ` Andy Lutomirski
2018-10-31 20:53                                 ` Andy Lutomirski
2018-10-31  9:45                           ` Peter Zijlstra
2018-10-30 21:35                         ` Matthew Wilcox
2018-10-30 21:49                           ` Igor Stoppa
2018-10-31  4:41                           ` Andy Lutomirski
2018-10-31  9:08                             ` Igor Stoppa
2018-10-31 19:38                               ` Igor Stoppa [this message]
2018-10-31 10:02                             ` Peter Zijlstra
2018-10-31 20:36                               ` Andy Lutomirski
2018-10-31 21:00                                 ` Peter Zijlstra
2018-10-31 22:57                                   ` Andy Lutomirski
2018-10-31 23:10                                     ` Igor Stoppa
2018-10-31 23:19                                       ` Andy Lutomirski
2018-10-31 23:26                                         ` Igor Stoppa
2018-11-01  8:21                                           ` Thomas Gleixner
2018-11-01 15:58                                             ` Igor Stoppa
2018-11-01 17:08                                     ` Peter Zijlstra
2018-10-30 18:51                   ` Andy Lutomirski
2018-10-30 19:14                     ` Kees Cook
2018-10-30 21:25                     ` Matthew Wilcox
2018-10-30 21:55                       ` Igor Stoppa
2018-10-30 22:08                         ` Matthew Wilcox
2018-10-31  9:29                       ` Peter Zijlstra
2018-10-30 23:18                     ` Nadav Amit
2018-10-31  9:08                       ` Peter Zijlstra
2018-11-01 16:31                         ` Nadav Amit
2018-11-02 21:11                           ` Nadav Amit
2018-10-31  9:36                   ` Peter Zijlstra
2018-10-31 11:33                     ` Matthew Wilcox
2018-11-13 14:25                 ` Igor Stoppa
2018-11-13 17:16                   ` Andy Lutomirski
2018-11-13 17:43                     ` Nadav Amit
2018-11-13 17:47                       ` Andy Lutomirski
2018-11-13 18:06                         ` Nadav Amit
2018-11-13 18:31                         ` Igor Stoppa
2018-11-13 18:33                           ` Igor Stoppa
2018-11-13 18:36                             ` Andy Lutomirski
2018-11-13 19:03                               ` Igor Stoppa
2018-11-21 16:34                               ` Igor Stoppa
2018-11-21 17:36                                 ` Nadav Amit
2018-11-21 18:01                                   ` Igor Stoppa
2018-11-21 18:15                                 ` Andy Lutomirski
2018-11-22 19:27                                   ` Igor Stoppa
2018-11-22 20:04                                     ` Matthew Wilcox
2018-11-22 20:53                                       ` Andy Lutomirski
2018-12-04 12:34                                         ` Igor Stoppa
2018-11-13 18:48                           ` Andy Lutomirski
2018-11-13 19:35                             ` Igor Stoppa
2018-11-13 18:26                     ` Igor Stoppa
2018-11-13 18:35                       ` Andy Lutomirski
2018-11-13 19:01                         ` Igor Stoppa
2018-10-31  9:27               ` Igor Stoppa
2018-10-26 11:09     ` Markus Heiser
2018-10-29 19:35       ` Igor Stoppa
2018-10-26 15:05     ` Jonathan Corbet
2018-10-29 19:38       ` Igor Stoppa
2018-10-29 20:35     ` Igor Stoppa
2018-10-23 21:34 ` [PATCH 11/17] prmem: llist: use designated initializer Igor Stoppa
2018-10-23 21:34 ` [PATCH 12/17] prmem: linked list: set alignment Igor Stoppa
2018-10-26  9:31   ` Peter Zijlstra
2018-10-23 21:35 ` [PATCH 13/17] prmem: linked list: disable layout randomization Igor Stoppa
2018-10-24 13:43   ` Alexey Dobriyan
2018-10-29 19:40     ` Igor Stoppa
2018-10-26  9:32   ` Peter Zijlstra
2018-10-26 10:17     ` Matthew Wilcox
2018-10-30 15:39       ` Peter Zijlstra
2018-10-23 21:35 ` [PATCH 14/17] prmem: llist, hlist, both plain and rcu Igor Stoppa
2018-10-24 11:37   ` Mathieu Desnoyers
2018-10-24 14:03     ` Igor Stoppa
2018-10-24 14:56       ` Tycho Andersen
2018-10-24 22:52         ` Igor Stoppa
2018-10-25  8:11           ` Tycho Andersen
2018-10-28  9:52       ` Steven Rostedt
2018-10-29 19:43         ` Igor Stoppa
2018-10-26  9:38   ` Peter Zijlstra
2018-10-23 21:35 ` [PATCH 15/17] prmem: test cases for prlist and prhlist Igor Stoppa
2018-10-23 21:35 ` [PATCH 16/17] prmem: pratomic-long Igor Stoppa
2018-10-25  0:13   ` Peter Zijlstra
2018-10-29 21:17     ` Igor Stoppa
2018-10-30 15:58       ` Peter Zijlstra
2018-10-30 16:28         ` Will Deacon
2018-10-31  9:10           ` Peter Zijlstra
2018-11-01  3:28             ` Kees Cook
2018-10-23 21:35 ` [PATCH 17/17] prmem: ima: turn the measurements list write rare Igor Stoppa
2018-10-24 23:03 ` [RFC v1 PATCH 00/17] prmem: protected memory Dave Chinner
2018-10-29 19:47   ` Igor Stoppa

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9f91adeb-0517-3955-6602-49bf6a892593@gmail.com \
    --to=igor.stoppa@gmail.com \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@fromorbit.com \
    --cc=igor.stoppa@huawei.com \
    --cc=jmorris@namei.org \
    --cc=keescook@chromium.org \
    --cc=kernel-hardening@lists.openwall.com \
    --cc=labbott@redhat.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-integrity@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=mhocko@kernel.org \
    --cc=paul@paul-moore.com \
    --cc=peterz@infradead.org \
    --cc=rdunlap@infradead.org \
    --cc=rppt@linux.vnet.ibm.com \
    --cc=sds@tycho.nsa.gov \
    --cc=selinux@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=tycho@tycho.ws \
    --cc=willy@infradead.org \
    --cc=zohar@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-Security-Module Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-security-module/0 linux-security-module/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-security-module linux-security-module/ https://lore.kernel.org/linux-security-module \
		linux-security-module@vger.kernel.org linux-security-module@archiver.kernel.org
	public-inbox-index linux-security-module


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-security-module


AGPL code for this site: git clone https://public-inbox.org/ public-inbox