* RFC v2: post-init-read-only protection for data allocated dynamically
@ 2017-05-03 12:06 Igor Stoppa
       [not found] ` <70a9d4db-f374-de45-413b-65b74c59edcb@intel.com>
                   ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Igor Stoppa @ 2017-05-03 12:06 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm, linux-kernel

Hello,

please review my (longish) line of thoughts, below.

I've restructured them so that they should be easier to follow.


Observations
------------

* it is currently possible, by using the prefix "__read_only", to have
the linker place a static variable into a special memory region, which
will become write-protected at the end of the init phase.

* the purpose is to write-protect data which is not expected to change,
ever, after it has been initialized.

* The mechanism used for locking down the memory region is to program
the MMU to trap writes to said region. It is fairly efficient and
HW-backed, so it doesn't introduce any major overhead, but the MMU deals
only with pages or supersets of pages, hence the need to collect all the
soon-to-be-read-only data - and only that - into the "special region".
The "__read_only" modifier is the admission ticket.

* the write-protecting feature helps support memory integrity in
general and can also help spot rogue writes, whatever their origin
might be: uninitialized or dangling pointers, wrong pointer arithmetic,
etc. (see the sketch below)
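
To make the static case concrete, here is a minimal sketch. All names
are made up; "__read_only" is the marker named in this RFC (mainline
carries the analogous __ro_after_init attribute):

static int example_limit __read_only;	/* linker puts this in the special region */

static int __init example_init(void)
{
        example_limit = 16;	/* writes are fine while init is still running */
        return 0;
}
late_initcall(example_init);
/* once init ends, the region is write-protected by the MMU and any
 * later write to example_limit traps */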



Problem
-------

The feature is available only for *static* data - it will not work with
something like a linked list that is put together during init, for example.
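
Concretely, a list assembled during init out of kmalloc'ed nodes has
nothing for the linker to place in the protected region. A minimal
illustration (all names made up):

struct hook {
        struct list_head list;
        void (*fn)(void);
};
static LIST_HEAD(hooks);	/* the head could be marked read-only... */

static int __init register_hook(void (*fn)(void))
{
        struct hook *h = kmalloc(sizeof(*h), GFP_KERNEL);

        if (!h)
                return -ENOMEM;
        h->fn = fn;
        list_add(&h->list, &hooks);	/* ...but the nodes land in ordinary r/w heap pages */
        return 0;
}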



Wish
----

My starting points are the policy DB of SE Linux and the LSM hooks, but
eventually I would like to extend the protection to other subsystems as
well, in a way that can be merged into mainline.



Analysis
--------

* the solution I come up with has to be as minimally invasive as
possible, at least for what concerns the various subsystems whose
integrity I want to enhance.

* In most, if not all, of the cases that could be enhanced, the code
will be calling kmalloc/vmalloc with GFP_KERNEL as the desired type of
memory.

* I suspect/hope that the various maintainers won't object too much if my
changes are limited to replacing GFP_KERNEL with some other macro, for
example what I previously called GFP_LOCKABLE, provided I can ensure that
(see the fallback sketch after this list):

  -1) no penalty is introduced, at least when the extra protection
      feature is not enabled, IOW nobody has to suffer from my changes.
      This means that GFP_LOCKABLE should fall back to GFP_KERNEL when
      it's not enabled.

  -2) when the extra protection feature is enabled, the code still
      works as expected, as long as the data identified for this
      enhancement is really unmodified after init.
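
A minimal sketch of the fallback from requirement -1, assuming a
hypothetical config option guarding the feature (CONFIG_POST_INIT_RO
and __GFP_LOCKABLE are placeholders, not existing kernel symbols):

#ifdef CONFIG_POST_INIT_RO
#define GFP_LOCKABLE    (GFP_KERNEL | __GFP_LOCKABLE)   /* routed to the protected pool */
#else
#define GFP_LOCKABLE    GFP_KERNEL      /* feature off: plain GFP_KERNEL, zero cost */
#endif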

* In my quest for improved memory integrity, I will deal with very
different memory sizes being allocated, so if I start writing my own
memory allocator, starting from a page-aligned chunk of normal memory,
at best I will end up with a replica of kmalloc, at worst with something
buggy. Either way, it will be much harder to push other subsystems
to use it.
I probably wouldn't like it either, if I were a maintainer.

* While I do not strictly need a new memory zone, memory zones are what
kmalloc understands at the moment: AFAIK, it is not possible to tell
kmalloc from which memory pool it should fish out the memory, other than
having a reference to a memory zone.
If it were possible to aim kmalloc at arbitrary memory pools, probably we
would not be having this exchange right now. And probably there would
not be so many other folks trying to get their memory zone of interest
merged. However I suspect this solution would be sub-optimal for
the normal use cases, because there would be the extra overhead of
passing a reference to the memory pool, instead of encoding it into
bitfields, together with other information.

* there are very slim chances (to be optimistic :) that I can get away
with having my custom zone merged, because others are trying with
similar proposals and getting rejected; maybe I can have better luck
if I propose something that can also work for others.

* currently memory zones are mapped 1:1 to bits in a crowded bitmask,
but not all these zones are really needed in a typical real system; some
are kept for backward compatibility and for supporting distros, which
cannot know upfront the quirks of the HW they will be running on.


Conclusions
-----------

* the solution that seems most likely to succeed is to remove the
1:1 mapping between optional zones and their respective bits.

* the bits previously assigned to the optional zones would become
available for mapping whatever zone a system integrator wants to
support (a sketch follows).
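
For reference, gfp.h already degrades the optional zones to ZONE_NORMAL
when they are not configured; a rebindable zone could reuse the same
pattern (ZONE_LOCKABLE and its config symbol are hypothetical):

#ifdef CONFIG_ZONE_LOCKABLE
#define OPT_ZONE_LOCKABLE       ZONE_LOCKABLE   /* bound to one of the freed-up bits */
#else
#define OPT_ZONE_LOCKABLE       ZONE_NORMAL     /* fall back to normal memory */
#endif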


Cons:
There would still be a hard constraint on the maximum number of zones
available simultaneously, so one would have to choose which of the
optional zones to enable, and be ready to deal with one's own zone
disappearing (ex: fall back to normal memory and give up some/all
functionality)

Pros:
* No bit would go to waste: those who want their own custom zone could
make better use of the allocated-but-unnecessary-to-them bits.
* There would be a standard way for people to add non-standard zones.
* It doesn't alter the hot paths that are critical to efficient memory
handling.

So it seems a win-win scenario, apart from the fact that I will probably
have to reshuffle a certain amount of macros :-)


P.S.
There was early advice to create and use a custom-made memory
allocator; I hope it's now clear why I don't think it's viable: it might
work if I use it only for further code that I will write, but it really
doesn't seem the best way to convince other subsystem maintainers to
take in my changes, if I suggest they give up the super-optimized
kmalloc (and friends) in favor of some homebrew allocator I wrote :-/


---
thanks, igor


* Re: RFC v2: post-init-read-only protection for data allocated dynamically
       [not found] ` <70a9d4db-f374-de45-413b-65b74c59edcb@intel.com>
@ 2017-05-04  8:17   ` Igor Stoppa
  2017-05-04 14:30     ` Dave Hansen
  0 siblings, 1 reply; 23+ messages in thread
From: Igor Stoppa @ 2017-05-04  8:17 UTC (permalink / raw)
  To: Dave Hansen, Michal Hocko; +Cc: linux-mm, linux-kernel

Hi,
I suspect this was accidentally sent as a reply to sender only instead
of a reply-all, so I'm putting back the CCs that were dropped.

On 03/05/17 21:41, Dave Hansen wrote:
> On 05/03/2017 05:06 AM, Igor Stoppa wrote:
>> My starting points are the policy DB of SE Linux and the LSM hooks, but
>> eventually I would like to extend the protection to other subsystems as
>> well, in a way that can be merged into mainline.
> 
> Have you given any thought to just having a set of specialized slabs?

No, the idea of the RFC was to get this sort of comment about options I
might have missed :-)

> Today, for instance, we have a separate set of kmalloc() slabs for DMA:
> dma-kmalloc-{4096,2048,...}.  It should be quite possible to have
> another set for your post-init-read-only protected data.

I will definitely investigate it and report back, thanks.
But in the meantime I'd appreciate further clarifications.
Please see below ...

> This doesn't take care of vmalloc(), but I have the feeling that
> implementing this for vmalloc() isn't going to be horribly difficult.

ok

>> * The mechanism used for locking down the memory region is to program
>> the MMU to trap writes to said region. It is fairly efficient and
>> HW-backed, so it doesn't introduce any major overhead,
> 
> I'd take a bit of an issue with this statement.  It *will* fracture
> large pages unless you manage to pack all of these allocations entirely
> within a large page.  This is problematic because we use the largest
> size available, and that's 1GB on x86.

I am not sure I fully understand this part.
I am probably missing some point about the way kmalloc works.

I get the problem you describe, but I do not understand why it should
happen.
Going back for a moment to my original idea of the zone, as a physical
address range, why wouldn't it be possible to define it as one large page?

Btw, I do not expect much memory occupation, in terms of sheer
size, although there might be many small "variables" scattered across
the code. That's where I hope using kmalloc, instead of a custom-made
allocator, can make a difference, in terms of optimal occupation.

> IOW, if you scatter these things throughout the address space, you may
> end up fracturing/demoting enough large pages to cause major overhead
> refilling the TLB.

But why would I?
Or, better, what would cause it, unless I take special care?

Or, let me put it differently: my goal is to not fracture more pages
than needed.
It will probably require some profiling to figure out what is the
ballpark of the memory footprint.

I might have overlooked some aspect of this, but the overall goal
is to have a memory range (I won't call it zone, to avoid referring to a
specific implementation) which is as tightly packed as possible, stuffed
with all the data that is expected to become read-only.

> Note that this only applies for kmalloc() allocations, *not* vmalloc(),
> since kmalloc() uses the kernel linear map and vmalloc() uses its own,
> separate mappings.

Yes.

---
thanks, igor


* Re: RFC v2: post-init-read-only protection for data allocated dynamically
  2017-05-03 12:06 RFC v2: post-init-read-only protection for data allocated dynamically Igor Stoppa
       [not found] ` <70a9d4db-f374-de45-413b-65b74c59edcb@intel.com>
@ 2017-05-04 11:21 ` Michal Hocko
  2017-05-04 12:14   ` Igor Stoppa
  2017-05-04 16:49 ` Laura Abbott
  2 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2017-05-04 11:21 UTC (permalink / raw)
  To: Igor Stoppa; +Cc: linux-mm, linux-kernel

On Wed 03-05-17 15:06:36, Igor Stoppa wrote:
[...]

> * the write-protecting feature helps support memory integrity in
> general and can also help spot rogue writes, whatever their origin
> might be: uninitialized or dangling pointers, wrong pointer arithmetic, etc.

I agree that such a feature can be really useful.

[...]

> * In most, if not all, of the cases that could be enhanced, the code
> will be calling kmalloc/vmalloc with GFP_KERNEL as the desired type of
> memory.

How do you tell that the seal is active? I have also asked about the
lifetime of these objects in the previous email thread. Do you expect
those objects to get freed one by one or mostly all at once? Is this
supposed to be boot time only or might such allocations happen anytime?

[...]

> * In my quest for improved memory integrity, I will deal with very
> different memory sizes being allocated, so if I start writing my own
> memory allocator, starting from a page-aligned chunk of normal memory,
> at best I will end up with a replica of kmalloc, at worst with something
> buggy. Either way, it will be much harder to push other subsystems
> to use it.
> I probably wouldn't like it either, if I were a maintainer.

The most immediate suggestion would be to extend SLAB caches with a new
sealing feature. Roughly it would mean that once kmem_cache_seal() is
called on a cache, it would change the page tables of the used slab pages
to RO state. This would obviously need some fiddling to make those sealed
pages unusable for new allocations. It would also mean some changes to
the kfree path, but I guess this is doable.

> * While I do not strictly need a new memory zone, memory zones are what
> kmalloc understands at the moment: AFAIK, it is not possible to tell
> kmalloc from which memory pool it should fish out the memory, other than
> having a reference to a memory zone.

As I've said already, I think that a zone is a completely wrong
approach. How would it help anyway? It is the allocator on top of the
page allocator which has to do clever things to support sealing.

-- 
Michal Hocko
SUSE Labs


* Re: RFC v2: post-init-read-only protection for data allocated dynamically
  2017-05-04 11:21 ` Michal Hocko
@ 2017-05-04 12:14   ` Igor Stoppa
  2017-05-04 13:11     ` Michal Hocko
  0 siblings, 1 reply; 23+ messages in thread
From: Igor Stoppa @ 2017-05-04 12:14 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm, linux-kernel, Dave Hansen

On 04/05/17 14:21, Michal Hocko wrote:
> On Wed 03-05-17 15:06:36, Igor Stoppa wrote:

[...]

>> * In most, if not all, of the cases that could be enhanced, the code
>> will be calling kmalloc/vmalloc with GFP_KERNEL as the desired type of
>> memory.
> 
> How do you tell that the seal is active?

The simplest way would be to define the seal as something that is applied
only after late init has concluded.

IOW, if the kernel has already started user-space, the seal is in place.

I do acknowledge that this conflicts with the current implementation of
SE Linux, but it might be possible to extend SE Linux to have a
predefined configuration file that is loaded during kernel init.

In general this is not acceptable, but OTOH IMA does it, so there could
also be grounds for advocating similar (optional) behavior for SE Linux.

Should that not be possible, then yes, I should provide some way (ioctl,
sysfs, something else) that can be used to apply the seal.

In such a case there should also be some helper function which allows
callers to confirm that the seal is absent/present, as sketched below.
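
Something as small as this should do (all names hypothetical):

static bool ro_seal_active;	/* flipped exactly once, when the seal is applied */

void ro_seal_apply(void)
{
        /* write-protect the pool here, then: */
        ro_seal_active = true;
}

bool ro_seal_is_active(void)
{
        return ro_seal_active;
}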

> I have also asked about the
> lifetime of these objects in the previous email thread. Do you expect
> those objects to get freed one by one or mostly all at once? Is this
> supposed to be boot time only or might such allocations happen anytime?

Yes, you did. I didn't mean to ignore the question.
I thought this question would be answered by the current RFC :-(

Alright, here's one more attempt at explaining what I have in mind.
And I might be wrong, so that would explain why it's not clear.

Once the seal is in place, the objects are effectively read-only, so the
lifetime is basically the same as the kernel text.
Since I am after providing the same functionality as
post-init-read-only, but for dynamically allocated data, I would stick
to the same behavior: once the data is read-only, it stays so forever,
or till reset/poweroff, whichever comes first.

I wonder if you are thinking about loadable modules or maybe livepatch.
My proposal, in its current form, is only about what is done when the
kernel initialization is performed. So it would not take those cases
under its umbrella. Actually it might be incompatible with livepatch, if
any of the read-only data is supposed to be updated.

Since it's meant to improve the current level of integrity, I would
prefer to have a progressive approach and address modules/livepatch in a
later phase, if this is not seen as a show stopper.

[...]

> The most immediate suggestion would be to extend SLAB caches with a new
> sealing feature.

Yes, a few hours ago I got the same advice from Dave Hansen as well
(thanks, btw) [1].

I had just not considered the option.

> Roughly it would mean that once kmem_cache_seal() is
> called on a cache, it would change the page tables of the used slab pages
> to RO state. This would obviously need some fiddling to make those sealed
> pages unusable for new allocations. It would also mean some changes to
> the kfree path, but I guess this is doable.

Ok, as it probably has already become evident, I have just started
peeking into the memory subsystem, so this is the sort of guidance I was
hoping I could receive =) - thank you

Question: I see that some pages can be moved around. Would this apply to
the slab-based solution, or can I assume that once I have certain
physical pages sealed, they will not be altered anymore?

>> * While I do not strictly need a new memory zone, memory zones are what
>> kmalloc understands at the moment: AFAIK, it is not possible to tell
>> kmalloc from which memory pool it should fish out the memory, other than
>> having a reference to a memory zone.
> 
> As I've said already, I think that a zone is a completely wrong
> approach. How would it help anyway? It is the allocator on top of the
> page allocator which has to do clever things to support sealing.


Ok, as long as there is a way forward that fits my needs and has the
possibility to be merged upstream, I'm fine with it.

I suppose zones are the first thing one meets when reading the code, so
they are probably the first target that comes to mind.
That's what happened to me.

I will probably come back with further questions, but I can then start
putting together some prototype of what you described.

I am fine with providing a generic solution, but I must make sure that
it works with slub. I suppose what you proposed will do it, right?

TBH, from what little I have read so far, I find it a bit confusing
that there are some header files referring separately to slab, slub and
slob, but then common code still refers to slab (slab.h, slab.c and
slab_common.c, for example).


[1] https://marc.info/?l=linux-kernel&m=149388596106305&w=2


---
thanks, igor


* Re: RFC v2: post-init-read-only protection for data allocated dynamically
  2017-05-04 12:14   ` Igor Stoppa
@ 2017-05-04 13:11     ` Michal Hocko
  2017-05-04 13:37       ` Igor Stoppa
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2017-05-04 13:11 UTC (permalink / raw)
  To: Igor Stoppa; +Cc: linux-mm, linux-kernel, Dave Hansen

On Thu 04-05-17 15:14:10, Igor Stoppa wrote:
[...]
> I wonder if you are thinking about loadable modules or maybe livepatch.
> My proposal, in its current form, is only about what is done when the
> kernel initialization is performed. So it would not take those cases
> under its umbrella. Actually it might be incompatible with livepatch, if
> any of the read-only data is supposed to be updated.
> 
> Since it's meant to improve the current level of integrity, I would
> prefer to have a progressive approach and address modules/livepatch in a
> later phase, if this is not seen as a show stopper.

I believe that this is a fundamental question. Sealing sounds useful
for after-boot use cases as well, and that would change the approach
considerably. Coming up with an ad-hoc solution for the boot-time-only
case seems like the wrong way to me. And as you've said, SELinux, which
is your target, already does this after early boot.

[...]
> > Roughly it would mean that once kmem_cache_seal() is
> > called on a cache, it would change the page tables of the used slab pages
> > to RO state. This would obviously need some fiddling to make those sealed
> > pages unusable for new allocations. It would also mean some changes to
> > the kfree path, but I guess this is doable.
> 
> Ok, as it probably has already become evident, I have just started
> peeking into the memory subsystem, so this is the sort of guidance I was
> hoping I could receive =) - thank you
> 
> Question: I see that some pages can be moved around. Would this apply to
> the slab-based solution, or can I assume that once I have certain
> physical pages sealed, they will not be altered anymore?

Slab pages are not migratable currently. Even if they start being
migratable it would be an opt-in, because that requires pointer tracking
to make sure they are updated properly.
 
> >> * While I do not strictly need a new memory zone, memory zones are what
> >> kmalloc understands at the moment: AFAIK, it is not possible to tell
> >> kmalloc from which memory pool it should fish out the memory, other than
> >> having a reference to a memory zone.
> > 
> > As I've said already, I think that a zone is a completely wrong
> > approach. How would it help anyway? It is the allocator on top of the
> > page allocator which has to do clever things to support sealing.
> 
> 
> Ok, as long as there is a way forward that fits my needs and has the
> possibility to be merged upstream, I'm fine with it.
> 
> I suppose zones are the first thing one meets when reading the code, so
> they are probably the first target that comes to mind.
> That's what happened to me.
> 
> I will probably come back with further questions, but I can then start
> putting together some prototype of what you described.
> 
> I am fine with providing a generic solution, but I must make sure that
> it works with slub. I suppose what you proposed will do it, right?

I haven't researched that too deeply. In principle both SLAB and SLUB
maintain slab pages in a similar way so I do not see any fundamental
problems.

-- 
Michal Hocko
SUSE Labs


* Re: RFC v2: post-init-read-only protection for data allocated dynamically
  2017-05-04 13:11     ` Michal Hocko
@ 2017-05-04 13:37       ` Igor Stoppa
  2017-05-04 14:01         ` Michal Hocko
  0 siblings, 1 reply; 23+ messages in thread
From: Igor Stoppa @ 2017-05-04 13:37 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm, linux-kernel, Dave Hansen

On 04/05/17 16:11, Michal Hocko wrote:
> On Thu 04-05-17 15:14:10, Igor Stoppa wrote:

> I believe that this is a fundamental question. Sealing sounds useful
> for after-boot use cases as well, and that would change the approach
> considerably. Coming up with an ad-hoc solution for the boot-time-only
> case seems like the wrong way to me. And as you've said, SELinux, which
> is your target, already does this after early boot.

I didn't spend too much thought on this so far, because the zone-based
approach seemed almost doomed, so I wanted to wait for the evolution of
the discussion :-)

The main question here is granularity, I think.

At least, as a first cut, the simpler approach would be to have a master
toggle: when some legitimate operation needs to happen, the seal is
lifted across the entire range, then it is put back in place once the
operation has concluded. In code it would look like the sketch below.
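
Roughly (names hypothetical):

ro_pool_unseal();		/* the whole range becomes writable again */
update_protected_data();	/* the legitimate modification */
ro_pool_reseal();		/* protection restored */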

Simplicity is the main advantage.

The disadvantage is that anything can happen, undetected, while the seal
is lifted.
OTOH the amount of code that could backfire should be fairly limited, so
it doesn't seem a huge issue to me.

The alternative would be to somehow know what a write will change and
make only the appropriate page(s) writable. But it seems overkill to me,
especially because in some cases, with huge pages, everything would fit
in one page anyway.

One more option that comes to mind - but I do not know how realistic it
would be - is to have multiple slabs, to be used for different purposes.
Ex: one for the monolithic kernel and one for modules.
It wouldn't help for livepatch, though, as it can modify both, so both
would have to be unprotected.

But live-patching is potentially a far less frequent event than module
loading/unloading (thinking about USB gadgets, for example).

[...]

> Slab pages are not migratable currently. Even if they start being
> migratable it would be an opt-in, because that requires pointer tracking
> to make sure they are updated properly.

ok

[...]

> I haven't researched that too deeply. In principle both SLAB and SLUB
> maintain slab pages in a similar way so I do not see any fundamental
> problems.


good, then I could proceed with the prototype, if there are no further
objections/questions and we agree that, implementation aside, there are
no obvious fundamental problems preventing the merge


---
thanks, igor


* Re: RFC v2: post-init-read-only protection for data allocated dynamically
  2017-05-04 13:37       ` Igor Stoppa
@ 2017-05-04 14:01         ` Michal Hocko
  2017-05-04 17:24           ` Dave Hansen
  2017-05-05 12:19           ` Igor Stoppa
  0 siblings, 2 replies; 23+ messages in thread
From: Michal Hocko @ 2017-05-04 14:01 UTC (permalink / raw)
  To: Igor Stoppa; +Cc: linux-mm, linux-kernel, Dave Hansen

On Thu 04-05-17 16:37:55, Igor Stoppa wrote:
> On 04/05/17 16:11, Michal Hocko wrote:
> > On Thu 04-05-17 15:14:10, Igor Stoppa wrote:
> 
> > I believe that this is a fundamental question. Sealing sounds useful
> > for after-boot usecases as well and it would change the approach
> > considerably. Coming up with an ad-hoc solution for the boot only way
> > seems like a wrong way to me. And as you've said SELinux which is your
> > target already does the thing after the early boot.
> 
> I didn't spend too much thought on this so far, because the zone-based
> approach seemed almost doomed, so I wanted to wait for the evolution of
> the discussion :-)
> 
> The main question here is granularity, I think.
> 
> At least, as a first cut, the simpler approach would be to have a master
> toggle: when some legitimate operation needs to happen, the seal is
> lifted across the entire range, then it is put back in place once the
> operation has concluded.
> 
> Simplicity is the main advantage.
> 
> The disadvantage is that anything can happen, undetected, while the seal
> is lifted.

Yes and I think this makes it basically pointless

> OTOH the amount of code that could backfire should be fairly limited, so
> it doesn't seem a huge issue to me.
> 
> The alternative would be to somehow know what a write will change and
> make only the appropriate page(s) writable. But it seems overkill to me,
> especially because in some cases, with huge pages, everything would fit
> in one page anyway.
> 
> One more option that comes to mind - but I do not know how realistic it
> would be - is to have multiple slabs, to be used for different purposes.
> Ex: one for the monolithic kernel and one for modules.
> It wouldn't help for livepatch, though, as it can modify both, so both
> would have to be unprotected.

Just to make my proposal more clear. I suggest the following workflow

cache = kmem_cache_create(foo, object_size, ..., SLAB_SEAL);

obj = kmem_cache_alloc(cache, gfp_mask);
init_obj(obj)
[more allocations]
kmem_cache_seal(cache);

All slab pages belonging to the cache would get write protection. All
new allocations from this cache would go to new slab pages. Later
kmem_cache_seal will write protect only those new pages.
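
Internally, sealing could boil down to walking the slab pages the cache
has used so far and write-protecting them, along these lines (pure
sketch: the used_slabs list and the SLAB_SEALED flag are hypothetical,
and the real per-node/per-cpu bookkeeping is more involved):

void kmem_cache_seal(struct kmem_cache *s)
{
        struct page *page;

        list_for_each_entry(page, &s->used_slabs, lru)
                set_memory_ro((unsigned long)page_address(page),
                              1 << compound_order(page));
        s->flags |= SLAB_SEALED;	/* new allocations must use fresh pages */
}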

The main discomfort with this approach is that you have to create those
caches in advance, obviously. We could help by creating some general
purpose caches for common sizes, but this sounds like overkill to me.
The caller will know which objects will need the protection, so the
appropriate cache can be created on demand. But this really depends on
potential users...
-- 
Michal Hocko
SUSE Labs


* Re: RFC v2: post-init-read-only protection for data allocated dynamically
  2017-05-04  8:17   ` Igor Stoppa
@ 2017-05-04 14:30     ` Dave Hansen
  2017-05-05  8:53       ` Igor Stoppa
  0 siblings, 1 reply; 23+ messages in thread
From: Dave Hansen @ 2017-05-04 14:30 UTC (permalink / raw)
  To: Igor Stoppa, Michal Hocko; +Cc: linux-mm, linux-kernel

On 05/04/2017 01:17 AM, Igor Stoppa wrote:
> Or, let me put it differently: my goal is to not fracture more pages
> than needed.
> It will probably require some profiling to figure out what is the
> ballpark of the memory footprint.

This is easy to say, but hard to do.  What if someone loads a different
set of LSMs, or uses a very different configuration?  How could this
possibly work generally without vastly over-reserving in most cases?

> I might have overlooked some aspect of this, but the overall goal
> is to have a memory range (I won't call it zone, to avoid referring to a
> specific implementation) which is as tightly packed as possible, stuffed
> with all the data that is expected to become read-only.

I'm starting with the assumption that a new zone isn't feasible. :)


* Re: RFC v2: post-init-read-only protection for data allocated dynamically
  2017-05-03 12:06 RFC v2: post-init-read-only protection for data allocated dynamically Igor Stoppa
       [not found] ` <70a9d4db-f374-de45-413b-65b74c59edcb@intel.com>
  2017-05-04 11:21 ` Michal Hocko
@ 2017-05-04 16:49 ` Laura Abbott
  2017-05-05 10:42   ` Igor Stoppa
  2 siblings, 1 reply; 23+ messages in thread
From: Laura Abbott @ 2017-05-04 16:49 UTC (permalink / raw)
  To: Igor Stoppa, Michal Hocko; +Cc: linux-mm, linux-kernel, kernel-hardening

[adding kernel-hardening since I think there would be interest]

On 05/03/2017 05:06 AM, Igor Stoppa wrote:
[...]

> * The mechanism used for locking down the memory region is to program
> the MMU to trap writes to said region. It is fairly efficient and
> HW-backed, so it doesn't introduce any major overhead, but the MMU deals
> only with pages or supersets of pages, hence the need to collect all the
> soon-to-be-read-only data - and only that - into the "special region".
> The "__read_only" modifier is the admission ticket.

[...]

BPF takes the approach of calling set_memory_ro to mark regions as
read only. I'm certainly oversimplifying, but it sounds like this is
mainly a mechanism to have this happen mostly automatically.
Can you provide any more details about the tradeoffs of the two approaches?
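
For reference, the pattern is roughly the following (set_memory_ro()
is the real interface; the function wrapping it is illustrative):

static void *alloc_ro_region(size_t len)
{
        void *buf = vmalloc(PAGE_ALIGN(len));	/* page-aligned, page-granular */

        if (!buf)
                return NULL;
        /* ... fill in the payload during init ... */
        set_memory_ro((unsigned long)buf, PAGE_ALIGN(len) >> PAGE_SHIFT);
        return buf;
}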

arm and arm64 have the added complexity of using larger
page sizes on the linear map so dynamic mapping/unmapping generally
doesn't work. arm64 supports DEBUG_PAGEALLOC by mapping with only
pages but this is generally only wanted as a debug mechanism.
I don't know if you've given this any thought at all.

Thanks,
Laura


* Re: RFC v2: post-init-read-only protection for data allocated dynamically
  2017-05-04 14:01         ` Michal Hocko
@ 2017-05-04 17:24           ` Dave Hansen
  2017-05-05 12:08             ` Igor Stoppa
  2017-05-05 12:19           ` Igor Stoppa
  1 sibling, 1 reply; 23+ messages in thread
From: Dave Hansen @ 2017-05-04 17:24 UTC (permalink / raw)
  To: Michal Hocko, Igor Stoppa; +Cc: linux-mm, linux-kernel

On 05/04/2017 07:01 AM, Michal Hocko wrote:
> Just to make my proposal more clear. I suggest the following workflow
> 
> cache = kmem_cache_create(foo, object_size, ..., SLAB_SEAL);
> 
> obj = kmem_cache_alloc(cache, gfp_mask);
> init_obj(obj)
> [more allocations]
> kmem_cache_seal(cache);
> 
> All slab pages belonging to the cache would get write protection. All
> new allocations from this cache would go to new slab pages. Later
> kmem_cache_seal will write protect only those new pages.

Igor, what sizes of objects are you after here, mostly?

I ask because slub, at least, doesn't work at all for objects
>PAGE_SIZE.  It just punts those to the page allocator.  But, you
_could_ still use vmalloc() for those.


* Re: RFC v2: post-init-read-only protection for data allocated dynamically
  2017-05-04 14:30     ` Dave Hansen
@ 2017-05-05  8:53       ` Igor Stoppa
  0 siblings, 0 replies; 23+ messages in thread
From: Igor Stoppa @ 2017-05-05  8:53 UTC (permalink / raw)
  To: Dave Hansen, Michal Hocko; +Cc: linux-mm, linux-kernel

On 04/05/17 17:30, Dave Hansen wrote:
> On 05/04/2017 01:17 AM, Igor Stoppa wrote:
>> Or, let me put it differently: my goal is to not fracture more pages
>> than needed.
>> It will probably require some profiling to figure out what is the
>> ballpark of the memory footprint.
> 
> This is easy to say, but hard to do.  What if someone loads a different
> set of LSMs, or uses a very different configuration?  How could this
> possibly work generally without vastly over-reserving in most cases?

I am probably making some implicit assumptions.
Let me try to make them explicit and see if they survive public
scrutiny; btw, while writing them down, I see that they probably won't :-S

Observations
------------

* The memory that might need sealing is less than or equal to the total
  r/w memory - whatever that might be.

* In practice only a subset of the r/w memory will qualify for sealing.

* The over-reserving might be abysmal, in terms of percentage of
  actually used memory, but it might not affect too much the system, in
  absolute terms.

* On my machine (Ubuntu 16.10 64bit):

  ~$ dmesg |grep Memory
  [    0.000000] Memory: 32662956K/33474640K available (8848K kernel
  code, 1441K rwdata, 3828K rodata, 1552K init, 1296K bss, 811684K
  reserved, 0K cma-reserved)

* This is the memory at boot; I am not sure what would be the right way
to get the same info at runtime.


Speculations
------------

* after loading enough modules, the rwdata is 2-3 times larger

* the amount of rwdata that can be converted to rodata is 50%;
  this is purely a working assumption I am making, as I have no
  measurements yet, and it needs to be revised.

* on a system like mine, it would mean 2-3 MB


Conclusions
-----------

* 2-3 MB with possibly 50% utilization might be an acceptable
compromise for a distro - as a user I probably wouldn't mind too much.

* if someone is not happy with the distro defaults, every major distro
provides means to reconfigure and rebuild its kernel (the expectation is
that the only distro users who are not happy are those who would
probably reconfigure the kernel anyway, like a data center)

* non-distro users, like mobile, embedded, IoT, etc. would do
optimizations and tweaking even without this feature mandating it.

--

In my defense, I can only say that my idea for this feature was to make
it opt-in, where, if one chooses to enable it, it is known upfront
what it will entail.
Now we are talking about distros, with the feature enabled by default.

TBH I am not sure there even is a truly generic solution, because we are
talking about dynamically allocated data, where the amount is not known
upfront (if it was, probably the data would be static).


I have the impression that it's a situation like:
- efficient memory occupation
- no need for profiling
- non fragmented pages

Choose 2 of them.


Of course, there might be a better way, but I haven't found it yet,
other than the usual way out: make it a command line option and let the
unhappy user modify the command line that the bootloader passes to the
kernel.

[...]

> I'm starting with the assumption that a new zone isn't feasible. :)

I really have no bias: I have a problem and I am trying to solve it.
I think the improvement could be useful also for others.

If the problem can be solved in a better way than what I proposed, it is
still good for me.

---
igor


* Re: RFC v2: post-init-read-only protection for data allocated dynamically
  2017-05-04 16:49 ` Laura Abbott
@ 2017-05-05 10:42   ` Igor Stoppa
  2017-05-08 15:25     ` Laura Abbott
  2017-05-10  8:05     ` Michal Hocko
  0 siblings, 2 replies; 23+ messages in thread
From: Igor Stoppa @ 2017-05-05 10:42 UTC (permalink / raw)
  To: Laura Abbott, Michal Hocko; +Cc: linux-mm, linux-kernel, kernel-hardening

On 04/05/17 19:49, Laura Abbott wrote:
> [adding kernel-hardening since I think there would be interest]

thank you, I overlooked this


> BPF takes the approach of calling set_memory_ro to mark regions as
> read only. I'm certainly oversimplifying, but it sounds like this is
> mainly a mechanism to have this happen mostly automatically.
> Can you provide any more details about the tradeoffs of the two approaches?

I am not sure I understand the question ...
From what I can understand, BPF is marking as read only something
that spans various pages, which is fine.
The payload to be protected is already organized in such pages.

But in the case I have in mind, I have various, heterogeneous chunks of
data, coming from various subsystems, not necessarily page aligned.
And, even if they were page aligned, most likely they would be far
smaller than a page, even a 4k page.

The first problem I see is how to compact them into pages, ensuring
that no rwdata manages to infiltrate the range.

The actual mechanism for marking pages as read only is not relevant at
this point, if I understand your question correctly, since set_memory_ro
walks the pages it receives as a parameter.

> arm and arm64 have the added complexity of using larger
> page sizes on the linear map so dynamic mapping/unmapping generally
> doesn't work. 

Do you mean that a page could be 16MB and therefore it would not be
possible to get a smaller chunk?

> arm64 supports DEBUG_PAGEALLOC by mapping with only
> pages but this is generally only wanted as a debug mechanism.
> I don't know if you've given this any thought at all.

Since the beginning I have thought about this as an opt-in
feature. I am aware that it can have drawbacks, but I think it would be
valuable as a debugging tool even where it's not feasible to keep it
always-on.

OTOH on certain systems it can be sufficiently appealing to be kept on,
even if it eats up some more memory.

If this doesn't answer your question, could you please detail it more?

---
thanks, igor


* Re: RFC v2: post-init-read-only protection for data allocated dynamically
  2017-05-04 17:24           ` Dave Hansen
@ 2017-05-05 12:08             ` Igor Stoppa
  0 siblings, 0 replies; 23+ messages in thread
From: Igor Stoppa @ 2017-05-05 12:08 UTC (permalink / raw)
  To: Dave Hansen, Michal Hocko; +Cc: linux-mm, linux-kernel

On 04/05/17 20:24, Dave Hansen wrote:
> On 05/04/2017 07:01 AM, Michal Hocko wrote:
>> Just to make my proposal more clear. I suggest the following workflow
>>
>> cache = kmem_cache_create(foo, object_size, ..., SLAB_SEAL);
>>
>> obj = kmem_cache_alloc(cache, gfp_mask);
>> init_obj(obj)
>> [more allocations]
>> kmem_cache_seal(cache);
>>
>> All slab pages belonging to the cache would get write protection. All
>> new allocations from this cache would go to new slab pages. Later
>> kmem_cache_seal will write protect only those new pages.
> 
> Igor, what sizes of objects are you after here, mostly?

Theoretically, anything, since I have not really looked in detail into
all the various subsystems. However, taking a more pragmatic approach
and referring to SE Linux and the LSM hooks, which were my initial targets:

For SE Linux, I'm taking as an example the policy db [1]:
The sizes are mostly small-ish: from 4-6 bytes to 16-32, overall.
There are some exceptions: the main policydb structure is way larger,
but it's not supposed to be instantiated repeatedly.


For the LSM hooks, the sublists in that hydra which goes under the name
of struct security_hook_heads are of type struct security_hook_list, so
a handful of bytes for the generic element [2] (see below).
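
For reference, the element type from [2] is currently just:

struct security_hook_list {
        struct list_head                list;
        struct list_head                *head;
        union security_list_options     hook;
};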


> I ask because slub, at least, doesn't work at all for objects
> >PAGE_SIZE.  It just punts those to the page allocator.  But, you
> _could_ still use vmalloc() for those.


I would be surprised to find many objects that are larger than PAGE_SIZE
and qualify for post-init-read-only protection, even if the page size
was only 4kB.

From that perspective, I'm more concerned about avoiding taking a lot of
pages and leaving them mostly unused.

[1] security/selinux/ss/policydb.h
[2] include/linux/lsm_hooks.h


* Re: RFC v2: post-init-read-only protection for data allocated dynamically
  2017-05-04 14:01         ` Michal Hocko
  2017-05-04 17:24           ` Dave Hansen
@ 2017-05-05 12:19           ` Igor Stoppa
  2017-05-10  7:45             ` Michal Hocko
  1 sibling, 1 reply; 23+ messages in thread
From: Igor Stoppa @ 2017-05-05 12:19 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm, linux-kernel, Dave Hansen



On 04/05/17 17:01, Michal Hocko wrote:
> On Thu 04-05-17 16:37:55, Igor Stoppa wrote:

[...]

>> The disadvantage is that anything can happen, undetected, while the seal
>> is lifted.
> 
> Yes and I think this makes it basically pointless

ok, this goes a bit beyond what I had in mind initially, but I see your
point

[...]

> Just to make my proposal more clear. I suggest the following workflow
> 
> cache = kmem_cache_create(foo, object_size, ..., SLAB_SEAL);
>
> obj = kmem_cache_alloc(cache, gfp_mask);
> init_obj(obj)
> [more allocations]
> kmem_cache_seal(cache);

In case one doesn't want the feature, at which point would it be disabled?

* not creating the slab
* not sealing it
* something else?

> All slab pages belonging to the cache would get write protection. All
> new allocations from this cache would go to new slab pages. Later
> kmem_cache_seal will write protect only those new pages.

ok

> The main discomfort with this approach is that you have to create those
> caches in advance, obviously. We could help by creating some general
> purpose caches for common sizes, but this sounds like overkill to me.
> The caller will know which objects will need the protection, so the
> appropriate cache can be created on demand. But this really depends on
> potential users...

Yes, I provided a more detailed answer in another branch of this thread.
Right now I can answer only for what I have already looked into: SE
Linux policy DB and LSM Hooks, and they do not seem very large.

I do not expect a large footprint overall, although there might be some
exceptions.


--
igor


* Re: RFC v2: post-init-read-only protection for data allocated dynamically
  2017-05-05 10:42   ` Igor Stoppa
@ 2017-05-08 15:25     ` Laura Abbott
  2017-05-09  9:38       ` Igor Stoppa
  2017-05-10  8:05     ` Michal Hocko
  1 sibling, 1 reply; 23+ messages in thread
From: Laura Abbott @ 2017-05-08 15:25 UTC (permalink / raw)
  To: Igor Stoppa, Michal Hocko; +Cc: linux-mm, linux-kernel, kernel-hardening

On 05/05/2017 03:42 AM, Igor Stoppa wrote:
[...]

> The actual mechanism for marking pages as read only is not relevant at
> this point, if I understand your question correctly, since set_memory_ro
> walks the pages it receives as a parameter.
> 

Thanks for clarifying, this makes sense. I also saw some replies
up-thread that answered some of my questions.

>> arm and arm64 have the added complexity of using larger
>> page sizes on the linear map so dynamic mapping/unmapping generally
>> doesn't work. 
> 
> Do you mean that a page could be 16MB and therefore it would not be
> possible to get a smaller chunk?
> 

Roughly yes.

PAGE_SIZE is still 4K/16K/64K but the underlying page table mappings
may use larger mappings (2MB, 32M, 512M, etc.). The ARM architecture
has a break-before-make requirement which requires old mappings be
fully torn down and invalidated to avoid TLB conflicts. This is nearly
impossible to do correctly on live page tables so the current policy
is to not break down larger mappings.

>> arm64 supports DEBUG_PAGEALLOC by mapping with only
>> pages but this is generally only wanted as a debug mechanism.
>> I don't know if you've given this any thought at all.
> 
> Since the beginning I have thought about this as an opt-in
> feature. I am aware that it can have drawbacks, but I think it would be
> valuable as a debugging tool even where it's not feasible to keep it
> always-on.
> 
> OTOH on certain systems it can be sufficiently appealing to be kept on,
> even if it eats up some more memory.

I'd rather see this designed as being mandatory from the start and then
provide a mechanism to turn it off if necessary. The uptake and
coverage from opt-in features tends to be very low based on past experience.

Thanks,
Laura


* Re: RFC v2: post-init-read-only protection for data allocated dynamically
  2017-05-08 15:25     ` Laura Abbott
@ 2017-05-09  9:38       ` Igor Stoppa
  0 siblings, 0 replies; 23+ messages in thread
From: Igor Stoppa @ 2017-05-09  9:38 UTC (permalink / raw)
  To: Laura Abbott, Michal Hocko; +Cc: linux-mm, linux-kernel, kernel-hardening

On 08/05/17 18:25, Laura Abbott wrote:
> On 05/05/2017 03:42 AM, Igor Stoppa wrote:
>> On 04/05/17 19:49, Laura Abbott wrote:

[...]

> PAGE_SIZE is still 4K/16K/64K but the underlying page table mappings
> may use larger mappings (2MB, 32M, 512M, etc.). The ARM architecture
> has a break-before-make requirement which requires old mappings be
> fully torn down and invalidated to avoid TLB conflicts. This is nearly
> impossible to do correctly on live page tables so the current policy
> is to not break down larger mappings.

ok, but if a system integrator chooses to have the mapping set to 512M
(let's consider a case that is definitely unoptimized), this granularity
will apply to anything that needs to be marked as R/O and is allocated
through the linear mapping.

If the issue inherently sits with linear allocation, then maybe the
correct approach is to confirm whether linear allocation is really
needed in the first place.
Maybe not, at least for the type of data I have in mind.

However, that would require changes in the users of the interface,
rather than the interface itself.

I don't see this change as a problem, in general, but OTOH I do not know
yet if there are legitimate reasons to use kmalloc for any
post-init-read-only data.

Of course, if you have any proposal that would solve this problem with
large pages, I'd be interested to know.

Also, for me to better understand the magnitude of the problem, do you
know if there is any specific scenario where larger mappings would be
used/recommended?

The major reason for using larger mappings that I can think of is to
improve the efficiency of the TLB when under pressure; however, I
wouldn't be able to judge how much this can affect the overall
performance on big iron or in a small device.

Do you know if there is any similar case that has to deal with locking
down large pages?
Doesn't the kernel text potentially have a similar risk of leaving a
large amount of memory unused?
Is rodata only virtual?

> I'd rather see this designed as being mandatory from the start and then
> provide a mechanism to turn it off if necessary. The uptake and
> coverage from opt-in features tends to be very low based on past experience.

I have nothing against such a wish, actually I'd love it, but I'm not
sure I have the answer for it.

Yet, a partial solution is better than nothing, if there is no truly
flexible one.

--
igor


* Re: RFC v2: post-init-read-only protection for data allocated dynamically
  2017-05-05 12:19           ` Igor Stoppa
@ 2017-05-10  7:45             ` Michal Hocko
  0 siblings, 0 replies; 23+ messages in thread
From: Michal Hocko @ 2017-05-10  7:45 UTC (permalink / raw)
  To: Igor Stoppa; +Cc: linux-mm, linux-kernel, Dave Hansen

On Fri 05-05-17 15:19:19, Igor Stoppa wrote:
> 
> 
> On 04/05/17 17:01, Michal Hocko wrote:
> > On Thu 04-05-17 16:37:55, Igor Stoppa wrote:
> 
> [...]
> 
> >> The disadvantage is that anything can happen, undetected, while the seal
> >> is lifted.
> > 
> > Yes and I think this makes it basically pointless
> 
> ok, this goes a bit beyond what I had in mind initially, but I see your
> point
> 
> [...]
> 
> > Just to make my proposal more clear. I suggest the following workflow
> > 
> > cache = kmem_cache_create(foo, object_size, ..., SLAB_SEAL);
> >
> > obj = kmem_cache_alloc(cache, gfp_mask);
> > init_obj(obj)
> > [more allocations]
> > kmem_cache_seal(cache);
> 
> In case one doesn't want the feature, at which point would it be disabled?
> 
> * not creating the slab
> * not sealing it
> * something else?

If the sealing feature were disabled, then sealing would be a no-op.

-- 
Michal Hocko
SUSE Labs


* Re: RFC v2: post-init-read-only protection for data allocated dynamically
  2017-05-05 10:42   ` Igor Stoppa
  2017-05-08 15:25     ` Laura Abbott
@ 2017-05-10  8:05     ` Michal Hocko
  2017-05-10  8:57       ` Igor Stoppa
  1 sibling, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2017-05-10  8:05 UTC (permalink / raw)
  To: Igor Stoppa; +Cc: Laura Abbott, linux-mm, linux-kernel, kernel-hardening

On Fri 05-05-17 13:42:27, Igor Stoppa wrote:
> On 04/05/17 19:49, Laura Abbott wrote:
> > [adding kernel-hardening since I think there would be interest]
> 
> thank you, I overlooked this
> 
> 
> > BPF takes the approach of calling set_memory_ro to mark regions as
> > read only. I'm certainly oversimplifying, but it sounds like this
> > is mostly a mechanism to have this happen automatically.
> > Can you provide any more details about the tradeoffs of the two approaches?
> 
> I am not sure I understand the question ...
> From what I can understand, BPF is marking as read-only something
> that spans various pages, which is fine.
> The payload to be protected is already organized into such pages.
> 
> But in the case I have in mind, I have heterogeneous chunks of
> data, coming from various subsystems, not necessarily page-aligned.
> And, even if they were page-aligned, most likely they would be far
> smaller than a page, even a 4k page.

This aspect of various sizes makes the SLAB allocator not optimal
because it operates on caches (pools of pages) which manage objects of
the same size. You could use the maximum size of all objects and waste
some memory, but you would have to know this max in advance, which would
make this approach less practical. You could create more caches, of
course, but that still requires knowing those sizes in advance.

So it smells like a dedicated allocator which operates on a pool of
pages might be a better option in the end. This depends on what you
expect from the allocator. NUMA awareness? Very effective hotpath? Very
good fragmentation avoidance? CPU cache awareness? Special alignment
requirements? Reasonable free()? Etc...

To me it seems that, this being mostly an initialization thingy, a simple
allocator which manages a pool of pages (one set sealed and one for
allocations) and which only appends new objects as they fit to unsealed
pages would be sufficient for a start.
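
To make "simple" concrete, a minimal sketch of such an append-only
allocator (all names invented and untested; sealing assumed to be done
via set_memory_ro(), with oversized requests and freeing left out):

#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/slab.h>		/* ARCH_KMALLOC_MINALIGN */
#include <asm/set_memory.h>	/* set_memory_ro(); header varies by arch/version */

struct seal_pool {
	struct page *open;	/* page currently accepting allocations */
	size_t offset;		/* first free byte within the open page */
};

static void *seal_alloc(struct seal_pool *pool, size_t size)
{
	size_t off = ALIGN(pool->offset, ARCH_KMALLOC_MINALIGN);

	if (!pool->open || off + size > PAGE_SIZE) {
		/* the open page is full: seal it, then start a new one */
		if (pool->open)
			set_memory_ro((unsigned long)page_address(pool->open), 1);
		pool->open = alloc_page(GFP_KERNEL);
		if (!pool->open)
			return NULL;
		off = 0;
	}
	pool->offset = off + size;
	return page_address(pool->open) + off;
}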
-- 
Michal Hocko
SUSE Labs


* Re: RFC v2: post-init-read-only protection for data allocated dynamically
  2017-05-10  8:05     ` Michal Hocko
@ 2017-05-10  8:57       ` Igor Stoppa
  2017-05-10 11:43         ` Michal Hocko
  0 siblings, 1 reply; 23+ messages in thread
From: Igor Stoppa @ 2017-05-10  8:57 UTC (permalink / raw)
  To: Michal Hocko, Dave Hansen
  Cc: Laura Abbott, linux-mm, linux-kernel, kernel-hardening

On 10/05/17 11:05, Michal Hocko wrote:
> On Fri 05-05-17 13:42:27, Igor Stoppa wrote:

[...]

>> ... in the case I have in mind, I have heterogeneous chunks of
>> data, coming from various subsystems, not necessarily page-aligned.
>> And, even if they were page-aligned, most likely they would be far
>> smaller than a page, even a 4k page.
> 
> This aspect of various sizes makes the SLAB allocator not optimal
> because it operates on caches (pools of pages) which manage objects of
> the same size.

Indeed, that's why I wasn't too excited about caches, but probably I was
not able to explain sufficiently well why in the RFC :-/

> You could use the maximum size of all objects and waste
> some memory, but you would have to know this max in advance, which would
> make this approach less practical. You could create more caches, of
> course, but that still requires knowing those sizes in advance.

Yes, and even generic per-architecture or per-board profiling
wouldn't necessarily do much good: taking SE Linux as an example, one
could have two identical boards with almost identical binaries
installed, differing only in the rules/policy.
That difference alone could easily lead to very different size
requirements for the sealable page pool.

> So it smells like a dedicated allocator which operates on a pool of
> pages might be a better option in the end.

ok

> This depends on what you
> expect from the allocator. NUMA awareness? Very effective hotpath? Very
> good fragmentation avoidance? CPU cache awareness? Special alignment
> requirements? Reasonable free()? Etc...

From the perspective of selling the feature to as many subsystems as
possible, I'd say that, as a primary requirement, it shouldn't affect
runtime performance.
I suppose (but it's just my guess) that trading a milliseconds-scale
boot-time slowdown for additional integrity is acceptable in the vast
majority of cases.

The only alignment requirements that I can think of come from the MMU
being able to deal only with physical pages, when it comes to
write-protecting them.

> To me it seems that, this being mostly an initialization thingy, a simple
> allocator which manages a pool of pages (one set sealed and one for
> allocations) 

Shouldn't the set of pages used for keeping track of the others also
be sealed? Once one is RO, the other should not change either.

> and which only appends new objects as they fit to unsealed
> pages would be sufficient for a start.

Any "free" that might happen during the initialization transient would
actually result in an untracked gap, right?

What about the size of the pool of pages?
No predefined size; instead, request a new page when the memory
remaining in the page currently in use is not enough to fit the latest
allocation request?

There are also two aspects we discussed earlier:

- livepatch: how to deal with it? Identify the page it wants to modify
and temporarily un-protect it?

- modules: unloading and reloading modules will eventually lead to
permanently lost pages, in increasing number.
Repeatedly loading/unloading the same module is probably not so common,
with a major exception being USB, where almost anything can show up.
And disappear.
This seems like a major showstopper for the linear allocator you propose.

My reasoning in pursuing the kmalloc approach was that it is already
equipped with mechanisms for dealing with this sort of case, where
memory can be fragmented.
I also wouldn't risk introducing bugs with my homebrew allocator ...

The initial thought was that there could be a master toggle to
seal/unseal all the memory affected.

But you were not too excited, iirc :-D
Alternatively, kmalloc could be enhanced to unseal only the pages it
wants to modify.

I don't think much can be done for data that is placed in the same
page as something that needs to be altered.
But what is outside of that page could still enjoy the protection of
the seal.
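
As an illustration of unsealing only the page to be modified, a
minimal sketch (the helper name is invented; it assumes the object
does not cross a page boundary, and nothing prevents a concurrent
write while the window is open):

static void sealed_page_write(void *dst, const void *src, size_t len)
{
	unsigned long page = (unsigned long)dst & PAGE_MASK;

	set_memory_rw(page, 1);		/* open the write window */
	memcpy(dst, src, len);
	set_memory_ro(page, 1);		/* restore the seal */
}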

--
igor


* Re: RFC v2: post-init-read-only protection for data allocated dynamically
  2017-05-10  8:57       ` Igor Stoppa
@ 2017-05-10 11:43         ` Michal Hocko
  2017-05-10 15:19           ` Igor Stoppa
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2017-05-10 11:43 UTC (permalink / raw)
  To: Igor Stoppa
  Cc: Dave Hansen, Laura Abbott, linux-mm, linux-kernel, kernel-hardening

On Wed 10-05-17 11:57:42, Igor Stoppa wrote:
> On 10/05/17 11:05, Michal Hocko wrote:
[...]
> > To me it seems that, this being mostly an initialization thingy, a simple
> > allocator which manages a pool of pages (one set sealed and one for
> > allocations) 
> 
> Shouldn't the set of pages used for keeping track of the others also
> be sealed? Once one is RO, the other should not change either.

Heh, that really depends on how much consistency and robustness you
want to achieve. It is really hard to defend against targeted attacks
on the allocator metadata when code is running in the kernel.

> > and which only appends new objects as they fit to unsealed
> > pages would be sufficient for a start.
> 
> Any "free" that might happen during the initialization transient would
> actually result in an untracked gap, right?

yes. And once the whole page is free it would get unsealed and returned
to the (page) allocator. This approach would inevitably lead to internal
fragmentation, but reducing that would require a pool shared by objects
with a common life cycle, which is quite hard with the requirements you
have (you would have to convey the allocation context to all users
somehow).

> What about the size of the pool of pages?

I wouldn't see that as a big deal. New pages would be allocated as
needed.

> No predefined size; instead, request a new page when the memory
> remaining in the page currently in use is not enough to fit the latest
> allocation request?

exactly

> There are also two aspects we discussed earlier:
> 
> - livepatch: how to deal with it? Identify the page it wants to modify
> and temporarily un-protect it?

Livepatch doesn't currently support patching data structures, and even
if it did, it would have to understand those data structures and do
something like copy&replace...
 
> - modules: unloading and reloading modules will eventually lead to
> permanently lost pages, in increasing number.

Each module should free all objects that were allocated on its behalf,
and that should result in pages being freed as well.

> Repeatedly loading/unloading the same module is probably not so common,
> with a major exception being USB, where almost anything can show up.
> And disappear.
> This seems like a major showstopper for the linear allocator you propose.

I am not sure I understand. If such a module left allocations behind,
it would be a memory leak no matter what.

> My reasoning in pursuing the kmalloc approach was that it is already
> equipped with mechanisms for dealing with these sort of cases, where
> memory can be fragmented.

Yeah, but kmalloc is optimized for a completely different use case. You
can reuse the same pages again and again, while you clearly cannot do
the same once you seal a page and make it read-only. Well, unless you
want to open time windows during which the page stops being RO, or use
a different mapping for the allocator.

But try to consider how many features of the slab allocator you are
actually going to need wrt. the tweaks it would have to implement to
support this new use case. Maybe duplicating the general purpose caches
and explicitly creating specialized ones is a viable path. I haven't
tried it.

> I also wouldn't risk introducing bugs with my homebrew allocator ...
> 
> The initial thought was that there could be a master toggle to
> seal/unseal all the memory affected.
> 
> But you were not too excited, iirc :-D

Yes, if there are different users, a pool (kmem_cache-like) would be
more natural.

> Alternatively, kmalloc could be enhanced to unseal only the pages it
> wants to modify.

You would have to stop the world to prevent an accidental overwrite
during that time, which makes the whole thing quite dubious IMHO.

> I don't think much can be done for data that is placed in the same
> page as something that needs to be altered.
> But what is outside of that page could still enjoy the protection of
> the seal.

-- 
Michal Hocko
SUSE Labs


* Re: RFC v2: post-init-read-only protection for data allocated dynamically
  2017-05-10 11:43         ` Michal Hocko
@ 2017-05-10 15:19           ` Igor Stoppa
  2017-05-10 15:45             ` Dave Hansen
  0 siblings, 1 reply; 23+ messages in thread
From: Igor Stoppa @ 2017-05-10 15:19 UTC (permalink / raw)
  To: Michal Hocko, Dave Hansen, Laura Abbott
  Cc: linux-mm, linux-kernel, kernel-hardening

On 10/05/17 14:43, Michal Hocko wrote:
> On Wed 10-05-17 11:57:42, Igor Stoppa wrote:
>> On 10/05/17 11:05, Michal Hocko wrote:
> [...]
>>> To me it seems that, this being mostly an initialization thingy, a simple
>>> allocator which manages a pool of pages (one set sealed and one for
>>> allocations) 
>>
>> Shouldn't the set of pages used for keeping track of the others also
>> be sealed? Once one is RO, the other should not change either.
> 
> Heh, that really depends on how much consistency and robustness you
> want to achieve. It is really hard to defend against targeted attacks
> on the allocator metadata when code is running in the kernel.

If one takes the trouble to implement the sealing, then anything that
doesn't have a justification for staying R/W is fair game for sealing,
IMHO.

>>> and which only appends new objects as they fit to unsealed
>>> pages would be sufficient for a start.
>>
>> Any "free" that might happen during the initialization transient would
>> actually result in an untracked gap, right?
> 
> yes. And once the whole page is free it would get unsealed and returned
> to the (page) allocator.

Which means that there must be some way to track the freeing.
I intentionally omitted it, because I wasn't sure it would still be
compatible with the idea of a simple linear allocator.

> This approach would inevitably lead to internal
> fragmentation, but reducing that would require a pool shared by
> objects with a common life cycle, which is quite hard with the
> requirements you have (you would have to convey the allocation
> context to all users somehow).

What if the users were unaware of most of the context and would only use
some flag, say GFP_SEAL?
Shouldn't the allocator be the only one aware of the context?
Context being the actual set of pages used.

Another idea: for each logical group of objects having the same
lifecycle, define a pool, then do linear allocation within the pool for
the respective logical group.

Still some way would be needed to track the utilization of each page,
but it would ensure that when a logical group is discarded, all its
related pages are freed.

>> What about the size of the pool of pages?
> 
> I wouldn't see that as a big deal. New pages would be allocated as
> needed.

ok

[...]

>> - modules: unloading and reloading modules will eventually lead to
>> permanently lost pages, in increasing number.
> 
> Each module should free all objects that were allocated on its behalf,
> and that should result in pages being freed as well.

Only if the objects are forced to be contiguous and to start at the
beginning of a page, which seems to go in the direction of having a
memory pool for each module.

>> Repeatedly loading/unloading the same module is probably not so common,
>> with a major exception being USB, where almost anything can show up.
>> And disappear.
>> This seems like a major showstopper for the linear allocator you propose.
> 
> I am not sure I understand. If such a module left allocations behind,
> it would be a memory leak no matter what.

What I had in mind is that, with a global linear allocator _without_
support for returning "freed" pages, memory consumption would
progressively increase.

But even if the module correctly frees its allocations and they are
correctly tracked, it's still possible that some page doesn't get
returned, unless the module had started placing data at the beginning
of a brand new page and nothing but that module used it.

So it really looks like we are discussing a per-module (linear) allocator.

Probably that's what you meant all along and I just realized it now ...

>> My reasoning in pursuing the kmalloc approach was that it is already
>> equipped with mechanisms for dealing with this sort of case, where
>> memory can be fragmented.
> 
> Yeah, but kmalloc is optimized for a completely different use case. You
> can reuse the same pages again and again, while you clearly cannot do
> the same once you seal a page and make it read-only.

No, but during the allocation transient, I could.

Cons: less protection for what is already in the page.
Pros: tighter packing.

> Well, unless you want to
> open time windows during which the page stops being RO, or use a
> different mapping for the allocator.

Yes, I was proposing to temporarily make the specific page RW.

> But try to consider how many features of the slab allocator you are
> actually going to need wrt. the tweaks it would have to implement to
> support this new use case. Maybe duplicating the general purpose
> caches and explicitly creating specialized ones is a viable path. I
> haven't tried it.
> 
>> I also wouldn't risk introducing bugs with my homebrew allocator ...
>>
>> The initial thought was that there could be a master toggle to
>> seal/unseal all the memory affected.
>>
>> But you were not too excited, iirc :-D
> 
> Yes, if there are different users, a pool (kmem_cache-like) would be
> more natural.
> 
>> Alternatively, kmalloc could be enhanced to unseal only the pages it
>> wants to modify.
> 
> You would have to stop the world to prevent an accidental overwrite
> during that time, which makes the whole thing quite dubious IMHO.
> 
>> I don't think much can be done for data that is placed in the same
>> page as something that needs to be altered.
>> But what is outside of that page could still enjoy the protection of
>> the seal.

Recap
-----
The latest proposal would be (I can create a new version of the RFC if
preferred):

Have a per-module linear memory allocator, requesting new pages on
demand, with some way to track:

* pages used in each pool (ex: a next ptr in each page)
* free space at the end of the page, implemented, for example, with a
  counter; allocations would be aligned according to arch requirements

Pages are sealed as soon as they fill up and the next one is allocated,
or when they are explicitly sealed by their respective module.

In the typical case the freeing would happen on the entire pool, for
example when the module is unloaded.
It might be even nicer to have a master teardown call for the whole
pool; a sketch of that follows below.
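
A possible shape for the pool metadata and the master teardown, purely
as a sketch (all names invented; unsealing assumed to be done with
set_memory_rw()):

/* header placed at the start of each page belonging to a pool */
struct seal_node {
	struct seal_node *next;		/* chain of the pool's pages */
	size_t used;			/* offset of the first free byte */
};

struct seal_pool {
	struct seal_node *pages;	/* most recently opened page first */
};

/* master teardown: unseal and return every page of the pool */
static void seal_pool_destroy(struct seal_pool *pool)
{
	struct seal_node *node = pool->pages;

	while (node) {
		struct seal_node *next = node->next;

		set_memory_rw((unsigned long)node, 1);
		free_page((unsigned long)node);
		node = next;
	}
	pool->pages = NULL;
}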


There is still the problem of how to deal with large physical pages and
kmalloc.
Having a per-module pool of pages is likely to generate even more waste
if the pages are particularly large.

So I'd like to play a little what-if scenario:
what if I were to support exclusively virtual memory and convert to it
everything that might need sealing?

I cannot find any reason why this could not be done, even if the
original code uses kmalloc.

Extension
---------
What I discussed so far is about things that are not expected to change.
At most they would be freed, as units.

However, if some other data exhibits quasi-read-only characteristics,
it could be protected as well.

With the understanding that there would be holes in the memory
allocation and a linear allocator probably would not be enough anymore.

This could be achieved by keeping a bitmap of machine-aligned words.

Ex: a 4k page with 8-byte words holds 512 words, so the bitmap would
need 512 bits, i.e. 64 bytes (a sketch follows below).
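
A minimal sketch of that tracking, using the kernel bitmap helpers
(the names are invented):

#include <linux/bitmap.h>

/* one bit per machine word: on a 4k page with 8-byte words,
 * 512 bits = 64 bytes of metadata per page
 */
#define WORDS_PER_PAGE	(PAGE_SIZE / sizeof(long))

struct seal_word_map {
	DECLARE_BITMAP(used, WORDS_PER_PAGE);
};

/* a page with no word in use could be unsealed and freed */
static bool seal_page_empty(const struct seal_word_map *map)
{
	return bitmap_empty(map->used, WORDS_PER_PAGE);
}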

--
igor


* Re: RFC v2: post-init-read-only protection for data allocated dynamically
  2017-05-10 15:19           ` Igor Stoppa
@ 2017-05-10 15:45             ` Dave Hansen
  2017-05-19 10:51               ` Igor Stoppa
  0 siblings, 1 reply; 23+ messages in thread
From: Dave Hansen @ 2017-05-10 15:45 UTC (permalink / raw)
  To: Igor Stoppa, Michal Hocko, Laura Abbott
  Cc: linux-mm, linux-kernel, kernel-hardening

On 05/10/2017 08:19 AM, Igor Stoppa wrote:
> So I'd like to play a little what-if scenario:
> what if I was to support exclusively virtual memory and convert to it
> everything that might need sealing?

Because of the issues related to fracturing large pages, you might have
had to go this route eventually anyway.  Changing the kernel linear map
isn't nice.

FWIW, you could test this scheme by just converting all the users to
vmalloc() and seeing what breaks.  They'd all end up rounding up all
their allocations to PAGE_SIZE, but that'd be fine for testing.
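
For instance, the mechanical conversion for a typical user would be
(just a sketch of the test, not the final form):

	obj = kmalloc(size, GFP_KERNEL);	/* before */
	...
	kfree(obj);

becoming

	obj = vmalloc(size);	/* after: implicitly rounded up to PAGE_SIZE */
	...
	vfree(obj);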

Could you point out 5 or 10 places in the kernel that you want to convert?


* Re: RFC v2: post-init-read-only protection for data allocated dynamically
  2017-05-10 15:45             ` Dave Hansen
@ 2017-05-19 10:51               ` Igor Stoppa
  0 siblings, 0 replies; 23+ messages in thread
From: Igor Stoppa @ 2017-05-19 10:51 UTC (permalink / raw)
  To: Dave Hansen, Michal Hocko, Laura Abbott
  Cc: linux-mm, linux-kernel, kernel-hardening

Hello,

On 10/05/17 18:45, Dave Hansen wrote:
> On 05/10/2017 08:19 AM, Igor Stoppa wrote:
>> So I'd like to play a little what-if scenario:
>> what if I were to support exclusively virtual memory and convert to it
>> everything that might need sealing?
> 
> Because of the issues related to fracturing large pages, you might have
> had to go this route eventually anyway.  Changing the kernel linear map
> isn't nice.
> 
> FWIW, you could test this scheme by just converting all the users to
> vmalloc() and seeing what breaks.  They'd all end up rounding up all
> their allocations to PAGE_SIZE, but that'd be fine for testing.

Apologies for the long hiatus; it took me some time to figure out a
solution that could somehow address all the comments I got up to this
point.

It's here [1]; I preferred to start a new thread, since the proposal
has in practice changed significantly, even if in spirit it's still
the same.

It should also take care of the potential waste of space you mentioned
wrt the round-up to PAGE_SIZE.

> Could you point out 5 or 10 places in the kernel that you want to convert?

Right now I can only repeat what I said initially:
- the linked list used to implement LSM hooks
- SE Linux structures used to implement the policy DB; it should be
  about 5 data types

Next week, I'll address the 2 cases I listed, then I'll look for more,
but I think it should not be difficult to find customers for this.

BTW, I forgot to mention that I tested the code against both SLAB and
SLUB and it seems to work fine.

So far I've used QEMU x86-64 as test environment.

--
igor


[1] https://marc.info/?l=linux-mm&m=149519044015956&w=2

