* [PATCH 0/3] KASLR feature to randomize each loadable module
@ 2018-06-20 22:09 Rick Edgecombe
  2018-06-20 22:09 ` [PATCH 1/3] vmalloc: Add __vmalloc_node_try_addr function Rick Edgecombe
                   ` (3 more replies)
  0 siblings, 4 replies; 32+ messages in thread
From: Rick Edgecombe @ 2018-06-20 22:09 UTC (permalink / raw)
  To: tglx, mingo, hpa, x86, linux-kernel, linux-mm, kernel-hardening
  Cc: kristen.c.accardi, dave.hansen, arjan.van.de.ven, Rick Edgecombe

Hi,
This patchset adds a KASLR feature for stronger randomization of the location
of the text sections of dynamically loaded kernel modules.

Today the RANDOMIZE_BASE feature randomizes the base address where the module
allocations begin, with 10 bits of entropy. From there, a highly deterministic
algorithm allocates space for the modules as they are loaded and unloaded. If
an attacker can predict the order and identities of the modules that will be
loaded, then a single text address leak can give the attacker access to the
locations of all the modules.

This patch changes the module loading KASLR algorithm to randomize the
position of each module text section allocation with at least 18 bits of
entropy in the typical case. It is used on x86_64 only for now.

Allocation Algorithm
====================
The algorithm evenly splits the module space into two areas: a random area and
a backup area. For a module text allocation, it first tries up to 10 randomly
located starting pages inside the random area. If all of these fail, it
allocates in the backup area. The backup area base is offset in the same way
the current algorithm offsets the module base, with 10 bits of entropy.
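As a rough illustration, the selection logic can be sketched as a page-granular
simulation (a sketch only, not the kernel code: ModuleSpace, alloc_text, and
the fixed backup base are inventions of this sketch; in the patch the backup
base is itself randomized):

```python
import random

PAGE = 4096
MODULE_SPACE = 1 << 30                      # 1 GB module space when KASLR is on
RANDOM_PAGES = (MODULE_SPACE // 2) // PAGE  # first half: the random area
TRIES = 10                                  # random starting pages to try

class ModuleSpace:
    def __init__(self):
        self.used = set()                # occupied page indexes in the random area
        self.backup_top = RANDOM_PAGES   # next free page in the backup area
                                         # (the real backup base is randomized)

    def alloc_text(self, pages):
        """Return a starting page index for a module text allocation."""
        for _ in range(TRIES):
            start = random.randrange(RANDOM_PAGES - pages + 1)
            span = range(start, start + pages)
            if self.used.isdisjoint(span):
                self.used.update(span)
                return start
        # All tries collided: fall back to sequential backup-area allocation.
        start = self.backup_top
        self.backup_top += pages
        return start
```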

Randomness and Fragmentation
============================
The advantages of this algorithm over the existing one are higher entropy and
that each module text section is randomized in relation to the other sections,
so that if one location is leaked the location of other sections cannot be
inferred.

However, unlike the existing algorithm, the amount of randomness provided has a
dependency on the number of modules allocated and the sizes of the modules text
sections.

The following estimates are based on simulations done with core section
allocation sizes recorded from all in-tree x86_64 modules, and with a module
space size of 1GB (the size when KASLR is enabled). The entropy for the Nth
allocation comes from three sources of randomness: the address picked within
the random area, the probability that the section is allocated in the backup
area, and the number of modules already allocated in the backup area. To
compute a lower bound in the following calculations, the randomness from the
modules already in the backup area (or overlapping from the random area) is
ignored, since it is usually small for small numbers of modules and can only
increase the entropy.

For probability of the Nth module being in the backup area, p, a lower bound
entropy estimate is calculated here as:
Entropy = -((1-p)*log2((1-p)/(1073741824/4096)) + p*log2(p/1024))

Nth Modules	Probability Nth in Backup (p<0.01)	Entropy (bits)
200		0.00015658918				18.0009525805
300		0.00061754750				18.0025340517
400		0.00092257674				18.0032512276
500		0.00143354729				18.0041398771
600		0.00199926260				18.0048133611
700		0.00303342527				18.0054763676
800		0.00375362443				18.0056209924
900		0.00449013182				18.0055609282
1000		0.00506372420				18.0053909502
2000		0.01655518527				17.9891937614
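The lower-bound formula above can be reproduced directly (a quick check of the
table; 1073741824/4096 is the number of page-aligned positions in the 1 GB
module space, and 1024 the number of possible backup area bases):

```python
from math import log2

RANDOM_SLOTS = 1073741824 // 4096  # page-aligned positions in the 1 GB space
BACKUP_SLOTS = 1024                # 10 bits of entropy for the backup base

def nth_alloc_entropy(p):
    """Lower-bound entropy (bits) for the Nth allocation, where p is the
    probability that it lands in the backup area."""
    return -((1 - p) * log2((1 - p) / RANDOM_SLOTS)
             + p * log2(p / BACKUP_SLOTS))
```

For example, nth_alloc_entropy(0.00015658918) reproduces the first row of the
table.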

For the subclass of control flow attacks, a wrong guess can often crash the
process or even the system, so the probability of the first guess being right
can be more important than that of the Nth guess. KASLR schemes usually give
every possible position equal probability, but in this scheme that is not the
case. So a more conservative comparison to existing schemes is the amount of
information that would have to be guessed correctly for the position most
likely to hold the Nth module (as that would be the attacker's best guess).

This next table shows the bits that would have to be guessed for the most
likely position of the Nth module, assuming no other address has leaked:

Min Info = MIN(-log2(p/1024), -log2((1-p)/(1073741824/4096)))

Nth Modules	Min Info		Random Area 		Backup Area
200		18.00022592813		18.00022592813		22.64072780584
300		18.00089120792		18.00089120792		20.66116227856
400		18.00133161125		18.00133161125		20.08204345143
500		18.00206965540		18.00206965540		19.44619478537
600		18.00288721335		18.00288721335		18.96631630463
700		18.00438295865		18.00438295865		18.36483651470
800		18.00542552443		18.00542552443		18.05749997547
900		17.79902648177		18.00649247790		17.79902648177
1000		17.62558545623		18.00732396876		17.62558545623
2000		15.91657303366		18.02408399587		15.91657303366
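This formula can likewise be checked against the table (same slot counts as in
the entropy formula above):

```python
from math import log2

RANDOM_SLOTS = 1073741824 // 4096  # page-aligned positions in the module space
BACKUP_SLOTS = 1024                # possible backup area bases

def min_info(p):
    """Bits an attacker must guess for the most likely position of the
    Nth module, given backup-area probability p."""
    backup = -log2(p / BACKUP_SLOTS)      # best guess lands in the backup area
    rand = -log2((1 - p) / RANDOM_SLOTS)  # best guess lands in the random area
    return min(rand, backup)
```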

So the defensive strength of this algorithm in typical usage (<800 modules) for
x86_64 should be at least 18 bits, even if an address from the random area
leaks.

If an address from a section in the backup area leaks, however, the remaining
information that would have to be guessed is reduced. To get a lower bound, the
following assumes the leaked address belongs to the first module in the backup
area and ignores the probability of guessing its identity.

Nth Modules	P of At Least 2 in Backup (p<0.01)	Info (bits)
200		0.00005298177				14.20414443057
300		0.00005298177				14.20414443057
400		0.00034665456				11.49421363374
500		0.00310895422				8.32935491164
600		0.01299838019				6.26552433915
700		0.04042051772				4.62876838940
800		0.09812051823				3.34930133623
900		0.19325547277				2.37141882470
1000		0.32712329132				1.61209361130
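The Info column here appears to be simply -log2 of the probability in the
first column, which can be confirmed with a one-liner (a sketch under that
assumption):

```python
from math import log2

def remaining_info(p_at_least_two):
    """Bits left to guess after a backup-area address leaks, taken as
    -log2 of the probability that at least two modules share the backup
    area (identity guessing ignored, per the text)."""
    return -log2(p_at_least_two)
```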

So in typical usage, the entropy will still be decent even if an address in
the backup area leaks as well.

As for fragmentation, this algorithm reduces the average number of modules that
can be loaded without an allocation failure by about 6% (~17000 to ~16000)
(p<0.05). It can also reduce the largest module executable section that can be
loaded by half to ~500MB in the worst case.

Implementation
==============
This patch adds a new function in vmalloc (__vmalloc_node_try_addr) that tries
to allocate at a specific address. In the x86 module loader, this new vmalloc
function is used to implement the algorithm described above.

The new __vmalloc_node_try_addr function uses the existing
__vmalloc_node_range function, in order to introduce this algorithm with the
least invasive change. The side effect is that each collision while trying to
allocate in the random area triggers a TLB flush. There is a more complex,
more efficient implementation that could be used instead if there is interest
in improving performance.


Rick Edgecombe (3):
  vmalloc: Add __vmalloc_node_try_addr function
  x86/modules: Increase randomization for modules
  vmalloc: Add debugfs modfraginfo

 arch/x86/include/asm/pgtable_64_types.h |   1 +
 arch/x86/kernel/module.c                |  80 +++++++++++++++--
 include/linux/vmalloc.h                 |   3 +
 mm/vmalloc.c                            | 151 +++++++++++++++++++++++++++++++-
 4 files changed, 227 insertions(+), 8 deletions(-)

-- 
2.7.4


* [PATCH RFC V2 0/3] KASLR feature to randomize each loadable module
@ 2018-07-07  0:35 Rick Edgecombe
  2018-07-07  0:35 ` [PATCH 3/3] vmalloc: Add debugfs modfraginfo Rick Edgecombe
  0 siblings, 1 reply; 32+ messages in thread
From: Rick Edgecombe @ 2018-07-07  0:35 UTC (permalink / raw)
  To: tglx, mingo, hpa, x86, linux-kernel, linux-mm, kernel-hardening
  Cc: kristen, dave.hansen, arjan, Rick Edgecombe

Hi,

This is v2 of the "KASLR feature to randomize each loadable module" patchset.
The purpose is to increase the randomization and to randomize the modules in
relation to each other instead of just the base, so that if one module's
location leaks, the locations of the others can't be inferred.

This code needs some refactoring and simplification, but I was hoping to get
some feedback on the benchmarks and provide an update.

V2 brings the TLB flushes down to close to the existing algorithm and
increases the number of modules that get high randomness, based on the
concerns Jann Horn raised about the BPF JIT use case. It also tries to address
Kees Cook's comments about a possible minimal boot time regression, by
measuring the average allocation time to be below that of the existing
allocator. It also addresses Matthew Wilcox's comment that GFP_NOWARN is not
needed. There is also some data on PTE memory use, which is higher than with
the original algorithm, as suggested by Jann.

This is based on 4.18-rc3.

Changes since v1:
 - New implementation of __vmalloc_node_try_addr based on the
   __vmalloc_node_range implementation, that only flushes TLB when needed.
 - Modified module loading algorithm to try to reduce the TLB flushes further.
 - Increase "random area" tries in order to increase the number of modules that
   can get high randomness.
 - Increase "random area" size to 2/3 of module area in order to increase the
   number of modules that can get high randomness.
 - Fix for 0day failures on other architectures.
 - Fix for wrong debugfs permissions. (thanks to Jann)
 - Spelling fix. (thanks to Jann)
 - Data on module_alloc performance and TLB flushes. (brought up by Kees and
   Jann)
 - Data on memory usage. (suggested by Jann)
 
Todo:
 - Refactor __vmalloc_node_try_addr to be smaller and share more code with the
   normal vmalloc path, and misc cleanup
 - More real world testing beyond the synthetic micro benchmarks described
   below; BPF kselftest brought up by Daniel Borkmann.

New Algorithm
=============
In addition to purging the lazy free areas only when it needs to,
__vmalloc_node_try_addr now also supports a mode where it will fail to
allocate instead of doing any purge. In this case it reports, via an ERR_PTR,
whether the allocation would have succeeded if it had been allowed to purge.

The logic for selecting a location in the random area has changed as well. The
number of tries is increased from 10 to 10000, which still gives good
performance. At a high level, the vmalloc algorithm quickly traverses an
RB-tree to find a start position and then more slowly walks a linked list
looking for an open spot. Since this new algorithm randomly picks a spot, it
mostly just needs to traverse the RB-tree, and as a result the "tries" are
fast enough that their number can be high and still be faster than walking the
linked list. In the data below you can see that the random algorithm is on
average actually faster than the existing one.

The increase in number of tries is also to support the BPF JIT use case, by
increasing the number of modules that can get high randomness.

Since the __vmalloc_node_try_addr now can optionally fail instead of purging,
for the first half of the tries, the algorithm tries to find a spot where it
doesn't need to do a purge. For the second half it allows purges. The 50:50
split is to try to be a happy medium between reducing TLB flushes and reducing
average allocation time.
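The two-phase try loop can be sketched like this (try_addr here is a stand-in
callable invented for this sketch; the real code calls __vmalloc_node_try_addr,
whose actual prototype is in patch 1):

```python
import random

TRIES = 10000

def pick_address(try_addr, random_slots, pages):
    """Sketch of the v2 site-selection loop.

    try_addr(start, allow_purge) stands in for __vmalloc_node_try_addr:
    it returns an address on success and None on failure; with
    allow_purge=False it also fails where only purging the lazy-free
    areas would have made room.
    """
    for attempt in range(TRIES):
        start = random.randrange(random_slots - pages + 1)
        # First half of the tries: refuse to purge, to keep TLB flushes down.
        allow_purge = attempt >= TRIES // 2
        addr = try_addr(start, allow_purge)
        if addr is not None:
            return addr
    return None  # caller falls back to the backup area
```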

Randomness
==========
In the last patchset the size of the random area used in the calculations was
incorrect. The entropy should have been closer to 17 bits, not 18, which is
why it is lower here even though the number of random area tries is cranked
up. In reality, 17.3 bits is likely maintained to a much higher number of
allocations than shown here, since it seems that BPF JIT allocations are
usually smaller than modules. If you assume the majority of allocations are 1
page, 17 bits is maintained out to 8000 modules.

Modules		Min Info
1000		17.3
2000		17.3
3000		17.3
4000		17.3
5000		17.08
6000		16.30
7000		15.67
8000		14.92

Allocation Time
===============
The average module_alloc time over N modules was actually always faster with the
random algorithm:

Modules	Existing(ns)	New(ns)
1000	4,761		1,134
2000	9,730		1,149
3000	15,572		1,396
4000	20,723		2,161
5000	26,206		4,349
6000	31,374		8,615
7000	36,123		14,009
8000	40,174		23,396

The average time for the Nth allocation was usually better than with the
existing algorithm, until the number of modules gets very high.

Module	Original(ns)	New(ns)
1000	8,800		1,288
2000	20,949		1,477
3000	31,980		2,583
4000	44,539		9,250
5000	55,212		25,986
6000	65,968		39,540
7000	74,883		57,798
8000	85,392		97,319

TLB Flushes Per Allocation
==========================
The new algorithm flushes the TLB slightly more often than the existing
algorithm. Compared to the old, simpler __vmalloc_node_try_addr
implementation, however, this is about a 238x improvement in some cases.

Modules	Existing	New
1000	0.0014		0.001407
2000	0.001746	0.0018155
3000	0.0018186667	0.0021186667
4000	0.00187525	0.00249675
5000	0.001897	0.0025334
6000	0.0019066667	0.0025228333
7000	0.001925	0.0025315714
8000	0.0019325	0.002553

Memory Usage
============
A downside is that, since the random area is fragmented, it uses extra PTE
pages. PTE usage first approaches 1.3 MB as the random area fills, and after
that it increases more slowly. I am not sure this can be improved without
reducing randomness.

Modules	Existing(pages)	New(pages)
100	6		159
200	11		240
300	15		285
400	20		307
500	23		315
1000	41		330
2000	80		335
3000	118		338

Module Capacity
===============
The estimate of module capacity also now goes back up to ~17000, so the real
value should be close to that of the existing algorithm.


Rick Edgecombe (3):
  vmalloc: Add __vmalloc_node_try_addr function
  x86/modules: Increase randomization for modules
  vmalloc: Add debugfs modfraginfo

 arch/x86/include/asm/pgtable_64_types.h |   1 +
 arch/x86/kernel/module.c                | 103 +++++++++++-
 include/linux/vmalloc.h                 |   3 +
 mm/vmalloc.c                            | 275 +++++++++++++++++++++++++++++++-
 4 files changed, 375 insertions(+), 7 deletions(-)

-- 
2.7.4




Thread overview: 32+ messages
2018-06-20 22:09 [PATCH 0/3] KASLR feature to randomize each loadable module Rick Edgecombe
2018-06-20 22:09 ` [PATCH 1/3] vmalloc: Add __vmalloc_node_try_addr function Rick Edgecombe
2018-06-20 22:16   ` Randy Dunlap
2018-06-20 22:35     ` Kees Cook
2018-06-20 22:44       ` Randy Dunlap
2018-06-20 23:05         ` Kees Cook
2018-06-20 23:16           ` Randy Dunlap
2018-06-20 22:26   ` Matthew Wilcox
2018-06-21 22:02     ` Edgecombe, Rick P
2018-06-20 22:09 ` [PATCH 2/3] x86/modules: Increase randomization for modules Rick Edgecombe
2018-06-20 22:09 ` [PATCH 3/3] vmalloc: Add debugfs modfraginfo Rick Edgecombe
2018-06-21  0:53   ` kbuild test robot
2018-06-21  1:17   ` kbuild test robot
2018-06-21 12:32   ` Jann Horn
2018-06-21 18:56     ` Edgecombe, Rick P
2018-06-20 22:33 ` [PATCH 0/3] KASLR feature to randomize each loadable module Kees Cook
2018-06-21 13:37   ` Jann Horn
2018-06-21 13:39     ` Jann Horn
2018-06-21 18:59     ` Edgecombe, Rick P
2018-06-21 21:23       ` Daniel Borkmann
2018-06-21 18:56   ` Edgecombe, Rick P
2018-07-07  0:35 [PATCH RFC V2 " Rick Edgecombe
2018-07-07  0:35 ` [PATCH 3/3] vmalloc: Add debugfs modfraginfo Rick Edgecombe
2018-07-19  5:24   ` kbuild test robot
